NLP classifier for misinformation detection. Powered by Truth Over Tone Technology™, TruStorE™ blends linguistic heuristics, sentiment drift analysis, and Word Pair Logic™ to flag manipulative tone in news articles. Manipulative tone is a telltale sign of fake news, which tends to elicit emotion in place of facts. Built for reproducibility, modular deployment, and impact.
- Modular NLP pipeline design with multilingual scaling
- Truth Over Tone Technology™ for tone and bias detection
- Word Pair Logic™ for linguistic signal extraction
- Strategic sampling to bypass NLTK constraints
- Visual clarity through histogram plots
- Artifact-first branding and footer evolution
| Layer | Tools Used |
|---|---|
| ETL & Wrangling | pandas, numpy, google.colab |
| Preprocessing | nltk, strategic sampling to bypass NLTK constraints |
| Modeling | sentiment_analysis, ngram_analysis, TF-IDF vectorization, Manipulative Tactic Detector™ |
| Feature Extraction | Word Pair Logic™ |
| Certification | TruStorE™ Certification Engine |
| Visualization | matplotlib, seaborn, histogram plots |
📦 Dependencies: See requirements.txt for install stack
TruStorE/
├── modules/
│ ├── etl.py # Data loading and cleaning
│ ├── preprocessing.py # Labeling, tokenization, sampling
│ ├── tone_detector.py # Manipulative Tactic Detector™
│ ├── sentiment_analysis.py # Sentiment scoring and hypothesis testing
│ ├── word_pair_logic.py # Linguistic signal extraction
│ ├── certification_engine.py # Final decision logic
│ └── viz.py # Histogram plotting utilities
├── notebook/
│ └── TruStorE_classifier.ipynb
├── README.md
├── LICENSE
├── requirements.txt
└── .gitignore
- Truth Over Tone Technology™: Flags tonal asymmetry and sentiment drift using a proprietary blend of heuristics and NLP scoring logic.
- Word Pair Logic™: Extracts linguistic signals from bigram patterns using SME-informed heuristics. Visualized via histogram plots to highlight frequency and semantic asymmetry.
- Sentiment Analysis: Tests emotional polarity hypotheses. Visualized with histogram distributions and mean score overlays.
- TruStorE™ Certification Engine: Final decision logic that certifies article integrity based on cumulative linguistic, tonal, and semantic signals. Think of it as the final stamp: deployable, explainable, and recruiter-facing.
- Strategic Sampling: Bypasses NLTK constraints by curating datasets that preserve semantic diversity and tone variance.
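To make the bigram idea behind Word Pair Logic™ concrete, here is a minimal sketch using only the standard library. The actual heuristics are proprietary and not shown in this README, so the function names (`tokenize`, `bigram_counts`, `asymmetric_bigrams`), the raw-count gap, and the toy sentences are all assumptions for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(texts):
    """Count adjacent word pairs across an iterable of texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def asymmetric_bigrams(fake_texts, true_texts, min_gap=1):
    """Return bigrams that occur at least min_gap more times in fake_texts
    than in true_texts, sorted by the size of that gap (largest first)."""
    fake = bigram_counts(fake_texts)
    true = bigram_counts(true_texts)
    gaps = {pair: fake[pair] - true[pair] for pair in fake}
    return sorted(
        ((pair, gap) for pair, gap in gaps.items() if gap >= min_gap),
        key=lambda item: (-item[1], item[0]),
    )


# Toy inputs (hypothetical, not taken from the project dataset).
fake_sample = [
    "You won't believe this shocking truth",
    "The shocking truth they hide",
]
true_sample = ["The committee published the report on Tuesday"]

top_pairs = asymmetric_bigrams(fake_sample, true_sample)
print(top_pairs[0])
```

The counts returned here could be passed to a histogram plot to show which word pairs skew toward one class. A real pipeline would likely normalize by corpus size rather than compare raw counts.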
This classifier was trained and tested on a curated dataset of true and fake news articles.
The CSVs were meant to live in /data/ for reproducibility, but they exceed GitHub's upload limit. See Dataset Downloads below.
Source: Provided as part of a Ground.News simulation task.
No proprietary data used. All preprocessing and labeling logic is visible in the notebook.
📐 TF-IDF: Textbook vs Codebook Logic
📚 Textbook Definition
TF-IDF (Term Frequency–Inverse Document Frequency) evaluates word importance across documents.
- Term Frequency (TF): TF(t, d) = (# of times term t appears in document d) / (total terms in document d)
- Inverse Document Frequency (IDF): IDF(t) = log(total documents / documents containing term t)
- TF-IDF Score: TF-IDF(t, d) = TF(t, d) × IDF(t)
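The textbook formulas above can be worked through on a tiny corpus. This is a sketch of the unsmoothed definitions only; the three toy documents are assumptions. Library implementations such as scikit-learn's `TfidfVectorizer` apply IDF smoothing and vector normalization by default, so their scores will not match these numbers exactly.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: occurrences of term divided by total terms in the document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus_tokens):
    """Inverse document frequency: log(total documents / documents containing term)."""
    containing = sum(1 for doc in corpus_tokens if term in doc)
    if containing == 0:
        raise ValueError(f"term {term!r} does not appear in the corpus")
    return math.log(len(corpus_tokens) / containing)


def tf_idf(term, doc_tokens, corpus_tokens):
    """TF-IDF score of a term in one document, relative to the corpus."""
    return tf(term, doc_tokens) * idf(term, corpus_tokens)


# Toy corpus (hypothetical): each document is a list of tokens.
corpus = [
    "shocking truth revealed".split(),
    "committee report revealed".split(),
    "committee vote delayed".split(),
]

# "shocking" appears in 1 of 3 documents, "revealed" in 2 of 3,
# so "shocking" is the more distinctive term for the first document.
score_shocking = tf_idf("shocking", corpus[0], corpus)
score_revealed = tf_idf("revealed", corpus[0], corpus)
print(score_shocking, score_revealed)
```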
💻 Codebook Logic in TruStorE™
In this classifier, TF-IDF is signal extraction—layered with heuristics and tone detection.
- Strategic Sampling: Curated multilingual payloads preserve tone variance and semantic diversity.
- Weighted TF-IDF: Terms weighted by emotional polarity and drift, not just frequency.
- Word Pair Logic™ Overlay: Bigram patterns extracted post-vectorization to flag manipulative phrasing.
- Sentiment Drift Index™ Integration: TF-IDF vectors cross-referenced with tonal asymmetry.
- Certification Engine Input: Final TF-IDF vectors feed into TruStorE™ Certification Engine for integrity scoring.
From textbook to codebook, TF-IDF becomes a linguistic scalpel—powered by Truth Over Tone Technology™.
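One way to picture the "Weighted TF-IDF" step is to scale each term's score by how emotionally charged the term is. The sketch below is not the TruStorE™ weighting, which this README does not specify. The `POLARITY` lexicon, the `alpha` parameter, and the `1 + alpha * |polarity|` multiplier are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical polarity lexicon in [-1, 1]; a real one would come from a
# sentiment resource or a trained model.
POLARITY = {"shocking": -0.8, "outrage": -0.9, "miracle": 0.7}


def idf_table(corpus):
    """Map each term to log(total documents / documents containing the term)."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}


def weighted_tf_idf(doc, idf, alpha=1.0):
    """TF-IDF per term, multiplied by 1 + alpha * |polarity| (assumed weighting).

    Terms missing from the lexicon keep their plain TF-IDF score.
    """
    counts = Counter(doc)
    total = len(doc)
    return {
        term: (count / total)
        * idf.get(term, 0.0)
        * (1.0 + alpha * abs(POLARITY.get(term, 0.0)))
        for term, count in counts.items()
    }


# Toy corpus (hypothetical): each document is a list of tokens.
corpus = [
    ["shocking", "miracle", "cure"],
    ["committee", "report", "cure"],
    ["committee", "vote", "delayed"],
]

idf = idf_table(corpus)
scores = weighted_tf_idf(corpus[0], idf)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.3f}")
```

With `alpha=0` the function reduces to plain TF-IDF, which makes it easy to compare weighted and unweighted rankings on the same document.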
Due to GitHub’s upload limit and Google Drive’s preview threshold, both datasets are hosted externally. Download directly below:
- Notebook-first walkthrough is ready for Jupyter download
- Modular logic blocks are structured for RESTful API deployment
- Multilingual pipeline supports expansion to non-English datasets
- Dashboard wiring with recruiter-facing metrics and footer lore is next on deck
- 🧠 Save and version the trained model
- 🔌 Create an API endpoint for real-time classification
- 🌐 Integrate with Ground News website backend
- 📱 Extend functionality into the mobile app interface
- 🌍 Add multilingual translation tech for global scalability
- 📊 Leverage balanced F1 harmonics (0.96 sweep) for EVALS benchmarking
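For readers less familiar with the F1 metric named in the last item, F1 is the harmonic mean of precision and recall. The sketch below computes it from scratch. The labels and predictions are made-up toy values, not results from this classifier, and the function name `precision_recall_f1` is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels (hypothetical): 1 = fake, 0 = true.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```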
This isn’t just a model—it’s the TruStorE™ Certification Engine, powered by Word Pair Logic™, Sentiment Drift Index™, and Truth Over Tone Technology™.
MIT © 2025 Garrick Piñón. TruStorE™, Word Pair Logic™, Sentiment Drift Index™, and Truth Over Tone Technology™ are trademarked methodologies.
Commercial use requires attribution and explicit permission.
If you’re reading this, you’re not just curious—you’re committed. These badges are for you. Let's be friends.