NLP classifier for misinformation detection. TruStorE™ blends linguistic heuristics, sentiment drift analysis, and Word Pair Logic™ to flag manipulative tone in news articles, a tell-tale sign of fake news as emotions are elicited to replace facts. Built for reproducibility, modular deployment, and artifact-grade impact.

GarrickPinon/TruStorE


📰 TruStorE™: Sifting Fact From Fiction

NLP classifier for misinformation detection. Powered by Truth Over Tone Technology™, TruStorE™ blends linguistic heuristics, sentiment drift analysis, and Word Pair Logic™ to flag manipulative tone in news articles. Manipulative tone is a telltale sign of fake news, where emotional appeals stand in for facts. Built for reproducibility, modular deployment, and impact.



🧪 Skills Demonstrated

  • Modular NLP pipeline design with multilingual scaling
  • Truth Over Tone Technology™ for tone and bias detection
  • Word Pair Logic™ for linguistic signal extraction
  • Strategic sampling to bypass NLTK constraints
  • Visual clarity through histogram plots
  • Artifact-first branding and footer evolution

🧰 Tech Stack

| Layer | Tools Used |
| --- | --- |
| ETL & Wrangling | pandas, numpy, google.colab |
| Preprocessing | nltk, strategic sampling to bypass NLTK constraints |
| Modeling | sentiment_analysis, ngram_analysis, TF-IDF vectorization, Manipulative Tactic Detector™ |
| Feature Extraction | Word Pair Logic™ |
| Certification | TruStorE™ Certification Engine |
| Visualization | matplotlib, seaborn, histogram plots |

📦 Dependencies: See requirements.txt for install stack


📁 Repo Structure

TruStorE/
├── modules/
│   ├── etl.py                  # Data loading and cleaning
│   ├── preprocessing.py        # Labeling, tokenization, sampling
│   ├── tone_detector.py        # Manipulative Tactic Detector™
│   ├── sentiment_analysis.py   # Sentiment scoring and hypothesis testing
│   ├── word_pair_logic.py      # Linguistic signal extraction
│   ├── certification_engine.py # Final decision logic
│   └── viz.py                  # Histogram plotting utilities
├── notebook/
│   └── TruStorE_classifier.ipynb
├── README.md
├── LICENSE
├── requirements.txt
└── .gitignore
 

⚙️ How It Works

  • Truth Over Tone Technology™
    Flags tonal asymmetry and sentiment drift using a proprietary blend of heuristics and NLP scoring logic.

  • Word Pair Logic™
    Extracts linguistic signals from bigram patterns using SME-informed heuristics. Visualized via histogram plots to highlight frequency and semantic asymmetry.

  • Sentiment Analysis
    Tests emotional polarity hypotheses. Visualized with histogram distributions and mean score overlays.

  • TruStorE™ Certification Engine
    Final decision logic that certifies article integrity based on cumulative linguistic, tonal, and semantic signals.
    Think of it as the final stamp—deployable, explainable, and recruiter-facing.

  • Strategic Sampling
    Bypasses NLTK constraints by curating datasets that preserve semantic diversity and tone variance.
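
The bigram idea behind Word Pair Logic™ can be illustrated with a minimal sketch. This is not the project's actual implementation: the regex tokenizer stands in for NLTK, the `asymmetry` measure is an assumed, simple way to compare how often a word pair appears in two collections, and the sample sentences are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a simple stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(texts):
    """Count adjacent word pairs across a collection of texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def asymmetry(pair, counts_a, counts_b):
    """Difference in relative frequency of one bigram between two collections (assumed measure)."""
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    return counts_a[pair] / total_a - counts_b[pair] / total_b


# Toy samples, invented for this sketch; not taken from the project's dataset.
fake_texts = ["You won't believe this shocking truth", "The shocking truth they hide"]
true_texts = ["The ministry published the annual report", "The annual report shows growth"]

fake_counts = bigram_counts(fake_texts)
true_counts = bigram_counts(true_texts)

# A positive value means the pair is relatively more common in the first collection.
print(asymmetry(("shocking", "truth"), fake_counts, true_counts))
```

The per-pair counts from `bigram_counts` are the kind of data a frequency histogram would be drawn from.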


📥 Data Provenance

This classifier was trained and tested on a curated dataset of true and fake news articles.
The CSVs were meant to be included in /data/ for reproducibility, but they exceed GitHub's upload limit. See Dataset Downloads below.

Source: Provided as part of a Ground.News simulation task.
No proprietary data used. All preprocessing and labeling logic is visible in the notebook.


📐 TF-IDF: Textbook vs Codebook Logic

📚 Textbook Definition
TF-IDF (Term Frequency–Inverse Document Frequency) evaluates word importance across documents.

  1. Term Frequency (TF):
    TF(t) = (# of times term t appears in a document) / (total terms in the document)

  2. Inverse Document Frequency (IDF):
    IDF(t) = log(total documents / documents containing term t)

  3. TF-IDF Score:
    TF-IDF(t, d) = TF(t, d) × IDF(t)
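
The three textbook formulas above can be written out directly in plain Python. This is a minimal sketch on an invented toy corpus; returning zero for a term that appears in no document is an assumption added to avoid division by zero.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: occurrences of the term divided by total terms in the document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Inverse document frequency: log(total documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: unseen terms get zero weight instead of a division error
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF score: TF(t, d) multiplied by IDF(t)."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Toy corpus of pre-tokenized documents, invented for illustration.
corpus = [
    "shocking secret they hide".split(),
    "officials confirm the report".split(),
    "the report cites public data".split(),
]

# "shocking" appears in one of three documents, so it scores higher than "report",
# which appears in two of three.
print(tf_idf("shocking", corpus[0], corpus))
print(tf_idf("report", corpus[1], corpus))
```

Library implementations may not match these numbers exactly. For example, scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default.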

💻 Codebook Logic in TruStorE™
In this classifier, TF-IDF is signal extraction—layered with heuristics and tone detection.

  1. Strategic Sampling:
    Curated multilingual payloads preserve tone variance and semantic diversity.

  2. Weighted TF-IDF:
    Terms weighted by emotional polarity and drift, not just frequency.

  3. Word Pair Logic™ Overlay:
    Bigram patterns extracted post-vectorization to flag manipulative phrasing.

  4. Sentiment Drift Index™ Integration:
    TF-IDF vectors cross-referenced with tonal asymmetry.

  5. Certification Engine Input:
    Final TF-IDF vectors feed into TruStorE™ Certification Engine for integrity scoring.

From textbook to codebook, TF-IDF becomes a linguistic scalpel—powered by Truth Over Tone Technology™.
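
One way to picture the "Weighted TF-IDF" step is the sketch below. It is an assumption-laden illustration, not the project's actual weighting: the `POLARITY` lexicon, its values, and the `1 + |polarity|` scaling rule are all hypothetical, chosen only to show how a score can depend on emotional charge as well as frequency.

```python
import math
from collections import Counter

# Hypothetical polarity lexicon, invented for this sketch; values lie in [-1, 1].
POLARITY = {"shocking": -0.8, "outrage": -0.9, "miracle": 0.7}


def tf_idf_vector(doc_tokens, corpus):
    """Textbook TF-IDF for every distinct term in one tokenized document."""
    counts = Counter(doc_tokens)
    vector = {}
    for term, count in counts.items():
        # Guard against a document that is not part of the corpus (df would be 0).
        df = max(sum(1 for doc in corpus if term in doc), 1)
        vector[term] = (count / len(doc_tokens)) * math.log(len(corpus) / df)
    return vector


def polarity_weighted(vector, lexicon=POLARITY):
    """Scale each score by 1 + |polarity| so emotionally charged terms count more (assumed rule)."""
    return {term: score * (1.0 + abs(lexicon.get(term, 0.0))) for term, score in vector.items()}


# Toy corpus, invented for illustration.
corpus = [
    "shocking miracle cure sparks outrage".split(),
    "agency publishes cure trial results".split(),
    "trial results remain under review".split(),
]

plain = tf_idf_vector(corpus[0], corpus)
weighted = polarity_weighted(plain)

# Neutral terms such as "cure" keep their plain score; charged terms are boosted.
for term in sorted(weighted, key=weighted.get, reverse=True):
    print(term, round(plain[term], 4), round(weighted[term], 4))
```

Keeping the weighting as a separate function over the plain vector means the textbook scores stay available for comparison.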


📦 Dataset Downloads

Due to GitHub’s upload limit and Google Drive’s preview threshold, both datasets are hosted externally. Download directly below:


✅ Deployment Strategy

  • Notebook-first walkthrough is ready for Jupyter download
  • Modular logic blocks are structured for RESTful API deployment
  • Multilingual pipeline supports expansion to non-English datasets
  • Dashboard wiring with recruiter-facing metrics and footer lore is next on deck

👣 Next Steps for Real-World Integration

  • 🧠 Save and version the trained model
  • 🔌 Create an API endpoint for real-time classification
  • 🌐 Integrate with Ground News website backend
  • 📱 Extend functionality into the mobile app interface
  • 🌍 Add multilingual translation tech for global scalability
  • 📊 Leverage balanced F1 harmonics (0.96 sweep) for EVALS benchmarking
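
For readers unfamiliar with the F1 metric mentioned in the last bullet, it is the harmonic mean of precision and recall. The sketch below computes it from scratch. The label encoding (1 = fake, 0 = true) and the toy predictions are assumptions made for illustration and are not results from this project.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels, invented for this sketch (assumed encoding: 1 = fake, 0 = true).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because the harmonic mean is pulled toward the lower of the two values, a high F1 requires both precision and recall to be high.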

This isn’t just a model—it’s the TruStorE™ Certification Engine, powered by Word Pair Logic™, Sentiment Drift Index™, and Truth Over Tone Technology™.


🧾 License

MIT © 2025 Garrick Piñón. TruStorE™, Word Pair Logic™, Sentiment Drift Index™, and Truth Over Tone Technology™ are trademarked methodologies.
Commercial use requires attribution and explicit permission.


🏷️ Loyalty Badges – Faithful to the Footer

If you’re reading this, you’re not just curious—you’re committed. These badges are for you. Let's be friends.

📜 README Loyalist 🌀 Doomscroll Certified 🧠 Brain-Fried but Brilliant 🍪 Cookie for the Curious 🧭 Markdown Pilgrim 🫀 Faithful to the End Quantsultant™ Sugar-Free ML 🧪 Badge of Experimental Brilliance 🛡️ Built with Integrity 🧬 Signal Over Fluff 🚨 BS Button Created 🦴 Code Chiropractor™

