A Sentiment-Based A/B Testing Project with Real API Data
By Toluwalase Taiwo
📅 June 14, 2025
Ever wondered why some headlines grab more attention than others? In this project, I explored how sentiment (positive vs. negative) in news headlines affects user engagement. Using real-time articles from The Guardian News API, I analyzed the emotional tone of headlines and simulated reader behavior through clicks and shares.
The goal was to test whether positive or negative sentiment in headlines leads to higher user engagement, measured through simulated clicks and shares.
- Python
- Requests – fetching data from The Guardian API
- Pandas – data cleaning & manipulation
- TextBlob – sentiment analysis
- NumPy – simulating clicks and shares
- Matplotlib & Seaborn – data visualization
- SciPy – T-tests for statistical significance
- WordCloud – visualizing language patterns
- Source: The Guardian API
- Fields Extracted:
  - Headline
  - Summary (`trailText`)
  - Section/category
  - Published date
  - Full article text

Data was saved as a `.csv` file for easy analysis.
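To make the collection step concrete, here is a minimal sketch of how the articles can be pulled with Requests and saved with Pandas. The endpoint and field names (`webTitle`, `trailText`, `bodyText`, `sectionName`, `webPublicationDate`) come from The Guardian Content API; the page size, ordering, and output path are illustrative assumptions rather than the exact notebook code.

```python
import requests
import pandas as pd

# Sketch of the data-collection step: pull recent articles from
# The Guardian Content API and save the extracted fields to a CSV.
# Replace YOUR_API_KEY with a key from The Guardian Open Platform.
API_KEY = "YOUR_API_KEY"
URL = "https://content.guardianapis.com/search"

params = {
    "api-key": API_KEY,
    "page-size": 100,                      # articles per request (illustrative)
    "show-fields": "trailText,bodyText",   # summary and full article text
    "order-by": "newest",
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()
results = response.json()["response"]["results"]

articles = [
    {
        "headline": item["webTitle"],
        "summary": item.get("fields", {}).get("trailText", ""),
        "section": item["sectionName"],
        "published": item["webPublicationDate"],
        "body": item.get("fields", {}).get("bodyText", ""),
    }
    for item in results
]

pd.DataFrame(articles).to_csv("data/guardian_articles.csv", index=False)
```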
- Cleaned and preprocessed the text data.
- Used TextBlob to calculate sentiment polarity.
- Filtered out neutral headlines to keep the test focused.
- Grouped articles into Positive and Negative.
- Simulated engagement:
  - Negative: higher clicks/shares (200–350 clicks, 150–300 shares)
  - Positive: lower clicks/shares (50–150 clicks, 40–120 shares)
- Ran a T-test to compare average engagement across sentiment groups (a code sketch of these steps follows below).
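Taken together, those steps map onto a short pipeline. Here is a minimal sketch, assuming the CSV columns shown earlier and an illustrative random seed; the notebook in the repo holds the actual implementation.

```python
import numpy as np
import pandas as pd
from textblob import TextBlob
from scipy import stats

rng = np.random.default_rng(42)  # illustrative seed
df = pd.read_csv("data/guardian_articles.csv")

# 1. Score each headline's sentiment polarity (-1 to +1) with TextBlob.
df["polarity"] = df["headline"].apply(lambda t: TextBlob(str(t)).sentiment.polarity)

# 2. Drop neutral headlines and label the rest Positive / Negative.
df = df[df["polarity"] != 0].copy()
df["sentiment"] = np.where(df["polarity"] > 0, "Positive", "Negative")

# 3. Simulate engagement with the ranges used in the project:
#    negative headlines get higher clicks/shares than positive ones.
is_negative = df["sentiment"] == "Negative"
df["clicks"] = np.where(is_negative,
                        rng.integers(200, 351, size=len(df)),
                        rng.integers(50, 151, size=len(df)))
df["shares"] = np.where(is_negative,
                        rng.integers(150, 301, size=len(df)),
                        rng.integers(40, 121, size=len(df)))

# 4. Compare mean clicks between groups with an independent-samples t-test.
neg_clicks = df.loc[is_negative, "clicks"]
pos_clicks = df.loc[~is_negative, "clicks"]
t_stat, p_value = stats.ttest_ind(neg_clicks, pos_clicks, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```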
This barplot shows the number of articles classified as positive vs. negative after filtering out neutral ones. It sets the stage for the A/B test.
This boxplot compares simulated clicks and shares across sentiment groups. It highlights how negative headlines tend to get more consistent and higher engagement.
A simple barplot showing average clicks and shares for each sentiment group. It gives a quick view of performance based on emotional tone.
A word cloud that visually represents the most common words across all headlines and summaries. Words appearing more frequently are displayed larger.
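For readers curious how such a cloud is built, here is a minimal sketch using the WordCloud library; the column names and figure settings are illustrative assumptions, not the exact notebook code.

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import pandas as pd

# Combine headlines and summaries into one text blob, then render a word cloud.
df = pd.read_csv("data/guardian_articles.csv")
text = " ".join(df["headline"].astype(str)) + " " + " ".join(df["summary"].astype(str))

cloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```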
- Negative headlines outperformed positive ones in both clicks and shares.
- T-test results showed a statistically significant difference in engagement between sentiment groups (p < 0.0001).
- Outlier Injection: Added extreme values to demonstrate how a few viral articles can skew average metrics.
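To see why a few viral pieces distort averages, here is a small self-contained sketch with made-up numbers (not project data) showing the mean jumping while the median barely moves.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)  # illustrative seed

# 100 "typical" articles with modest click counts.
clicks = pd.Series(rng.integers(50, 151, size=100), name="clicks")
print(f"Before: mean={clicks.mean():.0f}, median={clicks.median():.0f}")

# Inject a handful of extreme "viral" values (illustrative magnitudes).
viral = pd.Series([5_000, 8_000, 12_000], name="clicks")
clicks_with_outliers = pd.concat([clicks, viral], ignore_index=True)
print(f"After:  mean={clicks_with_outliers.mean():.0f}, "
      f"median={clicks_with_outliers.median():.0f}")
```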
- Emotionally intense headlines (especially negative ones) draw more engagement.
- Balancing tone is important: while negative headlines pull readers in, too much negativity can affect brand perception.
- Simulating engagement helped mimic real-world dynamics, even without actual user data.
which-headline-works-better/
├── data/
│ └── guardian_articles.csv
├── notebook/
│ └── headline_ab_test.ipynb
├── visuals/
│ ├── wordcloud_positive.png
│ ├── wordcloud_negative.png
│ ├── boxplot_engagement.png
│ └── sentiment_barplot.png
├── README.md
├── requirements.txt
└── .gitignore
For the full walkthrough of this project, please check this document:
👉 Google Docs – Project Breakdown
It contains:
- ✅ Step-by-step breakdown of the analysis
- 📊 Visualizations and interpretations
- 💡 Key insights and recommendations
- Clone the repo.
- Set up your environment: `pip install -r requirements.txt`
- Run the notebook step by step.
- Optionally, register for an API key from The Guardian Open Platform to fetch new data.
This was more than a technical exercise — it was a lesson in how data reflects human behavior. It helped me connect Python, NLP, and stats to real-world questions in media and communication.