A Sentiment-Based A/B Testing Project with Real API Data
By Toluwalase Taiwo
📅 June 14, 2025
Ever wondered why some headlines grab more attention than others? In this project, I explored how sentiment (positive vs. negative) in news headlines affects user engagement. Using real-time articles from The Guardian News API, I analyzed the emotional tone of headlines and simulated reader behavior through clicks and shares.
The goal was to test whether positive or negative sentiment in headlines leads to higher user engagement, measured through simulated clicks and shares.
- Python
- Requests – fetching data from The Guardian API
- Pandas – data cleaning & manipulation
- TextBlob – sentiment analysis
- NumPy – simulating clicks and shares
- Matplotlib & Seaborn – data visualization
- SciPy – T-tests for statistical significance
- WordCloud – visualizing language patterns
- Source: The Guardian API
- Fields Extracted:
  - Headline
  - Summary (`trailText`)
  - Section/category
  - Published date
  - Full article text

Data was saved as a `.csv` file for easy analysis.
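To make the collection step concrete, here is a minimal sketch of how the articles can be pulled with Requests and saved with Pandas. The endpoint and field names (`webTitle`, `trailText`, `bodyText`, `sectionName`, `webPublicationDate`) come from The Guardian Content API; the page size, ordering, and output path are illustrative assumptions rather than the exact notebook code.

```python
import requests
import pandas as pd

# Sketch of the data-collection step: pull recent articles from
# The Guardian Content API and save the extracted fields to a CSV.
# Replace YOUR_API_KEY with a key from The Guardian Open Platform.
API_KEY = "YOUR_API_KEY"
URL = "https://content.guardianapis.com/search"

params = {
    "api-key": API_KEY,
    "page-size": 100,                      # articles per request (illustrative)
    "show-fields": "trailText,bodyText",   # summary and full article text
    "order-by": "newest",
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()
results = response.json()["response"]["results"]

articles = [
    {
        "headline": item["webTitle"],
        "summary": item.get("fields", {}).get("trailText", ""),
        "section": item["sectionName"],
        "published": item["webPublicationDate"],
        "body": item.get("fields", {}).get("bodyText", ""),
    }
    for item in results
]

pd.DataFrame(articles).to_csv("data/guardian_articles.csv", index=False)
```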
- Cleaned and preprocessed the text data.
- Used TextBlob to calculate sentiment polarity.
- Filtered out neutral headlines to keep the test focused.
- Grouped articles into Positive and Negative.
- Simulated engagement:
  - Negative: higher clicks/shares (200–350 clicks, 150–300 shares)
  - Positive: lower clicks/shares (50–150 clicks, 40–120 shares)
- Ran a T-test to compare average engagement across sentiment groups (a code sketch of these steps follows below).
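Taken together, those steps map onto a short pipeline. Here is a minimal sketch, assuming the CSV columns shown earlier and an illustrative random seed; the notebook in the repo holds the actual implementation.

```python
import numpy as np
import pandas as pd
from textblob import TextBlob
from scipy import stats

rng = np.random.default_rng(42)  # illustrative seed
df = pd.read_csv("data/guardian_articles.csv")

# 1. Score each headline's sentiment polarity (-1 to +1) with TextBlob.
df["polarity"] = df["headline"].apply(lambda t: TextBlob(str(t)).sentiment.polarity)

# 2. Drop neutral headlines and label the rest Positive / Negative.
df = df[df["polarity"] != 0].copy()
df["sentiment"] = np.where(df["polarity"] > 0, "Positive", "Negative")

# 3. Simulate engagement with the ranges used in the project:
#    negative headlines get higher clicks/shares than positive ones.
is_negative = df["sentiment"] == "Negative"
df["clicks"] = np.where(is_negative,
                        rng.integers(200, 351, size=len(df)),
                        rng.integers(50, 151, size=len(df)))
df["shares"] = np.where(is_negative,
                        rng.integers(150, 301, size=len(df)),
                        rng.integers(40, 121, size=len(df)))

# 4. Compare mean clicks between groups with an independent-samples t-test.
neg_clicks = df.loc[is_negative, "clicks"]
pos_clicks = df.loc[~is_negative, "clicks"]
t_stat, p_value = stats.ttest_ind(neg_clicks, pos_clicks, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```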
This barplot shows the number of articles classified as positive vs. negative after filtering out neutral ones. It sets the stage for the A/B test.
This boxplot compares simulated clicks and shares across sentiment groups. It highlights how negative headlines tend to get more consistent and higher engagement.
A simple barplot showing average clicks and shares for each sentiment group. It gives a quick view of performance based on emotional tone.
A word cloud that visually represents the most common words across all headlines and summaries. Words appearing more frequently are displayed larger.
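For readers curious how such a cloud is built, here is a minimal sketch using the WordCloud library; the column names and figure settings are illustrative assumptions, not the exact notebook code.

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import pandas as pd

# Combine headlines and summaries into one text blob, then render a word cloud.
df = pd.read_csv("data/guardian_articles.csv")
text = " ".join(df["headline"].astype(str)) + " " + " ".join(df["summary"].astype(str))

cloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```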
- Negative headlines outperformed positive ones in both clicks and shares.
- T-test results showed a statistically significant difference in engagement between sentiment groups (p < 0.0001).
- Outlier Injection: Added extreme values to demonstrate how a few viral articles can skew average metrics.
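To see why a few viral pieces distort averages, here is a small self-contained sketch with made-up numbers (not project data) showing the mean jumping while the median barely moves.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)  # illustrative seed

# 100 "typical" articles with modest click counts.
clicks = pd.Series(rng.integers(50, 151, size=100), name="clicks")
print(f"Before: mean={clicks.mean():.0f}, median={clicks.median():.0f}")

# Inject a handful of extreme "viral" values (illustrative magnitudes).
viral = pd.Series([5_000, 8_000, 12_000], name="clicks")
clicks_with_outliers = pd.concat([clicks, viral], ignore_index=True)
print(f"After:  mean={clicks_with_outliers.mean():.0f}, "
      f"median={clicks_with_outliers.median():.0f}")
```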
- Emotionally intense headlines (especially negative ones) draw more engagement.
- Balancing tone is important: while negative headlines pull readers in, too much negativity can affect brand perception.
- Simulating engagement helped mimic real-world dynamics, even without actual user data.
which-headline-works-better/
├── data/
│ └── guardian_articles.csv
├── notebook/
│ └── headline_ab_test.ipynb
├── visuals/
│ ├── wordcloud_positive.png
│ ├── wordcloud_negative.png
│ ├── boxplot_engagement.png
│ └── sentiment_barplot.png
├── README.md
├── requirements.txt
└── .gitignore
For the full walkthrough of this project, please check this document:
👉 Google Docs – Project Breakdown
It contains:
- ✅ Step-by-step breakdown of the analysis
- 📊 Visualizations and interpretations
- 💡 Key insights and recommendations
- Clone the repo.
- Set up your environment: `pip install -r requirements.txt`
- Run the notebook step by step.
- Optionally, register for an API key from The Guardian Open Platform to fetch new data.
This was more than a technical exercise — it was a lesson in how data reflects human behavior. It helped me connect Python, NLP, and stats to real-world questions in media and communication.