This repository showcases my project on wrangling and analyzing the WeRateDogs Twitter dataset as part of Udacity's Data Analyst Nanodegree. The project focuses on gathering, assessing, cleaning, and analyzing Twitter data to uncover trends and patterns in dog ratings and engagement metrics.

noora-a/Wrangle-and-Analyze-Data

Wrangle and Analyze Data Project

Project Overview

This project is part of Udacity's Data Analyst Nanodegree program. The goal is to wrangle and analyze the WeRateDogs Twitter data to derive meaningful insights. The work involves gathering data from multiple sources, assessing its quality and tidiness, cleaning it, and then analyzing and visualizing the result.

Files

  • wrangle_act.ipynb: Notebook containing the data wrangling process.
  • wrangle_report.ipynb: Notebook summarizing the data wrangling steps.
  • act_report.ipynb: Notebook containing the analysis and visualizations.
  • image-predictions.tsv: File containing the image predictions data.
  • tweet_df.csv: Cleaned and combined tweet data.
  • tweet-json.txt: Raw JSON data from Twitter API.
  • twitter_archive_master.csv: Final cleaned dataset used for analysis.

Datasets

The project uses three main datasets:

  1. Twitter Archive: Provided by Udacity, containing basic tweet data.
  2. Image Predictions: Generated by a neural network, containing predictions of dog breeds from tweet images.
  3. Tweet JSON: Additional tweet data extracted using the Twitter API.

Steps

Data Gathering

  • Downloaded the WeRateDogs Twitter archive.
  • Programmatically downloaded the image predictions file.
  • Used Tweepy to query the Twitter API for each tweet's JSON data.
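The gathered JSON then has to be read back into a tabular form. The following is a minimal sketch of that step, not the notebook's actual code. It assumes tweet-json.txt holds one JSON object per line and that each object carries the `id`, `retweet_count`, and `favorite_count` fields; the function name `parse_tweet_lines` and the sample values are hypothetical.

```python
import json


def parse_tweet_lines(lines):
    """Turn newline-delimited tweet JSON into a list of flat records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        records.append({
            "tweet_id": tweet["id"],
            # .get() leaves None where a count is absent (assumed possible)
            "retweet_count": tweet.get("retweet_count"),
            "favorite_count": tweet.get("favorite_count"),
        })
    return records


# Toy input standing in for the contents of tweet-json.txt (values invented).
sample_lines = [
    '{"id": 1, "retweet_count": 10, "favorite_count": 40}',
    '',
    '{"id": 2, "retweet_count": 3, "favorite_count": 12}',
]
tweet_records = parse_tweet_lines(sample_lines)
```

Because the function accepts any iterable of strings, an open file handle for tweet-json.txt could be passed in directly instead of the sample list.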

Data Assessing

  • Performed both visual and programmatic assessment to identify data quality and tidiness issues in the gathered datasets.
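A programmatic assessment usually starts with counting missing values and duplicated identifiers. The sketch below illustrates the idea in plain Python on invented rows. The helper name `assess` and the column names `name` and `rating_numerator` are assumptions for illustration and may not match the notebook.

```python
from collections import Counter


def assess(rows, key):
    """Report missing values per column and duplicated key values."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # Treat None and empty strings as missing (an assumed convention).
            if value is None or value == "":
                missing[column] += 1
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return {"missing": dict(missing), "duplicate_keys": duplicates}


# Invented rows with one duplicated tweet_id and two missing names.
sample_rows = [
    {"tweet_id": 1, "name": "Bella", "rating_numerator": 12},
    {"tweet_id": 2, "name": None, "rating_numerator": 13},
    {"tweet_id": 2, "name": "", "rating_numerator": 11},
]
report = assess(sample_rows, key="tweet_id")
```

A report like this only flags candidates; each flagged column or key still needs a visual check before deciding how to clean it.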

Data Cleaning

  • Addressed issues identified during assessment, including handling missing data, correcting erroneous data, and merging datasets into a master dataframe.
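Merging into a master dataframe amounts to joining the datasets on a shared tweet identifier. As a rough sketch of that join, assuming the shared key is called `tweet_id` and that only tweets present in both sources are kept (an inner join), the logic looks like this. The helper `merge_on_key` and all values are hypothetical.

```python
def merge_on_key(left, right, key="tweet_id"):
    """Inner-join two lists of dicts on a shared key."""
    # Index the right-hand rows by key for constant-time lookup.
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the columns of both rows; right-hand values win on clashes.
            merged.append({**row, **match})
    return merged


# Invented rows: only tweet_id 2 appears in both sources.
archive = [
    {"tweet_id": 1, "rating_numerator": 12},
    {"tweet_id": 2, "rating_numerator": 13},
]
counts = [
    {"tweet_id": 2, "favorite_count": 40},
    {"tweet_id": 3, "favorite_count": 5},
]
master = merge_on_key(archive, counts)
```

An inner join silently drops tweets that lack a match, so comparing row counts before and after the merge is a useful sanity check.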

Data Visualization and Analysis

  • Conducted exploratory data analysis to uncover patterns and trends.
  • Visualized key insights using various plotting techniques.

Results

The analysis of the WeRateDogs dataset revealed several interesting insights, including:

  • Most Common Dog Breeds: Labrador Retriever, Golden Retriever, and Pembroke Welsh Corgi were among the most frequently rated breeds.
  • Distribution of Ratings: Most dogs received high ratings, typically a numerator above 10 against a denominator of 10 (such as 12/10).
  • Engagement Metrics: Higher ratings correlated with more favorites and retweets.
  • Dog Stage Distribution: Pupper was the most common stage, followed by doggo.
  • Tweet Source: Most tweets were posted using Twitter for iPhone.
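One way to quantify a relationship such as the one between ratings and favorites is the Pearson correlation coefficient. The sketch below is illustrative only: the README does not state which measure the analysis used, the function name `pearson` is hypothetical, and the numbers are invented rather than taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented values, chosen to be perfectly linear for demonstration.
ratings = [10, 11, 12, 13]
favorites = [100, 200, 300, 400]
r = pearson(ratings, favorites)
```

A coefficient near 1 indicates a strong positive linear relationship; it does not show that higher ratings cause more engagement.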

Conclusion

This project demonstrates the end-to-end process of data wrangling and analysis: gathering data from multiple sources, then assessing, cleaning, and analyzing it. The insights derived from the WeRateDogs Twitter data provide a deeper understanding of the trends and patterns in the dataset.
