This project demonstrates the complete analytics lifecycle, culminating in a machine learning model that classifies TikTok user reports as claims or opinions to support content moderation.

DataDaneHQ/TikTok_Capstone_Project

TikTok-Capstone-Project

Welcome!

This repository showcases my completed project, developed as part of the Google Advanced Data Analytics Professional Certificate. The initiative reflects a structured and methodical approach to solving real-world challenges in content moderation using data analytics and machine learning.

The project spans the entire analytics lifecycle, including:

  • Defining Objectives: Deriving tasks and deliverables from stakeholder communications and emails.
  • Planning: Creating project proposals, workflows, and timelines.
  • Development: Building Jupyter Notebooks to clean data, develop models, and evaluate performance.
  • Visualization: Designing Tableau dashboards to present key insights.
  • Delivery: Crafting executive summaries and detailed reports for stakeholders.

The goal of this project was to classify TikTok videos as claims or opinions, using machine learning to make moderation more efficient, reduce backlogs, and improve the user experience. The repository also documents cross-functional collaboration and the delivery of actionable insights to stakeholders, making it a worked example of applied data analytics and machine learning.

Note: Team member names used in this workplace scenario project are fictional and are not representative of TikTok.


Contents

  1. Project Workflow
    • An Excel sheet outlining the workflow and key milestones for the TikTok Claims Classification Project.
  2. Project Proposal
    • Outline of objectives and approach for this initiative.
  3. Initial EDA - Jupyter Notebook
    • Initial data exploration and analysis.
  4. Executive Summary
    • Summary of key findings from the preliminary analysis.
  5. Full EDA - Jupyter Notebook
    • Full data exploration and analysis.
  6. Tableau Summary Dashboard
    • Visual representation of key insights from the full Exploratory Data Analysis.
  7. Executive Summary - Full EDA
    • Summary of key findings from the full exploratory analysis.
  8. Hypothesis Testing - Jupyter Notebook
    • Technical summary for internal stakeholders (the data team), documenting the full analysis process and two-sample hypothesis test.
  9. Executive Summary - Hypothesis Test
    • High-level summary of key findings and insights for external stakeholders.
  10. Logistic Regression Model - Jupyter Notebook
    • Explores the relationship between video characteristics and user verification status, providing insights for feature selection in the final claims classification model.
  11. Executive Summary - Logistic Regression Analysis
    • Summary of key insights and model performance.
  12. Pre-Processed Dataset - Jupyter Notebook
    • Prepares the TikTok dataset for machine learning by addressing data quality issues and feature engineering.
  13. Model Development - Jupyter Notebook
    • Builds and evaluates Random Forest and XGBoost models to classify TikTok videos and support efficient content moderation.
  14. Final Executive Report
    • Summarizes the development, performance, and recommendations for a machine learning model to enhance TikTok's content moderation.
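
For readers unfamiliar with the two-sample hypothesis test mentioned in item 8, the following is a minimal sketch of Welch's t-test using only the Python standard library. It is not the notebook's code: the function name `welch_t`, the choice of Welch's variant, and the sample values are assumptions made for illustration, and the numbers are invented rather than taken from the project dataset.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test.

    Hypothetical helper for illustration; it does not assume equal variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Invented values for two groups; NOT drawn from the project data.
group_a = [120, 135, 150, 110, 145]
group_b = [300, 280, 350, 310, 290]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

A large absolute t statistic relative to the degrees of freedom suggests the two group means differ. In practice, a library routine such as `scipy.stats.ttest_ind` with `equal_var=False` would also return the p-value.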

License: All rights reserved. No part of this repository may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the owner.
