
arun-data-analyst/Data-Sourcing-and-Cleaning


📥 Data Sourcing and Cleaning

This project demonstrates real-world data sourcing and cleaning techniques on both structured and unstructured data. It covers profiling, validation, transformation, and storage across formats such as CSV, plain-text (TXT) files, and SQL databases.
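The profiling and validation steps mentioned above can be sketched with pandas. This is a minimal illustration and is not taken from the project's notebooks. The helper names `profile` and `validate` and the sample data are hypothetical.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, missing values, and distinct non-null values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),
    })


def validate(df: pd.DataFrame, required: list) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for col in required:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif df[col].isna().any():
            problems.append(f"null values in: {col}")
    # Count rows that exactly repeat an earlier row.
    dupes = int(df.duplicated().sum())
    if dupes:
        problems.append(f"{dupes} duplicate row(s)")
    return problems


# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "title": ["Dune", "Emma", None, "Dune"],
    "price": [9.99, 5.50, 7.25, 9.99],
})
print(profile(df))
print(validate(df, required=["title", "price", "author"]))
```

Because `validate` returns a list of messages instead of raising an error, you can report every problem in a single pass before deciding how to clean the data.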


πŸ“ Contents

✅ Final Project

  • Sourced and cleaned raw text into structured CSV
  • Analyzed and visualized the cleaned data using charts and summaries
  • Stored cleaned data in .db and .csv formats
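The steps in the final project (parsing raw text into structured rows, then saving the result as both `.csv` and `.db`) can be sketched as follows. This assumes a hypothetical raw-text layout of one `date | city | value` record per line. The actual project text, the regex, the `records` table name and the output file names are illustrative assumptions, not the author's code.

```python
import re
import sqlite3

import pandas as pd

# Hypothetical raw text: one record per line with "|" separators,
# inconsistent spacing and casing, and one malformed line.
RAW = """\
2024-01-05 | toronto |  12.5
2024-01-06|Ottawa|9.0
not a valid record
2024-01-07 | MONTREAL | 11.25
"""

# date | city | numeric value, tolerating extra whitespace around fields.
LINE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2})\s*\|\s*([^|]+?)\s*\|\s*([\d.]+)\s*$")


def parse_raw(text: str) -> pd.DataFrame:
    """Keep lines matching the expected pattern, drop the rest, and fix types."""
    rows = [m.groups() for m in map(LINE.match, text.splitlines()) if m]
    df = pd.DataFrame(rows, columns=["date", "city", "value"])
    df["date"] = pd.to_datetime(df["date"])
    df["city"] = df["city"].str.title()  # normalize casing
    df["value"] = df["value"].astype(float)
    return df


def store(df: pd.DataFrame, csv_path: str, db_path: str, table: str = "records") -> None:
    """Write the same cleaned frame to a CSV file and to a SQLite table."""
    df.to_csv(csv_path, index=False)
    conn = sqlite3.connect(db_path)
    try:
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()


clean = parse_raw(RAW)
store(clean, "cleaned.csv", "cleaned.db")
```

Lines that do not match the pattern are dropped silently here. In a real cleaning pass you might collect them for review instead.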

✅ Assignment 15

  • Performed cleaning and transformation on books.csv
  • Included outlier detection, column standardization, and data checks
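Column standardization and outlier detection of the kind listed for Assignment 15 might look like the sketch below. It uses the interquartile-range (Tukey's fences) rule, which is one common choice. The README does not say which outlier method the assignment used. The column names are hypothetical stand-ins for `books.csv`, whose real schema is not shown here.

```python
import re

import pandas as pd


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Trim and lowercase column names, replacing non-alphanumeric runs with '_'."""
    cleaned = [re.sub(r"[^0-9a-z]+", "_", c.strip().lower()).strip("_") for c in df.columns]
    return df.set_axis(cleaned, axis=1)


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Hypothetical stand-in for books.csv; the real file's columns may differ.
books = pd.DataFrame({
    " Book Title ": ["A", "B", "C", "D", "E"],
    "Num Pages": [250, 310, 280, 295, 4000],
})
books = standardize_columns(books)
books["pages_outlier"] = iqr_outliers(books["num_pages"])
print(books)
```

In this sample, Q1 = 280 and Q3 = 310, so the fences are 235 and 355, and only the 4000-page row is flagged. Flagging outliers instead of dropping them lets you decide later whether they are errors or genuine extremes.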

🧰 Tools Used

  • Python + Jupyter Notebooks
  • pandas, matplotlib
  • SQL for data storage and retrieval

🚀 How to Run

  1. Clone the repo:

    git clone https://github.com/arun-data-analyst/Data-Sourcing-and-Cleaning.git
    cd Data-Sourcing-and-Cleaning
  2. Install the Python packages:

    pip install -r requirements.txt
  3. Open the notebooks in JupyterLab or any Python IDE.

🧠 Author

Arun Acharya
Data Analyst in training | Willis College
