Skip to content

keshavgarg24/AutoData

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Automated Data Cleaning Pipeline with AI Integration

Just built a comprehensive data cleaning solution that handles CSV, Excel, database, and API data sources through a unified FastAPI backend. The system combines rule-based cleaning with AI-powered analysis using Google's Gemini model and LangGraph for intelligent batch processing. Key technical features:

Asynchronous file processing with configurable batch sizes Multi-source data ingestion (PostgreSQL, REST APIs, file uploads) Automated missing value imputation and outlier detection using IQR method Column standardization and duplicate removal AI-driven data quality assessment with structured state management Error handling and graceful degradation when AI services are unavailable

The pipeline processes data in batches to optimize memory usage and API token limits, making it scalable for large datasets. Rule-based cleaning handles standard operations like null value handling and data type conversion, while the AI agent provides intelligent insights on data quality patterns and recommended cleaning strategies.

Data cleaning doesn't have to be tedious when you can automate the entire workflow from ingestion to analysis.

About

An automation script to clean data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages