AI-Powered Analysis for ALL File Types - Your Complete Data Analysis Companion
The Universal Data Analyst Agent is a comprehensive AI-powered tool that can analyze virtually any type of file or data format. Whether you're working with structured data (CSV, Excel), documents (PDF, Word, PowerPoint), text files, or even images, this agent provides intelligent insights, automated analysis, and interactive visualizations.
- ๐๏ธ Universal File Support: Handles 10+ file formats including CSV, Excel, PDF, Word, PowerPoint, images, and text files
- ๐ค AI-Powered Analysis: Leverages Together.ai's Llama models for intelligent insights and natural language querying
- ๐ Automated Visualizations: Creates comprehensive dashboards and interactive plots
- ๐ฌ Natural Language Queries: Ask questions about your data in plain English
- ๐ Statistical Analysis: Automatic correlation analysis, missing value detection, and data quality assessment
- ๐ผ๏ธ OCR Text Extraction: Extracts and analyzes text from images
- ๐ Comprehensive Reporting: Generates detailed analysis reports in multiple formats
- ๐ Analysis History: Tracks all your analyses with exportable history
- Python 3.7 or higher
- Together.ai API key (Get one here)
-
Clone or download the script:
# Save the code as universal_analyst.py
-
Install required packages:
pip install pandas numpy requests matplotlib seaborn openpyxl python-docx PyPDF2 pillow pdfplumber plotly wordcloud scikit-learn nltk python-pptx pytesseract
-
Run the agent:
python universal_analyst.py
-
Enter your Together.ai API key when prompted
Category | File Types | Capabilities |
---|---|---|
Structured Data | CSV, Excel (.xlsx, .xls), JSON arrays | Full statistical analysis, visualizations, correlations |
Documents | PDF, Word (.docx), PowerPoint (.pptx) | Text extraction, content analysis, sentiment analysis |
Text Files | TXT, Markdown | Natural language processing, keyword analysis, sentiment |
Images | JPG, PNG, BMP, TIFF, GIF | OCR text extraction, metadata analysis, content insights |
load <filename> # Load any supported file type
summary # Get comprehensive file overview
info # Display current file information
analyze # Perform AI-powered comprehensive analysis
ask <question> # Ask specific questions about your data
report # Generate detailed analysis report
visualize # Create data analysis dashboard
visualize interactive # Create interactive Plotly visualizations
export json # Export analysis results as JSON
export txt # Export analysis results as text
history # View all previous analyses
help # Display comprehensive help menu
quit # Exit the application
๐ Enter command: load sales_data.csv
โ
Successfully loaded structured data 'sales_data.csv'
Shape: 1000 rows ร 8 columns
Columns: product, price, quantity, date, region
Size: 0.15 MB
๐ Enter command: analyze
๐ Performing comprehensive analysis...
โ
**File Overview**
โข File: sales_data.csv
โข Type: Structured
โข Shape: 1000 rows ร 8 columns
โ
**AI Analysis Insights**
Key patterns identified:
- Seasonal trends in Q4 showing 40% increase in sales
- Top performing regions: North (35%), South (28%)
- Premium products (>$100) show higher profit margins
[Additional insights...]
๐ Enter command: ask What are the top 5 selling products?
โ
**Question:** What are the top 5 selling products?
**Answer:**
Based on the sales data analysis:
1. Product A: 145 units sold ($14,500 revenue)
2. Product B: 132 units sold ($13,200 revenue)
[Detailed breakdown with supporting data...]
๐ Enter command: visualize
โ
Visualization created successfully!
Saved as: visualization_20241215_143022.png
Type: Data Analysis Dashboard
๐ Enter command: load research_paper.pdf
โ
Successfully loaded text document 'research_paper.pdf'
Words: 8,547
Lines: 342
Size: 2.3 MB
๐ Enter command: ask What are the main findings of this research?
โ
**Answer:**
The research presents three main findings:
1. Machine learning models show 23% improvement over traditional methods
2. Data preprocessing techniques reduce error rates by 15%
3. Cross-validation results demonstrate model reliability
[Detailed analysis with supporting quotes...]
๐ Enter command: report
๐ Comprehensive report generated!
File: analysis_report_20241215_143555.md
Sections: Executive Summary, Content Analysis, Key Insights, Recommendations
๐ Enter command: load infographic.png
โ
Successfully loaded image 'infographic.png'
Dimensions: 1920ร1080 pixels
๐ Contains text: Yes
Size: 1.2 MB
๐ Enter command: analyze
๐ Performing comprehensive analysis...
โ
**Image Properties**
โข Format: PNG
โข Dimensions: 1920ร1080 pixels
โข Contains extracted text about market trends
โ
**AI Analysis Insights**
The infographic contains key business metrics:
- Revenue growth: 15% YoY
- Customer satisfaction: 87%
[OCR extracted content analysis...]
The agent automatically creates comprehensive dashboards including:
- ๐ Data Distribution Histograms: Shows distribution of numeric columns
- ๐ฅ Missing Values Heatmap: Visualizes data completeness patterns
- ๐ Correlation Matrix: Displays relationships between variables
- ๐ Statistical Overviews: Bar charts of key metrics
When you use visualize interactive
, the agent creates:
- Interactive scatter plots with hover details
- Dynamic box plots for outlier detection
- Zoomable and filterable charts using Plotly
- Data Quality Assessment: Automatic detection of missing values, outliers, and inconsistencies
- Statistical Insights: Correlation analysis, descriptive statistics, trend identification
- Business Intelligence: Revenue patterns, customer behavior, performance metrics
- Predictive Insights: Trend forecasting and anomaly detection
- Sentiment Analysis: Emotional tone assessment using VADER sentiment analyzer
- Topic Modeling: Key themes and concept extraction
- Content Quality: Readability analysis and structure assessment
- Keyword Analysis: Most frequent terms and phrases
- OCR Text Extraction: Automatic text recognition from images
- Metadata Analysis: Image properties, dimensions, and technical details
- Content Classification: Visual content type identification
Generated reports include:
- Executive Summary: Key findings and recommendations
- Technical Analysis: Detailed methodology and statistics
- Business Insights: Actionable recommendations
- Visual Summaries: Chart descriptions and interpretations
- JSON: Machine-readable format with complete analysis history
- Text: Human-readable format for documentation
- Markdown: Formatted reports with proper structure
The agent uses Together.ai's Llama-4-Maverick model by default. You can modify the model in the code:
self.model = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
Modify the create_visualization()
method to customize:
- Chart types and styling
- Color schemes
- Plot dimensions
- Statistical measures
Configure NLTK settings for:
- Custom stopwords
- Language-specific processing
- Sentiment analysis thresholds
-
NLTK Download Errors:
# Manual NLTK data download python -c "import nltk; nltk.download('all')"
-
Missing Dependencies:
# Install specific packages pip install --upgrade pandas matplotlib seaborn
-
OCR Issues (Tesseract):
# Install Tesseract OCR # Windows: Download from GitHub # macOS: brew install tesseract # Linux: sudo apt-get install tesseract-ocr
-
API Connection Issues:
- Verify your Together.ai API key
- Check internet connection
- Ensure API quota availability
- Large Files: Files over 100MB may take longer to process
- Image Processing: High-resolution images require more processing time
- Memory Usage: Large datasets may require increased system RAM
We welcome contributions! Here's how you can help:
- Report Bugs: Create detailed issue reports
- Feature Requests: Suggest new file formats or analysis types
- Code Contributions: Submit pull requests with improvements
- Documentation: Help improve guides and examples
# Clone the repository
git clone <repository-url>
cd universal-data-analyst
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
python -m pytest tests/
- Together.ai: For providing the powerful Llama AI models
- Open Source Libraries: pandas, matplotlib, nltk, and many others
- Community: All contributors and users providing feedback
- Issues: Report bugs on GitHub Issues
- Discussions: Join our community discussions
- Documentation: Check our comprehensive docs
- Email: Contact us at [your-email]
- Database Connectivity: MySQL, PostgreSQL, MongoDB support
- Cloud Storage: AWS S3, Google Drive integration
- Advanced ML: Automated model training and prediction
- Web Interface: Browser-based GUI
- API Endpoints: RESTful API for integration
- Collaborative Features: Team sharing and collaboration
- Scheduled Analysis: Automated recurring reports
- v1.0.0: Initial release with core functionality
- v1.1.0: Added image analysis and OCR support
- v1.2.0: Enhanced AI insights and interactive visualizations
Ready to analyze your data? ๐
python universal_analyst.py
My API Key = "tgp_v1_z3tfiH3oDSFGl9szn2BzGYpWPtl2Y1OS1MFr2CX-pEU"
Transform your data into insights with the power of AI!