I'm a results-driven Machine Learning Data Scientist with over 7 years of experience architecting and deploying end-to-end AI solutions from concept to production. My passion lies in leveraging deep expertise in Generative AI (RAG, LLMs), NLP, and Computer Vision to solve complex business problems, automate processes, and deliver significant, measurable value. I build systems that not only predict but also prescribe actions, turning data into intelligent, automated workflows.
From engineering multivariate time-series forecasting models and intelligent document processors to deploying edge AI systems, I thrive on the full project lifecycle. This includes architecting robust ETL pipelines, performing advanced feature engineering, building and fine-tuning models, and creating interactive Tableau/Power BI dashboards that provide leadership with on-demand strategic insights.
Python: PySpark, Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch
SQL: Advanced querying, ETL, data validation, stored procedures
Other Languages: Java, C++
LLMs & Frameworks: Google Gemini, LangChain, Transformers, HuggingFace
Techniques: Retrieval-Augmented Generation (RAG), Prompt Engineering, Text Classification, Sentiment Analysis
Core NLP: NLTK, spaCy, Text Preprocessing
Core ML: Time-Series Forecasting (ARIMA, Prophet), Anomaly Detection, Recommendation Systems, Statistical Modeling, A/B Testing
Deep Learning: Computer Vision (OpenCV), Graph Neural Networks (GNNs)
Platforms: AWS SageMaker, GCP Vertex AI, Databricks
Orchestration & Pipelines: Airflow, CI/CD, ETL Pipelines
Infrastructure: Docker, Kubernetes, Confluent Kafka
Data Storage: Data Warehousing (Redshift, Snowflake), Data Modeling
Deployment: FastAPI, Streamlit
Databases: PostgreSQL, MySQL, SQL Server, Neo4j (Graph), MongoDB
Visualization: Tableau, Power BI, QlikSense, Cognos, MicroStrategy
Collaboration & Workflow: Git, GitHub, Jira, Confluence, Agile/Scrum
Here are a few projects that reflect my skills and problem-solving capabilities:
AI agent that tailors resumes, matches job descriptions, and writes personalized cover letters.
- Tools: Python, Google Gemini Pro, Prompt Engineering
- Outputs: Match scoring, bullet suggestions, JSON-structured output
- Featured on Kaggle, GitHub, and YouTube
GitHub Repo | Kaggle Notebook | YouTube Demo
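
As a rough illustration of the match scoring and JSON-structured output listed above, here is a minimal sketch. It uses plain keyword overlap instead of Gemini, and the function names, the tiny stop-word list, and the output field names are all illustrative assumptions, not the project's actual code.

```python
import json
import re
from typing import Set

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"a", "an", "and", "the", "with", "of", "in", "to", "for", "on"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of non-stop-word tokens."""
    words = re.findall(r"[a-z0-9+#]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def match_report(resume: str, job_description: str) -> str:
    """Return a JSON string with a keyword-overlap score and keyword lists."""
    resume_terms = tokenize(resume)
    job_terms = tokenize(job_description)
    matched = resume_terms & job_terms
    # Share of job-description terms that also appear in the resume.
    score = len(matched) / len(job_terms) if job_terms else 0.0
    return json.dumps(
        {
            "match_score": round(score, 2),
            "matched_keywords": sorted(matched),
            "missing_keywords": sorted(job_terms - resume_terms),
        }
    )


if __name__ == "__main__":
    print(match_report("Python and SQL with Airflow", "Python, SQL, and Docker"))
```

In an LLM-based version, the same JSON shape could be requested from the model through the prompt, with the missing keywords feeding the bullet suggestions.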
Leverages LLMs and AI agents to automatically analyze reports (PDF/Excel/CSV) and generate actionable summaries, charts, and insights.
- Automated insight extraction using Python & OpenAI APIs
- Visualizations using Plotly and Matplotlib
- Intelligent summarization & natural language generation
An interactive Streamlit application visualizing and comparing cost of living indices across various countries.
Technologies Used: Python, Streamlit, Pandas, Plotly, Seaborn
Features:
- Compare indices by country using visual charts
- Built with Plotly, Seaborn, Streamlit
- Focus on rent, groceries, utilities, etc.
Outcome: Helps users make informed decisions when comparing the cost of living across countries.
Analyzes sales data and builds time-series models to forecast future trends.
- Data wrangling and preprocessing with Pandas
- Time-series forecasting with ARIMA & statsmodels
- Actionable sales insights for business planning
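
To give a feel for the forecasting step, here is a minimal, dependency-free sketch of the idea behind an ARIMA(1,1,0) model: difference the series once, fit a single autoregressive coefficient by least squares, and forecast recursively. It is a hand-rolled simplification, not the statsmodels estimator the project uses, and the function names and sales figures are made up for illustration.

```python
from typing import List


def fit_ar1_on_differences(series: List[float]) -> float:
    """Estimate phi in d[t] = phi * d[t-1], where d is the first difference."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    # Fall back to phi = 0 (a flat forecast) when there is nothing to fit.
    return num / den if den else 0.0


def forecast(series: List[float], steps: int) -> List[float]:
    """Forecast `steps` future values; needs at least two observations."""
    phi = fit_ar1_on_differences(series)
    last_value = series[-1]
    last_diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        # Predict the next change, then add it back to undo the differencing.
        last_diff = phi * last_diff
        last_value = last_value + last_diff
        predictions.append(last_value)
    return predictions


if __name__ == "__main__":
    monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical data
    print(forecast(monthly_sales, steps=2))
```

A full ARIMA fit also handles moving-average terms, constants, and order selection, which this sketch leaves out.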
Let's Connect
I'm always excited to collaborate, learn, or just chat about data!
LinkedIn
Email: bethusreeja@gmail.com
Portfolio Website: https://sreejabethu.github.io/datascience/
Location: United States (Open to Remote & Hybrid Roles)
Let's make data work smarter with AI