A LangChain-based agent that analyzes semiconductor wafer data with clustering algorithms and answers questions through a natural language interface.
- 🤖 Natural Language Interface - Ask questions in plain English powered by GPT-4
- 📊 Multiple Clustering Algorithms - K-Means, DBSCAN, Hierarchical, GMM
- 📈 Automatic Optimization - Finds the optimal number of clusters using the silhouette and elbow methods
- 🎨 Interactive Visualizations - PCA plots, feature distributions, cluster comparisons
- 💻 User-Friendly UI - Gradio-based web interface for easy interaction
- 🔧 Extensible Architecture - Easy to add new tools and algorithms
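To illustrate what silhouette- and elbow-based selection of the cluster count can look like, here is a minimal scikit-learn sketch. It is not the agent's internal code: the helper `score_cluster_counts`, the synthetic blobs, and the candidate range of k are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for wafer features (three well-separated groups, not real wafer data).
centers = np.array([[0, 0, 0, 0], [10, 10, 0, 0], [0, 10, 10, 10]], dtype=float)
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)


def score_cluster_counts(X, k_values, random_state=0):
    """Hypothetical helper: return {k: (inertia, silhouette)} for each candidate k."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results


scores = score_cluster_counts(X_scaled, range(2, 8))

# Silhouette method: pick the k with the highest mean silhouette score.
best_k = max(scores, key=lambda k: scores[k][1])

# Elbow method: inspect how quickly inertia stops dropping as k grows.
for k, (inertia, sil) in scores.items():
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")
print("best k by silhouette:", best_k)
```

The silhouette score gives a single number to maximize, while the inertia column is meant to be read by eye (or plotted) to find the elbow, so the two methods work well as a cross-check on each other.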
The easiest way to get started is using Google Colab:
- Clone the repository
git clone https://github.com/janhavi-giri/semiconductor-wafer-clustering-agent.git
cd semiconductor-wafer-clustering-agent
- Install dependencies
pip install -r requirements.txt
- Set your OpenAI API key
export OPENAI_API_KEY="sk-your-api-key-here"
- Run the UI
python run_ui.py
- Initialize the agent with your OpenAI API key
- Load your data - Upload CSV or generate synthetic data
- Ask questions in natural language
- View results - Get insights and visualizations
import pandas as pd
from src.agent import WaferClusteringAgent
# Initialize agent
agent = WaferClusteringAgent(api_key="your-openai-api-key")
# Load your data
df = pd.read_csv("your_wafer_data.csv")
agent.load_data(df)
# Or generate synthetic data
df = agent.generate_synthetic_data(n_wafers=1000)
agent.load_data(df)
# Analyze using natural language
response = agent.analyze("Find the optimal number of clusters")
print(response)
response = agent.analyze("Apply k-means clustering and identify outliers")
print(response)
- "What patterns exist in my wafer data?"
- "Find the optimal number of clusters for this dataset"
- "Apply k-means clustering with 4 clusters and analyze the results"
- "Which cluster has the highest yield?"
- "Compare k-means and DBSCAN clustering algorithms"
- "Identify any outlier wafers that need attention"
- "Create a PCA visualization of the clusters"
- "What factors correlate most strongly with wafer yield?"
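As a rough idea of the kind of computation behind a question such as "What factors correlate most strongly with wafer yield?", the sketch below ranks features by their Pearson correlation with yield using pandas. The data is synthetic and deliberately built so that defect density drives yield; how the agent itself answers this question is not shown here.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; the relationships are made up.
rng = np.random.default_rng(0)
n = 500
defects = rng.uniform(0.05, 0.5, n)
wafers = pd.DataFrame({
    "Yield_%": 98 - 40 * defects + rng.normal(0, 1.0, n),  # yield falls as defects rise
    "Defect_Density": defects,
    "Temperature": rng.normal(350, 5, n),
    "Pressure": rng.normal(1.8, 0.1, n),
})

# Pearson correlation of every feature with yield, ranked by absolute strength.
corr = wafers.corr()["Yield_%"].drop("Yield_%")
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Ranking by absolute value keeps strong negative relationships (such as defects reducing yield) at the top instead of burying them below weak positive ones.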
Your CSV should contain wafer measurements with columns such as:
- Wafer_ID - Unique identifier
- Yield_% - Wafer yield percentage
- Defect_Density - Number of defects per unit area
- Temperature - Process temperature
- Pressure - Process pressure
- Process_Time - Processing duration
- Additional measurement parameters
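Before loading a file, it can help to confirm that the columns listed above are present. The following is a small pandas sketch; the helper `check_wafer_columns` is hypothetical, the sample values are invented, and treating every listed column as required is an assumption (the project may accept a subset).

```python
import io

import pandas as pd

# Column names taken from the list above; treating all of them as required is an assumption.
EXPECTED_COLUMNS = [
    "Wafer_ID", "Yield_%", "Defect_Density",
    "Temperature", "Pressure", "Process_Time",
]


def check_wafer_columns(df):
    """Hypothetical helper: return the expected columns missing from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Tiny in-memory CSV with made-up values, standing in for your_wafer_data.csv.
sample_csv = io.StringIO(
    "Wafer_ID,Yield_%,Defect_Density,Temperature,Pressure,Process_Time\n"
    "W0001,92.5,0.12,350.0,1.8,45.0\n"
    "W0002,88.1,0.20,355.5,1.7,47.5\n"
)
df = pd.read_csv(sample_csv)

missing = check_wafer_columns(df)
print("missing columns:", missing)

# Dropping a column shows what a failed check looks like.
missing_after_drop = check_wafer_columns(df.drop(columns=["Pressure"]))
print("missing after dropping Pressure:", missing_after_drop)
```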
- Python 3.8+
- OpenAI API key with GPT-4 access
- 8GB+ RAM recommended for large datasets
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Built with LangChain for agent orchestration
- Powered by OpenAI GPT-4 for natural language understanding
- UI created with Gradio
- Clustering algorithms from scikit-learn
If you use this project in your research, please cite:
@software{wafer_clustering_agent,
title={Semiconductor Wafer Clustering AI Agent},
author={Janhavi Giri},
year={2025},
url={https://github.com/janhavi-giri/semiconductor-wafer-clustering-agent}
}
Made with ❤️ for the semiconductor industry