Intelligent Feature Engineering with AI Agents
PyroChain combines PyTorch's deep learning capabilities with LangChain's agentic AI to automate feature extraction from complex, multimodal data. AI agents collaborate to understand, process, and extract meaningful features from text, images, and structured data.
Traditional Feature Engineering is Hard:
- Manual feature extraction is time-consuming and error-prone
- Different data types require different approaches
- Domain expertise is needed to create meaningful features
- Features become outdated as data patterns change
PyroChain Makes It Easy:
- AI agents automatically extract relevant features from any data type
- Collaborative agents validate and refine features using chain-of-thought reasoning
- Learns from your data to improve feature quality over time
- Works seamlessly with existing ML pipelines
- AI Agents: Intelligent agents that collaborate to extract, validate, and refine features
- Multimodal Processing: Handle text, images, and structured data in one pipeline
- Lightweight & Fast: Efficient LoRA adapters that train quickly on your data
- Memory & Learning: Agents remember past decisions and improve over time
- E-commerce Ready: Built-in tools for product recommendations and customer analysis
- Production Ready: Scalable architecture designed for real-world applications
E-commerce & Retail:
- Product recommendation systems
- Customer sentiment analysis
- Inventory optimization
- Price prediction and analysis
Content & Media:
- Text classification and tagging
- Image content analysis
- Content recommendation
- Automated content moderation
Business Intelligence:
- Customer behavior analysis
- Market trend detection
- Risk assessment
- Automated reporting
pip install pyrochain
git clone https://github.com/irfanalidv/PyroChain.git
cd PyroChain
pip install -e .
- Python 3.8+
- PyTorch 2.0+
- LangChain 0.1+
- Transformers 4.20+
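Before installing, it can help to confirm that the local environment meets the minimum versions listed above. The following is a minimal sketch using only the standard library. The distribution names (`torch`, `langchain`, `transformers`) are the usual PyPI names and are assumed here; `parse_version` and `check_requirements` are hypothetical helpers, not part of PyroChain.

```python
import sys
from importlib import metadata

# Minimum (major, minor) versions copied from the requirements list.
# The PyPI distribution names are an assumption.
MINIMUMS = {
    "torch": (2, 0),
    "langchain": (0, 1),
    "transformers": (4, 20),
}


def parse_version(text):
    """Return the leading numeric (major, minor) of a version string such as '2.1.0+cu118'."""
    parts = []
    for piece in text.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (found X)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(found) >= minimum:
            report[name] = "ok"
        else:
            report[name] = f"too old (found {found})"
    return report


if __name__ == "__main__":
    print("python:", "ok" if sys.version_info >= (3, 8) else "too old")
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

The check compares only major and minor numbers, which is enough for the minimums listed here.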
from pyrochain import PyroChain
from transformers import AutoTokenizer, AutoModel
from textblob import TextBlob
import torch
from datasets import load_dataset

# Load a pretrained transformer model and its tokenizer
# (loaded for reference; this snippet does not pass them to PyroChain)
model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Initialize PyroChain with its default configuration
pyrochain = PyroChain()

# Load the first 4 reviews from the IMDB training split
print("Loading real IMDB dataset...")
dataset = load_dataset("imdb", split="train[:4]")

# Extract features for each review and compute a TextBlob sentiment score
for i, sample in enumerate(dataset):
    text = sample["text"]
    label = sample["label"]  # 0 = negative, 1 = positive

    # TextBlob polarity lies in [-1, 1]; rescale it to [0, 1]
    blob = TextBlob(text)
    sentiment_score = (blob.sentiment.polarity + 1) / 2

    data = {
        "text": text,
        "title": f"IMDB Review {i+1}",
        "rating": 5 if label == 1 else 1,
        "category": "movie_review"
    }

    features = pyrochain.extract_features(
        data,
        "Extract features for sentiment analysis using TextBlob and transformer model"
    )

    print(f"Text: {text[:100]}...")
    print(f"Real Label: {label} | TextBlob Sentiment: {sentiment_score:.3f}")
    print(f"Features: {len(features['features'])}")
    print("---")
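The rescaling step in the loop above maps TextBlob's polarity, which lies in [-1, 1], onto a 0-1 score. A small standalone sketch of that arithmetic follows. The function names and the 0.5 threshold in `score_to_label` are illustrative assumptions and are not part of PyroChain or TextBlob.

```python
def polarity_to_score(polarity):
    """Map a polarity in [-1, 1] to a score in [0, 1], as in the loop above."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be in [-1, 1], got {polarity}")
    return (polarity + 1) / 2


def score_to_label(score, threshold=0.5):
    """Hypothetical helper: 1 (positive) if the score exceeds the threshold, else 0."""
    return 1 if score > threshold else 0


if __name__ == "__main__":
    for polarity in (-1.0, 0.0, 0.14, 1.0):
        score = polarity_to_score(polarity)
        label = score_to_label(score)
        print(f"polarity={polarity:+.2f} -> score={score:.2f} -> label={label}")
```

In the sample output further down, a polarity of 0.14 corresponds to a sentiment score of 0.57 for a review whose IMDB label is 0 (negative), so a naive 0.5 threshold would disagree with the dataset label for that review. The lexicon-based score is best treated as one feature among several and not as a prediction.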
# Run the complete real data example
cd examples
python main_example.py
What you'll see:
PyroChain Real Data Demo - 100% Real Analysis
============================================================
Real Data Feature Extraction Example
==================================================
Loading real IMDB dataset using transformer models...
Downloading real IMDB dataset...
Loaded 5 real IMDB samples using transformer model
Processing: IMDB Review 1
Text: I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it w...
Rating: 1/5 (Real IMDB Label: 0 = Negative)
Extracted 2 feature sets
Modalities: ['text']
Processing time: 0.025s
Data source: real_imdb_dataset
sentiment_analysis:
  sentiment_score: 0.57
  polarity: 0.14
  subjectivity: 0.85
  positive_words: 16
  negative_words: 4
  total_sentiment_words: 20
  confidence: 0.95
text_features:
  word_count: 288
  char_count: 1640
  sentence_count: 14
  avg_word_length: 4.7
  avg_sentence_length: 20.57
  readability_score: 0.0
  topic_keywords: ['movie', 'review', 'story', 'direction', 'visuals', 'drama']
Real Data E-commerce Analysis
==================================================
Analyzing: Wireless Bluetooth Headphones
Price: $199.99
Rating: 4.5/5 (128 votes)
Recommendation score: 0.91
Features extracted: 2
Top Recommendations:
1. Wireless Bluetooth Headphones - Score: 0.91
2. Organic Cotton T-Shirt - Score: 0.815
- Data Ingestion: Accepts multimodal data (text, images, structured)
- Agent Processing: AI agents analyze data using chain-of-thought reasoning
- Feature Extraction: Collaborative agents extract relevant features
- Validation: Agents validate and refine features through discussion
- Output: Clean, structured features ready for ML models
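The five stages above can be pictured with a toy, dependency-free sketch. This is not PyroChain's implementation: the "agents" are plain functions standing in for LLM-driven agents, every function name is hypothetical, and the output shape (`features`, `modalities`) is only modeled on the keys that appear in the examples in this README.

```python
import math


def ingest(record):
    """Data ingestion: group a record's fields into text and structured values."""
    grouped = {"text": {}, "structured": {}}
    for key, value in record.items():
        if isinstance(value, str):
            grouped["text"][key] = value
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            grouped["structured"][key] = value
    return grouped


def text_agent(fields):
    """Stand-in for a text agent: simple counts instead of LLM reasoning."""
    joined = " ".join(fields.values())
    return {"word_count": len(joined.split()), "char_count": len(joined)}


def structured_agent(fields):
    """Stand-in for a structured-data agent: pass numeric fields through as floats."""
    return {name: float(value) for name, value in fields.items()}


def validate(features):
    """Validation: keep only finite numeric values."""
    return {
        name: value
        for name, value in features.items()
        if isinstance(value, (int, float)) and math.isfinite(value)
    }


def run_pipeline(record):
    """Ingest, extract with each stand-in agent, validate, and return structured output."""
    grouped = ingest(record)
    raw = {}
    if grouped["text"]:
        raw.update(text_agent(grouped["text"]))
    if grouped["structured"]:
        raw.update(structured_agent(grouped["structured"]))
    modalities = [name for name, fields in grouped.items() if fields]
    return {"features": validate(raw), "modalities": modalities}


if __name__ == "__main__":
    product = {"title": "Wireless Bluetooth Headphones", "price": 199.99}
    print(run_pipeline(product))
```

In the real library the extraction and validation stages are described as collaborative and reasoning-driven. The sketch only shows how data could flow from one stage to the next.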
from pyrochain import PyroChain, PyroChainConfig
from transformers import AutoTokenizer, AutoModel
import torch
# Load real transformer model for e-commerce analysis
model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Real e-commerce product data
products = [
    {
        "id": "prod_001",
        "title": "Wireless Bluetooth Headphones",
        "description": "High-quality wireless headphones with noise cancellation and 30-hour battery life. Perfect for music lovers and professionals.",
        "price": 199.99,
        "category": "electronics",
        "rating": 4.5
    },
    {
        "id": "prod_002",
        "title": "Organic Cotton T-Shirt",
        "description": "Comfortable organic cotton t-shirt in various colors and sizes. Made from 100% organic cotton, eco-friendly and sustainable.",
        "price": 29.99,
        "category": "clothing",
        "rating": 4.2
    }
]

# Configure for e-commerce with transformer model
config = PyroChainConfig(
    task_type="ecommerce",   # Task type: "general", "ecommerce", "custom"
    enable_agents=True,      # Enable AI agent collaboration
    enable_training=False,   # Enable model training
    max_length=512,          # Maximum input length
    learning_rate=1e-4,      # Learning rate for training
    num_epochs=3,            # Number of training epochs
    device="auto"            # Device: "auto", "cpu", "cuda"
)
pyrochain = PyroChain(config=config)

# Process real product data with transformer analysis
for product in products:
    features = pyrochain.extract_features(
        product,
        "Extract features for product recommendation using transformer model"
    )
    print(f"Product: {product['title']} - Features: {len(features['features'])}")
    print(f"Price: ${product['price']} - Rating: {product['rating']}/5")
- PyroChain: Main library class for feature extraction
- PyroChainConfig: Configuration class for customizing behavior
- LoRAAdapter: Lightweight adapter for efficient model fine-tuning
- MultimodalProcessor: Handles text, image, and structured data processing

- extract_features(data, task_description): Extract features from data
- train(training_data, task_description): Train custom agents
- evaluate(test_data): Evaluate model performance
- save_model(path): Save trained model
- load_model(path): Load pre-trained model
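The methods listed above suggest a train, evaluate, save, and reload cycle. The sketch below strings them together using only the method names and parameters given in the list. The return values of `train` and `evaluate`, and whether `load_model` is an instance method that updates the object in place, are not documented here and are assumptions. `run_training_cycle` and `load_for_inference` are hypothetical helper names.

```python
def run_training_cycle(pyrochain, training_data, test_data, task_description, model_path):
    """Train, evaluate, then save, in that order.

    `pyrochain` is any object exposing the methods listed above.
    Returns whatever `evaluate` returns (its shape is an assumption).
    """
    pyrochain.train(training_data, task_description)
    metrics = pyrochain.evaluate(test_data)
    pyrochain.save_model(model_path)
    return metrics


def load_for_inference(pyrochain, model_path, data, task_description):
    """Load a saved model, then extract features from one record.

    Assumes `load_model` is an instance method that updates `pyrochain` in place.
    """
    pyrochain.load_model(model_path)
    return pyrochain.extract_features(data, task_description)
```

Because both helpers depend only on method names, they can be exercised with a stub object before wiring in a real `PyroChain` instance.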
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- PyTorch for deep learning capabilities
- LangChain for agentic AI framework
- Hugging Face for transformer models
- Sentence Transformers for text embeddings
Need help? We're here to support you:
- Documentation
- Report Issues
- Feature Requests
- Contact
PyroChain - Transform your data into intelligent features with AI agents.
Built with ❤️ by Irfan Ali