Synthetic Data Forge – RAG-Powered Custom Dataset Generator

Hi Appwrite team/community! I'm Anik Chand, an aspiring ML Engineer from Kolkata, India (B.Tech CSE at Haldia Institute of Technology, CGPA 8.59). I have hands-on experience with GANs (TensorFlow/Keras for synthetic handwritten digits on MNIST, including training visualization pipelines), RAG chatbots (FastAPI/LangChain/Gemini API, deployed on Render for portfolio Q&A), and ensemble classifiers (Scikit-learn/TF-IDF, achieving 89% accuracy on sentiment segmentation), and I'm pumped to contribute to Hacktoberfest 2025. Check my [GitHub](https://github.com/anikchand461) for full projects like Fake Handwritten Digits Generation and AkBOT.

### Project Overview
A web app where users describe the dataset they need (e.g., "Generate 1K synthetic medical records for privacy-safe ML training"). A RAG system pulls from public schemas/examples to guide a GAN in creating realistic, exportable data. The app includes validation via quick ensemble classifiers and a dashboard for previews, which makes it a good fit for data-scarce ML projects.

### Key Features & Appwrite Integration
- **Auth**: User accounts to save/share generated datasets securely.
- **Databases**: Store user prompts, generation params, and validation metrics (e.g., accuracy logs from Scikit-learn).
- **Storage**: Upload/export CSVs/JSONs or GAN artifacts (images/models).
- **Functions**: Serverless GAN training/inference (Python runtime with tf.GradientTape) and RAG queries (LangChain for prompt enhancement).
- **Bonus**: Progress updates for long generation runs, via Realtime subscriptions or Messaging notifications.
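
To make the Functions item more concrete, here is a minimal sketch of what an inference entrypoint might look like. It assumes the `main(context)` handler shape used by Appwrite's Python runtime, and it assumes `context.req.body` may arrive either as a JSON string or as an already-parsed dict. `load_generator`, the `rows` parameter, and the 1000-row cap are hypothetical placeholders, not part of the proposal; the stand-in generator just returns dummy rows where a real GAN would run. Keeping the loaded model in a module-level variable is one common way to avoid reloading weights on every warm invocation.

```python
import json
import time

# Module-level cache: it persists between invocations while the runtime
# container stays warm, so the expensive load runs once per cold start.
_GENERATOR = None


def load_generator():
    """Hypothetical stand-in for loading GAN weights (e.g. from Storage)."""
    time.sleep(0.01)  # simulate an expensive load
    return lambda n: [{"row": i} for i in range(n)]


def get_generator():
    """Return the cached generator, loading it on first use only."""
    global _GENERATOR
    if _GENERATOR is None:
        _GENERATOR = load_generator()
    return _GENERATOR


def main(context):
    """Entrypoint in the shape the Python runtime is assumed to call."""
    body = context.req.body or {}
    if isinstance(body, str):
        body = json.loads(body)
    # Hypothetical request parameter and cap to keep executions short.
    n_rows = min(int(body.get("rows", 10)), 1000)
    rows = get_generator()(n_rows)
    context.log(f"generated {len(rows)} rows")
    return context.res.json({"count": len(rows), "preview": rows[:3]})
```

This only helps warm invocations; the first request after a cold start still pays the full load time, so model size remains the main constraint.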

Tech: Python/FastAPI backend, Gradio for interactive UI (Matplotlib previews), TensorFlow/NumPy/Scikit-learn. Deployed on Appwrite Sites, with 4+ Appwrite services (Auth, Databases, Storage, Functions) providing the core functionality.

### Questions for Feedback
- How to handle large GAN models in Appwrite Functions (e.g., avoiding cold starts with pre-loaded weights)?
- Best way to integrate Appwrite Databases as a vector store for RAG (instead of ChromaDB)?
- Ideas to boost creativity/impact, like Hugging Face model swaps or collaborative dataset sharing?
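
For the vector-store question, one possible starting point is sketched below. It assumes each schema/example is stored as a document with a float-array embedding, and that similarity ranking happens in application code after fetching candidate documents. The `SchemaDoc` shape, the toy three-dimensional embeddings, and `top_k` are all hypothetical; this is brute-force cosine ranking, not an indexed search like ChromaDB provides, so it is only reasonable for small collections.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SchemaDoc:
    """Hypothetical shape of a stored document: text plus its embedding."""
    doc_id: str
    text: str
    embedding: List[float]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_embedding, docs, k=3):
    """Score every document against the query and keep the k best."""
    scored = [(cosine_similarity(query_embedding, d.embedding), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data standing in for documents fetched from the database.
docs = [
    SchemaDoc("a", "patient record schema", [1.0, 0.0, 0.0]),
    SchemaDoc("b", "retail transactions schema", [0.0, 1.0, 0.0]),
    SchemaDoc("c", "clinical visit schema", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], docs, k=2)
print([d.doc_id for _, d in results])  # prints ['a', 'c']
```

The retrieved texts would then be passed to the RAG prompt as context. Whether this scales far enough depends on how many schemas are stored and how large the embeddings are.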

Would love your input to refine this before prototyping. Prototype repo coming soon!

Thanks!  
Anik Chand  
- Email: anikchand461@gmail.com
- Portfolio: https://portfolio-fawn-beta-28.vercel.app/
- LinkedIn: https://www.linkedin.com/in/anik-chand-3b14b12b6/
- GitHub: https://github.com/anikchand461
