Synthetic Data Forge – RAG-Powered Custom Dataset Generator #14

@anikchand461

Hi Appwrite team/community! I'm Anik Chand, an aspiring ML engineer from Kolkata, India (B.Tech CSE at Haldia Institute of Technology, CGPA 8.59). I have hands-on experience with GANs (TensorFlow/Keras for synthetic handwritten digits on MNIST, including training visualization pipelines), RAG chatbots (FastAPI/LangChain/Gemini API, deployed on Render for portfolio Q&A), and ensemble classifiers (Scikit-learn/TF-IDF achieving 89% accuracy on sentiment segmentation). I'm pumped to contribute to Hacktoberfest 2025. Check my GitHub for full projects like Fake Handwritten Digits Generation and AkBOT.

Project Overview

A web app where users describe the dataset they need (e.g., "Generate 1K synthetic medical records for privacy-safe ML training"), and a RAG system pulls from public schemas and examples to guide a GAN in creating realistic, exportable data. It includes validation via quick ensemble classifiers and a dashboard for previews, which makes it a good fit for data-scarce ML projects.
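To make the retrieval step concrete, here is a minimal sketch of how a user prompt could be matched against stored schema examples. It uses a toy bag-of-words cosine similarity in place of real embeddings, and the `SCHEMAS` entries and the `retrieve_schema` helper are hypothetical names made up for illustration, not code from an existing repo.

```python
import math
import re
from collections import Counter

# Hypothetical schema descriptions; a real system would load these from a store.
SCHEMAS = {
    "medical_records": "patient id age diagnosis medication medical records hospital",
    "retail_transactions": "order id customer product price quantity retail sales",
    "sensor_readings": "device id timestamp temperature humidity sensor readings",
}


def tokenize(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_schema(prompt, schemas=SCHEMAS):
    """Return (schema_name, score) for the schema most similar to the prompt."""
    query = tokenize(prompt)
    scored = [(cosine(query, tokenize(desc)), name) for name, desc in schemas.items()]
    score, name = max(scored)
    return name, score


name, score = retrieve_schema(
    "Generate 1K synthetic medical records for privacy-safe ML training"
)
print(name, round(score, 3))
```

In the actual app the token counts would presumably be replaced by embedding vectors, and the retrieved schema would be passed to the generator as conditioning information.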

Key Features & Appwrite Integration

  • Auth: User accounts to save/share generated datasets securely.
  • Databases: Store user prompts, generation params, and validation metrics (e.g., accuracy logs from Scikit-learn).
  • Storage: Upload/export CSVs/JSONs or GAN artifacts (images/models).
  • Functions: Serverless GAN training/inference (Python runtime with tf.GradientTape) and RAG queries (LangChain for prompt enhancement).
  • Bonus: Realtime progress updates via Messaging for long-running generation jobs.
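
As a rough sketch of what the Databases item above might store, the record below bundles a prompt, its generation parameters, and its validation metrics into one flat document. The `GenerationJob` class, its field names, and the choice to serialize nested values as JSON strings are all assumptions for illustration; the actual collection schema is still open, and the values shown are made up.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class GenerationJob:
    """Hypothetical record for one dataset generation request."""
    prompt: str
    num_rows: int
    params: dict = field(default_factory=dict)   # e.g. epochs, latent size
    metrics: dict = field(default_factory=dict)  # e.g. validation accuracy

    def to_document(self):
        """Flatten to a dict of scalar values.

        Assumption: nested dicts are stored as JSON strings so each
        attribute in the stored document is a plain string or number.
        """
        doc = asdict(self)
        doc["params"] = json.dumps(doc["params"], sort_keys=True)
        doc["metrics"] = json.dumps(doc["metrics"], sort_keys=True)
        return doc


# Illustrative values only.
job = GenerationJob(
    prompt="Generate 1K synthetic medical records",
    num_rows=1000,
    params={"epochs": 5, "latent_dim": 64},
    metrics={"validation_accuracy": 0.5},
)
print(job.to_document())
```

Keeping the record flat like this is one option; splitting params and metrics into their own attributes would make them easier to query but ties the schema to a fixed set of fields.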

Tech: Python/FastAPI backend, Gradio for the interactive UI (Matplotlib previews), TensorFlow/NumPy/Scikit-learn. Deployed on Appwrite Sites, with 4+ Appwrite services central to how the app works.

Questions for Feedback

  • How to handle large GAN models in Appwrite Functions (e.g., avoiding cold starts with pre-loaded weights)?
  • Best way to integrate Appwrite Databases as a vector store for RAG (instead of ChromaDB)?
  • Ideas to boost creativity/impact, like Hugging Face model swaps or collaborative dataset sharing?

Would love your input to refine this before prototyping. A prototype repo is coming soon!

Thanks!
Anik Chand
Mail: anikchand461@gmail.com
Portfolio: https://portfolio-fawn-beta-28.vercel.app/
LinkedIn: https://www.linkedin.com/in/anik-chand-3b14b12b6/
GitHub: https://github.com/anikchand461
