🎬 MovieSeek: Semantic Search for Movies

A Hackathon6 Project – Built with Streamlit, Hugging Face Transformers & Pinecone

🌟 Project Overview

MovieSeek is a semantic search engine that allows users to explore movies using natural language queries. Instead of keyword-based filters, our app uses embeddings from pre-trained language models to retrieve relevant results based on meaning—not just syntax.
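To make the idea of retrieving by meaning concrete, here is a minimal sketch of ranking movies by cosine similarity between a query vector and movie vectors. The three-dimensional vectors, the titles, and the function names are made up for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and in this project the nearest-neighbour lookup is delegated to Pinecone rather than done by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, movie_vecs):
    """Return (title, score) pairs sorted from most to least similar."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in movie_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real sentence embeddings.
toy_movies = {
    "Space Odyssey": [0.9, 0.1, 0.0],
    "Romantic Getaway": [0.0, 0.2, 0.9],
    "Alien Invasion": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]  # pretend embedding of "a film about outer space"

ranking = rank_by_similarity(toy_query, toy_movies)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

The two space-themed titles score far higher than the romance even though no keyword is compared, which is the behaviour the embedding model is meant to provide on real text.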

This project was created during Hackathon6 to demonstrate how LLM-powered search can be applied to entertainment data such as Netflix and IMDb.

🔍 Key Features

🧠 Semantic Search using Hugging Face sentence-transformers and Pinecone vector database
🎞️ Query Netflix/IMDb-style datasets using natural language
⚡ Streamlit App frontend for interactive movie exploration
📈 Backend pipeline for data cleaning, embedding generation, and storage
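As an illustration of the data cleaning stage, the sketch below trims whitespace, drops rows with no title or description, and removes duplicate titles. It is not the contents of `clean_data.py`; the column names `title` and `description` and the sample rows are assumptions chosen for the example, and only the standard library is used.

```python
import csv
import io


def clean_rows(rows):
    """Keep the first row per title, skipping rows without usable text."""
    seen_titles = set()
    cleaned = []
    for row in rows:
        title = (row.get("title") or "").strip()
        description = (row.get("description") or "").strip()
        if not title or not description:
            continue  # nothing meaningful to embed
        key = title.lower()
        if key in seen_titles:
            continue  # duplicate title
        seen_titles.add(key)
        cleaned.append({"title": title, "description": description})
    return cleaned


# Hypothetical sample standing in for the raw CSV file.
sample_csv = io.StringIO(
    "title,description\n"
    "  Inception ,A thief enters dreams to steal secrets.\n"
    "Inception,Duplicate row.\n"
    "Untitled,\n"
    "Coco,A boy visits the Land of the Dead.\n"
)

cleaned = clean_rows(csv.DictReader(sample_csv))
for row in cleaned:
    print(row["title"], "|", row["description"])
```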

🧠 Motivation

With countless movies available, it can be overwhelming to choose what to watch. Our goal was to design an intelligent system that understands user intent—even if vague or descriptive—and returns meaningful recommendations.

🏗️ Architecture & Tech Stack

Layer	Technology
Frontend	Streamlit
Backend	Python, sentence-transformers, Pinecone, scikit-learn
Embedding	`all-MiniLM-L6-v2` via Hugging Face Transformers
Data Storage	Pinecone Vector DB
Dev Tools	GitHub Codespaces, Jupyter Notebooks
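The way the embedding and storage layers fit together can be sketched as follows. This is a hedged outline, not the code in `embed_and_store_data.py`: the index name `movies`, the `PINECONE_API_KEY` environment variable, the `movie-<n>` ID scheme, and the metadata fields are all assumptions. It also assumes a recent Pinecone client that exposes the `Pinecone` class (older `pinecone-client` releases used `pinecone.init()` instead) and an existing index with dimension 384.

```python
def build_records(titles, descriptions, vectors):
    """Pair each movie with its vector in the dict shape used for upserts."""
    return [
        {
            "id": f"movie-{i}",  # hypothetical ID scheme
            "values": [float(x) for x in vector],
            "metadata": {"title": title, "description": description},
        }
        for i, (title, description, vector) in enumerate(
            zip(titles, descriptions, vectors)
        )
    ]


def embed_and_store(titles, descriptions, index_name="movies"):
    """Embed descriptions and upsert them into an existing Pinecone index."""
    # Imported lazily so build_records works without these packages installed.
    import os

    from pinecone import Pinecone
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(descriptions)  # one 384-dimensional vector per text

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index = pc.Index(index_name)  # assumes the index was created beforehand
    index.upsert(vectors=build_records(titles, descriptions, vectors))
```

For a dataset with thousands of rows you would likely upsert in batches rather than in a single call.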

📂 Repository Structure

Hackathon6/
├── LICENSE
├── README.md
├── Netflix TV Shows and Movies.csv   # Raw dataset
├── clean_data.py                     # Data cleaning script
├── create_dataset.py                 # Dataset preparation
├── embed_and_store_data.py           # Generate and push embeddings
└── search_app.py                     # Streamlit app entry point

🚧 Challenges Faced

📚 Learning curve with semantic search and vector databases
⏱️ Limited time to integrate frontend with backend
🤝 Team collaboration across different skill levels
🔄 Adjusting scope mid-hackathon due to RAG AI workshop insights

✅ Achievements

Successfully implemented end-to-end semantic search
Created a working Streamlit interface for real-time movie queries
Promoted cross-disciplinary teamwork under tight deadlines
Used real-world movie datasets for a meaningful demo

📚 Lessons Learned

How to leverage semantic embeddings for search and ranking
Importance of cleaning and preprocessing for real-world datasets
How to quickly prototype apps with Streamlit
Integration between frontend UX and backend ML models

🚀 What's Next

✅ Expand dataset coverage with additional metadata (e.g., user reviews, actors)
🔍 Improve retrieval quality using custom fine-tuning
🖥️ Migrate to a more dynamic full-stack framework (e.g., React + FastAPI)
📊 Add interactive data visualizations for richer user feedback

🛠️ Installation & Usage

1. Clone the repository

git clone https://github.com/SaharZargarzadeh/Hackathon6.git

cd Hackathon6

2. Install dependencies

pip install -r requirements.txt

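Or, if requirements.txt is missing from your checkout (it is not listed in the repository structure above), install the packages manually: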

pip install transformers torch sentence-transformers pinecone-client streamlit scikit-learn

3. Prepare data and embeddings

Run each of the following Python scripts in order:

python clean_data.py
python create_dataset.py
python embed_and_store_data.py

4. Launch the app

streamlit run search_app.py
The app will open in your browser at http://localhost:8501 (Streamlit's default port).
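For orientation, a Streamlit search page along these lines could look like the sketch below. It is an assumption-laden outline rather than the actual `search_app.py`: the index name `movies`, the `PINECONE_API_KEY` environment variable, the `title` metadata field, and the helper `format_matches` are hypothetical, and it assumes a recent Pinecone client whose query results can be read with dict-style access.

```python
def format_matches(matches):
    """Turn query matches into numbered display lines."""
    lines = []
    for rank, match in enumerate(matches, start=1):
        title = match["metadata"].get("title", match["id"])
        lines.append(f"{rank}. {title} (score {match['score']:.2f})")
    return lines


def main():
    # Imported lazily so format_matches works without these packages installed.
    import os

    import streamlit as st
    from pinecone import Pinecone
    from sentence_transformers import SentenceTransformer

    st.title("MovieSeek")
    query = st.text_input("Describe the kind of movie you want to watch")
    if query:
        model = SentenceTransformer("all-MiniLM-L6-v2")
        index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("movies")
        query_vector = model.encode(query).tolist()
        response = index.query(vector=query_vector, top_k=5, include_metadata=True)
        for line in format_matches(response["matches"]):
            st.write(line)


# Call main() at the bottom of the script when launching it with `streamlit run`.
```

Loading the model on every query is slow; a real app would probably cache it, for example with Streamlit's `st.cache_resource`.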

📄 License

This project is licensed under the GPL-3.0 License.

🙌 Acknowledgments

Hugging Face for sentence-transformers

Pinecone for vector search infrastructure

Streamlit for the rapid prototyping framework

Hackathon6 mentors and organizers

