GitHub Stars Search

Neural search for your starred GitHub repositories using txtai embeddings.

Overview

GitHub Stars Search is a tool that allows you to search through your starred GitHub repositories using neural embeddings. It extracts README files and repository metadata, processes them into chunks, and creates embeddings for efficient semantic search.

Key features:

Fetches your starred GitHub repositories and their metadata
Extracts README files (preferably in English)
Processes content using intelligent chunking strategies
Generates embeddings using BAAI/bge-small-en-v1.5 via txtai
Provides hybrid search combining neural embeddings and BM25 keyword search
Stores all data and embeddings on disk for reuse
Supports incremental updates for newly starred repositories
Configurable search parameters and embedding models

Installation

Clone the repository:

git clone https://github.com/yourusername/github_stars.git
cd github_stars

Install dependencies:

pip install -r requirements.txt

Set up your GitHub API key: Create a .env file in the project root with your GitHub API key:

GITHUB_STARS_KEY=your_github_api_key

You can generate a GitHub API key from your GitHub Developer Settings.

Usage

Updating Repository Data

Before searching, you need to fetch and process your starred repositories:

python github_stars_search.py update

This will:

Fetch your starred repositories from GitHub
Extract README files and metadata
Process content into chunks
Generate embeddings
Store everything on disk

You can limit the number of repositories to update:

python github_stars_search.py update --limit 100

Or force update all repositories, even if they haven't changed:

python github_stars_search.py update --force

Searching Repositories

Once you've updated your repository data, you can search through them:

python github_stars_search.py search "machine learning for time series"

You can apply filters to your search:

python github_stars_search.py search "machine learning" --min-stars 100 --language python

And customize the search weights:

python github_stars_search.py search "machine learning" --neural-weight 0.8 --keyword-weight 0.2

Web Interface

For a more user-friendly experience, you can use the included web interface:

# Install Flask if you don't have it
pip install flask

# Run the web interface
python examples/web_interface.py

This will start a local web server at http://127.0.0.1:5000 where you can:

Search your repositories with a simple form
Adjust neural and keyword search weights
Filter by language and minimum stars
View nicely formatted search results

Configuration

You can view and modify the configuration:

python github_stars_search.py config --show

Set a different embedding model:

python github_stars_search.py config --embedding-model "sentence-transformers/all-MiniLM-L6-v2"

Change search weights:

python github_stars_search.py config --neural-weight 0.6 --keyword-weight 0.4

Information

View information about your data:

python github_stars_search.py info

Configuration

The config.yaml file contains various settings that you can customize:

GitHub API settings (pagination, retries, timeout)
Content processing settings (chunk size, strategy)
Embedding settings (model, device, batch size)
Search settings (weights, result limits)
Storage settings (compression, backups)

How It Works

Content Processing

The system uses a hybrid chunking strategy:

First attempts to chunk by semantic sections (headers)
For large sections, applies sliding window chunking with overlap
Preserves repository context in each chunk

Search

The search engine combines two approaches:

Neural search using embeddings for semantic understanding
BM25 keyword search for traditional relevance

Results are merged with configurable weights to provide the most relevant repositories.

Testing

The project includes a comprehensive test suite to ensure code quality and reliability.

Running Tests

First, install the testing dependencies:

# Using the provided script
./install_test_deps.sh

# Or manually
pip install pytest pytest-cov pytest-mock

Then run the tests:

# Using the provided scripts
./run_tests.sh                # Run all tests
./run_specific_test.sh tests/test_github_client.py  # Run a specific test file
./run_marked_tests.sh unit    # Run tests with a specific marker

# Or using pytest directly
cd github_stars
pytest
pytest --cov=src  # Run with coverage

Test Categories

Tests are categorized using pytest markers:

unit: Unit tests that test individual components in isolation
integration: Integration tests that test the interaction between components
api: Tests that interact with the GitHub API (requires a valid API key)
slow: Tests that are slow to run

To run tests with a specific marker:

pytest -m unit

To run tests excluding a specific marker:

pytest -m "not api"

See the tests README for more information.

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GitHub Stars Search

Overview

Installation

Usage

Updating Repository Data

Searching Repositories

Web Interface

Configuration

Information

Configuration

How It Works

Content Processing

Search

Testing

Running Tests

Test Categories

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
examples		examples
exports		exports
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.yaml		config.yaml
github_stars_search.py		github_stars_search.py
install_test_deps.sh		install_test_deps.sh
pytest.ini		pytest.ini
requirements.txt		requirements.txt
run_marked_tests.sh		run_marked_tests.sh
run_specific_test.sh		run_specific_test.sh
run_tests.sh		run_tests.sh
setup.py		setup.py
test_installation.py		test_installation.py

License

SushantDaga/github-stars-search

Folders and files

Latest commit

History

Repository files navigation

GitHub Stars Search

Overview

Installation

Usage

Updating Repository Data

Searching Repositories

Web Interface

Configuration

Information

Configuration

How It Works

Content Processing

Search

Testing

Running Tests

Test Categories

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages