GitHub README Chatbot

Overview

This Streamlit application lets users query a GitHub repository's README.md file through a chatbot interface. When a user submits a question, the bot searches the sections of the README.md file and returns the most relevant ones. Matching relies on natural language processing (NLP) techniques: sections and queries are converted to TF-IDF vectors, and cosine similarity is used to rank the sections against the query.

Features

  • Interactive Chatbot: Ask questions about the repository's README.md content.
  • Multiple Repository Support: Fetch and combine content from multiple GitHub repositories.
  • Natural Language Processing: The chatbot uses TF-IDF and cosine similarity to find relevant answers from the repository documentation.
  • Image Parser: Images embedded in the README content are displayed.

Installation

  1. Clone the repository to your local machine:
git clone https://github.com/ignasf5/github-readme-chatbot.git
cd github-readme-chatbot
  2. Build the image:
docker build -t github-readme-chatbot .
  3. Run the image:
docker run -p 8501:8501 github-readme-chatbot
  4. Run the image with a different default repository:
docker run -p 8501:8501 -e DEFAULT_REPO_URL="https://github.com/another/repository" github-readme-chatbot

Alternatively, run the app directly from a terminal:

streamlit run ./app.py

To set the maximum number of characters returned in an answer, define SUMMARY_MAX_LENGTH (Dockerfile syntax shown):

ENV SUMMARY_MAX_LENGTH=500
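
The DEFAULT_REPO_URL and SUMMARY_MAX_LENGTH settings shown above could be read along the following lines. This is a minimal sketch, assuming both are plain environment variables; the helper names `load_settings` and `truncate_summary`, the fallback handling, and the trailing "..." marker are hypothetical and not taken from the repository's code.

```python
import os

# Assumed fallback; this README names this URL as the default repository.
DEFAULT_REPO_FALLBACK = "https://github.com/ignasf5/chatbot"


def load_settings(env=None):
    """Read both settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    repo_url = env.get("DEFAULT_REPO_URL", DEFAULT_REPO_FALLBACK)
    try:
        max_length = int(env.get("SUMMARY_MAX_LENGTH", "500"))
    except ValueError:
        max_length = 500  # fall back when the value is not an integer
    return repo_url, max_length


def truncate_summary(text, max_length):
    """Keep the first max_length characters and append '...' if text was cut."""
    if len(text) <= max_length:
        return text
    return text[:max_length].rstrip() + "..."
```

Passing a mapping to `load_settings` instead of relying on `os.environ` keeps the function easy to exercise without touching the real environment.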

How it Works

Key Components

ReadmeChatbot class: This class parses the README.md content, processes sections, and uses TF-IDF for matching queries to the relevant sections.
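
As an illustration of the parsing step, a README can be split into sections at its Markdown headings. The sketch below is an assumption about how such a step might look, not the actual `ReadmeChatbot` implementation; the function name `split_sections` and the "Introduction" label for text before the first heading are hypothetical.

```python
import re

# Matches ATX headings such as "# Title" or "### Sub-section".
HEADING_RE = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def split_sections(markdown_text):
    """Split Markdown into (title, body) pairs, skipping empty sections."""
    sections = []
    title, body = "Introduction", []
    in_fence = False

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append((title, text))

    for line in markdown_text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # a '#' inside a code fence is not a heading
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            title, body = match.group(1), []
        else:
            body.append(line)
    flush()
    return sections
```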

Cosine Similarity: The chatbot calculates cosine similarity between the user's query and sections of the README.md file to determine the most relevant responses.

Fetching README Files: The app fetches README.md files from GitHub repositories via the GitHub API.
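
The TF-IDF and cosine-similarity matching can be sketched with scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The function name `rank_sections`, the `top_k` parameter, the English stop-word list, and the rule of dropping zero-score sections are assumptions made for this example; the app's own settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_sections(sections, query, top_k=3):
    """Return (score, section) pairs for the sections most similar to query."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Learn the vocabulary and weights from the README sections.
    section_vectors = vectorizer.fit_transform(sections)
    # Project the query into the same vector space.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, section_vectors)[0]
    ranked = sorted(zip(scores, sections), key=lambda pair: pair[0], reverse=True)
    # Sections that share no terms with the query score 0 and are dropped.
    return [(float(score), text) for score, text in ranked[:top_k] if score > 0]
```

Fitting the vectorizer on the sections only, and then transforming the query, means query words that never appear in the README contribute nothing to the score.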

Workflow

Initialization: When the app starts, it loads the README.md from a default GitHub repository.

User Interaction: Users can enter a GitHub repository URL to load another repository's README.md. The bot combines sections from both repositories.
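
One way to combine sections from several repositories is to merge them into a single list while recording where each section came from. This is a sketch under that assumption; `combine_sections` and the dictionary keys are hypothetical names, and the app may store its sections differently.

```python
def combine_sections(sections_by_repo):
    """Merge per-repository (title, body) lists into one list with provenance."""
    combined = []
    for repo_url, sections in sections_by_repo.items():
        for title, body in sections:
            combined.append({"repo": repo_url, "title": title, "text": body})
    return combined
```

Keeping the repository URL next to each section makes it possible to tell the user which README an answer came from.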

Query Processing: The user submits queries, and the bot returns the most relevant sections of the README.md files.

Example Usage

Default Repository: The app starts with a default repository (https://github.com/ignasf5/chatbot). You can modify the default repository URL in the code, or override it with the DEFAULT_REPO_URL environment variable as in the docker run example above.

Adding Another Repository: Users can add another repository by entering its GitHub URL in the input field. The bot will process and combine both repositories' README.md files.

Asking Questions: Type a question related to the repository's documentation in the chat, and the bot will return the top matching sections.

Example Interaction

image

Technologies Used

Streamlit: A Python library used for building the interactive UI.

scikit-learn: Used for natural language processing tasks like TF-IDF vectorization and cosine similarity.

BeautifulSoup: Used for parsing and processing the HTML content from README.md files.

Requests: Used to fetch raw README.md files from GitHub repositories.

Troubleshooting

Error fetching README: If you encounter issues fetching the README.md from GitHub, ensure the repository URL is correct and the repository has a README.md file in the default branch (typically main or master).
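
To check a URL by hand, it can help to build the raw README.md addresses for both common branch names and try each in turn. The sketch below assumes the `raw.githubusercontent.com` URL layout; whether the app itself uses this host or the GitHub API is not confirmed here, and `candidate_readme_urls` is a hypothetical helper.

```python
from urllib.parse import urlparse


def candidate_readme_urls(repo_url, branches=("main", "master")):
    """Build raw README.md URLs to try, one per candidate branch."""
    parsed = urlparse(repo_url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc.lower() not in ("github.com", "www.github.com") or len(parts) < 2:
        raise ValueError(f"Not a GitHub repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]  # accept clone URLs as well
    return [
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"
        for branch in branches
    ]
```

Each returned URL can then be requested in order (for example with Requests), stopping at the first one that responds successfully.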

Slow response times: The first time a repository is processed, it may take a moment to load and process the content. Once the vectors are cached, subsequent queries will be faster.
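
The effect of caching can be illustrated with the standard library's `functools.lru_cache`, used here as a stand-in. Streamlit offers its own caching decorators (such as `st.cache_resource`), and which mechanism this app uses is not stated in this README; `build_index` is a hypothetical placeholder for the expensive vectorization step.

```python
from functools import lru_cache


@lru_cache(maxsize=8)
def build_index(readme_text):
    """Stand-in for the expensive processing step; cached per README text."""
    # A real implementation would fit the TF-IDF vectorizer here.
    vocabulary = sorted(set(readme_text.lower().split()))
    return tuple(vocabulary)
```

The first call for a given README does the work; repeated calls with the same text return the stored result, which `build_index.cache_info()` reports as cache hits.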

image

image

Content check

# Show the raw README.md content
readme_content = fetch_readme_from_github(default_repo_url)
if st.checkbox("Show raw README.md content"):
    st.code(readme_content, language="markdown")

# Extract chatbot instance for parsed sections
combined_chatbot, _, _ = primary_resources
if st.checkbox("Show parsed sections"):
    st.write(combined_chatbot.sections)

image

Additionally

Implemented an image parser.

image
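
A rough idea of what an image parser does: collect image URLs from Markdown image syntax and HTML `<img>` tags so they can be rendered next to the answer. This regex-based sketch is an assumption for illustration only; the app may use BeautifulSoup for this step, and `extract_images` is a hypothetical name.

```python
import re

# ![alt](url "optional title")
MD_IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")
# <img ... src="url" ...>
HTML_IMAGE_RE = re.compile(r"<img\b[^>]*\bsrc=[\"']([^\"']+)[\"']", re.IGNORECASE)


def extract_images(markdown_text):
    """Return image URLs in order of appearance, from Markdown and <img> tags."""
    found = []
    for match in MD_IMAGE_RE.finditer(markdown_text):
        found.append((match.start(), match.group(2)))
    for match in HTML_IMAGE_RE.finditer(markdown_text):
        found.append((match.start(), match.group(1)))
    # Sort by position so the two syntaxes interleave as in the source text.
    return [url for _, url in sorted(found)]
```

In a Streamlit app, each absolute URL returned could then be passed to `st.image`; relative paths would first need to be resolved against the repository.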
