RAG GitHub Actions

A collection of GitHub Actions that provide typical LangChain RAG workflows, such as content indexing, chunking and embedding. The resulting embeddings are stored in Supabase tables for retrieval and querying.

Setup

Create a Supabase account and database, then create the required vector tables and functions using the provided example SQL file.
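The example SQL file in the repository defines the exact schema; as a rough sketch, a typical LangChain SupabaseVectorStore setup looks like the following (the `documents` table and `match_documents` function names are illustrative and may differ from the shipped file; 1536 is the output dimensionality of text-embedding-3-small):

```sql
-- Enable the pgvector extension (available on Supabase)
create extension if not exists vector;

-- Table holding chunk text, metadata and embeddings.
create table if not exists documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);

-- Similarity-search function used by the retrieval step.
create or replace function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```

The `filter` argument matches the `supabase_filter` input below: rows are kept only if their metadata contains the given JSON, which is how collections are scoped.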

All workflows share the same base configuration: an OpenAI API key, a Supabase URL and a Supabase API key. Store these as GitHub secrets so they can be reused across multiple workflows.
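If you use the GitHub CLI, the secrets (and the repository variables referenced in the examples below) can be created from the command line; the names match the examples, the values here are placeholders:

```shell
# Shared secrets read by every workflow (placeholder values)
gh secret set OPENAI_API_KEY --body "sk-..."
gh secret set SUPABASE_URL --body "https://<project>.supabase.co"
gh secret set SUPABASE_KEY --body "<supabase-api-key>"

# Non-secret values used by the examples below
gh variable set SUPABASE_TABLE --body "documents"
gh variable set SUPABASE_COLLECTION --body "default"
```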

Ingestion

Inputs:

  • openai_api_key: OpenAI API key
  • embedding_model: Embedding model alias (default: text-embedding-3-small)
  • supabase_url: Supabase URL
  • supabase_key: Supabase API key
  • supabase_table: Supabase table
  • metadata: Metadata (dict)
  • args: General arguments (dict)
  • loader_class: Document loader class alias (default: markdown)
  • loader_args: Document loader class arguments (default: {})
  • chunker_class: Chunker class alias (default: recursive_character)
  • chunker_args: Chunker class arguments (default: {"chunk_size": 1000, "chunk_overlap": 200})
  • user_agent: User agent string (default: recent Firefox/Linux)

Markdown Directory Example:

  - name: Markdown Directory Ingestion
    uses: jinglemansweep/rag-actions/.github/actions/ingest-loader@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      metadata: |
        {
          "collection": "${{ vars.SUPABASE_COLLECTION }}",
          "github": {
            "run": "${{ github.run_id }}"
          }
        }
      args: |
        {
          "path": "./test/content/test",
          "glob": "**/*.md"
        }
      loader_class: "markdown"
      loader_args: '{}'
      chunker_class: "recursive_character"
      chunker_args: '{"chunk_size": 1000, "chunk_overlap": 200}'

RSS Feed Example:

  - name: RSS Feed Ingestion
    uses: jinglemansweep/rag-actions/.github/actions/ingest-loader@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      metadata: |
        {
          "collection": "${{ vars.SUPABASE_COLLECTION }}",
          "feed": "news",
          "source": "bbc"
        }
      loader_class: "rss"
      loader_args: |
        {
          "urls": [
            "https://feeds.bbci.co.uk/news/rss.xml"
          ]
        }

Web Page Example:

  - name: Web Page Ingestion
    uses: jinglemansweep/rag-actions/.github/actions/ingest-loader@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      metadata: |
        {
          "collection": "${{ vars.SUPABASE_COLLECTION }}"
        }
      loader_class: "web"
      loader_args: |
        {
          "web_path": "https://www.bbc.co.uk"
        }

Inference

Inputs:

  • openai_api_key: OpenAI API key
  • embedding_model: Embedding model alias (default: text-embedding-3-small)
  • chat_model: Chat model alias (default: gpt-4o-mini)
  • supabase_url: Supabase URL
  • supabase_key: Supabase API key
  • supabase_table: Supabase table
  • supabase_filter: Supabase filter (default: {})
  • chat_prompt: Chat prompt (default: You are a helpful assistant. Answer the question based on the provided context.)
  • query: Query for RAG retrieval
  • top_k: Number of top results to return (default: 5)
  • output_file: Output file path

Chat Example:

  - name: Chat
    uses: jinglemansweep/rag-actions/.github/actions/infer-chat@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      supabase_filter: '{"collection": "${{ vars.SUPABASE_COLLECTION }}"}'
      chat_model: "gpt-4o-mini"
      chat_prompt: "You are a helpful assistant. Answer the question based on the provided context."
      query: "UK News"
      top_k: "5"
      output_file: "./uknews.md"
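The file written to output_file can be consumed by later steps in the same job. For example, a hypothetical follow-up step could publish it as a build artifact using the standard actions/upload-artifact action:

```yaml
  - name: Publish Chat Output
    uses: actions/upload-artifact@v4
    with:
      name: rag-chat-output
      path: ./uknews.md
```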
