RAG GitHub Actions

A collection of GitHub Actions that provide typical LangChain RAG workflows, such as content indexing, chunking and embedding. The resulting embeddings are stored in Supabase tables for retrieval and querying.

Setup

Create a Supabase account and database, then create the required vector tables and functions using the provided example SQL file.
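The example SQL file in the repository defines the exact schema; as a rough sketch, a typical LangChain SupabaseVectorStore setup looks like the following (the `documents` table and `match_documents` function names are illustrative and may differ from the shipped file; 1536 is the output dimensionality of text-embedding-3-small):

```sql
-- Enable the pgvector extension (available on Supabase)
create extension if not exists vector;

-- Table holding chunk text, metadata and embeddings.
create table if not exists documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);

-- Similarity-search function used by the retrieval step.
create or replace function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```

The `filter` argument matches the `supabase_filter` input below: rows are kept only if their metadata contains the given JSON, which is how collections are scoped.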

All workflows share the same base configuration: an OpenAI API key, a Supabase URL and a Supabase API key. Store these as GitHub secrets so they can be reused across multiple workflows.
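If you use the GitHub CLI, the secrets (and the repository variables referenced in the examples below) can be created from the command line; the names match the examples, the values here are placeholders:

```shell
# Shared secrets read by every workflow (placeholder values)
gh secret set OPENAI_API_KEY --body "sk-..."
gh secret set SUPABASE_URL --body "https://<project>.supabase.co"
gh secret set SUPABASE_KEY --body "<supabase-api-key>"

# Non-secret values used by the examples below
gh variable set SUPABASE_TABLE --body "documents"
gh variable set SUPABASE_COLLECTION --body "default"
```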

Ingestion

Inputs:

  • openai_api_key: OpenAI API key
  • embedding_model: Embedding model alias (default: text-embedding-3-small)
  • supabase_url: Supabase URL
  • supabase_key: Supabase API key
  • supabase_table: Supabase table
  • metadata: Metadata (dict)
  • args: General arguments (dict)
  • loader_class: Document loader class alias (default: markdown)
  • loader_args: Document loader class arguments (default: {})
  • chunker_class: Chunker class alias (default: recursive_character)
  • chunker_args: Chunker class arguments (default: {"chunk_size": 1000, "chunk_overlap": 200})
  • user_agent: User agent string (default: recent Firefox/Linux)

Markdown Directory Example:

  - name: Markdown Directory Ingestion
    uses: jinglemansweep/rag-actions/.github/actions/ingest-loader@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      metadata: |
        {
          "collection": "${{ vars.SUPABASE_COLLECTION }}",
          "github": {
            "run": "${{ github.run_id }}"
          }
        }
      args: |
        {
          "path": "./test/content/test",
          "glob": "**/*.md"
        }
      loader_class: "markdown"
      loader_args: '{}'
      chunker_class: "recursive_character"
      chunker_args: '{"chunk_size": 1000, "chunk_overlap": 200}'

RSS Feed Example:

  - name: RSS Feed Ingestion
    uses: jinglemansweep/rag-actions/.github/actions/ingest-loader@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      metadata: |
        {
          "collection": "${{ vars.SUPABASE_COLLECTION }}",
          "feed": "news",
          "source": "bbc"
        }
      loader_class: "rss"
      loader_args: |
        {
          "urls": [
            "https://feeds.bbci.co.uk/news/rss.xml"
          ]
        }

Web Page Example:

  - name: Web Page Ingestion
    uses: jinglemansweep/rag-actions/.github/actions/ingest-loader@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      metadata: |
        {
          "collection": "${{ vars.SUPABASE_COLLECTION }}"
        }
      loader_class: "web"
      loader_args: |
        {
          "web_path": "https://www.bbc.co.uk"
        }

Inference

Inputs:

  • openai_api_key: OpenAI API key
  • embedding_model: Embedding model alias (default: text-embedding-3-small)
  • chat_model: Chat model alias (default: gpt-4o-mini)
  • supabase_url: Supabase URL
  • supabase_key: Supabase API key
  • supabase_table: Supabase table
  • supabase_filter: Supabase filter (default: {})
  • chat_prompt: Chat prompt (default: You are a helpful assistant. Answer the question based on the provided context.)
  • query: Query for RAG retrieval
  • top_k: Number of top results to return (default: 5)
  • output_file: Output file path

Chat Example:

  - name: Chat
    uses: jinglemansweep/rag-actions/.github/actions/infer-chat@main
    with:
      openai_api_key: ${{ secrets.OPENAI_API_KEY }}
      supabase_url: ${{ secrets.SUPABASE_URL }}
      supabase_key: ${{ secrets.SUPABASE_KEY }}
      supabase_table: ${{ vars.SUPABASE_TABLE }}
      supabase_filter: '{"collection": "${{ vars.SUPABASE_COLLECTION }}"}'
      chat_model: "gpt-4o-mini"
      chat_prompt: "You are a helpful assistant. Answer the question based on the provided context."
      query: "UK News"
      top_k: "5"
      output_file: "./uknews.md"
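The file written to output_file can be consumed by later steps in the same job. For example, a hypothetical follow-up step could publish it as a build artifact using the standard actions/upload-artifact action:

```yaml
  - name: Publish Chat Output
    uses: actions/upload-artifact@v4
    with:
      name: rag-chat-output
      path: ./uknews.md
```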
