This repository is based on the code in openshift/lightspeed-rag-content. It builds a RAG (Retrieval-Augmented Generation) vector database for the Ansible Automation Platform (AAP) chatbot from the documentation sources stored in the ansible/aap-docs repository.
make install-tools
make install-deps
make install-deps-test
Currently, aap-rag-content images are built from the aap-rag-content repository on GitLab.cee, which references this repository as a Git submodule. However, if you need to build an image manually from this repository, use the following steps.
- Obtain access to the Mimir repository and clone it.
- Create the ./mimir folder in the project root.
- Copy the mimir-extract-latest.tgz.enc file to ./mimir.
- Run ./scripts/mimir-parser.py, which extracts markdown files into the ./aap-product-docs-plaintext folder.
./scripts/mimir-parser.py
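The preparation steps above can be checked before running the parser. The following is a minimal sketch, not part of this repository: the function name `missing_prerequisites` is hypothetical, and it only assumes the folder and file names mentioned in the steps.

```python
from pathlib import Path

# File name taken from the steps above.
ARCHIVE_NAME = "mimir-extract-latest.tgz.enc"


def missing_prerequisites(project_root="."):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: it checks only what the manual steps describe,
    namely the ./mimir folder, the encrypted archive inside it, and the
    parser script itself.
    """
    root = Path(project_root)
    problems = []

    mimir_dir = root / "mimir"
    if not mimir_dir.is_dir():
        problems.append(f"missing folder: {mimir_dir}")
    elif not (mimir_dir / ARCHIVE_NAME).is_file():
        problems.append(f"missing archive: {mimir_dir / ARCHIVE_NAME}")

    parser_script = root / "scripts" / "mimir-parser.py"
    if not parser_script.is_file():
        problems.append(f"missing script: {parser_script}")

    return problems
```

Calling `missing_prerequisites(".")` from the project root and printing each returned entry gives a quick list of what still needs to be set up.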
If you want to include Knowledge Base articles, which are stored under the red_hat_content/solutions folder in the Mimir archive, run mimir-parser.py with the --add-kb-articles option:
./scripts/mimir-parser.py --add-kb-articles
The script extracts only the articles whose [products] metadata contains Red Hat Ansible Automation Platform. The metadata is defined at the beginning of each Knowledge Base markdown file.
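To illustrate the kind of filter described above, here is a rough sketch. It is not the code in mimir-parser.py. It assumes, purely for illustration, that the metadata appears as a line of the form `[products]: name, name, ...` within the first lines of the file; the real layout may differ, and the function names are hypothetical.

```python
# Product name taken from the description above.
PRODUCT = "Red Hat Ansible Automation Platform"


def products_line(markdown_text, max_header_lines=20):
    """Return the value of the [products] metadata line, or None.

    Assumption: the metadata is a single line starting with "[products]"
    somewhere in the first max_header_lines lines of the file.
    """
    for line in markdown_text.splitlines()[:max_header_lines]:
        stripped = line.strip()
        if stripped.startswith("[products]"):
            # Drop the key and any separating colon or spaces.
            return stripped[len("[products]"):].lstrip(": ").strip()
    return None


def is_aap_article(markdown_text):
    """True when the [products] metadata mentions the AAP product name."""
    value = products_line(markdown_text)
    return value is not None and PRODUCT in value
```

With this sketch, an article whose header lists several products is kept as long as Red Hat Ansible Automation Platform is among them, and an article without any [products] line is skipped.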
make build-image-aap
podman login quay.io
podman push aap-rag-content quay.io/ansible/aap-rag-content
By default, a Faiss vector store is used for saving embeddings, and the result is included in the container images. You can also use a PostgreSQL database with the pgvector extension as the vector store.
make start-postgres-debug
The Postgres data directory is created under ./postgresql/data.
make generate-embeddings-postgres
The result is saved in the data_aap_product_docs_2_5 table.
$ podman exec -it pgvector bash
root@7894ab5c94e2:/# psql -U postgres
psql (16.4 (Debian 16.4-1.pgdg120+2))
Type "help" for help.
postgres=# \dt
                   List of relations
 Schema |           Name            | Type  |  Owner
--------+---------------------------+-------+----------
 public | data_aap_product_docs_2_5 | table | postgres
(1 row)
postgres=#
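The interactive check above can also be scripted. The sketch below is an illustration rather than part of this repository: it assumes the container is named pgvector and the database user is postgres, as in the session above, and the function names are hypothetical. It builds a non-interactive `podman exec ... psql` command that counts the rows in the embeddings table.

```python
import re
import subprocess

# Accept only plain identifiers, since the table name is placed into SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def row_count_command(table, container="pgvector"):
    """Build the argv list for counting rows in `table` via psql.

    psql options used: -U (user), -t (rows only), -A (unaligned output),
    -c (run a single command and exit).
    """
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a plain SQL identifier: {table!r}")
    query = f"SELECT count(*) FROM {table};"
    return ["podman", "exec", container,
            "psql", "-U", "postgres", "-tA", "-c", query]


def count_rows(table, container="pgvector"):
    """Run the command and return the row count as an int.

    Requires podman and a running container; raises CalledProcessError
    if the command fails.
    """
    result = subprocess.run(row_count_command(table, container),
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

For example, `count_rows("data_aap_product_docs_2_5")` should return a positive number once the embeddings have been generated, assuming the container from the debug target is still running.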