[Github] Allow ingestion of additional documentation file formats such as .adoc and .mdx #3602

@keiransteele-phocas

Problem Description

We would like to ingest the documentation for the tooling we use so that we can search it, chat with it and integrate it into the AI assistant. The documentation is generally available on GitHub, but some of it is in formats that the connector cannot ingest.

Here are two examples of documentation available on GitHub that the connector can't ingest:
Keycloak documentation in asciidoc (.adoc) format: https://github.com/keycloak/keycloak/tree/a8225655cfc1d4d01d6cbeea70cf45e4958e36e8/docs/guides/getting-started
Octopus Deploy documentation in MDX (.mdx) format: https://github.com/OctopusDeploy/docs/blob/main/src/pages/docs.mdx

Elastic already ingests and searches raw text from Confluence with ease, and the Markdown ingested from GitHub is a similar blob of text, so I can't see why the list of extensions ingested from GitHub is so strictly limited.

Proposed Solution

Extend the list of supported extensions to include additional formats that documentation is commonly written in.

Current configuration:

SUPPORTED_EXTENSION = [".markdown", ".md", ".rst"]

Proposed configuration or similar:

SUPPORTED_EXTENSION = [".markdown", ".md", ".rst", ".adoc", ".mdx"]
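To illustrate what the change would mean in practice, here is a minimal sketch of an extension check against the proposed list. The `is_supported` helper is hypothetical and only for illustration; I haven't confirmed how the connector actually applies `SUPPORTED_EXTENSION` internally.

```python
import os

# Proposed list from this issue.
SUPPORTED_EXTENSION = [".markdown", ".md", ".rst", ".adoc", ".mdx"]


def is_supported(path: str) -> bool:
    """Hypothetical helper: True if the file's extension is in the list.

    The comparison is case-insensitive, which is an assumption on my part.
    """
    _, ext = os.path.splitext(path)
    return ext.lower() in SUPPORTED_EXTENSION


if __name__ == "__main__":
    for candidate in ("docs/guides/getting-started/index.adoc",
                      "src/pages/docs.mdx",
                      "src/main.py"):
        print(candidate, is_supported(candidate))
```

With the current three-entry list, the first two paths would be skipped; with the proposed list they would be picked up.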

A more complex solution would be to expose the list to the user via the advanced sync rules or other configuration.
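As a rough sketch of that more complex option, the snippet below merges a user-supplied, comma-separated list of extensions with the current defaults. The `parse_extensions` function, the `DEFAULT_EXTENSIONS` name and the comma-separated input format are all assumptions for illustration, not existing connector configuration.

```python
from typing import List, Optional

# Current defaults, as listed in this issue.
DEFAULT_EXTENSIONS = (".markdown", ".md", ".rst")


def parse_extensions(raw: Optional[str]) -> List[str]:
    """Hypothetical: merge user-supplied extensions with the defaults.

    Entries are trimmed, lowercased and given a leading dot if missing.
    Empty entries and duplicates are dropped; order is preserved.
    """
    result = list(DEFAULT_EXTENSIONS)
    if not raw:
        return result
    for item in raw.split(","):
        ext = item.strip().lower()
        if not ext:
            continue
        if not ext.startswith("."):
            ext = "." + ext
        if ext not in result:
            result.append(ext)
    return result


if __name__ == "__main__":
    print(parse_extensions("adoc, .MDX,, md"))
```

Keeping the defaults and only adding to them means existing syncs would behave the same unless a user opts in to extra formats.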

Alternatives

I have considered using the web crawler, but ideally I would like to use the connector, given that it's available and this seems to be one of its primary use cases. https://github.com/elastic/crawler

Additional Context

I would be happy to have a go at a PR if the proposal to change the list is good enough.
