georgejaymcmc/AICampKGTalk

Knowledge graphs for codebases

Description

This notebook illustrates how to use knowledge graphs (KGs) to understand an unfamiliar codebase.

KGs are well suited to codebases because they are designed to piece together connected data, such as tables linked by foreign keys or functions referenced across files.

Using a teardown of Zotero, a popular open-source content management application, as the example, the notebook builds three separate KGs:

  1. Data KG - created by ingesting an RDBMS schema

    • easily identifies the shortest join path between any 2 (or more) tables
    • an approach that works with a database of any size
  2. Application KG - created by using:

    • the Abstract Syntax Tree (AST) to extract function and parameter names
    • lexical search to connect different file types in the repo
    • an LLM to produce natural language explanations of files of interest
  3. Business Domain KG - illustrates how to ingest a public ontology to tie business concepts to content
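
To make the "shortest join path" idea from the Data KG concrete, here is a minimal sketch in plain Python. It is not the notebook's method, which stores the schema in Neo4j. It only illustrates the underlying idea: treat tables as nodes, foreign keys as edges, and run a breadth-first search. The toy schema and the function names (`build_join_graph`, `shortest_join_path`, `demo_connection`) are assumptions made up for this example and are not taken from Zotero's actual database.

```python
import sqlite3
from collections import defaultdict, deque


def build_join_graph(conn):
    """Return an undirected map: table -> set of tables linked by a foreign key."""
    graph = defaultdict(set)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        graph[table]  # register the table even if it has no foreign keys
        # Column 2 of each foreign_key_list row is the referenced table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            referenced = fk[2]
            graph[table].add(referenced)
            graph[referenced].add(table)
    return graph


def shortest_join_path(graph, start, goal):
    """Breadth-first search; returns the list of tables to join, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def demo_connection():
    """Hypothetical toy schema, used only to exercise the functions above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (itemID INTEGER PRIMARY KEY);
        CREATE TABLE creators (creatorID INTEGER PRIMARY KEY);
        CREATE TABLE itemCreators (
            itemID INTEGER REFERENCES items(itemID),
            creatorID INTEGER REFERENCES creators(creatorID)
        );
        CREATE TABLE tags (tagID INTEGER PRIMARY KEY);
    """)
    return conn


if __name__ == "__main__":
    graph = build_join_graph(demo_connection())
    print(shortest_join_path(graph, "creators", "items"))
    # prints ['creators', 'itemCreators', 'items']
```

In a graph database the same question becomes a single shortest-path query, which is part of the appeal of loading the schema into a KG.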

Dependencies

  1. Install Zotero's desktop application

    • for access to the SQLite RDBMS
  2. Install Neo4j Community edition

    • desktop version, or
    • the free cloud tier
    • upload the included dump files to run Neo4j queries and bypass the data creation and ingestion steps
  3. Access to an LLM

    • example uses deepseek-coder-v2:16B running locally via Ollama
    • author has used OpenAI APIs in previous iterations
  4. Generate a classic GitHub Personal Access Token

    • to use the GitHub Code Search API for its lexical search capability
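
As a rough illustration of how the token is used for lexical search, the sketch below builds an authenticated request against GitHub's REST code search endpoint using only the standard library. The helper names, the search term `openDatabase`, and the repository `zotero/zotero` are assumptions chosen for the example. The notebook may structure its calls differently.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.github.com/search/code"


def build_code_search_request(term, repo, token):
    """Build (but do not send) a code search request scoped to one repository."""
    query = urllib.parse.urlencode({"q": f"{term} repo:{repo}"})
    request = urllib.request.Request(f"{API_URL}?{query}")
    request.add_header("Accept", "application/vnd.github+json")
    request.add_header("Authorization", f"Bearer {token}")
    return request


def matching_paths(request):
    """Send the request and return the file paths of the matches (needs network)."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["path"] for item in payload.get("items", [])]


if __name__ == "__main__":
    # Hypothetical search term and placeholder token, for illustration only.
    req = build_code_search_request("openDatabase", "zotero/zotero", "from-github")
    print(req.full_url)
```

Calling `matching_paths(req)` with a valid token returns the paths of files containing the term, which can then be linked to function nodes in the Application KG.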

Installation

  1. Create a .env file (sample values shown)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=pw-for-your-neo4j-dbms
GITHUB_TOKEN=from-github
  2. Install Python packages from Pipfile
>pip install pipenv
>pipenv install
>pipenv shell
>pipenv graph
  3. Install NodeJS babel packages from package.json
  • Node.js is required to traverse the JavaScript ASTs and extract functions, parameters, etc.
>npm install
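
Once the steps above are done, a quick check that the .env values reach Python can save debugging time later. The sketch below assumes the variables are already in the environment (pipenv loads a .env file automatically for `pipenv shell` and `pipenv run`) and that the official `neo4j` Python driver is installed. The function names are hypothetical and not part of the notebook.

```python
import os

REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def read_neo4j_settings(env=os.environ):
    """Return (uri, (username, password)), or raise if a value is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return env["NEO4J_URI"], (env["NEO4J_USERNAME"], env["NEO4J_PASSWORD"])


def check_connection(env=os.environ):
    """Open a driver and verify that the Neo4j server is reachable."""
    # Imported lazily so the settings check works without the driver installed.
    from neo4j import GraphDatabase

    uri, auth = read_neo4j_settings(env)
    with GraphDatabase.driver(uri, auth=auth) as driver:
        driver.verify_connectivity()
```

Running `check_connection()` inside `pipenv shell` should return silently when Neo4j is up and the credentials are correct, and raise an exception otherwise.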

Contact

For questions, suggestions, or collaborations, feel free to:

About

Notebook for AICamp talk on Mar 19, 2025
