sunku5494/cve-affected-go-code-usage-tracer

CVE-Affected Go Code Usage Tracer

A comprehensive system for analyzing Go codebases and tracking CVE impact through exact symbol matching and document-based search.


Overview

This system builds a searchable knowledge base of Go code chunks, each annotated with metadata such as the imported symbols it uses. When a CVE names a vulnerable package or symbol, engineers can query the knowledge base by exact symbol name to identify the affected code and assess the impact.


System Architecture

graph TB
    %% Input Sources
    GO_PROJECT["Go Projects<br/>• Main modules<br/>• Vendor dependencies"]
    CVE_INFO["CVE Information<br/>• Affected packages<br/>• Vulnerable symbols"]
    
    %% Core Processing Components
    GO_AST_PARSER["go-ast-parser<br/>• AST analysis<br/>• Code extraction<br/>• JSON generation"]
    
    INGESTION["Ingestion Script<br/>• JSON parsing<br/>• ChromaDB storage<br/>• Symbol indexing"]
    
    SEARCH_QUERY["Search Query Script<br/>• Exact symbol matching<br/>• CVE impact analysis<br/>• Usage identification"]

    %% Data Storage
    JSON_OUTPUT["code_chunks.json<br/>• Structured output<br/>• Rich metadata"]
    CHROMA_DB["ChromaDB<br/>• Document storage<br/>• Exact text search"]

    %% Data Flow
    GO_PROJECT --> GO_AST_PARSER
    GO_AST_PARSER --> JSON_OUTPUT
    JSON_OUTPUT --> INGESTION
    INGESTION --> CHROMA_DB
    
    %% CVE Response Flow
    CVE_INFO --> SEARCH_QUERY
    SEARCH_QUERY --> CHROMA_DB
    CHROMA_DB --> SEARCH_QUERY

    %% Styling
    classDef inputStyle fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef processStyle fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef storageStyle fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px

    class GO_PROJECT,CVE_INFO inputStyle
    class GO_AST_PARSER,INGESTION,SEARCH_QUERY processStyle
    class JSON_OUTPUT,CHROMA_DB storageStyle

Three-Stage Pipeline

| Stage | Component | Description |
| --- | --- | --- |
| 1 | Code Analysis & Extraction (Go) | go-ast-parser analyzes Go projects and writes structured code chunks to code_chunks.json |
| 2 | Knowledge Base Creation (Python) | Ingestion script parses the JSON data and stores it in ChromaDB for indexing |
| 3 | CVE Impact Assessment (Python) | Search script runs exact symbol matching queries for impact analysis |

Features

| Feature | Description |
| --- | --- |
| Comprehensive Analysis | Processes main module + vendor dependencies |
| Rich Metadata | Extracts types, symbols, functions, methods |
| Exact Symbol Matching | Precise import_pkg.symbol pattern matching |
| CVE Impact Assessment | Rapid vulnerability impact analysis |
| JSON Output | Structured data for semantic search systems |
| Modular Architecture | Clean, testable, maintainable codebase |

Requirements

  • Go: 1.23+
  • Python: 3.8+
  • ChromaDB: Server instance
  • Target Projects: Must have go.mod files

Complete Setup & Usage Commands

1. Environment Setup

# Clone the repository (if not already done)
git clone <repository-url>
cd cve-affected-go-code-usage-tracer

# Create and activate Python virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

# Build the Go AST parser
make build

2. Start ChromaDB Server

# Start ChromaDB server on default port 8000
chroma run --path ./chromadb --port 8000

3. Extract Code Chunks from Go Project

# Analyze a Go project and extract code chunks
./bin/go-ast-parser -path /path/to/your/go/project

# This generates: code_chunks.json

4. Ingest Code Chunks to Database

# Load the JSON data into ChromaDB
python ingest_code_chunks_to_chromadb.py
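
The ingestion script itself is not shown here. As a rough sketch of one preparation step it may need: assuming the collection accepts only scalar metadata values (strings, numbers, booleans), list fields such as accessed_symbols have to be serialized before storage. The helper name `flatten_metadata` and the choice of JSON encoding are assumptions for illustration, not the script's actual behavior.

```python
import json


# Hypothetical helper; the real ingestion script may serialize metadata differently.
def flatten_metadata(metadata):
    """Return a copy of a chunk's metadata containing only scalar values."""
    flat = {}
    for key, value in metadata.items():
        if value is None:
            # Skip null fields, e.g. accessed_symbols on a constant.
            continue
        if isinstance(value, (list, tuple)):
            # Encode lists as a JSON string so they can be decoded losslessly later.
            flat[key] = json.dumps(list(value))
        else:
            flat[key] = value
    return flat
```

Decoding with `json.loads` on the way out restores the original list, which keeps exact comparisons on individual symbols possible.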

5. Search for CVE Impact

# Search using exact import package and symbol name format
python cve_impact_search.py "import_pkg_name.symbolname"

Search Examples

# Search for vulnerable function usage
python cve_impact_search.py "github.com/vulnerable/package.VulnerableFunction"

# Search for TLS configuration usage
python cve_impact_search.py "crypto/tls.Config"

# Search for HTTP status usage
python cve_impact_search.py "net/http.StatusOK"
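
To make the query format concrete, here is a minimal sketch of how a query string could be split into its import path and symbol name. The function name `split_symbol_query` is hypothetical, and whether the search script performs such a split at all is an assumption. Splitting on the last dot keeps import paths that contain dots (for example a versioned path) intact.

```python
# Hypothetical helper illustrating the "import_pkg_name.symbolname" format.
def split_symbol_query(query):
    """Split 'import/path.Symbol' into (import_path, symbol)."""
    # rpartition splits on the LAST dot, so dots inside the import path survive.
    import_path, dot, symbol = query.rpartition(".")
    if not dot or not import_path or not symbol:
        raise ValueError(
            f"expected 'import_pkg_name.symbolname', got {query!r}"
        )
    return import_path, symbol


if __name__ == "__main__":
    print(split_symbol_query("crypto/tls.Config"))
```

Note that a bare import path whose last element contains a dot would be split incorrectly by this approach, so it only applies to queries that include a symbol name.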

6. Additional Commands

# Show available make targets
make help

# Clean build artifacts
make clean

# View all available make commands
make

Search Method

This system uses exact document-based matching rather than semantic search:

| Aspect | Details |
| --- | --- |
| Input Format | import_pkg_name.symbolname |
| Matching Strategy | Exact string matching against stored symbol metadata |
| Search Target | The accessed_symbols field in document metadata |
| Performance | Fast and precise: only chunks that record the exact queried symbol are returned |
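
The matching rule can be illustrated without a database. The sketch below filters an in-memory list of chunks (shaped like the entries in code_chunks.json) by exact membership in accessed_symbols. It is not the query the search script sends to ChromaDB, and the function name `find_chunks_using` is an assumption.

```python
# In-memory illustration of exact symbol matching; not the actual ChromaDB query.
def find_chunks_using(chunks, query):
    """Return chunks whose accessed_symbols contain exactly `query`."""
    hits = []
    for chunk in chunks:
        # accessed_symbols may be missing or null (e.g. for constants).
        symbols = chunk.get("metadata", {}).get("accessed_symbols") or []
        # List membership compares whole strings, so a symbol that merely
        # starts with the query (a longer name) does not count as a match.
        if query in symbols:
            hits.append(chunk)
    return hits
```

Because the comparison is on whole strings, a query such as net/http.Status would not match a chunk that only records net/http.StatusOK.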

Output Format

The go-ast-parser generates structured JSON output with rich metadata:

{
  "id": "/path/to/utils.go:45-47-IncludeFiles",
  "document": "func (wm WalkMode) IncludeFiles() bool {\n\treturn wm == W_FILEONLY || wm == W_ALL\n}",
  "metadata": {
    "accessed_symbols": [
      "archive/tar.Header",
      "io.Reader",
      "github.com/sirupsen/logrus.FieldLogger"
    ],
    "entity_name": "IncludeFiles",
    "entity_type": "method",
    "file_path": "/path/to/utils.go",
    "is_vendored": false,
    "package_name": "utils",
    "receiver_type": "github.com/openshift/assisted-installer/src/utils.WalkMode"
  }
}

Field Descriptions

| Field | Description |
| --- | --- |
| id | Unique identifier: file_path:line_start-line_end-entity_name |
| document | Raw Go code content with actual formatting |
| accessed_symbols | Array of imported packages/symbols used (can be null for constants) |
| entity_name | Name of the code entity |
| entity_type | Type: function, struct, method, interface, const, var, alias_or_basic |
| file_path | Full path to the source file |
| is_vendored | Boolean indicating if code is from vendor directory |
| package_name | Go package name |
| receiver_type | Methods only: Fully qualified receiver type |
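
Since the id packs the file path, line range, and entity name into one string, it can be unpacked again when reporting results. The following is a small sketch based only on the file_path:line_start-line_end-entity_name layout described above; the names `ChunkLocation` and `parse_chunk_id` are hypothetical.

```python
import re
from typing import NamedTuple


class ChunkLocation(NamedTuple):
    file_path: str
    line_start: int
    line_end: int
    entity_name: str


# file_path:line_start-line_end-entity_name
# The greedy file_path group backtracks to the last colon that is followed
# by a "digits-digits-" line range.
_ID_PATTERN = re.compile(r"^(?P<path>.+):(?P<start>\d+)-(?P<end>\d+)-(?P<name>.+)$")


def parse_chunk_id(chunk_id):
    """Split a chunk id into its location components."""
    match = _ID_PATTERN.match(chunk_id)
    if match is None:
        raise ValueError(f"unrecognized chunk id: {chunk_id!r}")
    return ChunkLocation(
        file_path=match.group("path"),
        line_start=int(match.group("start")),
        line_end=int(match.group("end")),
        entity_name=match.group("name"),
    )
```

For the sample entry above, this yields the path /path/to/utils.go, lines 45 to 47, and the entity name IncludeFiles.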

Configuration

| Setting | Value |
| --- | --- |
| ChromaDB | localhost:8000 |
| Embedding Model | all-MiniLM-L6-v2 |
| Search Method | Exact document text matching |
| Collection | go_code_chunks |
| Search Format | import_pkg_name.symbolname |
| Batch Size | Configurable for large codebases |
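
How the batch size is configured is not documented here. As an illustration of why batching matters for large codebases, the sketch below splits a list of chunks into fixed-size groups so each group can be written to the collection in a separate call. The function name `iter_batches` and the value 500 in the comment are assumptions.

```python
# Hypothetical batching helper; the ingestion script's actual mechanism may differ.
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Sketch of use during ingestion (assumed shape, not the script's code):
#   for batch in iter_batches(chunks, 500):
#       collection.add(
#           ids=[c["id"] for c in batch],
#           documents=[c["document"] for c in batch],
#           metadatas=[c["metadata"] for c in batch],
#       )
```

Smaller batches keep each request and its embedding work bounded, at the cost of more round trips to the server.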

Key Workflows

Proactive Code Indexing

Go Projects → go-ast-parser → code_chunks.json → Ingestion Script → ChromaDB → Searchable Knowledge Base

Reactive CVE Response

CVE Alert → Symbol Query (import_pkg.symbol) → Search Script → ChromaDB → Impact Results → Remediation Planning
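
For the last step of the reactive workflow, impact results usually need to be grouped before remediation can be planned. The sketch below separates hits in first-party code from hits in vendored dependencies using the is_vendored, file_path, and entity_name metadata fields. The function name `summarize_hits` and the output shape are assumptions, not the search script's actual report format.

```python
# Hypothetical post-processing of search hits; the output format is illustrative only.
def summarize_hits(hits):
    """Group matching chunks into first-party and vendored locations."""
    summary = {"first_party": [], "vendored": []}
    for hit in hits:
        meta = hit["metadata"]
        bucket = "vendored" if meta.get("is_vendored") else "first_party"
        summary[bucket].append(f'{meta["file_path"]}: {meta["entity_name"]}')
    # Sort each group so the report is stable across runs.
    for locations in summary.values():
        locations.sort()
    return summary
```

Separating the two groups helps decide between patching one's own code and upgrading a vendored dependency.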

Use Cases

  • Immediate CVE impact assessment across large codebases
  • Comprehensive vulnerability tracking in Go ecosystems
  • Precise identification of affected code locations
  • Security response and code analysis

Key Value Proposition
This system enables organizations to quickly answer: "What code in our systems uses this vulnerable package or function?" through exact symbol matching, making it invaluable for security response and code analysis.
