A comprehensive system for analyzing Go codebases and tracking CVE impact through exact symbol matching and document-based search.
This system enables engineers to quickly identify and assess the impact of CVEs on Go codebases by creating a searchable knowledge base of code chunks with rich metadata and exact symbol matching capabilities.
graph TB
%% Input Sources
GO_PROJECT["Go Projects<br/>• Main modules<br/>• Vendor dependencies"]
CVE_INFO["CVE Information<br/>• Affected packages<br/>• Vulnerable symbols"]
%% Core Processing Components
GO_AST_PARSER["go-ast-parser<br/>• AST analysis<br/>• Code extraction<br/>• JSON generation"]
INGESTION["Ingestion Script<br/>• JSON parsing<br/>• ChromaDB storage<br/>• Symbol indexing"]
SEARCH_QUERY["Search Query Script<br/>• Exact symbol matching<br/>• CVE impact analysis<br/>• Usage identification"]
%% Data Storage
JSON_OUTPUT["code_chunks.json<br/>• Structured output<br/>• Rich metadata"]
CHROMA_DB["ChromaDB<br/>• Document storage<br/>• Exact text search"]
%% Data Flow
GO_PROJECT --> GO_AST_PARSER
GO_AST_PARSER --> JSON_OUTPUT
JSON_OUTPUT --> INGESTION
INGESTION --> CHROMA_DB
%% CVE Response Flow
CVE_INFO --> SEARCH_QUERY
SEARCH_QUERY --> CHROMA_DB
CHROMA_DB --> SEARCH_QUERY
%% Styling
classDef inputStyle fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef processStyle fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef storageStyle fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
class GO_PROJECT,CVE_INFO inputStyle
class GO_AST_PARSER,INGESTION,SEARCH_QUERY processStyle
class JSON_OUTPUT,CHROMA_DB storageStyle
Stage | Component | Description |
---|---|---|
1 | Code Analysis & Extraction (Go) | go-ast-parser analyzes Go projects and extracts structured code chunks to code_chunks.json |
2 | Knowledge Base Creation (Python) | Ingestion script processes the JSON data and stores it in ChromaDB for indexing |
3 | CVE Impact Assessment (Python) | Search script executes exact symbol matching queries for impact analysis |
Feature | Description |
---|---|
Comprehensive Analysis | Processes main module + vendor dependencies |
Rich Metadata | Extracts types, symbols, functions, methods |
Exact Symbol Matching | Precise import_pkg.symbol pattern matching |
CVE Impact Assessment | Rapid vulnerability impact analysis |
JSON Output | Structured data for semantic search systems |
Modular Architecture | Clean, testable, maintainable codebase |
- Go: 1.23+
- Python: 3.8+
- ChromaDB: Server instance
- Target Projects: Must have go.mod files
# Clone the repository (if not already done)
git clone <repository-url>
cd cve-affected-go-code-usage-tracer
# Create and activate Python virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python dependencies
pip install -r requirements.txt
# Build the Go AST parser
make build
# Start ChromaDB server on default port 8000
chroma run --path ./chromadb --port 8000
# Analyze a Go project and extract code chunks
./bin/go-ast-parser -path /path/to/your/go/project
# This generates: code_chunks.json
# Load the JSON data into ChromaDB
python ingest_code_chunks_to_chromadb.py
# Search using exact import package and symbol name format
python cve_impact_search.py "import_pkg_name.symbolname"
# Search for vulnerable function usage
python cve_impact_search.py "github.com/vulnerable/package.VulnerableFunction"
# Search for TLS configuration usage
python cve_impact_search.py "crypto/tls.Config"
# Search for HTTP status usage
python cve_impact_search.py "net/http.StatusOK"
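The query strings above combine a Go import path and a symbol name, and the import path may itself contain dots (as in github.com/...). As a minimal sketch of how such a query could be taken apart, assuming the symbol is always whatever follows the last dot, the hypothetical helper below splits a query into its two parts. The name split_symbol_query is illustrative only and is not taken from cve_impact_search.py.

```python
from typing import Tuple


def split_symbol_query(query: str) -> Tuple[str, str]:
    """Split 'import/path.Symbol' into (import_path, symbol).

    Assumption: the symbol name is the text after the LAST dot, so dots
    inside the import path (e.g. github.com) are left untouched.
    """
    import_path, sep, symbol = query.rpartition(".")
    if not sep or not import_path or not symbol:
        raise ValueError(
            f"expected import_pkg_name.symbolname, got {query!r}"
        )
    return import_path, symbol


if __name__ == "__main__":
    for q in ("crypto/tls.Config", "net/http.StatusOK"):
        print(split_symbol_query(q))
```

Splitting on the last dot rather than the first keeps hosts such as github.com inside the import path.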
# Show available make targets
make help
# Clean build artifacts
make clean
# View all available make commands
make
This system uses exact document-based matching rather than semantic search:
Aspect | Details |
---|---|
Input Format | import_pkg_name.symbolname |
Matching Strategy | Exact string matching against stored symbol metadata |
Search Target | The accessed_symbols field in document metadata |
Performance | Fast; returns only chunks whose recorded symbols match the query exactly |
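The matching strategy in the table can be illustrated without a database. The sketch below assumes chunks shaped like the entries in code_chunks.json and filters them in plain Python; find_affected_chunks is a hypothetical name, and the real search script queries ChromaDB rather than an in-memory list.

```python
from typing import Any, Dict, List


def find_affected_chunks(
    chunks: List[Dict[str, Any]], query: str
) -> List[Dict[str, Any]]:
    """Return chunks whose accessed_symbols contain `query` exactly."""
    affected = []
    for chunk in chunks:
        # accessed_symbols may be null (None) for some entities, e.g. constants.
        symbols = chunk.get("metadata", {}).get("accessed_symbols") or []
        # List membership compares whole strings, so this is an exact match,
        # not a substring or semantic match.
        if query in symbols:
            affected.append(chunk)
    return affected


if __name__ == "__main__":
    sample = [
        {"id": "a.go:1-5-Dial",
         "metadata": {"accessed_symbols": ["crypto/tls.Config"]}},
        {"id": "b.go:1-5-Serve",
         "metadata": {"accessed_symbols": ["crypto/tls.ConfigForClient"]}},
        {"id": "c.go:1-1-MaxRetries",
         "metadata": {"accessed_symbols": None}},
    ]
    for hit in find_affected_chunks(sample, "crypto/tls.Config"):
        print(hit["id"])
```

Because whole strings are compared, a query for crypto/tls.Config does not match a chunk that only records crypto/tls.ConfigForClient, which a substring search would have returned.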
The go-ast-parser generates structured JSON output with rich metadata:
{
"id": "/path/to/utils.go:45-47-IncludeFiles",
"document": "func (wm WalkMode) IncludeFiles() bool {\n\treturn wm == W_FILEONLY || wm == W_ALL\n}",
"metadata": {
"accessed_symbols": [
"archive/tar.Header",
"io.Reader",
"github.com/sirupsen/logrus.FieldLogger"
],
"entity_name": "IncludeFiles",
"entity_type": "method",
"file_path": "/path/to/utils.go",
"is_vendored": false,
"package_name": "utils",
"receiver_type": "github.com/openshift/assisted-installer/src/utils.WalkMode"
}
}
Field | Description |
---|---|
id | Unique identifier: file_path:line_start-line_end-entity_name |
document | Raw Go code content with actual formatting |
accessed_symbols | Array of imported packages/symbols used (can be null for constants) |
entity_name | Name of the code entity |
entity_type | Type: function, struct, method, interface, const, var, alias_or_basic |
file_path | Full path to the source file |
is_vendored | Boolean indicating if code is from vendor directory |
package_name | Go package name |
receiver_type | Methods only: Fully qualified receiver type |
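Since the id field encodes the file path, line range, and entity name, a search hit can be turned back into a source location. The following is a minimal sketch of such a parser, assuming ids always follow the file_path:line_start-line_end-entity_name layout described above. ChunkId and parse_chunk_id are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class ChunkId(NamedTuple):
    file_path: str
    line_start: int
    line_end: int
    entity_name: str


def parse_chunk_id(chunk_id: str) -> ChunkId:
    """Parse 'file_path:line_start-line_end-entity_name' into its parts."""
    # Split on the last colon so colons inside the path are preserved.
    file_path, sep, rest = chunk_id.rpartition(":")
    parts = rest.split("-", 2)
    if not sep or len(parts) != 3:
        raise ValueError(f"unexpected chunk id format: {chunk_id!r}")
    start, end, name = parts
    return ChunkId(file_path, int(start), int(end), name)


if __name__ == "__main__":
    loc = parse_chunk_id("/path/to/utils.go:45-47-IncludeFiles")
    print(f"{loc.file_path} lines {loc.line_start}-{loc.line_end}: {loc.entity_name}")
```

Go identifiers cannot contain hyphens or colons, so splitting the trailing part on the first two hyphens is unambiguous under that assumption.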
Setting | Value |
---|---|
ChromaDB | localhost:8000 |
Embedding Model | all-MiniLM-L6-v2 |
Search Method | Exact document text matching |
Collection | go_code_chunks |
Search Format | import_pkg_name.symbolname |
Batch Size | Configurable for large codebases |
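To illustrate the batch size setting, the sketch below shows one way chunks could be grouped before being written to the go_code_chunks collection. It also flattens metadata, on the assumption that the ChromaDB version in use accepts only scalar metadata values (strings, numbers, booleans), which depends on the version. Both helper names are hypothetical, and how ingest_code_chunks_to_chromadb.py actually stores accessed_symbols is not shown in this README.

```python
import json
from typing import Any, Dict, Iterator, List


def flatten_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Make metadata scalar-only: drop nulls, JSON-encode lists and dicts."""
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # e.g. accessed_symbols is null for some constants
        if isinstance(value, (list, dict)):
            flat[key] = json.dumps(value)  # assumption: store as a JSON string
        else:
            flat[key] = value
    return flat


def batched(items: List[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    chunks = [{"id": str(i), "metadata": {"is_vendored": False}} for i in range(5)]
    for batch in batched(chunks, batch_size=2):
        print([c["id"] for c in batch])
```

Smaller batches keep each write request bounded in size for large codebases, at the cost of more round trips to the server.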
Go Projects → go-ast-parser → code_chunks.json → Ingestion Script → ChromaDB → Searchable Knowledge Base
CVE Alert → Symbol Query (import_pkg.symbol) → Search Script → ChromaDB → Impact Results → Remediation Planning
- Immediate CVE impact assessment across large codebases
- Comprehensive vulnerability tracking in Go ecosystems
- Precise identification of affected code locations
- Security response and code analysis
- Architecture Documentation - Detailed technical architecture
- Makefile Commands - Build and maintenance commands
- Dependencies - Python package requirements
Key Value Proposition
This system enables organizations to quickly answer: "What code in our systems uses this vulnerable package or function?" through exact symbol matching, making it invaluable for security response and code analysis.