Data analysis for TG/RAG project @ CDL
The project uses uv to manage and lock project dependencies for a consistent and reproducible environment. If you do not have uv
installed on your system, visit this page for installation instructions.
Note: If you have pip (or Homebrew), you can install uv with:
pip install uv
# or
brew install uv
# Clone the repo
git clone git@github.com:ekmpa/CrediGraph.git
# Enter the repo directory
cd CrediGraph
# Install core dependencies into an isolated environment
uv sync
# The isolated env is .venv
source .venv/bin/activate
cd bash_scripts
./end-to-end.sh /bash_scripts/CC-Crawl/CC-2025.txt
Given the size of our datasets, we must use mini-batching in our GNN experiments. To do this we use PyG's neighbor_loader,
which requires additional libraries with undocumented build-time dependencies. As such, users must install them into their
own venv, separately from uv sync.
pyg-lib:
uv pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
PyTorch Sparse:
uv pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.7.0+${CUDA}.html
For more information on installing these additional libraries, see pyg-lib and PyTorch Sparse.
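The ${TORCH} and ${CUDA} placeholders in the install commands above must be set before the wheel URL resolves. The following is a minimal sketch, not part of the project scripts: the helper name pyg_wheel_url is hypothetical, and the values shown (torch 2.7.0, taken from the PyTorch Sparse command above, and the cpu tag) are assumptions that should be adjusted to match your local PyTorch build.

```shell
# Hypothetical helper: build the PyG wheel index URL
# from a torch version ($1) and a CUDA tag ($2), e.g. "cpu" or "cu126".
pyg_wheel_url() {
  printf 'https://data.pyg.org/whl/torch-%s+%s.html\n' "$1" "$2"
}

# Assumed values; replace them with the ones matching your installation.
TORCH=2.7.0
CUDA=cpu

# Print the URL that would be passed to `uv pip install ... -f <url>`.
pyg_wheel_url "$TORCH" "$CUDA"
```

To find the right values, running python -c "import torch; print(torch.__version__)" inside the activated venv typically prints the version together with its build tag (for example, a string of the form 2.7.0+cu126).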
To run our baseline static experimentation:
uv run tgrag/experiments/main.py
Alternatively, you can design your own configuration, updating the model parameters:
uv run tgrag/experiments/main.py --config configs/your_config.yaml
To learn more about contributing to CrediGraph, see our contribution guide.