OpenForge: Probabilistic Metadata Integration

Environment Setup

We use Miniconda for Python package management on Linux machines. Due to the complexity of the dependencies, we separate the environments for training prior models and running inference over Markov Random Fields (MRFs).

Clone this repo in your working directory:

git clone <OpenForge repo url>

cd openforge

Create two independent environments, one for training prior models and the other for running MRF inference:
```
conda env create -f huggingface_env.yml
```
```
conda env create -f pgmax_gpu_env.yml
```
Heads-up: The pgmax-gpu environment depends on the GPU version of JAX, which requires an NVIDIA driver version >= 525.60.13 for CUDA 12 on Linux. If your machine does not meet this requirement, you can instead create the CPU version of the pgmax environment for running MRF inference:
```
conda env create -f pgmax_cpu_env.yml
```
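
Before choosing between the GPU and CPU environments, it can help to check the installed driver against the minimum stated above. The following is a minimal sketch, not part of OpenForge: the helper names are hypothetical, and it assumes `nvidia-smi` is on the `PATH` when a driver is installed.

```python
import shutil
import subprocess

# Minimum driver version stated in this README for CUDA 12 on Linux.
MIN_DRIVER = "525.60.13"


def parse_version(text):
    """Turn a dotted version string such as '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, minimum=MIN_DRIVER):
    """Return True when the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    # With several GPUs there is one line per device; the driver is shared.
    return lines[0].strip()


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("No NVIDIA driver detected; consider pgmax_cpu_env.yml.")
    elif driver_meets_minimum(version):
        print(f"Driver {version} is sufficient for pgmax_gpu_env.yml.")
    else:
        print(f"Driver {version} is older than {MIN_DRIVER}; consider pgmax_cpu_env.yml.")
```

The comparison is done on integer tuples rather than strings so that, for example, `535.104.05` is correctly treated as newer than `525.60.13`.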

Add OpenForge as an editable (development) package in both environments:

conda activate huggingface
conda develop <path to the OpenForge repository, e.g., /home/congtj/openforge>

conda activate pgmax-gpu
conda develop <path to the OpenForge repository, e.g., /home/congtj/openforge>
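
To confirm that the package is visible from an activated environment, a quick check like the one below may be useful. This is a sketch only: it assumes the top-level package is named `openforge`, matching the `openforge/` directory in the repository, and the helper name is hypothetical.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate the package on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumption: "openforge" is the importable top-level package name.
    name = "openforge"
    if is_importable(name):
        print(f"'{name}' is importable in this environment.")
    else:
        print(f"'{name}' was not found; re-run 'conda develop' with the repository path.")
```

Run it once in each environment after `conda activate`, since the path is registered per environment.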

Datasets

We provide the following datasets for training prior models and running MRF inference. Each dataset is stored in a separate Google Drive folder, which contains raw data and preprocessed data that are ready for running MRF inference.

Quick Start

Run hyperparameter tuning and MRF inference to obtain posterior beliefs:

conda activate pgmax-gpu

cd openforge/mrf_inference

python pgmax_lbp_icpsr_hyper.py \
    --config_path=./tuning_exp_configs/icpsr/qwen2.5-7b-instruct-lora.ini \
    --mode=hp_tuning
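
To make the term "posterior beliefs" concrete, here is a toy illustration of how prior probabilities and a pairwise factor combine into marginals in a small binary MRF. It is not OpenForge's model or code: the factor values are invented for illustration, and it uses brute-force enumeration rather than the loopy belief propagation that the script name suggests, so it only scales to a handful of variables.

```python
from itertools import product


def posterior_beliefs(priors, pairwise):
    """Exact marginals P(x_i = 1) for a small binary MRF, by enumeration.

    priors:   list of prior probabilities P(x_i = 1), one per variable
    pairwise: dict mapping (i, j) to a 2x2 table of non-negative potentials,
              indexed as table[x_i][x_j]
    """
    n = len(priors)
    totals = [0.0] * n  # unnormalized mass of assignments where x_i = 1
    partition = 0.0     # normalizing constant Z
    for assignment in product((0, 1), repeat=n):
        weight = 1.0
        # Unary terms: the prior for each variable's assigned value.
        for i, x in enumerate(assignment):
            weight *= priors[i] if x == 1 else 1.0 - priors[i]
        # Pairwise terms: compatibility between connected variables.
        for (i, j), table in pairwise.items():
            weight *= table[assignment[i]][assignment[j]]
        partition += weight
        for i, x in enumerate(assignment):
            if x == 1:
                totals[i] += weight
    return [t / partition for t in totals]


# Two hypothetical match variables: one confident prior, one uncertain prior.
priors = [0.9, 0.4]
# Hypothetical factor that favors both variables taking the same value.
agree = {(0, 1): [[2.0, 1.0], [1.0, 2.0]]}
beliefs = posterior_beliefs(priors, agree)
print(beliefs)
```

In this example the agreement factor pulls the uncertain variable toward the confident one, so its posterior belief ends up above its prior of 0.4. With no pairwise factors, the posteriors equal the priors.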

Run MRF inference only to obtain posterior beliefs (the best hyperparameters found are hard-coded in the script):

conda activate pgmax-gpu

cd openforge/mrf_inference

python pgmax_lbp_icpsr_hyper.py \
    --config_path=./tuning_exp_configs/icpsr/qwen2.5-7b-instruct-lora.ini \
    --mode=inference
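
The two commands above differ only in `--mode`. This README does not show how the script handles its arguments, so the following is a hypothetical sketch of an entry point that accepts the same two flags with `argparse` and reads the `.ini` file with `configparser`. The function names and help strings are assumptions; only the flag names and the two mode values come from the commands above.

```python
import argparse
import configparser


def build_parser():
    """Build a parser for the two flags used in the Quick Start commands."""
    parser = argparse.ArgumentParser(
        description="Hypothetical entry point for MRF inference"
    )
    parser.add_argument(
        "--config_path",
        type=str,
        required=True,
        help="Path to an .ini experiment configuration file",
    )
    parser.add_argument(
        "--mode",
        type=str,
        required=True,
        choices=["hp_tuning", "inference"],
        help="Tune hyperparameters, or run inference with fixed ones",
    )
    return parser


def load_config(path):
    """Read an .ini file, raising an error if the file does not exist."""
    config = configparser.ConfigParser()
    with open(path, encoding="utf-8") as handle:
        config.read_file(handle)
    return config


if __name__ == "__main__":
    args = build_parser().parse_args()
    config = load_config(args.config_path)
    print(f"mode={args.mode}, config sections={config.sections()}")
```

Restricting `--mode` with `choices` makes a mistyped mode fail at startup instead of partway through a long run.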

Fine-tune an LLM with LoRA:

conda activate huggingface

cd openforge/llm_finetuning

python google_gemma_lora_icpsr.py \
    --config_path=./exp_configs/icpsr/qwen2.5-7b-instruct_lora.ini

