High‑fidelity, literature‑grounded detection, visualization and reporting of noncovalent interactions in protein (and protein–ligand) structures – with performance instrumentation, provenance, and extensibility built in.
MolBridge unifies a fast analysis core, a Streamlit web UI, a REST API, and export / provenance utilities into a single platform for structural biology & computational chemistry. It currently classifies 15+ interaction families (hydrogen, halogen, chalcogen, pnictogen, tetrel, π–π, cation–π, anion–π, CH–π, sulfur–π, dispersion, hydrophobic, salt bridges, n→π*, metal coordination) including hydrogen bond subtypes.
Key pillars:
- Performance: Vector geometry, KD‑tree pruning, adaptive thresholds, shared‑memory parallelism, optional Numba/Rust kernels
- Scientific Rigor: Literature‑anchored criteria with transparent parameter presets & exploration ranges
- Reproducibility: Provenance hashing, normalized records, golden regression datasets, metrics funnel (raw→candidate→accepted)
- Visualization: 3D overlays, Ramachandran plots, interaction networks, density & distribution charts
- Extensibility: Decorator‑based detector registry; modular docs for rapid onboarding
- Exports: PDF, PPTX, Excel, CSV, session snapshots (configuration + provenance)
Use Cases: interface profiling, ligand optimization, quality control of predicted vs experimental models, enrichment & network analyses, manuscript figure generation.
| Aspect | Snapshot |
|---|---|
| Interaction Families | 15 core + metal coordination + HB subtypes |
| Interfaces | Web UI • CLI • REST API • Batch processor |
| Performance Features | Vectorization • KD‑trees • Adaptive thresholds • Shared memory • Optional JIT/Rust |
| Reproducibility | Normalized records • Provenance hash • Golden baseline • Metrics funnel |
| Visualization | 3D overlays • Ramachandran • Network graph • Distributions |
| Exports | PDF • PPTX • Excel • CSV • Session snapshot |
| Deployment | Local • Docker • Streamlit Cloud • Heroku • Containers |
| Extensibility | Detector registry • Parameter metadata mapping |
Quick links: Quick Start · Run UI · API Usage · Scientific Criteria · Architecture · Parameter Glossary · Doc Index
| Mode | Command | Notes |
|---|---|---|
| Docker | docker build -t molbridge . && docker run -p 8501:8501 -p 8000:8000 molbridge |
UI at 8501, API at 8000 |
| Local (pip) | pip install -r requirements.txt && streamlit run server.py |
Dev & customization |
| API Only | uvicorn server:app --host 0.0.0.0 --port 8000 |
Headless integration |
Then open: http://localhost:8501 (UI) and optionally http://localhost:8000/docs (Swagger API docs).
Representative defaults (see full scientific docs for rationale & ranges):
| Class | Representative Geometry |
|---|---|
| Hydrogen Bond | Donor–Acceptor ≤3.5 Å; angle ≥120° |
| Halogen Bond | X···A ≤ ΣvdW+0.2; C–X···A ≥150° |
| π–π Stacking | Centroids ≤5.5 Å; interplanar ≤30° (parallel) |
| Cation–π | 3.0–6.0 Å; lateral offset ≤1.5 Å |
| Anion–π | Anion–centroid ≤5.0 Å; approach ≤30° to ring normal |
| CH–π | C···centroid 2.0–4.8 Å; angle ≥90° |
| Chalcogen / Pnictogen / Tetrel | Donor–A ≤ ΣvdW+≤0.5; axial alignment high |
| Salt Bridge | Opp. charge atoms ≤4.0 Å (centroid ≤6.0 Å) |
| Hydrophobic / Dispersion | C···C shell ~3.5–5.0 Å |
| Sulfur–π | S–centroid ≤6.0 Å; offset ≤3.0 Å |
| n→π* | O→C 2.9–3.6 Å; approach 95–140° |
| Metal Coordination | Primary shell ≤2.6 Å (ion specific) |
Full tables & edge cases: docs/scientific/criteria_breakdown.md.
export MOLBRIDGE_ENABLE_NORMALIZATION=1
export MOLBRIDGE_ENABLE_PROVENANCE=1Exports embed a provenance digest (structure signature + parameters + detector set + version tag) – cite this in manuscripts.
- Implement detector (geometry pruning + classification) under
src/analysis/. - Register via existing detector registry decorator.
- Add parameters & defaults; update scientific criteria docs.
- Add parity + performance tests.
- Submit PR with rationale & references.
Detailed guide: docs/architecture/detectors.md, docs/architecture/extensibility_guide.md.
Full Extended Reference (Original Detailed README Content)
A comprehensive, production-ready Streamlit application for analyzing noncovalent interactions in protein structures. MolBridge provides state-of-the-art detection algorithms for a broad and growing spectrum of molecular interactions (now 15 core interaction classes plus hydrogen bond subtype analytics), interactive 3D visualization, and extensive reporting capabilities.
| Document | Purpose |
|---|---|
| Technical Documentation | Architecture, modules, pipeline, flags, extension guidance |
| Scientific Documentation | Biophysical rationale, criteria, interpretation & caveats |
| Workflow Pipeline | End-to-end logic blocks, data flow & Miro diagram prompt |
| Optimization Backlog | Deferred performance / refactor tasks & prioritization |
These complement the user-facing guide and serve different audiences (research, engineering, operations).
Representative visuals (see docs/images/).
| Main Dashboard |
|---|
![]() |
| Chalcogen Overlay | Ramachandran Plot |
|---|---|
![]() |
![]() |
Mind Map (Architecture & Feature Taxonomy)
- Hydrogen Bonds: Conventional, low-barrier, and C5-type hydrogen bonds
- Halogen Bonds: Cl, Br, I, and F-mediated interactions with directional sigma holes
- π-π Stacking: Face-to-face and edge-to-face aromatic interactions
- Ionic Interactions: Distance-based charge pairing (general)
- Hydrophobic Contacts: Van der Waals interactions between nonpolar groups
- C-H···π Interactions: Weak hydrogen bonds to aromatic systems
- Chalcogen Bonds: S, Se, Te-mediated interactions
- Pnictogen Bonds: N, P, As-mediated interactions
- Tetrel Bonds: C, Si, Ge-mediated interactions
- Anion-π Interactions: Negative charges interacting with π-systems
- n→π Interactions*: Orbital overlap interactions
- London Dispersion: Heuristic detection of dispersion forces
- Cation–π Interactions: Aromatic ring stabilization by proximal cationic side chains (ARG, LYS, protonated HIS, metal centers)
- Salt Bridges: Explicit detection of polar side-chain ion pairs (ARG/LYS/HIS⁺ vs ASP/GLU⁻) with atom and centroid distance heuristics
- Sulfur–π Interactions: Soft sulfur (CYS SG, MET SD) contacts with aromatic ring faces
- Metal Coordination: Inner/outer sphere coordination (primary ≤2.6 Å, extended ≤3.0 Å) to common protein-bound metals (Zn, Fe, Mg, Ca, Mn, Cu, Ni, Co, Na, K)
Automated post-processing classifies each hydrogen bond into:
- Backbone–Backbone
- Backbone–Sidechain
- Sidechain–Sidechain
- Ligand (one partner non-canonical)
- Water-Mediated (HOH/WAT involvement)
Outputs counts, fractions, and an annotated list; available in the UI (Advanced Extensions) and exported (CSV + aggregate summaries). Toggle via enable_hbond_subtypes.
- Interactive 3D Viewer: py3Dmol-based structure visualization with interaction overlays
- Comprehensive Plots: Heatmaps, distribution plots, Ramachandran plots with hotspot highlighting
- Comparative Analysis: Side-by-side comparison of multiple structures
- Multi-Structure Ramachandran Overlay: Overlay φ/ψ distributions from additional analyzed structures
- Hotspot Detection: Identification of interaction-rich regions
- Export Capabilities: PyMOL and ChimeraX session generation
- Interaction Network Graph: Force-directed network with edge filtering, collapsing and label toggles
- Command Palette (Ctrl/Cmd+K): Fuzzy command search for presets, profiles, layout ops
- Scenario Profiles: YAML-defined templates (electrostatic focus, interface hotspots, metal site context, etc.)
- Layout Snapshots: Save & restore interaction selections and strength filters
- Persistent Progress Log: Streaming per-detector events + live status pill
- Diff & Comparison Utilities: Coordinate-aware hashing, filtering, CSV export
- Virtualized Interaction Table: AgGrid fallback for large datasets
- Compact (Small Screen) Mode: Manual toggle for space-constrained devices
- Dark Theme Default: Light mode removed for streamlined styling; high contrast & classic Ramachandran color toggles retained
- Intelligent Caching: Persistent storage with 7-day expiration
- Session Management: Save, load, and compare analysis sessions
- Batch Processing: Parallel analysis of multiple structures
- Quality Control: Structure validation and completeness assessment
- PDF Reports: Comprehensive analysis reports with figures and tables
- Excel Workbooks: Multi-sheet data export with summary statistics
- PowerPoint Presentations: Publication-ready slide decks
- CSV Data: Raw interaction data for further analysis
- LaTeX Export: Publication-ready snippets for manuscripts
- Programmatic Access: Full REST API with FastAPI backend
- Background Processing: Asynchronous job execution with progress tracking
- Multiple Formats: JSON, CSV, PDF, Excel output formats
- File Upload: Custom structure analysis support
streamlit>=1.28.0
biopython>=1.83
biopandas>=0.4.0
numpy>=1.24.0
scipy>=1.12.0
pandas>=2.2.0
plotly>=5.18.0
py3dmol>=2.0.4
matplotlib>=3.8.0
seaborn>=0.13.0
fastapi>=0.104.0
uvicorn>=0.24.0
reportlab>=4.0.0
openpyxl>=3.1.0
python-pptx>=0.6.23
diskcache>=5.6.0
scikit-learn>=1.3.0
networkx>=3.2
git clone https://github.com/yourusername/protein-interaction-explorer.git
cd protein-interaction-explorer
docker build -t protein-explorer .
docker run -p 8501:8501 -p 8000:8000 protein-explorergit clone https://github.com/yourusername/protein-interaction-explorer.git
cd protein-interaction-explorer
pip install -r requirements.txtstreamlit run server.pyAccess the application at http://localhost:8501
API documentation available at `http://localhost:8000/docs`
from src.utils.config import AppConfig
-
Input Structures
- Enter PDB IDs (e.g., "1LYZ, 2LYZ, 3LYZ")
- Upload custom PDB/CIF files
- Paste structure data directly
-
Configure Analysis
- Choose interaction types to analyze
- Select parameter preset (Conservative/Literature Default/Exploratory)
- Customize individual parameters if needed
-
Run Analysis
- Click "Start Analysis" to begin processing
- Monitor progress in real-time
- Review results as they complete
-
Explore Results
- Structure Tab: Interactive 3D visualization with interaction overlays
- Analysis Tab: Detailed interaction tables and statistics
- Visualization Tab: Heatmaps, distribution plots, and comparative charts
- Reports Tab: Generate and download comprehensive reports
- Stricter distance and angle cutoffs
- Higher confidence interactions only
- Suitable for high-confidence structural biology
- Standard literature-based parameters
- Balanced sensitivity and specificity
- Most commonly used in publications
- Relaxed cutoffs for comprehensive screening
- Higher sensitivity, lower specificity
- Useful for discovering weak interactions
session_manager.save_session("my_analysis", { "pdb_ids": ["1LYZ"], session_data = session_manager.load_session("my_analysis")
config.interaction_config.hbond_angle_cutoff = 25.0
# Set processing options
config.processing_config.max_workers = 8
config.processing_config.use_cache = True
# Process multiple structures with custom parameters
results = batch_processor.process_batch(
pdb_ids=["1LYZ", "2LYZ", "3LYZ"],
interaction_types=["hydrogen_bond", "pi_pi", "ionic"],
use_cache=True,
include_metadata=True
)All interaction detection algorithms implement literature-validated geometric criteria:
- Distance cutoffs based on sum of van der Waals radii plus tolerance
- Angle constraints for directional interactions (hydrogen bonds, halogen bonds)
- Planarity requirements for π-π stacking interactions
- Charge-distance relationships for ionic interactions
- Structure completeness assessment
- Missing atom detection and handling
- Coordinate validation and outlier detection
- Chain break identification
- Alternative conformer handling
- KD-tree spatial indexing for efficient neighbor searches
- Multiprocessing for parallel structure analysis
- Intelligent caching to avoid redundant calculations
- Memory-efficient data structures for large complexes
- Vector Geometry Fast-Paths (opt-in) for hydrogen bonds & π-π stacking via
MOLBRIDGE_ENABLE_VECTOR_GEOM=1 - Structure Indexing Layer (Aromatic Ring Index) powering vector π-π detection when enabled
- Adaptive KD-tree Threshold Tuning automatically relaxes/tightens distance cutoffs to balance candidate counts & speed (
utils.kdtree_thresholds) - Shared Memory Participant Distribution for coordinates, ring centroids/normals, charged/acidic centroids, and H-bond donors/acceptors (process pool path)
- Process Pool High-Performance Mode (opt-in) with fallback to threads on failure
- Task Graph Precompute Stage extracts reusable features once per structure (rings, centroids, donors/acceptors) then fans out to detectors
- Optional Numba/Rust Kernels accelerate pairwise distance and geometry primitives when enabled
- Structured JSON Metrics Stream for per-detector timing + adaptive threshold instrumentation (consumable by
cli_metrics) - Golden Dataset Regression Framework with curated PDB set & CI gating to prevent interaction count / performance regressions
The original optimization roadmap is now fully implemented (historical draft superseded). A live backlog of future improvements is now maintained in docs/OPTIMIZATION_BACKLOG.md. Key pillars:
- Detector Architecture & Registry
- Unified registration decorators; stable key normalization & backward compatibility tests.
- Geometry Acceleration
- Vectorized distance/angle paths (π-π, cation–π, hydrophobic, hydrogen bond angular batching, chalcogen/pnictogen/tetrel multi-angle handling).
- Spatial Pruning Layer
- Centralized KD-tree helper + adaptive runtime threshold feedback loop recorded per detector.
- FeatureStore & Caching
- Ephemeral in-process feature bundles (coords, aromatic rings, charged & acidic centroids, H-bond donors/acceptors) to eliminate recomputation.
- Parallel & Shared Memory Execution
- Hybrid thread/process strategy; opt-in process pool uses POSIX shared memory blocks for large immutable arrays.
- Observability & Metrics
- Streaming JSON logs:
detector_complete,batch_complete, and batch metrics channel; CLI summarization with per-detector mean & p95.
- Streaming JSON logs:
- Regression Protection
- Golden snapshot builder & comparator; CI artifact reuse + strict gating on interaction count & timing drift.
- Acceleration Backends
- Optional Numba kernel path (
MOLBRIDGE_USE_NUMBA=1) and PyO3 Rust prototype (rust/+ maturin) with graceful fallback.
- Optional Numba kernel path (
- Distributed Execution (Experimental)
- Scaffold for Ray / Dask powered multi-node execution (extensible from the unified processor facade).
MolBridge now ships with selectable performance profiles plus a heuristic auto-tuner:
| Profile | Intent | Enabled Features | When to Use |
|---|---|---|---|
| auto (default) | Smart heuristic per structure | Vector geometry, task graph always; process pool + SHM & Numba only on large workloads | General usage / mixed batch sizes |
| full | Max acceleration | Vector geometry, task graph, process pool, SHM, Numba (or Rust), adaptive thresholds | Large multi-structure batch on a beefy machine |
| minimal | Low overhead | Vector geometry + task graph only | Small single-structure interactive runs or CI |
| manual | User control | Whatever you toggle in Settings | Benchmarking / A/B comparisons |
Set via UI Settings tab (⚙️) or environment variable MOLBRIDGE_PERF_PROFILE.
The auto mode inspects atom count, detector count, and estimated work (atoms × detectors):
- Enables process pool + shared memory if structure is large ( > ~12k atoms & ≥6 detectors) or overall estimated work is very large.
- Enables Numba only if Rust extension is unavailable and workload passes a moderate threshold.
- Always enables vector geometry + task graph (negligible overhead, consistent speedup).
- Respects any explicit user/exported env overrides (treats them as manual for that variable).
Instrumentation of the applied decision is emitted as a JSON log event auto_perf_profile with fields profile, mode, n_atoms, n_detectors, and which toggles were activated.
In manual profile you can explicitly toggle:
- Vector Geometry Fast Paths
- Process Pool Execution
- Shared Memory Transport
- Numba JIT Acceleration
- Task Graph Precompute
- Prefer Rust Geometry Extension
- Emit Adaptive Threshold Instrumentation
Changing toggles writes the corresponding environment variables and requires a re-run / new analysis batch to take effect.
| Component | CLI / Module | Purpose |
|---|---|---|
| Golden Snapshot Builder | cli_golden snapshot |
Produce canonical JSON baseline for curated PDB list |
| Golden Comparator | cli_golden compare |
Detect interaction count or timing drift vs baseline |
| Performance Gate Script | scripts/perf_regression_check.py |
Enforced in CI (non-zero exit on regression) |
| Metrics Aggregator | cli_metrics summarize |
Mean & p95 batch + per-detector timing & adaptive stats |
| CSV Export | cli_metrics export-csv |
Flat event log for external BI tools |
| Microbenchmark Harness | python -m performance.microbench |
Rapid p95 timing + acceptance + funnel + memory delta scan (synthetic or real structures) |
| Microbench Regression Gate | python -m performance.bench_regression_gate |
Compares microbench JSON vs baseline (p95 slowdown, acceptance anomalies) |
| Microbench Baseline Refresh | python scripts/update_microbench_baseline.py |
Intentionally regenerate committed baseline with metadata (seed, repeat, timestamp) |
- Golden counts/timing check (curated real PDBs):
python scripts/perf_regression_check.py --baseline golden_baseline --pdb 1CRN 4HHB 2VXN --time-tolerance 0.35
```bash
python -m performance.microbench > bench_run.json
# Optional regression gate vs existing baseline
python -m performance.bench_regression_gate bench_run.json \
--baseline microbench_baseline.json \
--max-p95-growth 1.30 --strict-accept
# Refresh (intentional) baseline after an accepted optimization
python scripts/update_microbench_baseline.py --seed 42 --repeat 2Environment variables:
| Variable | Purpose |
|---|---|
MOLBRIDGE_BENCH_PDB_SMALL |
Path to a real small PDB (fallback: synthetic) |
MOLBRIDGE_BENCH_PDB_MEDIUM |
Path to a medium structure |
MOLBRIDGE_BENCH_PDB_LARGE |
Path to a large structure |
MOLBRIDGE_BENCH_REPEAT |
Repetitions per size tier (default 2) |
MOLBRIDGE_BENCH_SEED |
Deterministic seed for synthetic coordinate generation |
MOLBRIDGE_OTEL_EXPORT |
If 1, emit a lightweight OpenTelemetry stub event (placeholder for future exporter) |
| python -m cli_golden snapshot --output golden_baseline.json --list-file docs/curated_golden_list.txt |
python -m cli_golden compare --baseline golden_baseline.json --list-file docs/curated_golden_list.txt
### Metrics Aggregation
During runs (with metrics emission enabled) a `metrics.jsonl` file accumulates structured events:
```bash
python -m cli_metrics summarize --file metrics.jsonl --out metrics_summary.json
python -m cli_metrics export-csv --file metrics.jsonl --out metrics_summary.csv
Summary includes:
batch_duration_mean,batch_duration_p95detector_timing_mean+detector_timing_p95- Adaptive threshold rollups: original vs adapted means & p95
- Funnel metrics:
raw_pairs,candidate_pairs,accepted_pairs,acceptance_ratio,candidate_density
The lightweight microbenchmark harness exercises a representative subset of detectors across small/medium/large synthetic (or provided) structures:
python -m performance.microbench > bench_run.json
# Optionally compare against a saved baseline
python -m performance.bench_regression_gate bench_run.json --baseline previous_bench.json --max-p95-growth 1.30 --strict-acceptEnvironment variables:
| Variable | Purpose |
|---|---|
MOLBRIDGE_BENCH_PDB_SMALL |
Path to a real small PDB (fallback: synthetic) |
MOLBRIDGE_BENCH_PDB_MEDIUM |
Path to a medium structure |
MOLBRIDGE_BENCH_PDB_LARGE |
Path to a large structure |
MOLBRIDGE_BENCH_REPEAT |
Repetitions per size tier (default 2) |
Gate script flags:
| Flag | Meaning |
|---|---|
--max-p95-growth |
Allowed slowdown factor vs baseline p95 |
--min-accept-ratio |
Warn/fail (with --strict-accept) if mean acceptance ratio below threshold |
--strict-accept |
Escalate acceptance warnings to failures |
The microbench JSON embeds per-detector warnings (high_accept_ratio, near_zero_accept_ratio) that surface in the gate output.
See also Quick Performance Validation for a condensed workflow.
Adaptive KD-tree thresholds now optionally persist across runs when you set:
export MOLBRIDGE_ADAPTIVE_CACHE_PATH=.cache/adaptive_thresholds.jsonOn each adaptation, the new thresholds are flushed to this JSON file. Subsequent executions load them automatically, ensuring stability in iterative benchmark sessions and CI jobs.
Each detector's instrumentation exports:
| Field | Description |
|---|---|
original_threshold |
Threshold when detector started |
adaptive_threshold |
Possibly updated threshold after feedback |
threshold_changed |
Boolean flag |
adapt_reason |
One of raise_low_util, lower_high_miss, density_adjust_up/down, unchanged |
last_density |
Observed candidate density used for density-based adjustments |
kdtree_used |
Heuristic utilization flag (pruning effectiveness) |
raw_pairs |
Pre-filter (broad distance) raw pair count |
candidate_pairs |
Post KD-tree / geometry pruning but pre-final acceptance count |
accepted_pairs |
Final interaction count after all geometry/class tests |
acceptance_ratio |
accepted_pairs / candidate_pairs (0 if denom=0) |
Heuristic pruning detection leverages a shared function (should_flag_kdtree) comparing candidate/raw ratios against a density target band; a dedicated test (test_should_flag_kdtree.py) guards its boundary behavior.
Each detector optionally records: original_threshold, adapted_threshold, candidate counts, and adaptation decision flags. These surface in metrics summaries for longitudinal tuning.
| Technique | Enable | Notes |
|---|---|---|
| Vector Geometry | MOLBRIDGE_ENABLE_VECTOR_GEOM=1 |
Fast batched distance/angle math (NumPy) |
| Numba Kernel | MOLBRIDGE_USE_NUMBA=1 |
JIT fallback for distance ops when Rust off |
| Rust Geometry Prototype | Install dev-rust extra + maturin develop -m rust/Cargo.toml |
Provides PyO3 pairwise_sq_dists |
| Process Pool + SHM | MOLBRIDGE_USE_PROCESS_POOL=1 MOLBRIDGE_USE_SHM=1 |
Shared memory for coords/rings/centroids/HB donors/acceptors |
| Task Graph Precompute | MOLBRIDGE_TASK_GRAPH=1 (default) |
Early feature extraction reused across detectors |
| Auto System Tuning | MOLBRIDGE_AUTO_TUNE=1 (default) |
Applies GC + thread env heuristics on import |
Fallback order for geometry kernels: Rust > Numba > Pure Python (NumPy). All feature flags are fail-safe (silent fallback on error; logged at debug level).
The unified processor facade includes a scaffold for Ray/Dask extension. To experiment:
- Install your preferred framework (e.g.,
pip install ray[default]). - Wrap structure batches through the
DistributedProcessor(see forthcoming docs) which mirrors the HighPerformanceProcessor interface. - Disable process pool locally (
MOLBRIDGE_USE_PROCESS_POOL=0) to avoid contention and let the distributed backend manage resources.
This layer remains opt-in and is isolated so local single-node performance stays lean.
The GitHub Actions workflow now additionally enforces software supply & container integrity:
- Generates or downloads a golden baseline artifact.
- Runs the regression check script; any interaction count deviation beyond 5% or timing slowdown beyond configured tolerance (currently 35%) fails the job.
- Executes microbenchmark + regression gate (p95 + funnel sanity) against committed
microbench_baseline.json. - Performs a security scan (
pip-audit) for published CVEs (fails on vulnerable pinned versions). - Performs a Docker build smoke test to ensure container image viability.
- Uploads metrics summaries for offline inspection.
To adjust gating thresholds edit scripts/perf_regression_check.py parameters or workflow step arguments.
Grouped by function.
| Variable | Purpose | Default |
|---|---|---|
MOLBRIDGE_PERF_PROFILE |
Performance profile (auto/full/minimal/manual) | auto |
MOLBRIDGE_MAX_WORKERS |
Override worker/process count | heuristic |
MOLBRIDGE_USE_PROCESS_POOL |
Enable process pool execution path | 0 |
MOLBRIDGE_USE_SHM |
Use POSIX shared memory for large feature arrays | 0 |
MOLBRIDGE_TASK_GRAPH |
Enable feature precompute task graph | 1 |
MOLBRIDGE_ENABLE_VECTOR_GEOM |
Vector geometry fast paths | 0 |
MOLBRIDGE_USE_NUMBA |
Enable Numba kernels | 0 |
MOLBRIDGE_USE_RUST |
Prefer Rust extension if available | 1 (auto) |
MOLBRIDGE_AUTO_TUNE |
Apply system heuristics at import | 1 |
MOLBRIDGE_ADAPTIVE_CACHE_PATH |
Persist adaptive KD-tree thresholds | unset |
MOLBRIDGE_ENABLE_NORMALIZATION |
Emit normalized interaction records | 0 |
MOLBRIDGE_ENABLE_PROVENANCE |
Include provenance hashing metadata | 0 |
MOLBRIDGE_BENCH_SEED |
Seed for microbenchmark synthetic coords | unset |
MOLBRIDGE_OTEL_EXPORT |
Emit OpenTelemetry stub line (experimental) | 0 |
Removed duplicates; this table is now the single source of truth (see performance engine doc for deeper rationale).
pip install .[dev-rust]
maturin develop -m rust/Cargo.toml # builds in-place (debug)
# or
maturin build --release -m rust/Cargo.tomlSet MOLBRIDGE_USE_RUST=0 to force fallback for A/B comparisons.
MOLBRIDGE_ENABLE_VECTOR_GEOM=1 MOLBRIDGE_USE_PROCESS_POOL=1 MOLBRIDGE_USE_SHM=1 \
python -m cli_golden compare --baseline golden_baseline.json --list-file docs/curated_golden_list.txt
python -m cli_metrics summarize --file metrics.jsonl --out metrics_summary.jsonInspect metrics_summary.json for detector *_p95 outliers; adapt thresholds or investigate regressions as needed.
Structure: 1LYZ (Lysozyme)
Total hydrogen bonds: 89
Average distance: 2.85 ± 0.23 Å
Average angle: 162.3 ± 12.7°
Strength distribution:
Strong: 45 bonds (50.6%)
Moderate: 32 bonds (36.0%)
Weak: 12 bonds (13.5%)
Top interaction hotspots:
- ASP52A - 12 interactions (4 HB, 3 ionic, 5 others)
- GLU35A - 9 interactions (3 HB, 2 ionic, 4 others)
- TRP62A - 8 interactions (2 HB, 4 π-π, 2 others)
## 🌐 REST API Usage
### Start Analysis Job
```bash
curl -X POST "http://localhost:8000/analyze"
-H "Content-Type: application/json"
-d '{
"pdb_ids": ["1LYZ", "2LYZ"],
"interaction_types": ["hydrogen_bond", "pi_pi"],
"config_preset": "literature_default"
}'
curl "http://localhost:8000/jobs/{job_id}"curl "http://localhost:8000/results/{job_id}" > results.jsoncurl -X POST "http://localhost:8000/reports"
-H "Content-Type: application/json"
-d '{
"pdb_ids": ["1LYZ"],
"format": "pdf",
"include_methodology": true
}'
--output report.pdfpython -m pytest tests/ -vpip install -r requirements-dev.txt
pre-commit install- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
MolBridge now uses a single authoritative requirements.txt for all runtime dependencies. Previous duplicates (requirements-prod.txt) were removed to eliminate drift.
Guidelines:
- Add new runtime libraries only to
requirements.txt. - Mark optional/large or rarely used packages with an inline
# OPTIONALcomment. - For strictly development-only tools (linters, formatters), prefer a separate (uncommitted) local dev file or use editable installs; we intentionally keep the public surface minimal.
- To produce a frozen snapshot for reproducible deployment:
pip install -r requirements.txt pip freeze --exclude-editable > requirements.lock pyproject.tomlretains minimal metadata and a subset of dependencies for packaging; the canonical list lives inrequirements.txtto align with Streamlit/Heroku style deployments.
Removed during consolidation:
requirements-prod.txt(was redundant; differences were only commented optional packages)- Duplicate virtual environment directory
venv/(keep.venv/only). Ifvenv/still exists locally you can safely delete it:rm -rf venv
Obsolete or experimental artifacts were removed to reduce noise:
chalcogen_debug.log(large runtime log; sampled data retained underdocs/logs/)dropdown_fix.css(superseded by modular styles inapp/assets/styles/)temp_viewer_method.py,simple_viewer_test.py(prototype scripts)test_pdb_data.py,test_single_interaction.py(replaced by structured tests undersrc/tests/)=5.9.5(stray editor/version artifact)
If you still have these locally, remove them or reset your working copy to match main.
If you do not need AI or LaTeX features, comment out the corresponding lines in requirements.txt before deployment for faster install.
Recent optimization work introduced configurable layers to balance speed, memory, and observability. Below is a concise guide to the new levers you can tune in different deployment scenarios (local exploration vs. batch server vs. constrained cloud).
| Feature | Module | Purpose | Default |
|---|---|---|---|
| In-memory LRU Hot Cache | utils.cache.CacheManager |
Avoid disk (de)serialization for most recently accessed analysis / metadata objects | 32 entries (set via env) |
| Batch Write Context | CacheManager.batch() |
Coalesce multiple cache writes to reduce filesystem & metadata overhead | Off (opt-in) |
| Lazy Decompression | CacheManager.get_* |
Only decompress & unpickle when accessed; preserves I/O for cold entries | Enabled automatically |
| Streaming Manifest Export | CacheManager.export_cache_manifest |
Low-cost inspection of cache contents with heuristic sizing & filtering | Manual call |
| Timing Utilities | performance.timing |
Lightweight decorator/context manager to aggregate wall time & throughput metrics | Enabled (can disable) |
| Variable | Effect | Example |
|---|---|---|
MOLBRIDGE_CACHE_HOT_SIZE |
Overrides hot LRU cache capacity (0 disables) | MOLBRIDGE_CACHE_HOT_SIZE=64 |
MOLBRIDGE_MANIFEST_LIMIT |
Caps number of entries returned by manifest export | MOLBRIDGE_MANIFEST_LIMIT=500 |
MOLBRIDGE_ENABLE_TIMING |
Disable (0) or enable (1) global timing instrumentation | MOLBRIDGE_ENABLE_TIMING=0 |
MOLBRIDGE_MAX_WORKERS |
Force worker thread count in high-performance processors | MOLBRIDGE_MAX_WORKERS=12 |
MOLBRIDGE_ENABLE_VECTOR_GEOM |
Enable vectorized geometry paths (H-bonds, π-π) | MOLBRIDGE_ENABLE_VECTOR_GEOM=1 |
MOLBRIDGE_ENABLE_NORMALIZATION |
Emit unified normalized interaction records | MOLBRIDGE_ENABLE_NORMALIZATION=1 |
MOLBRIDGE_ENABLE_PROVENANCE |
Include provenance hashing/metadata in outputs | MOLBRIDGE_ENABLE_PROVENANCE=1 |
MOLBRIDGE_ENABLE_UNIFIED_PROCESSOR |
Route through unified batch processor prototype | MOLBRIDGE_ENABLE_UNIFIED_PROCESSOR=1 |
from utils.cache import CacheManager
cm = CacheManager(cache_dir=Path('cache'), hot_cache_size=64) # or via env MOLBRIDGE_CACHE_HOT_SIZE
result = cm.get_or_compute_analysis(pdb_id, 'hydrogenbond', params, compute_fn)The first access populates disk + hot cache; subsequent lookups stay in-memory until evicted (LRU discipline).
with cm.batch():
for job in jobs:
cm.cache_analysis_result(job.pdb, job.type, job.params, job.result)Writes are deferred until the context exits, reducing disk metadata churn especially on networked or slow storage.
manifest = cm.export_cache_manifest(limit=100, pattern='analysis_')
for entry in manifest['entries']:
print(entry['key'], entry['size_bytes'])Heuristic sizing avoids full pickling for large objects and supports prefix filtering.
from performance.timing import time_function, time_block, TIMINGS
@time_function(items_attr='count')
def detect(example):
# ... run detection producing object with .count
return example
with time_block('post_processing'):
refine(result)
print(TIMINGS.snapshot())Disable globally with MOLBRIDGE_ENABLE_TIMING=0 (e.g., minimal overhead CI runs).
| Scenario | Suggested Settings |
|---|---|
| Local exploratory analysis | MOLBRIDGE_CACHE_HOT_SIZE=64, leave timing on |
| High-throughput batch server | MOLBRIDGE_CACHE_HOT_SIZE=128, MOLBRIDGE_MANIFEST_LIMIT=300, enable batch writes |
| Memory constrained (small container) | MOLBRIDGE_CACHE_HOT_SIZE=8, MOLBRIDGE_ENABLE_TIMING=0 |
| Debug / observability focus | Increase hot size, keep timing on, export manifest periodically |
Planned (not yet implemented) enhancements include: adaptive eviction based on object size, async compression pipeline, broadened vectorization (cation–π, CH–π, hydrophobic), cache key schema versioning, and structured metrics export (OpenTelemetry-compatible).
Recently completed (since v1.0.1 docs initial draft):
- Unified funnel metrics across all detectors (raw→candidate→accepted stages)
- Adaptive threshold extension to chalcogen & tetrel interactions
- Microbenchmark harness + regression gate CLI for rapid performance drift detection
- Persistent adaptive threshold cache (opt-in)
- Heuristic pruning utilization unification (
should_flag_kdtree) - Dependency lock file (
requirements.lock) for reproducible installations
Set MOLBRIDGE_ENABLE_VECTOR_GEOM=1 to activate:
- Hydrogen bond vector candidate filtering (distance matrix prefilter vs nested loops)
- π-π stacking vector detection using a pre-built
AromaticRingIndex(centers & normals in matrices)
If disabled (default), legacy logic is used verbatim for full parity. A parity test (test_pipi_vector_parity.py) ensures consistency between paths for representative aromatic stacking geometry.
To enable unified normalized records and provenance hashing outside of performance profiles (manual control):
export MOLBRIDGE_ENABLE_NORMALIZATION=1
export MOLBRIDGE_ENABLE_PROVENANCE=1These currently lack explicit UI checkboxes (pending minor UI enhancement); toggle via environment variables before launching the app.
In performance mode non-critical detector info logs are suppressed. Force them on:
export MOLBRIDGE_VERBOSE_DETECTOR_LOGS=1Use sparingly—adds overhead in large batches.
- Jeffrey, G.A. (1997). "An Introduction to Hydrogen Bonding"
- Desiraju, G.R. et al. (2013). "Definition of the halogen bond" Pure Appl. Chem. 85, 1711-1713
- McGaughey, G.B. et al. (1998). "π-Stacking interactions" J. Biol. Chem. 273, 15458-15463
- Kumar, S. & Nussinov, R. (2002). "Close-range electrostatic interactions in proteins" ChemBioChem 3, 604-617
- Hydrogen bonds: Baker & Hubbard (1984) criteria with modern extensions
- Halogen bonds: Auffinger et al. (2004) geometric definitions
- π-π stacking: Janiak (2000) classification system
- Ionic interactions: Barlow & Thornton (1983) distance criteria
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: User Guide | API Reference
- Examples: Jupyter Notebooks | Example Scripts
- Issues: GitHub Issues
- Discussions: GitHub Discussions
If you use MolBridge in your research, please cite:
@software{molbridge,
title={MolBridge: NonCovalent Atlas for Comprehensive Analysis of Noncovalent Interactions in Protein Structures},
author={Your Name},
year={2024},
url={https://github.com/yourusername/protein-interaction-explorer}
}The detailed version history has moved to CHANGELOG.md for clarity and maintainability. Only a high-level pointer remains here.
- Full changelog: CHANGELOG.md
- Roadmap & scoring:
docs/architecture/roadmap_and_decisions.md - Unified bibliography (BibTeX):
docs/references/bibliography.bib
Planned / in-progress features are tracked in the roadmap document rather than inline here.
Your application now includes built-in deployment options! Access the Settings tab in your Streamlit app for deployment guides and configuration.
- Push to GitHub: Make sure your repository is public
- Deploy: Go to share.streamlit.io
- Connect: Link your GitHub account and select this repository
- Configure: Set main file path to
server.py - Launch: Click "Deploy" - your app will be live instantly!
Your app URL: https://your-app-name.streamlit.app
Note: The project now uses a single authoritative requirements.txt. (Previous split requirements-prod.txt was removed to prevent drift.) Streamlit Cloud and other platforms will install directly from it; comment out optional heavy deps if you need a slimmer build.
-
Install Heroku CLI:
curl https://cli-assets.heroku.com/install.sh | sh -
Deploy:
heroku create your-protein-app git push heroku main
-
Access: Your app will be at
https://your-protein-app.herokuapp.com
-
Build the container:
docker build -t protein-interaction-app . -
Run locally:
docker run -p 8501:8501 protein-interaction-app
-
Deploy to cloud platforms:
- AWS: Use ECS or Elastic Beanstalk
- Google Cloud: Use Cloud Run
- Azure: Use Container Instances
- DigitalOcean: Use App Platform
The former helper script (deploy.sh) was removed in favor of simpler, explicit commands. Use one of the platform sections above or:
# Local development
streamlit run server.py
# Docker build & run
docker build -t molbridge .
docker run -p 8501:8501 molbridgeConfigure these for production deployment:
# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true
STREAMLIT_BROWSER_GATHER_USAGE_STATS=false
# Performance Tuning
MAX_WORKERS=4
MEMORY_LIMIT_GB=8| Item | Status |
|---|---|
Main entrypoint server.py |
Required |
Single requirements.txt present |
Required |
App runs with streamlit run server.py |
Verify locally |
| No hard‑coded local file paths | Verify |
| Secrets moved to environment variables | Required |
.streamlit/config.toml tuned (theme, server) |
Optional |
| Dockerfile builds successfully | Optional |
| Symptom | Likely Cause | Action |
|---|---|---|
| App won't start | Missing dependency / version conflict | Rebuild env; ensure pip install -r requirements.txt succeeds |
| Memory usage high | Too many workers / large batch | Set MOLBRIDGE_MAX_WORKERS lower (e.g. 4) |
| Slow first run | Cold cache / network PDB fetch | Enable caching; re-run to warm |
| Timeout on platform | Default worker timeout too low | Increase platform timeout / reduce batch size |
| File upload rejected | Size exceeds Streamlit limit | Adjust server.maxUploadSize in .streamlit/config.toml |
Built with ❤️ for the structural biology community
The application exposes every geometric / classification threshold for transparent, reproducible tuning. Ranges below reflect typical literature-supported exploration windows (values outside may yield noisy or sparse detections).
| Parameter | Description | Typical Range | Default (Literature) | Notes |
|---|---|---|---|---|
hbond_distance_cutoff |
Donor heavy atom – acceptor heavy atom max distance | 2.7–3.8 Å | 3.5 Å | ≤3.2 Å = strong, >3.5 Å may include weak/long bonds |
hbond_angle_cutoff |
Minimum D–H···A angle | 110–180° | 120° | Raise (≥140°) for highly linear selection |
halogen_distance_cutoff |
X···A (X = Cl/Br/I/F) distance | 3.2–4.5 Å | 4.0 Å | I > Br > Cl > F typical strength trend |
halogen_angle_cutoff |
C–X···A angle (σ-hole alignment) | 140–180° | 140° | Increase for stricter directionality |
chalcogen_distance_cutoff |
Chalcogen (S/Se/Te) to acceptor distance | 3.2–4.5 Å | 4.0 Å | Comparable to halogen geometrically |
chalcogen_theta_min / chalcogen_theta_max |
θ window: centroid→S vs S→acceptor alignment | 110–160° (window) | 115–155° | Narrow window increases directionality stringency |
chalcogen_phi_max |
φ | (δ) out-of-plane deviation limit | 30–60° | |
pnictogen_distance_cutoff |
Pnictogen (N/P/As) to acceptor distance | 3.1–4.3 Å | 4.0 Å | Less common; exploratory |
pnictogen_angle_cutoff |
σ-hole alignment angle | 140–180° | 140° | Similar logic to halogen/chalcogen |
tetrel_distance_cutoff |
Tetrel center to acceptor distance | 3.0–4.2 Å | 3.6 Å | Carbon-based tetrel weaker; Si/Ge rarely present |
tetrel_angle_cutoff |
σ-hole direction angle | 150–180° | 160° | |
pi_pi_distance_cutoff |
Aromatic centroid–centroid distance | 4.0–6.5 Å | 5.5 Å | Face-to-face often 3.8–4.1 Å |
pi_pi_angle_cutoff |
Planar angle deviation | 0–50° | 30° | Edge / T-shaped if larger |
ch_pi_min_distance |
Min C···centroid boundary | 1.8–2.3 Å | 2.0 Å | Prevents steric clashes |
ch_pi_max_distance |
Max C···centroid distance | 4.2–5.5 Å | 4.8 Å | Larger increases false positives |
ch_pi_angle_cutoff |
Max C–H···centroid angle | 80–120° | 90° | Lower favors more linear approach |
ch_pi_max_height |
Max perpendicular offset | 1.0–3.5 Å | 2.5 Å | |
anion_pi_distance_cutoff |
Anion to aromatic centroid distance | 3.0–5.5 Å | 5.0 Å | Tight (<4.2 Å) enriches for strong orientation |
n_pi_star_distance_cutoff |
Carbonyl O → adjacent carbonyl C distance | 2.9–3.6 Å | 3.5 Å | Shorter suggests stronger orbital overlap |
n_pi_star_angle_cutoff |
n→π* approach angle | 95–140° | 120° | |
dispersion_min_distance |
Lower vdW window | 3.0–3.6 Å | 3.5 Å | Pair with max window |
dispersion_distance_cutoff |
Upper vdW window | 4.5–5.8 Å | 5.0 Å | |
ionic_distance_cutoff |
Charged group centroid separation | 4.0–7.5 Å | 6.0 Å | <4.5 Å often part of salt bridge |
hydrophobic_distance_cutoff |
Hydrophobic C···C cutoff | 3.8–6.5 Å | 5.0 Å | |
cation_pi_distance_cutoff |
Cation side-chain centroid to ring centroid | 4.0–7.5 Å | 6.0 Å | Strong stabilization <5.5 Å |
salt_bridge_distance_cutoff |
Closest oppositely charged atom–atom distance | 2.5–4.5 Å | 4.0 Å | <3.2 Å often very strong pairs |
salt_bridge_centroid_cutoff |
Charged group centroid distance | 3.5–7.0 Å | 6.0 Å | Provides context for diffuse pairs |
sulfur_pi_distance_cutoff |
Sulfur (CYS/MET) to ring centroid | 4.0–7.0 Å | 6.0 Å | <5.0 Å stronger polarization |
sulfur_pi_max_perp |
Max perpendicular offset | 1.0–4.0 Å | 3.0 Å | |
metal_coordination_primary_cutoff |
Inner coordination shell | 1.9–2.6 Å | 2.4 Å | Depends on ion identity |
metal_coordination_extended_cutoff |
Secondary shell | 2.5–3.4 Å | 3.0 Å | Captures water / secondary ligands |
| Parameter | Meaning | Defaults |
|---|---|---|
strong_threshold |
Normalized internal score above which an interaction is labeled strong | 0.85 |
moderate_threshold |
Boundary between moderate and weak | 0.65 |
- Start with Literature preset → inspect counts → tighten distances if dense, relax if sparse.
- Always re-run analysis after modifying sliders (green re-analysis banner appears automatically when parameters change).
- Export a snapshot (Analysis tab → Session Snapshot) before extensive tuning to preserve reproducibility.
The Analysis tab now includes a Performance panel when timing metadata is available.
- Time (s): Wall-clock seconds spent in the detector logic (post-parallel aggregation)
- Count: Number of interactions returned by detector
- Rate (int/s): Throughput for that interaction type
- Session Aggregate: Sum across all analyzed structures with percent contribution to total detector time
- High Time, Low Count: Potential optimization hotspot (e.g., geometric search too broad)
- High Count, High Rate: Efficient and prolific detector (healthy)
- Dominant Time % >40%: Candidate for algorithmic or vectorization review
- Reduce exploratory distance upper bounds for broad-range detectors (dispersion, hydrophobic)
- Restrict interaction subset when profiling by using per-interaction re-run buttons
- Increase parallel workers only if CPU-bound and system has spare cores
- Use snapshot export to capture timing summaries (
detector_timings_summary) for comparison between tuning sessions. - Consider archiving snapshots with commit SHA for manuscript methods sections.
The project is undergoing an incremental modularization to improve maintainability, testing, and future extensibility.
app/
server.py (entry – may remain at repo root for Streamlit)
ui/
style_loader.py
panels/
info_panel.py (extracted)
assets/
styles/ (extracted inline CSS -> base + themes)
src/ (legacy core code – will be gradually reorganized)
tests/
test_config.py (initial smoke tests)
- Decompose monolithic
main_interface.py(>3000 lines) into panel modules. - Centralize styling in
app/assets/stylesand eliminate large inline CSS blocks. - Introduce an interaction registry abstraction (planned) to unify parameter metadata, colors, and detection dispatch.
- Separate pure logic from Streamlit UI to enable headless testing of detectors & processors.
- Provide clearer boundaries:
core/(algorithms),services/(cache, session, config),visualization/,reporting/,ui/. - Adopt
pyproject.tomlfor modern packaging and tool configuration.
- Added
.gitignoreto exclude cache, logs, build artifacts. - Added
pyproject.tomlscaffold (PEP 621 metadata + tooling sections). - Extracted initial CSS (base + dark/light + select hardening).
- Added style loader utility (
app/ui/style_loader.py). - Modularized Info tab (
info_panel.py). - Added smoke tests (
tests/test_config.py).
- Migrate remaining tab render functions into
panels/modules. - Implement
interaction_registry.pyto consolidate parameter descriptors. - Introduce lightweight plugin mechanism for adding new interaction types.
- Write tests for one or two detectors (geometry correctness & edge cases).
- Move PDF reference material into
docs/references/(currently under root collections). - Remove or archive legacy scripts after confirmation (
temp_viewer_method.py,simple_viewer_test.py).
- Avoid adding new inline CSS; place styles in theme or component-level CSS.
- Keep new UI code under
app/ui/panelsorapp/ui/components. - Prefer pure, testable functions in future
core/modules; Streamlit layer should only orchestrate.
Existing imports like from ui.main_interface import MainInterface remain valid during the transition. Once panels are fully extracted, the legacy file will shrink dramatically; an import compatibility shim may be left temporarily.
For questions or to propose structural changes, open a GitHub Discussion tagged architecture.
Professional overview sections added above; original exhaustive reference retained verbatim inside collapsible section for continuity.




