Commit 8f49f99

danpf and jackdent authored
Add CLI option to set colabfold server url (#199)
* Add CLI option to set colabfold server
* Update README.md
  Co-authored-by: Jack Dent <jack@d3nt.com>
* Update chai_lab/chai1.py
  Co-authored-by: Jack Dent <jack@d3nt.com>
* Update chai_lab/chai1.py
  Co-authored-by: Jack Dent <jack@d3nt.com>
* Update chai_lab/data/dataset/msas/colabfold.py
  Co-authored-by: Jack Dent <jack@d3nt.com>
* Update chai_lab/data/dataset/msas/colabfold.py
  Co-authored-by: Jack Dent <jack@d3nt.com>
* Update README.md
  Co-authored-by: Jack Dent <jack@d3nt.com>

---------

Co-authored-by: Jack Dent <jack@d3nt.com>
1 parent e80bb3a commit 8f49f99

4 files changed, +23 -7 lines changed

README.md

Lines changed: 7 additions & 1 deletion
@@ -37,6 +37,12 @@ For example, to run the model with MSAs (which we recommend for improved perform
 chai fold --use-msa-server input.fasta output_folder
 ```
 
+If you are hosting your own ColabFold server, additionally pass the `--msa-server` flag with your server:
+
+```shell
+chai fold --use-msa-server --msa-server-url "https://api.internalcolabserver.com" input.fasta output_folder
+```
+
 ### Programmatic inference
 
 The main entrypoint into the Chai-1 folding code is through the `chai_lab.chai1.run_inference` function. The following script demonstrates how to programmatically provide inputs to the model, and obtain a list of PDB files for downstream analysis:
@@ -71,7 +77,7 @@ CHAI_DOWNLOADS_DIR=/tmp/downloads python ./examples/predict_structure.py
 
 Chai-1 supports MSAs provided as an `aligned.pqt` file. This file format is similar to an `a3m` file, but has additional columns that provide metadata like the source database and sequence pairing keys. We provide code to convert `a3m` files to `aligned.pqt` files. For more information on how to provide MSAs to Chai-1, see [this documentation](examples/msas/README.md).
 
-For user convenience, we also support automatic MSA generation via the ColabFold [MMseqs2](https://github.com/soedinglab/MMseqs2) server via the `--msa-server` flag. As detailed in the ColabFold [repository](https://github.com/sokrypton/ColabFold), please keep in mind that this is a shared resource. Note that the results reported in our preprint and the webserver use a different MSA search strategy than MMseqs2, though we expect results to be broadly similar.
+For user convenience, we also support automatic MSA generation via the ColabFold [MMseqs2](https://github.com/soedinglab/MMseqs2) server via the `--use-msa-server` flag. As detailed in the ColabFold [repository](https://github.com/sokrypton/ColabFold), please keep in mind that this is a shared resource. Note that the results reported in our preprint and the webserver use a different MSA search strategy than MMseqs2, though we expect results to be broadly similar.
 
 </p>
 </details>

chai_lab/chai1.py

Lines changed: 9 additions & 4 deletions
@@ -270,7 +270,8 @@ def run_inference(
     *,
     output_dir: Path,
     use_esm_embeddings: bool = True,
-    msa_server: bool = False,
+    use_msa_server: bool = False,
+    msa_server_url: str = "https://api.colabfold.com",
     msa_directory: Path | None = None,
     constraint_path: Path | None = None,
     # expose some params for easy tweaking
@@ -285,7 +286,7 @@ def run_inference(
     ), f"Output directory {output_dir} is not empty."
     torch_device = torch.device(device if device is not None else "cuda:0")
     assert not (
-        msa_server and msa_directory
+        use_msa_server and msa_directory
     ), "Cannot specify both MSA server and directory"
 
     # Prepare inputs
@@ -311,15 +312,19 @@ def run_inference(
     raise_if_too_many_tokens(n_actual_tokens)
 
     # Generated and/or load MSAs
-    if msa_server:
+    if use_msa_server:
         protein_sequences = [
             chain.entity_data.sequence
             for chain in chains
             if chain.entity_data.entity_type == EntityType.PROTEIN
         ]
         msa_dir = output_dir / "msas"
         msa_dir.mkdir(parents=True, exist_ok=False)
-        generate_colabfold_msas(protein_seqs=protein_sequences, msa_dir=msa_dir)
+        generate_colabfold_msas(
+            protein_seqs=protein_sequences,
+            msa_dir=msa_dir,
+            msa_server_url=msa_server_url,
+        )
         msa_context, msa_profile_context = get_msa_contexts(
             chains, msa_directory=msa_dir
         )

chai_lab/data/dataset/msas/colabfold.py

Lines changed: 6 additions & 1 deletion
@@ -341,7 +341,11 @@ def download(ID, path):
     return (a3m_lines, template_paths) if use_templates else a3m_lines
 
 
-def generate_colabfold_msas(protein_seqs: list[str], msa_dir: Path):
+def generate_colabfold_msas(
+    protein_seqs: list[str],
+    msa_dir: Path,
+    msa_server_url: str,
+):
     """
     Generate MSAs using the ColabFold (https://github.com/sokrypton/ColabFold)
     server.
@@ -374,6 +378,7 @@ def generate_colabfold_msas(protein_seqs: list[str], msa_dir: Path):
         mmseqs_dir,
         # N.B. we can set this to False to disable pairing
         use_pairing=len(protein_seqs) > 1,
+        host_url=msa_server_url,
         user_agent="chai-lab/0.4.0 feedback@chaidiscovery.com",
     )
     assert isinstance(msas, list)

examples/msas/predict_with_msas.py

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
     # See example .aligned.pqt files in this directory
     msa_directory=Path(__file__).parent,
     # Exclusive with msa_directory; can be used for MMseqs2 server MSA generation
-    msa_server=False,
+    use_msa_server=False,
 )
 cif_paths = candidates.cif_paths
 scores = [rd.aggregate_score for rd in candidates.ranking_data]
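
This commit renames the `run_inference` keyword `msa_server` to `use_msa_server`, so callers written against the parent commit would presumably fail with a `TypeError` until they are updated, as the example script was. For code that must support both versions during a transition, one option is a small shim. The sketch below is hypothetical (`migrate_msa_kwargs` does not exist in `chai_lab`) and only renames the one keyword shown in the diff.

```python
import warnings
from typing import Any


def migrate_msa_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of kwargs with `msa_server` renamed to `use_msa_server`.

    Hypothetical compatibility shim; the input dict is left unmodified.
    """
    migrated = dict(kwargs)
    if "msa_server" in migrated:
        if "use_msa_server" in migrated:
            # Refuse ambiguous input rather than silently picking one value.
            raise TypeError("Pass only one of msa_server / use_msa_server")
        warnings.warn(
            "msa_server was renamed to use_msa_server",
            DeprecationWarning,
            stacklevel=2,
        )
        migrated["use_msa_server"] = migrated.pop("msa_server")
    return migrated
```

The returned dict can then be unpacked into `run_inference` alongside the other arguments the caller already passes.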
