- Introduction
- Star History
- Reproduce the Results
- How to Cite
- How to load the embeddings
- Contact
- License
Official repository for the paper "SPACE: STRING proteins as complementary embeddings" in Bioinformatics, in which we precalculated:
- cross-species network embeddings
- ProtT5 sequence embeddings
for all eukaryotic proteins in STRING v12.0.
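Because the network embeddings are cross-species, a protein from one species can be compared directly with the proteins of another. This holds under the assumption that all species share one embedding space. The sketch below ranks the proteins of a target species by cosine similarity to a single query vector. A few points are assumptions of this illustration rather than anything prescribed here:

- `cross_species_top_k` is a hypothetical helper.
- Cosine similarity is our choice of measure.
- The inputs are assumed to be the per-species `embeddings` arrays and decoded `proteins` lists read from `protein.network.embeddings.v12.0.h5`, as in the loading code further down.

```python
import numpy as np


def cross_species_top_k(query_vec, target_proteins, target_embeddings, k=5):
    """Rank target-species proteins by cosine similarity to one query embedding.

    Hypothetical helper: assumes query_vec and the rows of target_embeddings
    live in the same (cross-species) embedding space, one row per protein.
    """
    query = np.asarray(query_vec, dtype=float)
    targets = np.asarray(target_embeddings, dtype=float)
    # Normalise the query and every target row so the dot product is a cosine.
    query = query / np.linalg.norm(query)
    row_norms = np.linalg.norm(targets, axis=1)
    scores = targets @ query / np.clip(row_norms, 1e-12, None)
    # Indices of the k highest scores, best first.
    top = np.argsort(-scores)[:k]
    return [(target_proteins[i], float(scores[i])) for i in top]


if __name__ == "__main__":
    # Toy data standing in for real STRING embeddings (3 proteins, 2 dimensions).
    toy_proteins = ["protA", "protB", "protC"]
    toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(cross_species_top_k(np.array([1.0, 0.0]), toy_proteins, toy_embeddings, k=2))
```

With real data, `query_vec` could be one row of the human (`'9606'`) embeddings and the targets could be the yeast (`'4932'`) embeddings, both read via `f['species'][...]` from the combined file.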
You can download all the embeddings from the STRING website:
- protein.network.embeddings.v12.0.h5
- protein.sequence.embeddings.v12.0.h5
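One simple way to use the two downloads together is to keep only the proteins present in both files and concatenate their network and sequence vectors. This is a hedged sketch, not necessarily how the paper combines them. `combine_embeddings` is a hypothetical helper. It assumes each file has already been read into a list of protein IDs (decoded to `str`) plus a row-aligned 2-D array, as the loading code below does for the network file. The internal layout of the sequence file is an assumption here.

```python
import numpy as np


def combine_embeddings(net_ids, net_emb, seq_ids, seq_emb):
    """Concatenate network and sequence embeddings for proteins found in both.

    Hypothetical helper: ids are lists of protein ID strings and each *_emb
    array has one row per ID, in the same order as its id list.
    """
    net_index = {pid: i for i, pid in enumerate(net_ids)}
    seq_index = {pid: i for i, pid in enumerate(seq_ids)}
    # Keep the network file's order, restricted to proteins in both files.
    shared = [pid for pid in net_ids if pid in seq_index]
    net_rows = [net_index[pid] for pid in shared]
    seq_rows = [seq_index[pid] for pid in shared]
    # Row i of the result is [network vector | sequence vector] for shared[i].
    combined = np.hstack([np.asarray(net_emb)[net_rows], np.asarray(seq_emb)[seq_rows]])
    return shared, combined


if __name__ == "__main__":
    # Toy inputs: three proteins in the network file, two in the sequence file.
    ids, features = combine_embeddings(
        ["p1", "p2", "p3"], np.arange(6.0).reshape(3, 2),
        ["p3", "p1"], np.ones((2, 3)),
    )
    print(ids, features.shape)
```

Concatenation keeps both sources intact so a downstream model can weight them. Proteins missing from either file are simply dropped.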
To reproduce the results of the paper, please follow this document.
If you use this work in your research, please cite the SPACE paper:
Hu, Dewei, et al. "SPACE: STRING proteins as complementary embeddings." Bioinformatics (2025): btaf496. https://doi.org/10.1101/2024.11.25.625140
and the STRING database:
Szklarczyk, D., Nastou, K., Koutrouli, M., Kirsch, R., Mehryary, F., Hachilif, R., ... & von Mering, C. (2025). The STRING database in 2025: protein networks with directionality of regulation. Nucleic Acids Research, 53(D1), D730-D737. https://doi.org/10.1093/nar/gkae1113
The following Python code reads the cross-species network embeddings of a single species, here human (NCBI taxonomy ID 9606), from the file 9606.protein.network.embeddings.v12.0.h5. Install h5py first:
```shell
pip install h5py
```
```python
import h5py

filename = '9606.protein.network.embeddings.v12.0.h5'
with h5py.File(filename, 'r') as f:
    # print the metadata stored as attributes of the 'metadata' group
    meta_keys = f['metadata'].attrs.keys()
    for key in meta_keys:
        print(key, f['metadata'].attrs[key])
    # embedding matrix and the matching protein identifiers
    embedding = f['embeddings'][:]
    proteins = f['proteins'][:]

# protein names are stored as bytes, convert them to strings
proteins = [p.decode('utf-8') for p in proteins]
```
With R, install the rhdf5 package to read the embedding files. The following code reads the same file, 9606.protein.network.embeddings.v12.0.h5.
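```r
# rhdf5 is distributed through Bioconductor; install it once with:
# install.packages("BiocManager")
# BiocManager::install("rhdf5")
library(rhdf5)

filename <- '9606.protein.network.embeddings.v12.0.h5'

# print the metadata stored as attributes of the 'metadata' group
meta_keys <- h5readAttributes(filename, "metadata")
for (key in names(meta_keys)) {
    print(paste(key, meta_keys[[key]]))
}

# rhdf5 returns arrays with dimensions reversed relative to h5py,
# so the embedding matrix may need t() to get one row per protein
embeddings <- h5read(filename, "embeddings")
proteins <- h5read(filename, "proteins")
```
Read the combined network embedding file of all eukaryotes with Python: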
```python
import h5py

filename = 'protein.network.embeddings.v12.0.h5'
with h5py.File(filename, 'r') as f:
    meta_keys = f['metadata'].attrs.keys()
    for key in meta_keys:
        print(key, f['metadata'].attrs[key])

    # list(f['species'].keys()) gives the available taxonomy IDs
    species = '4932'  # NCBI taxonomy ID of brewer's yeast (Saccharomyces cerevisiae)
    embeddings = f['species'][species]['embeddings'][:]
    proteins = f['species'][species]['proteins'][:]

# protein names are stored as bytes, convert them to strings
proteins = [p.decode('utf-8') for p in proteins]
```
Read the combined file with R:
```r
library(rhdf5)

filename <- 'protein.network.embeddings.v12.0.h5'
meta_keys <- h5readAttributes(filename, "metadata")
for (key in names(meta_keys)) {
    print(paste(key, meta_keys[[key]]))
}

species <- '4932'  # NCBI taxonomy ID of brewer's yeast
embeddings <- h5read(filename, paste0('species/', species, '/embeddings'))
proteins <- h5read(filename, paste0('species/', species, '/proteins'))
```
This project is released under the MIT license.
