Using ARTEM to Infer Sequence alignment (ARTEMIS) is a tool for RNA 3D structure superposition and structure-based sequence alignment.
ARTEMIS reads a reference and a query structure from the specified coordinate files in PDB or mmCIF format and identifies the two best superpositions. The first superposition is strictly backbone-dependent (i.e., it does not allow backbone permutations) and is printed to the standard output. The second superposition allows permutations; its details are additionally printed in verbose mode, or automatically if its TM-score is at least 10% higher than the TM-score of the first superposition. The user can choose to save the superimposed query structure in PDB or mmCIF format, along with the residue matching tables, in a specified folder. By default, ARTEMIS saves the output files only for the first superposition, unless the second one is at least 10% better in TM-score. By default, ARTEMIS reads the entire first model of each input file.
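The reporting rule for the second superposition can be illustrated with a small predicate. This is only a sketch: the function name is hypothetical, and reading "at least 10% higher" as a factor of 1.1 is an assumption, not something taken from the ARTEMIS source code.

```python
def permuted_is_reported(tm_backbone: float, tm_permuted: float,
                         verbose: bool = False) -> bool:
    """Return True if the permutation-allowing superposition would be printed.

    Assumption of this sketch: "at least 10% higher" is interpreted as
    tm_permuted >= 1.1 * tm_backbone.
    """
    return verbose or tm_permuted >= 1.1 * tm_backbone


# Hypothetical TM-score values, used only to exercise the rule.
print(permuted_is_reported(0.50, 0.52))                # below the 10% margin
print(permuted_is_reported(0.50, 0.60))                # above the 10% margin
print(permuted_is_reported(0.50, 0.52, verbose=True))  # verbose always prints
```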
Clone the GitHub repository by typing
git clone https://github.com/david-bogdan-r/ARTEMIS.git
ARTEMIS requires five Python3 libraries to be installed:
- numpy
- pandas
- scipy
- matplotlib
- requests
To install, type:
pip install -r requirements.txt
ARTEMIS was tested with two different Python3 environments:
Environment 1:
- python==3.8
- numpy==1.22.3
- pandas==1.4.1
- scipy==1.8.0
- requests==2.31.0

Environment 2:
- python==3.12
- numpy==1.26.3
- pandas==2.1.4
- scipy==1.11.4
- requests==2.31.0
python3 artemis.py r=FILENAME q=FILENAME [OPTIONS]
1) python3 artemis.py r=examples/6ugg/6ugg.cif q=examples/1ivs.pdb rres=/B qres=/C saveto=result saveformat=pdb
This command will superimpose chain C from 1ivs to chain B from 6ugg
and will save the superimposed structure into "result" sub-folder
in PDB format.
2) python3 artemis.py r=examples/6ugg/6ugg.cif q=examples/1ivs.pdb rres=/B qres=/C saveto=result saveformat=pdb -v
The same command as in example 1, but with verbose mode enabled.
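When ARTEMIS is driven from another Python script, the same command line can be assembled as an argument list. The helper below is a hypothetical convenience wrapper, not part of ARTEMIS; it only relies on the `KEY=VALUE` and flag syntax shown in the usage examples above.

```python
import sys


def build_artemis_command(reference, query, script="artemis.py",
                          flags=(), **options):
    """Assemble an argv list for artemis.py (hypothetical helper).

    Keyword arguments become KEY=VALUE options; flags such as "-v"
    are appended unchanged.
    """
    cmd = [sys.executable, script, f"r={reference}", f"q={query}"]
    cmd += [f"{key}={value}" for key, value in options.items()]
    cmd += list(flags)
    return cmd


cmd = build_artemis_command(
    "examples/6ugg/6ugg.cif", "examples/1ivs.pdb",
    rres="/B", qres="/C", saveto="result", saveformat="pdb",
    flags=["-v"],
)
print(" ".join(cmd[1:]))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a residue specification contains spaces.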
r=FILENAME/FOLDER/PDB-ENTRY [REQUIRED OPTION]
Path to a reference structure in PDB/mmCIF format. For faster
performance, it's advised to specify the largest of the two
structures as the reference structure.
If a folder or a mask is specified instead, ARTEMIS will process
all the PDB/mmCIF files (according to the rformat parameter)
in that folder/mask as a reference structure one by one.
If a 4-character PDB entry is specified, ARTEMIS will download the
structure from RCSB PDB.
q=FILENAME/FOLDER/PDB-ENTRY [REQUIRED OPTION]
Path to a query structure, the one that ARTEMIS superimposes to
the reference, in PDB/mmCIF format.
If a folder or a mask is specified instead, ARTEMIS will process
all the PDB/mmCIF files (according to the qformat parameter)
in that folder/mask as a query structure one by one.
If a 4-character PDB entry is specified, ARTEMIS will download the
structure from RCSB PDB.
addhits=FLOAT [DEFAULT: addhits=0.0]
Return additional suboptimal hits (query motif search mode).
If 0 < addhits < 1, ARTEMIS will report alternative hits with the
TM-score value of the query structure >= addhits. If addhits > 1,
ARTEMIS will return up to int(addhits) alternative hits.
For each alternative hit, a new ARTEMIS iteration will be performed,
with the rresneg parameter populated with the residues of the hits
obtained at the previous iterations, as defined by "-s" and "-p"
parameters. If both of the parameters are off, the sequentially-ordered
hits will be considered unless the topology-independent hit has the
query TM-score that is at least 10% higher than the one in the
sequentially-ordered hit.
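The two ways of reading the addhits value can be summarised in code. This is a sketch of the rule as described in the option text, with a hypothetical function name; the option text does not say how addhits=1 exactly is handled, so treating it as "up to 1 hit" is an assumption.

```python
def interpret_addhits(addhits: float):
    """Map an addhits value to a (mode, parameter) pair."""
    if addhits <= 0:
        # Default: no additional hits are requested.
        return ("off", None)
    if addhits < 1:
        # Report alternative hits with query TM-score >= addhits.
        return ("tm_threshold", addhits)
    # Report up to int(addhits) alternative hits.
    # Assumption: addhits == 1 falls into this branch.
    return ("max_hits", int(addhits))


print(interpret_addhits(0.0))
print(interpret_addhits(0.5))
print(interpret_addhits(3.7))
```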
matchrange=FLOAT [DEFAULT: matchrange=3.5]
The threshold used for searching the mutually closest residues.
Only those pairs of residues that have their C3' atoms at a distance
under the specified matchrange value can be added to the subset
of the mutually closest residues. The higher matchrange value
will produce more "noisy" matchings but won't miss anything. The lower
matchrange value will produce more "clean" matchings but
can miss something.
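The effect of matchrange can be illustrated with a minimal mutual-nearest-neighbour search. This is a sketch under stated assumptions, not the ARTEMIS implementation: the function name is hypothetical, the coordinates stand in for C3' atom positions of already superimposed structures, and a brute-force search is used for brevity.

```python
import math


def mutually_closest_pairs(ref_coords, qry_coords, matchrange=3.5):
    """Return (ref_index, qry_index) pairs that are each other's nearest
    neighbour and lie closer than matchrange."""
    def nearest(point, others):
        return min(range(len(others)),
                   key=lambda k: math.dist(point, others[k]))

    if not ref_coords or not qry_coords:
        return []
    ref_to_qry = [nearest(p, qry_coords) for p in ref_coords]
    qry_to_ref = [nearest(p, ref_coords) for p in qry_coords]
    pairs = []
    for i, j in enumerate(ref_to_qry):
        is_mutual = qry_to_ref[j] == i
        if is_mutual and math.dist(ref_coords[i], qry_coords[j]) < matchrange:
            pairs.append((i, j))
    return pairs


# Made-up coordinates, used only to show the threshold effect.
ref = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
qry = [(1.0, 0.0, 0.0), (14.0, 0.0, 0.0)]
print(mutually_closest_pairs(ref, qry, matchrange=3.5))
print(mutually_closest_pairs(ref, qry, matchrange=5.0))
```

With the larger threshold, the second pair (4 units apart) is also accepted, which is the trade-off between cleaner and noisier matchings.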
rformat=KEYWORD, qformat=KEYWORD [DEFAULT: rformat=PDB,qformat=PDB]
The specification of the input coordinate file formats
(case-insensitive). By default, ARTEMIS tries to infer the format
from the extensions of the input filenames. ".pdb", ".cif",
and ".mmcif" formats can be recognized (case-insensitive). In the case
of any other extension ARTEMIS will treat the file as the PDB-format
file by default. If the "rformat" ("qformat") parameter is specified
and it's either "PDB", "CIF", or "MMCIF" (case-insensitive),
ARTEMIS will treat the reference (query) coordinate file
as the specified format.
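The format-selection rule can be sketched as follows. The function name is hypothetical, and returning "CIF" for both the CIF and MMCIF keywords is a simplification made for this sketch.

```python
import os


def infer_format(filename, explicit=None):
    """Pick "PDB" or "CIF" for a coordinate file.

    An explicit rformat/qformat keyword (case-insensitive) wins;
    otherwise the extension decides, with PDB as the fallback.
    """
    if explicit is not None and explicit.upper() in ("PDB", "CIF", "MMCIF"):
        return "PDB" if explicit.upper() == "PDB" else "CIF"
    extension = os.path.splitext(filename)[1].lower()
    return "CIF" if extension in (".cif", ".mmcif") else "PDB"


print(infer_format("examples/6ugg/6ugg.cif"))
print(infer_format("examples/1ivs.PDB"))
print(infer_format("model.ent"))                    # unknown extension
print(infer_format("model.ent", explicit="mmcif"))  # keyword overrides
```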
rres=STRING, qres=STRING [DEFAULT: rres="#1",qres="#1"]
The specification of the input reference (rres) and query (qres)
structures. Only the specified residues will be considered as part
of the structure and all the other residues will be ignored.
See the format description at the end of the OPTIONS section.
rresneg=STRING, qresneg=STRING [DEFAULT: None]
The specification of the input reference (rresneg) and query (qresneg)
structures. The specified residues will be ignored and all the other
residues considered as part of the structure.
See the format description at the end of the OPTIONS section.
rseed=STRING, qseed=STRING [DEFAULT: rseed=rres,qseed=qres]
The specification of the reference and query residues that ARTEMIS
can use for single-residue matching seeds.
See the format description at the end of the OPTIONS section.
saveformat=KEYWORD [DEFAULT: saveformat=qformat]
The specification of the format of the output coordinate files.
By default, ARTEMIS will save the coordinate files in the same format
as the query input file. If the "saveformat" parameter is specified
and it's either "PDB", "CIF", or "MMCIF" (case-insensitive), ARTEMIS
will save the output coordinate files in the specified format.
saveres=STRING [DEFAULT: saveres=qres]
The specification of the query structure residues that will be saved
in the output coordinate files.
See the format description at the end of the OPTIONS section.
saveto=FOLDER [DEFAULT: None]
Path to the output folder to save the coordinate files
of the superimposed query structure along with the mutually
closest residue subsets. If the specified folder does not exist,
ARTEMIS will create it. If the folder is not specified,
nothing will be saved.
shift=FLOAT [DEFAULT: shift=3 if len(qres) < 500 else 20]
The shift2 value for the ScoreMatrix used in the Needleman-Wunsch
algorithm. Larger shift2 provides higher coverage.
stepdiv=INT [DEFAULT: stepdiv=0 if len(qres) < 500 else 100]
The step divider parameter. The parameter is used to speed up
the procedure for large structures. If stepdiv > 0, ARTEMIS
will consider only each Sth reference residue as matching seed,
where S = 1 + len(qres)//stepdiv. If the size of the query
structure exceeds 500 residues, stepdiv will be set to 100
by default.
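The stride formula above can be worked through in a few lines. The function names are hypothetical, and starting the seed selection at the first reference residue is an assumption of this sketch.

```python
def seed_stride(query_length, stepdiv=None):
    """Return S, so that every S-th reference residue is used as a seed."""
    if stepdiv is None:
        # Documented default: 0 for queries under 500 residues, else 100.
        stepdiv = 0 if query_length < 500 else 100
    if stepdiv <= 0:
        return 1  # every reference residue is a seed
    return 1 + query_length // stepdiv


def seed_indices(reference_length, query_length, stepdiv=None):
    """Indices of the reference residues used as seeds.

    Assumption: selection starts at index 0.
    """
    return list(range(0, reference_length,
                      seed_stride(query_length, stepdiv)))


print(seed_stride(120))               # small query, default stepdiv=0
print(seed_stride(1500))              # large query, default stepdiv=100
print(seed_indices(10, 120, stepdiv=50))
```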
toplargest=INT [DEFAULT: toplargest=len(qres) if len(qres) < 500 else 2*threads]
Number of largest mutually closest residue sets for which
alignments are constructed.
threads=INT [DEFAULT: threads=CPU_COUNT]
Number of CPUs to use.
-p, --permutation [DEFAULT: OFF]
Permutation mode. If specified, ARTEMIS will add the topology-independent
alignment details to the standard output, and save its output files
to the saveto folder (if the folder is specified).
The mode is automatically activated if the TM-score of the query
structure for topology-independent alignment is at least 10% higher
than the TM-score of the backbone-dependent alignment.
If addhits > 0 and this parameter is on, topology-independent hits
will always contribute to the rresneg parameter of the next iteration.
-s, --sequential [DEFAULT: OFF]
If addhits > 0 and this parameter is on, sequentially-ordered hits
will always contribute to the rresneg parameter of the next iteration.
-tsv, --tsv [DEFAULT: OFF]
If specified, the output will be in TSV format instead of plain text.
By default, this is turned off with addhits <= 0 and turned on otherwise.
-superonly, --superonly [DEFAULT: OFF]
If specified, ARTEMIS will assume the input sequences
are of the same length and will try to superimpose them
with the perfect sequence alignment.
-v, --verbose [DEFAULT: OFF]
Verbose mode.
-silent, --silent [DEFAULT: OFF]
If specified, ARTEMIS will not raise any errors.
-notmopt, --notmopt [DEFAULT: OFF]
If specified, ARTEMIS will not maximize TM-scores for a given Lali set, but minimize RMSD instead.
***************************************************************************
ARTEMIS uses a ChimeraX-like format to specify the residues of interest
using the "res" parameters:
[#[INT]][/[STRING]][:[STRING][_INT[CHAR|_INT]]] - The structure specification format.
The "res" parameters can be defined with a number of specifications
separated by spaces and enclosed in double quotes.
#[INT] == Model number
/[STRING] == Chain identifier
:[STRING][_INT[CHAR|_INT]] == Residue(s) specification:
:STRING == Residue type
:_INT[CHAR] == Residue number [with insertion code]
:_INT_INT == Range of residue numbers
Structure specification examples:
rres="#1/B:_-10_20 #1/A" - Consider the entire chain A from model 1 and the range of chain B residues with numbers from -10 to 20 from model 1 as the reference structure.
qres="#" - Consider all the residues from all the models in the "q" file as the query structure.
saveres="/C:_10_20 /C:_20A" - Save the chain C residues with numbers from 10 to 20 and the chain C residue with number 20A (A is the insertion code).
rseed=:A - Use only the model 1 adenosines as the single-residue seeds from the reference structure.
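To make the specification grammar concrete, here is a minimal parser sketch built on a regular expression. It is not the parser used by ARTEMIS: the names `SPEC_RE` and `parse_res` and the returned dictionary layout are hypothetical, and the sketch only splits a specification into its parts without applying any defaults.

```python
import re

# One specification: [#[INT]][/[STRING]][:[STRING][_INT[CHAR|_INT]]]
SPEC_RE = re.compile(
    r"^(?:#(?P<model>\d*))?"
    r"(?:/(?P<chain>[^:]*))?"
    r"(?::(?P<restype>[^_]*)"
    r"(?:_(?P<start>-?\d+)(?:(?P<icode>[A-Za-z])|_(?P<end>-?\d+))?)?)?$"
)


def parse_res(value):
    """Split a "res" value on spaces and parse each specification."""
    parsed = []
    for spec in value.split():
        match = SPEC_RE.match(spec)
        if match is None:
            raise ValueError(f"Unrecognised specification: {spec!r}")
        fields = match.groupdict()
        model = fields["model"]
        parsed.append({
            # A bare "#" selects all models; a missing "#" is left as None.
            "model": None if model is None else ("all" if model == "" else int(model)),
            "chain": fields["chain"] or None,
            "restype": fields["restype"] or None,
            "start": int(fields["start"]) if fields["start"] else None,
            "icode": fields["icode"],
            "end": int(fields["end"]) if fields["end"] else None,
        })
    return parsed


for spec in parse_res("#1/B:_-10_20 #1/A"):
    print(spec)
print(parse_res("/C:_20A"))
print(parse_res(":A"))
```

Negative residue numbers are handled by allowing an optional minus sign after the underscore, which is why `:_-10_20` parses as the range from -10 to 20.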
David Bogdan, e-mail: dav.bog.rom@gmail.com