Global Alignment with Scoring Matrix, Custom Weight, Affine Gap Penalty and Bounded Dynamic Programming (DP) Acceleration
-
Every Rosalind node have its own file in the format
ACRN.py
(in algorithms folder) whereACRN
is the acronym for the node in the tree view -
The main function in each node is main_XXX, eg. main function in
ACRN.py
ismain_ACRN(...)
the main function input is what it says on the rosalind site. -
Testing function is named test_function_name, eg. for
ACRN.py
->test_ACRN.py
-
There are multiple tests per function, first two are positive, last one is negative. Test data are named form 1-3 in dataset (except for extension, there are 6 dataset, first 3 are used by experiment).
Code quality and test coverage is ensured by the continuous integration server Codacy and python Coverage. Codacy uses Radon by default to calculate metrics from the source code.
We achieved 100% coverage on all algorithm files. Check Coverage: Run the following commands in project root directory.
coverage run -m unittest discover
coverage report
Please install all requirements from requirements.txt
.
figures.py
contains the efficiency experiment
(Comparing time efficiency between unbounded algorithm
and bounded acceleration) and
comparing optimal scores experiment
(Comparing alignment scores from bounded algorithm with the
optimal scores from unbounded algorithm).
experiments.py
contains all other experiments performed by us.
GAFF_extended(s, t, scoring_matrix, gap, gap_ext, conserved_seq="", conserved_strength=0, bound=-1,
ignore_start_gaps=False, ignore_end_gaps=False, auto_bound = False):
Global Alignment with Scoring Matrix, Custom Weight, Affine Gap Penalty and Bounded DP Acceleration.
Inputs: Two protein strings s and t in FASTA format.
gap: a negative int representing gap opening penalty.
gap_ext: a negative int representing gap extension penalty.
conserved_seq: increase the weight on letters in the input sequence (string). Default empty string.
conserved_strength: weight on conserved_seq. Default 0.
bound: Define the range of bound dynamic programming. Negative number will run unbound version.
Default -1.
auto_bound: Auto calculate the bound size using DNA length difference, default False.
ignore_start_gaps: ignoring start gaps in alignment, default False.
ignore_end_gaps: ignoring end gaps in alignment, default False.
Returns: The maximum alignment score between s and t, followed by two augmented strings
s′ and t′ representing an optimal alignment of s and t.
Our academic paper can be found in paper.pdf
.