Skip to content

Global Alignment with Scoring Matrix, Custom weight, Affine Gap Penalty and Bounded Dynamic Programming acceleration.

License

Notifications You must be signed in to change notification settings

ChengyuanSha/Smart-Alignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

95 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Codacy Badge Codacy Badge License: MIT

Global Alignment with Scoring Matrix, Custom Weight, Affine Gap Penalty and Bounded Dynamic Programming (DP) Acceleration

Introduction

  • Every Rosalind node have its own file in the format ACRN.py (in algorithms folder) where ACRN is the acronym for the node in the tree view

  • The main function in each node is main_XXX, eg. main function in ACRN.py ismain_ACRN(...) the main function input is what it says on the rosalind site.

  • Testing function is named test_function_name, eg. for ACRN.py -> test_ACRN.py

  • There are multiple tests per function, first two are positive, last one is negative. Test data are named form 1-3 in dataset (except for extension, there are 6 dataset, first 3 are used by experiment).

Code quality

Code quality and test coverage is ensured by the continuous integration server Codacy and python Coverage. Codacy uses Radon by default to calculate metrics from the source code.

We achieved 100% coverage on all algorithm files. Check Coverage: Run the following commands in project root directory.

coverage run -m unittest discover
coverage report

Running Instruction

Please install all requirements from requirements.txt.

figures.py contains the efficiency experiment (Comparing time efficiency between unbounded algorithm and bounded acceleration) and comparing optimal scores experiment (Comparing alignment scores from bounded algorithm with the optimal scores from unbounded algorithm).

experiments.py contains all other experiments performed by us.

Function usage

GAFF_extended(s, t, scoring_matrix, gap, gap_ext, conserved_seq="", conserved_strength=0, bound=-1,
                  ignore_start_gaps=False, ignore_end_gaps=False, auto_bound = False):

    Global Alignment with Scoring Matrix, Custom Weight, Affine Gap Penalty and Bounded DP Acceleration.

    Inputs: Two protein strings s and t in FASTA format.
            gap: a negative int representing gap opening penalty.
            gap_ext: a negative int representing gap extension penalty.
            conserved_seq: increase the weight on letters in the input sequence (string). Default empty string.
            conserved_strength: weight on conserved_seq. Default 0.
            bound: Define the range of bound dynamic programming. Negative number will run unbound version.
                   Default -1.
            auto_bound: Auto calculate the bound size using DNA length difference, default False.
            ignore_start_gaps: ignoring start gaps in alignment, default False.
            ignore_end_gaps: ignoring end gaps in alignment, default False.

    Returns: The maximum alignment score between s and t, followed by two augmented strings
             s′ and t′ representing an optimal alignment of s and t.

More detailed explanation

Our academic paper can be found in paper.pdf.

About

Global Alignment with Scoring Matrix, Custom weight, Affine Gap Penalty and Bounded Dynamic Programming acceleration.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages