Sequence Alignmnet algorithm with dynamic programming approach and a more space efficient Divide and conquer plus DP approach.
A. Algorithm Description Suppose we are given two strings π and π, where π consists of the sequence of symbols π₯ and consists of the sequence of symbols . 1, π₯2 , ... , π₯π π π¦1, π¦2 , ... , π¦π Consider the sets {1, 2, ... ,π} and {1, 2, ... , π} as representing the different positions in the strings π and π, and consider a matching of these sets; Recall that a matching is a set of ordered pairs with the property that each item occurs in at most one pair. We say that a matching π of these two sets is an alignment if there are no βcrossingβ pairs: if (π, π), (π', π') Ο΅ π and π < π' , then π < π'. Intuitively, an alignment gives a way of lining up the two strings, by telling us which pairs of positions will be lined up with one another. Our definition of similarity will be based on finding the optimal alignment between π and π, according to the following criteria. Suppose π is a given alignment between π and π:
- First, there is a parameter Ξ΄ that defines a gap penalty. For each π > 0 position of πor π that is not matched in π β it is a gap β we incur a cost of Ξ΄.
- Second, for each pair of letters π, π in our alphabet, there is a mismatch cost of Ξ± for lining up with . Thus, for each , we pay the ππ π π (π, π) Ο΅ π appropriate mismatch cost Ξ± for lining up with . One generally π₯ππ¦π π₯π π¦π assumes that Ξ± = 0 for each letter βthere is no mismatch cost to line up ππ π a letter with another copy of itselfβalthough this will not be necessary in anything that follows.
- The cost of π is the sum of its gap and mismatch costs, and we seek an alignment of minimum cost.