Commit 14bec12

Merge pull request #696 from dhanashreeg368/Dhanashree
added python programs for rabinkarp and KMP
2 parents f5bc481 + e40ebc7 commit 14bec12

2 files changed

+200
-0
lines changed

Strings/KMP_Algorithm.py

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
"""KMP Algorithm - Pattern Searching Algorithm

The KMP algorithm is the Knuth, Morris, and Pratt string searching algorithm.
It reuses the information gained from previous comparisons.
It uses a partial match table to analyze the structure of the pattern.
The goal of the table is to let the algorithm avoid matching any character more than once.
The basic idea behind the KMP algorithm is:
whenever we detect a mismatch (after some matches), we already know some of the characters in the text of the next window.
We take advantage of this information to avoid matching the characters that we know will match anyway.
We need to know about proper prefixes and proper suffixes first.

Proper prefix - All the characters in a string, with one or more cut off the end.
“C”, “Co”, “Cod”, “Codi”, and “Codin” are all the proper prefixes of “Coding”.

Proper suffix - All the characters in a string, with one or more cut off the beginning.
“adrid”, “drid”, “rid”, “id”, and “d” are all the proper suffixes of “Madrid”.

Each value in the partial match table is the "length of the longest proper prefix that matches a proper suffix"
of the pattern up to and including that index.

Pseudocode -
if table[partial_match_length] > 1:
    skip ahead by partial_match_length - table[partial_match_length - 1] characters
else:
    do not skip ahead; move on to the next partial match

Let’s say we’re matching the pattern “abababca” against the text “bacbababaabcbab”.
Here’s the partial match table of the pattern for easy reference

char:  | a | b | a | b | a | b | c | a |
index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
value: | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |

Example -
1. The first partial match is at index 1 of the text.
bacbababaabcbab
 |
 abababca
Hence partial_match_length = 1
See the next partial match and so on.
Repeat the steps till the last partial match is found.

Time Complexity :
Assuming n is the length of the text and m is the length of the pattern.
It can search for a pattern in O(n) time as it never re-compares a text symbol that has matched a pattern symbol.
Construction of the partial match table takes O(m) time.
Therefore, the overall time complexity of the KMP algorithm is O(m + n).
"""

# Python program for KMP Algorithm
def KMPSearch(pat, txt):
    plen = len(pat)
    tlen = len(txt)

    # an empty pattern has no table to build and nothing to search for
    if plen == 0:
        return

    # create lps[] that will hold the longest prefix suffix
    # values for pattern
    lps = [0]*plen  # lps[0] is always 0
    j = 0  # index for pat[]

    alen = 0  # length of the previous longest prefix suffix
    i = 1

    # the loop calculates lps[i] for i = 1 to plen-1
    while i < plen:
        if pat[i] == pat[alen]:
            alen += 1
            lps[i] = alen
            i += 1
        else:
            if alen != 0:
                # fall back to the next shorter prefix that is also a suffix
                alen = lps[alen-1]
                # Also, note that we do not increment i here
            else:
                lps[i] = 0
                i += 1

    i = 0  # index for txt[]
    while i < tlen:
        if pat[j] == txt[i]:
            i += 1
            j += 1

        if j == plen:
            print("Found pattern at index " + str(i-j))
            j = lps[j-1]

        # mismatch after j matches
        elif i < tlen and pat[j] != txt[i]:
            # Do not match lps[0..lps[j-1]] characters,
            # they will match anyway
            if j != 0:
                j = lps[j-1]
            else:
                i += 1

print("enter text: ")
txt = input()
print("enter pattern: ")
pat = input()
KMPSearch(pat, txt)
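
The partial match table described in the docstring can be checked in isolation. The following is a minimal sketch, not part of the committed file: the helper names `build_partial_match_table` and `kmp_find_all` are hypothetical, and the search returns a list of start indices instead of printing them so the result is easy to inspect.

```python
def build_partial_match_table(pattern):
    """Return the partial match (lps) table for pattern."""
    table = [0] * len(pattern)
    length = 0  # length of the longest proper prefix that is also a suffix so far
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            table[i] = length
            i += 1
        elif length != 0:
            # fall back to the next shorter candidate prefix; i is not advanced
            length = table[length - 1]
        else:
            table[i] = 0
            i += 1
    return table


def kmp_find_all(pattern, text):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    table = build_partial_match_table(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # on a mismatch, reuse the table instead of restarting from scratch
        while j > 0 and ch != pattern[j]:
            j = table[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = table[j - 1]  # keep going to find overlapping matches
    return matches


# Table for the docstring's pattern
print(build_partial_match_table("abababca"))  # [0, 0, 1, 2, 3, 4, 0, 1]
# Overlapping occurrences are reported too
print(kmp_find_all("aba", "ababababa"))  # [0, 2, 4, 6]
```

For the docstring's own text “bacbababaabcbab” the sketch returns an empty list, since the full pattern “abababca” never occurs in it; only partial matches do.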

Strings/Rabin_Karp.py

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
""" Rabin Karp Algorithm for pattern searching in Python

The Rabin-Karp algorithm searches for a pattern in a text using a hash function.
Instead of comparing characters at every position, it first filters out the windows whose hash value does not match
and only then performs the character comparison.

Working -
A sequence of characters is taken and checked for the possibility of the presence of the required string.
If the possibility is found, character matching is performed.
If the hash values are unequal, the algorithm computes the hash value of the next plen-character sequence. If the hash values are equal,
the algorithm compares the pattern with the plen-character sequence character by character.
In this way, there is only one hash comparison per text window, and character matching is only required when the hash values match.

Features -
Like the naive algorithm, we slide the pattern over the text one position at a time.
To reduce the number of character comparisons, we use hashing.
We compare the hash values of the pattern and the current text window; only if the hash values match
do we proceed to compare the individual characters of the pattern and the text window.
To calculate the hash value of the current window we use a rolling hash.
A rolling hash computes the hash value of the current window from the hash value of the previous window.

Time Complexity -
Assuming n is the length of the text and m is the length of the pattern.
Its worst-case time is O(nm).
The worst case occurs when the hash value of every window of text[] matches the hash value of pattern[],
for example when all characters of the pattern and the text are the same.

Example -
text – “abdabc”
pattern – “abc”

a b c d
1 2 3 4

text – “a b d a b c”
plen = pattern length = 3
t = 4 (the base of the hash)

hash_0 = 1 * 4^2 + 2 * 4^1 + 4 * 4^0 = 28
hash_1 = 4 * {hash_0 – 1 * 4^2} + 1 = 49
hash_2 = 4 * {hash_1 – 2 * 4^2} + 2 = 70
hash_3 = 4 * {hash_2 – 4 * 4^2} + 3 = 27

The hash of the pattern is 1 * 4^2 + 2 * 4^1 + 3 * 4^0 = 27, which equals hash_3,
so only the window starting at index 3 is compared character by character.

Hence in general
hash_(i+1) = t * {hash_i – text[i] * t^(plen-1)} + text[i+plen]

"""

#Program

d = 10  # base used by the rolling hash

def Rabin_Karp(pattern, text, q):
    m = len(pattern)  # len of pattern
    n = len(text)  # len of text
    p = 0  # hash value of pattern
    t = 0  # hash value of current text window
    h = 1

    # nothing to search if the pattern is empty or longer than the text
    if m == 0 or m > n:
        return

    # h = d^(m-1) % q, the weight of the leading character of a window
    for i in range(m-1):
        h = (h*d) % q

    # Calculate hash value for pattern and first window of text
    for i in range(m):
        p = (d*p + ord(pattern[i])) % q
        t = (d*t + ord(text[i])) % q

    # Find the match
    for i in range(n-m+1):
        if p == t:
            # Check for characters one by one
            for j in range(m):
                if text[i+j] != pattern[j]:
                    break
            else:
                # no mismatch was found, so the whole window matches
                # (the reported position is 1-based)
                print("Pattern is found at position: " + str(i+1))

        # Calculate hash value for next window of text: Remove leading digit, add trailing digit
        if i < n-m:
            # Python's % returns a non-negative result for a positive q,
            # so no extra sign correction is needed
            t = (d*(t-ord(text[i])*h) + ord(text[i+m])) % q


print("Enter text: ")
text = input()  # input text
print("Enter pattern: ")
pattern = input()  # input pattern
q = len(text)+len(pattern)  # modulus for the hash values
Rabin_Karp(pattern, text, q)
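
The worked example in the docstring can be reproduced with a few lines of code. This is a sketch under stated assumptions, not part of the committed file: the helper name `window_hashes` and the `values` mapping are hypothetical, the base is 4 with a=1, b=2, c=3, d=4 as in the docstring, and no modulus is applied so the numbers match the example exactly.

```python
def window_hashes(text, plen, base, values):
    """Return the hash of every plen-character window of text, computed by rolling."""
    if plen == 0 or plen > len(text):
        return []
    high = base ** (plen - 1)  # weight of the leading character of a window
    h = 0
    for ch in text[:plen]:
        h = base * h + values[ch]  # hash of the first window
    hashes = [h]
    for i in range(len(text) - plen):
        # drop text[i] from the front, append text[i + plen] at the back
        h = base * (h - values[text[i]] * high) + values[text[i + plen]]
        hashes.append(h)
    return hashes


# Character values and base taken from the docstring example (assumed mapping)
values = {"a": 1, "b": 2, "c": 3, "d": 4}
text, pattern = "abdabc", "abc"

hashes = window_hashes(text, len(pattern), 4, values)
pattern_hash = window_hashes(pattern, len(pattern), 4, values)[0]
print(hashes)        # [28, 49, 70, 27]
print(pattern_hash)  # 27

# Only windows whose hash equals the pattern hash need a character comparison
candidates = [i for i, h in enumerate(hashes) if h == pattern_hash]
matches = [i for i in candidates if text[i:i + len(pattern)] == pattern]
print(matches)       # [3]
```

Without a modulus the hash grows with the pattern length, which is why the program above reduces every value modulo q; the price is that different windows can then share a hash, so the character comparison remains necessary.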
