The Knuth-Morris-Pratt algorithm can be regarded as a major contribution to the analysis of string match problems. It involves seeking a given sequence of characters or patterns within a larger text. Hence, the traditional search methods, such as the naive approach, often lead to slow, inefficient searches, particularly when searching through long texts. KMP (algorithm developed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1970.) The running time of this algorithm is linear. This efficiency is made through intelligent pattern preprocessing, which determines the need for certain comparisons.
String matching is not just a theoretical concept but a practical necessity in various fields. From search engines to DNA sequence comparisons and text editors, the KMP algorithm’s efficiency makes it a valuable tool in these and many other domains, underscoring its relevance and versatility.

POSTGRADUATE PROGRAM IN
Multi Cloud Architecture & DevOps
Master cloud architecture, DevOps practices, and automation to build scalable, resilient systems.
Historical Background and Significance
Donald Knuth, Vaughan Pratt, and James H. Morris first presented the KMP algorithm in 1970. Their main achievement was to enhance the performance of an extraordinarily important issue in computer science called pattern matching. Before KMP, most pattern-matching algorithms used took up to quadratic time, which made their use in large texts almost impossible.
The importance or the value of the KMP algorithm is that it only requires O(n + m) time to work with it; n is the number of characters in the text, while m represents the number of characters in the pattern. This pioneering accomplishment was to give a highly effective procedure for string matching issues.
Detailed Explanation of the Knuth-Morris-Pratt Algorithm
Basic Principles
The KMP algorithm is a significant improvement over naive methods, as it avoids redundant comparisons. Instead of checking every possible starting position in the text, KMP preprocesses the pattern to build an auxiliary array known as the Longest Prefix Suffix (LPS) array. This array is the key to the algorithm’s efficiency. It helps in skipping characters that we know will match, thus reducing the number of comparisons needed and ultimately speeding up the search process.
How It Works
The KMP algorithm operates in two main phases:
- Preprocessing the Pattern: This involves constructing the LPS array, which stores the lengths of the longest prefix, which is also a suffix for each sub-pattern.
- Pattern Matching: Using the LPS array, the algorithm scans the text to find the pattern efficiently.
The key idea is that when a mismatch occurs after some matches, the LPS array indicates the next positions to match, thereby skipping unnecessary checks.
Why It’s Efficient
The efficiency of the KMP algorithm comes from its ability to avoid backtracking in the text. By using the LPS array, the algorithm ensures that each character of the text is compared at most once. This results in a linear time complexity, making it significantly faster than traditional methods.
Also Read– What is Data Structure
The Role and Construction of the Longest Prefix Suffix (LPS) Array
Importance of the LPS Array
The LPS array is central to the KMP algorithm. It helps optimise the search process by indicating the longest proper prefix of the pattern, which is also a suffix. This information allows the algorithm to skip unnecessary comparisons and resume matching from the correct position.
How to Construct the LPS Array?
We compute the LPS values for each pattern point as part of the preprocessing procedure before building the LPS array. The following is a detailed explanation:
- Initialisation: Start with an array lps of the same length as the pattern, initialised to 0. The first element of the LPS array is always 0 since a single character cannot have a proper prefix or suffix.
- Iterate Over the Pattern: Use two pointers, i and len, where i traverses the pattern, and len tracks the length of the current longest prefix suffix.
- Matching Characters:
- If the characters at positions i and len match, increment len and set lps[i] = len. Move i to the next position.
- If the characters do not match and len is not 0, set len to lps[len-1]. This step avoids unnecessary comparisons by using previously computed LPS values.
- If len is 0, set lps[i] = 0 and move i to the next position.
Here’s a simple example to illustrate the process:
Example
Consider the pattern “ABABAC”:
- Start with i = 1 and len = 0.
- A (pattern[0]) matches A (pattern[1]), so set lps[1] = 1 and increment i and len.
- B (pattern[1]) matches B (pattern[2]), so set lps[2] = 2 and increment i and len.
- Continue this process until i = 5. If a mismatch occurs, adjust len using lps[len-1] and continue.
By the end of this process, we have the LPS array: [0, 0, 1, 2, 3, 0].
Practical Implications
The LPS array is crucial for the efficiency of the KMP algorithm. It ensures the algorithm skips unnecessary comparisons, making the pattern-matching process faster and more efficient. Although this preprocessing step takes linear time, it significantly enhances the overall performance of the KMP algorithm.

82.9%
of professionals don't believe their degree can help them get ahead at work.
Step-by-Step Breakdown of the KMP Algorithm
Understanding the Knuth-Morris-Pratt algorithm involves two main phases: preprocessing the pattern and performing the pattern matching. Let’s walk through each step to see how these phases work together to efficiently search for a pattern within a text.
Phase 1: Preprocessing the Pattern
The first step is to preprocess the pattern to create the Longest Prefix Suffix (LPS) array. This array helps us skip unnecessary comparisons during the pattern-matching phase.
- Initialise Variables: Start with two variables. i will iterate over the pattern, and len will keep track of the current longest prefix, which is also a suffix.
- Create the LPS Array: Initialise an array lps of the same length as the pattern, with all values set to 0. Set the first value, lps[0], to 0 because a single character has no proper prefix or suffix.
- Iterate Through the Pattern: Move through the pattern using the following logic:
- If the characters at positions i and len match, increment both i and len, and set lps[i] = len.
- If the characters do not match and len is not 0, set len to lps[len-1] and do not increment i.
- If len is 0, set lps[i] = 0 and increment i.
Example of LPS Array Construction
Consider the pattern “ABABAC”:
- Initialise i = 1 and len = 0. The LPS array starts as [0, 0, 0, 0, 0, 0].
- Compare pattern[0] (‘A’) with pattern[1] (‘B’). They don’t match, so set lps[1] = 0 and increment i to 2.
- Compare pattern[0] (‘A’) with pattern[2] (‘A’). They match, so set lps[2] = 1, increment len to 1, and i to 3.
- Compare pattern[1] (‘B’) with pattern[3] (‘B’). They match, so set lps[3] = 2, increment len to 2, and i to 4.
- Compare pattern[2] (‘A’) with pattern[4] (‘A’). They match, so set lps[4] = 3, increment len to 3, and i to 5.
- Compare pattern[3] (‘B’) with pattern[5] (‘C’). They don’t match, so set len to lps[2] = 1 and compare again. They don’t match; set len to lps[0] = 0. Set lps[5] = 0 and increment i to 6.
The final LPS array for “ABABAC” is [0, 0, 1, 2, 3, 0].
Phase 2: Pattern Matching
Once we have the LPS array, we can proceed with the pattern-matching phase. This phase involves using the LPS array to avoid unnecessary comparisons.
- Initialise Variables: Use two pointers, i for the text and j for the pattern. Start both at 0.
- Compare Characters:
- If the characters at text[i] and pattern[j] match, increment both i and j.
- If j equals the length of the pattern, we have found a match. Record the position, and reset j using lps[j-1].
- If the characters do not match and j is not 0, set j to lps[j-1]. If j is 0, increment i.
Example of Pattern Matching
Let’s match the pattern “ABABAC” in the text “ABABABAC”:
- Initialise i = 0 and j = 0.
- Compare text[0] (‘A’) with pattern[0] (‘A’). They match, increment i and j.
- Continue matching until i = 6 and j = 5. Text[6] (‘C’) matches pattern[5] (‘C’). Increment both i and j.
- j equals the length of the pattern, so we found a match at index i – j = 0.
- Reset j to lps[5] = 0 and continue.
We find the pattern “ABABAC” starting at index 0 in the text “ABABABAC”.
Time Complexity and Efficiency Analysis
The Knuth-Morris-Pratt algorithm is efficient because it processes the text and the pattern in linear time, O(n + m). This efficiency stems from the preprocessing phase, where we build the LPS array in O(m) time, and the matching phase, where each character in the text is compared at most once, resulting in O(n) time.
Comparison with the Naive Approach
In contrast, the naive pattern-matching approach can take up to O(n * m) time in the worst case. This inefficiency arises because it checks every possible starting position in the text, leading to many redundant comparisons.
Why KMP is Better
The KMP algorithm’s use of the LPS array allows it to skip portions of the text that have already been matched. This reduces the number of comparisons, making the algorithm faster and more efficient, especially for long texts and patterns.
Practical Examples and Code Implementations
Example 1: Matching DNA Sequences
Let’s consider a practical example of matching DNA sequences. Suppose we want to find the sequence “AGCT” in a long DNA string “AGCTTAGCTGAGCTAGCT”:
- Preprocess the pattern “AGCT” to create the LPS array [0, 0, 0, 0].
- Use the KMP algorithm to match the pattern in the text. We find matches at positions 0, 7, and 13.
Example 2: Searching in a Text File
Another practical use of the KMP algorithm is searching for a specific word in a large text file. Let’s search for the word “algorithm” in a document:
- Preprocess the pattern “algorithm” to create the LPS array.
- Use the KMP algorithm to find all occurrences of the word in the document.
Code Implementation in Java
Below is a Java implementation of the KMP algorithm:
What is the main advantage of the KMP algorithm over the naive pattern-searching algorithm?
How does the LPS array improve the efficiency of the KMP algorithm?
Can the KMP algorithm be used for searching multiple patterns in a text?
What are some real-world applications of the KMP algorithm?
Why is the KMP algorithm considered difficult to understand?
Updated on July 2, 2024
