Easy to understand Dynamic Programming – Edit distance

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2014/03/01/dynamic-programming-edit-distance

Following the topic of the last post, I will discuss another problem that can be solved efficiently using dynamic programming. Unlike the Fibonacci sequence that we saw on the introduction post, the problem that I present here is a bit more tricky to solve (which also makes it more interesting).

The reason I’ll review this problem and other similar ones in future posts is not just to show how to solve them. Anyone with access to google can search for one of the 100 already implemented solutions. The idea here is two show some key concepts that you can apply to any dynamic problem.

So lets start with the well known Edit distance problem. Basically, given two strings A and B, the edit distance measures the minimum number of operations required to transform one string into the other. In the most common version of this problem we can apply 3 different operations:

  • Insert a new character into one of the strings
  • Delete an existing character
  • Replace one character by another

For example, given the strings A = “cat” and B = “cars”, editDistance(A,B) = 2 because the minimum number of transformations that we need to make is replace the “t” in A by “r” and then remove the “s” from B. After that, both strings are equal to “car”. Notice that in this case another possibility would have been to replace the “t” by an “r” but then insert an “s” into A, making both strings equal to “cars”. Similarly we could change both into “cat” or “cats” and the edit distance is still 2.

So how would we go about solving this problem? Lets try to solve a much smaller and simpler problem. Imagine we have the two previous strings again, this time represented as an array of chars in the general form:
A = [A0, A1, A2,…, An]
B = [B0, B1, B2,…, Bm]

Notice that the lengths of A and B (n and m respectively) might be different.

We know that in the end, both strings will need to have the same length and match their characters on each position. So, if we take just the first character of A and B, what choices do we have? We have 3 different transformations that we can apply so we have 3 choices as well:

  • We can replace the first character of A by the first character of B. The cost of doing this replacement will be 1 if the characters are different and 0 if they are equal (because in that case we don’t need to do anything). At this point we know that both strings start with the same character so we can compute the edit distance of A[1..n] = [A1,…, An] and B[1..m] = [B1,…, Bm]. The final result will be the cost of transforming A[0] into B[0] plus the edit distance of the remaining substrings.
  • The second choice we have is inserting a character into A to match the character in B[0], which has a cost of 1. After we have done this, we still need to consider the original A string because we have added a new character to it but we haven’t process any of its original characters. However, we don’t need to consider B[0] anymore because we know that it matches A[0] at this point. Therefore, the only thing we need to do now is to compute the edit distance of the original A and B[1..m] = [B1,…, Bm]. The final result will be this value plus 1 (the cost of inserting the character into A).
  • The last case is symmetrical to the previous one. The third transformation we can apply is simply to delete A[0]. This operation has a cost of 1 as well. After doing this, we have consumed one character of A but we still have all the original characters of B to process. So we simply need to compute the edit distance of A[1..n] = [A1,…, An] and B. Again, the final value of the edit distance will be this value plus 1 (the cost of deleting A[0].

So which of the three choices should we pick initially? Well, we don’t really know which one will be better in the end so we have to try them all. The answer to our original problem will be the minimum value of those three alternatives.

The previous description is the key to solving this problem so read it again if it is not clear. Notice that, the last step on each of the three alternatives involves computing the edit distance of 2 substrings of A and B. This is exactly the same problem as the original one we are trying to solve, i.e., solving the edit distance for 2 strings. The difference is that after each decision we take, the problem becomes smaller because one or both input strings become smaller. We are solving the original problem by solving smaller sub-problems of exactly the same type. The fact that the sub-problems become smaller on each step is crucial to be sure that we’ll terminate the algorithm at some point. Otherwise, we could keep going solving sub-problems indefinitely.

So when do we stop solving sub-problems? We can stop as soon as we get to a case which is trivial to solve: a base case. In our case that is when one or both input strings are empty. If both strings are empty then the edit distance between them is 0, because they are already equal. Otherwise, the edit distance will be the length of the string that is not empty. For example, to transform “aab” into “” we can either remove the 3 characters from the first string or insert them into the second string. In any case, the edit distance is 3.

What we have done so far is defining the solution to our problem in terms of the solution to a smaller sub-problem of the same type and identified the base case when we already know what the result is. This should be a clear hint that a recursive algorithm can be applied.
But first, lets translate the three choices discussed previously describing the relationship between sub-problems into something that will be more helpful when trying to code this. The recurrence relation can be defined as:

editDistance(A,B) = min =\begin{cases} editDistance(A[1..n],B[1..m]) + replaceCost(A[0],B[0])\\editDistance(A[1..n],B[0..m]) + 1\\editDistance(A[0..n],B[1..m]) +1\end{cases}

The first case of the previous formula is when we replace A[0] by B[0], the second one is when we delete A[0] and the last one is when we insert into A.
The base cases are:
editDistance(A,"") = length(A)
editDistance("",B) = length(B)

Notice that the case where both strings are empty is already covered by the first base case because length(A) == length(“”) == 0.

Translating this definition into a recursive algorithm is relatively straightforward:

public class BruteForceEditDistance implements EditDistance {

    public int editDistance(String word1, String word2) {
        if (word1.isEmpty()) return word2.length();
        if (word2.isEmpty()) return word1.length();

        int replace = editDistance(word1.substring(1), word2.substring(1)) + Util.replaceCost(word1, word2, 0, 0);
        int delete = editDistance(word1.substring(1), word2) + 1;
        int insert = editDistance(word1, word2.substring(1)) + 1;
        return Util.min(replace, delete, insert);
    }
}

The Util class has some useful methods that we’ll use on the different implementations of the algorithm and it looks something like this:

public class Util {
    /**
     * Prevent instantiation
     */
    private Util(){}

    public static int replaceCost(String w1, String w2, int w1Index, int w2Index) {
        return (w1.charAt(w1Index) == w2.charAt(w2Index)) ? 0 : 1;
    }

    public static int min(int... numbers) {
        int result = Integer.MAX_VALUE;
        for (int each : numbers) {
            result = Math.min(result, each);
        }
        return result;
    }
}

The code looks concise and it gives the expected answer. But as you can probably imagine, given that we are talking about dynamic programming, this brute force approach is far from efficient. As with the fibonacci example that we saw on the last post, this algorithm computes the same answer multiple times causing an exponential explosion of different paths that we need to explore.
To see this, lets consider the sequence of calls that are made when we invoke this method with word1 = “ABD” and word2 = “AE”:

Edit distance example

And that is only for two strings of length 3 and 2. Imagine what that picture looks like when you have proper words or even sentences. In my laptop, for any two strings with 10 or more characters the method never finishes. This approach clearly won’t scale.

So, can we apply dynamic programming to this problem? Remember the two basic properties of a dynamic problem that we discussed in the previous post: overlapping sub-problems and optimal substructure.
As we just saw on the example of the previous figure, the edit distance problem clearly has overlapping sub-problems because we are solving smaller sub-problems of the same type to get to the final solution and we need to solve the same sub-problems multiple times.
What about optimal substructure? Can we compute the optimal solution if we have optimal solutions for the sub-problems? Of course! In our case we are trying to minimize the number of transformations, so if we have the optimal solution to the three cases we consider (replace, insert and delete) then we get the minimum from those 3 and that’s our optimal solution.

Remember from the previous post that we could have a top-down dynamic programming approach where we memoize the recursive implementation or a bottom-up approach. The latter tends to be more efficient because you avoid the recursive calls. More importantly, the choice between these two can be the difference between a working and a non-working algorithm if the number of recursive calls you need to make to get to the base case is too large. If this is not a problem, then the choice of which one to use usually depends on personal preference or style.
Here’s a possible top-down algorithm first:

public class DPMemoizedEditDistance implements EditDistance {

    public int editDistance(String word1, String word2) {
        return editDistance(word1, word2, new HashMap<StringTuple, Integer>());
    }
    
    private int editDistance(String word1, String word2, Map<StringTuple, Integer> computedSolutions) {
        if (word1.isEmpty()) return word2.length();
        if (word2.isEmpty()) return word1.length();

        StringTuple replaceTuple = new StringTuple(word1.substring(1), word2.substring(1));
        StringTuple deleteTuple = new StringTuple(word1.substring(1), word2);
        StringTuple insertTuple = new StringTuple(word1, word2.substring(1));

        int replace = Util.replaceCost(word1, word2, 0, 0) + transformationCost(replaceTuple, computedSolutions);
        int delete = 1 + transformationCost(deleteTuple, computedSolutions);
        int insert = 1 + transformationCost(insertTuple, computedSolutions);

        int minEditDistance = Util.min(replace, delete, insert);
        computedSolutions.put(new StringTuple(word1, word2), minEditDistance);
        return minEditDistance;
    }
    
    private int transformationCost(StringTuple tuple, Map<StringTuple, Integer> solutions) {
       if (solutions.containsKey(tuple)) return solutions.get(tuple);
       
       int result = editDistance(tuple.s1, tuple.s2, solutions);
       solutions.put(tuple, result);
       return result;
    }
    
    /**
     * Helper class to save previous solutions
     *
     */
    private class StringTuple {
        private final String s1;
        private final String s2;
        public StringTuple(String s1, String s2) {
            this.s1 = s1;
            this.s2 = s2;
        }
        
        @Override
        public int hashCode() {
            return HashCodeBuilder.reflectionHashCode(this);
        }
        
        @Override
        public boolean equals(Object obj) {
            return EqualsBuilder.reflectionEquals(this,obj);
        }
    }
}

Lets see what is going on here. First of all, we are using a Map to store the computed solutions. Since the input to our method are two Strings, we created an auxiliary class with the two Strings that is going to serve as the key to the Map. Our public method has the same interface as before, receiving the 2 inputs and returning the distance. In this memoized version we create the Map here and delegate the calculation to the private method that will make the necessary recursive calls. This private method will use the computedSolutions Map to avoid doing duplicated work. The rest of the algorithm works exactly as the brute force approach we saw before. On each step, we compute (or get the result if it was already computed) the edit distance for the three different possibilities: replace, delete and insert. Now after computing these distances, we save them, take the minimum of them, save the result for the original input and return that.

This algorithm is a huge improvement over the naive recursive one we saw before. Just to give an example, the same invocation that never ended before now takes 0.075 seconds to complete. The good thing about this approach is that we are able to reuse the recursive method that we already had before with some minor modifications. The bad part is that we are doing a lot of comparisons and manipulations of Strings and this tends to be slow.

Since the two strings that we receive as input might be large, lets try to use a bottom-up approach:

public class DPBottomUpEditDistance implements EditDistance {

    public int editDistance(String word1, String word2) {
        if (word1.isEmpty()) return word2.length();
        if (word2.isEmpty()) return word1.length();

        int word1Length = word1.length();
        int word2Length = word2.length();
        
        //minCosts[i][j] represents the edit distance of the substrings
        //word1.substring(i) and word2.substring(j)
        int[][] minCosts = new int[word1Length][word2Length];

        //This is the edit distance of the last char of word1 and the last char of word2
        //It can be 0 or 1 depending on whether the two are different or equal
        minCosts[word1Length - 1][word2Length - 1] = Util.replaceCost(word1, word2, word1Length - 1, word2Length - 1);
        
        for (int j = word2Length - 2; j >= 0; j--) {
            minCosts[word1Length - 1][j] = 1 + minCosts[word1Length - 1][j + 1];
        }

        for (int i = word1Length - 2; i >= 0; i--) {
            minCosts[i][word2Length - 1] = 1 + minCosts[i + 1][word2Length - 1];
        }

        for (int i = word1Length - 2; i >= 0; i--) {
            for (int j = word2Length - 2; j >= 0; j--) {
                int replace = Util.replaceCost(word1, word2, i, j) + minCosts[i + 1][j + 1];
                int delete = 1 + minCosts[i + 1][j];
                int insert = 1 + minCosts[i][j + 1];
                minCosts[i][j] = Util.min(replace, delete, insert);
            }
        }
        return minCosts[0][0];
    }
}

Here we create a matrix to hold the values of the edit distances of the different substrings. Instead of keeping references to all the different substrings, like we did on the memoized version, we just keep 2 indices. So minCosts[i][j] is the value for the edit distance between word1[i..n] and word2[j..m]. Given this structure, what is the smallest problem for which the solution is trivial? The one that considers the two last characters of each String: if both characters are equal then their edit distance is 0, otherwise is 1.
Lets follow the algorithm through the original example of “cat” and “cars” to better understand how it works. Suppose word1 = [c, a, t] and word2 = [c, a, r, s]. Then, our matrix will have a size of 3×4 with all places initially set to 0:

\begin{bmatrix}  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0   \end{bmatrix}

Next we compare the two last characters of both Strings, so “t” and “s”. Since they are different we update minCosts[2][3] with 1:

\begin{bmatrix}  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 1   \end{bmatrix}

Once we have that value, we can calculate all the other values for the last row and last column. For instance, what does minCosts[2][2] mean? According to our definition is the edit distance between word1[2..2] and word2[2..3], which in our case means the edit distance between “t” and “rs”. But since we already know that the edit distance between “t” and “s” is 1 (because we can look for that value on the matrix) any extra letter we add to the second string while leaving the first one fixed can only increase the distance by 1. So minCosts[2][2] is equal to minCosts[2][3] + 1, minCosts[2][1] is equal to minCosts[2][2] + 1 and minCosts[2][0] is equal to minCosts[2][1] + 1.
The same reasoning applies if we leave the column fixed and move up through the rows. After this two initial loops our matrix looks like this:

\begin{bmatrix}  0 & 0 & 0 & 3 \\  0 & 0 & 0 & 2 \\  4 & 3 & 2 & 1   \end{bmatrix}

Now we can easily fill in our matrix by following the recurrence formula we defined in the beginning. For each cell we will need the value of the cell to its right, the cell directly below and the cell on its right diagonal. So we can traverse the matrix from bottom to top and from right to left. Applying the recurrence formula to minCosts[1][2] for example, we get that its value is 2. With this value we can calculate minCosts[1][1], minCosts[1][0] and the values for the first row.
Our final matrix is:

\begin{bmatrix}  2 & 3 & 3 & 3 \\  3 & 2 & 2 & 2 \\  4 & 3 & 2 & 1   \end{bmatrix}

So now that we have all our matrix filled up, what is the answer to our original problem? Remember once again that minCosts[i][j] is the value for the edit distance between word1.substring(i) and word2.substring(j). Therefore, since word1.substring(0) == word1, our final answer is the value sitting at minCosts[0][0].

What are the advantages of this bottom-up approach against the memoized version? First, we don’t need the recursive calls. Our implementation is a completely iterative method that just traverses a matrix. Second, we don’t need to keep track of all the possible substrings and operate on them. We simply use the indices of the matrix to represent the substrings which is considerably faster.

To conclude this post, lets recap what we did in order to solve this problem:

  • We identified that we could solve the original problem by splitting it into smaller sub-problems of the same type
  • We assumed that we somehow knew what the answers to the sub-problems were and thought about how we would use those answers to get to the answer for the original problem
  • From the previous step we defined a general recurrence formula that represented the relationships between the different sub-problems. This recurrence formulation is usually the most important part of solving a dynamic programming formula. Once we have it, translating that into the algorithm is usually straightforward
  • We implemented a naive recursive solution and we identified that we were solving the same sub-problems over and over again
  • We modified the previous solution to save and reuse the results by using a memoized recursive method
  • We identified that the recursive calls could be an issue for long strings and so we developed a bottom-up approach. In this version we defined a matrix to hold the results and we figured in which way we needed to fill it by using the exact same recurrence formula defined before

You can find all the implementations we saw here, together with automated tests for each one showing the differences in execution time on https://github.com/jlordiales/edit-distance

In the next posts we’ll see that this same steps can be applied with minor modifications to a huge amount of different problems. Once you get familiar with the basic tips and tricks of dynamic programming most problems are quite similar.

Cheers!!!

Advertisements

Dynamic Programming – Introduction

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2014/02/20/dynamic-programming-introduction

Wow, it’s been a while since I’ve written anything here. Between changing jobs, working on my PhD and moving to a new country I guess you could say I’ve been pretty busy.

But at the same time, together with all these changes in my life there’s a ton of new things that I’m learning almost every day. And, as many others, I feel that writing about them in a way that is easy for others to understand is one of the best ways to consolidate my own learning. So I’ll try to write here more often from now on.

So with that covered lets move to the important stuff. For some reason, lately I’ve been very interested in refreshing old concepts from algorithms and data structures that were a little rusty. Together with these old concepts also came some new concepts and techniques which I’ve never heard about before. But I will address them on separate posts.

On this post I wanted to talk about Dynamic Programming. If you ever took an algorithms university course then you have probably heard about it. If you haven’t, here’s a good chance to learn something extremely useful, easy and intuitive (at least when you understand it properly).

So what exactly is Dynamic Programming? I won’t go into much theory because you can already find all that if you do a simple google search. I will define it as “smart recursion” and, as usual I’ll clarify this with an example.

The classic example to explain dynamic programming is the fibonacci computation, so I’ll also go with that. The definition of the fibonacci number of a number is clearly a recursive one:
F(n) = F(n-1) + F(n-2) and F(1) = F(2) = 1
This means that the sequence of the first 10 fibonacci numbers would go:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55

You might also find it defined as:
F(0) = F(1) = 1
And so the sequence would be:
0,1, 1, 2, 3, 5, 8, 13, 21, 34, 55

For the purposes of this post this difference is irrelevant but I’ll stick to the first one.

Now, this recursive definition translates naturally and quite neatly into the following recursive method:

public static long fibonacci(int n) {
    if (n < 3) return 1;
    return fibonacci(n-2) + fibonacci(n-1);
}

You can even make that a one liner:

public static long fibonacci(int n) {
    return (n < 3) ? 1 : fibonacci(n-2) + fibonacci(n-1);
}

Now that method works and it is certainly elegant but, what happens with the execution time as you start increasing the parameter n. Well, in my laptop it returns almost immediately for any value between 0 and 30. For an n of 40 it takes a bit longer: 0.5 seconds. But for an n equal to 50, it takes almost a full minute to compute the correct value.
You might think that a full minute is not a lot, but any value greater that 70 kills the application for me and any value between 50 and 70 takes too much to finish. How much? I don’t really know because I don’t have the patience to wait and see, but it is certainly more than 30 minutes.

So what is wrong with this method? Well, I haven’t talked about algorithms time and space complexity yet (I probably will in another post) so for now I’ll just say that the execution time of the algorithm grows exponentially as n increases. That’s why there’s such a big difference in time when you execute that method with n = 40 and with n = 50, because there’s a huge difference between 2^40 and 2^50.

The reason why this algorithm behaves like that is also rather easy to see by just following its execution stack for any value of n. Let’s do that for n = 6 to keep it short. The following image shows the sequence of calls that get made.

fibo

Looking at the code, we can clearly see that to compute the value for 6, we first compute the values for 5 and 4. But, similarly, to compute the value of 5 we need the values for 4 and 3 and to compute the values for 4 we need the values for 3 and 2. Once we get to 2 we can end the recursion because we know the result (which is 1).
And here is the problem with this method, notice how many times we are calling fibonacci(4) and how many times we are calling fibonacci(3). This is completely duplicated work that we are doing. Why calculate the same results over and over again if they are never going to change? Once we calculated fibonacci(3) or fibonacci(4) for the first time, we can save that result and reuse it whenever we need to.

And this is exactly what I meant by smart recursion. You can have the natural and simple recursive solution but you need to identify these situations where you are repeating work and avoid them. For n = 6 it wasn’t such a big deal, but as n grows the amount of duplicated work also grows exponentially until it renders the application useless.

So how can we go about improving this? We only need to store previously computed values and we can use any structure that we want for that. In this case, I’ll just use a map:

public static long fibonacci(int n) {
    if (n < 3) return 1;
        
    //Map to store the previous results
    Map<Integer,Long> computedValues = new HashMap<Integer, Long>();
    //The two edge cases
    computedValues.put(1, 1L);
    computedValues.put(2, 1L);
        
    return fibonacci(n,computedValues);
}
    
private static long fibonacci(int n, Map<Integer, Long> computedValues) {
    if (computedValues.containsKey(n)) return computedValues.get(n);
        
    computedValues.put(n-1, fibonacci(n-1,computedValues));
    computedValues.put(n-2, fibonacci(n-2,computedValues));
        
    long newValue = computedValues.get(n-1) + computedValues.get(n-2);
    computedValues.put(n, newValue);
    return newValue;
}

This version is obviously a bit longer that the first one-liner but it is still pretty easy to understand. We now have 2 methods, the main public one that clients call with just the n parameter and a private one that makes the recursive calls. The first method is a useful place to initialize all the necessary information we need to call the second one. This is a pretty common pattern when working with recursive algorithms.
In this case we use a Map to hold the already computed results. We initialize this map with the 2 base cases in the first method and then call the second method with this map. Now, instead of always computing the value we first check if it already is on the map. If it is then we just return that value, otherwise we compute and store the fibonacci number for n-1 and n-2. Before returning their sum, we make sure to store the final value of n.

Notice that we are still following the same structure as the first method we saw. That is, we start from n and we compute smaller results as we need them in order to solve the original problem. That’s why this approach is called top-down. Later we will see a bottom-up approach and compare both. This technique of following a top-down approach and saving previously computed resulted is also known as memoization.

How much better is this version? Well, while the first version took almost a minute to compute the value for n = 50 and never ended for values higher than that, the second memoized version gives an instant answer for any n up to 7000. That is a huge improvement but, as usual, we can do better.

The problem with this new memoized version is that, even when we are saving the results to reuse them later, we still need to go all the way down to the base case with the recursion the first time (when we haven’t computed any values yet and so we don’t have anything stored). So, imagine that we call the method with n = 10000. Because we don’t have that result yet we call the method recursively with 9999, 9998, 9997,…,2. After we start coming back from the recursion, all the n-2 values that we need will be already there so that part is pretty fast.

As with any recursive algorithm, each recursive call takes up some space on the stack. And, if we have enough of those recursive calls, the stack will eventually blow up throwing a StackOverflowException. And this is precisely what happens with our second method when we use values over 10000.

So, what’s the alternative? I mentioned before that the memoized version followed a top-down approach. The obvious thing to do is to go in the opposite direction in a bottom-up fashion. Starting from the small values of n and building the results up to our goal. We’ll still save the already computed values to use them in later stages and avoid duplication of work. This solution looks something like:

public static long fibonacciDP(int n) {
    long[] results = new long[n+1];
    results[1] = 1;
    results[2] = 1;
    for (int i = 3; i <= n; i++) {
        results[i] = results[i-1] + results[i-2];
    }
    return results[n];
}

That actually looks simpler that our memoized version. We just create an array to hold the results, initialize it with the 2 base cases and then start iterating from 3 up to n. At each step, we use the 2 previously computed values to compute the current one. In the end, we return the correct value for n.

In terms of complexity and number of computations, this version is exactly the same as the second one. The difference here is that the last version is iterative and, therefore, doesn’t take space on the stack as the recursive one. We can now compute the fibonacci sequence up to n = 500000 or more and the response time is almost instantaneous.

But we are not entirely over yet, there’s one more thing we can improve. Even though we went from exponential to linear time complexity, we also increased the amount of space required. In the 2 last versions of the algorithm, the space required to store the previous solutions is proportional to n. This is probably pretty clear in our last method, were we create an array of length n. The bigger the n the more space we need.
But you can actually see that the only two values we ever need are the last two (n-1 and n-2). So we don’t really need to keep track of all the previous solutions, just the last two. We can modify the last method to do this:

public static long fibonacciDP(int n) {
    long n1 = 1;
    long n2 = 1;
    long current = 1;
    for (int i = 3; i <= n; i++) {
        current = n1 + n2;
        n2 = n1;
        n1 = current;
    }
    return current;
}

Here, we replaced the array of length n by just 3 variables: the current and the 2 previous values. And so, this last method has a linear time complexity and a constant space complexity, because the number of variables we need to declare doesn’t depend on the size of n.

And that’s as good as we can get with dynamic programming. There’s actually a logarithmic complexity algorithm but I won’t be discussing that one here.

So as we were able to see from these examples, there’s nothing mysterious or inherently hard about dynamic programming. It only requires you to analyze your initial solution, identify places where you are doing duplicated work and avoid that by storing already computed results. This very simple optimization allows you to go from an exponential solution that is inviable for most practical input values to a polynomial solution. Specifically, you want to look for 2 distinctive characteristics of problems that might be solved by dynamic programming: overlapping subproblems and optimal substructure.

Overlapping subproblems refers to the nature of problems like the fibonacci sequence that we saw here, where the main problem (computing fibonacci of n) can be solved by having the solutions to smaller sub-problems and the solutions of these smaller sub-problems are needed again and again. Is in those cases where it makes sense to store the results and re-use them later.
If, however, the sub-problems are completely independent and we only need their results once then it doesn’t make any sense to save them. An example of a problem that can be divided in sub-problems but where those sub-problems are not overlapping is Binary Search. Once we discard the half that we don’t care about, we never visit it again.

Optimal substructure is closely related and it basically means that the optimal solution can be constructed efficiently from optimal solutions of its subproblems. This property wasn’t so obvious in the fibonacci example because there’s only 1 solution for any given n. But this will be more evident in future examples involving optimization of some sort.

I’ll conclude this post with some hints that I use when trying to decide whether or not to use dynamic programming for a given problem:

  • Can the problem be divided into subproblems of the same kind?
  • Can I define the previous division by a recurrence definition? That is, define F(n) as a function of F(n-1)
  • Will I need the results to the sub-problems multiple times or just once?
  • Does it make more sense to use a top-down or a bottom-up approach?
  • Do I need to worry about the stack if I use a memoized recursive approach?
  • Do I need to keep all previous results or can I optimize the space and keep just some of them?

On the next posts I’ll address some classic and pretty interesting problems that can be efficiently solved using dynamic programming.

Cheers!