Configuration management with Archaius (from Netflix)

Posted on October 7, 2014 by jlordiales

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2014/10/07/configuration-management-with-archaius-from-netflix
So you usually need to read some configuration variables from your app. Maybe you read them from the system properties like:

String prop = System.getProperty(&quot;myProperty&quot;);
int x = DEFAULT_VALUE;
try {
    x = Integer.parseInt(prop);
} catch (NumberFormatException e) {
    // handle format issues
}
myMethod(x);

Or maybe you have a properties file that you read using Spring. Or maybe you have a simple key/value table in your DB with some properties that you read from there? Or you get them from an external REST endpoint? Or from other type of key/value store like Redis or Memcached?

Whatever the case might be your configuration variables might be coming from a lot of different sources and, specially if your app uses more than one, this can become difficult to maintain. Additionally, you don’t want to do a re-deploy every time you need to change the value of one of your properties, particularly for feature toogles.

Luckily, the Netflix folks already had these issues and came up with a solution that they were kind enough to open source. If you haven’t seen Netflix Github repository I strongly recommend that you take a look. They have some serious cool projects that could be just the thing your application needs. One of those projects is the one that concerns us today: Archaius.

Their wiki and examples should give you a very good idea of what Archaius is and what it is useful for. For now I’ll just say that Archaius is an extension of Apache’s Common Configuration library that allows you to retrieve properties from several dynamic sources and that it solves all the issues mentioned previously (heterogeneous sources of properties, run-time changes, etc.).

Rather than going into much details about what the tool is and does I’ll go with some examples instead.
What is the simplest possible example to use Archaius to read a property file? The well known “Hello World”:

public class ApplicationConfig {

    public String getStringProperty(String key, String defaultValue) {
        final DynamicStringProperty property = DynamicPropertyFactory.getInstance().getStringProperty(key,
            defaultValue);
        return property.get();
    }
}

public class ApplicationConfigTest {
    private ApplicationConfig appConfig = new ApplicationConfig();

    @Test
    public void shouldRetrieveThePropertyByKey() {
        String property = appConfig.getStringProperty(&quot;hello.world.message&quot;, &quot;default message&quot;);

        assertThat(property, is(&quot;Hello Archaius World!&quot;));
    }

    @Test
    public void shouldRetrieveDefaultValueWhenKeyIsNotPresent() {
        String property = appConfig.getStringProperty(&quot;some.key&quot;, &quot;default message&quot;);

        assertThat(property, is(&quot;default message&quot;));
    }
}

That code, together with a “config.properties” file somewhere in your classpath (src/main/resources by convention in Maven and Gradle for example).

Notice that you don’t need to tell Archaius where to find your properties file because the default name that he is going to be looking for is “config.properties”. This example doesn’t seem like a big improvement over what you would usually do with Spring or any other tool but remember that this is just a “Hello World” and bear with me to see more interesting use cases.

What if you don’t want or can’t name your property file “config.property”? In that case you need to tell Archaius where to look for this file. You can easily do this changing the system property “archaius.configurationSource.defaultFileName”, either passing it as a parameter to the vm when you start your application (java … -Darchaius.configurationSource.defaultFileName=customName.properties) or in the code itself:

public class ApplicationConfig {
    static {
        System.setProperty(&quot;archaius.configurationSource.defaultFileName&quot;, &quot;customConfig.properties&quot;);
    }

    public String getStringProperty(String key, String defaultValue) {
        final DynamicStringProperty property = DynamicPropertyFactory.getInstance().getStringProperty(key,
            defaultValue);
        return property.get();
    }
}

Simple enough, right?

Now, what if you want to read several properties files? You can easily define a chain of property files and the order in which they should be loaded starting from the default file which is loaded first. From there, you can specify a special property with key “@next=nextFile.properties” to tell Archaius which is the next file that should be loaded.
In our example so far, we could add the following line to our “customConfig.properties” file:
“@next=secondConfig.properties” and add the corresponding “secondConfig.properties” to our resources folder with the following content:
“cascade.property=cascade value”.

We can see this working by adding the following test to our ApplicationConfigTest class:

@Test
public void shouldReadCascadeConfigurationFiles() {
   String property = appConfig.getStringProperty(&quot;cascade.property&quot;, &quot;not found&quot;);

   assertThat(property, is(&quot;cascade value&quot;));
}

Note that we are getting the property from the new file without any additional change to our ApplicationConfig class. This is completely transparent from the point of view of our client.

Until now, we have been reading properties just from different property files but what if you want to read them from a different source?
In the most general case, you can code your own logic by implementing “com.netflix.config.PolledConfigurationSource”.
If that new source is a database that can be accessed through JDBC then Archaius already provides a “JDBCConfigurationSource” that you can use. You only need to tell him what query he should use to get the properties and which columns represent the property key and property value.

So our example would look like:

@Component
public class ApplicationConfig {
    static {
        System.setProperty(&quot;archaius.configurationSource.defaultFileName&quot;, &quot;customConfig.properties&quot;);
    }

    private final DataSource dataSource;

    @Autowired
    public ApplicationConfig(DataSource dataSource) {
        this.dataSource = dataSource;
        installJdbcSource();
    }

    public String getStringProperty(String key, String defaultValue) {
        final DynamicStringProperty property = DynamicPropertyFactory.getInstance().getStringProperty(key,
            defaultValue);
        return property.get();
    }

    private void installJdbcSource() {
        if (!isConfigurationInstalled()) {
            PolledConfigurationSource source = new JDBCConfigurationSource(dataSource,
                &quot;select distinct property_key, property_value from properties&quot;, &quot;property_key&quot;, &quot;property_value&quot;);
            DynamicConfiguration configuration = new DynamicConfiguration(source,
                new FixedDelayPollingScheduler(100, 1000, true));

            ConfigurationManager.install(configuration);
        }
    }
}

We are using Spring to autowire a data source that will use an in-memory H2 database with a simple key/value table. Note how we create a new PolledConfigurationSource using the JDBCConfigurationSource already provided by Archaius and then we register the new configuration using the ConfigurationManager. After doing this we can get any property from the DB exactly the same way we do it for properties files (i.e., using the DynamicPropertyFactory).

We can now add a couple of tests to make sure that we are actually reading properties from the DB and that we can update their values and see the changes reflected in our dynamic configuration.

    @Test
    public void shouldRetrievePropertyFromDB() {
        String property = appConfig.getStringProperty(&quot;db.property&quot;, &quot;default message&quot;);

        assertThat(property, is(&quot;this is a db property&quot;));
    }

    @Test
    public void shouldReadTheNewValueAfterTheSpecifiedDelay() throws InterruptedException {
        template.update(&quot;update properties set property_value = 'changed value' where property_key = 'db.property'&quot;);
        String property = appConfig.getStringProperty(&quot;db.property&quot;, &quot;default message&quot;);

        //We updated the value on the DB but Archaius polls for changes every 1000 milliseconds so it still sees the old value
        assertThat(property, is(&quot;this is a db property&quot;));

        Thread.sleep(1500);

        property = appConfig.getStringProperty(&quot;db.property&quot;, &quot;default message&quot;);
        assertThat(property, is(&quot;changed value&quot;));
    }

To conclude this post, another really cool feature offered by Archaius is the possibility to register our configurations as MBeans via JMX. We can do this by default setting the system property archaius.dynamicPropertyFactory.registerConfigWithJMX=true or programmatically with ConfigJMXManager.registerConfigMbean(config);.

After doing this we can connect via JConsole and not only get the value of all properties but also update them and see their new value reflected in Archaius. This would allow us, for instance, to change the values of properties defined statically in property files during runtime without the need for a server push.
We can modify our ApplicationConfig class a little bit to add a main method that will keep running printing the values of different properties, allowing us to play around in JConsole.

public class ApplicationConfig extends Thread {

    private final DataSource dataSource;

    @Autowired
    public ApplicationConfig(DataSource dataSource) {
        this.dataSource = dataSource;
        cascadeDefaultConfiguration();
        DynamicConfiguration jdbcSource = installJdbcSource();
        registerMBean(jdbcSource);
    }

    public String getStringProperty(String key, String defaultValue) {
        final DynamicStringProperty property = DynamicPropertyFactory.getInstance().getStringProperty(key,
            defaultValue);
        return property.get();
    }

    @Override
    public void run() {
        while (true) {
            try {
                sleep(100);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }

        }
    }

    private void registerMBean(DynamicConfiguration jdbcSource) {
        setDaemon(false);
        ConfigJMXManager.registerConfigMbean(jdbcSource);
    }

    public static void main(String[] args) {
        ApplicationContext applicationContext = new ClassPathXmlApplicationContext(&quot;archaiusContext.xml&quot;);
        ApplicationConfig applicationConfig = (ApplicationConfig) applicationContext.getBean(&quot;applicationConfig&quot;);

        applicationConfig.start();

        while (true) {
            try {
                System.out.println(applicationConfig.getStringProperty(&quot;hello.world.message&quot;, &quot;default message&quot;));
                System.out.println(applicationConfig.getStringProperty(&quot;cascade.property&quot;, &quot;default message&quot;));
                System.out.println(applicationConfig.getStringProperty(&quot;db.property&quot;, &quot;default message&quot;));
                sleep(3000);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }

        }
    }
}

And that’s it for now. You can find all the code showed in this post at https://github.com/jlordiales/archaius-example.

There’s a lot more you can do with Archaius, like Callbacks everytime a property changes, integration with Zookeeper and other services. Check out the Archaius project and examples at their Github repo.

Easy to understand Dynamic Programming – Edit distance

Posted on March 1, 2014 by jlordiales

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2014/03/01/dynamic-programming-edit-distance

Following the topic of the last post, I will discuss another problem that can be solved efficiently using dynamic programming. Unlike the Fibonacci sequence that we saw on the introduction post, the problem that I present here is a bit more tricky to solve (which also makes it more interesting).

The reason I’ll review this problem and other similar ones in future posts is not just to show how to solve them. Anyone with access to google can search for one of the 100 already implemented solutions. The idea here is two show some key concepts that you can apply to any dynamic problem.

So lets start with the well known Edit distance problem. Basically, given two strings A and B, the edit distance measures the minimum number of operations required to transform one string into the other. In the most common version of this problem we can apply 3 different operations:

Insert a new character into one of the strings
Delete an existing character
Replace one character by another

For example, given the strings A = “cat” and B = “cars”, editDistance(A,B) = 2 because the minimum number of transformations that we need to make is replace the “t” in A by “r” and then remove the “s” from B. After that, both strings are equal to “car”. Notice that in this case another possibility would have been to replace the “t” by an “r” but then insert an “s” into A, making both strings equal to “cars”. Similarly we could change both into “cat” or “cats” and the edit distance is still 2.

So how would we go about solving this problem? Lets try to solve a much smaller and simpler problem. Imagine we have the two previous strings again, this time represented as an array of chars in the general form:
A = [A0, A1, A2,…, An]
B = [B0, B1, B2,…, Bm]

Notice that the lengths of A and B (n and m respectively) might be different.

We know that in the end, both strings will need to have the same length and match their characters on each position. So, if we take just the first character of A and B, what choices do we have? We have 3 different transformations that we can apply so we have 3 choices as well:

We can replace the first character of A by the first character of B. The cost of doing this replacement will be 1 if the characters are different and 0 if they are equal (because in that case we don’t need to do anything). At this point we know that both strings start with the same character so we can compute the edit distance of A[1..n] = [A1,…, An] and B[1..m] = [B1,…, Bm]. The final result will be the cost of transforming A[0] into B[0] plus the edit distance of the remaining substrings.
The second choice we have is inserting a character into A to match the character in B[0], which has a cost of 1. After we have done this, we still need to consider the original A string because we have added a new character to it but we haven’t process any of its original characters. However, we don’t need to consider B[0] anymore because we know that it matches A[0] at this point. Therefore, the only thing we need to do now is to compute the edit distance of the original A and B[1..m] = [B1,…, Bm]. The final result will be this value plus 1 (the cost of inserting the character into A).
The last case is symmetrical to the previous one. The third transformation we can apply is simply to delete A[0]. This operation has a cost of 1 as well. After doing this, we have consumed one character of A but we still have all the original characters of B to process. So we simply need to compute the edit distance of A[1..n] = [A1,…, An] and B. Again, the final value of the edit distance will be this value plus 1 (the cost of deleting A[0].

So which of the three choices should we pick initially? Well, we don’t really know which one will be better in the end so we have to try them all. The answer to our original problem will be the minimum value of those three alternatives.

The previous description is the key to solving this problem so read it again if it is not clear. Notice that, the last step on each of the three alternatives involves computing the edit distance of 2 substrings of A and B. This is exactly the same problem as the original one we are trying to solve, i.e., solving the edit distance for 2 strings. The difference is that after each decision we take, the problem becomes smaller because one or both input strings become smaller. We are solving the original problem by solving smaller sub-problems of exactly the same type. The fact that the sub-problems become smaller on each step is crucial to be sure that we’ll terminate the algorithm at some point. Otherwise, we could keep going solving sub-problems indefinitely.

So when do we stop solving sub-problems? We can stop as soon as we get to a case which is trivial to solve: a base case. In our case that is when one or both input strings are empty. If both strings are empty then the edit distance between them is 0, because they are already equal. Otherwise, the edit distance will be the length of the string that is not empty. For example, to transform “aab” into “” we can either remove the 3 characters from the first string or insert them into the second string. In any case, the edit distance is 3.

What we have done so far is defining the solution to our problem in terms of the solution to a smaller sub-problem of the same type and identified the base case when we already know what the result is. This should be a clear hint that a recursive algorithm can be applied.
But first, lets translate the three choices discussed previously describing the relationship between sub-problems into something that will be more helpful when trying to code this. The recurrence relation can be defined as:

$editDistance(A,B) = min =\begin{cases} editDistance(A[1..n],B[1..m]) + replaceCost(A[0],B[0])\\editDistance(A[1..n],B[0..m]) + 1\\editDistance(A[0..n],B[1..m]) +1\end{cases}$

The first case of the previous formula is when we replace A[0] by B[0], the second one is when we delete A[0] and the last one is when we insert into A.
The base cases are:
$editDistance(A,"") = length(A)$
$editDistance("",B) = length(B)$

Notice that the case where both strings are empty is already covered by the first base case because length(A) == length(“”) == 0.

Translating this definition into a recursive algorithm is relatively straightforward:

public class BruteForceEditDistance implements EditDistance {

    public int editDistance(String word1, String word2) {
        if (word1.isEmpty()) return word2.length();
        if (word2.isEmpty()) return word1.length();

        int replace = editDistance(word1.substring(1), word2.substring(1)) + Util.replaceCost(word1, word2, 0, 0);
        int delete = editDistance(word1.substring(1), word2) + 1;
        int insert = editDistance(word1, word2.substring(1)) + 1;
        return Util.min(replace, delete, insert);
    }
}

The Util class has some useful methods that we’ll use on the different implementations of the algorithm and it looks something like this:

public class Util {
    /**
     * Prevent instantiation
     */
    private Util(){}

    public static int replaceCost(String w1, String w2, int w1Index, int w2Index) {
        return (w1.charAt(w1Index) == w2.charAt(w2Index)) ? 0 : 1;
    }

    public static int min(int... numbers) {
        int result = Integer.MAX_VALUE;
        for (int each : numbers) {
            result = Math.min(result, each);
        }
        return result;
    }
}

The code looks concise and it gives the expected answer. But as you can probably imagine, given that we are talking about dynamic programming, this brute force approach is far from efficient. As with the fibonacci example that we saw on the last post, this algorithm computes the same answer multiple times causing an exponential explosion of different paths that we need to explore.
To see this, lets consider the sequence of calls that are made when we invoke this method with word1 = “ABD” and word2 = “AE”:

Edit distance example

And that is only for two strings of length 3 and 2. Imagine what that picture looks like when you have proper words or even sentences. In my laptop, for any two strings with 10 or more characters the method never finishes. This approach clearly won’t scale.

So, can we apply dynamic programming to this problem? Remember the two basic properties of a dynamic problem that we discussed in the previous post: overlapping sub-problems and optimal substructure.
As we just saw on the example of the previous figure, the edit distance problem clearly has overlapping sub-problems because we are solving smaller sub-problems of the same type to get to the final solution and we need to solve the same sub-problems multiple times.
What about optimal substructure? Can we compute the optimal solution if we have optimal solutions for the sub-problems? Of course! In our case we are trying to minimize the number of transformations, so if we have the optimal solution to the three cases we consider (replace, insert and delete) then we get the minimum from those 3 and that’s our optimal solution.

Remember from the previous post that we could have a top-down dynamic programming approach where we memoize the recursive implementation or a bottom-up approach. The latter tends to be more efficient because you avoid the recursive calls. More importantly, the choice between these two can be the difference between a working and a non-working algorithm if the number of recursive calls you need to make to get to the base case is too large. If this is not a problem, then the choice of which one to use usually depends on personal preference or style.
Here’s a possible top-down algorithm first:

public class DPMemoizedEditDistance implements EditDistance {

    public int editDistance(String word1, String word2) {
        return editDistance(word1, word2, new HashMap<StringTuple, Integer>());
    }
    
    private int editDistance(String word1, String word2, Map<StringTuple, Integer> computedSolutions) {
        if (word1.isEmpty()) return word2.length();
        if (word2.isEmpty()) return word1.length();

        StringTuple replaceTuple = new StringTuple(word1.substring(1), word2.substring(1));
        StringTuple deleteTuple = new StringTuple(word1.substring(1), word2);
        StringTuple insertTuple = new StringTuple(word1, word2.substring(1));

        int replace = Util.replaceCost(word1, word2, 0, 0) + transformationCost(replaceTuple, computedSolutions);
        int delete = 1 + transformationCost(deleteTuple, computedSolutions);
        int insert = 1 + transformationCost(insertTuple, computedSolutions);

        int minEditDistance = Util.min(replace, delete, insert);
        computedSolutions.put(new StringTuple(word1, word2), minEditDistance);
        return minEditDistance;
    }
    
    private int transformationCost(StringTuple tuple, Map<StringTuple, Integer> solutions) {
       if (solutions.containsKey(tuple)) return solutions.get(tuple);
       
       int result = editDistance(tuple.s1, tuple.s2, solutions);
       solutions.put(tuple, result);
       return result;
    }
    
    /**
     * Helper class to save previous solutions
     *
     */
    private class StringTuple {
        private final String s1;
        private final String s2;
        public StringTuple(String s1, String s2) {
            this.s1 = s1;
            this.s2 = s2;
        }
        
        @Override
        public int hashCode() {
            return HashCodeBuilder.reflectionHashCode(this);
        }
        
        @Override
        public boolean equals(Object obj) {
            return EqualsBuilder.reflectionEquals(this,obj);
        }
    }
}

Lets see what is going on here. First of all, we are using a Map to store the computed solutions. Since the input to our method are two Strings, we created an auxiliary class with the two Strings that is going to serve as the key to the Map. Our public method has the same interface as before, receiving the 2 inputs and returning the distance. In this memoized version we create the Map here and delegate the calculation to the private method that will make the necessary recursive calls. This private method will use the computedSolutions Map to avoid doing duplicated work. The rest of the algorithm works exactly as the brute force approach we saw before. On each step, we compute (or get the result if it was already computed) the edit distance for the three different possibilities: replace, delete and insert. Now after computing these distances, we save them, take the minimum of them, save the result for the original input and return that.

This algorithm is a huge improvement over the naive recursive one we saw before. Just to give an example, the same invocation that never ended before now takes 0.075 seconds to complete. The good thing about this approach is that we are able to reuse the recursive method that we already had before with some minor modifications. The bad part is that we are doing a lot of comparisons and manipulations of Strings and this tends to be slow.

Since the two strings that we receive as input might be large, lets try to use a bottom-up approach:

public class DPBottomUpEditDistance implements EditDistance {

    public int editDistance(String word1, String word2) {
        if (word1.isEmpty()) return word2.length();
        if (word2.isEmpty()) return word1.length();

        int word1Length = word1.length();
        int word2Length = word2.length();
        
        //minCosts[i][j] represents the edit distance of the substrings
        //word1.substring(i) and word2.substring(j)
        int[][] minCosts = new int[word1Length][word2Length];

        //This is the edit distance of the last char of word1 and the last char of word2
        //It can be 0 or 1 depending on whether the two are different or equal
        minCosts[word1Length - 1][word2Length - 1] = Util.replaceCost(word1, word2, word1Length - 1, word2Length - 1);
        
        for (int j = word2Length - 2; j >= 0; j--) {
            minCosts[word1Length - 1][j] = 1 + minCosts[word1Length - 1][j + 1];
        }

        for (int i = word1Length - 2; i >= 0; i--) {
            minCosts[i][word2Length - 1] = 1 + minCosts[i + 1][word2Length - 1];
        }

        for (int i = word1Length - 2; i >= 0; i--) {
            for (int j = word2Length - 2; j >= 0; j--) {
                int replace = Util.replaceCost(word1, word2, i, j) + minCosts[i + 1][j + 1];
                int delete = 1 + minCosts[i + 1][j];
                int insert = 1 + minCosts[i][j + 1];
                minCosts[i][j] = Util.min(replace, delete, insert);
            }
        }
        return minCosts[0][0];
    }
}

Here we create a matrix to hold the values of the edit distances of the different substrings. Instead of keeping references to all the different substrings, like we did on the memoized version, we just keep 2 indices. So minCosts[i][j] is the value for the edit distance between word1[i..n] and word2[j..m]. Given this structure, what is the smallest problem for which the solution is trivial? The one that considers the two last characters of each String: if both characters are equal then their edit distance is 0, otherwise is 1.
Lets follow the algorithm through the original example of “cat” and “cars” to better understand how it works. Suppose word1 = [c, a, t] and word2 = [c, a, r, s]. Then, our matrix will have a size of 3×4 with all places initially set to 0:

$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

Next we compare the two last characters of both Strings, so “t” and “s”. Since they are different we update minCosts[2][3] with 1:

$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Once we have that value, we can calculate all the other values for the last row and last column. For instance, what does minCosts[2][2] mean? According to our definition is the edit distance between word1[2..2] and word2[2..3], which in our case means the edit distance between “t” and “rs”. But since we already know that the edit distance between “t” and “s” is 1 (because we can look for that value on the matrix) any extra letter we add to the second string while leaving the first one fixed can only increase the distance by 1. So minCosts[2][2] is equal to minCosts[2][3] + 1, minCosts[2][1] is equal to minCosts[2][2] + 1 and minCosts[2][0] is equal to minCosts[2][1] + 1.
The same reasoning applies if we leave the column fixed and move up through the rows. After this two initial loops our matrix looks like this:

$\begin{bmatrix} 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 2 \\ 4 & 3 & 2 & 1 \end{bmatrix}$

Now we can easily fill in our matrix by following the recurrence formula we defined in the beginning. For each cell we will need the value of the cell to its right, the cell directly below and the cell on its right diagonal. So we can traverse the matrix from bottom to top and from right to left. Applying the recurrence formula to minCosts[1][2] for example, we get that its value is 2. With this value we can calculate minCosts[1][1], minCosts[1][0] and the values for the first row.
Our final matrix is:

$\begin{bmatrix} 2 & 3 & 3 & 3 \\ 3 & 2 & 2 & 2 \\ 4 & 3 & 2 & 1 \end{bmatrix}$

So now that we have all our matrix filled up, what is the answer to our original problem? Remember once again that minCosts[i][j] is the value for the edit distance between word1.substring(i) and word2.substring(j). Therefore, since word1.substring(0) == word1, our final answer is the value sitting at minCosts[0][0].

What are the advantages of this bottom-up approach against the memoized version? First, we don’t need the recursive calls. Our implementation is a completely iterative method that just traverses a matrix. Second, we don’t need to keep track of all the possible substrings and operate on them. We simply use the indices of the matrix to represent the substrings which is considerably faster.

To conclude this post, lets recap what we did in order to solve this problem:

We identified that we could solve the original problem by splitting it into smaller sub-problems of the same type
We assumed that we somehow knew what the answers to the sub-problems were and thought about how we would use those answers to get to the answer for the original problem
From the previous step we defined a general recurrence formula that represented the relationships between the different sub-problems. This recurrence formulation is usually the most important part of solving a dynamic programming formula. Once we have it, translating that into the algorithm is usually straightforward
We implemented a naive recursive solution and we identified that we were solving the same sub-problems over and over again
We modified the previous solution to save and reuse the results by using a memoized recursive method
We identified that the recursive calls could be an issue for long strings and so we developed a bottom-up approach. In this version we defined a matrix to hold the results and we figured in which way we needed to fill it by using the exact same recurrence formula defined before

You can find all the implementations we saw here, together with automated tests for each one showing the differences in execution time on https://github.com/jlordiales/edit-distance

In the next posts we’ll see that this same steps can be applied with minor modifications to a huge amount of different problems. Once you get familiar with the basic tips and tricks of dynamic programming most problems are quite similar.

Cheers!!!

Dynamic Programming – Introduction

Posted on February 20, 2014 by jlordiales

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2014/02/20/dynamic-programming-introduction

Wow, it’s been a while since I’ve written anything here. Between changing jobs, working on my PhD and moving to a new country I guess you could say I’ve been pretty busy.

But at the same time, together with all these changes in my life there’s a ton of new things that I’m learning almost every day. And, as many others, I feel that writing about them in a way that is easy for others to understand is one of the best ways to consolidate my own learning. So I’ll try to write here more often from now on.

So with that covered lets move to the important stuff. For some reason, lately I’ve been very interested in refreshing old concepts from algorithms and data structures that were a little rusty. Together with these old concepts also came some new concepts and techniques which I’ve never heard about before. But I will address them on separate posts.

On this post I wanted to talk about Dynamic Programming. If you ever took an algorithms university course then you have probably heard about it. If you haven’t, here’s a good chance to learn something extremely useful, easy and intuitive (at least when you understand it properly).

So what exactly is Dynamic Programming? I won’t go into much theory because you can already find all that if you do a simple google search. I will define it as “smart recursion” and, as usual I’ll clarify this with an example.

The classic example to explain dynamic programming is the fibonacci computation, so I’ll also go with that. The definition of the fibonacci number of a number is clearly a recursive one:
F(n) = F(n-1) + F(n-2) and F(1) = F(2) = 1
This means that the sequence of the first 10 fibonacci numbers would go:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55

You might also find it defined as:
F(0) = F(1) = 1
And so the sequence would be:
0,1, 1, 2, 3, 5, 8, 13, 21, 34, 55

For the purposes of this post this difference is irrelevant but I’ll stick to the first one.

Now, this recursive definition translates naturally and quite neatly into the following recursive method:

public static long fibonacci(int n) {
    if (n < 3) return 1;
    return fibonacci(n-2) + fibonacci(n-1);
}

You can even make that a one liner:

public static long fibonacci(int n) {
    return (n < 3) ? 1 : fibonacci(n-2) + fibonacci(n-1);
}

Now that method works and it is certainly elegant but, what happens with the execution time as you start increasing the parameter n. Well, in my laptop it returns almost immediately for any value between 0 and 30. For an n of 40 it takes a bit longer: 0.5 seconds. But for an n equal to 50, it takes almost a full minute to compute the correct value.
You might think that a full minute is not a lot, but any value greater that 70 kills the application for me and any value between 50 and 70 takes too much to finish. How much? I don’t really know because I don’t have the patience to wait and see, but it is certainly more than 30 minutes.

So what is wrong with this method? Well, I haven’t talked about algorithms time and space complexity yet (I probably will in another post) so for now I’ll just say that the execution time of the algorithm grows exponentially as n increases. That’s why there’s such a big difference in time when you execute that method with n = 40 and with n = 50, because there’s a huge difference between 2^40 and 2^50.

The reason why this algorithm behaves like that is also rather easy to see by just following its execution stack for any value of n. Let’s do that for n = 6 to keep it short. The following image shows the sequence of calls that get made.

Looking at the code, we can clearly see that to compute the value for 6, we first compute the values for 5 and 4. But, similarly, to compute the value of 5 we need the values for 4 and 3 and to compute the values for 4 we need the values for 3 and 2. Once we get to 2 we can end the recursion because we know the result (which is 1).
And here is the problem with this method, notice how many times we are calling fibonacci(4) and how many times we are calling fibonacci(3). This is completely duplicated work that we are doing. Why calculate the same results over and over again if they are never going to change? Once we calculated fibonacci(3) or fibonacci(4) for the first time, we can save that result and reuse it whenever we need to.

And this is exactly what I meant by smart recursion. You can have the natural and simple recursive solution but you need to identify these situations where you are repeating work and avoid them. For n = 6 it wasn’t such a big deal, but as n grows the amount of duplicated work also grows exponentially until it renders the application useless.

So how can we go about improving this? We only need to store previously computed values and we can use any structure that we want for that. In this case, I’ll just use a map:

public static long fibonacci(int n) {
    if (n < 3) return 1;
        
    //Map to store the previous results
    Map<Integer,Long> computedValues = new HashMap<Integer, Long>();
    //The two edge cases
    computedValues.put(1, 1L);
    computedValues.put(2, 1L);
        
    return fibonacci(n,computedValues);
}
    
private static long fibonacci(int n, Map<Integer, Long> computedValues) {
    if (computedValues.containsKey(n)) return computedValues.get(n);
        
    computedValues.put(n-1, fibonacci(n-1,computedValues));
    computedValues.put(n-2, fibonacci(n-2,computedValues));
        
    long newValue = computedValues.get(n-1) + computedValues.get(n-2);
    computedValues.put(n, newValue);
    return newValue;
}

This version is obviously a bit longer that the first one-liner but it is still pretty easy to understand. We now have 2 methods, the main public one that clients call with just the n parameter and a private one that makes the recursive calls. The first method is a useful place to initialize all the necessary information we need to call the second one. This is a pretty common pattern when working with recursive algorithms.
In this case we use a Map to hold the already computed results. We initialize this map with the 2 base cases in the first method and then call the second method with this map. Now, instead of always computing the value we first check if it already is on the map. If it is then we just return that value, otherwise we compute and store the fibonacci number for n-1 and n-2. Before returning their sum, we make sure to store the final value of n.

Notice that we are still following the same structure as the first method we saw. That is, we start from n and we compute smaller results as we need them in order to solve the original problem. That’s why this approach is called top-down. Later we will see a bottom-up approach and compare both. This technique of following a top-down approach and saving previously computed resulted is also known as memoization.

How much better is this version? Well, while the first version took almost a minute to compute the value for n = 50 and never ended for values higher than that, the second memoized version gives an instant answer for any n up to 7000. That is a huge improvement but, as usual, we can do better.

The problem with this new memoized version is that, even when we are saving the results to reuse them later, we still need to go all the way down to the base case with the recursion the first time (when we haven’t computed any values yet and so we don’t have anything stored). So, imagine that we call the method with n = 10000. Because we don’t have that result yet we call the method recursively with 9999, 9998, 9997,…,2. After we start coming back from the recursion, all the n-2 values that we need will be already there so that part is pretty fast.

As with any recursive algorithm, each recursive call takes up some space on the stack. And, if we have enough of those recursive calls, the stack will eventually blow up throwing a StackOverflowException. And this is precisely what happens with our second method when we use values over 10000.

So, what’s the alternative? I mentioned before that the memoized version followed a top-down approach. The obvious thing to do is to go in the opposite direction in a bottom-up fashion. Starting from the small values of n and building the results up to our goal. We’ll still save the already computed values to use them in later stages and avoid duplication of work. This solution looks something like:

public static long fibonacciDP(int n) {
    long[] results = new long[n+1];
    results[1] = 1;
    results[2] = 1;
    for (int i = 3; i <= n; i++) {
        results[i] = results[i-1] + results[i-2];
    }
    return results[n];
}

That actually looks simpler that our memoized version. We just create an array to hold the results, initialize it with the 2 base cases and then start iterating from 3 up to n. At each step, we use the 2 previously computed values to compute the current one. In the end, we return the correct value for n.

In terms of complexity and number of computations, this version is exactly the same as the second one. The difference here is that the last version is iterative and, therefore, doesn’t take space on the stack as the recursive one. We can now compute the fibonacci sequence up to n = 500000 or more and the response time is almost instantaneous.

But we are not entirely over yet, there’s one more thing we can improve. Even though we went from exponential to linear time complexity, we also increased the amount of space required. In the 2 last versions of the algorithm, the space required to store the previous solutions is proportional to n. This is probably pretty clear in our last method, were we create an array of length n. The bigger the n the more space we need.
But you can actually see that the only two values we ever need are the last two (n-1 and n-2). So we don’t really need to keep track of all the previous solutions, just the last two. We can modify the last method to do this:

public static long fibonacciDP(int n) {
    long n1 = 1;
    long n2 = 1;
    long current = 1;
    for (int i = 3; i <= n; i++) {
        current = n1 + n2;
        n2 = n1;
        n1 = current;
    }
    return current;
}

Here, we replaced the array of length n by just 3 variables: the current and the 2 previous values. And so, this last method has a linear time complexity and a constant space complexity, because the number of variables we need to declare doesn’t depend on the size of n.

And that’s as good as we can get with dynamic programming. There’s actually a logarithmic complexity algorithm but I won’t be discussing that one here.

So as we were able to see from these examples, there’s nothing mysterious or inherently hard about dynamic programming. It only requires you to analyze your initial solution, identify places where you are doing duplicated work and avoid that by storing already computed results. This very simple optimization allows you to go from an exponential solution that is inviable for most practical input values to a polynomial solution. Specifically, you want to look for 2 distinctive characteristics of problems that might be solved by dynamic programming: overlapping subproblems and optimal substructure.

Overlapping subproblems refers to the nature of problems like the fibonacci sequence that we saw here, where the main problem (computing fibonacci of n) can be solved by having the solutions to smaller sub-problems and the solutions of these smaller sub-problems are needed again and again. Is in those cases where it makes sense to store the results and re-use them later.
If, however, the sub-problems are completely independent and we only need their results once then it doesn’t make any sense to save them. An example of a problem that can be divided in sub-problems but where those sub-problems are not overlapping is Binary Search. Once we discard the half that we don’t care about, we never visit it again.

Optimal substructure is closely related and it basically means that the optimal solution can be constructed efficiently from optimal solutions of its subproblems. This property wasn’t so obvious in the fibonacci example because there’s only 1 solution for any given n. But this will be more evident in future examples involving optimization of some sort.

I’ll conclude this post with some hints that I use when trying to decide whether or not to use dynamic programming for a given problem:

Can the problem be divided into subproblems of the same kind?
Can I define the previous division by a recurrence definition? That is, define F(n) as a function of F(n-1)
Will I need the results to the sub-problems multiple times or just once?
Does it make more sense to use a top-down or a bottom-up approach?
Do I need to worry about the stack if I use a memoized recursive approach?
Do I need to keep all previous results or can I optimize the space and keep just some of them?

On the next posts I’ll address some classic and pretty interesting problems that can be efficiently solved using dynamic programming.

Cheers!

When git ignores your… .gitignore?

Posted on December 28, 2012 by jlordiales

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2012/12/28/when-git-ignores-your-gitignore

I feel like I should start this post saying that I absolutely love git. If you’ve never heard of it, is a source control system like CVS or Subversion but, unlike those two, is a distributed version control system. I’m not going to get into much details about the history and capabilities of git but if you’re curious about it you can go to http://git-scm.com/book which is an amazing resource with everything from intro to advanced concepts.

I imagine that by now most professional software developers use some form of version control at their daily jobs but you shouldn’t stop there. I use git for personal, one-man projects as well. While some people might think that this could be an overkill, I completely disagree. There is nothing like the comfort in knowing that all the history of all your files is safe and ready to be brought back if you ever need them. We all make mistakes sometimes after all. With git this is as easy as writing 3 simple commands:

mkdir myNewProject
cd myNewProject
git init

That’s it! Every file you create or modify on “myNewProject” will be now tracked by git.

A pretty useful feature that you get with every source control tool is the possibility to ignore certain files from the tool’s tracking. Generally speaking, you don’t want to get into your code repository any file that can be computed as a result of another file. In a typical Java Maven project this would be for example the “target” directory, or if you are using Eclipse the “.metadata”, “.project” or “.settings” files. In git the easiest way to do this is to have a special file named “.gitignore” at the root of your project with all the exclusion rules you want to set. The syntax of this file is fairly straightforward. You can also have a “.gitignore” file for each subdirectory in your project, but this is less common.

A tricky thing with git and ignore rules is that if the file you want to ignore is already being tracked by git then adding it to “.gitignore” won’t make git to automatically forget about the file. To illustrate this point, consider the following example. First we create a repository with two initial files and commit them:

mkdir gitExample
cd gitExample
touch file1 file2
git init
git add .
git commit -m "Initial commit"

Let’s create now the .gitignore file to try to ignore “file2” and commit that:

echo file2 > .gitignore
git add .
git commit -m "Added gitignore for file2"

Now, let’s modify “file2” and see what happens:

echo "Hello World" >> file2
git status

We get:
# On branch master # Changes not staged for commit: # (use "git add ..." to update what will be committed) # (use "git checkout -- ..." to discard changes in working directory) # # modified: file2 # no changes added to commit (use "git add" and/or "git commit -a")

Git is effectively still tracking file2 even though is already on our .gitignore. Like I said before, this happens because git was already tracking the file when we added it to our ignores. So let’s see what happens when we add the “.gitignore” before adding the file to git:

mkdir gitExample
cd gitExample
touch file1 file2
git init
echo "file2" > .gitignore
git status

And now we get:
# On branch master # # Initial commit # # Untracked files: # (use "git add ..." to include in what will be committed) # # .gitignore # file1 nothing added to commit but untracked files present (use "git add" to track)

Cool! No mention of file2 anywhere! But if, as our first example, we forgot to add the files to our .gitignore initially? How do we stop git from tracking them? A nice command we can use for this cases is git rm --cached file. In our first example:

git rm --cached file2
git commit -m "Removed file2"

If we now modify the file again and do a git status we get:

echo "Hello World Again" >> file2
git status

# On branch master nothing to commit (working directory clean)

Exactly what we wanted!

Note that this little command will remove the file from git index but it won’t do anything with your working copy. That means that you will still have the file on your directory but the file won’t be a part of the git repository anymore. This also implies that the next time you do a push to a remote repository the file won’t be pushed and, if it already existed on the remote repository, it will be deleted.

This is fine for a typical use case when you added a file that you never intended to have on the repository. But consider this other use case. In a lot of projects the developers upload into the remote central repository a set of config files for their IDE with default values for things like code formatting and style checking. But at the same time, when developers clone the repository they customize those config files to suit their personal preferences. However, those changes that apply to each particular developer should not be commited and pushed back to the repository. The problem with using git rm --cached here is that, while each developer will still have their own copy, the next time they push to the server they’ll remove the default config from there. In cases like this, there is another pretty useful command that will do the trick: git update-index --assume-unchanged <file>. Let’s see that with an example:

mkdir gitExample
cd gitExample
touch file1 file2
git init
git add .
git commit -m "Initial commit"

There we have our default “file2”. Now let’s use the git update-index command and make some changes to the file:

git update-index --assume-unchanged file2
echo "Hello World" >> file2
git status

The result:
# On branch master nothing to commit (working directory clean)

Magic! Changes to our file are no longer seen by git and the original file is still on the repository to be cloned by other users with its default value.

Hope this helps someone! Cheers!

Static factory methods vs traditional constructors

Posted on December 26, 2012 by jlordiales

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2012/12/26/static-factory-methods-vs-traditional-constructors

I’ve previously talked a little bit about the Builder Pattern, a useful pattern to instantiate classes with several (possibly optional) attributes that results in easier to read, write and maintain client code, among other benefits. Today, I’m going to continue exploring object creation techniques but this time for a more general case.

Take the following example, which is by no means a useful class other than to make my point. We have a RandomIntGenerator class that, as the name suggests, generates random int numbers. Something like:

public class RandomIntGenerator {
    private final int min;
    private final int max;

    public int next() {...}
}

Our generator takes a minimum and maximum and then generates random numbers between those 2 values. Notice that the two attributes are declared final so we have to initialize them either on their declaration or in the class constructor. Let’s go with the constructor:

    public RandomIntGenerator(int min, int max) {
        this.min = min;
        this.max = max;
    }

Now, we also want to give our clients the possibility to specify just a minimum value and then generate random values between that minimum and the max possible value for ints. So we add a second constructor:

    public RandomIntGenerator(int min) {
        this.min = min;
        this.max = Integer.MAX_VALUE;
    }

So far so good, right? But in the same way that we provided a constructor to just specify the minimum value, we want to do the same for just the maximum. We’ll just add a third constructor like:

    public RandomIntGenerator(int max) {
        this.min = Integer.MIN_VALUE;
        this.max = max;
    }

If you try that, you’ll get a compilation error that goes: Duplicate method RandomIntGenerator(int) in type RandomIntGenerator. What’s wrong?

The problem is that constructors, by definition, have no names. As such, a class can only have one constructor with a given signature in the same way that you can’t have two methods with the same signature (same return type, name and parameters type). That is why when we tried to add the RandomIntGenerator(int max) constructor we got that compilation error, because we already had the RandomIntGenerator(int min) one.

Is there something we can do in cases like this one? Not with constructors but fortunately there’s something else we can use: static factory methods, which are simply public static methods that return an instance of the class. You’ve probably used this technique without even realizing it. Have you ever used Boolean.valueOf? It looks something like:

    public static Boolean valueOf(boolean b) {
        return (b ? TRUE : FALSE);
    }

Applying static factories to our RandomIntGenerator example, we could get:

public class RandomIntGenerator {
    private final int min;
    private final int max;

    private RandomIntGenerator(int min, int max) {
        this.min = min;
        this.max = max;
    }
    
    public static RandomIntGenerator between(int max, int min) {
        return new RandomIntGenerator(min, max);
    }
    
    public static RandomIntGenerator biggerThan(int min) {
        return new RandomIntGenerator(min, Integer.MAX_VALUE);
    }
    
    public static RandomIntGenerator smallerThan(int max) {
        return new RandomIntGenerator(Integer.MIN_VALUE, max);
    }

    public int next() {...}
}

Note how the constructor was made private to ensure that the class is only instantiated through its public static factory methods. Also note how your intent is clearly expressed when you have a client with RandomIntGenerator.between(10,20) instead of new RandomIntGenerator(10,20)

It’s worth mentioning that this technique is not the same as the Factory method Design Pattern from the Gang of Four. Any class can provide static factory methods instead of, or in addition to, constructors. So what are the advantages and disadvantages of this technique?

We already mentioned the first advantage of static factory methods: unlike constructors they have names. This has two direct consequences,

We can provide a meaningful name for our constructors.
We can provide several constructors with the same number and type of parameters, something that as we saw earlier we can’t do with class constructors.

Another advantage of static factories is that, unlike constructors, they are not required to return a new object every time they are invoked. This is extremely useful when working with immutable classes to provide constant objects for common used values and avoid creating unnecessary duplicate objects. The Boolean.valueOf code that I showed previously illustrates this point perfectly. Notice that this static method returns either TRUE or FALSE, both immutable Boolean objects.

A third advantage of static factory methods is that they can return an object of any subtype of their return type. This gives you the possibility to change the return type freely without affecting clients. Moreover, you can hide implementation classes and have an interface-based API, which is usually a really good idea. But I think this can be better seen by an example.

Remember the RandomIntGenerator at the beginning of this post? Let’s make that a little bit more complicated. Imagine that we now want to provide random generators not just for integers but for other data-types like String, Double or Long. They are all going to have a next() method that returns a random object of a particular type, so we could start with an interface like:

public interface RandomGenerator<T> {
    T next();
}

Our first implementation of the RandomIntGenerator now becomes:

class RandomIntGenerator implements RandomGenerator<Integer> {
    private final int min;
    private final int max;

    RandomIntGenerator(int min, int max) {
        this.min = min;
        this.max = max;
    }

    public Integer next() {...}
}

We could also have a String generator:

class RandomStringGenerator implements RandomGenerator<String> {
    private final String prefix;

    RandomStringGenerator(String prefix) {
        this.prefix = prefix;
    }

    public String next() {...}
}

Notice how all the classes are declared package-private (default scope) and so are their constructors. This means that no client outside of their package can create instances of these generators. So what do we do? Tip: It starts with “static” and ends with “methods”.
Consider the following class:

public final class RandomGenerators {
    // Suppresses default constructor, ensuring non-instantiability.
    private RandomGenerators() {}
    
    public static final RandomGenerator<Integer> getIntGenerator() {
        return new RandomIntGenerator(Integer.MIN_VALUE, Integer.MAX_VALUE);
    }
    
    public static final RandomGenerator<String> getStringGenerator() {
        return new RandomStringGenerator("");
    }
}

RandomGenerators is just a noninstantiable utility class with nothing else than static factory methods. Being on the same package as the different generators this class can effectively access and instantiate those classes. But here comes the interesting part. Note that the methods only return the RandomGenerator interface, and that’s all the clients need really. If they get a RandomGenerator<Integer> they know that they can call next() and get a random integer.
Imagine that next month we code a super efficient new integer generator. Provided that this new class implements RandomGenerator<Integer> we can change the return type of the static factory method and all clients are now magically using the new implementation without them even noticing the change.

Classes like RandomGenerators are quite common both on the JDK and on third party libraries. You can see examples in Collections (in java.util), Lists, Sets or Maps in Guava. The naming convention is usually the same: if you have an interface named Type you put your static factory methods in a noninstantiable class named Types.

A final advantage of static factories is that they make instantiating parameterized classes a lot less verbose. Have you ever had to write code like this?

Map<String, List<String>> map = new HashMap<String, List<String>>();

You are repeating the same parameters twice on the same line of code. Wouldn’t it be nice if the right side of the assign could be inferred from the left side? Well, with static factories it can. The following code is taken from Guava’s Maps class:

  public static <K, V> HashMap<K, V> newHashMap() {
    return new HashMap<K, V>();
  }

So now our client code becomes:

Map<String, List<String>> map = Maps.newHashMap();

Pretty nice, isn’t it? This capability is known as Type inference. It’s worth mentioning that Java 7 introduced type inference through the use of the diamond operator. So if you’re using Java 7 you can write the previous example as:

Map<String, List<String>> map = new HashMap<>();

The main disadvantage of static factories is that classes without public or protected constructors cannot be extended. But this might be actually a good thing in some cases because it encourages developers to favor composition over inheritance.

To summarize, static factory methods provide a lot of benefits and just one drawback that might actually not be a problem when you think about it. Therefore, resist the urge to automatically provide public constructors and evaluate if static factories are a better fit for your class.

The ins and outs of immutability

Posted on December 24, 2012 by jlordiales

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2012/12/24/the-ins-and-outs-of-immutability

So in my first post I talked a little bit about the builder pattern and I mentioned a really powerful but yet overlooked concept: immutability.

What is an immutable class? It’s simply a class whose instances can’t be modified. Every value for the class’ attributes is set on their declaration or in its constructor and they keep those values for the rest of the object’s life-cycle. Java has quite a few immutable classes, such as String, all the boxed primitives (Double, Integer, Float, etc), BigInteger and BigDecimal among others. There is a good reason for this: immutable classes are easier to design, implement and use than mutable classes. Once they are instantiated they can only be in one state so they are less error prone and, as we’ll see later in this post, they are more secure.

How do you ensure that a class is immutable? Just follow these 5 simple steps:

Don’t provide any public methods that modify the object’s state, also known as mutators (such as setters).
Prevent the class from being extended. This doesn’t allow any malicious or careless class to extend our class and compromise its immutable behavior. The usual and easier way to do this is to mark the class as final, but there’s another way that I’ll mention in this post.
Make all fields final. This is a way to let the compiler enforce point number 1 for you. Additionally, it clearly lets anyone who sees your code know that you don’t want those fields to change their values once they are set.
Make all fields private. This one should be pretty obvious and you should follow it regardless of whether you’re taking immutability into consideration or not, but I’m mentioning this just in case.
Never provide access to any mutable attribute. If your class has a mutable object as one of its properties (such as a List, a Map or any other mutable object from your domain problem) make sure that clients of your class can never get a reference to that object. This means that you should never directly return a reference to them from an accessor (e.g., a getter) and you should never initialize them on your constructor with a reference passed as parameter from a client. You should always make defensive copies in this case.

That’s a lot of theory and no code, so lets see what a simple immutable class looks like and how it deals with the 5 steps I mentioned before:

public class Book {
    private final String isbn;
    private final int publicationYear;
    private final List reviews;
    private Book(BookBuilder builder) {
        this.isbn = builder.isbn;
        this.publicationYear = builder.publicationYear;
        this.reviews = Lists.newArrayList(builder.reviews);
    }
    public String getIsbn() {
        return isbn;
    }
    public int getPublicationYear() {
        return publicationYear;
    }
    public List getReviews() {
        return Lists.newArrayList(reviews);
    }
    public static class BookBuilder {
        private String isbn;
        private int publicationYear;
        private List reviews;
        public BookBuilder isbn(String isbn) {
            this.isbn = isbn;
            return this;
        }
        public BookBuilder publicationYear(int year) {
            this.publicationYear = year;
            return this;
        }
        public BookBuilder reviews(List reviews) {
            this.reviews = reviews == null ? new ArrayList() : reviews;
            return this;
        }
        public Book build() {
            return new Book(this);
        }
    }
}

We’ll go through the important points in this pretty simple class. First of all, as you’ve probably noticed, I’m using the builder pattern again. This is not just because I’m a big fan of it but also because I wanted to illustrate a few points that I didn’t want to get into my previous post without first giving you a basic understanding of the concept of immutability. Now, let’s go through the 5 steps that I mentioned you need to follow to make a class immutable and see if they hold valid for this Book example:

Don’t provide any public methods that modify the object’s state. Notice that the only methods on the class are its private constructor and getters for its properties but no method to change the object’s state.
Prevent the class from being extended. This one is quite tricky. I mentioned that the easiest way to ensure this was to make the class final but the Book class is clearly not final. However, notice that the only constructor available is private. The compiler makes sure that a class without public or protected constructors cannot be subclassed. So in this case the final keyword on the class declaration is not necessary but it might be a good idea to include it anyway just to make your intention clear to anyone who sees your code.
Make all fields final. Pretty straightforward, all attributes on the class are declared as final.
Never provide access to any mutable attribute. This one is actually quite interesting. Notice how the Book class has a List<String> attribute that is declared as final and whose value is set on the class constructor. However, this List is a mutable object. That is, while the reviews reference cannot change once it is set, the content of the list can. A client with a reference to the same list could add or delete an element and, as a result, change the state of the Book object after its creation. For this reason, note that on the Book constructor we don’t assign the reference directly. Instead, we use the Guava library to make a copy of the list by calling “this.reviews = Lists.newArrayList(builder.reviews);“. The same situation can be seen on the getReviews method, where we return a copy of the list instead of the direct reference. It is worth noting that this example might be a bit oversimplified, because the reviews list can only contain strings, which are immutable. If the type of the list is a mutable class then you would also have to make a copy of each object in the list and not just the list itself.

That last point illustrates why immutable classes result in cleaner designs and easier to read code. You can just share around those immutable objects without having to worry about defensive copies. In fact, you should never make any copies at all because any copy of the object would be forever equal to the original. A corollary is that immutable objects are just plain simple. They can be in only one state and they keep that state for their entire life. You can use the class constructor to check any invariants (i,e,. conditions that need to be valid on the class like range of values for one of its attributes) and then you can ensure that those invariants will remain true without any effort on your part or your clients.

Another huge benefit of immutable objects is that they are inherently thread-safe. They cannot be corrupted by multiple threads accessing the objects concurrently. This is, by far, the easiest and less error prone approach to provide thread safety in your application.

But what if you already have a Book instance and you want to change the value of one of its attributes? In other words, you want to change the state of the object. On an immutable class this is, by definition, not possible. But, as with most things in software, there’s always a workaround. In this particular case there’s actually two.

The first option is to use the Fluent Interface technique on the Book class and have setter-like methods that actually create an object with the same values for all its attributes except for the one you want to change. In our example we would have to add the following to the Book class:

    private Book(BookBuilder builder) {
        this(builder.isbn, builder.publicationYear, builder.reviews);
    }
    private Book(String isbn, int publicationYear, List reviews) {
        this.isbn = isbn;
        this.publicationYear = publicationYear;
        this.reviews = Lists.newArrayList(reviews);
    }
    public Book withIsbn(String isbn) {
        return new Book(isbn,this.publicationYear, this.reviews);
    }

Note that we added a new private constructor where we can specify the value of each attribute and modified the old constructor to use the new one. Additionally, we added a new method that returns a new Book object with the value we wanted for the isbn attribute. The same concept applies to the rest of the class’ attributes. This is known as a functional approach because methods return the result of operating on their parameters without modifying them. This is to contrast it from the procedural or imperative approach where methods apply a procedure to their operands, thus changing their state.

This approach to generate new objects shows the only real disadvantage of immutable classes: they require us to create a new object for each distinct value we need and this can produce a considerable overhead in performance and memory consumption. This problem is magnified if you want to change several attributes of the object because you are generating a new object in each step and you end up discarding all intermediate objects and keeping just the last result.

We can provide a better alternative for the case of multi-step operations like the one I described on the last paragraph with the help of the builder pattern. Basically, we add a new constructor to the builder that takes an already created instance to set all its initial values. Then, the client can use the builder in the usual way to set all the desired values and then use the build method to create the final object. That way, we avoid creating intermediate objects with only some of the values we need. In our example this technique would look something like this on the builder side:

public BookBuilder(Book book) {
    this.isbn = book.getIsbn();
    this.publicationYear = book.getPublicationYear();
    this.reviews = book.getReviews();
}

Then, on our clients, we can have:

Book originalBook = getRandomBook();

Book modifiedBook = new BookBuilder(originalBook).isbn("123456").publicationYear(2011).build();

Now, obviously the builder is not thread-safe so you have to take all the usual precautions, such as not sharing a builder with multiple threads.

I mentioned that the fact that we have to create a new object for every change in state can have an overhead in performance and this is the only real disadvantage of immutable classes. However, object creation is one of the aspects of the JVM that is under continuous improvement. In fact, except for exceptional cases, object creation is a lot more efficient than you probably think. In any case, it’s usually a good idea to come up with a simple and clear design and then, only after measuring, refactor for performance. Nine out of ten times when you try to guess where your code is taking so much time you’ll discover that you were wrong. Additionally, the fact that immutable objects can be shared freely without having to worry about the consequences gives you the chance to encourage clients to reuse existing instances wherever possible, thus reducing considerably the number of objects created. A common way to do this is to provide public static final constants for the most common values. This technique is heavily used on the JDK, for example in Boolean.FALSE or BigDecimal.ZERO.

To conclude this post, if you want to take something out of it let it be this: classes should be immutable unless there’s a very good reason to make them mutable. Don’t automatically add a setter for every class attribute. If for some reason you absolutely can’t make your class immutable, then limit its mutability as much as possible. The fewer states in which an object can be, the easier it is to think about the object and its invariants. And don’t worry about the performance overhead of immutability, chances are that you won’t have to worry about it.

The builder pattern in practice

Posted on December 13, 2012 by jlordiales

IMPORTANT: This blog has been moved to jlordiales.me. The content here might be outdated. To see this post on the new blog go to http://jlordiales.me/2012/12/13/the-builder-pattern-in-practice

So, this is my first post (and my first blog for that matter). I can’t remember exactly where I read this (although I’m almost sure it was on Practices of an Agile Developer), but writing in a blog is supposed to help you get your thoughts together. Concretely, by taking the time to explain what you know, you get a better understanding of it yourself.

And that’s exactly what I’m going to try to do here, explain things to get a better understanding of them. And, as a bonus, it will also serve me as centralized place to go to when I want to revisit something I’ve done in the past. Hopefully, this will help some of you in the process.

With the introduction out of the way, lets jump straight into this first post which, as the title so eloquently says :), is about the builder pattern. I’m not going to dive into much details about the pattern because there’s already tons of posts and books that explain it in fine detail. Instead, I’m going to tell you why and when you should consider using it. However, it is worth mentioning that this pattern is a bit different to the one presented in the Gang of Four book. While the original pattern focuses on abstracting the steps of construction so that by varying the builder implementation used we can get a different result, the pattern explained in this post deals with removing the unnecessary complexity that stems from multiple constructors, multiple optional parameters and overuse of setters.

Imagine you have a class with a substantial amount of attributes like the User class below. Let’s assume that you want to make the class immutable (which, by the way, unless there’s a really good reason not to you should always strive to do. But we’ll get to that in a different post).


public class User {
    private final String firstName;    //required
    private final String lastName;    //required
    private final int age;    //optional
    private final String phone;    //optional
    private final String address;    //optional
...
}

Now, imagine that some of the attributes in your class are required while others are optional. How would you go about building an object of this class? All attributes are declared final so you have to set them all in the constructor, but you also want to give the clients of this class the chance of ignoring the optional attributes.

A first and valid option would be to have a constructor that only takes the required attributes as parameters, one that takes all the required attributes plus the first optional one, another one that takes two optional attributes and so on. What does that look like? Something like this:

    public User(String firstName, String lastName) {
        this(firstName, lastName, 0);
    }

    public User(String firstName, String lastName, int age) {
        this(firstName, lastName, age, "");
    }

    public User(String firstName, String lastName, int age, String phone) {
        this(firstName, lastName, age, phone, "");
    }

    public User(String firstName, String lastName, int age, String phone, String address) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
        this.phone = phone;
        this.address = address;
    }

The good thing about this way of building objects of the class is that it works. However, the problem with this approach should be pretty obvious. When you only have a couple of attributes is not such a big deal, but as that number increases the code becomes harder to read and maintain. More importantly, the code becomes increasingly harder for clients. Which constructor should I invoke as a client? The one with 2 parameters? The one with 3? What is the default value for those parameters where I don’t pass an explicit value? What if I want to set a value for address but not for age and phone? In that case I would have to call the constructor that takes all the parameters and pass default values for those that I don’t care about. Additionally, several parameters with the same type can be confusing. Was the first String the phone number or the address?

So what other choice do we have for these cases? We can always follow the JavaBeans convention, where we have a default no-arg constructor and have setters and getters for every attribute. Something like:

public class User {
	private String firstName; // required
	private String lastName; // required
	private int age; // optional
	private String phone; // optional
	private String address;  //optional

	public String getFirstName() {
		return firstName;
	}
	public void setFirstName(String firstName) {
		this.firstName = firstName;
	}
	public String getLastName() {
		return lastName;
	}
	public void setLastName(String lastName) {
		this.lastName = lastName;
	}
	public int getAge() {
		return age;
	}
	public void setAge(int age) {
		this.age = age;
	}
	public String getPhone() {
		return phone;
	}
	public void setPhone(String phone) {
		this.phone = phone;
	}
	public String getAddress() {
		return address;
	}
	public void setAddress(String address) {
		this.address = address;
	}
}

This approach seems easier to read and maintain. As a client I can just create an empty object and then set only the attributes that I’m interested in. So what’s wrong with it? There are two main problems with this solution. The first issue has to do with having an instance of this class in an inconsistent state. If you want to create an User object with values for all its 5 attributes then the object will not have a complete state until all the setX methods have been invoked. This means that some part of the client application might see this object and assume that is already constructed while that’s actually not the case.
The second disadvantage of this approach is that now the User class is mutable. You’re loosing all the benefits of immutable objects.

Fortunately there is a third choice for these cases, the builder pattern. The solution will look something like the following.

public class User {
	private final String firstName; // required
	private final String lastName; // required
	private final int age; // optional
	private final String phone; // optional
	private final String address; // optional

	private User(UserBuilder builder) {
		this.firstName = builder.firstName;
		this.lastName = builder.lastName;
		this.age = builder.age;
		this.phone = builder.phone;
		this.address = builder.address;
	}

	public String getFirstName() {
		return firstName;
	}

	public String getLastName() {
		return lastName;
	}

	public int getAge() {
		return age;
	}

	public String getPhone() {
		return phone;
	}

	public String getAddress() {
		return address;
	}

	public static class UserBuilder {
		private final String firstName;
		private final String lastName;
		private int age;
		private String phone;
		private String address;

		public UserBuilder(String firstName, String lastName) {
			this.firstName = firstName;
			this.lastName = lastName;
		}

		public UserBuilder age(int age) {
			this.age = age;
			return this;
		}

		public UserBuilder phone(String phone) {
			this.phone = phone;
			return this;
		}

		public UserBuilder address(String address) {
			this.address = address;
			return this;
		}

		public User build() {
			return new User(this);
		}

	}
}

A couple of important points worth noting:

The User constructor is private, which means that this class can not be directly instantiated from the client code.
The class is once again immutable. All attributes are final and they’re set on the constructor. Additionally, we only provide getters for them.
The builder uses the Fluent Interface idiom to make the client code more readable (we’ll see an example of this in a moment).
The builder constructor only receives the required attributes and this attributes are the only ones that are defined “final” on the builder to ensure that their values are set on the constructor.

The use of the builder pattern has all the advantages of the first two approaches I mentioned at the beginning and none of their shortcomings. The client code is easier to write and, more importantly, to read. The only critique that I’ve heard about the pattern is the fact that you have to duplicate the class’ attributes on the builder. However, given the fact that the builder class is usually a static member class of the class it builds, they can evolve together fairly easy.

Now, how does the client code trying to create a new User object looks like? Let’s see:

	public User getUser() {
		return new
				User.UserBuilder("Jhon", "Doe")
				.age(30)
				.phone("1234567")
				.address("Fake address 1234")
				.build();
	}

Pretty neat, isn’t it? You can build a User object in 1 line of code and, most importantly, is very easy to read. Moreover, you’re making sure that whenever you get an object of this class is not going to be on an incomplete state.

This pattern is really flexible. A single builder can be used to create multiple objects by varying the builder attributes between calls to the “build” method. The builder could even auto-complete some generated field between each invocation, such as an id or serial number.

An important point is that, like a constructor, a builder can impose invariants on its parameters. The build method can check these invariants and throw an IllegalStateException if they are not valid.
It is critical that they be checked after copying the parameters from the builder to the object, and that they be checked on the object fields rather than the builder fields. The reason for this is that, since the builder is not thread-safe, if we check the parameters before actually creating the object their values can be changed by another thread between the time the parameters are checked and the time they are copied. This period of time is known as the “window of vulnerability”. In our User example this could look like the following:

public User build() {
    User user = new user(this);
    if (user.getAge() 120) {
        throw new IllegalStateException(“Age out of range”); // thread-safe
    }
    return user;
}

The previous version is thread-safe because we first create the user and then we check the invariants on the immutable object. The following code looks functionally identical but it’s not thread-safe and you should avoid doing things like this:

public User build() {
    if (age 120) {
        throw new IllegalStateException(“Age out of range”); // bad, not thread-safe
    }
    // This is the window of opportunity for a second thread to modify the value of age
    return new User(this);
}

A final advantage of this pattern is that a builder could be passed to a method to enable this method to create one or more objects for the client, without the method needing to know any kind of details about how the objects are created. In order to do this you would usually have a simple interface like:

public interface Builder {
    T build();
}

In the previous User example, the UserBuilder class could implement Builder<User>. Then, we could have something like:

UserCollection buildUserCollection(Builder<? extends User> userBuilder){...}

Well, that was a pretty long first post. To sum it up, the Builder pattern is an excellent choice for classes with more than a few parameters (is not an exact science but I usually take 4 attributes to be a good indicator for using the pattern), especially if most of those parameters are optional. You get client code that is easier to read, write and maintain. Additionally, your classes can remain immutable which makes your code safer.

UPDATE: if you use Eclipse as your IDE, it turns out that you have quite a few plugins to avoid most of the boiler plate code that comes with the pattern. The three I’ve seen are:

I haven’t tried any of them personally so I can’t really give an informed decision on which one is better. I reckon that similar plugins should exist for other IDEs.

This blog has been moved to http://jlordiales.me