In a previous blog post, I talked about the problem of ever-expanding feature sets in open source software projects. I discussed the hidden costs and liabilities of adding new features and proposed practical ways of tackling these problems. However, I think that story is incomplete. Of course open source projects will want to add features, even some with massive costs and liabilities, but how should that happen? How should projects decide which features to add? What criteria should they use? To answer these questions, I want to describe a few of my own experiences in dealing with features and scope in IPython and PyZMQ.
First, IPython. Anyone who is familiar with IPython will probably read my previous blog post and say, “Wait, hasn’t IPython experienced a massive amount of feature creep over the last few years? It started out as a simple terminal-based interactive shell, but has grown to include parallel computing, a web application and a GUI application. The IPython developers, including yourself, have shown little, if any, restraint in adding features.”
Back in 2004, IPython was the 3-year-old child of Fernando Perez. At that point, it was already a popular, enhanced interactive Python shell. I had started to work on parallel computing libraries for Python when Fernando visited me at Santa Clara University (we had been classmates in graduate school starting in 1996). We stayed up until 3 am talking about Python’s rapidly evolving ecosystem of scientific computing libraries, interactive computing environments and the future of IPython and my parallel computing libraries. IPython was a fantastic tool, but both of us wanted more. We wanted a web-based IPython Notebook. We wanted integrated parallel computing tools. We began to see that all of these things required the same architecture: a stateful computational engine, the IPython Kernel, that could run user code and send back results over a network. Once that existed, everything else could be layered on top, even the terminal-based IPython. The vision that emerged that night was specific and, while ambitious, ultimately finite. As Fernando describes in this blog post, it took us ~6 years and multiple attempts to build this architecture. Now we have a terminal, QtConsole, web-based notebook and parallel computing library, all built on top of a common IPython Kernel and message specification. While these things may look like feature creep from the outside, they have been part of a deliberate and calculated plan.
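To make the idea of a stateful computational engine a little more concrete, here is a toy sketch in Python. It is not the IPython Kernel or its message specification: the `ToyKernel` class, the `run` helper and the JSON message fields are all hypothetical names invented for illustration, and the network transport is left out entirely (a real kernel would receive and send these messages over sockets). The only point it tries to show is that the engine keeps a namespace alive between requests, so any frontend that can exchange messages with it sees the same state.

```python
import contextlib
import io
import json


class ToyKernel:
    """Hypothetical stand-in for a stateful computational engine."""

    def __init__(self):
        # The namespace persists across requests; this is the "state".
        self.namespace = {}
        self.execution_count = 0

    def handle(self, raw_request):
        """Run the code in a JSON request string and return a JSON reply string."""
        request = json.loads(raw_request)
        self.execution_count += 1
        captured = io.StringIO()
        try:
            # Capture anything the user code prints so it can be sent back.
            with contextlib.redirect_stdout(captured):
                exec(request["code"], self.namespace)
            reply = {"status": "ok"}
        except Exception as exc:
            reply = {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
        reply["stdout"] = captured.getvalue()
        reply["execution_count"] = self.execution_count
        return json.dumps(reply)


def run(kernel, code):
    """Hypothetical frontend helper: serialize a request and decode the reply."""
    return json.loads(kernel.handle(json.dumps({"code": code})))


if __name__ == "__main__":
    kernel = ToyKernel()
    run(kernel, "x = 21")              # first request defines a variable
    print(run(kernel, "print(x * 2)"))  # second request still sees x
```

Because the frontend only ever deals in serialized messages, a terminal, a GUI console and a web page could in principle all talk to the same engine, which is the layering described above.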
What I have learned from this experience is that it is absolutely critical for the core developers of an open source project to consciously and deliberately set the scope and vision for the project. This vision can even be ambitious. Once that scope is set, choosing which features to add becomes easy: to first order, you add features within that scope. That doesn’t mean you don’t count the cost of those features or that you add all possible features within the scope. The scope provides an upper bound on the feature set. So why does IPython have parallel computing tools? Because we set the scope early on to include that.
Second, PyZMQ. As we developed IPython’s architecture, we realized that we needed better networking/messaging libraries. This led to the creation of PyZMQ in 2010, which is a set of Python bindings for ZeroMQ. Notice the scope of PyZMQ: Python bindings for ZeroMQ. Fast forward to the spring of 2012. In the course of working on the IPython Notebook, I had written some additional code for building PyZMQ-based web applications: a module for simple, PyZMQ-based RPC and a module for running Tornado event handlers in a scalable and distributed manner using PyZMQ. I submitted two pull requests to include these modules in PyZMQ. One of them was quickly merged and the other languished in code review for almost a year. Min began to sense that these modules, while useful, were not a good match for inclusion in PyZMQ. Their API was very unstable and they hadn’t been well tested. They would require a much faster release cycle than the rest of PyZMQ, which was very stable and well tested. This Christmas break, I finally woke up to the real problem: these modules were outside the scope of PyZMQ. They involved an implicit and undiscussed increase in scope from “Python bindings for ZeroMQ” to “A general place for ZeroMQ-based tools in Python.” Once I realized this, Min and I pulled them out of PyZMQ and created separate projects for them. Again, the important thing is to set the scope of a project.