In a previous blog post, I talked about the problem of ever-expanding feature sets in open source software projects.  I discussed the hidden costs and liabilities of adding new features and proposed practical ways of tackling these problems.  However, I think that story is incomplete.  Of course open source projects will want to add features, even some with massive costs and liabilities, but how should that happen?  How should projects decide which features to add?  What criteria should be used?  To answer these questions, I want to describe a few of my own experiences in dealing with features and scope in IPython and PyZMQ.

First, IPython.  Anyone who is familiar with IPython will probably read my previous blog post and say, “Wait, hasn’t IPython experienced a massive amount of feature creep over the last few years?  It started out as a simple terminal-based interactive shell, but has grown to include parallel computing, a web application and a GUI application.  The IPython developers, including yourself, have shown little, if any, restraint in adding features.”

Back in 2004, IPython was the 3-year-old child of Fernando Perez.  At that point, it was already a popular, enhanced interactive Python shell.  I had started to work on parallel computing libraries for Python when Fernando visited me at Santa Clara University (we had been classmates in graduate school starting in 1996).  We stayed up until 3 am talking about Python’s rapidly evolving ecosystem of scientific computing libraries, interactive computing environments, and the future of IPython and my parallel computing libraries.  IPython was a fantastic tool, but both of us wanted more.  We wanted a web-based IPython Notebook.  We wanted integrated parallel computing tools.  We began to see that all of these things required the same architecture: a stateful computational engine, the IPython Kernel, that could run user code and send back results over a network.  Once that existed, everything else could be layered on top, even the terminal-based IPython.  The vision that emerged that night was specific and, while ambitious, ultimately finite.  As Fernando describes in this blog post, it took us ~6 years and multiple attempts to build this architecture.  Now we have a terminal, a QtConsole, a web-based notebook and a parallel computing library, all built on top of a common IPython Kernel and message specification.  While these things may look like feature creep from the outside, they have been part of a deliberate and calculated plan.

What I have learned from this experience is that it is absolutely critical for the core developers of an open source project to consciously and deliberately set the scope and vision for the project.  This vision can even be ambitious.  Once that scope is set, choosing which features to add becomes easy: to first order, you add features within that scope.  That doesn’t mean you don’t count the cost of those features or that you add all possible features within the scope.  The scope provides an upper bound on the feature set.  So why does IPython have parallel computing tools?  Because we set the scope early on to include that.

Second, PyZMQ.  As we developed IPython’s architecture, we realized that we needed better networking/messaging libraries.  This led to the creation in 2010 of PyZMQ, a set of Python bindings for ZeroMQ.  Notice the scope of PyZMQ: Python bindings for ZeroMQ.  Fast forward to the spring of 2012.  In the course of working on the IPython Notebook, I had written some additional code for building PyZMQ-based web applications: a module for simple, PyZMQ-based RPC and a module for running Tornado event handlers in a scalable and distributed manner using PyZMQ.  I submitted two pull requests to include these modules in PyZMQ.  One of them was quickly merged and the other languished in code review for almost a year.  Min began to sense that these modules, while useful, were not a good match for inclusion in PyZMQ.  Their API was very unstable and they hadn’t been well tested.  They would require a much faster release cycle than the rest of PyZMQ, which was very stable and well tested.  This Christmas break, I finally woke up to the real problem: these modules were outside the scope of PyZMQ.  They involved an implicit and undiscussed increase in scope from “Python bindings for ZeroMQ” to “A general place for ZeroMQ-based tools in Python.”  Once I realized this, Min and I pulled them out of PyZMQ and created separate projects for them.  Again, the important thing is to set the scope of a project.




3 Responses to Personal Reflections on Features and Scope

  1. Aaron Meurer says:

    This is basically what I was saying you should do in my comment to your other post. I think the only thing missing is to write the project philosophy down, so that you can easily reference it.

    By the way, is there a way to subscribe to comment updates by email in this blog?

  2. Evan Misshula says:


    I think the idea of plugins as a way of avoiding/managing feature creep for IPython is a great one. I just started using/following IPython after the project was featured in R-bloggers. As grad students are looking at bigger and bigger data sets, the computational power of R is appearing more limited, and IPython feels very familiar.

    I think plugins would give you and the core team more time and data to estimate usage and maintenance costs. Thank you for all your hard work on this project.

