Follow up: I have created another blog post to clarify some of these issues here.

The IPython project, through UC Berkeley and Cal Poly San Luis Obispo, just received a $1.15 million grant from the Alfred P. Sloan Foundation to develop the IPython Notebook over a two-year period.  More details about this grant can be found on the IPython website.  This is really exciting for us because, so far, we have mostly developed IPython in our spare time.  But I think there is also a potential danger here.  The danger is that we will add lots of new features.  What, you say, lots of features will endanger IPython?  What else are you going to do with a million dollars if you are not going to add lots of new features?  The answer is simple: we are going to add as few features as possible and knock each of them out of the park.  The future of the project depends on this.

This is a topic that I have been thinking about a lot lately: how do open source projects decide which features to implement?  Most active open source projects I am involved in see a continual stream of new features.  Just hop onto the GitHub pages for SymPy or IPython and watch the activity.  Anyone with the right skill set can fork a project on GitHub and submit a pull request within a few hours.  Amazingly, this is happening all the time; apparently people love to code.  While each new feature is an asset for the project, it also brings a cost, or liability, with it.  If a project ignores those costs, it can have long term, detrimental effects on the project.  What are these liabilities and costs associated with new features?

  • Each new feature adds complexity to the code base.  Complexity makes a code base less hackable, maintainable, and extensible.
  • Each new feature increases the “bug surface” of the project.  When a feature also adds complexity, those bugs become harder to find and fix.
  • Each new feature requires documentation to be written and maintained.
  • Each new feature requires support over email or IRC.
  • Endless feature expansion, or feature creep, requires developers to specialize.  They can’t follow the entire project, so they have to focus on a subset that can fit into their brain and schedule.
  • Each new feature has to be tested on a wide variety of platforms (Linux, Mac, Windows) and environments (PyPy, Python 2, Python 3).
  • Each new feature adds complexity to the user experience.  Sometimes it’s the documentation, other times the UI or configuration options.
  • When you spend time on one feature, another feature or bug fix doesn’t get worked on.  If you didn’t prioritize things beforehand, you just spent time on something less important to your users.  Either that, or you did shoddy work while trying to do it all.
  • Features multiply like bunnies.  How many times have you heard, “wow, that new feature is really cool, could you make it do X as well?”
  • Features are easy to add, difficult to remove.  Once you add a feature, you are stuck with the costs and liabilities.

For a typical open source project, what percentage of features get used regularly by most users?  15%? 40%?  Let’s be really optimistic and say that number is 60%.  That means that developers are spending 40% of their time and effort on features that don’t really get used.  Ouch.  Then why do open source projects keep adding features without counting the cost?  I think there are a number of factors that lead to this, but one in particular comes to my mind.  It is hard to say no.  When an end user submits a feature request over email or on GitHub issues, it is hard to tell them, “great idea, but we are not doing that.”  It is even more difficult to say no after someone submits a pull request that implements a new feature.  It is difficult to build a vibrant project community if you are always saying no in these contexts.

Clearly, we need a better way of limiting feature expansion in open source projects.  What can we do to better evaluate the hidden costs of adding new features so we can make informed, strategic decisions about which features to add?  Here are some ideas that have emerged out of my recent reading and conversations.

  • Create a wiki page for your project, where you list all of the features you are not going to implement.  Publicize this list, discuss it and make it an important part of the development workflow.  Another way of phrasing this is to decide on a finite scope for the project.  When you are going through this exercise, come up with an initial scope and then force yourself to reduce it.
  • Make features fight hard to be accepted and implemented.  Communicate to the community and developers that the default answer to feature requests is no (it’s not personal!) and don’t even consider implementation until much of the community is crying “we absolutely must have this.” Even then, you don’t have to say yes.
  • Create a workflow that separates feature requests from other tickets/issues.  When people submit new feature requests, encourage discussion, but don’t automatically promote the feature to the project’s todo list.  Then you can promote them, as needed, to the project’s todo list in an informed and prioritized manner.
  • When new feature requests appear, discuss the specific costs and liabilities associated with the feature.  Build this thinking into your development DNA.
  • Communicate to the community and its developers why it is important to fight against feature expansion.  Focus on the benefits of waging this war: smaller, simpler code base, fewer bugs, more time to focus on important features, easier to support, etc.
  • Remove features that have too great a cost or that few users actually use.  Maybe even create a special exception you can raise (FeatureReductionWarning?) to help people transition away from them.
  • Refactor the codebase to reduce complexity.  While this doesn’t directly reduce the number of features, it can mitigate the cost of existing and future features.  Extra bonus points if you can implement a new feature while dramatically reducing the complexity of the code base.
  • Improve testing.  Again, this is mitigation.
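The FeatureReductionWarning idea above can be sketched in a few lines of Python.  This is a hypothetical example (the warning class and the `old_feature` function are made up for illustration, not part of IPython): the feature keeps working during a transition period, but every call tells users it is on its way out.

```python
import warnings

# Hypothetical warning class a project could ship to flag features
# slated for removal (the name comes from the suggestion above).
class FeatureReductionWarning(DeprecationWarning):
    pass

def old_feature(x):
    """A feature being phased out; it still works, but warns on use."""
    warnings.warn(
        "old_feature is scheduled for removal; see the project's "
        "scope page for the recommended alternative.",
        FeatureReductionWarning,
        stacklevel=2,  # point the warning at the caller, not this function
    )
    return x * 2  # existing behavior is preserved during the transition

# Demonstrate that the warning is emitted and the feature still works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_feature(21)

print(result)                       # 42
print(caught[0].category.__name__)  # FeatureReductionWarning
```

Subclassing DeprecationWarning means existing tooling (test runners, warning filters) treats it like any other deprecation, while the distinct name makes it easy to grep for features that are being retired.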

As you discuss and evaluate features, here are some questions you can ask yourself and the community:

  • What fraction of your user base will use the feature?  How often will they use it?  If it won’t be used by most of your users, just say no.
  • Can the feature be added as a third party plugin or library?  This is especially useful if the new feature would increase the overall scope of the project, but make a great standalone project.
  • How difficult will it be to test, debug, document, and maintain the feature?  What fraction of your development team is capable of and interested in doing this work?  If the maintenance burden is huge and only one person is willing to do it, it is time to rethink the feature.
  • Can you implement the functionality in a more limited, but much simpler manner?  Developers absolutely love to implement features in the most general way possible.  It requires developer discipline and focus to resist this temptation.
  • One way that developers over-engineer things is by making every conceivable thing configurable.  Can you simplify the feature by removing configurability and just choosing great defaults?
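The last point can be made concrete with a small, entirely hypothetical sketch.  Neither function below comes from any real project; the contrast is the point: the first version exposes every knob someone might conceivably want, each of which must be documented, tested, and supported forever, while the second commits to good defaults and exposes only the one option most users need.

```python
# Over-configured: every conceivable knob is exposed.
def render_table_configurable(rows, delimiter="|", pad=1, align="left",
                              header=True, border_char="-",
                              corner_char="+", uppercase_header=False):
    ...  # every parameter here is a permanent support liability

# Simpler: great defaults, one option.
def render_table(rows, header=True):
    """Render rows of cells as aligned, pipe-separated text."""
    # Column widths come from the widest cell in each column.
    widths = [max(len(str(r[i])) for r in rows) for i in range(len(rows[0]))]
    lines = [" | ".join(str(c).ljust(w) for c, w in zip(r, widths))
             for r in rows]
    if header:
        # Insert a rule between the header row and the data rows.
        lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)

print(render_table([("name", "count"), ("alpha", 3), ("beta", 12)]))
```

The simple version is shorter, easier to test, and still covers what most users actually do; anyone who truly needs custom borders can post-process the string themselves.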

One thoughtful discussion of these issues is in the book Getting Real, by some of the folks at 37signals.  They propose something quite radical for handling feature requests in commercial web applications.  Here is what they say: “How do you manage them?  You don’t.  Just read them and then throw them away.”  Their experience is that the important features will keep coming up.  When this happens, you know they are important and you don’t have to write them down to keep track of them.  This is definitely my experience in developing the IPython Notebook.  The most important features, the ones that I am going to spend time on this year, are probably not written down anywhere, but everyone in the community is discussing them and couldn’t possibly forget them.  So why on earth do we currently have 177 open new feature issues (we call them “enhancements”) on IPython’s GitHub site?

In an open source project, I don’t think it makes sense to literally throw away feature requests; sometimes the ensuing discussion is valuable and worth preserving.  But what about allowing the discussion to occur, for example in the context of a GitHub issue, and then closing the issue?  If someone wants to re-open the feature request at a later time to voice their support, they should be encouraged to do that.  But again, once discussion stops, the issue should be re-closed.  As this process repeats itself, the community is essentially voting for the features they want.  This would also dramatically reduce the number of open issues, which helps developers to better manage the active work on the project.

I don’t think this is the only way to manage feature requests intelligently in an open source project.  I would love to hear other ideas.  How are you managing these things in your own projects?  I realize that I am far from the first person to write about these things.  What other resources do you know of that address these problems?

 

15 Responses to Features and Scope in Open Source Software

  1. Samuel says:

    I fully agree.
    … and wish that github had a place to “move” the feature request issues away so that they don’t spam the issues list. I kind of feel pressed to close issues. And closing “feature request” issues is done by implementing or outright rejecting. The latter is hard to do.

    • Brian Granger says:

      I think part of the solution may be social. For example, with IPython we have created a culture where it is OK to close, then later re-open pull requests. Closing a pull request doesn’t mean rejection, but more of a “this is not ready yet.” If people want to address the review comments and re-open the pull request, they are encouraged to do so. But you are right, closing a feature request does mean “we are not going to implement this, at least not yet.” That can be hard for people to take, regardless of the details on GitHub.

  2. Francesc Alted says:

    Nice reflections Brian. It is good to see how you guys at IPython put a lot of thinking in the workflow of the package.

    In my opinion, whether complexity is kept out of a library/package/project depends more on the goals and philosophy of the leaders than anything else. For example, I don’t think preventing complexity would have been good for a project like the Linux kernel: an OS is a complex thing that demands complex features.

    On the other hand, there are many possible interpretations of what features can be considered ‘added complexity’. For example, in the IPython case, I have always considered that the parallel engine that comes with it is completely orthogonal (and hence, an added ‘complexity’) to the IPython features. I have discussed that with Fernando before, but in his mind, the parallel engine had to be at the core of IPython. Different sensibilities and backgrounds also give different perceptions of a ‘core feature’ or an ‘added complexity’.

    At any rate, I strongly believe that it is extremely important that the leaders put a lot of thought on the features they like, and *express* their conclusions publicly. People will generally understand (and eventually buy) any reason that is exposed from the deepest leader’s convictions. This is what you are doing in this blog, and this is *great*.

    • Brian Granger says:

      Francesc, thanks for the comments. You make a great point about the goals and philosophy of the project leaders; they are really important in determining which features make sense. I don’t always think that complexity is a bad thing, for the reason you give: sometimes a project is tackling something that is complex.

      Also, you are not the first person to ask about IPython’s parallel computing tools. I have written an additional blog post that discusses and clarifies these issues:

      http://brianegranger.com/?p=261

  3. jared says:

    Excellent read.
    Thoroughly written.
    Thanks for posting.

  4. Aaron Meurer says:

    Two things I want to say here. One is that I think the best way to avoid feature creep is to have a clear, not necessarily concise, philosophy for the project, which is written down somewhere. The core developers need to all agree on this. This will then serve as the acid test for any new feature: does it fit in the scope of the project? Well, what is the scope of the project? It’s anything that matches the project’s philosophy.

    In a way, this doesn’t really get you anywhere because you still have to decide what is and isn’t within the scope of the project. Much of this is philosophical, which is why I think it’s a good place to start, but in the end, you may have to categorically reject some things in your philosophy because of the reasons you mention.

    Secondly, I’d like to consider the flip side of this. You mention that extensions are a good place for features that can’t be supported by the project. In my experience, this is where things go to die. How often do you see code for some extension, perhaps on a wiki (not just for IPython), that no longer works because it is too old? Take, for example, some IPython magic you may have in the back of your mind. Granted, keeping code in GitHub rather than on a wiki can help with this, as can testing, but the point is that if it is in the main code base, it is likely to be kept up to date. Supposing that a feature is going to be implemented anyway (say as an extension), then it’s not an issue of supporting it. If it’s in a separate project, someone has to support it anyway (or if no one does, it will die), but the support cost is generally lower if it’s in the main code base.

    Of course, sometimes, spinoffs gain lives of their own, like matplotlib. I think the solution here is to not be afraid to split off features into separate projects if they start gaining their own little subcommunities (I haven’t seen this happen with IPython yet, but I imagine something like it would have happened if matplotlib had developed within IPython).

    By the way, to add a specific suggestion here, I think any feature that directly relates to another scientific Python project should be part of that project, not IPython. For example, we moved the SymPy printing extension to SymPy. I’ve never understood why the parts of pylab that enable asynchronous plotting in the terminal are in IPython and not matplotlib (maybe there’s a good technical reason for this?).

    • Brian Granger says:

      Aaron, thanks for your comments. I agree that documenting the vision/philosophy for the project is the place to start.

      But about the issue of putting features into third-party extensions or plugins: you raise the issue of bit-rotting, unmaintained extensions and mini-projects. Why does this happen? Because the feature/code is not important enough to the community for people to maintain it. If people were really using it, it would be maintained, regardless of which repo it is in. I don’t think that will magically change if the code is part of a larger project. But I can think of two reasons why it can make sense not to spawn separate projects for new features: i) as you mention, there is overhead to creating and maintaining a separate project, and ii) inclusion in the larger project can at least provide a promise that the code will be tested – even if it later sits untouched.

  5. I agree with Aaron. For SymPy, I think everybody is using different parts of it. So I would say that each user maybe uses 10% of all of SymPy regularly. But each user is using a different 10% of SymPy. By having all the features in one project, well tested, it attracts more developers. If the culture of the project is to help each other out to maintain SymPy as a whole, it works great. Note that things in SymPy are easy to test, while in IPython not so easy. So I agree with your conclusion for IPython, but I think the other approach is not necessarily bad either.

  6. Matthias says:

    I myself find that IPython is a really big project, and in particular I think there are a lot of great features that would gain from being at least in a standalone repository.

    The first is IMHO the configurable/application/traitlets machinery with command line parsing. It is right now almost impossible to get without pulling all of IPython with it.

    The other part is most of the extensions.
    They could perfectly well be separate, with a much wider range of users having access to them. Why do you need the approval of the core devs to update the %%R magic? Why can’t the community take care of an IPython-magic repo?

    The third, which IMHO will start to grow soon, is IPython.lib.display (anything not low level). Why do we have a YouTube object in there? Why not Vimeo, or DailyMotion, or MegaUpload, or …? It would gain from being a separate project that could evolve much more rapidly than IPython.

    The solution I propose would be a much more open organisation where at least the extensions and the lib.display objects live as separate projects.

    • Brian Granger says:

      I agree that splitting IPython into subprojects might help in some ways. Min brought up the possibility of doing this for each of the frontends. But we probably want things to stabilize more before we do this.

  7. Jason Grout says:

    Great post. We see this feature creep and maintenance burden in Sage, for example.

    It would be nice if people could vote on Github issues (like Google issue stars), which would allow users easier access to vote for features.

    (Also, a picky typo correction: “The you can promote them” -> “Then you…”)

  8. Norfeldt says:

    As a newcomer to python and IPython (previous MatLab user) I must say that I’m very happy that IPy comes with such a big “package” as it does. It’s very difficult with all the different modules that have to be downloaded and installed (especially on a windows machine).
    Just figuring out that IPy can solve most of my problems took some time and installing it on windows is still very very difficult – if you ask me.

    But this is probably just because I want the full package, since I don’t know what I need until I need it – I like to play around with the different features.

  9. Yaroslav Halchenko says:

    Very nice post, Brian – thank you for discussing future directions of IPython development in public !

    For a change I would also like to slightly disagree with an aggressive policy against feature expansion — while it may be sound for a smaller and more specialized project, IMHO it might be sub-optimal for large general-purpose projects.

    In other words, you seem to be tackling the problem of “how to spend a 1M$ effectively so that the achieved expansion is not jeopardized in the future by possibly limited funding”, or “how to achieve manageable/controlled growth”. As was pointed out by others, as a project of general applicability grows it is pretty much unavoidable to have a situation where no single developer knows the whole code base, nor can a single developer (or a couple) grasp all the needs of the community for the product. Limiting the influx of features could indeed be a (suboptimal) solution to such a problem. Alternatively you could invest this funded time concentrating more on infrastructure. You have mentioned “plugins” (or call them extensions or modules) — and IMHO improving modularity and the management of such “modules” could be a sustainable long-term solution. The goal though should be not to really jump into “3rd party plugins” but rather to work out a sound workflow for their incorporation within an official IPython.

    Prominent examples among software development projects could be Linux, XOrg and Apache, where, whenever ready (tested, reviewed), a new module becomes a part of the official project. Being modular, those could easily be managed atomically while still maintaining the complete system as a whole. Unused, superseded, or heavily broken modules get deprecated. I could even expand that example further into the domain of software (primarily GNU/Linux) distributions. RHEL — a great modular distribution, BUT concentrated on commercial applicability, thus following a bit more closely the approach you suggest of limiting what gets accepted/maintained, and allowing “foster” (Fedora) or derivative (ScientificLinux) projects to expand the coverage. Not sure if IPython should follow RHEL’s model since after all it is a community-oriented (not enterprise) project. Debian — accepting virtually anything worthwhile, open, of sufficient quality, and having a person to maintain a package. Due to a clear modular organization and formalized practices and policies, the Debian project is maintaining a huge (>30000 source packages IIRC) amount of inter-dependent software as a single integrated entity. Now consider Ubuntu PPAs — although useful on various occasions, they are also a source of pain for many users since they are not part of the distribution itself, and thus lack needed integration testing etc.; hence I am against ‘3rd party plugins’ as the core way for expansion.

    So what if IPython had
    1. clear specification (interdependencies, API)/infrastructure (testing, etc)/requirements (active maintainer, code QA, etc) for additional modules to be submitted for inclusion.

    2. [just an idea] to achieve easy modularity with GIT, utilize submodules to “bring-in” new modules or drop existing ones (may be even temporarily for a release) from the “distribution”? This would allow per-module bug tracking in their own repositories on GITHUB thus avoiding pollution of the “core” bug tracking (ideally would need seeking additional features from github to provide overview of bugreports in submodules, allowing forwarding issues from one repo to another etc)

    3. possibly even create (instead of waiting for a spin-off) an official “unstable” flavor of the IPython distribution which would incorporate some prospective extensions on top of the official release (should be easy to achieve with submodules)

    3.1 IMHO it would be worth spending some portion of those 1M$ to stabilize releases a bit more, allowing an official “stable” series to be maintained which would provide only bug fixes, without extension with new features — IPython is becoming the “kernel” of scientific computing, so having a limited-features but extremely stable release might be preferable to fresher, flakier ones. Once again, modular organization might help here quite a bit.

    4. develop a ‘feedback’ system reporting back from volunteers basic statistics on which modules are “popular” and actually being used, so you could easily and objectively assess the impact/importance of specific modules. Running surveys is another way, but I would expect it to be an inadequate reflection of end-user needs. E.g. in the Debian world we have a Popularity contest, which helps to decide which packages have higher priority and to guesstimate the impact of dropping any particular package.

    Of course, nothing from the above would become possible without the growth of a healthy ‘developers community’ as well, but IMHO restriction of feature expansion and diving into 3rd party plugins would not help there either.

    Sorry for the long(est) post, hopefully some bits would make sense to you — thanks for IPython ;)

  10. Matthew says:

    I think this can be resolved through project organization and interproject testing.

    NumPy/SciPy/SciKits is a good example of a strong algorithmic core interacting with increasing levels of special-case projects. If all of SciKits were part of NumPy then testing would be better integrated, but communication and support would be all-to-all.

    Of course, changing NumPy now is a _big deal_. It would be interesting to have inter-project testing. Something like travis that runs infrequently but checks interactions among projects. If I make a new library numpyfoo I would like to be able to register a set of tests with the numpy project for the behavior on which my project depends. The numpy community can decide whether or not they want to support me as a separable decision.

    This discussion highlights the fact that the project abstraction we use is clunky. We’re choosing where to place boundaries – the existence of this arbitrary choice is a sign that we’re not doing the simplest solution. I think that minimally sized projects with inter-project testing would enable more rapid long-term growth.
