Tue, 26 Nov 2013 15:14:59 UTC - Charlie Robbins - npm

We know the availability and overall health of The npm Registry is paramount to everyone using Node.js, as well as to the larger JavaScript community and those of you using it for some awesome projects and ideas. Between November 4th and November 15th, 2013, The npm Registry had several hours of downtime over three distinct time periods:

  1. November 4th -- 16:30 to 15:00 UTC
  2. November 13th -- 15:00 to 19:30 UTC
  3. November 15th -- 15:30 to 18:00 UTC

The root cause of this downtime was insufficient resources: both hardware and human. This is a full post-mortem in which we will look at how npmjs.org works, what went wrong, how we changed the previous architecture of The npm Registry to fix it, and the next steps we are taking to prevent this from happening again.

All of the next steps require additional expenditure from Nodejitsu: both servers and labor. This is why along with this post-mortem we are announcing our crowdfunding campaign: scalenpm.org! Our goal is to raise enough funds so that Nodejitsu can continue to run The npm Registry as a free service for you, the community.

Please take a minute now to donate at https://scalenpm.org!

How does npmjs.org work?

There are two distinct components that make up npmjs.org, operated by different people:

  • http://registry.npmjs.org: The main CouchApp (GitHub: isaacs/npmjs.org) that stores both package tarballs and metadata. It has been operated by Nodejitsu since we acquired IrisCouch in May. The primary system administrator is Jason Smith, the current CTO at Nodejitsu, cofounder of IrisCouch, and the System Administrator of registry.npmjs.org since 2011.
  • http://npmjs.org: The npmjs website that you interact with using a web browser. It is a Node.js program (GitHub: isaacs/npm-www) maintained and operated by Isaac and running on a Joyent Public Cloud SmartMachine.

Here is a high-level summary of the old architecture:

Diagram 1. Old npm architecture

What went wrong and how was it fixed?

As illustrated above, before November 13th, 2013, npm operated as a single CouchDB server with regular daily backups. We briefly ran a multi-master CouchDB setup after downtime back in August, but after reports that npm login no longer worked correctly we rolled back to a single CouchDB server. On both November 13th and November 15th CouchDB became unresponsive to requests to the /registry database, while requests to all other databases (e.g. /public_users) remained responsive. The root cause of the CouchDB failures has yet to be determined, but given that only requests to /registry were slow and/or timed out, we suspect it is related to the massive number of attachments stored in the registry.

The incident on November 4th was ultimately resolved by a reboot and resize of the host machine, but when the same symptoms recurred less than 10 days later, additional steps were taken:

  1. The registry was moved to another machine of equal resources to exclude the possibility of a hardware issue.
  2. The registry database itself was compacted.

When neither of these yielded a solution, Jason Smith and I decided to move to a multi-master architecture with continuous replication, illustrated below:

Diagram 2. Current npm architecture -- Red-lines denote continuous replication
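
As a rough sketch of what continuous replication between two masters involves: CouchDB accepts replication requests as JSON bodies with source, target, and continuous fields, POSTed to its /_replicate endpoint. The host names below are hypothetical and this is not Nodejitsu's actual configuration; the code only builds the request bodies, one per direction.

```javascript
// Hypothetical master URLs; the real registry hosts are not given in this post.
const masters = [
  "http://master-a.example.com:5984/registry",
  "http://master-b.example.com:5984/registry",
];

// Build one replication request body per direction, so that changes
// written to any master flow continuously to every other master.
function replicationRequests(urls) {
  const requests = [];
  for (const source of urls) {
    for (const target of urls) {
      if (source !== target) {
        requests.push({ source, target, continuous: true });
      }
    }
  }
  return requests;
}

// Each body would be sent as JSON in a POST to a CouchDB /_replicate endpoint.
const requests = replicationRequests(masters);
console.log(JSON.stringify(requests, null, 2));
```

Requesting replication in both directions is what makes the arrangement multi-master; the same function yields six request bodies if a third master is added.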

This should have been the end of our story, but unfortunately our supervision logic failed to restart the secondary master on the morning of November 15th. During this time we moved briefly back to a single-master architecture. Since then the secondary master has been closely monitored by the entire Nodejitsu operations team to ensure its continued stability.

What is being done to prevent future incidents?

The public npm registry simply cannot go down. Ever. We gained a lot of operational knowledge about The npm Registry and about CouchDB as a result of these outages. This new knowledge has made clear several steps that we need to take to prevent future downtime:

  1. Always be in multi-master: The multi-master CouchDB architecture we have set up will scale to more than just two CouchDB servers. As npm grows we'll be able to add additional capacity!
  2. Decouple www.npmjs.org and registry.npmjs.org: Right now www.npmjs.org still depends directly on registry.npmjs.org. We are planning to add an additional replica to the current npm architecture so that Isaac can more easily service requests to www.npmjs.org. That means it won't go down if the registry goes down.
  3. Always have a spare replica: We need to have a hot spare replica running continuous replication from either master that we can swap in when necessary. This is also important as we need to regularly run compaction on each master since the registry is growing ~10GB per week on disk.
  4. Move attachments out of CouchDB: Work has begun to move the package tarballs out of CouchDB and into Joyent's Manta service. Additionally, MaxCDN has generously offered to provide CDN services for npm, once the tarballs are moved out of the registry database. This will help improve delivery speed, while dramatically reducing the file system I/O load on the CouchDB servers. Work is progressing slowly, because at each stage in the plan, we are making sure that current replication users are minimally impacted.

When these new infrastructure components are in place, The npm Registry will look like this:

Diagram 3. Planned npm architecture -- Red-lines denote continuous replication

You are npm! And we need your help!

The npm Registry has had a 10x year. In November 2012 there were 13.5 million downloads. In October 2013 there were 114.6 million package downloads. We're honored to have been a part of sustaining this growth for the community and we want to see it continue to grow to a billion package downloads a month and beyond.

But we need your help! All of these necessary improvements require more servers, more time from Nodejitsu staff and an overall increase to what we spend maintaining the public npm registry as a free service for the Node.js community.

Please take a minute now to donate at https://scalenpm.org!

Fri, 08 Feb 2013 00:00:00 UTC - Domenic Denicola - npm

Reposted from Domenic's blog with permission. Thanks!

npm is awesome as a package manager. In particular, it handles sub-dependencies very well: if my package depends on request version 2 and some-other-library, but some-other-library depends on request version 1, the resulting dependency graph looks like:

├── request@2.12.0
└─┬ some-other-library@1.2.3
  └── request@1.9.9

This is, generally, great: now some-other-library has its own copy of request v1 that it can use, while not interfering with my package's v2 copy. Everyone's code works!

The Problem: Plugins

There's one use case where this falls down, however: plugins. A plugin package is meant to be used with another "host" package, even though it does not always directly use the host package. There are many examples of this pattern in the Node.js package ecosystem already, including Grunt plugins, Chai plugins, and Winston transports.

Even if you're not familiar with any of those use cases, surely you recall "jQuery plugins" from back when you were a client-side developer: little <script>s you would drop into your page that would attach things to jQuery.prototype for your later convenience.

In essence, plugins are designed to be used with host packages. But more importantly, they're designed to be used with particular versions of host packages. For example, versions 1.x and 2.x of my chai-as-promised plugin work with chai version 0.5, whereas versions 3.x work with chai 1.x. Or, in the faster-paced and less-semver–friendly world of Grunt plugins, version 0.3.1 of grunt-contrib-stylus works with grunt 0.4.0rc4, but breaks when used with grunt 0.4.0rc5 due to removed APIs.

As a package manager, a large part of npm's job when installing your dependencies is managing their versions. But its usual model, with a "dependencies" hash in package.json, clearly falls down for plugins. Most plugins never actually depend on their host package, i.e. grunt plugins never do require("grunt"), so even if plugins did put down their host package as a dependency, the downloaded copy would never be used. So we'd be back to square one, with your application possibly plugging in the plugin to a host package that it's incompatible with.

Even for plugins that do have such direct dependencies, probably due to the host package supplying utility APIs, specifying the dependency in the plugin's package.json would result in a dependency tree with multiple copies of the host package, which is not what you want. For example, let's pretend that winston-mail 0.2.3 specified "winston": "0.5.x" in its "dependencies" hash, since that's the latest version it was tested against. As an app developer, you want the latest and greatest stuff, so you look up the latest versions of winston and of winston-mail, putting them in your package.json as

{
  "dependencies": {
    "winston": "0.6.2",
    "winston-mail": "0.2.3"
  }
}

But now, running npm install results in the unexpected dependency graph of

├── winston@0.6.2
└─┬ winston-mail@0.2.3
  └── winston@0.5.11

I'll leave the subtle failures that come from the plugin using a different Winston API than the main application to your imagination.
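
The duplicated host package in the tree above can be spotted mechanically. Here is a minimal sketch, assuming the installed tree is available as a plain object; the { name, version, dependencies } shape and the root name "my-app" are made up for illustration and are not an npm data format.

```javascript
// Installed tree from the example above, written as a plain object.
// "my-app" and this object shape are assumptions for illustration only.
const installed = {
  name: "my-app",
  version: "1.0.0",
  dependencies: [
    { name: "winston", version: "0.6.2", dependencies: [] },
    {
      name: "winston-mail",
      version: "0.2.3",
      dependencies: [{ name: "winston", version: "0.5.11", dependencies: [] }],
    },
  ],
};

// Collect every version at which each package appears anywhere in the tree.
function versionsByName(node, seen = new Map()) {
  for (const dep of node.dependencies) {
    if (!seen.has(dep.name)) seen.set(dep.name, new Set());
    seen.get(dep.name).add(dep.version);
    versionsByName(dep, seen);
  }
  return seen;
}

// Report the packages that are installed at more than one version.
function duplicatedPackages(root) {
  const result = {};
  for (const [name, versions] of versionsByName(root)) {
    if (versions.size > 1) result[name] = [...versions].sort();
  }
  return result;
}

console.log(duplicatedPackages(installed));
```

For this tree the only package reported is winston, at 0.5.11 and 0.6.2: the application and the plugin are each talking to a different copy of the host.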

The Solution: Peer Dependencies

What we need is a way of expressing these "dependencies" between plugins and their host package. Some way of saying, "I only work when plugged in to version 1.2.x of my host package, so if you install me, be sure that it's alongside a compatible host." We call this relationship a peer dependency.

The peer dependency idea has been kicked around for literally years. After volunteering to get this done "over the weekend" nine months ago, I finally found a free weekend, and now peer dependencies are in npm!

Specifically, they were introduced in a rudimentary form in npm 1.2.0, and refined over the next few releases into something I'm actually happy with. Today Isaac packaged up npm 1.2.10 into Node.js 0.8.19, so if you've installed the latest version of Node, you should be ready to use peer dependencies!

As proof, I present you the results of trying to install jitsu 0.11.6 with npm 1.2.10:

npm ERR! peerinvalid The package flatiron does not satisfy its siblings' peerDependencies requirements!
npm ERR! peerinvalid Peer flatiron-cli-config@0.1.3 wants flatiron@~0.1.9
npm ERR! peerinvalid Peer flatiron-cli-users@0.1.4 wants flatiron@~0.3.0

As you can see, jitsu depends on two Flatiron-related packages, which themselves peer-depend on conflicting versions of Flatiron. Good thing npm was around to help us figure out this conflict, so it could be fixed in version 0.11.7!
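
To see why no single copy of flatiron can satisfy both peers, here is a simplified sketch of the check. It is not npm's implementation: it handles only three-part tilde ranges ("~X.Y.Z" read as at least X.Y.Z and below X.(Y+1).0), and the candidate host versions are hypothetical.

```javascript
// Parse "X.Y.Z" into an array of numbers.
function parse(version) {
  return version.split(".").map(Number);
}

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Simplified tilde range: "~X.Y.Z" means >= X.Y.Z and < X.(Y+1).0.
function satisfiesTilde(version, range) {
  const min = parse(range.slice(1));
  const max = [min[0], min[1] + 1, 0];
  const v = parse(version);
  return compare(v, min) >= 0 && compare(v, max) < 0;
}

// Peer requirements taken from the error output above.
const peers = [
  { name: "flatiron-cli-config@0.1.3", wants: "~0.1.9" },
  { name: "flatiron-cli-users@0.1.4", wants: "~0.3.0" },
];

// Return the peers whose requirement the given host version does not meet.
function unsatisfiedPeers(hostVersion, peerList) {
  return peerList.filter((p) => !satisfiesTilde(hostVersion, p.wants));
}

// Hypothetical candidate host versions: each one fails for one of the peers.
for (const candidate of ["0.1.9", "0.3.0"]) {
  const failing = unsatisfiedPeers(candidate, peers).map((p) => p.name);
  console.log(candidate, "fails for", failing);
}
```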

Using Peer Dependencies

Peer dependencies are pretty simple to use. When writing a plugin, figure out what version of the host package you peer-depend on, and add it to your package.json:

{
  "name": "chai-as-promised",
  "peerDependencies": {
    "chai": "1.x"
  }
}

Now, when installing chai-as-promised, the chai package will come along with it. And if later you try to install another Chai plugin that only works with 0.x versions of Chai, you'll get an error. Nice!

One piece of advice: peer dependency requirements, unlike those for regular dependencies, should be lenient. You should not lock your peer dependencies down to specific patch versions. It would be really annoying if one Chai plugin peer-depended on Chai 1.4.1, while another depended on Chai 1.5.0, simply because the authors were lazy and didn't spend the time figuring out the actual minimum version of Chai they are compatible with.

The best way to determine what your peer dependency requirements should be is to actually follow semver. Assume that only changes in the host package's major version will break your plugin. Thus, if you've worked with every 1.x version of the host package, use "~1.0" or "1.x" to express this. If you depend on features introduced in 1.5.2, use ">= 1.5.2 < 2".
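
The difference between pinned and lenient peer requirements can be sketched in a few lines. The range matcher below is a deliberately small stand-in for npm's semver handling: it understands only exact versions, "N.x", and comparator lists such as ">= 1.5.2 < 2", and it ignores pre-release tags. The list of published host versions is hypothetical.

```javascript
// Turn "1", "1.5", or "1.5.2" into [major, minor, patch], padding with zeros.
function parse(version) {
  const parts = version.split(".").map(Number);
  while (parts.length < 3) parts.push(0);
  return parts;
}

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Supports only the range forms used in this post:
// exact versions ("1.4.1"), "N.x", and comparator lists (">= 1.5.2 < 2").
function satisfies(version, range) {
  const v = parse(version);
  const xRange = /^(\d+)\.x$/.exec(range.trim());
  if (xRange) return v[0] === Number(xRange[1]);

  const comparator = /(>=|<=|>|<)?\s*(\d+(?:\.\d+){0,2})/g;
  let match;
  while ((match = comparator.exec(range)) !== null) {
    const diff = compare(v, parse(match[2]));
    const op = match[1] || "=";
    const ok =
      (op === ">=" && diff >= 0) ||
      (op === "<=" && diff <= 0) ||
      (op === ">" && diff > 0) ||
      (op === "<" && diff < 0) ||
      (op === "=" && diff === 0);
    if (!ok) return false;
  }
  return true;
}

// Host versions that every plugin's peer range accepts.
function compatibleHosts(hostVersions, peerRanges) {
  return hostVersions.filter((v) => peerRanges.every((r) => satisfies(v, r)));
}

// Hypothetical list of published host versions.
const chaiVersions = ["1.4.1", "1.5.0", "1.5.2", "1.6.0", "2.0.0"];

// Two plugins pinned to different patch versions leave no common host.
console.log(compatibleHosts(chaiVersions, ["1.4.1", "1.5.0"]));
// Two plugins with lenient ranges still share several host versions.
console.log(compatibleHosts(chaiVersions, ["1.x", ">= 1.5.2 < 2"]));
```

With the pinned requirements no host version is acceptable to both plugins, while the lenient pair still agrees on 1.5.2 and 1.6.0.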

Now go forth, and peer depend!

Mon, 27 Feb 2012 18:51:59 UTC - Dave Pacheco - npm

Photo by Luc Viatour (flickr)

Managing dependencies is a fundamental problem in building complex software. The terrific success of GitHub and npm has made code reuse especially easy in the Node world, where packages don't exist in isolation but rather as nodes in a large graph. The software is constantly changing (releasing new versions), and each package has its own constraints about what other packages it requires to run (dependencies). npm keeps track of these constraints, and authors express what kind of changes are compatible using semantic versioning, allowing authors to specify that their package will work with even future versions of its dependencies as long as the semantic versions are assigned properly.

This does mean that when you "npm install" a package with dependencies, there's no guarantee that you'll get the same set of code now that you would have gotten an hour ago, or that you would get if you were to run it again an hour later. You may get a bunch of bug fixes now that weren't available an hour ago. This is great during development, where you want to keep up with changes upstream. It's not necessarily what you want for deployment, though, where you want to validate whatever bits you're actually shipping.

Put differently, it's understood that all software changes incur some risk, and it's critical to be able to manage this risk on your own terms. Taking that risk in development is good because by definition that's when you're incorporating and testing software changes. On the other hand, if you're shipping production software, you probably don't want to take this risk when cutting a release candidate (i.e. build time) or when you actually ship (i.e. deploy time) because you want to validate whatever you ship.

You can address a simple case of this problem by only depending on specific versions of packages, allowing no semver flexibility at all, but this falls apart when you depend on packages that don't also adopt the same principle. Many of us at Joyent started wondering: can we generalize this approach?

Shrinkwrapping packages

That brings us to npm shrinkwrap[1]:

       npm-shrinkwrap -- Lock down dependency versions

       npm shrinkwrap

       This  command  locks down the versions of a package's dependencies so
       that you can control exactly which versions of each  dependency  will
       be used when your package is installed.

Let's consider package A:

{
    "name": "A",
    "version": "0.1.0",
    "dependencies": {
        "B": "<0.1.0"
    }
}

package B:

{
    "name": "B",
    "version": "0.0.1",
    "dependencies": {
        "C": "<0.1.0"
    }
}

and package C:

{
    "name": "C",
    "version": "0.0.1"
}

If these are the only versions of A, B, and C available in the registry, then a normal "npm install A" will install:

└─┬ B@0.0.1
  └── C@0.0.1

If B@0.0.2 is then published, a fresh "npm install A" will install:

└─┬ B@0.0.2
  └── C@0.0.1

assuming the new version did not modify B's dependencies. Of course, the new version of B could include a new version of C and any number of new dependencies. As we said before, if A's author doesn't want that, she could specify a dependency on B@0.0.1. But if A's author and B's author are not the same person, there's no way for A's author to say that she does not want to pull in newly published versions of C when B hasn't changed at all.
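
The reason a fresh install picks up B@0.0.2 is that the installer chooses the highest published version that satisfies the declared range. A minimal sketch of that selection follows; it handles only "<X.Y.Z" ranges, the form used in this example, and the registry contents are simulated with plain objects.

```javascript
// Parse "X.Y.Z" into an array of numbers.
function parse(version) {
  return version.split(".").map(Number);
}

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Highest version strictly below the limit in a "<X.Y.Z" range, or null.
function maxSatisfyingLessThan(versions, range) {
  const limit = parse(range.slice(1));
  const candidates = versions
    .filter((v) => compare(parse(v), limit) < 0)
    .sort((x, y) => compare(parse(x), parse(y)));
  return candidates.length ? candidates[candidates.length - 1] : null;
}

// Simulated registry contents before and after B@0.0.2 is published.
const before = { B: ["0.0.1"], C: ["0.0.1"] };
const after = { B: ["0.0.1", "0.0.2"], C: ["0.0.1"] };

console.log(maxSatisfyingLessThan(before.B, "<0.1.0")); // 0.0.1
console.log(maxSatisfyingLessThan(after.B, "<0.1.0")); // 0.0.2
```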

In this case, A's author can use

# npm shrinkwrap

This generates npm-shrinkwrap.json, which will look something like this:

{
    "name": "A",
    "dependencies": {
        "B": {
            "version": "0.0.1",
            "dependencies": {
                "C": { "version": "0.0.1" }
            }
        }
    }
}

The shrinkwrap command has locked down the dependencies based on what's currently installed in node_modules. When "npm install" installs a package with an npm-shrinkwrap.json file in the package root, the shrinkwrap file (rather than package.json files) completely drives the installation of that package and all of its dependencies (recursively). So now the author publishes A@0.1.0, and subsequent installs of this package will use B@0.0.1 and C@0.0.1, regardless of the dependencies and versions listed in A's, B's, and C's package.json files. If the authors of B and C publish new versions, they won't be used to install A because the shrinkwrap refers to older versions. Even if you generate a new shrinkwrap, it will still reference the older versions, since "npm shrinkwrap" uses what's installed locally rather than what's available in the registry.
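
The idea of deriving the lock file from what is installed, rather than from the registry, can be sketched as a recursive walk. This is an illustration only, not npm's code: the in-memory tree below stands in for the contents of node_modules, with B@0.0.1 and C@0.0.1 installed as in the first install shown earlier, and its object shape is an assumption.

```javascript
// In-memory stand-in for what is installed under node_modules for package A.
// The { version, dependencies } shape is an assumption for illustration.
const installedA = {
  name: "A",
  version: "0.1.0",
  dependencies: {
    B: {
      version: "0.0.1",
      dependencies: {
        C: { version: "0.0.1", dependencies: {} },
      },
    },
  },
};

// Record the exact version of every installed dependency, recursively.
function lockDependencies(deps) {
  const locked = {};
  for (const [name, pkg] of Object.entries(deps)) {
    locked[name] = { version: pkg.version };
    if (Object.keys(pkg.dependencies).length > 0) {
      locked[name].dependencies = lockDependencies(pkg.dependencies);
    }
  }
  return locked;
}

// Produce an object with the same overall shape as the file shown above.
function shrinkwrap(root) {
  return { name: root.name, dependencies: lockDependencies(root.dependencies) };
}

console.log(JSON.stringify(shrinkwrap(installedA), null, 4));
```

Because the walk never consults the registry, a newly published B@0.0.2 has no effect on the result until it is actually installed locally.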

Using shrinkwrapped packages

Using a shrinkwrapped package is no different than using any other package: you can "npm install" it by hand, or add a dependency to your package.json file and "npm install" it.

Building shrinkwrapped packages

To shrinkwrap an existing package:

  1. Run "npm install" in the package root to install the current versions of all dependencies.
  2. Validate that the package works as expected with these versions.
  3. Run "npm shrinkwrap", add npm-shrinkwrap.json to git, and publish your package.

To add or update a dependency in a shrinkwrapped package:

  1. Run "npm install" in the package root to install the current versions of all dependencies.
  2. Add or update dependencies. "npm install" each new or updated package individually and then update package.json.
  3. Validate that the package works as expected with the new dependencies.
  4. Run "npm shrinkwrap", commit the new npm-shrinkwrap.json, and publish your package.

You can still use npm outdated(1) to view which dependencies have newer versions available.

For more details, check out the full docs on npm shrinkwrap, from which much of the above is taken.

Why not just check node_modules into git?

One previously proposed solution is to "npm install" your dependencies during development and commit the results into source control. Then you deploy your app from a specific git SHA knowing you've got exactly the same bits that you tested in development. This does address the problem, but it has its own issues: for one, binaries are tricky because you need to "npm install" them to get their sources, but this builds the [system-dependent] binary too. You can avoid checking in the binaries and use "npm rebuild" at build time, but we've had a lot of difficulty trying to do this.[2] At best, this is second-class treatment for binary modules, which are critical for many important types of Node applications.[3]

Besides the issues with binary modules, this approach just felt wrong to many of us. There's a reason we don't check binaries into source control, and it's not just because they're platform-dependent. (After all, we could build and check in binaries for all supported platforms and operating systems.) It's because that approach is error-prone and redundant: error-prone because it introduces a new human failure mode where someone checks in a source change but doesn't regenerate all the binaries, and redundant because the binaries can always be built from the sources alone. An important principle of software version control is that you don't check in files derived directly from other files by a simple transformation.[4] Instead, you check in the original sources and automate the transformations via the build process.

Dependencies are just like binaries in this regard: they're files derived from a simple transformation of something else that is (or could easily be) already available: the name and version of the dependency. Checking them in has all the same problems as checking in binaries: people could update package.json without updating the checked-in module (or vice versa). Besides that, adding new dependencies has to be done by hand, introducing more opportunities for error (checking in the wrong files, not checking in certain files, inadvertently changing files, and so on). Our feeling was: why check in this whole dependency tree (and create a mess for binary add-ons) when we could just check in the package name and version and have the build process do the rest?

Finally, the approach of checking in node_modules doesn't really scale for us. We've got at least a dozen repos that will use restify, and it doesn't make sense to check that in everywhere when we could instead just specify which version each one is using. There's another principle at work here, which is separation of concerns: each repo specifies what it needs, while the build process figures out where to get it.

What if an author republishes an existing version of a package?

We're not suggesting deploying a shrinkwrapped package directly and running "npm install" to install from shrinkwrap in production. We already have a build process to deal with binary modules and other automateable tasks. That's where we do the "npm install". We tar up the result and distribute the tarball. Since we test each build before shipping, we won't deploy something we didn't test.

It's still possible to pick up newly published versions of existing packages at build time. We assume force publish is not that common in the first place, let alone force publish that breaks compatibility. If you're worried about this, you can use git SHAs in the shrinkwrap or even consider maintaining a mirror of the part of the npm registry that you use and require human confirmation before mirroring unpublishes.

Final thoughts

Of course, the details of each use case matter a lot, and the world doesn't have to pick just one solution. If you like checking in node_modules, you should keep doing that. We've chosen the shrinkwrap route because that works better for us.

It's not exactly news that Joyent is heavy on Node. Node is the heart of our SmartDataCenter (SDC) product, whose public-facing web portal, public API, Cloud Analytics, provisioning, billing, heartbeating, and other services are all implemented in Node. That's why it's so important to us to have robust components (like logging and REST) and tools for understanding production failures postmortem, profiling Node apps in production, and now managing Node dependencies. Again, we're interested to hear feedback from others using these tools.

Dave Pacheco blogs at dtrace.org.

[1] Much of this section is taken directly from the "npm shrinkwrap" documentation.

[2] We've had a lot of trouble with checking in node_modules with binary dependencies. The first problem is figuring out exactly which files not to check in (.o, .node, .dylib, .so, *.a, ...). When Mark went to apply this to one of our internal services, the "npm rebuild" step blew away half of the dependency tree because it ran "make clean", which in dependency ldapjs brings the repo to a clean slate by blowing away its dependencies. Later, a new (but highly experienced) engineer on our team was tasked with fixing a bug in our Node-based DHCP server. To fix the bug, we went with a new dependency. He tried checking in node_modules, which added 190,000 lines of code (to this repo that was previously a few hundred LOC). And despite doing everything he could think of to do this correctly and test it properly, the change broke the build because of the binary modules. So having tried this approach a few times now, it appears quite difficult to get right, and as I pointed out above, the lack of actual documentation and real world examples suggests others either aren't using binary modules (which we know isn't true) or haven't had much better luck with this approach.

[3] Like a good Node-based distributed system, our architecture uses lots of small HTTP servers. Each of these serves a REST API using restify. restify uses the binary module node-dtrace-provider, which gives each of our services deep DTrace-based observability for free. So literally almost all of our components are or will soon be depending on a binary add-on. Additionally, the foundation of Cloud Analytics is a pair of binary modules that extract data from DTrace and kstat. So this isn't a corner case for us, and we don't believe we're exceptional in this regard. The popular hiredis package for interfacing with redis from Node is also a binary module.

[4] Note that I said this is an important principle for software version control, not using git in general. People use git for lots of things where checking in binaries and other derived files is probably fine. Also, I'm not interested in proselytizing; if you want to do this for software version control too, go ahead. But don't do it out of ignorance of existing successful software engineering practices.

Sun, 01 May 2011 15:09:45 UTC - Isaac Schlueter - npm

npm 1.0 has been released. Here are the highlights:

  • Global vs local installation
  • ls displays a tree, instead of being a remote search
  • No more “activation” concept - dependencies are nested
  • Updates to link command
  • Install script cleans up any 0.x cruft it finds. (That is, it removes old packages, so that they can be installed properly.)
  • Simplified “search” command. One line per package, rather than one line per version.
  • Renovated “completion” approach
  • More help topics
  • Simplified folder structure

The focus is on npm being a development tool, rather than an apt-wannabe.

Installing it

To get the new version, run this command:

curl http://npmjs.org/install.sh | sh 

This will prompt to ask you if it’s ok to remove all the old 0.x cruft. If you want to not be asked, then do this:

curl http://npmjs.org/install.sh | clean=yes sh 

Or, if you want to not do the cleanup, and leave the old stuff behind, then do this:

curl http://npmjs.org/install.sh | clean=no sh 

A lot of people in the node community were brave testers and helped make this release a lot better (and swifter) than it would have otherwise been. Thanks :)

Code Freeze

npm will not have any major feature enhancements or architectural changes for at least 6 months. There are interesting developments planned that leverage npm in some ways, but it’s time to let the client itself settle. Also, I want to focus attention on some other problems for a little while.

Of course, bug reports are always welcome.

See you at NodeConf!
