Uncategorized

Tue, 04 Oct 2011 22:39:56 UTC - ryandahl - Uncategorized

Suppose you're writing a web server which does video encoding on each file upload. Video encoding is very much compute bound. Some recent blog posts suggest that Node.js would fail miserably at this.

Using Node does not mean that you have to write a video encoding algorithm in JavaScript (a language without even 64 bit integers) and crunch away in the main server event loop. The suggested approach is to separate the I/O bound task of receiving uploads and serving downloads from the compute bound task of video encoding. In the case of video encoding this is accomplished by forking out to ffmpeg. Node provides advanced means of asynchronously controlling subprocesses for work like this.

It has also been suggested that Node does not take advantage of multicore machines. Node has long supported load-balancing connections over multiple processes in just a few lines of code - in this way a Node server will use the available cores. In coming releases we'll make it even easier: just pass --balance on the command line and Node will manage the cluster of processes.

Node has a clear purpose: provide an easy way to build scalable network programs. It is not a tool for every problem. Do not write a ray tracer with Node. Do not write a web browser with Node. Do however reach for Node if tasked with writing a DNS server, DHCP server, or even a video encoding server.

By relying on the kernel to schedule and preempt computationally expensive tasks and to load balance incoming connections, Node appears less magical than server platforms that employ userland scheduling. So far, our focus on simplicity and transparency has paid off: the number of success stories from developers and corporations who are adopting the technology continues to grow.

Fri, 23 Sep 2011 19:45:50 UTC - ryandahl - Uncategorized

We announced back in July that with Microsoft's support Joyent would be porting Node to Windows. This effort is ongoing but I thought it would be nice to make a status report post about the new platform library libuv which has resulted from porting Node to Windows.

libuv's purpose is to abstract platform-dependent code in Node into one place where it can be tested for correctness and performance before bindings to V8 are added. Since Node is totally non-blocking, libuv turns out to be a rather useful library itself: a BSD-licensed, minimal, high-performance, cross-platform networking library.

We attempt to not reinvent the wheel where possible. The entire Unix backend sits heavily on Marc Lehmann's beautiful libraries libev and libeio. For DNS we integrated with Daniel Stenberg's C-Ares. For cross-platform build-system support we're relying on Chrome's GYP meta-build system.

The current implemented features are:

  • Non-blocking TCP sockets (using IOCP on Windows)
  • Non-blocking named pipes
  • UDP
  • Timers
  • Child process spawning
  • Asynchronous DNS via c-ares or uvgetaddrinfo.
  • Asynchronous file system APIs uv_fs*
  • High resolution time uv_hrtime
  • Current executable path look up uv_exepath
  • Thread pool scheduling uv_queue_work
The features we are working on still are

  • File system events (Currently supports inotify, ReadDirectoryChangesW and will support kqueue and event ports in the near future.) uv_fs_event_t
  • VT100 TTY uv_tty_t
  • Socket sharing between processes uv_ipc_t (planned API)
For complete documentation see the header file: include/uv.h. There are a number of tests in the test directory which demonstrate the API.

libuv supports Microsoft Windows operating systems since Windows XP SP2. It can be built with either Visual Studio or MinGW. Solaris 121 and later using GCC toolchain. Linux 2.6 or better using the GCC toolchain. Macinotsh Darwin using the GCC or XCode toolchain. It is known to work on the BSDs but we do not check the build regularly.

In addition to Node v0.5, a number of projects have begun to use libuv:

We hope to see more people contributing and using libuv in the future!

Thu, 08 Sep 2011 21:25:43 UTC - mcavage - Uncategorized

This post has been about 10 years in the making. My first job out of college was at IBM working on the Tivoli Directory Server, and at the time I had a preconceived notion that working on anything related to Internet RFCs was about as hot as you could get. I spent a lot of time back then getting "down and dirty" with everything about LDAP: the protocol, performance, storage engines, indexing and querying, caching, customer use cases and patterns, general network server patterns, etc. Basically, I soaked up as much as I possibly could while I was there. On top of that, I listened to all the "gray beards" tell me about the history of LDAP, which was a bizarre marriage of telecommunications conglomerates and graduate students. The point of this blog post is to give you a crash course in LDAP, and explain what makes ldapjs different. Allow me to be the gray beard for a bit...

What is LDAP and where did it come from?

Directory services were largely pioneered by the telecommunications companies (e.g., AT&T) to allow fast information retrieval of all the crap you'd expect would be in a telephone book and directory. That is, given a name, or an address, or an area code, or a number, or a foo support looking up customer records, billing information, routing information, etc. The efforts of several telcos came to exist in the X.500 standard(s). An X.500 directory is one of the most complicated beasts you can possibly imagine, but on a high note, there's probably not a thing you can imagine in a directory service that wasn't thought of in there. It is literally the kitchen sink. Oh, and it doesn't run over IP (it's actually on the OSI model).

Several years after X.500 had been deployed (at telcos, academic institutions, etc.), it became clear that the Internet was "for real." LDAP, the "Lightweight Directory Access Protocol," was invented to act purely as an IP-accessible gateway to an X.500 directory.

At some point in the early 90's, a graduate student at the University of Michigan (with some help) cooked up the "grandfather" implementation of the LDAP protocol, which wasn't actually a "gateway," but rather a stand-alone implementation of LDAP. Said implementation, like many things at the time, was a process-per-connection concurrency model, and had "backends" (aka storage engine) for the file system and the Unix DB API. At some point the Berkeley Database (BDB) was put in, and still remains the de facto storage engine for most LDAP directories.

Ok, so some a graduate student at UM wrote an LDAP server that wasn't a gateway. So what? Well, that UM code base turns out to be the thing that pretty much every vendor did a source license for. Those graduate students went off to Netscape later in the 90's, and largely dominated the market of LDAP middleware until Active Directory came along many years later (as far as I know, Active Directory is "from scratch", since while it's "almost" LDAP, it's different in a lot of ways). That Netscape code base was further bought and sold over the years to iPlanet, Sun Microsystems, and Red Hat (I'm probably missing somebody in that chain). It now lives in the Fedora umbrella as '389 Directory Server.' Probably the most popular fork of that code base now is OpenLDAP.

IBM did the same thing, and the Directory Server I worked on was a fork of the UM code too, but it heavily diverged from the Netscape branches. The divergence was primarily due to: (1) backing to DB2 as opposed to BDB, and (2) needing to run on IBM's big iron like OS/400 and Z series mainframes.

Macro point is that there have actually been very few "fresh" implementations of LDAP, and it gets a pretty bad reputation because at the end of the day you've got 20 years of "bolt-ons" to grad student code. Oh, and it was born out of ginormous telcos, so of course the protocol is overly complex.

That said, while there certainly is some wacky stuff in the LDAP protocol itself, it really suffered from poor and buggy implementations more than the fact that LDAP itself was fundamentally flawed. As engine yard pointed out a few years back, you can think of LDAP as the original NoSQL store.

LDAP: The Good Parts

So what's awesome about LDAP? Since it's a directory system it maintains a hierarchy of your data, which as an information management pattern aligns with a lot of use case (the quintessential example is white pages for people in your company, but subscriptions to SaaS applications, "host groups" for tracking machines/instances, physical goods tracking, etc., all have use cases that fit that organization scheme). For example, presumably at your job you have a "reporting chain." Let's say a given record in LDAP (I'll use myself as a guinea pig here) looks like:

    firstName: Mark
    lastName: Cavage
    city: Seattle
    uid: markc
    state: Washington
    mail: mcavagegmailcom
    phone: (206) 555-1212
    title: Software Engineer
    department: 123456
    objectclass: joyentPerson
The record for me would live under the tree of engineers I report to (and as an example some other popular engineers under said vice president) would look like:

                   uid=david
                    /
               uid=bryan
            /      |      \
      uid=markc  uid=ryah  uid=isaacs
Ok, so we've got a tree. It's not tremendously different from your filesystem, but how do we find people? LDAP has a rich search filter syntax that makes a lot of sense for key/value data (far more than tacking Map Reduce jobs on does, imo), and all search queries take a "start point" in the tree. Here's an example: let's say I wanted to find all "Software Engineers" in the entire company, a filter would look like:

     (title="Software Engineer")
And I'd just start my search from 'uid=david' in the example above. Let's say I wanted to find all software engineers who worked in Seattle:

     (&(title="Software Engineer")(city=Seattle))
I could keep going, but the gist is that LDAP has "full" boolean predicate logic, wildcard filters, etc. It's really rich.

Oh, and on top of the technical merits, better or worse, it's an established standard for both administrators and applications (i.e., most "shipped" intranet software has either a local user repository or the ability to leverage an LDAP server somewhere). So there's a lot of compelling reasons to look at leveraging LDAP.

ldapjs: Why do I care?

As I said earlier, I spent a lot of time at IBM observing how customers used LDAP, and the real items I took away from that experience were:

  • LDAP implementations have suffered a lot from never having been designed from the ground up for a large number of concurrent connections with asynchronous operations.
  • There are use cases for LDAP that just don't always fit the traditional "here's my server and storage engine" model. A lot of simple customer use cases wanted an LDAP access point, but not be forced into taking the heavy backends that came with it (they wanted the original gateway model!). There was an entire "sub" industry for this known as "meta directories" back in the late 90's and early 2000's.
  • Replication was always a sticking point. LDAP vendors all tried to offer a big multi-master, multi-site replication model. It was a lot of "bolt-on" complexity, done before the CAP theorem was written, and certainly before it was accepted as "truth."
  • Nobody uses all of the protocol. In fact, 20% of the features solve 80% of the use cases (I'm making that number up, but you get the idea).

For all the good parts of LDAP, those are really damned big failing points, and even I eventually abandoned LDAP for the greener pastures of NoSQL somewhere along the way. But it always nagged at me that LDAP didn't get it's due because of a lot of implementation problems (to be clear, if I could, I'd change some aspects of the protocol itself too, but that's a lot harder).

Well, in the last year, I went to work for Joyent, and like everyone else, we have several use problems that are classic directory service problems. If you break down the list I outlined above:

  • Connection-oriented and asynchronous: Holy smokes batman, node.js is a completely kick-ass event-driven asynchronous server platform that manages connections like a boss. Check!
  • Lots of use cases: Yeah, we've got some. Man, the sinatra/express paradigm is so easy to slap over anything. How about we just do that and leave as many use cases open as we can. Check!
  • Replication is hard. CAP is right: There are a lot of distributed databases out vying to solve exactly this problem. At Joyent we went with Riak. Check!
  • Don't need all of the protocol: I'm lazy. Let's just skip the stupid things most people don't need. Check!

So that's the crux of ldapjs right there. Giving you the ability to put LDAP back into your application while nailing those 4 fundamental problems that plague most existing LDAP deployments.

The obvious question is how it turned out, and the answer is, honestly, better than I thought it would. When I set out to do this, I actually assumed I'd be shipping a much smaller percentage of the RFC than is there. There's actually about 95% of the core RFC implemented. I wasn't sure if the marriage of this protocol to node/JavaScript would work out, but if you've used express ever, this should be really familiar. And I tried to make it as natural as possible to use "pure" JavaScript objects, rather than requiring the developer to understand ASN.1 (the binary wire protocol) or the LDAP RFC in detail (this one mostly worked out; ldap_modify is still kind of a PITA).

Within 24 hours of releasing ldapjs on Twitter, there was an implementation of an address book that works with Thunderbird/Evolution, by the end of that weekend there was some slick integration with CouchDB, and ldapjs even got used in one of the node knockout apps. Off to a pretty good start!

The Road Ahead

Hopefully you've been motivated to learn a little bit more about LDAP and try out ldapjs. The best place to start is probably the guide. After that you'll probably need to pick up a book from back in the day. ldapjs itself is still in its infancy; there's quite a bit of room to add some slick client-side logic (e.g., connection pools, automatic reconnects), easy to use schema validation, backends, etc. By the time this post is live, there will be experimental dtrace support if you're running on Mac OS X or preferably Joyent's SmartOS (shameless plug). And that nagging percentage of the protocol I didn't do will get filled in over time I suspect. If you've got an interest in any of this, send me some pull requests, but most importantly, I just want to see LDAP not just be a skeleton in the closet and get used in places where you should be using it. So get out there and write you some LDAP.

Mon, 29 Aug 2011 15:30:41 UTC - ryandahl - Uncategorized

  • Superfeedr released a Node XMPP Server. "Since astro had been doing an amazing work with his node-xmpp library to build Client, Components and even Server to server modules, the logical next step was to try to build a Client to Server module so that we could have a full blown server. That’s what we worked on the past couple days, and it’s now on Github!
  • Joyent's Mark Cavage released LDAP.js. "ldapjs is a pure JavaScript, from-scratch framework for implementing LDAP clients and servers in Node.js. It is intended for developers used to interacting with HTTP services in node and express.
  • Microsoft's Tomasz Janczuk released iisnode "The iisnode project provides a native IIS 7.x module that allows hosting of node.js applications in IIS.

    Scott Hanselman posted a detailed walkthrough of how to get started with iisnode

← Page 1

Page 3 →