Announcing Nodul.es: CPAN for Node.js

Last weekend, our team, “Ponies for Orphans”, participated in the Node Knockout competition. The team was made up of three of my co-workers from Cloudkick (Russell, Tomaz, and Logan) and myself. In 48 hours, we had to build a project based on Node.js.

We were brainstorming ideas before the competition, thinking about all the cool things we could do; we even planned out some multiplayer game ideas. We quickly figured out that none of us had done anything extensive with Canvas or SVG, and that the existing 3rd party libraries aren’t very comprehensive, with the possible exception of Processing.js. We also wanted something that would continue to be used after the competition. We refocused on projects that would suit a team of backend programmers, and eventually settled on Nodul.es:

Nodul.es: CPAN for Node.js

Nodul.es is a web-based view of the NPM package repository for Node.js. Our goal was simple: implement what we liked about CPAN for Perl and Python’s PyPI in 48 hours of coding.

Currently you can browse by:

Let’s look at an example of a module page; Tim Smart’s node-compress module is a good one. We pull metadata from the NPM repository, fetch the latest commit from Github, and find all modules that depend on it.
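Finding every module that depends on a given one amounts to inverting the dependency map. A minimal sketch, assuming each package record has a `name` and a `dependencies` object keyed by module name (the record shape, the `buildReverseDependencies` helper, and the sample package names are hypothetical, not the actual Nodul.es schema):

```javascript
// Build a reverse index: module name -> sorted list of packages that depend on it.
// Assumes each record looks like { name: 'pkg', dependencies: { dep: '>=0.1' } }.
function buildReverseDependencies(packages) {
  const dependents = new Map();
  for (const pkg of packages) {
    for (const dep of Object.keys(pkg.dependencies || {})) {
      if (!dependents.has(dep)) {
        dependents.set(dep, []);
      }
      dependents.get(dep).push(pkg.name);
    }
  }
  for (const list of dependents.values()) {
    list.sort();
  }
  return dependents;
}

// Hypothetical sample data, for illustration only.
const samplePackages = [
  { name: 'webapp', dependencies: { compress: '*', router: '*' } },
  { name: 'proxy', dependencies: { compress: '*' } },
  { name: 'compress', dependencies: {} },
];

const reverseIndex = buildReverseDependencies(samplePackages);
console.log(reverseIndex.get('compress')); // [ 'proxy', 'webapp' ]
```

In a real deployment this kind of lookup would more likely be a query against the datastore than an in-memory map; the sketch only shows the shape of the computation.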

Internals of Nodul.es

Nodul.es is built around Node.js, using its asynchronous abilities extensively.

We split the system into 3 main components:

  • Indexer:  Indexes the raw data about packages from the NPM Registry.  This is just a raw JSON dump from NPM’s CouchDB backend.
  • Source Downloader: Downloads the latest releases of all NPM modules, and extracts them so we can get extra metadata out about the module.
  • Webapp: The simple part, pulls data out of our datastore, and displays html pages to end users.

All of these services interact with MongoDB, which stores all of the indexed data and provides ways to get it back out for webpages.

We also used several external dependencies in building Nodul.es:

  • async – For flow control of asynchronous operations.
  • clutch – For URL routing inside the webapp.
  • Mu – For HTML Templating in the webapp.
  • paperboy – For static file serving (i.e., CSS/JavaScript) in the media subdirectory.
  • prettify – For code highlighting, for a feature not released!
  • sprintf – For string formatting, in the logs, nice logs are good.

What’s next for Nodul.es

We built Nodul.es in 48 hours, and until the voting is over, we aren’t allowed to change it. But we have a ton of partially completed features that we had to pull because we didn’t want to ship anything broken or incomplete. They include:

  • Source Browser:  We want to provide a similar source browsing experience to CPAN in this respect, letting you quickly see how someone is doing something.  We already have most of the infrastructure for this, because we have downloaded the source tarballs.
  • Sitemaps:  We are adding Sitemaps, so that all search engines can find the modules easily.  Currently finding modules is an odd combination of using command line tools or getting lucky with a web search.
  • More Github integration: The vast majority of Node.js modules are hosted on Github, so we want to do things like show module development activity, and use that to provide sorts on things like Category pages.
  • Your ideas: Nodul.es is open source.  We want to make it the best module browser for any language out there.  Submit ideas, submit pull requests, let’s get going!
Posted in Uncategorized | 2 Comments

Writing Node.js Native Extensions

Have a big blog post over on the Cloudkick Blog about Writing Node.js Native Extensions.


Async TLS

We started discussing TLS in Node.js at the meetup in Palo Alto tonight.

Let’s imagine you wanted to implement SSL/TLS in an asynchronous framework, like Node.js.

For the sake of discussion, I will be using OpenSSL as an example.  At least as far as I know, these issues also apply equally to GnuTLS or NSS. I would be happy to be wrong!

The Goal

The goal is to provide both a TLS Client and Server API, allowing high level code to determine many of the common behaviors you need to hook to provide a powerful TLS Platform.  This includes basics like verification of certificate chains, but should also include: SSL Session Caching, OCSP stapling, SNI Validation, SPDY Protocol hinting, and more.

The Problem

OpenSSL can decouple IO operations from sockets, using the BIO abstraction.  This means your process can handle the actual socket, and its buffers, which is good for Node.js, and for most other asynchronous systems that don’t want to block for SSL to do work.

While the IO operations have a good abstraction in OpenSSL, many other common operations rely upon a callback.

For example, let’s consider the OpenSSL SSL Session Cache API:

SSL_CTX_sess_set_new_cb(ctx,    ssl_callback_NewSessionCacheEntry);
SSL_CTX_sess_set_get_cb(ctx,    ssl_callback_GetSessionCacheEntry);
SSL_CTX_sess_set_remove_cb(ctx, ssl_callback_DelSessionCacheEntry);

It is a basic caching API. You have 3 functions for caching an SSL Session object: adding a new one, reading an existing one, and deleting one.

If you examine the function signature for the get function, it returns an SSL_SESSION object directly, meaning when you return from the function you must either have the correct session, or return NULL to indicate a cache miss:

SSL_SESSION *ssl_callback_GetSessionCacheEntry(SSL *ssl,
                                               unsigned char *id,
                                               int idlen, int *do_copy)
{
  /* Your SSL Session cache goes here! */
  return NULL;
}

The difficulty for async systems here is that they most likely want to perform file IO, network IO, or potentially other operations that go outside the current C stack in order to fetch the Session.

In Node.js’ case, this means you cannot provide a callback that works the way users expect in Node: they expect to be able to make an async call, and then notify the caller when they have found the data.

In an ideal world, the Node.js api would look something like the following:

// Hypothetical API: the cache lookup is asynchronous and reports back via result_callback.
var sslctx = crypto.createContext({key: privateKey, cert: certificate,
  session_cache_get: function(session_id, result_callback) {
    memcached.get(session_id, function(data, err) {
      result_callback(data, err);
    });
  }
});
var server = http.createServer(..);
server.setSecure(sslctx);
server.listen(8443);

We started talking through the ideas. How could you accomplish this API for TLS in Node?

This cannot work with the standard OpenSSL callbacks because of how Node.js works. After the initial cache get call returned undefined, we would unwind up the C-stack, and we would have no way to notify OpenSSL later on that we got a cached Session from memcached.
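The mismatch can be seen without OpenSSL at all. In this sketch, `syncGetSession` stands in for the C get callback, which has to return a value before the stack unwinds, and `fakeMemcachedGet` is a made-up asynchronous lookup (both names, and the sample session id, are hypothetical). The synchronous caller always sees a miss, because the answer arrives on a later turn of the event loop:

```javascript
// Hypothetical async store: delivers the value on a later event-loop turn.
const store = new Map([['abc123', 'session-data']]);
function fakeMemcachedGet(id, callback) {
  setImmediate(() => callback(null, store.get(id)));
}

// Stand-in for the OpenSSL-style get callback: it must return right now.
const lateResults = [];
function syncGetSession(id) {
  let found = null;
  fakeMemcachedGet(id, (err, data) => {
    // Runs only after syncGetSession has already returned.
    found = data;
    lateResults.push(data);
  });
  return found; // still null: reported to the caller as a cache miss
}

const immediate = syncGetSession('abc123');
console.log(immediate); // null
setImmediate(() => {
  console.log(lateResults); // [ 'session-data' ]
});
```

The data does show up, just too late for a caller that needed it as a return value.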

Possible Hacks

There are a few more hackish ways to solve this. They include:

  • Using Co-routines from C. Something like libtask could be used to jump out of the OpenSSL stack, back down to Node.js, and it could resume again once we got the response for the session.
  • Running every SSL Context inside a dedicated thread.  When a callback is invoked, dispatch a message to the main thread, where Node.js will notify the waiting thread once it has an answer.  I think this is actually one of the easier solutions, but it kills the promise of an evented framework like Node.js, which is not having a 1:1 client to thread mapping.

The Rabbit Hole

Hey guys, what if we just implemented a TLS Protocol parser?

It wasn’t a new idea.  But as we talked through the idea of implementing a TLS protocol parser, while still using OpenSSL for all of the actual cryptography, it seemed to make more and more sense.  This would let an http-parser style API be used for TLS, which as far as any of us know, has not been done.  The parser could be written in C (or javascript, but that’s irrelevant). The TLS record protocol itself isn’t too complex: it consists of a few fixed width fields and a few optional fields. Most of the complexity comes from the implementation of all the cryptography, which none of us have an interest in replacing.
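To make the idea concrete, here is a rough sketch of what an http-parser style record splitter could look like in JavaScript. It relies only on the TLS record header layout (one byte of content type, two bytes of version, two bytes of length). The `TlsRecordParser` class and its `onRecord` callback are invented for illustration, and it does no decryption or validation at all:

```javascript
// Incremental splitter for TLS records; cryptography is left to another library.
const CONTENT_TYPES = {
  20: 'change_cipher_spec',
  21: 'alert',
  22: 'handshake',
  23: 'application_data',
};
const HEADER_LENGTH = 5; // type (1) + version (2) + length (2)

class TlsRecordParser {
  constructor(onRecord) {
    this.onRecord = onRecord;
    this.pending = Buffer.alloc(0);
  }

  // Feed any chunk of bytes from the socket; emits each complete record.
  execute(chunk) {
    this.pending = Buffer.concat([this.pending, chunk]);
    while (this.pending.length >= HEADER_LENGTH) {
      const length = this.pending.readUInt16BE(3);
      const total = HEADER_LENGTH + length;
      if (this.pending.length < total) {
        break; // wait for more data
      }
      this.onRecord({
        type: CONTENT_TYPES[this.pending[0]] || 'unknown',
        version: [this.pending[1], this.pending[2]],
        fragment: this.pending.subarray(HEADER_LENGTH, total),
      });
      this.pending = this.pending.subarray(total);
    }
  }
}

// Usage: one handshake record (TLS 1.0 is version 3.1) split across two chunks.
const records = [];
const parser = new TlsRecordParser((record) => records.push(record));
parser.execute(Buffer.from([22, 3, 1, 0, 4, 1, 2]));
parser.execute(Buffer.from([3, 4]));
console.log(records.length, records[0].type); // 1 handshake
```

The caller owns the socket and decides what to do with each record, including going off to do async work before feeding the next step, which is the property the OpenSSL callbacks lack.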

I am scared.  Reimplementing SSL or TLS just seems wrong.

But on the other hand, most SSL implementations are tightly coupled to their cryptographic libraries, GnuTLS perhaps being the least so, and these libraries were still designed before many evented style programming paradigms became popular.  It seems like there is a niche to be filled by a liberally licensed TLS record protocol parser library, which provides stubs to use OpenSSL (or another backend) for the actual cryptography, but bases everything on callbacks to user code.

Is this insane?


Overclocking mod_ssl

At Velocity, I saw Adam Langley give a great presentation entitled Overclocking SSL. Last week Adam posted a distilled version of the Overclocking SSL presentation on his blog.

He covers many topics for improving SSL performance. Unfortunately, his recommendations are decidedly focused on how Google runs their servers, and not a practical guide to how to improve your performance with a more standard Apache 2 and mod_ssl setup. Since I don’t work at Google, but I like my web servers to be fast, I decided to try as many of his recommendations as possible with mod_ssl.

Disclaimer: I am not a cryptanalyst. Be paranoid when you are messing with SSL; small mistakes can invalidate your entire security framework. Ask your local cryptanalyst about these changes!

Basic Configuration: Certificate Key Size

Google uses a 1024 bit RSA key for their encrypted websites. However, Certificate Authorities are no longer issuing new 1024 bit keys, because the CAB Forum has required them to be phased out at all levels. These small keys are believed to be insecure, so for practical purposes you will want a 2048 bit key. Do not use a 4096 bit key: the key operations are about 5 times slower. A 2048 bit key strikes the balance of speed and security.

The certificate key size doesn’t just affect how many CPU cycles are used for the calculations; the public versions of the keys are also sent to the client when it connects. I go into more detail about TCP round trips below, but if your certificate has a 4096 bit key, your clients need to download double the data to even get started.

Basic Configuration: Picking Ciphers

The SSLCipherSuite directive controls the ciphers that mod_ssl will negotiate with clients. The string parameter is complicated: it is a combination of aliases like ‘HIGH’ and ‘LOW’, old names, specific names, etc. To see what OpenSSL actually enables, you’ll want to use the `openssl ciphers` command.

This is what you get for the default configuration of mod_ssl:

$ openssl ciphers 'ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP'

DHE-RSA-AES256-SHA:DHE-DSS-AES256-SHA:AES256-SHA:DHE-RSA-AES128-SHA:
DHE-DSS-AES128-SHA:AES128-SHA:EDH-RSA-DES-CBC3-SHA:EDH-DSS-DES-CBC3-SHA:
DES-CBC3-SHA:DHE-RSA-SEED-SHA:DHE-DSS-SEED-SHA:SEED-SHA:RC4-SHA:RC4-MD5:
EDH-RSA-DES-CBC-SHA:EDH-DSS-DES-CBC-SHA:DES-CBC-SHA:DES-CBC3-MD5:
RC2-CBC-MD5:RC4-MD5:DES-CBC-MD5:EXP-EDH-RSA-DES-CBC-SHA:
EXP-EDH-DSS-DES-CBC-SHA:EXP-DES-CBC-SHA:EXP-RC2-CBC-MD5:EXP-RC4-MD5:
EXP-RC2-CBC-MD5:EXP-RC4-MD5

The exact list will depend upon your version of OpenSSL, but on most modern operating systems, the first cipher that will be tried is AES-256. AES-256 is without a doubt a more secure selection, but it isn’t what Google is using. They are using the older RC4 (aka ARC4) cipher, with SHA1 hashing. There have been many different attacks on RC4, many due to bad implementations, but as long as it is used correctly, it is still secure enough. The selection of a cipher is still a judgement call for your product, but RC4 is approximately 3x faster than AES-256 on most machines right now.

In Apache, let’s configure it to try to use RC4 with SHA1 hashing:

SSLCipherSuite RC4-SHA:AES128-SHA:ALL:!ADH:!EXP:!LOW:!MD5:!SSLV2:!NULL
SSLHonorCipherOrder on

The SSLHonorCipherOrder directive is used to force the server’s cipher choice on to the client.

Now let’s run the cipher suite string through `openssl ciphers`, so you can see the exact configurations that are being allowed:

$ openssl ciphers 'RC4-SHA:AES128-SHA:ALL:!ADH:!EXP:!LOW:!MD5:!SSLV2:!NULL'

RC4-SHA:AES128-SHA:DHE-RSA-SEED-SHA:DHE-DSS-SEED-SHA:SEED-SHA:
DHE-RSA-AES256-SHA:DHE-DSS-AES256-SHA:AES256-SHA:DHE-RSA-AES128-SHA:
DHE-DSS-AES128-SHA:EDH-RSA-DES-CBC3-SHA:EDH-DSS-DES-CBC3-SHA:DES-CBC3-SHA

This will use RC4 first, then fall back to AES-128, before going to the other, stronger ciphers. Compared to the defaults, it is significantly faster.
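As a rough model of what SSLHonorCipherOrder changes: without the directive the client’s preference order wins, and with it the server’s order wins. This is a sketch of the selection rule only, not how mod_ssl or OpenSSL is implemented, and `chooseCipher` is a made-up helper:

```javascript
// Pick the cipher both sides support, using one side's preference order.
function chooseCipher(serverCiphers, clientCiphers, honorServerOrder) {
  const preferred = honorServerOrder ? serverCiphers : clientCiphers;
  const other = new Set(honorServerOrder ? clientCiphers : serverCiphers);
  const match = preferred.find((cipher) => other.has(cipher));
  return match === undefined ? null : match;
}

// Server list starts like the SSLCipherSuite above; the client list is hypothetical.
const serverCiphers = ['RC4-SHA', 'AES128-SHA', 'AES256-SHA'];
const clientCiphers = ['AES256-SHA', 'AES128-SHA', 'RC4-SHA'];

console.log(chooseCipher(serverCiphers, clientCiphers, true));  // RC4-SHA
console.log(chooseCipher(serverCiphers, clientCiphers, false)); // AES256-SHA
```

This is why the directive matters here: a client that prefers AES-256 would otherwise pull the connection away from the faster cipher at the top of the server’s list.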

SSL Session Cache and Resumption

mod_ssl supports a pluggable backend for storing client sessions with the SSLSessionCache directive. On a single machine, the two most commonly used backends are shm and dbm. The shm backend is faster than dbm, and should be used in almost all cases.

However, as Adam noted, most people have more than one machine doing SSL Termination. This means a distributed SSL session cache is needed. I wrote the patch for mod_ssl to support a memcached SSL Session cache 3 years ago. This patch wasn’t backported, so you’ll need to use Apache 2.3.x, which is currently in Alpha. To configure it, just pass a list of memcached nodes:

SSLSessionCache memcache:10.0.0.1,10.0.0.2,10.0.0.3

Reducing Round Trips

The best tool to measure this is Wireshark, so you can see both the volume of data, and the round trips. The easy way to test with this is using the `openssl s_client` command. This command lets you easily create SSL connections, and tune various things on both the client and server.

Here is a truncated example of using s_client against encrypted.google.com:

$ openssl s_client -debug -tls1 -host encrypted.google.com -port 443
..... data dumps .....
---
SSL handshake has read 1893 bytes and written 285 bytes
---
New, TLSv1/SSLv3, Cipher is RC4-SHA
Server public key is 1024 bit
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : RC4-SHA
....


The interesting parts you can see here are both the negotiated ciphers, and the total bytes written by each side to establish the connection. The majority of the data sent by the server is from the size of the server certificate.

As Adam discussed in depth, because many certificates have a chain, and most are at least 2048 bits long, it is very easy for a new TCP connection to overflow your initial TCP window. Your goal is to make sure you are sending the correct chain, without sending extra or irrelevant certificates. Here is an example of www.cloudkick.com, which uses the GoDaddy CA, and an intermediate certificate:

$ openssl s_client -tls1 -host www.cloudkick.com -port 443 -debug

---
Certificate chain
 0 s:/O=*.cloudkick.com
      /OU=Domain Control Validated
      /CN=*.cloudkick.com
   i:/C=US
     /ST=Arizona
     /L=Scottsdale
     /O=GoDaddy.com, Inc.
     /OU=http://certificates.godaddy.com/repository
     /CN=Go Daddy Secure Certification Authority
     /serialNumber=07969287
 1 s:/C=US
       /ST=Arizona
       /L=Scottsdale
       /O=GoDaddy.com, Inc.
       /OU=http://certificates.godaddy.com/repository
       /CN=Go Daddy Secure Certification Authority
       /serialNumber=07969287
   i:/C=US
     /O=The Go Daddy Group, Inc.
     /OU=Go Daddy Class 2 Certification Authority
---
...............
---
SSL handshake has read 2974 bytes and written 422 bytes
---
...............


In this case, the server sent both the certificate for *.cloudkick.com and the Go Daddy intermediate certificate. Try as we might, the server in this case had to send 2974 bytes to get started, over 1000 bytes more than what encrypted.google.com needed. This is just a reality of using a chain certificate and 2048 bit keys. Just make sure you aren’t sending extra certificates, and keep your data below 4kb to prevent an ACK being needed in the small windows as TCP connections are being started.
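If you want to keep an eye on this number over time, the summary line from `openssl s_client` is easy to pick apart. A small sketch, assuming the command’s output has already been captured into a string; `parseHandshakeBytes` and the `BUDGET_BYTES` constant are illustrative names, with the budget taken from the 4kb guideline above:

```javascript
// Extract the byte counts from the "SSL handshake has read ..." summary line.
function parseHandshakeBytes(output) {
  const match = /SSL handshake has read (\d+) bytes and written (\d+) bytes/.exec(output);
  if (!match) {
    return null;
  }
  return { read: Number(match[1]), written: Number(match[2]) };
}

// Budget taken from the 4kb guideline in the text.
const BUDGET_BYTES = 4096;

const sample = [
  '---',
  'SSL handshake has read 2974 bytes and written 422 bytes',
  '---',
].join('\n');

const stats = parseHandshakeBytes(sample);
console.log(stats.read, stats.read <= BUDGET_BYTES); // 2974 true
```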

OCSP Stapling

One of the biggest problems with the existing SSL infrastructure is that validating the status of a certificate is hard and slow. OCSP Stapling doesn’t make it easier to understand, but it does at least make it faster. OCSP stapling support was originally funded from a grant by Mozilla. It has been added to Apache httpd 2.3, so you’ll need to download that alpha release in order to use it.

OCSP Stapling takes the Certificate Authority’s OCSP response and bundles it in the initial response to the client. This OCSP response is a cryptographic signature verifying your certificate is still valid for X days. This means the client doesn’t need to resolve another DNS name, and hit another service just to validate your certificate.

In Apache 2.3 and above, the configuration to enable OCSP Stapling is quite simple; just put these directives in your global scope:

SSLUseStapling on
SSLStaplingCache "shmcb:logs/stapling_cache(128000)"

You can test OCSP stapling using the `openssl s_client` command again and the -status parameter:

$ openssl s_client -host encrypted.google.com -port 443 -tls1  -tlsextdebug  -status
....
OCSP response: no response sent
....


Even Google hasn’t enabled OCSP stapling yet!

If OCSP stapling was enabled, you would see something like this as the output:

OCSP response:
======================================
OCSP Response Data:
    OCSP Response Status: successful (0x0)
    Response Type: Basic OCSP Response
    Version: 1 (0x0)
    Responder Id: C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.",
                           OU = http://certs.godaddy.com/repository/,
                           CN = Go Daddy Validation Authority
    Produced At: Jul 10 17:18:44 2010 GMT
    Responses:
    Certificate ID:
      Hash Algorithm: sha1
      Issuer Name Hash: 70292276537F1ABC8FD53C9484E914CB762A052A
      Issuer Key Hash: FDAC6132936C45D6E2EE855F9ABAE7769968CCE7
      Serial Number: 047C0A27B3C295
    Cert Status: good
    This Update: Jul 10 14:15:00 2010 GMT
    Next Update: Jul 10 23:18:44 2010 GMT


Here my server provided a signature from Go Daddy, saying that my certificate was valid for at least another 5 hours.
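The remaining validity is just the gap between the current time and the Next Update field. A quick sketch of that arithmetic, using the Produced At time from the output above as the reference point (`hoursUntilNextUpdate` is a hypothetical helper):

```javascript
// Whole hours between a reference time and the response's Next Update field.
function hoursUntilNextUpdate(nextUpdate, now) {
  const msPerHour = 60 * 60 * 1000;
  return Math.floor((nextUpdate.getTime() - now.getTime()) / msPerHour);
}

// Timestamps copied from the response above (months are zero-based in Date.UTC).
const producedAt = new Date(Date.UTC(2010, 6, 10, 17, 18, 44));
const nextUpdate = new Date(Date.UTC(2010, 6, 10, 23, 18, 44));

console.log(hoursUntilNextUpdate(nextUpdate, producedAt)); // 6
```

A client connecting some time after the response was produced would see correspondingly less.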

False Start, Snap Start and Next Protocol Extensions

Google has proposed a series of extensions and modifications to the TLS protocol in order to reduce round trips, both at the initial negotiation, and when to start sending client data.

TLS False Start is mostly a client change, but even if you wanted to implement the proposed server false start, it really depends upon OpenSSL updates to support it. The only recommendation here is to not use ancient versions of OpenSSL — which is important anyways because of the SSL Renegotiation attacks discovered last year.

The Snap Start proposal will need server support, but no released version of OpenSSL supports it yet.

Next Protocol Negotiation Extension lets the client tell the server that it is going to change protocols once the SSL negotiation finishes. Conceptually to me this is similar to Server Name Indication, where the client is leaking application logic to the SSL layer. This will make upgrades to the SPDY protocol faster, but again there is not a released version of OpenSSL with support yet.

The missing patches

  • Adam mentions a patch reducing OpenSSL’s default buffer allocations from 50kb to 5kb, and suggests the Tor project has a similar patch. I have been unable to find it.
  • I was unable to find any patches for the Next Protocol Negotiation Extension.

Closing

Hopefully your mod_ssl site is faster after all of this, but if you have any recommendations or ideas to improve it further, please let me know!


The Illusion of Stability

Back at the May 2010 Board meeting of the Apache Software Foundation, there was a discussion about releases.  It got me thinking about how my own use of many open source projects has changed.

The Past – Long Cycles, few releases, software ships on physical media

Myst pushed the limits by being shipped on a CD-ROM instead of floppy disks, and Riven followed by pushing the adoption of DVDs, shipping on 1 DVD or 5 CDs.  Physical media kept accelerating up to a point; nowadays most software can be downloaded.  Even for game consoles, previously one of the last barriers for things like patches, games like Call of Duty: Modern Warfare 2 have a half dozen post-gold master patches, pushed down to internet connected consoles.

If you look at the development of software over the last 20 years, one of the biggest changes for many products is the shift in distribution. Lots of people talk about Software as a Service, but really you just need to look at software on desktops: products like Google Chrome automatically apply updates without bothering the user, and most products ship with an auto-update mechanism at a minimum.

But the fundamental difference in this is a shift in the software development and release models, that the software distribution systems have finally caught up to.

Release Cycles

I view Ubuntu as one of the first large projects to recognize this shift and embrace it.  Many large-scale projects in the mid-2000s had massive problems tracking dependencies, and synchronizing anything resembling a stable final product was a challenge.  Ubuntu’s model of picking a date, and shipping whatever was stable at that point shifted the responsibility model for stability.  Other projects have done this before Ubuntu, but Ubuntu has stuck to it and exposed so many more people to the model.  Every 6 months, Ubuntu drew a line in the sand, and whatever was stable before that date, became the next Ubuntu release.

This meant you didn’t just wait for the new release of GNOME or KDE and then try to stabilize everything. You certainly hoped for dependencies to add new features before your dates, but if they missed it, they would go into the next release.  Compare this to the traditional Linux distribution: multiple rounds of betas to squash out all the integration pain of bringing together thousands of dependencies into a stable final product.

But I don’t make a Linux distribution!

Most  software projects I’ve worked on have had large dependencies on Open Source Software.  Not thousands of projects like a Linux distribution. Some only had a few dependencies. Others had a few dozen projects that they were directly built on top of.  In the past, you took a recent stable release and built packages of it or pulled it into a vendor branch.  There was an expectation that releases were… Stable and would get maintenance patches for serious bugs.

At Bloglines, the product was built on top of 30+ open source projects, from BerkeleyDB, to libcurl, to Clearsilver.  We tried to take the stable releases, and knit it all together into something that worked. For the most part we were successful.  However, we patched lots of projects. Some were patches we pushed upstream, others were Bloglines specific modifications, but we thought it was okay: we were taking a stable version of Foo and applying a few patches.  We knew upgrading to the next version of Foo might be painful, but there normally was documentation explaining what changed.

The End of releases

But what the Apache Board meeting got me thinking about was our dependencies at Cloudkick.  We use a ton of Python at Cloudkick. For a few projects, like Twisted and Python itself, we generally use a stable release, and it’s fine and dandy.  But for many of our more esoteric dependencies like txAMQP, libcloud, scribe, a few Django applications, oauth, sales force libraries, etc, we are using snapshots, mostly from someone’s GitHub repository.

I am grateful for the projects we build on, and we try to contribute back to them whenever we can, but it is no longer taking a stable release and making a few local modifications — we are lucky if a project has releases at all, let alone stable releases!

I don’t think it’s the fault of things like GitHub. They have download areas, and some projects use them, but the majority don’t.  They give you a git url, and it’s up to you to pick a ‘stable’ point in time, and hope for the best.

What did those releases provide anyways?

On some levels, I miss stable releases.  They made me feel good. They let me judge at face value that some programmer I will probably never meet in person felt good enough about some code to call it ‘stable’.  But the reality was, in any code I’ve pushed hard, I’ve found bugs, and then I patched those bugs or added new features.

Code that wasn’t pushed hard, it probably didn’t matter if someone else thought it was stable, it was good or bad, it worked or it didn’t.

Those releases from someone else provided me with the illusion of a stable product, something I could build upon. But the truth is, when I look hard at the projects I thought were stable, we ended up patching some of those the most!

These newer paradigms for software releases mean you can move unbelievably quickly, bringing together diverse pieces of software and building communities around new software faster than ever.  I look at the Node.JS modules page, and I’m blown away.  There are many projects that probably should be added to that list, and many removed, but the sheer number of projects, most of them only a few months old, exploding in popularity, is all enabled because the expectations for an open source product have changed.

You no longer wait for the once a year stable release of any dependency, you grab the snapshot from GitHub, find the author on IRC or twitter if there is a problem, patch it locally, submit a pull request, and then keep on building your product.

This model does bring other problems, many of them core to traditional thought at the Apache Software Foundation. Code pulled from SVN trunk isn’t vetted in the same way at the ASF.  Many of the younger ASF projects have had more trouble making releases, and it is difficult for the slow moving foundation to understand why releases are not a higher priority.  I believe this lack of stable products definitely has hurt the Ruby/Rails community in the last few years too.

We should embrace this change. We are all developing software at breakneck speeds. Software has always been unstable, nowadays we are just honest enough to admit it. What we need is better tooling, not just a distributed version control, but more on the deployment and packaging side for most web applications. The tools have not caught up to changes in development and dependency philosophy, when most sites are still deployed with some variation of a hacked together shell script.

———————–

Thanks to Geoff for giving feedback on this post.


Velocity Ignite

Gave an Ignite Velocity talk tonight at Velocity about Apache Libcloud.

The Ignite format is 20 slides, automatically advancing every 15 seconds — I think I did okay, though I had a few slides where I needed better timing without a doubt, but I was happy enough with it for my first presentation in that format.  It definitely makes you keep moving!

It’s different compared to the more traditional slideless “Lightning Talks”. The slides can be a hindrance if your timing is off for a topic, but I liked that the format kept you on track, while most lightning talks tend to get cut off early.

Anyways, I’ve posted slides for Apache Libcloud @ Velocity Ignite.


Drinking the Node.js Kool-Aid

The Past and Present

I’ve written dozens of event loops for network services, in C, C++, Python, Perl, Java, Lua, Go and probably other languages at this point.  They all make me reinvent handling of events, none of them are perfect, some are faster than others, but in the end, it is a waste of my time to rewrite them.

My recent favorite has been a combination of C for the low level event loop, and higher level Lua to provide scripting of event handlers. This is what the Reconnoiter Monitoring system, the Cloudkick Agent, and some proposals for Apache HTTP Server 3.0 are all built on.  It generally gives you good performance, with the ability to bind down to EPoll or KQueue, and Lua is lightweight enough that your processes don’t get bogged down on the memory side.  But as Brian Akins was musing this week on dev@httpd, sometimes it just is not enough.  In addition, I have found that most people don’t know Lua all that well, and you end up stumbling on bad practices when it’s exposed to a wider audience. The tooling for Lua is still limited, although I did find LuaLint this week, which relieved some pain.

At the same time at Cloudkick, most of our infrastructure is built around Twisted Python application services, communicating over a combination of AMQP and Apache Thrift.  Twisted Python’s name is well deserved: multi-layer callbacks can be difficult to wrap your head around. After coding in it daily for almost a year, we can crank out mostly working code with minimal bugs, so there is some good behind it, and inline deferreds generally make it easier to understand, but again the tooling is limited when it comes to debugging Twisted.  In addition, we are always fighting with the Standard Library and common Python modules: because of Twisted’s model, you either need to do everything the ‘twisted way’, or you end up sending it off to another thread anyways.

For these reasons, I have been on the lookout for something better.

Enter Javascript

I have to admit, I had a bad first experience with server side Javascript.  At Joost, we used server side Javascript in a custom environment, built on top of Rhino and a proprietary Java framework.  It was painful, we were pushing things too hard, inventing too much ourselves, Rhino was too slow, and the JVM just isn’t a great platform for fast cycle web development.

Because you are always fighting the JVM and existing Java code to provide features inside the environment, you end up needing to write Java code too. In addition, almost all your JVM interfaces are blocking, meaning you are back to threading to get anywhere anyways.

Bring on Node.js

Node.js however doesn’t build on the JVM.  It builds from a clean room environment on top of Google’s v8 engine.  I played with v8 back when it was first released, cranking out an unmaintained mod_v8.  It was fast back then, and has only gotten better.  The best thing is its embedding API — Mozilla’s Spidermonkey has been around for ages, but it was always painful to embed and depend on it.

The main things Node has going for it:

  • Everything is Async:  Because the base environment has been built essentially from scratch, everything is asynchronous.  This means there is no ‘defer to thread’ like in Twisted Python; you just can’t make blocking code.
  • No existing standard library: While this is somewhat a disadvantage today, because it’s harder to get going with ‘batteries included’ development, it means every bit of Javascript is written specifically for Node.js, in a style that fits in with Node.
  • First Class Sockets and HTTP: The example Hello World is over HTTP.  Node keeps you focused on dealing with the data, rather than spending all your time dealing with the sockets or protocols.
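For reference, the HTTP Hello World is only a few lines. This is a sketch along the lines of the usual Node example rather than a copy of it; the `helloHandler` and `start` names and the port number are arbitrary choices:

```javascript
const http = require('http');

// Request handler: every request gets the same plain-text reply.
function helloHandler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}

const server = http.createServer(helloHandler);

// Call start(8000) to begin accepting connections; the port is arbitrary.
function start(port) {
  return server.listen(port);
}
```

Nothing in the handler touches a socket directly; it only deals with the request and the response.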

Writing network services in Node just feels natural. I don’t have a better way to explain it — I don’t feel like I do in Twisted Python, where it seems like I am always fighting with the Python environment.  Not only that, Javascript is a high productivity language, with lots of tooling like node-jslint and even debuggers coming along.

In addition, because of all the competition on the browser side, Node.js is blazing fast.  No programming language has had the level of technical investment and innovation in its virtual machines in the last few years that Javascript has.

Not a Webapp — an Application Server

Lots of people seem to be excited about building replacements for their Rails website in Node, but I am not. Front end web applications aren’t that interesting to me.  You take a template, fill it with variables from various sources, and send it down to the client.  PHP, Django, Drupal, Rails, even Clearsilver, along with millions of other frameworks have had this figured out for a long time.  They all have special features and such, but they are mostly irrelevant to me: find something your developers will be highly productive in, and let ’em loose.

Node is exciting because it provides a framework for producing reliable backend services, with an easily built REST-style API, that makes accessing it from anywhere else trivial.  It lets you just write clean, async style code for possibly long running processes, in a garbage collected beautiful environment.

Backend engineers all too often reinvent everything every few years: AMQP became popular, and just as quickly it seems to be falling out of favor.  The tooling for backend services always seems to lag behind.  Java has giant, complicated things you can use, but they aren’t the right fit for most projects.  Apache Thrift at least presented a common communications platform for services, which is a good start, and hopefully Apache Avro will make them even easier to use.

Backend services lack a Rails.  They lack a Django.  They lack a jQuery or Dojo.  They lack a revolution in how things are structured and built.  Maybe it was SOA, or REST, or a million other terrible acronyms, but it all got mired in stupid marketing.  Node.js seems to have the potential to change how I build application servers, and for that reason I am very excited for Node’s future.

What I’ve been hacking on!

Full drinking-the-Kool-Aid disclaimer: Earlier today, my first patch to Node.js was merged.  It provided UDP and Unix domain datagram (dgram) socket support.  I don’t think it makes me a biased Node.js zealot yet; I just contributed it because it seemed useful for my own projects, and I wrote it in only a few hours.

I wrote the UDP patch to support my unpublicized Dislocate project.  It basically seeks to unify service discovery, load balancing and administration across multiple data centers with varying latency, something I think is required to build true auto-scaling solutions.  In a sense it replaces part of DNS, part of load balancers, and part of configuration management.  I am hoping to get something like a first beta release of Dislocate out this June, and will write up more about it at that time.


Forever Storage

I have been to nearly a dozen countries in the last few years and done all kinds of fun stuff, yet almost none of it is archived in any way.  I won’t cure cancer or win a Nobel Peace Prize, but I do want to keep an archive of things I’ve done, and places I’ve been.

My mother has been doing genealogy research on our ancestors, and most of the time all she can find out about them are a few Census Records or maybe a random mention in a newspaper article.  It takes too much work, for far too little information.

My Information Silos

I have stored my life’s information in several silos.

These Silos include Facebook, Flickr, Google Mail, and perhaps a half dozen other internet services.

Some of them let me export and take control of my data, and thanks to efforts like the Data Liberation Front, the ability to control your own data is generally improving.

What is wrong with this picture is that I don’t really want that kind of control.  Control to store my data on my hard drive is worthless.  My hard drive is a nice SSD that no longer spins around in circles, so the chance of physical failure is slightly reduced, but it is still an Information Silo. The data is still locked up on my laptop, and this is perhaps even more risky than an online service.  Most people don’t have the digital photos they took 5 years ago, let alone 10 years ago.  People on the whole are just bad at managing their own data on their own machines.  It gets lost, it gets destroyed, and it seems to happen at a higher rate than with other media, like a physical photo album.

The problem with all of these Silos is that they are too easily killed.

Online services, like Facebook or Flickr, no matter how massive or open with their data, will some day die.  Most of these companies have been around for 10 years or less as major players; how can they commit to the structure and reliability needed to keep my data alive forever?

Local storage is fraught with danger too, from seemingly simple things like operating system upgrades gone bad or the first Apple OS X worm to break out, to things outside the computer world, like fires or floods.  The likelihood of a few bits of data surviving the next decade is far too low; I don’t even have all my old 256kb/second mp3s anymore :)

Online Backup Services

There has been a revolution in Online Backup services in the last few years, with great consumer facing services like Dropbox and ZumoDrive.

On the more technical side, Tarsnap, which I absolutely love, combines impressive security with easier-to-use interfaces like tar, bringing innovation to traditional enterprise backup systems.

All of these services are great for online backup and recovery, but their data and pricing models are still built around online storage and online access to data.  They are also new companies, most of which are built upon other young services.

Forever Storage

We have data from centuries ago; books were the most common storage format, many of them transcribed by monks, which turns out to be a slightly lossy experience for the data as it migrates across languages and methods.

Not everyone will believe we can keep growing technology at the pace we have, nor that we might be able to stop death and diseases in our generation, but I do believe we are in the age where information created and stored today, could survive forever.  And if you are in doubt about the advances in medical technology, you can always arrange yourself to be frozen.

When I say, Forever, I do mean, Forever, and ever.

In Science Fiction, there are many books describing these epic time lines; perhaps my favorite is The Forever War by Joe Haldeman.  Our species has existed for just a blip in time so far, but the technological baseline we have today is enough for the information of our lives to live on forever.

The easy way out is to just make everything public in places like a blog. Then you hope that Google and similar companies all cross-copy it, and hope that something will survive.

I think relying upon them is still too small-minded, however, when you start talking about thousands of years. Humans just don’t think in terms of geological time.  The whole technical base could change: the world wide web we know today will be abandoned someday just like Gopher, and all that content will disappear into the ether.

What I want is a service that charges $100 for 100 gigabytes, guaranteed to be accessible for 1000 years.

There are small technical challenges, like how would you write to media intended to last thousands of years, where would you store it all, and how would you pass on access to this data to whomever you desire, but I think they are all solvable.

If you can store your body in cryogenic storage for thousands of years, why can’t you store your data?  Not just for yourself, but for your descendants.

I might not live Forever, but I want my data to live on.


Internet Security is a failure

Security on the Internet sucks, and it is only getting worse.  The problem is systemic, with security researchers and developers not producing viable ways for the average user to live on the Internet in a secure fashion without excessive paranoia.

The story of Apache’s Infrastructure

The Apache Software Foundation runs about 40 machines, with varying access policies, but some have upwards of 2300 shell accounts, one for every committer.  In the last year, there have been three major incidents in this infrastructure:

  • The first attack, in August 2009, was caused by a misconfiguration of our backup procedures, and is detailed in this downtime report.
  • The second attack was a persistent DDoS attack against issues.apache.org in October 2009.
  • The third attack, which started this week, was a directed attack against the Apache JIRA instance, targeting individual Apache Infrastructure team members.  Full details have not yet been posted online about this attack, but you can see the initial email from Joe [gpg signature].  Hopefully later this week, we will get up a blog post with full details.

As a mostly volunteer organization, the ASF finds it difficult to implement draconian security policies, but it has avoided running most dynamic webapps: the vast majority of our websites are static HTML.  Maybe this has saved us from untold other security issues, but even with what we believed was limited exposure, we still got hacked.

The ASF is by no means perfect; it has only half-implemented some of the best practices we know we need, but I believe overall the ASF is more secure than most big companies.  It has some of the best sysadmins I have known, but it still has issues.  Maybe we can just blame that on having too many users, but I believe that fundamentally, Internet security is a failure.

I believe there are four major facets around our insecure Internet:

  1. Identity and Authentication
  2. Transport Security
  3. Secure Software and Operating Systems
  4. Law Enforcement

Identity and Authentication: Failed.

If there was one thing I would change, it would be to stop everyone in the world from using passwords.  Individuals might pick good ones, but on the whole, people pick bad passwords.  They also use the same password across a multitude of services.

The problem is most attackers collect these passwords, and then use them to escalate privileges to more services.

Wait a minute, you might ask, you just combined Identity with Authentication, but they are different!  And yes, you are right, but the common user doesn’t know the difference.  To solve both on a wide scale, I believe their issues are joined at the hip, as authentication depends on identity in most important use cases.

There are many ways you can avoid using passwords, but they are all too difficult for the average user, and therefore for widespread adoption.

OpenID was one of the first real innovators in this area, and much credit is due to Brad for it. Even though most people on the internet likely have a provider, very few use it on a daily basis.  Between the user experience issues and phishing problems, I do not believe OpenID will ever be a real replacement for passwords for all websites.  It has solved many problems like how to comment on a blog — which is great, I hate blog spam — but it isn’t the end of Identity and Authentication.

OAuth is taking a different approach, and solving a different problem, which is great for my Twitter account.  It is still too early to know if OAuth will really improve the wide-scale security of connected web services, but it has been three years since the project started and real-world use cases are still limited.  The fact that the standard is still changing quickly certainly isn’t helping adoption.

Both Amazon Web Services and PayPal let you use multi-factor authentication easily, and I applaud them for this, but most websites and services do not, notably for things like email, which today is the primary identity of most people on the Internet.  I believe more services should adopt SMS based multi-factor authentication, and products like Twilio’s SMS API make this easier than ever.  I can still count on a single hand the services I have ever logged into using MFA, though: I still can’t log in to my bank with it, nor my email. Companies like Yubico are also providing open stacks to improve security, but again most people don’t own a token.
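
To make the SMS idea concrete, here is a minimal sketch of the server side: issuing a short-lived numeric code and checking what the user types back. The function names, the six-digit length, and the five-minute lifetime are all hypothetical choices for illustration, and actually delivering the code through an SMS provider is deliberately left out.

```javascript
const crypto = require('crypto');

// Hypothetical parameters, chosen for illustration only.
const CODE_DIGITS = 6;
const CODE_TTL_MS = 5 * 60 * 1000; // five minutes

// Create a random numeric code plus the time at which it stops being valid.
function issueCode(now = Date.now()) {
  const n = crypto.randomInt(0, 10 ** CODE_DIGITS); // upper bound is exclusive
  return {
    code: String(n).padStart(CODE_DIGITS, '0'),
    expiresAt: now + CODE_TTL_MS,
  };
}

// Check a submitted code against the one that was issued.
function verifyCode(issued, submitted, now = Date.now()) {
  if (now > issued.expiresAt) return false; // too late, code has expired
  const expected = Buffer.from(issued.code);
  const actual = Buffer.from(String(submitted));
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === actual.length &&
    crypto.timingSafeEqual(expected, actual);
}
```

A real deployment would also need to limit the number of guesses per code and invalidate a code once it has been used; neither is shown here.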

You can find limited cases of SSL Client Certificates being useful and working, but on the whole they are still painful with many sharp edges.  I used client certificates extensively at Joost, and I never ever want to repeat that experience, and I am a fairly technical user.   The difficulties are not just on the clients and users, but also on running a Certificate Authority correctly with the right policies, revocations and security models.

It isn’t just the users that have problems — providers like DreamHost are unable to authenticate their own users, letting attackers take over accounts mostly via social engineering.

Transport Security: Failed.

As part of the TLS protocol, you need to establish trust between various parties, and so for the most common configurations on the Internet, SSL/TLS depends upon Certificate Authorities.

Trusting Certificate Authorities has turned into an oxymoron.  From certificates being shipped that no one even knows how they got into the trusted list, to the threat of man-in-the-middle attacks from valid certificates, to off-the-shelf devices for sale to attack it, TLS has failed.

In addition, the SSL renegotiation attacks don’t help the situation, and it will take years before everyone has upgraded their SSL software to prevent this exploit.

I believe that while issues in the TLS protocol itself are going to be rare overall, the problem of the CAs will not go away.  I don’t know how to solve the trusted CA problem; distributed trust systems are one of the hardest problems to solve for the average end user.  As a normal user, at some point you will need to trust a large company to make trust decisions for you, but this process is still too opaque to provide real trust for most people.  I personally have doubts that Extended Validation Certificates are a good thing; in fact, I believe they might be providing an illusion of being more secure. We are still trusting the same Certificate Authorities that have almost zero business motivation to provide good security.

Secure Software and Operating Systems: Failed.

Do your Linux servers have an uptime of over 30 days?  Then it is very likely they have a local root kernel exploit.  It used to be funny to make fun of Windows exploits, and there have been many remote ones, which is terrible, but Linux and most open source alternatives have not truly improved security for the average server.  The problem isn’t just that the operating system kernels are insecure; it is that privilege escalation is far too easy, and far too common.

You should design software around expecting a local user to be compromised.  Not to pick on projects like WordPress, but it has seen a rash of severe security issues over the years with a relatively small code base, and most webapps, open source or not, have similar records.  The problem is that once an attacker can execute local code, in almost all situations it means that with a little work they can also gain root.

On the user’s side, browsers and their plugins, like Flash, have had a similarly abysmal track record.  Real innovation has come from Google Chrome, and most other browsers are copying these methods. This is a very good thing. Hopefully it will reduce the size of botnets in the future, but today most users are vulnerable to a multitude of remote attacks.

Law Enforcement: Failed.

In most cities, crime isn’t a major problem anymore.  You still lock your doors and take basic precautions with your bike, and the truth is that if someone really wanted to steal something from you, they probably could.  But crime is not rampant, and you have an expectation that law enforcement will help you.

While law enforcement can sometimes turn a blind eye to a class of crimes, often victimless ones, they have on the whole turned a blind eye to Internet hacking.  As long as an attacker doesn’t go after Sarah Palin’s email, there are rarely any consequences for most incidents.

Inside Apache, we have discussed going to the FBI several times, but the conclusion every time is that it would be a waste of our time.  The FBI doesn’t care about our problems, because we aren’t a political candidate, nor do we have millions of credit cards.  They have their Internet Crime Complaint Center (IC3), but I believe it’s just a synonym for ‘circular file’.

Obama’s White House has published their Cyberspace Policy Review (PDF), and it makes many great points, but it does not actually bring change to the Internet in any measurable fashion.

I don’t want to lock up 12-year-old kids for the rest of their lives because they defaced some website, but there must be a better framework and structure for prosecuting attackers worldwide.  No matter the improvements made to software, users, or best practices, with attackers taking essentially zero risk of ever getting caught today, they have no motivation to stop.

What now?

People are working on making the Internet a better place, but it isn’t enough.  Everyone, in every part of the stack must care about their security.  Providers, both big and small, software developers, open source and proprietary, users both advanced and novice, they all live in a difficult world, and most of them live in an insecure one.

We won’t all switch to OpenBSD.  We won’t all switch to Chrome.   We won’t all stop using passwords.  And the government can’t save you either.

I wish I had a single answer; I dream that it was a solvable problem.  As a technical person, I am more scared of having my own identity stolen than of any terrorist attacks.

Right now, the onus is on the individual to make smart choices and do their best, but the only way the world will truly be a better place is if there is a systemic shift to caring about the security of the average human on the Internet.  Maybe it will be big companies like Google or Microsoft that end up conquering this problem, but I hope we can learn from existing open source patterns and find a better, distributed way.


Living the dream

ps happy birthday Sam!


sxsw roundup

I did a  post about the first day at SXSW, but I failed to post any others for the following 9 days.  I also made 0 tweets, but successfully checked into a total of 4 places on foursquare.

I am absolutely terrible at this Blogging/Twitter/foursquare thing.

Anyways, SXSW was a blast.  Austin is a great city, and everyone that went there with Cloudkick had a great time.

The high and low lights:

  • Most Memorable: Get Low – Bill Murray, Robert Duvall, and Sissy Spacek all came to this screening.  They did the Q&A afterwards, and Bill Murray was great.  The movie, while it has some humor building up to the final scene thanks to Bill Murray, was really all about great acting by Robert Duvall.
  • They let you bartend? – A Kamikaze does not include cranberry juice.  I know there are a ton of bars (and therefore you need a ton of bartenders) in Austin, but really?
  • Wow: Werner Vogels at the Big Data Cluster meetup.  Great job by Stu Hood to refute some of his silliness, and it was good to meet more Apache Cassandra developers, like the current PMC chair Jonathan Ellis, in person.
  • New Favorite Band: Murder by Death - I loved the combination of Electronic Cello, with the deep voice and great songs.
  • Best non-BBQ Food: Koriente – Nice and Fresh food, and of course bubble tea.  Not that we went to many non-BBQ places this week, but this restaurant was a pleasant surprise and break from BBQ.
  • Best BBQ: Not sure, we went to many BBQ spots, including Stubb’s, Iron Works, and Salt Lick.  Salt Lick I guess was a cool experience, but I did enjoy the night Ryan Phillips brought over Rudy’s to our Roof Deck.
  • Speaking of Roof Decks:  Wow, we got really lucky with the place we rented for the week, a roof deck overlooking the entire downtown.
  • Best Pass: I think I enjoyed the Film pass the most by far; for Interactive, I honestly went to like 3 sessions.  Music was good too, but I felt like I don’t get to see many movies, while music is easier to experience over the internet without a 2 hour time sink.

sxsw day 1

Everyone from Cloudkick arrived in Austin for SXSW 2010 on Thursday evening, so Day 0 technically.  Bob found us a nice condo to rent just outside the downtown, but still within easy walking distance of everything.  We picked up our badges from a relatively short line that night, and then headed over to Iron Works BBQ about 8:45.

On Friday, the first actual day of SXSW:

  • Grabbed lunch at Annies, and met with Jerry Chen who has been contributing to the libcloud project.
  • Opened CASSANDRA-885 bug report.  OOMing after the node crashed, as it is unable to replay its commit log.
  • Went to a showing of “The Girl with the Dragon Tattoo”.  The movie was very intense, but I still liked it.  Showing was at the Alamo, and having a Margarita while watching was even better.
  • Had dinner with the entire crew at Stubb’s.  Loved the brisket.
  • After some bar hopping, we ended up back at the pad… playing Rock Band.

Facebook & Open Source: Community is just as important as the Code

I was happy to attend the “Facebook Technology Tasting” event tonight, where they gave a presentation about their newest open source project, HipHop for PHP.

HipHop is definitely some very cool technology, built by an enthusiastic team, solving real world performance issues in large scale websites, and I have no doubt other companies using PHP (Hello Yahoo!) will find it invaluable, and hopefully help turn it into a successful open source project.

What I find most interesting and encouraging about Facebook’s most recent open sourcing efforts, HipHop today and Tornado last year, is how they are taking a dramatically different approach to their earlier open source projects like Thrift or Cassandra.

Thrift, purely as an example, was one of their first projects built internally, and later open sourced.  It was originally open sourced on April 1, 2007, but it had a difficult time building a community around the code.  The approach was a blog post, and code basically ‘tossed over the wall’.  External developers did try to contribute, but I believe the interactions were less than optimal, as the original forum for discussion was a Facebook Group.  They learned from this quickly: programmers didn’t like web forums for submitting patches, and later proper mailing lists were set up.

Cassandra was another project that was essentially thrown over the wall: code was available, but there was no initiative to build a community around it.

Today, both Thrift and Cassandra have found their way to the Apache Software Foundation, via independent paths.  Apache Cassandra is turning into a very healthy community, having made many releases, and is in the process of graduating to a top level project.  Apache Thrift, which made its first release in December 2009, has been slowly gathering more external contributors and an open community built around the code.

What I see happening with both HipHop and Tornado is completely different, and that is what is most encouraging.  From the start, they are doing everything right to encourage an open community to be built around these projects.  Open Communities are what create successful projects, and give companies creating the open source projects the most rewards.

When you create an open source project, you gain almost nothing but a PR hit if there isn’t a community built around it.  For infrastructure projects like HipHop, Cassandra, Thrift, Scribe, and Tornado, what gives you the most reward from open sourcing is having other people hack on the code and, more than that, use the code in their own companies.

Just look at the massive community that has exploded around Apache Lucene and Apache Hadoop.  Yahoo could have kept this infrastructure project internal, and sure, it might have fulfilled their original goals, but they would never have received the thousands of external contributions, which have turned the Lucene/Hadoop world into one of the most diverse and thriving open source communities of late, giving Yahoo a thousand times return on their investment in Hadoop.

Thank you Facebook for getting it — community is just as important as the code that you are open sourcing, and I would like to wish the HipHop for PHP developers the best of luck with their new open community project!


Released Cloudkick’s for-pay products

I started at Cloudkick in August, and today we announced our for-pay products & Freemium model.  (TechCrunch, GigaOm, ReadWriteWeb, VentureBeat, and more)

I’ve been working along with the entire Cloudkick Team on a few parts of our launch:

  • Integration with Apache libcloud, so now Cloudkick supports EC2, Rackspace, Slicehost, RimuHosting, Linode, VPS.net and GoGrid.
  • Our new monitoring Agent, Cloudkick Agent. Extremely lightweight, written in C & Lua.  Hopefully I’ll have some time to blog more about some of the cool technology we did here, but we were tired of seeing monitoring agents written in High Level languages using up tons of memory on a Cloud Server.
  • Our Cloudkick Changelog Tool, aka “ckl”.  This tool lets you keep track of a large admin team and what everyone is doing.  The ASF Infrastructure team has already started using it outside of Cloudkick.  Of course, the Cloudkick UI is a little nicer than the demo one with the open source code.
  • Our new Graphing and long term trending system, built on top of Reconnoiter and Apache Cassandra.
  • Learned more about Django and jQuery than ever before.

Now that our big launch is out, hopefully I’ll find a little more time to post on this journal.


httpd: mod_cache only caching your homepage

mod_cache has a pretty inflexible configuration setup.  CacheEnable can only take a prefix of a path to be cached, and to disable a sub-path with CacheDisable, you need to list all of the possible prefixes (i.e., no regular expressions).

Let’s say you want to cache just your root page, aka ‘/’, for your website, just in case you get hit by a Slashdot Effect.

For Apache httpd 2.2.12 or newer, you can do this by first enabling caching on all pages, then setting the no-cache environment variable globally, and then unsetting it for a specific path:

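CacheDirLevels 2
CacheDirLength 1
# Enable the disk cache for every path on the site.
CacheEnable disk /
CacheRoot /var/cache/apache2/mod_disk_cache
CacheIgnoreHeaders Set-Cookie
CacheIgnoreNoLastMod On
CacheMaxExpire 600
# Mark every request as uncacheable by default. An explicit value is given
# because older httpd releases require one for SetEnv.
SetEnv no-cache 1
# Clear the flag again for the root page only.
<LocationMatch "^/$">
UnsetEnv no-cache
</LocationMatch>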

For Apache httpd before 2.2.12, you need a different method of disabling caching globally, and then re-enabling it.  The easiest way is to use mod_headers to muck with the Vary header:

Header set Vary *
<LocationMatch "^/$">
Header unset Vary
</LocationMatch>

Strictly speaking, doing this to the Vary header is an RFC violation, and your best bet is to upgrade to a newer httpd version.  :-)

This works because mod_cache will refuse to cache any HTTP resource with a Vary value of “*”, because this is saying that every response from the origin will be different.
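
The rule itself is easy to state in code. The sketch below is only an illustration of the check described above, written in Javascript; it is not mod_cache’s actual implementation (which is in C), and the function name is hypothetical.

```javascript
// Returns true when a Vary header value rules out caching the response.
// `varyValue` is the raw header string, or undefined when the header is absent.
function varyForbidsCaching(varyValue) {
  if (!varyValue) return false; // no Vary header, so nothing to object to
  return varyValue
    .split(',')                  // Vary is a comma-separated list of field names
    .map((field) => field.trim())
    .includes('*');              // "*" means every response may differ
}
```

With the mod_headers trick, every response outside ‘/’ carries a Vary value of “*” and fails this kind of check, while the root page has the header removed and passes.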
