I’m still (slowly) getting up to speed on Ruby and various associated libraries and frameworks. Tonight I spent some time looking into YAML. I always thought that this stood for “Yet Another Markup Language”, but apparently it actually stands for “YAML Ain’t Markup Language”.

Essentially, YAML is a lightweight alternative to XML. As somebody who entered the job market in the late 1990s, XML usually seems to me like the natural choice for expressing data. The combination of Java (or other programming languages) with XML can be very powerful. At this point XML is very mature. It offers several powerful ways of defining schemas (such as XML Schema or the older DTD format) and of performing transformations, it supports namespaces, and it is a vital part of many web service standards (such as SOAP). At the same time, XML can be pretty cumbersome to deal with, and in many situations the power that it brings comes at the cost of lost productivity and added complexity. Particularly for Java, there have been a number of attempts at improving XML’s usability, for example by implementing XML/object binding frameworks (such as JAXB, Castor, or JiBX). These tools allow you to deal with XML documents indirectly through Java domain objects, which is often a lot more convenient than working with SAX or DOM parsers.

Because of the dynamic nature of Ruby, this is slightly less of an issue. For example, the excellent REXML parser allows Ruby programs to work with XML using a nice syntax that is very much in line with “the Ruby way”. This helps when writing code to parse XML files, but it still does not solve the inherent problem of dealing with complex XML files.

This is where YAML comes in. YAML syntax is very easy and concise. It mostly eschews tags and other types of complex markup in favor of indentation and a few basic symbols (such as dashes at the beginning of a line to denote a list item). In many ways, this is not unlike Wiki markup (minus the formatting). Here is a simple example of how an invoice might be represented in YAML.
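The snippet below is a minimal sketch of what such a document could look like, wrapped in a Ruby heredoc so that it can be pasted straight into IRB. The field names and values are invented for illustration and are not taken from any real invoice format.

```ruby
require "yaml"

# Hypothetical invoice: every key and value here is made up for illustration.
invoice_yaml = <<~DOC
  invoice: 1001
  date: "2007-01-15"
  bill_to:
    name: Jane Doe
    city: Springfield
  items:
    - description: Widget
      quantity: 2
      price: 9.99
    - description: Gadget
      quantity: 1
      price: 24.5
DOC

# Print the raw text; indentation expresses nesting, dashes mark list items.
puts invoice_yaml

# Parsing raises an exception if the text is not well-formed YAML.
YAML.safe_load(invoice_yaml)
```

Note that there are no closing tags and no quoting beyond the date string, which is quoted here only so that it is read as plain text.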

In most programming languages (including Ruby and other scripting languages), parsed YAML documents are represented using the language’s native constructs. In Ruby, this means that parsing a YAML document results in a hierarchical structure of arrays and hashes that is significantly easier to manipulate than the complex object models that typically represent an XML DOM tree.
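To make this concrete, here is a small sketch that parses a made-up invoice and walks the result with ordinary hash and array operations. It assumes a reasonably recent Ruby in which `YAML.safe_load` is available. The document contents are hypothetical.

```ruby
require "yaml"

# Hypothetical invoice data, kept short for the example.
doc = <<~DOC
  invoice: 1001
  items:
    - description: Widget
      quantity: 2
      price: 9.99
    - description: Gadget
      quantity: 1
      price: 24.5
DOC

invoice = YAML.safe_load(doc)

# The top level is a Hash, and the list of items is an Array of Hashes.
puts invoice.class
puts invoice["items"].class

# Plain Ruby is all that is needed to work with the parsed data.
descriptions = invoice["items"].map { |item| item["description"] }
total = invoice["items"].sum { |item| item["quantity"] * item["price"] }

puts descriptions.join(", ")
puts format("%.2f", total)
```

There is no node API to learn here: mappings become hashes with string keys, sequences become arrays, and scalars become strings, integers, or floats.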

Of course, support for YAML is bidirectional, i.e. Ruby data structures can also be written out as YAML. Even better, YAML is one of several mechanisms available for serializing Ruby objects. This is very nice, as it provides a human-readable alternative to the usual binary serialization formats and preserves compatibility of serialized Ruby objects with future versions of the language, which is not guaranteed when using binary serialization.
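A rough sketch of both directions follows. The `Point` class is a hypothetical example, and the `permitted_classes:` keyword assumes a fairly recent Ruby (2.6 or later); older versions load objects differently.

```ruby
require "yaml"

# Plain data structures round-trip through YAML text.
data = { "name" => "Widget", "tags" => ["small", "blue"] }
text = data.to_yaml
restored = YAML.safe_load(text)
puts text

# Point is a hypothetical class used only for this illustration.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Dumping an object produces readable text tagged with the class name.
point_text = YAML.dump(Point.new(1, 2))
puts point_text

# Loading it back requires explicitly allowing the class.
copy = YAML.safe_load(point_text, permitted_classes: [Point])
puts copy.x + copy.y
```

The serialized object is ordinary text, so it can be inspected or edited by hand, which is not practical with a binary format.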

YAML has been widely adopted by the Ruby community as the de facto standard for configuration files and other types of data, at least those that aren’t simply expressed as native Ruby (which is also a very valid option in many cases, as Ruby’s concise syntax lends itself quite well to this). If you work with Ruby frameworks such as Rails, chances are that you will come across the occasional YAML file (such as the database configuration file in Rails).
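As a sketch of how such a configuration file might be read, the snippet below parses a database configuration in the style Rails uses, with one section per environment. The adapter names are real, but the database names and host are placeholders, and the text is inlined here rather than read from a file.

```ruby
require "yaml"

# Placeholder settings in the general shape of a Rails database configuration.
config_text = <<~DOC
  development:
    adapter: sqlite3
    database: db/development.sqlite3
  production:
    adapter: postgresql
    database: myapp_production
    host: db.example.com
DOC

config = YAML.safe_load(config_text)

# Pick the section for the current environment; hard-coded for the example.
env = "development"
settings = config.fetch(env)

puts "#{env}: #{settings["adapter"]} (#{settings["database"]})"
```

Using `fetch` instead of `[]` means that a misspelled environment name fails immediately with a `KeyError` rather than silently returning `nil`.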

Unfortunately the Java efforts to support YAML have been dropped in favor of OGDL, which looks very similar to YAML but is apparently even simpler.

If you are curious about using YAML with Ruby and would like to check it out, I recommend that you start with the YAML Cookbook for Ruby. YAML has been included with Ruby since version 1.8.0, so chances are you’ll be able to play with it without having to download any additional libraries. Simply fire up IRB and give it a shot. For other languages (including Python, Perl, Java, JavaScript, PHP, OCaml, and Cocoa), check the download page.