s11n.net
Saving untold millions of trees... of data.

Serializers

An introduction to s11n's i/o layer.

Here we will briefly discuss what Serializers are and how to get at them from within s11nlite. They are covered in detail in the s11n library manual. More info about data files can be found on this page.

First let's take a quick look at a Serializable mini-demo screenshot (28k PNG), showing a simple serialization proxy and 6 of the supported file formats.

As the s11n documentation repeats over and over, the core of libs11n is 100% data-format agnostic. Of course, the library would be pretty useless without at least one reference implementation of an i/o handler, which is where the s11n::io namespace comes into play.

The s11n::io namespace provides one potential i/o layer for the core. That is, clients may use whatever i/o layer they like. As anyone who has ever done any i/o work understands, loading data is far from trivial, so it is expected that clients will use the i/o layer shipped with s11n, as opposed to rolling their own (though they need not). Great care is taken to keep the core layers independent of the s11n::io namespace, so clients do have the option of plugging in their own layer (though this would admittedly require a non-trivial amount of client-side effort).

One of s11n's strengths is the ability to combine arbitrary i/o layers with the core's containers. For example, s11nlite ties the s11n core and s11n::io together, hiding the details behind an extremely easy-to-use interface which keeps client code ignorant of the data format(s) used to store their data (as well as insulating the user from a large number of deceptively difficult technical challenges involved with saving and loading data).

What is a Serializer?

Despite the similar name, Serializers aren't directly related to Serializables. Serializers are instead designed to work with Data Nodes - that is, with abstract data containers. When saving, a Serializer converts a tree of Data Nodes to its supported grammar (e.g., an XML dialect). Conversely, when loading, a Serializer is expected to be able to read the dialect it has written, so it can read a stream and build up a tree of Data Nodes. It is these Data Node trees which the core works with, keeping it ignorant of any stream-level i/o. As most programmers will agree, stream-level failures are much more likely to happen than container insertion/extraction failures. Since the core works only with containers, and not streams, its internals are remarkably short on source code: the most error-prone part of the serialization process, stream i/o, is all marshalled by the Serializers.
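
To make the division of labour concrete, here is a minimal, self-contained sketch of the idea. It does not use s11n's real node or Serializer types: ToyNode, write_toy_format() and to_toy_string() are hypothetical names invented for this illustration, and the bracketed output grammar is made up. The point is only that the tree knows nothing about streams, and all stream handling lives in the "Serializer" function.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a Data Node: a named container holding
// key/value properties and child nodes. NOT s11n's real node type.
struct ToyNode {
    std::string name;
    std::map<std::string, std::string> properties;
    std::vector<ToyNode> children;
};

// Hypothetical "Serializer": walks the tree and emits a made-up
// bracketed grammar. Only this function ever touches the stream.
void write_toy_format(const ToyNode& node, std::ostream& out, int depth = 0) {
    assert(!node.name.empty() && "toy precondition: every node has a name");
    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    out << indent << "[" << node.name << "]\n";
    for (const auto& kv : node.properties) {
        out << indent << "  " << kv.first << "=" << kv.second << "\n";
    }
    for (const auto& child : node.children) {
        write_toy_format(child, out, depth + 1);
    }
    out << indent << "[/" << node.name << "]\n";
}

// Convenience wrapper: serialize a whole tree into a string.
std::string to_toy_string(const ToyNode& node) {
    std::ostringstream out;
    write_toy_format(node, out);
    return out.str();
}
```

A second toy writer emitting a different grammar could be dropped in without touching ToyNode at all, which is the property the real library relies on.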

The exact conventions laid out by the included Serializer implementations are described at length in the library manual, but in short they are expected to be able to read and write data which is structured like a DOM (Document Object Model). Put simply, this means data which can be stored in a hierarchical data store such as XML. While XML itself is very important to consider for Serialization, the library is simply concerned with structured data, whether or not the actual data store is XML.

How do I use a Serializer?

In short, clients must select a Serializer when saving data, but normally need not do so when loading. Each data stream is assumed to have a so-called "magic cookie" as its first line of data. This cookie is used to map Serializers to data formats, which allows the library to dynamically dispatch input to the responsible Serializer. When saving a file, selection of a Serializer is essential, as the library cannot guess which one the user wants to use. That said, doing so is trivial when one uses s11nlite, as shown in this example:

s11nlite::serializer_class( "my_serializer" );

That sets the library-wide default Serializer, which will be used by all calls to s11nlite::save() and related functions (that is, any which write to a stream or take a filename parameter).
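
The cookie-based dispatch used when loading can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's implementation: pick_serializer() is a hypothetical helper, and the cookie strings used with it are invented for the example (the real cookies are defined by each Serializer). It reads the first line of a stream and looks it up in a table mapping cookies to Serializer names.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: returns the Serializer name registered for the
// stream's first line, or an empty string if the cookie is unknown or
// the stream is empty. The table maps cookie text -> Serializer name.
std::string pick_serializer(std::istream& in,
                            const std::map<std::string, std::string>& cookies) {
    std::string first_line;
    if (!std::getline(in, first_line)) {
        return std::string();  // empty or unreadable stream
    }
    // Tolerate files written with CRLF line endings.
    if (!first_line.empty() && first_line.back() == '\r') {
        first_line.pop_back();
    }
    const auto it = cookies.find(first_line);
    return it == cookies.end() ? std::string() : it->second;
}
```

Because the format is identified from the data itself, a caller loading a file never has to say which Serializer wrote it; only the save path needs an explicit choice.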

To create a Serializer object:

// create an instance of s11nlite's default Serializer:
s11nlite::serializer_base_type * ser =
    s11nlite::create_serializer();
// or create a specific one, by name (a second variable, so that
// both snippets can live in the same scope):
s11nlite::serializer_base_type * namedSer =
    s11nlite::create_serializer("SerializerName");

The above will try to classload the given Serializer, which must extend s11nlite::serializer_base_type and be registered with the s11nlite::serializer_base_type classloader (what does that really mean? Please RTFM and see the supplied Serializers' code.).
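
For readers unfamiliar with the idea of "classloading by name", the following self-contained sketch shows the general technique: a registry mapping string names to factory functions. It is an illustration only and assumes nothing about how s11n's classloader is actually built. ToySerializer, ToyParens and ToyRegistry are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base type, playing the role of a Serializer interface.
struct ToySerializer {
    virtual ~ToySerializer() = default;
    virtual std::string format_name() const = 0;
};

// Hypothetical concrete Serializer.
struct ToyParens : ToySerializer {
    std::string format_name() const override { return "toy_parens"; }
};

// Hypothetical registry: maps a name to a factory that creates a
// new instance. Unknown names yield a null pointer.
class ToyRegistry {
public:
    using Factory = std::function<std::unique_ptr<ToySerializer>()>;

    void register_factory(const std::string& name, Factory factory) {
        assert(factory && "a factory must be callable");
        factories_[name] = std::move(factory);
    }

    std::unique_ptr<ToySerializer> create(const std::string& name) const {
        const auto it = factories_.find(name);
        if (it == factories_.end()) {
            return nullptr;  // nothing registered under this name
        }
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};
```

A type becomes "classloadable" in this toy model simply by registering a factory under its name; callers then need only the name and the base type.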

The string passed to serializer_class() may be any registered (and classloadable) Serializer type. Use s11nconvert --help or (with s11n 0.9.7 or higher) s11nconvert --known-serializers to see a list of those available to your library. The library ships with the following Serializers, listed using their short-form names (which are accepted by create_serializer() and the like):
  • funtxt - a simple text-based grammar, similar to many configuration file formats. The "fun" part of the name has a history: funtxt and its brother, funxml, are formats which have been used by the QUB project since the summer of 2000, and have both proven to be very robust for most uses. Why "fun"? Because the author of the original funtxt/funxml parsers, Rusty Ballinger, calls his library libFunUtil.
  • funxml - a very basic XML dialect with some extensions to standards-compliant XML (e.g. it supports nodes with numeric names, which standard XML parsers do not allow). This Serializer is probably the most robust of those shipped with the library, in terms of what types of data it can safely hold. It is, however, also the most verbose (in terms of generated file sizes).
  • simplexml - a leaner XML dialect than funxml, which stores node data as XML attributes. Also has some extensions over standard XML. Very well-suited for "small" data, such as numbers.
  • compact - a binary-like data format (not human-editable).
  • parens - a compact lisp-like format, especially well-suited for saving human-editable sets of small data, like numbers and simple strings. It's also very emacs-friendly :).
  • wesnoth - a simple XML-ish format whose nodes look like [nodename]...[/nodename]. Based on the custom file format used in the game The Battle for Wesnoth. Of interest mainly for its human-editability.
The following Serializers also exist, but may or may not be available to any given s11n installation:
  • expat - a libexpat-based Serializer which is only installed if the configure script finds expat on your system. It uses a standards-compliant, industrial-strength XML parser, so some data saved with other Serializers may not be loadable using this Serializer: funxml, for example, allows numeric-named nodes, as may algorithms like s11n::map::serialize_streamable_{map,list}(). When data portability is a concern, i.e. you want to use s11n-generated data in non-s11n-powered applications, this Serializer is almost certainly the best choice. Remember, too, that you can always use s11nconvert to export your s11n-based data to expat format:
    s11nconvert -f myfile.s11n -s expat -o outfile.xml
  • sqlite3_serializer - an add-on for 1.2 providing serialization over sqlite3 databases.
  • mysql_serializer - a proof-of-concept Serializer which uses a mysql database as its back-end. Since the database is incapable of dealing with streams, this class delegates stream-based de/serialization to one of the stream-based Serializers.


If you have decided to bypass s11nlite and use the core directly, you're on your own, but here's a tip: see the functions in the s11n::io namespace which take a SerializerType template type parameter. Also see the library manual for more details.

Which Serializer should I use?

This is largely a question of taste and of project requirements. If the data must be usable from non-s11n-powered applications, expat is an excellent choice because it uses a standards-compliant (and well-proven) XML parser. funxml is also a good choice for XML, but it has some extensions to XML and may not be usable with standards-compliant XML tools. If file size is a concern, parens, funtxt and compact are good choices. If human editability is a concern, parens and funtxt are excellent choices. (In fact, both of those Serializers support extra features to aid hand-editing, like the ability to comment out blocks of data.)

What do i use? i almost always use parens, mainly because it's so easy to edit and easy to read, and because emacs' default parens-matching and auto-indentation work well with it. i also get a lot of use out of funxml, simply because it is a format i have come to "know and trust" over the years (funxml and funtxt were hijacked from the libFunUtil project, and both have long histories).

That said, there are no known data-related incompatibilities between formats except where expat is concerned: the majority of the Serializers will allow some constructs which expat will not (e.g., as mentioned before, numeric node names and property keys). Note also that some Serializers are known to be non-thread-safe when used for reading from multiple threads, which may preclude their use in certain environments (the library manual covers this aspect in more detail).
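
If you plan to export to expat later, it may help to check node names and property keys up front. The function below is a rough sketch, not part of s11n: looks_like_xml_name() is a hypothetical helper, and it implements only an ASCII-only approximation of the XML rule that a name must start with a letter, underscore or colon (the real XML specification also admits many non-ASCII characters). It is enough to catch the numeric-name case mentioned above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: ASCII-only approximation of the XML "Name" rule.
// Returns false for empty names and for names starting with a digit,
// which is the case the non-expat Serializers tolerate.
bool looks_like_xml_name(const std::string& name) {
    if (name.empty()) {
        return false;
    }
    const auto is_start = [](unsigned char c) {
        return std::isalpha(c) != 0 || c == '_' || c == ':';
    };
    const auto is_rest = [&](unsigned char c) {
        return is_start(c) || std::isdigit(c) != 0 || c == '-' || c == '.';
    };
    if (!is_start(static_cast<unsigned char>(name[0]))) {
        return false;
    }
    for (std::size_t i = 1; i < name.size(); ++i) {
        if (!is_rest(static_cast<unsigned char>(name[i]))) {
            return false;
        }
    }
    return true;
}
```

Running such a check over every node name and property key before saving gives an early warning that a data set relies on constructs a strict XML parser will reject.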

Before deciding on a format (or formats), users are encouraged to try them out and find out which (if any) of the available Serializers fits the bill most closely. The s11nconvert tool has a variety of options and features which are helpful in this regard. For example, it can convert data from any Serializer to another, and this can be used to check for compatibility between formats.

Keep in mind that once you settle on a given Serializer you are not stuck with it: you are free to change to another at any time, because the API is format-agnostic and will happily use any Serializer it is provided with. As long as you do not rely on Serializer-specific features (like simplexml's CDATA handler) then switching between formats is trivial: if you use s11nlite it becomes as simple as calling s11nlite::serializer_class("MyDesiredSerializer").

As a final note: Serializer implementers should make all reasonable efforts to make their Serializers compatible with the conventions laid out by the reference implementations, and to ensure that their formats can be converted to and from all others, e.g. using s11nconvert.

And so that's it?

No, of course that's not all there is to know, but this is essentially all most clients will ever need to know. See the library manual to learn more than you would probably like to know about Serializers.

A final tip: the tools s11nconvert (distributed as part of libs11n) and s11nbrowser can help you work with data saved using the s11nlite framework.