s11n.net
s11n is here to Save Our Data, man!

libc11n: a serialization library/framework for C

c11n ("Cerialization") is the conceptual brother of s11n: it provides an API similar, but not identical, to s11n's, for applications written in C. C won't let us make the API as elegant as C++ does, but we can accomplish quite a lot nonetheless.

Work began on c11n on 28 October 2008, after an insight gained while working on another C project allowed me to implement something close to s11n's "serializable traits" in C. While i had originally dismissed the idea of "s11n for C" as too difficult to be worth the trouble, it seems i was quite wrong.

The code is quite new and i haven't yet written web pages for it. It does, however, have very complete API docs.

As of 2008.11.15, c11n's source code is stored in the s11n Subversion repository:

http://s11n.svn.sourceforge.net/viewvc/s11n/c11n/

You can find instructions for checking out the source tree here:

http://sourceforge.net/svn/?group_id=104450

License

The majority of the c11n code is in the Public Domain. Some of the underlying utility code is BSD-licensed, but i hope to eventually replace it with Public Domain code. In no case does c11n use code under a "viral" license (one that affects the license of code which links to it).

Related work

Troy D. Hanson has written a much lighter-weight serialization library for C called TPL, which can be found at http://tpl.sourceforge.net/. TPL seems to be an especially good choice for apps which only need serialization in very limited contexts (e.g. sending IPC messages). TPL is very small compared to c11n, and its client-side API is lower-level than c11n's, but TPL supports the major features one needs for serialization to files or memory.

Features:

  • c11n can [be taught to] serialize most C types which aren't opaque or transient by nature.
  • It has flexible properties support (more so than s11n's), which makes it easy to encode/decode multiple POD values to/from a single property. This is useful for storing very simple structs containing only value types.
  • It inherently supports in-memory de/serialization and is ignorant of all data formats. It currently has three format handlers: two that are compatible with s11n (the "compact" and "expat" formats) and one that reads/writes objects as SQL (using sqlite3 to do the real work).
  • It can load data files without the user having to tell it the file type. Like s11n, it dispatches to a different handler based on the first line of the input stream.
  • Its API and conventions, when used properly, ensure that no memory is leaked when deserialization fails. (Valgrind agrees with me!)
  • Several of the core interfaces are abstract, and a basic subclassing mechanism allows clients to add their own custom stream handlers (e.g. a wrapper for a custom File object, a C++ stream, or a network socket), i/o handlers (i.e. file formats), and marshallers (they do the conversions between c11n and client-side types).
  • Because all serialized data is ultimately represented as strings, there are no endianness issues.
  • Any number of marshallers (c11n's counterpart to s11n_traits) can be used for any given type. In libs11n only one can exist per type (because it must be a unique type).
  • It's got boatloads of API docs.
  • The core library itself has no 3rd-party library dependencies. Some (optional) file formats require 3rd-party libraries, e.g. sqlite3 or libexpat.
  • According to SLOCCount, c11n's core library is currently (6 Nov 2008) worth $72582. (Not that i got one shiny cent for it, though. Simone keeps asking me, "so are we rich yet?")
  • Comes with the c11nconvert tool to convert data between the various c11n data formats. Since some formats are compatible with s11n, c11nconvert and s11nconvert can be used together, via a shared format, to port data between formats supported only by c11n and formats supported only by s11n.
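The first-line dispatch mentioned in the list above can be pictured roughly as follows. This is a hypothetical sketch, not c11n's API: the `format_handler` table, `pick_format_handler()`, and the cookie strings are all invented for illustration, and the cookies are placeholders rather than the real headers of the compact, expat, or sqlite3 formats.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical format-handler descriptor: a name plus the "magic cookie"
   expected on the first line of a stream written in that format. */
typedef struct {
    const char *name;
    const char *cookie;
} format_handler;

/* Placeholder cookies, NOT the real c11n/s11n file headers. */
static const format_handler handlers[] = {
    { "compact", "#COMPACT-FORMAT" },
    { "expat",   "<!-- XML-FORMAT -->" },
    { "sqlite3", "-- SQL-FORMAT" },
};

/* Reads the first line of `in` and returns the handler whose cookie
   matches it exactly, or NULL if the stream is empty or unrecognized.
   The caller would then hand the rest of the stream to that handler. */
static const format_handler *pick_format_handler(FILE *in)
{
    char line[256];
    size_t i;

    assert(in != NULL);
    if (!fgets(line, sizeof line, in)) return NULL;  /* empty stream */
    line[strcspn(line, "\r\n")] = '\0';              /* strip line ending */
    for (i = 0; i < sizeof handlers / sizeof handlers[0]; ++i) {
        if (strcmp(line, handlers[i].cookie) == 0) return &handlers[i];
    }
    return NULL;
}
```

A real loader would then pass the remainder of the stream to the matched handler. Keeping the registry as a plain table means adding a format is a one-line change.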
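A marshaller can be thought of as a table of function pointers over `void*`, which is also why any number of them can exist for a single type. The sketch below is hypothetical: `point`, `marshaller`, and all function names are made up and do not reflect c11n's real marshaller interface. It also shows two list items in miniature: both fields of a small value-only struct are packed into a single property string, and because that string is decimal text, byte order never enters the picture.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A client-side type containing only value (POD) members. */
typedef struct { int x; int y; } point;

/* Hypothetical marshaller interface: a table of function pointers over
   an opaque object. Both functions return 0 on success, -1 on error. */
typedef struct {
    const char *name;
    int (*serialize)(const void *obj, char *buf, size_t len);
    int (*deserialize)(void *obj, const char *buf);
} marshaller;

/* Marshaller #1: packs both members into one "x,y" property value. */
static int point_ser_csv(const void *obj, char *buf, size_t len)
{
    const point *p = obj;
    int n;
    assert(obj != NULL && buf != NULL);
    n = snprintf(buf, len, "%d,%d", p->x, p->y);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;  /* -1 if truncated */
}
static int point_deser_csv(void *obj, const char *buf)
{
    point tmp;
    if (sscanf(buf, "%d,%d", &tmp.x, &tmp.y) != 2) return -1;
    *(point *)obj = tmp;  /* the target is only modified on success */
    return 0;
}

/* Marshaller #2 for the SAME type, using a different text layout. */
static int point_ser_xy(const void *obj, char *buf, size_t len)
{
    const point *p = obj;
    int n;
    assert(obj != NULL && buf != NULL);
    n = snprintf(buf, len, "x=%d y=%d", p->x, p->y);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
static int point_deser_xy(void *obj, const char *buf)
{
    point tmp;
    if (sscanf(buf, "x=%d y=%d", &tmp.x, &tmp.y) != 2) return -1;
    *(point *)obj = tmp;
    return 0;
}

static const marshaller point_csv = { "point/csv", point_ser_csv, point_deser_csv };
static const marshaller point_xy  = { "point/xy",  point_ser_xy,  point_deser_xy };
```

Code holding only a `const marshaller *` and a `void *` can use either table interchangeably, whereas in libs11n the single `s11n_traits` entry for a type fixes that choice.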

Misfeatures and notable caveats:

  • It doesn't do any type of introspection or automatic selection of members to serialize. Just like in s11n, you have to "train" the library to work with your types (by providing a marshaller implementation for that type). Once "trained", it's all easy from there. (Training it also isn't too hard, but it is more work than in s11n.)
  • The boilerplate code for the marshaller implementations is much longer than the equivalent code in s11n (again, due to the lack of templates in C). It can easily be generated by a shell script, but it must be done on a per-type basis, so the actual line count goes up. Unlike in s11n/C++, where one set of templates can be used for all/most types, we have to create uniquely-named copies of otherwise identical functions.
  • It can't provide much compile-time type safety when passing arbitrary serializable objects around. The only generic option for passing around arbitrary objects in C is a (void*), so we're stuck with that limitation. Used properly, however, the API is quite type-safe.
  • Like s11n, it uses deep copying of pointer members for serialization. The primary reason for this is that any other approach requires explicit support in each data format, complicating their implementations and making the data (e.g. XML) difficult to port to external applications. Shallow serialization can be done in c11n but requires client-side work.
  • Aside from some of the file formats being compatible, c11n is not directly compatible with s11n, nor are there plans to ever reimplement s11n on top of c11n. That said, c11n's implementation has given me some good ideas for changes for s11n 2.x (s12n?). i might also port some of the c11n i/o handlers to s11n, since the c11n implementations don't use flex (as most of the s11n implementations do).
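Much of the per-type boilerplate can be generated by the C preprocessor rather than by a shell script, though it still produces one uniquely-named copy of each function per type. This is a hypothetical sketch: the macro `DEFINE_SCALAR_MARSHALLER` and the generated `*_to_string`/`*_from_string` names are invented for illustration and are not part of c11n.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical generator: expands to a uniquely-named pair of functions
   converting one scalar type to/from text. C has no templates, so each
   type still gets its own copy of the code, but it is written only once.
   Both generated functions return 0 on success, -1 on error. */
#define DEFINE_SCALAR_MARSHALLER(NAME, TYPE, FMT)                 \
    static int NAME##_to_string(const void *obj, char *buf,       \
                                size_t len)                       \
    {                                                             \
        int n;                                                    \
        assert(obj != NULL && buf != NULL);                       \
        n = snprintf(buf, len, FMT, *(const TYPE *)obj);          \
        return (n < 0 || (size_t)n >= len) ? -1 : 0;              \
    }                                                             \
    static int NAME##_from_string(void *obj, const char *buf)     \
    {                                                             \
        TYPE tmp;                                                 \
        if (sscanf(buf, FMT, &tmp) != 1) return -1;               \
        *(TYPE *)obj = tmp;                                       \
        return 0;                                                 \
    }

/* One line per type instead of two hand-written functions per type. */
DEFINE_SCALAR_MARSHALLER(int, int, "%d")
DEFINE_SCALAR_MARSHALLER(long, long, "%ld")
DEFINE_SCALAR_MARSHALLER(double, double, "%lf")
```

The trade-off matches the one described above: the source stays short, but the compiled output still contains a separate copy of each function for every type.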
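Deep serialization of a pointer member, together with the rule that a failed deserialization leaks nothing, might look like the following. The `person` type, the `"<len>:<name>;<age>"` layout, and both functions are assumptions made up for this sketch; they only illustrate the approach (write out what the pointer refers to, and release every allocation on each failure path), not c11n's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A type with a pointer member that owns heap memory. */
typedef struct {
    char *name;  /* heap-allocated, owned by the struct */
    int age;
} person;

/* Deep serialization: the characters `name` points to are written,
   never the pointer value. Returns 0 on success, -1 if truncated. */
static int person_serialize(const person *p, char *buf, size_t len)
{
    int n;
    assert(p != NULL && p->name != NULL && buf != NULL);
    n = snprintf(buf, len, "%zu:%s;%d", strlen(p->name), p->name, p->age);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Builds a fresh copy of the object. On any failure, everything allocated
   so far is freed and *out is left untouched, so nothing leaks. */
static int person_deserialize(person *out, const char *buf)
{
    char *end;
    const char *s;
    unsigned long nlen;
    char *name;
    long age;

    assert(out != NULL && buf != NULL);
    nlen = strtoul(buf, &end, 10);            /* length prefix */
    if (end == buf || *end != ':') return -1;
    s = end + 1;                              /* start of the name bytes */
    if (strlen(s) <= nlen || s[nlen] != ';') return -1;
    name = malloc(nlen + 1);
    if (name == NULL) return -1;
    memcpy(name, s, nlen);
    name[nlen] = '\0';
    age = strtol(s + nlen + 1, &end, 10);
    if (end == s + nlen + 1 || *end != '\0') {
        free(name);                           /* undo the partial work */
        return -1;
    }
    out->name = name;
    out->age = (int)age;
    return 0;
}
```

The length prefix lets names contain `:` or `;` without escaping. Deserializing always produces a fresh copy: two objects that shared one `name` pointer before saving would each get their own copy after loading, which is the deep-copy behavior described above.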