Bringing serialization into the 21st century... bit by bit.
libc11n: a serialization library/framework for C
c11n ("Cerialization") is the conceptual brother of s11n. It provides
an API similar, but not identical, to s11n's, for applications written
in C. The C language won't allow us to make the API as elegant as C++
does, but we can accomplish quite a lot nonetheless.
Work began on c11n on 28 October 2008, after an insight obtained while
working on another C project allowed me to implement something close to
s11n's "serializable traits" in C.
While i had originally dismissed the idea of "s11n for C" as basically too
difficult to be worth the trouble, it seems i was quite wrong.
The code is quite new and i haven't yet written web pages for it. It does, however,
have very complete API docs.
c11n's source code is, as of 2008.11.15, stored in the s11n subversion repository:
http://s11n.svn.sourceforge.net/viewvc/s11n/c11n/
You can find instructions for checking out the source tree here:
http://sourceforge.net/svn/?group_id=104450
License
The majority of the c11n code is in the Public Domain. Some of the underlying utility code is
BSD-licensed, but i hope to eventually replace it with Public Domain code.
In no case does it use code which is "viral" (affects the license of code which links to it).
Related work
Troy D. Hanson has written a much lighter-weight serialization library for C called TPL, which
can be found at http://tpl.sourceforge.net/. TPL seems to be an especially good choice for
apps which only need serialization for very limited contexts (e.g. sending IPC messages).
TPL is very small compared to c11n and its client-side API is lower-level than c11n's,
but it supports the major features one needs for serialization to files or memory.
Features:
- c11n can [be taught to] serialize most C types which aren't opaque or
transient by nature.
- It has flexible properties support (more so than s11n's) which allows one to easily
encode/decode multiple POD values to/from a single property. This is useful for storing
very simple structs containing only value types.
- It inherently supports in-memory de/serialization and is ignorant
of all data formats. It currently has three format handlers, two of
which are compatible with s11n (the "compact" and "expat" formats) and one which
reads/writes objects as SQL (it uses sqlite3 to do the real work).
- It can load data files without the user having to tell it the file type.
Like s11n, it dispatches to a different handler based on the first line of the
input stream.
- Its API and conventions, when used properly, ensure that no memory is leaked when
deserialization fails. (Valgrind agrees with me!)
- Several of the core interfaces are abstract, and a basic subclassing mechanism
allows clients to add their own custom stream handlers (e.g. a wrapper for
a custom File object, a C++ stream, or a network socket), i/o handlers
(i.e. file formats), and marshallers (they do the conversions between c11n
and client-side types).
- Because all serialized data is ultimately represented as strings,
there are no endianness issues.
- Any number of s11n_traits (called "marshallers" in c11n) can be used for any given
type. In libs11n only one can exist (because it must be a unique type).
- It's got boatloads of API docs.
- The core library itself has no 3rd-party library dependencies. Some (optional) file formats
require 3rd-party libraries, e.g. sqlite3 or libexpat.
- According to SLOCcount, c11n's core library is currently (6 Nov 2008) worth $72582.
(Not that i got one shiny cent for it, though. Simone keeps asking me, "so are we rich yet?")
- Comes with the c11nconvert tool to convert data between the various c11n data
formats. Since some formats are compatible with s11n, c11nconvert and s11nconvert
can be used together to port data to or from formats supported by only one of the two libraries.
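To make the "multiple POD values in a single property" idea from the list above more concrete, here is a minimal sketch in plain standard C. It is not c11n's actual API: the type point_t and the functions point_encode() and point_decode() are hypothetical names invented for this illustration. It also shows why string-based storage sidesteps endianness: the values are written as text, not as raw bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical example type: a very simple struct of value types only. */
typedef struct {
    int x;
    int y;
    double scale;
} point_t;

/*
 * Encodes all members of *p into one space-separated string, suitable
 * for storing as a single property value. Returns 0 on success, or
 * non-zero if buf is too small or formatting fails.
 * (Hypothetical helper, not part of c11n.)
 */
int point_encode(const point_t *p, char *buf, size_t buflen)
{
    assert(p && buf);
    /* %.17g prints enough digits for a double to survive the round trip. */
    int n = snprintf(buf, buflen, "%d %d %.17g", p->x, p->y, p->scale);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/*
 * Decodes a string produced by point_encode(). *p is only modified if
 * all three fields parse, so a failed decode leaves the target intact.
 * Returns 0 on success, non-zero on error.
 */
int point_decode(const char *str, point_t *p)
{
    point_t tmp;
    assert(str && p);
    if (sscanf(str, "%d %d %lf", &tmp.x, &tmp.y, &tmp.scale) != 3) {
        return -1;
    }
    *p = tmp;
    return 0;
}
```

A caller would encode into a local buffer, hand the resulting string to whatever stores the property, and later pass the stored string back to point_decode().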
Misfeatures and notable caveats:
- It doesn't do any type of introspection or automatic selection
of members to serialize. Just like in s11n, you have to "train"
the library to work with your types (by providing a marshaller
implementation for that type). Once "trained", it's all easy from there.
(Training it also isn't too hard, but it is more work than in s11n.)
- The boilerplate code for the marshaller implementations is much longer
than equivalent code in s11n (again, due to the lack of templates in C). It can easily be
generated from a shell script but it must be done on a per-type basis, so
the real lines of code go up. Unlike in s11n/C++, where one set of templates can
be used for all/most types, we have to create uniquely-named copies of otherwise
identical functions.
- It can't provide much compile-time type safety as far as
passing arbitrary serializable objects around goes. The only generic
option for passing around arbitrary objects in C is a void*, so
we're stuck with that limitation. Used properly, however, the API is
quite type-safe.
- Like s11n, it uses deep copying of pointer members for serialization.
The primary reason for this is that any other approach requires explicit
support in each data format, complicating their implementations and making
the data (e.g. XML) difficult to port to external applications. Shallow
serialization can be done in c11n but requires client-side work.
- Aside from some of the file formats being compatible, c11n is not
directly compatible with s11n, nor are there plans to ever reimplement
s11n on top of c11n. That said, c11n's implementation has given me some
good ideas for changes for s11n 2.x (s12n?). i might also
port some of the c11n i/o handlers to s11n, since the c11n implementations don't
use flex (as most of the s11n implementations do).
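The caveats above about per-type boilerplate and void* can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not c11n's real interface: the marshaller struct, the DEFINE_SCALAR_MARSHALLER macro, and roundtrip() are hypothetical names. The macro plays the role the shell-script generator plays in the text: it stamps out uniquely-named copies of otherwise identical functions, one set per type.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical marshaller interface: converts between a client-side
 * object (passed as void*) and its string form. */
typedef struct marshaller {
    const char *type_name;
    int (*serialize)(const void *obj, char *dest, size_t destlen);
    int (*deserialize)(const char *src, void *obj);
} marshaller;

/* Generates uniquely-named serialize/deserialize functions plus a
 * marshaller instance for one scalar type. In C++ a single template
 * could cover all of these; in C each type needs its own copy. */
#define DEFINE_SCALAR_MARSHALLER(NAME, TYPE, PRINT_FMT, SCAN_FMT)            \
    static int NAME##_serialize(const void *obj, char *dest, size_t destlen) \
    {                                                                        \
        int n = snprintf(dest, destlen, PRINT_FMT, *(const TYPE *)obj);      \
        return (n < 0 || (size_t)n >= destlen) ? -1 : 0;                     \
    }                                                                        \
    static int NAME##_deserialize(const char *src, void *obj)                \
    {                                                                        \
        TYPE tmp;                                                            \
        if (sscanf(src, SCAN_FMT, &tmp) != 1) return -1;                     \
        *(TYPE *)obj = tmp; /* only written on a successful parse */         \
        return 0;                                                            \
    }                                                                        \
    static const marshaller NAME##_marshaller = {                            \
        #NAME, NAME##_serialize, NAME##_deserialize                          \
    }

DEFINE_SCALAR_MARSHALLER(c_int, int, "%d", "%d");
DEFINE_SCALAR_MARSHALLER(c_double, double, "%.17g", "%lf");

/* Generic code only ever sees void*, so the compiler cannot verify
 * that `in` and `out` really match the type that `m` expects. That
 * pairing is the caller's responsibility. */
int roundtrip(const marshaller *m, const void *in, void *out)
{
    char buf[64];
    assert(m && in && out);
    if (m->serialize(in, buf, sizeof buf) != 0) return -1;
    return m->deserialize(buf, out);
}
```

Calling roundtrip(&c_int_marshaller, &a, &b) with two int variables copies a into b by way of its string form. Passing the address of a double with c_int_marshaller would compile without complaint, which is exactly the type-safety gap described above.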