Saving untold millions of trees... of data.
Project powered by:
Did ya ever want to save your objects into a database? Or convert them
from whatever serializable format they live in, to a database? Now we
can... Using this add-on DLL we can convert any and all s11n-saved
objects and files to and from database entries. We can also save and
load our objects directly to/from a database, ignoring files
altogether. And, of course, s11n's API ensures that our client-side
code doesn't need to know if it's using a database server in Taiwan
or if it's saving to ~/foo.s11n.
ACHTUNG: This code is not yet ported to s11n versions newer than 1.0.x, and may
never be. If you actually use this add-on, let me know and i'll port it
to 1.1+ for you, otherwise i probably won't bother.
mysql_serializer ("MyS", for short)
The s11n::io::mysql_serializer class is powered by a C++
library for mysql written by
He's got an impressive set of utility libraries on his site.
Using MyS does not require his mysql library: this tree comes with a slightly hacked copy of his
MyS can load and save objects from and to a mysql database, and works
essentially just like any other s11n serializer:
client-transparently. That is, the client never needs to know if his
data is going to an XML file, an s11n-specific data format, or
This Serializer is most interesting for following points:
- The core library is 100% ignorant of it, and can load it by being given
it's class name.
- It proves that clients can provide their own i/o handlers
without touching the core library. (Previously this was pure theory.)
A client-written Serializer becomes a full-rights player in the framework,
just like the Serializers shipped with the library.
- It demonstrates that the previous point can be accomplished with relatively
little effort. i spent about 8 hours coding it and touching it up, including
the learning curve for the mysql-related code, and tweaking s11nconvert
and the core lib to be able to import DLLs and arbitrary Serializers this way.
- It demonstrates that non-file-based Serializers are a realistic option,
and are treated with the same API transparency as their file-based cousins.
Demonstrates the ability to chain Serializers together to do more complex work.
This one, e.g., when serializing to a STREAM it saves a record to the db and
saves proxying information to the stream using a different Serializer
(i.e., one that natively supports streams). That proxying info can later be loaded
just like any s11n data, but it will direct MyS to load the db object from the database.
First, please see the Big Fat Caveats detailed below!
Then visit the downloads page.
The 2004.08.15 release of MyS will work with s11n 0.9.6 but does not
have STREAM i/o support. The 2004.08.16+ release supports both files
and streams but requires s11n 0.9.7 (which won't be released until 17 Aug).
Big Fat Caveats!
As this add-on is required to be released under the GNU General Public License (GPL),
using it will force your libs11n library (and related libs) to fall under the GPL.
To the best of my understanding, this causes your binaries to fall
under the GPL, but not their sources, as the s11n sources do not in any way
depend on MyS, nor do they "even know it exists" (on a technical level, that is).
(If someone would
enlighten me on
the exact license-level implications of this
i would be very appreciative.)
Aside from any potential philosophical or licensing issues, there are
a couple technical ones:
- This code is brand new (only a few hours old, as of the
initial release (15 Aug 2004)), and probably contains some notable bugs. That said,
i have been able to use it with s11n for a wide variety of s11n data and haven't
seen any particular problems yet. Theoretically, saving very complex strings may
cause some Grief, but this isn't proven yet (and should be trivial to fix when
it shows up).
- This code does no error checking for things like a broken db connection.
It does properly recognize when it cannot connect at all, but if a connection
dies during the life of the app Grief will probably result. Since the run-time
of s11nconvert is normally only a few seconds, this has not yet become a
practical concern, but it easily could for longer-running s11n applications.
(To be fixed, maybe. Patches are of course welcomed.)
Using it requires a running mysql server and having a valid login for it,
with write access to the database (see below). It is possible to create
a user just for this purpose, and configure the build tree with
--mysql-user=your_s11n_user. That does not create the user or
add any rights for it - it only sets the default user name used
by the Serializer when logging in to the db. There are also the following
related configure options: --mysql-db, --mysql-password,
and --mysql-host, all of which are supported by both the
configure script and s11nconvert.
MyS is horribly slow: 5-7 nodes per second is about it's
peak, even when using a local mysql server on a single-user machine.
It doesn't yet allow very fine control over the db connections. Patches
would be most appreciated (and feel free to mutilate the db-related interface
as much as you need to, just not the Serializer interface!).
- Minor bug: there appears to be some odd whitespace translation
being done at some undetermined point in the data conversions to/from
the server. Some whitespace (ascii code: oct 040, hex 0x20,
dec 32) gets replaced with a different type of space (ascii code: oct
0137, hex 0x5f, dec 95). It doesn't happen for all spaces, only a
few, so it's really odd. Reproduce by: import a file to a db, then back to a
file, and compare the original file with the routed-through-mysql file.
It happens on all data formats.
The default database name for MyS is mys11n, but this can be changed
at build-time by passing --mysql-db to the configure script
or at runtime by passing it to s11nconvert.
Here's what the mys11n db code looks like:
CREATE TABLE nodeinfo (
id int(11) NOT NULL auto_increment,
parent_id int(11) NOT NULL default '0', // = parent node's nodeinfo.id, or 0
class varchar(100) NOT NULL default '', // node_traits<>::class_name()
name varchar(100) NOT NULL default '', // node_traits<>::name()
PRIMARY KEY (id)
CREATE TABLE properties (
node_id int(11) NOT NULL default '0', // = nodeinfo.id
pkey varchar(255) NOT NULL default '', // property key
pvalue text // property value
CREATE TABLE pseudofiles (
id int(11) NOT NULL auto_increment, // "inode" number ;)
node_id int(11) NOT NULL default '0', // = nodeinfo.id
name varchar(255) NOT NULL default '', // "filename"
PRIMARY KEY (id)
(Import that into your mysql database. You can grab the
SQL file from the source tree, sql/mys11n.mysql.)
If you know you will only use relatively small data, feel free to change
the varchar and text fields to something smaller, and
change the table TYPEs to whatever you need. MyS is lax about property types:
anything that node_traits<>::get/set() can handle, MyS can handle.
If you are ONLY going to store numbers, for example, you could theoretically
change properties.pvalue to an appropriate numeric type (and get
away with it, i mean).
Nodes are stored in so-called pseudofiles,
with each "file" containing exactly one root node, which itself may
contain any number of nodes. Node "filenames" may be any db-legal
string, up to a maximum of 255 characters (or more, if you modify the
db table to accomodate that). Root node names are case-sensitive and
must be unique - an existing object with a duplicate name will be
deleted before a new one is saved, as will any children and properties
of that object. i.e., it behaves similarly to a filesystem, by
"overwriting" nodes. For portability and ease of db maintenance, this
is not enforced at the db-level, but by the Serializer.
Using MyS with s11nconvert
s11nconvert 0.9.6 adds a couple of features to load arbitrary
DLLs and to force a specific Serializer class to try to read
a given input, as demonstrated in the examples below. Normally
the input Serializer is determined automatically
based on a file's magic cookie, but db records don't have
associated files, and therefor can't have a cookie, can they?
Thus s11nconvert now has the -S option to force a specific
input Serializer (which will obviously fail if you try to use
the wrong Serializer to read a data set).
Import a data file into the database:
In the previous example we load in a file using -f my.s11n, set the
output format with -s mysql[_serializer], and save it to the "file" named
MyDbObject. The -s classloads the Serializer's DLL, if needed,
but we can also use -dl DLLNAME to explicitely load a DLL, and must do
this when the DLL's name does not match the class name of the Serializer (actually,
it's a tad more complicated than that, but that's not important now...).
s11nconvert -dl mysql -f my.s11n -s mysql -o MyDbObject
// or, more simply:
s11nconvert -f my.s11n -s mysql -o MyDbObject
Export a database pseudo-file to a data file:
(Remember that MyDbObject is not a real file, but a pseudo-file
in the database.)
s11nconvert -S mysql -f MyDbObject -s simplexml > my.s11n
Here we loaded the DLL, then used -S mysql to force MyS to be
the input Serializer. The -f option tells
s11nconvert to the load "file" named MyDbObject. We set the
ouput Serializer using -s simplexml, and send it to a file
with either > my.s11n or -o my.s11n.
To copy a mysql-hosted object to another object in the database:
That forces MyS to be used as the input Serializer (-S),
reads in the MyDbObject using that Serializer, sets the output
Serializer (-s), and saves it to a new "file" called "My Copied Object".
s11nconvert -S mysql -f MyDbObject -s mysql -o "My Copied Object"
Using s11n 0.9.7+ and mysql_serializer 2004.08.16+, if you try to
output a mysql record to a stream (i.e., by omitting the -o
option) MyS will create a set of proxy data which it can later use to
load the object from it's database:
s11nconvert -f in.paren -s mysql
Note that -S is not needed when this option is used, as
the special magic cookie effectively does the same thing, but
does so at a much deeper level in the library than -S
does, and also works when loading streams from arbitrary client code
(as opposed to via s11nconvert).
That proxy code can be fed right back into s11nconvert (or client
code) to load the appropriate object:
s11nconvert -f infilename -s simplexml
<somenode s11n_class="NoClass" a="()\b" foo="bar" long="this is a long property">
<fred s11n_class="FredClass" key="value" />
<wilma s11n_class="WilmaClass" the="lovely wife" />
<betty s11n_class="BettyClass" value="the apple of Barney's eye" />
<deeper s11n_class="Foo" how_deep="really deep!">
<and_deeper s11n_class="Ouch" />
The great part is that this support works at such a level that
arbitrary client code need not know if it's using a database, and
client-side tools (like s11nconvert and
the s11n data browser) can read
and write such data just as easily as they would any other.
s11nconvert passes on all command-line arguments to DLLs it loads, and MyS supports
several options which you can see by loading the DLL and passing --help
(Tip: any client app which initializes acme::argv_parser::args(int,char**) inherently
passes on all arguments to other objects this way.)
s11nconvert -dl mysql_serializer --help
Sets the s11n/mysql database's name.
Sets the mysql db server.
Sets the mysql user's password.
Sets the mysql db user.
Toggles on verbose mode for mysql_serializer.
If you've read this far, you deserve some bonus tips: the -dl DLLNAME
option has a couple of interesting features:
- It has an evil twin, -DL, which works identically but does not consider
a failed DLL load to be an error (i.e., it ignores such errors). -dl, on the
other hand, will fail if a given DLL cannot be loaded.
- It is optional if a library named SERALIZER.so or SERIALIZER_serializer.so
can be found and opened, where SERIALIZER is the name passed to -S
- May be passed base-names if DLLNAME.so is in s11n's classpath
(as described in the library manual).
- May be passed multiple times, unlike the other options. Use it
once for each DLL you want to load, e.g. -dl MyDLL -dl YourDLL.
- s11nconvert does nothing with the DLLs except to open them.
i.e., it does not look for any specific symbols in any DLLs.
In the case of DLLs containing s11n registration code, this simple act is enough
to activate the registrations they contain. You may, if you like, use
this feature for opening non-s11n-related DLLs, but i can personally see
little use in doing so in the context of s11nconvert.
Using MyS in client code
First, see the Big Fat Caveats, above.
To use MyS from s11nlite, you have a couple
Simply do the following from somewhere in your app (e.g.,
your main() routine):
Once you have done this, s11nlite::save() will use that
Serializer. If you're connecting to the db using the build-time
defaults, or using parameters passed in to s11nconvert, then this is
all you need to do to make your database the default i/o repository
used by s11nlite.
s11nlite::serializer_class( "mysql_serializer" );
Alternately, you can load it dynamically:
should do the trick. That will load the DLL, if needed (and if it can
be found). Simply loading the DLL will take care of the classloader
and registration work which is needed to plug the little guy into
// if serialize_class() has been set:
s11nlite::serializer_base_type * ser =
s11nlite::serializer_base_type * ser =
s11nlite::create_serializer( "mysql_serializer" );
If you need to customize the db connection, look at Database.h, then do:
Note that this requires including GPL'd header files,
which will immediately cast your source code into the GPL!
// If you want to see any error messages, you must
// register an error handler:
// Create db object and connect...
if( db.Connected() )
delete( s11n::io::mysql_database() );
s11n::io::mysql_database( &db );
// bail out...
Note also that the approach shown here is slightly different
from that used by the original mysql driver code,
so be aware of that if you happen to also use that code.