s11n.net
Save the planet. Save the trees. Save your data, man.
Project powered by:
SourceForge.net

MySQL Serializer

Did ya ever want to save your objects into a database? Or convert them from whatever serializable format they live in, to a database? Now we can... Using this add-on DLL we can convert any and all s11n-saved objects and files to and from database entries. We can also save and load our objects directly to/from a database, ignoring files altogether. And, of course, s11n's API ensures that our client-side code doesn't need to know if it's using a database server in Taiwan or if it's saving to ~/foo.s11n.

ACHTUNG: This code is not yet ported to s11n versions newer than 1.0.x, and may never be. If you actually use this add-on, let me know and i'll port it to 1.1+ for you, otherwise i probably won't bother.

mysql_serializer ("MyS", for short)

Credits:
The s11n::io::mysql_serializer class is powered by a C++ library for mysql written by Anders Hedström. He's got an impressive set of utility libraries on his site. Using MyS does not require his mysql library: this tree comes with a slightly hacked copy of his code.

MyS can load and save objects from and to a mysql database, and works essentially just like any other s11n serializer: client-transparently. That is, the client never needs to know if his data is going to an XML file, an s11n-specific data format, or a database.

This Serializer is most interesting for following points:
  • The core library is 100% ignorant of it, and can load it by being given it's class name.
  • It proves that clients can provide their own i/o handlers without touching the core library. (Previously this was pure theory.) A client-written Serializer becomes a full-rights player in the framework, just like the Serializers shipped with the library.
  • It demonstrates that the previous point can be accomplished with relatively little effort. i spent about 8 hours coding it and touching it up, including the learning curve for the mysql-related code, and tweaking s11nconvert and the core lib to be able to import DLLs and arbitrary Serializers this way.
  • It demonstrates that non-file-based Serializers are a realistic option, and are treated with the same API transparency as their file-based cousins.
  • Demonstrates the ability to chain Serializers together to do more complex work. This one, e.g., when serializing to a STREAM it saves a record to the db and saves proxying information to the stream using a different Serializer (i.e., one that natively supports streams). That proxying info can later be loaded just like any s11n data, but it will direct MyS to load the db object from the database.

Source code...

First, please see the Big Fat Caveats detailed below!

(i'll wait...)

Then visit the downloads page.
The 2004.08.15 release of MyS will work with s11n 0.9.6 but does not have STREAM i/o support. The 2004.08.16+ release supports both files and streams but requires s11n 0.9.7 (which won't be released until 17 Aug).

Big Fat Caveats!

As this add-on is required to be released under the GNU General Public License (GPL), using it will force your libs11n library (and related libs) to fall under the GPL. To the best of my understanding, this causes your binaries to fall under the GPL, but not their sources, as the s11n sources do not in any way depend on MyS, nor do they "even know it exists" (on a technical level, that is). (If someone would enlighten me on the exact license-level implications of this i would be very appreciative.)

Aside from any potential philosophical or licensing issues, there are a couple technical ones:
  • This code is brand new (only a few hours old, as of the initial release (15 Aug 2004)), and probably contains some notable bugs. That said, i have been able to use it with s11n for a wide variety of s11n data and haven't seen any particular problems yet. Theoretically, saving very complex strings may cause some Grief, but this isn't proven yet (and should be trivial to fix when it shows up).
  • This code does no error checking for things like a broken db connection. It does properly recognize when it cannot connect at all, but if a connection dies during the life of the app Grief will probably result. Since the run-time of s11nconvert is normally only a few seconds, this has not yet become a practical concern, but it easily could for longer-running s11n applications. (To be fixed, maybe. Patches are of course welcomed.)
  • Using it requires a running mysql server and having a valid login for it, with write access to the database (see below). It is possible to create a user just for this purpose, and configure the build tree with --mysql-user=your_s11n_user. That does not create the user or add any rights for it - it only sets the default user name used by the Serializer when logging in to the db. There are also the following related configure options: --mysql-db, --mysql-password, and --mysql-host, all of which are supported by both the configure script and s11nconvert.
  • MyS is horribly slow: 5-7 nodes per second is about it's peak, even when using a local mysql server on a single-user machine.
  • It doesn't yet allow very fine control over the db connections. Patches would be most appreciated (and feel free to mutilate the db-related interface as much as you need to, just not the Serializer interface!).
  • Minor bug: there appears to be some odd whitespace translation being done at some undetermined point in the data conversions to/from the server. Some whitespace (ascii code: oct 040, hex 0x20, dec 32) gets replaced with a different type of space (ascii code: oct 0137, hex 0x5f, dec 95). It doesn't happen for all spaces, only a few, so it's really odd. Reproduce by: import a file to a db, then back to a file, and compare the original file with the routed-through-mysql file. It happens on all data formats.

Db structure

The default database name for MyS is mys11n, but this can be changed at build-time by passing --mysql-db to the configure script or at runtime by passing it to s11nconvert.

Here's what the mys11n db code looks like:
CREATE TABLE nodeinfo (
id int(11) NOT NULL auto_increment,
parent_id int(11) NOT NULL default '0', // = parent node's nodeinfo.id, or 0
class varchar(100) NOT NULL default '', // node_traits<>::class_name()
name varchar(100) NOT NULL default '', // node_traits<>::name()
PRIMARY KEY (id)
) TYPE=MyISAM;

CREATE TABLE properties (
node_id int(11) NOT NULL default '0', // = nodeinfo.id
pkey varchar(255) NOT NULL default '', // property key
pvalue text // property value
) TYPE=MyISAM;
CREATE TABLE pseudofiles (
id int(11) NOT NULL auto_increment, // "inode" number ;)
node_id int(11) NOT NULL default '0', // = nodeinfo.id
name varchar(255) NOT NULL default '', // "filename"
PRIMARY KEY (id)
) TYPE=MyISAM;

(Import that into your mysql database. You can grab the SQL file from the source tree, sql/mys11n.mysql.)

If you know you will only use relatively small data, feel free to change the varchar and text fields to something smaller, and change the table TYPEs to whatever you need. MyS is lax about property types: anything that node_traits<>::get/set() can handle, MyS can handle. If you are ONLY going to store numbers, for example, you could theoretically change properties.pvalue to an appropriate numeric type (and get away with it, i mean).

Nodes are stored in so-called pseudofiles, with each "file" containing exactly one root node, which itself may contain any number of nodes. Node "filenames" may be any db-legal string, up to a maximum of 255 characters (or more, if you modify the db table to accomodate that). Root node names are case-sensitive and must be unique - an existing object with a duplicate name will be deleted before a new one is saved, as will any children and properties of that object. i.e., it behaves similarly to a filesystem, by "overwriting" nodes. For portability and ease of db maintenance, this is not enforced at the db-level, but by the Serializer.

Using MyS with s11nconvert

s11nconvert 0.9.6 adds a couple of features to load arbitrary DLLs and to force a specific Serializer class to try to read a given input, as demonstrated in the examples below. Normally the input Serializer is determined automatically based on a file's magic cookie, but db records don't have associated files, and therefor can't have a cookie, can they? Thus s11nconvert now has the -S option to force a specific input Serializer (which will obviously fail if you try to use the wrong Serializer to read a data set).

Examples:

Import a data file into the database:
s11nconvert -dl mysql -f my.s11n -s mysql -o MyDbObject
// or, more simply:
s11nconvert -f my.s11n -s mysql -o MyDbObject
In the previous example we load in a file using -f my.s11n, set the output format with -s mysql[_serializer], and save it to the "file" named MyDbObject. The -s classloads the Serializer's DLL, if needed, but we can also use -dl DLLNAME to explicitely load a DLL, and must do this when the DLL's name does not match the class name of the Serializer (actually, it's a tad more complicated than that, but that's not important now...).

Export a database pseudo-file to a data file:
s11nconvert -S mysql -f MyDbObject -s simplexml > my.s11n
(Remember that MyDbObject is not a real file, but a pseudo-file in the database.)
Here we loaded the DLL, then used -S mysql to force MyS to be the input Serializer. The -f option tells s11nconvert to the load "file" named MyDbObject. We set the ouput Serializer using -s simplexml, and send it to a file with either > my.s11n or -o my.s11n.

To copy a mysql-hosted object to another object in the database:
s11nconvert -S mysql -f MyDbObject -s mysql -o "My Copied Object"
That forces MyS to be used as the input Serializer (-S), reads in the MyDbObject using that Serializer, sets the output Serializer (-s), and saves it to a new "file" called "My Copied Object".

Using s11n 0.9.7+ and mysql_serializer 2004.08.16+, if you try to output a mysql record to a stream (i.e., by omitting the -o option) MyS will create a set of proxy data which it can later use to load the object from it's database:
s11nconvert -f in.paren -s mysql
#s11n::io::serializer mysql_serializer
<!DOCTYPE SerialTree>
<mysql_serializer_proxy_node class="s11n::data_node">
<dbhost>localhost</dbhost>
<dbname>mys11n</dbname>
<dbpassword></dbpassword>
<dbuser>s11n</dbuser>
<node_id>6715</node_id>
</mysql_serializer_proxy_node>

Note that -S is not needed when this option is used, as the special magic cookie effectively does the same thing, but does so at a much deeper level in the library than -S does, and also works when loading streams from arbitrary client code (as opposed to via s11nconvert).

That proxy code can be fed right back into s11nconvert (or client code) to load the appropriate object:
s11nconvert -f infilename -s simplexml
<!DOCTYPE s11n::simplexml>
<somenode s11n_class="NoClass" a="()\b" foo="bar" long="this is a long property">
<fred s11n_class="FredClass" key="value" />
<wilma s11n_class="WilmaClass" the="lovely wife" />
<betty s11n_class="BettyClass" value="the apple of Barney&apos;s eye" />
<deep s11n_class="Foo">
<deeper s11n_class="Foo" how_deep="really deep!">
<and_deeper s11n_class="Ouch" />
</deeper>
</deep>
</somenode>


The great part is that this support works at such a level that arbitrary client code need not know if it's using a database, and client-side tools (like s11nconvert and the s11n data browser) can read and write such data just as easily as they would any other.

Command-line options:

s11nconvert passes on all command-line arguments to DLLs it loads, and MyS supports several options which you can see by loading the DLL and passing --help to s11nconvert:
s11nconvert -dl mysql_serializer --help
<snip>
mysql-db:
Sets the s11n/mysql database's name.
mysql-host:
Sets the mysql db server.
mysql-password:
Sets the mysql user's password.
mysql-user:
Sets the mysql db user.
mysql-verbose:
Toggles on verbose mode for mysql_serializer.
<snip>
(Tip: any client app which initializes acme::argv_parser::args(int,char**) inherently passes on all arguments to other objects this way.)

Tips:

If you've read this far, you deserve some bonus tips: the -dl DLLNAME option has a couple of interesting features:
  • It has an evil twin, -DL, which works identically but does not consider a failed DLL load to be an error (i.e., it ignores such errors). -dl, on the other hand, will fail if a given DLL cannot be loaded.
  • It is optional if a library named SERALIZER.so or SERIALIZER_serializer.so can be found and opened, where SERIALIZER is the name passed to -S or -s.
  • May be passed base-names if DLLNAME.so is in s11n's classpath (as described in the library manual).
  • May be passed multiple times, unlike the other options. Use it once for each DLL you want to load, e.g. -dl MyDLL -dl YourDLL.
  • s11nconvert does nothing with the DLLs except to open them. i.e., it does not look for any specific symbols in any DLLs. In the case of DLLs containing s11n registration code, this simple act is enough to activate the registrations they contain. You may, if you like, use this feature for opening non-s11n-related DLLs, but i can personally see little use in doing so in the context of s11nconvert.

Using MyS in client code

First, see the Big Fat Caveats, above.

To use MyS from s11nlite, you have a couple approaches:

Simply do the following from somewhere in your app (e.g., your main() routine):
s11nlite::serializer_class( "mysql_serializer" );
Once you have done this, s11nlite::save() will use that Serializer. If you're connecting to the db using the build-time defaults, or using parameters passed in to s11nconvert, then this is all you need to do to make your database the default i/o repository used by s11nlite.

Alternately, you can load it dynamically:
// if serialize_class() has been set:
s11nlite::serializer_base_type * ser =
s11nlite::create_serializer();
// otherwise:
s11nlite::serializer_base_type * ser =
s11nlite::create_serializer( "mysql_serializer" );
should do the trick. That will load the DLL, if needed (and if it can be found). Simply loading the DLL will take care of the classloader and registration work which is needed to plug the little guy into the core.

If you need to customize the db connection, look at Database.h, then do:
#include <s11n.net/mysql_serializer/mysql_serializer.hpp>
#include <s11n.net/mysql_serializer/libsql++.hpp>
// If you want to see any error messages, you must
// register an error handler:
StderrLog errlog;
// Create db object and connect...
Database db(host,user,password,db,&errlog);
if( db.Connected() )
{
delete( s11n::io::mysql_database() );
s11n::io::mysql_database( &db );
}
else
{
// bail out...
}
Note that this requires including GPL'd header files, which will immediately cast your source code into the GPL!

Note also that the approach shown here is slightly different from that used by the original mysql driver code, so be aware of that if you happen to also use that code.