Serialize your C++ objects to and from sqlite3
sqlite3_serializer is an sqlite3-powered Serializer,
providing serialization over sqlite3 databases.
This Serializer works partially with s11n 1.2.0, but requires 1.2.1 or later for
binary magic cookie support (i.e., the ability to dynamically dispatch
db files to this Serializer) and for serialization over streams.
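For background, the check that such dispatch relies on is simple to show: every sqlite3 database file begins with a 16-byte header, the text "SQLite format 3" followed by a NUL byte. The sketch below peeks at that header. The helper names are hypothetical, and this is not s11n's own cookie-handling code, only an illustration of the idea.

```cpp
#include <cstddef>
#include <istream>
#include <string>

// Every sqlite3 database file starts with this 16-byte header:
// the text "SQLite format 3" followed by a NUL byte.
const std::string kSqlite3Magic("SQLite format 3\0", 16);

// Hypothetical helper (not part of s11n's API): true if `head`
// begins with the sqlite3 database header.
bool has_sqlite3_magic(const std::string & head)
{
    return head.size() >= kSqlite3Magic.size()
        && head.compare(0, kSqlite3Magic.size(), kSqlite3Magic) == 0;
}

// Hypothetical helper: peeks at the first 16 bytes of a seekable stream,
// then rewinds it so whichever reader is chosen can start from the top.
bool stream_is_sqlite3_db(std::istream & in)
{
    const std::istream::pos_type start = in.tellg();
    std::string head(kSqlite3Magic.size(), '\0');
    in.read(&head[0], static_cast<std::streamsize>(head.size()));
    head.resize(static_cast<std::size_t>(in.gcount()));
    in.clear();      // a short read sets eofbit/failbit; clear before seeking
    in.seekg(start);
    return has_sqlite3_magic(head);
}
```

A text-based s11n file fails this check, so a loader could fall back to its other format handlers.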
There are two slightly different approaches to using this add-on:
- It can be used as a conventional Serializer, that is, just like any other data format.
- It can be hooked in to s11nlite via s11nlite::interface(),
which enables it to intercept all of s11nlite's save/load requests.
The add-on is available for download here:
The sqlite3 headers and library are
required (tested only with version 3.2.7); they come preinstalled on
many modern Linux distributions.
Like sqlite3 and s11n, this code is released into the Public Domain.
- Can be used with s11nconvert and
s11nbrowser to convert s11n data to and from
- Database-stored s11n data can be queried and edited via SQL tools like sqlite's console.
- Can save directly to a database or to SQL. When writing to files it writes databases; over
streams it writes SQL (the db layer cannot deal with streams).
- Versions >= 2005.12.09 can deserialize from both SQL and database files.
(The first release could not read SQL at all.) They can load SQL over streams,
but not databases over streams (again, the db layer can't work with streams).
Known caveats and bugs:
- When saving to streams it writes SQL. When saving to files
it writes sqlite3 databases.
- It reads very slowly on large data sets. On small- to mid-sized
sets (say, up to 5k nodes), it reads around 10-15k objects/second. On
large sets (say, 10k nodes + 10k properties) it reads much, much more
slowly (a few hundred nodes/second). For small sets it writes databases almost
as quickly as other formats, but it slows down notably on large data sets. (I
don't yet know whether the "large DB problem" is an inherent property of
sqlite or a problem in my implementation.) Writing SQL is as fast as, if not
faster than, most other formats.
- When writing SQL, the output can be huge. When writing to
a db we have programmatic access to things like database record IDs,
so we can build up our DOM relationships via the C API. In SQL we do
not, so it outputs lots of extra code in order to build the
parent/child relationships as the object nodes are inserted. On one
55k-object dataset, with an additional 50k properties, the SQL output
was 23MB (compared to a 6MB database). On the other hand, outputting
the SQL is much, much faster than writing to a database.
- Deserializing from SQL is rather inefficient because, to parse the SQL,
we create an in-memory database. We don't buffer the SQL itself (only a
few lines at a time in the normal case), but the db itself effectively
is a buffer, and will be of a size proportional to (but likely larger than)
the object tree we are deserializing.
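The SQL-size caveat above comes from having to rebuild parent/child links without access to record IDs. With the C API, a writer can call sqlite3_last_insert_rowid() after inserting a node and use the result as its children's parent_id; SQL text has no variables to hold such values. The sketch below shows one hypothetical way around that, written against the n and p tables from the sample session below: a temporary table stk remembers, via SQLite's last_insert_rowid() function, the rowid of the most recent node at each tree depth. It is not the add-on's real output format, and Node, sql_quote(), emit_node() and to_sql() are made-up names.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory node: class name, node name, properties, children.
struct Node
{
    std::string cls;
    std::string name;
    std::vector<std::pair<std::string, std::string>> props;
    std::vector<Node> children;
};

// Quote text as an SQL string literal by doubling embedded single quotes.
std::string sql_quote(const std::string & s)
{
    std::string out("'");
    for (char c : s)
    {
        if (c == '\'') out += '\'';
        out += c;
    }
    return out + "'";
}

// Emits INSERTs for `node` and its subtree. Plain SQL has no variables to
// hold rowids, so the scratch table stk(depth, id) remembers the rowid of
// the most recently inserted node at each depth; a node at depth d finds
// its parent's id in the stk row for depth d-1.
void emit_node(std::ostream & os, const Node & node, int depth)
{
    const std::string parent = (depth == 0)
        ? std::string("NULL")
        : "(SELECT id FROM stk WHERE depth=" + std::to_string(depth - 1) + ")";
    os << "INSERT INTO n(parent_id,class,name) VALUES(" << parent << ','
       << sql_quote(node.cls) << ',' << sql_quote(node.name) << ");\n";
    // last_insert_rowid() still refers to the n row inserted just above.
    os << "INSERT OR REPLACE INTO stk(depth,id) VALUES(" << depth
       << ",last_insert_rowid());\n";
    for (const auto & kv : node.props)
    {
        os << "INSERT INTO p(node_id,key,value) VALUES("
           << "(SELECT id FROM stk WHERE depth=" << depth << "),"
           << sql_quote(kv.first) << ',' << sql_quote(kv.second) << ");\n";
    }
    for (const Node & child : node.children)
    {
        emit_node(os, child, depth + 1);
    }
}

// Returns a complete script: scratch table, the node inserts, cleanup.
std::string to_sql(const Node & root)
{
    std::ostringstream os;
    os << "CREATE TEMP TABLE stk(depth INTEGER PRIMARY KEY, id INTEGER);\n";
    emit_node(os, root, 0);
    os << "DROP TABLE stk;\n";
    return os.str();
}
```

In this scheme every node costs an extra statement and every property a subquery, which gives a feel for why SQL output can be much larger than the equivalent database.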
Using the Serializer via s11nlite:
First, you will need to either link your app against the
sqlite3_serializer library or dynamically load it (if supported on
your platform) as shown here:
std::string found = s11n::plugin::open( "sqlite3_serializer" );
Then you use it as you would any other Serializer, as explained in the
library manual. If you want the Serializer to "take over" s11nlite's
API, such that all s11nlite::save(object,FILE) calls are automatically
serialized using sqlite3, you have two options. First, you can directly
include the handler and register it yourself:
s11nlite::instance( & apihandler );
Second, if you have the sqlite3_serializer source tree, adding the file
s11nlite_sqlite3.cpp to your project will do the above at app startup
time. It is also shipped as a DLL which you can simply link against:
s11nlite_sqlite3.so. Opening that DLL from your application
will automatically install the s11nlite handler.
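Installing a handler "at app startup time", or merely by opening a DLL, is typically done with the C++ idiom of a namespace-scope object whose constructor runs during static initialization. The source of s11nlite_sqlite3.cpp is not reproduced here, so the sketch below only illustrates that idiom, presumably similar in spirit, against a hypothetical stand-in for the s11nlite handler slot. Api, current_api(), Sqlite3Api and AutoInstaller are all made-up names, not s11n's API.

```cpp
#include <string>

// Hypothetical stand-in for the API object that save/load calls go through.
struct Api
{
    virtual ~Api() {}
    virtual std::string format() const { return "default"; }
};

// Hypothetical global slot, playing the role that s11nlite::instance()
// plays in the text. A function-local static sidesteps the
// static-initialization-order problem between translation units.
Api *& current_api()
{
    static Api fallback;
    static Api * current = &fallback;
    return current;
}

// Hypothetical sqlite3-backed handler.
struct Sqlite3Api : Api
{
    std::string format() const override { return "sqlite3"; }
};

namespace
{
    // Constructed during static initialization (before main() runs, or
    // when a shared library containing this file is loaded); its
    // constructor installs the handler.
    struct AutoInstaller
    {
        Sqlite3Api handler;
        AutoInstaller() { current_api() = &handler; }
    } auto_installer;
}
```

One caveat of this idiom: if such a file is placed in a static library, the linker may drop its object file when nothing references it, and then the installer never runs. Compiling the .cpp directly into the project, or opening it as a DLL, avoids that.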
Using the Serializer via s11nconvert and s11nbrowser:
The releases of these apps for s11n 1.2.1 include command-line options
for loading DLLs. Both apps work the same way in this regard, as shown below:
~> s11nconvert -dl sqlite3_serializer ... other args ...
The -dl form causes the app to fail if the DLL cannot be loaded. The
-DL form tolerates a failed load and continues execution. The -dl
options can be specified any number of times, and DLLs are always
loaded before other operations are performed. Use the -v (verbose)
flag to get more information from the DLL loading process. The names
passed to -dl may be absolute or relative DLL file names, or "partial
names" if the name can be resolved via
s11n::plugin::path(). For example, -dl libz would
probably open /usr/lib/libz.so. Note that you can open
arbitrary DLLs this way, not just Serializers.
(Reminder: the -dl option is unavailable in version 1.2.0 due to
code refactoring. It came back in 1.2.1.)
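The -dl/-DL rules described above (strict versus tolerant loading, with all DLLs loaded before any other work) can be summarized in a small sketch. The option scanning below is a hypothetical reimplementation for illustration, not the actual s11nconvert or s11nbrowser code. The loader is passed in as a function so the example does not depend on a real DLL API; in s11n that role would presumably be filled by s11n::plugin::open().

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical loader: returns true if the named DLL was opened.
using DllLoader = std::function<bool(const std::string &)>;

// Loads every DLL named by "-dl NAME" (must succeed) or "-DL NAME"
// (may fail) before anything else happens, and returns the remaining
// arguments for normal processing.
std::vector<std::string> load_requested_dlls(const std::vector<std::string> & args,
                                             const DllLoader & load)
{
    std::vector<std::string> rest;
    for (std::size_t i = 0; i < args.size(); ++i)
    {
        const std::string & opt = args[i];
        if (opt != "-dl" && opt != "-DL")
        {
            rest.push_back(opt);   // not a DLL option; keep for later
            continue;
        }
        if (i + 1 == args.size())
            throw std::runtime_error(opt + " requires a DLL name");
        const std::string & name = args[++i];
        if (!load(name) && opt == "-dl")
            throw std::runtime_error("could not load DLL: " + name);
        // A failed -DL load is tolerated: just keep going.
    }
    return rest;
}
```

Loading everything in one pass up front matches the stated behavior that DLLs are always loaded before other operations, so a Serializer loaded this way is already registered when the format arguments are processed.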
Samples of converting data with s11nconvert and examining it with sqlite's console.
(Note that the sqlite3 console is part of the sqlite distribution,
not part of this add-on.)
~> alias sc='s11nconvert -dl sqlite3_serializer'
# List known Serializers/formats:
51191011 compact expat funtxt funxml parens simplexml sqlite3 wesnoth
# Convert a large s11n file to SQL:
~> time sc -f 54400.s11n -s sqlite3 > 54400.sql
# Convert that same file to a database:
~> time sc -f 54400.s11n -s sqlite3 -o 54400.sq3
# Now browse the data using a db client:
~> sqlite3 54400.sq3
SQLite version 3.2.7
Enter ".help" for instructions
sqlite> select count(*) from p; select count(*) from n;
CREATE TABLE n(rowid INTEGER PRIMARY KEY AUTOINCREMENT,parent_id INTEGER, class, name);
CREATE TABLE p(node_id INTEGER REFERENCES n(rowid),key TEXT,value TEXT);
CREATE INDEX ndx_nodes ON n (rowid,parent_id);
CREATE INDEX ndx_props ON p (node_id);
CREATE TRIGGER tr_cleanup_node BEFORE DELETE ON n
FOR EACH ROW BEGIN
DELETE FROM p WHERE node_id = OLD.rowid;
DELETE FROM n WHERE parent_id = OLD.rowid;
END;
# We could alternatively create a new database by importing the SQL:
~> sqlite3 -init 54400.sql
# If you want to save the results in a non-temporary DB:
~> sqlite3 -init 54400.sql mydatabase
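The n table above stores the object tree as flat rows linked by parent_id, so a reader must reassemble the tree after querying it. The sketch below shows one straightforward way to do that in memory: index the rows by parent_id, then walk down from the root. Row, TreeNode, attach() and build_tree() are hypothetical names, this is not the add-on's code, and properties from table p are left out for brevity. It also assumes the root's NULL parent_id has been mapped to 0, which is safe because AUTOINCREMENT rowids normally start at 1.

```cpp
#include <algorithm>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical row of table n; parent_id 0 stands in for SQL NULL (root).
struct Row
{
    long rowid;
    long parent_id;
    std::string cls;
    std::string name;
};

// Hypothetical rebuilt tree node (properties from table p omitted).
struct TreeNode
{
    std::string cls;
    std::string name;
    std::vector<std::unique_ptr<TreeNode>> children;
};

using ChildIndex = std::multimap<long, const Row *>;

// Builds the subtree rooted at row `r` by looking up its children.
std::unique_ptr<TreeNode> attach(const Row & r, const ChildIndex & by_parent)
{
    auto node = std::make_unique<TreeNode>();
    node->cls = r.cls;
    node->name = r.name;
    const auto range = by_parent.equal_range(r.rowid);
    for (auto it = range.first; it != range.second; ++it)
    {
        node->children.push_back(attach(*it->second, by_parent));
    }
    return node;
}

// Rebuilds the tree from rows in any order. Sorting by rowid first keeps
// siblings in the order their rows were inserted, because multimap keeps
// equal keys in insertion order. Rows not reachable from the root are ignored.
std::unique_ptr<TreeNode> build_tree(std::vector<Row> rows)
{
    std::sort(rows.begin(), rows.end(),
              [](const Row & a, const Row & b) { return a.rowid < b.rowid; });
    ChildIndex by_parent;
    const Row * root = nullptr;
    for (const Row & r : rows)
    {
        if (r.parent_id == 0)
        {
            if (root) throw std::runtime_error("more than one root node");
            root = &r;
        }
        else
        {
            by_parent.emplace(r.parent_id, &r);
        }
    }
    if (!root) throw std::runtime_error("no root node");
    return attach(*root, by_parent);
}
```

Indexing by parent_id first keeps the rebuild to one pass over the rows plus one lookup per node, rather than rescanning the whole table for every node's children.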