Camel overview
==============
Camel is intended as a replacement for libraries like :py:mod:`pickle` or
`PyYAML`_, which automagically serialize any type they come across. That seems
convenient at first, but in any large or long-lived application, the benefits
are soon outweighed by the costs:
.. _PyYAML: http://pyyaml.org/
* You can't move, rename, or delete any types that are encoded in a pickle.
* Even private implementation details of your class are encoded in a pickle by
default, which means you can't change them either.
* Because pickle's behavior is recursive, it can be difficult to know which
types are pickled.
* Because pickle's behavior is recursive, you may inadvertently pickle far more
data than necessary, if your objects have caches or reified properties. In
extreme cases you may pickle configuration that's no longer correct when the
pickle is loaded.
* Since pickles aren't part of your codebase and are rarely covered by tests,
you may not know you've broken pickles until your code hits production... or
much later.
* Pickle in particular is very opaque, even when using the ASCII format. It's
easy to end up with a needlessly large pickle by accident and have no
visibility into what's being stored in it, or to break loading a large pickle
and be unable to recover gracefully or even tell where the problem is.
* Automagically serialized data is hard enough to load back into your *own*
application. Loading it anywhere else is effectively out of the question.
It's certainly possible to whip pickle or PyYAML into shape manually by writing
``__reduce__`` or representer functions, but their default behavior is still
automagic, so you can't be sure you didn't miss something. Also, nobody
actually does it, so merely knowing it's possible doesn't help much.
Camel's philosophy
------------------
Explicit is better than implicit.
Complex is better than complicated.
Readability counts.
If the implementation is hard to explain, it's a bad idea.
*In the face of ambiguity, refuse the temptation to guess.*
— `The Zen of Python`_
.. _The Zen of Python: https://www.python.org/dev/peps/pep-0020/
Serialization is hard. We can't hide that difficulty, only delay it for a
while. And it *will* catch up with you.
A few people in the Python community have been rallying against pickle and its
ilk for a while, but when asked for alternatives, all we can do is mumble
something about writing functions. Well, that's not very helpful.
Camel forces you to write all your own serialization code, then wires it all
together for you. It's backed by YAML, which is ostensibly easy for humans to
read — and has explicit support for custom types. Hopefully, after using
Camel, you'll discover you've been tricked into making a library of every type
you serialize, the YAML name you give it, and exactly how it's formatted. All
of this lives in your codebase, so someone refactoring a class will easily
stumble upon its serialization code. Why, you could even use this knowledge to
load your types into an application written in a different language, or turn
them into a documented format!
Let's see some code already
---------------------------
Let's!
Here's the Table example from `a talk Alex Gaynor gave at PyCon US 2014`_.
Initially we have some square tables.
.. _a talk Alex Gaynor gave at PyCon US 2014: https://www.youtube.com/watch?v=7KnfGDajDQw&t=1292
.. code-block:: python
class Table(object):
def __init__(self, size):
self.size = size
def __repr__(self):
return "
".format(self=self)
We want to be able to serialize these, so we write a *dumper* and a
corresponding *loader* function. We'll also need a *registry* to store these
functions::
from camel import CamelRegistry
my_types = CamelRegistry()
@my_types.dumper(Table, 'table', version=1)
def _dump_table(table):
return dict(
size=table.size,
)
@my_types.loader('table', version=1)
def _load_table(data, version):
return Table(data["size"])
.. note:: This example is intended for Python 3. With Python 2,
``dict(size=...)`` will create a "size" key that's a :py:class:`bytes`,
which will be serialized as ``!!binary``. It will still work, but it'll be
ugly, and won't interop with Python 3. If you're still on Python 2, you
should definitely use dict literals with :py:class:`unicode` keys.
Now we just give this registry to a :py:class:`Camel` object and ask it to dump
for us::
from camel import Camel
table = Table(25)
print(Camel([my_types]).dump(table))
.. code-block:: yaml
!table;1
size: 25
Unlike the simple example given in the talk, we can also dump arbitrary
structures containing Tables with no extra effort::
data = dict(chairs=[], tables=[Table(25), Table(36)])
print(Camel([my_types]).dump(data))
.. code-block:: yaml
chairs: []
tables:
- !table;1
size: 25
- !table;1
size: 36
And load them back in::
print(Camel([my_types]).load("[!table;1 {size: 100}]"))
.. code-block:: python
[]
Versioning
..........
As you can see, all serialized Tables are tagged as ``!table;1``. The
``table`` part is the argument we gave to ``@dumper`` and ``@loader``, and the
``1`` is the version number.
Version numbers mean that when the time comes to change your class, you don't
have anything to worry about. Just write a new loader and dumper with a higher
version number, and fix the old loader to work with the new code::
# Tables can be rectangles now!
class Table(object):
def __init__(self, height, width):
self.height = height
self.width = width
def __repr__(self):
return "".format(self=self)
@my_types.dumper(Table, 'table', version=2)
def _dump_table_v2(table):
return dict(
height=table.height,
width=table.width,
)
@my_types.loader('table', version=2)
def _load_table_v2(data, version):
return Table(data["height"], data["width"])
@my_types.loader('table', version=1)
def _load_table_v1(data, version):
edge = data["size"] ** 0.5
return Table(edge, edge)
table = Table(7, 10)
print(Camel([my_types]).dump(table))
.. code-block:: yaml
!table;2
height: 7
width: 10
More on versions
----------------
Versions are expected to be positive integers, presumably starting at 1.
Whenever your class changes, you have two options:
1. Fix the dumper and loader to preserve the old format but work with the new
internals.
2. Failing that, write new dumpers and loaders and bump the version.
One of the advantages of Camel is that your serialization code is nothing more
than functions returning Python structures, so it's very easily tested. Even
if you end up with dozens of versions, you can write test cases for each
without ever dealing with YAML at all.
You might be wondering whether there's any point to having more than one
version of a dumper function. By default, only the dumper with the highest
version for a type is used. But it's possible you may want to stay
backwards-compatible with other code — perhaps an older version of your
application or library — and thus retain the ability to write out older
formats. You can do this with :py:meth:`Camel.lock_version`::
@my_types.dumper(Table, 'table', version=1)
def _dump_table_v1(table):
return dict(
# not really, but the best we can manage
size=table.height * table.width,
)
camel = Camel([my_types])
camel.lock_version(Table, 1)
print(camel.dump(Table(5, 7)))
.. code-block:: yaml
!table;1
size: 35
Obviously you might lose some information when round-tripping through an old
format, but sometimes it's necessary until you can fix old code.
Note that version locking only applies to dumping, not to loading. For
loading, there are a couple special versions you can use.
Let's say you delete an old class whose information is no longer useful. While
cleaning up all references to it, you discover it has Camel dumpers and
loaders. What about all your existing data? No problem! Just use a version
of ``all`` and return a dummy object::
class DummyData(object):
def __init__(self, data):
self.data = data
@my_types.loader('deleted-type', version=all)
def _load_deleted_type(data, version):
return DummyData(data)
``all`` overrides *all* other loader versions (hence the name). You might
instead want to use ``any``, which is a fallback for when the version isn't
recognized::
@my_types.loader('table', version=any)
def _load_table(data, version):
if 'size' in data:
# version 1
edge = data['size'] ** 0.5
return Table(edge, edge)
else:
# version 2?
return Table(data['height'], data['width'])
Versions must still be integers; a non-integer version will cause an immediate
parse error.
Going versionless
.................
You might be thinking that the version numbers everywhere are an eyesore, and
your data would be much prettier if it only used ``!table``.
Well, yes, it would. But you'd lose your ability to bump the version, so you'd
have to be *very very sure* that your chosen format can be adapted to any
possible future changes to your class.
If you are, in fact, *very very sure*, then you can use a version of ``None``.
This is treated like an *infinite* version number, so it will always be used
when dumping (unless overridden by a version lock).
Similarly, an unversioned tag will look for a loader with a ``None`` version,
then fall back to ``all`` or ``any``. The order versions are checked for is
thus:
* ``None``, if appropriate
* ``all``
* Numeric version, if appropriate
* ``any``
There are deliberately no examples of unversioned tags here. Designing an
unversioned format requires some care, and a trivial documentation example
can't do it justice.
Supported types
---------------
By default, Camel knows how to load and dump all types in the `YAML type
registry`_ to their Python equivalents, which are as follows.
.. _YAML type registry: http://yaml.org/type/
=============== ========================================
YAML tag Python type
=============== ========================================
``!!binary`` :py:class:`bytes`
``!!bool`` :py:class:`bool`
``!!float`` :py:class:`float`
``!!int`` :py:class:`int` (or :py:class:`long` on Python 2)
``!!map`` :py:class:`dict`
``!!merge`` —
``!!null`` :py:class:`NoneType`
``!!omap`` :py:class:`collections.OrderedDict`
``!!seq`` :py:class:`list` or :py:class:`tuple` (dump only)
``!!set`` :py:class:`set` or :py:class:`frozenset` (dump only)
``!!str`` :py:class:`str` (:py:class:`unicode` on Python 2)
``!!timestamp`` :py:class:`datetime.date` or :py:class:`datetime.datetime` as appropriate
=============== ========================================
.. note:: PyYAML tries to guess whether a bytestring is "really" a string on
Python 2, but Camel does not. Serializing *any* bytestring produces an ugly
base64-encoded ``!!binary`` representation.
This is a **feature**.
.. note:: A dumper function must return a value that can be expressed in YAML
without a tag — that is, any of the above Python types *except*
:py:class:`bytes`, :py:class:`set`/:py:class:`frozenset`, and
:py:class:`datetime.date`/:py:class:`datetime.datetime`. (Of course, if the
value is a container, its contents can be anything and will be serialized
recursively.)
If a dumper returns a :py:class:`collections.OrderedDict`, it will be
serialized like a plain dict, but the order of its keys will be preserved.
The following additional types are loaded by default, but **not dumped**. If
you want to dump these types, you can use the existing ``camel.PYTHON_TYPES``
registry.
====================== =====================================
YAML tag Python type
====================== =====================================
``!!python/complex`` :py:class:`complex`
``!!python/frozenset`` :py:class:`frozenset`
``!!python/namespace`` :py:class:`types.SimpleNamespace` (Python 3.3+)
``!!python/tuple`` :py:class:`tuple`
====================== =====================================
Other design notes
------------------
* Camel will automatically use the C extension if available, and fall back to a
Python implementation otherwise. The PyYAML documentation says it doesn't
have this behavior because there are some slight differences between the
implementations, but fails to explain what they are.
* :py:meth:`Camel.load` is safe by default. There is no calling of arbitrary
functions or execution of arbitrary code just from loading data. There is no
"dangerous" mode. PyYAML's ``!!python/object`` and similar tags are not
supported. (Unless you write your own loaders for them, of course.)
* There is no "OO" interface, where dumpers or loaders can be written as
methods with special names. That approach forces a class to have only a
single representation, and more importantly litters your class with junk
unrelated to the class itself. Consider this a cheap implementation of
traits. You can fairly easily build support for this in your application if
you really *really* want it.
* Yes, you may have to write a lot of boring code like this::
@my_types.dumper(SomeType, 'sometype')
def _dump_sometype(data):
return dict(
foo=data.foo,
bar=data.bar,
baz=data.baz,
...
)
I strongly encourage you *not* to do this automatically using introspection,
which would defeat the point of using Camel. If it's painful, step back and
consider whether you really need to be serializing as much as you are, or
whether your classes need to be so large.
* There's no guarantee that the data you get will actually be in the correct
format for that version. YAML is meant for human beings, after all, and
human beings make mistakes. If you're concerned about this, you could
combine Camel with something like the `Colander`_ library.
.. _Colander: http://docs.pylonsproject.org/projects/colander/en/latest/
Known issues
------------
Camel is a fairly simple wrapper around `PyYAML`_, and inherits many of its
problems. Only YAML 1.1 is supported, not 1.2, so a handful of syntactic edge
cases may not parse correctly. Loading and dumping are certainly slower and
more memory-intensive than pickle or JSON. Unicode handling is slightly
clumsy. Python-specific types use tags starting with ``!!``, which is supposed
for be for YAML's types only.
.. _PyYAML: http://pyyaml.org/
Formatting and comments are not preserved during a round-trip load and dump.
The `ruamel.yaml`_ library is a fork of PyYAML that solves this problem, but it
only works when using the pure-Python implementation, which would hurt Camel's
performance even more. Opinions welcome.
.. _ruamel.yaml: https://pypi.python.org/pypi/ruamel.yaml
PyYAML has several features that aren't exposed in Camel yet: dumpers that work
on subclasses, loaders that work on all tags with a given prefix, and parsers
for plain scalars in custom formats.