Why I Like ZODB (plope.com)
46 points by reinhardt on Nov 24, 2013 | hide | past | favorite | 19 comments


I'm sorry but I hate ZODB with a passion.

- It does not scale.

- Poor support for replication or sharding.

- It is slow. Really slow. Un-pickle every time you get something from cache.

- It is error prone. Forget a commit or forget to handle conflict errors and you're in big trouble.

- No interoperability. Want to write a service in C++ to access your db? You're out of luck.

- As your system grows you'll have conflicts all over the place.

- Some server side stuff needs to have your objects, e.g. if you want to do conflict resolution.

- Migrations/changes to schemas are painful; once you change your class you're no longer going to be able to de-serialize existing objects.

- You have to roll your own if you want change notifications.

So if you're just looking to persist some Python objects in a small system, great. Otherwise I'd stay away from this.
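The migration bullet above can be illustrated with plain pickle, which is ZODB's serialization format: a record written under an old schema lacks fields the new code expects, so loading needs an explicit migration step. The record layout and the `load_v2` helper below are hypothetical, a minimal sketch of the problem rather than anyone's actual schema.

```python
import pickle

# v1 records stored only x and y; v2 code also expects z. A record written
# by the old code persists under the old schema:
old_record = pickle.dumps({'x': 1, 'y': 2})

def load_v2(blob):
    """Load a record, migrating pre-z records on the fly."""
    state = pickle.loads(blob)
    state.setdefault('z', 0)   # migration shim for old records
    return state

point = load_v2(old_record)
print(point)  # {'x': 1, 'y': 2, 'z': 0}
```

With class instances instead of dicts the same idea is usually expressed as a `__setstate__` method that patches up the unpickled state, but the shim has to be written by hand either way.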


Note: I'm being brief, not trying to be snarky -- I'd love to hear what's behind your statements.

> - It does not scale.

What does this mean? I'm not saying you're wrong, but without some context, that doesn't say a whole lot.

I recall one big consulting company dropping Plone (and zodb) some five years ago because zodb/plone scaled to about 10,000 documents, and using a custom index solution based on lucene they managed to scale to around 100,000 documents for their cms -- but they ended up needing something else for "their biggest" clients. Can't find the link or remember the company now (I believe it was a German design shop). But it's the only story I've heard about zodb not scaling for its typical use case?

> - Poor support for replication or sharding.

Are you aware that ZRS is now free and open source?

http://www.zope.com/products/x1752814276/Zope-Replication-Se...

> - It is slow. Really slow. Un-pickle every time you get something from cache.

You have to marshal structures you load in other systems too -- is this really something specific to zodb? Are you saying unpickling is slower than other ways to marshal Python objects?

> - It is error prone. Forget a commit or forget to handle conflict errors and you're in big trouble.

As opposed to not handling transaction errors with a postgres backend?

> - No interoperability. Want to write a service in C++ to access your db? You're out of luck.


Well, it's an object database. The only other one I know of off the top of my head that people are actually using is GemStone. You could of course wrap zodb in an XML/JSON API -- but yes, I don't think interop with other languages is a good fit for zodb.

> - Some server side stuff needs to have your objects, e.g. if you want to do conflict resolution.

> - Migrations/change to schemas are painful, once you change your object you're no longer going to be able to de-serialize it.

This is a problem I'm constantly running into with Plone and a more or less well understood set of third-party add-ons. I really think the Smalltalk image approach is better (if you have the "data" you also have the "behaviour" -- with zodb you might have a serialization of a complex class, but not the ability to unmarshal it).


I wasn't aware ZRS is now free/open source. I would need to look at what that brings to the table but it's unlikely to change my views. I'll check it out though and thanks for the heads up!

Ignoring ZRS: as your number of clients and transactions goes up you're still bottlenecked on a single server. That's what I mean by "doesn't scale". For various reasons (e.g. objects can refer to each other) you're basically stuck. A scalable database provides various means of growing as your load grows, and ZODB does not.

There are a few problems with pickling. First, it is slow; under some assumptions there are faster ways of marshalling in Python. Second, your granularity of access is the entire object -- you can't just get a certain field out of a large object. Third, because objects in the client cache are pickled, you spend a lot of time serializing/de-serializing them when you don't really need to. In one of my applications that happens to account for 80% of the execution time.
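The granularity point can be sketched with stdlib pickle alone: a record is an opaque blob, so reading one small field still deserializes the whole object. The record contents here are made up for illustration.

```python
import pickle

# One "record": a small title stored next to a large body, as a single blob.
record = pickle.dumps({'title': 'hello', 'body': 'x' * 1_000_000})

# Reading just the title still materializes the entire object -- pickle
# offers no field-level access into a serialized record.
doc = pickle.loads(record)
print(doc['title'])  # hello
```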

I haven't spent a lot of time with SqlAlchemy but I think an ORM that maps well to some performant database is a better approach in Python.


It's best to keep object records small when using ZODB. This does mean you need to do some planning about object structure. Objects that inherit from "persistent.Persistent" are kept as separate records in ZODB, so you can break up a large object into several smaller ones by attaching persistent attributes to another object. If you just make a big structure out of nonpersistent objects, ZODB will indeed have all the downsides of plain old pickle (like slow loading times for large objects), but its entire purpose is to let you avoid doing that.


Just to save someone searching ZRS is on pypi:

https://pypi.python.org/pypi/zc.zrs/2.4.4


It does not unpickle every time you get something from cache. It maintains a first-level in-memory LRU per-thread object cache which is the actual Python object (not its pickle representation). In practice, that means you don't need to maintain your own results cache (with e.g. redis or memcached) as you do with most SQL-based systems. If it's slow for you, it's probably not due to its cache.

Write conflicts are indeed hairy to deal with. Using a BTree in places where you might instead use a dictionary usually makes this a lot better.

It sounds like you got burned by using it without understanding it very well.


The ZODB is and was a Python pickle store in the first place. The question of what makes a "database" can be answered differently: functionality like indexes and query languages are, in the Zope world, application-level functionality built on top of the ZODB. The ZODB turned out to be a database solution for many large-scale projects (up to several hundred GB). There are options for sharding (mount points) and replication (ZRS, RelStorage).

The ZODB is unlikely the solution you would use nowadays for "big data", but keep in mind that the ZODB is already 15 years old and has served many people in professional solutions for more than a decade. We know of many businesses still using the ZODB in mission-critical applications. But yes, the ZODB is an object store and not an RDBMS -- it's a completely different beast. As always: use the right tool for each project. And the ZODB was already "NoSQL" by the end of the 90s of the last millennium. No need to hate the ZODB -- it is just another database option -- and in some cases I would still use the ZODB today over tinkered garbage database solutions like MongoDB with its braindead replication and sharding model.


I had to work with ZODB on a couple of projects and it has been overall a rather painful experience. Inability to run ad hoc queries and necessity to manage indexes and maintain consistency at the application level has been a source of annoying bugs for me. Not worth it IMHO.


One of the problems people may experience when trying out something like ZODB is trying to use it as they would use a relational database management system rather than as they would manage a normal object structure inside their program. Things like list comprehensions and careful design of the structure of the objects can improve things a lot, but I will agree that sometimes the fact that you can't as easily run ad hoc queries is sad. Also very often you simply won't need to use an index, and for a very large range of problems the graph nature of ZODB (compared with the tabular nature of the RDBMS, where you'll need indices to avoid full-table scans) is liberating.

Certainly there are some types of problems which ZODB won't work well with, but overall I quite enjoyed the experimental project I made early this year in Pyramid with ZODB. Combine it with traversal-based URL generation and you get very interesting results.

I think it's the sort of tool that I'd recommend people at least try; once they've broadened their horizons, then they can go back to using their RDBMS when they wish to. (In this it's just like Pyramid's traversal; many developers have forgotten that pattern matching is not the only good way of routing URLs.)


I've used Durus a lot (a simpler version of ZODB) and indexing was always the biggest issue I ran into. Normal object associations don't need traditional indexing but object lookup based on attribute values gets pretty painful without some indexing support.
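A stdlib-only sketch of the hand-rolled indexing described here; the `Store` class and its `by_author` index are hypothetical. The burden is that every mutation must keep the secondary index consistent yourself -- there is no query planner doing it for you.

```python
class Store:
    """Toy object store with one manually maintained attribute index."""

    def __init__(self):
        self.docs = {}        # primary: id -> document dict
        self.by_author = {}   # secondary index: author -> set of ids

    def add(self, doc_id, doc):
        # Both structures must be updated together, on every write path.
        self.docs[doc_id] = doc
        self.by_author.setdefault(doc['author'], set()).add(doc_id)

    def find_by_author(self, author):
        # Without the index this would be a scan over all documents.
        return [self.docs[i] for i in sorted(self.by_author.get(author, ()))]

store = Store()
store.add(1, {'author': 'alice', 'title': 'ZODB notes'})
store.add(2, {'author': 'bob', 'title': 'Durus notes'})
titles = [d['title'] for d in store.find_by_author('alice')]
print(titles)  # ['ZODB notes']
```

Forgetting to update `by_author` on a delete or an author change is exactly the class of consistency bug the comment complains about.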


ZODB is kind of neat, but how does it fare with non-python client libraries? I note the phrase "you can store anything that is pickleable" so I am thinking not?


Correct, ZODB is exclusive to Python.


If you like ZODB (like me), you might also like DyBASE: http://www.garret.ru/dybase.html

BTW I used ZODB to add persistence and acknowledgment to the Python Queue class: http://blog.databigbang.com/adding-acknowledgement-semantics...


If you are interested in object-oriented databases for Python check out Durus: https://www.mems-exchange.org/software/DurusWorks/

Its design is based on ZODB, but with a number of simplifications and no dependencies on Zope.


In what way does ZODB depend on Zope? It has a dependency on zope.interface, which is pretty minimal, and seems to be widely used outside of the Zope Community (e.g., Twisted).


Whoops. That was a presumption on my part.


But it has Zope in its name!!!111!! ;-)


Urgh! Isn't Zope dead yet?


ZODB != Zope



