Well, the author cares enough about RethinkDB to test it, even if he's a mongodb...

timmaxw · on May 27, 2015

> the author has corrected the discrepancies since then

As of the time I posted this comment, the blog post still seems to be comparing indexed MongoDB operations against non-indexed RethinkDB operations. Under those conditions I'd expect RethinkDB to be at least 1000x slower than MongoDB. The fact that he's finding that RethinkDB is only 3x slower than MongoDB makes me think that there are still other major problems with this benchmark.

> No, but at least he tried...

It's true that the author tried; but that doesn't change the fact that people are going to read this blog post and assume that the numbers are at least approximately correct. As a RethinkDB employee, it really frustrates me to see RethinkDB being judged according to benchmarks that are conducted so carelessly that they are essentially random.

I think this is the fourth time in the past year that I've seen a third party try to benchmark RethinkDB and get something wrong. Maybe we need to start a "best practice" of checking in with the maintainers of a project before publishing benchmark results about the project.

weddpros · on May 27, 2015

Some mistakes in his benchmarks, among probably others:

- I don't see any mongodb index creation, so mongodb is inserting with no index while rethinkdb is inserting with the index. That's probably why there's a gap between the two

- there's no mongodb index, and rethinkdb queries do not make use of the index (this is probably why rethinkdb is not 1000x slower: both aren't using indexes)

- the $in query should be last_update: random_timestamp(), there's no need for $in here

- his insertion code creates 100K memory clones of the object to insert in the mongodb version only, not in rethinkdb

I'm sad to add: what the author is benchmarking here is the likely performance of a system he could build with either db. It's not necessarily bad (save for bad press) that he's bad at benchmarking: the mistakes he's made in his benchmark are similar to the mistakes he'll make in his code.

But yes, the author may use some help!

timmaxw · on May 27, 2015

In the benchmark script [1] that the author provided on his GitHub account, there's a call to ensure_index() for MongoDB. And he's reporting an average latency of 0.15ms for MongoDB read operations, so it's pretty clear that the MongoDB index is actually being used.

[1] https://github.com/martinrusev/rethinkvsmongo-benchmark/blob...

weddpros · on May 27, 2015

Yes, this kind of latency definitely says "indexed".

When I wrote this, I'd read a rethinkDB employee say "it's not using the index on rethinkdb", and performance was similar (3x) between the two. I trusted the "no index" path, and couldn't find an ensureIndex command... So I assumed there was no index on mongodb.

Truth is, this kind of performance can only come with indexes, on RethinkDB and mongodb.

arielweisberg · on May 27, 2015

RethinkDB needs to ship its own benchmark client. Also implement an ugh YCSB driver. Provide both with the database download.

It's madness to expect someone new to database benchmarking to implement a correct fully featured benchmark client. They are going to stumble enough on database and instance configuration as it is.

geocar · on May 27, 2015

> Maybe we need to start a "best practice" of checking in with the maintainers of a project before publishing benchmark results about the project.

http://en.wikipedia.org/wiki/David_DeWitt

fche · on May 27, 2015

@threeseed

> [...] I've never used a database that required me to explicitly define which indexes I want to use for a read. [...]

In a way, it's traditional (IBM IMS/DB, 1960s).

erbdex · on May 27, 2015

+1 on 'Profiling best practice'. Any such existing project?

threeseed · on May 27, 2015

Not sure why you are frustrated it's just a blog post by someone who was inexperienced with your product. At least he owned up to the mistakes and was willing to fix it. It's an opportunity for you to work with the guy to show him how to do it properly and write a blog post of your own.

I would say that you probably should look at your API because I've never used a database that required me to explicitly define which indexes I want to use for a read. But I've never used RethinkDB so maybe there is a legitimate reason.

andrewflnr · on May 27, 2015

GP is frustrated because it's not just the one blog post. Even if the author is willing to correct it, the original bad data will tend to get more exposure, because most people won't check in for corrections (unless they see a post like OP). Then, there'll be another crappy benchmark next month or next week. If I thought my product was being judged this way, it would drive me frothing mad.

copsarebastards · on May 27, 2015

So basically you're saying that because he cares and because he tried, and because the comments were dumb, nobody should criticize him? Do you really not care about getting accurate results?

Running benchmarks is an engineering practice. If you failed to get meaningful results, you failed. Yes, he cares, yes, he tried, yes, the comments are dumb, but he still failed. Sure, I'll give the guy kudos for trying, but I'm not going to pretend he didn't fail. As far as I'm concerned, telling someone they failed is a favor, because now they can change their methodology, try again, and maybe succeed. It's part of the process of achieving meaningful results. The entire point of what he's doing is to achieve meaningful results, not to get a participation medal.

Your response reminds me of this: https://www.youtube.com/watch?v=gSjLiQxEZlM

steego · on May 27, 2015

> So basically you're saying that because he cares and because he tried, and because the comments were dumb, nobody should criticize him?

I don't think anybody is saying that.

copsarebastards · on May 27, 2015

Please, enlighten me on what you think weddpros is saying.

steego · on May 28, 2015

Weddpros was making the point that criticism worked. The poster tried (An important first step), failed, and critical people pointed out his errors. Weddpros points out the poster corrected those errors because he received criticism. Weddpros is clearly a fan of critical feedback.

JDDunn9 · on May 27, 2015

You must be fun at parties...

copsarebastards · on May 27, 2015

I am, because I'm there to have fun. Looking at DB benchmarks can be fun, but there are other reasons I do it.

sklivvz1971 · on May 27, 2015

I suppose the author of the test forgot the "only test realistic scenarios" best practice.

weddpros · on May 27, 2015

We know nothing about the author's use case...

Maybe he wanted to know where each DB shines compared to each other, to see if some workloads are better suited to one or the other.

Of course, benchmarks "should" include concurrent reads/updates/writes/deletes because it can make a huge difference depending on the DB's implementation.

Of course, the author "should" also have tested sharding / durability / resistance to partition / resource consumption in his tests... Maybe he didn't have the resources to test properly. I also do quick&dirty benchmarks like these, mostly because exhaustive benchmarks cost so much more (time, money, expertise)...

DannoHung · on May 27, 2015

> "do your own benchmark" best practice

Is this a best practice? It seems like we've been delivered evidence that it is really hard to do good benchmarks unless you're already intimately familiar with what you're testing, which says something about how hard it is to make a good choice.

I don't know about other industries, but this sort of result is what stuff like the STAC M3 Benchmark suite was designed for: Typical usecases that experts can implement so you can get realistic performance comparisons.

sarciszewski · on May 27, 2015

I have to say, it's rare for me to agree with a HN comment as much as I do to this one. Well said!