Hacker News
MongoDB 3.2: Now powered by Postgres (linkedin.com)
120 points by buffyoda on Dec 8, 2015 | 25 comments


tl;dr: There's a new "BI Connector" which will allow you to connect your business intelligence tools to MongoDB using the Postgres wire protocol (which many BI tools speak). This is somehow bad because Postgres is also popular, and maybe people will use Postgres now. Also: the author (who has a competing connector) knows the names of a lot of people at MongoDB.


My reading of the article was not that it uses the protocol, but that it actually unpacks the data into an actual Postgres database. This obviously looks bad for MongoDB if their analytics literally uses a competing database.

https://github.com/asya999/yam_fdw


If you're interested in how much computation gets pushed down into the MongoDB database, versus how much gets pulled back into PostgreSQL, you can find some examples here:

http://slamdata.com/blog/2015/12/08/nosql-analytics-for-mong...

Warning: It's not pretty.


Thank you. The writer had way too much fun writing that, at the expense of the actual article. There is about one page of actual information woven into 12 pages of anecdotes of wandering around a conference being exasperated.


That namedropping though!


tl;dr 2.0:

1) MongoDB Inc have made a "foreign data wrapper" for PostgreSQL which enables MongoDB databases to be accessed from within PostgreSQL.

2) This makes data in MongoDB databases accessible to existing analytics software for SQL databases, which is often made by companies with lots of money.

3) The CTO of SlamData Inc, which makes analytics software for NoSQL databases, thinks that MongoDB Inc shouldn't have done that.

tl;dr 3.0:

A company faces increased competition; isn't happy about it.


It enables MongoDB databases to be accessed from within PostgreSQL at a massive performance penalty compared to just storing your data in Postgres in the first place, because the particular kind of foreign data wrapper they're using has limited ability to make use of MongoDB's query functionality and has to literally load the entire contents of the database into Postgres for anything non-trivial. Which means you're better off just using Postgres. This is bad for MongoDB because their business model relies on people actually using MongoDB rather than the competition.


Frankly, if I were tasked with integrating existing BI tools with MongoDB, I'd immediately start looking at ways to "escape" the anemic Mongo ecosystem to something a bit richer. A Postgres FDW seems like an excellent design.

Of course, I'm a bit of a Postgres partisan, and a Mongo refugee, but it still seems like a solid engineering decision, and most of this guy's arguments seem to hinge on "BUT POSTGRES IS TEH ENEMY!".


Well, you might start off down that path, but eventually you'd find that if you try to execute analytics via PostgreSQL via FDW via Multicorn via MongoDB, you're only able to push down conjunctions of simple relational operators on original (non-derived) fields in the source collection.

What that means is virtually any query will end up executing (via PostgreSQL via FDW via Multicorn via MongoDB) by first pulling out all (!) the data from all (!) source collections, relocating it to PostgreSQL, and then executing the query. Possibly, in fact, these full collection scans might be repeated multiple times, especially for nested data, crosses, and other types of operations.
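To make the pushdown limitation concrete, here's a minimal sketch (in plain Python, with hypothetical helper names; real Multicorn passes Qual objects with field_name/operator/value attributes) of what a wrapper like this can and can't hand off to MongoDB. Anything outside simple top-level comparisons becomes a "residual" qual that PostgreSQL must re-check itself, after pulling every document back:

```python
# Sketch of the pushdown limitation described above. Only simple
# comparisons on top-level fields translate into a MongoDB filter;
# everything else (LIKE, ORs, derived/nested fields) forces a full
# collection scan with filtering done on the Postgres side.
SUPPORTED_OPS = {"=": "$eq", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

def quals_to_mongo_filter(quals):
    """Split quals into a MongoDB filter doc (pushed down) and the
    leftovers PostgreSQL must re-apply after fetching everything."""
    pushed, residual = {}, []
    for field, op, value in quals:
        mongo_op = SUPPORTED_OPS.get(op)
        if mongo_op and "." not in field:   # simple, top-level field only
            pushed.setdefault(field, {})[mongo_op] = value
        else:                               # e.g. LIKE ("~~"): no pushdown
            residual.append((field, op, value))
    return pushed, residual

# A plain conjunction pushes down cleanly...
pushed, residual = quals_to_mongo_filter([("age", ">", 21), ("city", "=", "NYC")])
# ...but a LIKE predicate pushes down nothing at all:
pushed2, residual2 = quals_to_mongo_filter([("name", "~~", "A%")])
```

When `pushed` comes back empty, the wrapper has no choice but to issue an unfiltered `find()` against the collection, which is exactly the full-scan behavior described above.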

And then you'd decide that "solid engineering decision" wasn't so solid after all. Then hopefully you'd quit MongoDB and go work on PostgreSQL full time. ;-)


We have a legacy app that runs on Mongo. For MI/BI reasons we implemented pretty much the same thing about 18 months ago, first with Citus Data's excellent mongo_fdw and then with Stripe's mosql.

That application is currently being rewritten on top of PostgreSQL.


I think the guy's argument is more "but you should have used my product insteeeaaaadddd"


It's nothing new that PostgreSQL is a great tool for doing analytics, even coming from MongoDB. I'm very happy that MongoDB took this route; it says a lot about their capabilities in the non-OLTP world.

Having said that, I'll say, with obvious bias, that there's a much better alternative to this connector, one which doesn't flatten out the MongoDB data: it's called ToroDB (https://github.com/torodb/torodb).

ToroDB is open source, speaks the MongoDB protocol, transforms documents into relational tables (without any kind of flattening, and without having to define any schema) and stores the data in an RDBMS. More precisely, PostgreSQL.
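The general idea of shredding a document into relational tables without flattening can be sketched like this (a toy illustration only, not ToroDB's actual schema or algorithm): each nesting level becomes its own table, with rows linked back to the parent by id, so no column explosion and no predefined schema:

```python
# Toy document "shredder": every subdocument or array element becomes a
# row in a per-path table, linked to its parent row by _parent. This is
# an illustration of the technique, NOT ToroDB's real table layout.
def shred(doc, table="root", parent_id=None, rows=None, counter=None):
    """Recursively turn one JSON-like document into per-table row lists."""
    if rows is None:
        rows, counter = {}, {"next": 0}
    row_id = counter["next"]; counter["next"] += 1
    row = {"_id": row_id, "_parent": parent_id}
    for key, value in doc.items():
        if isinstance(value, dict):                    # subdocument -> child table
            shred(value, f"{table}_{key}", row_id, rows, counter)
        elif isinstance(value, list):                  # array -> one row per element
            for item in value:
                item = item if isinstance(item, dict) else {"value": item}
                shred(item, f"{table}_{key}", row_id, rows, counter)
        else:                                          # scalar -> ordinary column
            row[key] = value
    rows.setdefault(table, []).append(row)
    return rows

tables = shred({"name": "Ada", "address": {"city": "London"}, "tags": ["a", "b"]})
```

Here `tables` ends up with a `root` table, a `root_address` table, and a `root_tags` table (one row per array element), each child row carrying its parent's id, which is what lets plain SQL joins reassemble the original document.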

Current development version (repl branch) speaks the replication protocol, and hence can replicate live from a MongoDB into PostgreSQL. No connector needed, no flattening, no FDWs, nothing else. Just add a new "slave" (ToroDB) to your replica set and you're good to go.

It goes even further: if you want pure data warehousing, ToroDB will soon support Greenplum. Some initial benchmarks (http://www.slideshare.net/8kdata/torodb-scaling-postgresql-l..., slide #42) show a 25x-75x speedup for aggregate queries run in Greenplum's distributed SQL versus their MongoDB equivalents.

Now that MongoDB 3.2 ships with PostgreSQL "included", feel free to try ToroDB. The original is always better :)

Note: I am a ToroDB developer.


I think ToroDB is super cool, and I wish your project the best of luck! You can't go wrong building something on PostgreSQL. :)

That said, PostgreSQL FDW is NOT a great option for MongoDB analytics. Not only is the data model so different that you lose the ability to answer many types of questions, but Multicorn supports only basic pushdown (conjunctions of simple relational operators on original columns).

What this means is that analytics via PostgreSQL via FDW via Multicorn via MongoDB suffers from (a) very poor expressive power, and (b) ridiculously slow performance, since nearly any type of query will require at least one full table scan on all the source tables (in some cases, especially with arrays, many more full table scans may be required for a single query!).

Better off just using ToroDB. Am I right? :)


Thank you, John De Goes. I definitely agree. While PostgreSQL FDWs are a great way of extending Postgres, I don't see that they are a good fit for this use case. Not only are a lot of pushdowns unsupported (although that is in the process of being improved), but more importantly, as you mentioned, this connector is going to impose a lot of full table scans for even the simplest queries. I'm dying to benchmark this connector against ToroDB. But unfortunately, the MongoDB proprietary license agreement explicitly forbids any kind of benchmark. I guess they have reasons to do so ;)

I can't be objective in saying you'd be better off using ToroDB, but I definitely think so.

I also want to congratulate you. I think Quasar and SlamData have gone very far, and I'd encourage you to keep on pushing it. While this connector may or may not adversely affect SlamData, there's always room for differentiation and improvement. Good luck!


Classic marketing pitch from a little company that wants to claim it's much more significant than it is:

1. Claim a must-have set of requirements that...
2. ...happen to match its product's feature set...
3. ...but not its competitors'.

http://slamdata.com/whitepapers/characteristics-of-nosql-ana... is presumably the core of the argument.

I tend not to pay attention to such claims until the company rephrases them more honestly.

That said, a brief discussion of what is really happening is in http://www.dbms2.com/2015/09/10/mongodb-update/ Would more be better? Sure.


The author tries to paint Mongo as an embarrassingly short-sighted, pseudo-enterprise company that can't share its toys with others.

Mongo could rebut this by demonstrating collaboration efforts and solutions, e.g. a solution marketplace similar to Atlassian's and VMware's. On the partner side, cross-selling, cross-promotions and collaborative sales/product strategies can reduce the conflict and wasted/duplicated/unaligned effort that leads to sour partner experiences.


The guru author should just marry Postgres ;)


MongoDB: the snapchat of databases.


brilliant.

makes me wonder if snapchat uses mongo; i can't think of one more suited to snapchat's unique selling point.


Is this posted anywhere besides LinkedIn? Would like to read but am not willing to give page views to LinkedIn.


That's really silly. LinkedIn produce some really interesting engineering content.

EDIT I'm not sure what Pulse is, but it looks like aggregated content. Anyways, here's an article from the LinkedIn engineering team that's well worth a read https://engineering.linkedin.com/distributed-systems/log-wha...


Perhaps a "magazine" HR/engineering uses to promote inbound candidate flow and knowledge sharing.



Looks like it's time to switch databases. BLOAT. RIP Mongo.


Or Mongo may need to refine collaboration with partners to sell more deployed customer solutions... that's the only takeaway from this I see.



