berto99 · on May 13, 2014

not yet

berto99 · on April 10, 2014

Why is mongo a problem?

micro_cam · on April 10, 2014

For me it is a red flag in terms of scalability as lots of our data sets won't fit in mongo backed by a 1-2 TB disk even if they take up < 100 GB in the original format (usually binary/compressed genetic data).

It also uses a ton of RAM, and performance really suffers when the data won't fit in RAM, so it isn't a great choice if you are trying to push the limits of what your machines can do.

They are only using it to store models and whatever "behavioral data" is, but models for things like random forests can be really big, and you want to be able to write/read trees from separate machines, etc.

I wonder why they chose to use mongo vs. local disk or HDFS, which they already require.

smhchan · on April 10, 2014

It's the real-time prediction query, e.g. geospatial search, that makes use of mongo's indices.

micro_cam · on April 10, 2014

Thanks for the clarification; the write-up isn't clear. Have you benchmarked against PostGIS or stock MySQL? And have you tried any larger-than-memory databases?

We were using mongo in a suite of web applications that display the results of ML and statistical analysis of cancer data, and we've found its query performance lacking in a number of cases. I think the mongo geospatial index is a pretty simple geohash setup on top of their normal query engine, and I would expect it to have the same issues.
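
To make the geohash idea concrete, here is a minimal sketch in Python of plain geohash encoding. It is the textbook algorithm, not mongo's actual code, and the function name `geohash_encode` is made up for illustration.

```python
# Textbook geohash encoding; an illustration, not mongo's implementation.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Interleave longitude/latitude bisection bits into a base32 string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, precision=5))
```

Nearby points usually share a prefix, so a "near here" query can become a prefix or range scan over an ordinary sorted index. Points that sit on either side of a cell boundary can have very different hashes, though, so neighbouring cells have to be checked as well.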

I do think this project is very interesting, just providing my feedback based on doing similar work.

Memory overhead of both mongo and hadoop would actually be my biggest worry, since it is quite common, especially on desktop workstations, for machine learning tools in R or python to need most of the available memory when tackling even small problems.

duaneb · on April 10, 2014

Unless there's something about Mongo that means it's perfect for machine learning (unlikely), the last thing I want to maintain is yet another database because they didn't offer any choice.

ironchef · on April 10, 2014

A number of people have been bitten by issues in mongo in the past, such as: the approach it had taken to write locking, that it has silently discarded writes in certain cases, the charge that it uses inflated storage on disk, and the performance characteristics when the working set does not fit into memory. I'm sure there are more, but when it arrived it had great marketing and was touted as the greatest thing since sliced bread. Unfortunately, some people ended up with horrendous sandwiches and remember the awfulness of said sandwiches.

lukasm · on April 10, 2014

I heard about two cases where MongoDB failed at doing The Most Important Thing: storing data. No one really cares about autosharding, no migrations, etc. if you can't store the data. Due to some replication issue, the data was inconsistent.

berto99 · on April 10, 2014

But can't this happen to any db system? Mongo is pretty new and I'm not surprised things like this happen from time to time until the kinks are worked out. The new version of Mongo looks pretty good as well.

lucian1900 · on April 10, 2014

Not really, no database is this careless with user data.

Mongo is just really bad quality. Never use it, there's always something better in every way.

smhchan · on April 10, 2014

Happy to explore other choices together with the community. Some users have voted for Riak: https://predictionio.uservoice.com/forums/219398-general/sug...

louthy · on April 10, 2014

Riak is a good call. A SQL db would be nice too.

berto99 · on April 4, 2014

Well, people will exploit you only to the point where you let them. You have the power to say no, to talk back (it's not a bad thing), to express your opinion. I've been in situations like this quite a number of times and I just tell myself: I'm a very talented developer and I can always get another job or work for myself if it comes to it, so I will not accept any exploitation. To do this you need confidence. As a developer, my way of getting confidence is being so good they can't ignore you (check out that book). Do your very best, and learn every moment you get a chance. BTW, sometimes you can just talk directly to your manager and let him know how you're feeling. Tell him you feel like you're being singled out and you think it's unfair. It's the truth. If he can't handle that and keeps pushing you, just leave at the time you feel you should leave and tell him you have errands on the weekends. But use the weekends to your advantage by learning new skills. He or she can't physically force you to do anything.

th3iedkid · on April 5, 2014

Thanks; let me train myself along those lines. And yes, because this thing is so prevalent everywhere, I'll run into it somewhere or other anyway. The best way out is to grow out of it, and I hope all young grads get to do the same :)

berto99 · on March 18, 2014

This could be a good teaching tool for people new to development. Rather than setting up a bunch of VMs, just use nitrous and code in the browser, and as they become better, they can start using ssh and so on. It could also be interesting as an enterprise offering.

berto99 · on Feb 12, 2014

ugh, the way you did it, I thought that was the way most people did it. Usually you see the answer and only "show working". Though the mechanical approach can facilitate more complex problems.

berto99 · on Feb 5, 2014

Respect Indians; however, clean up the stupid caste system.

berto99 · on Jan 9, 2014

A little off topic, but nice to see you're using Angular.

berto99 · on Jan 9, 2014

Tried to sign up, but I need an invitation code.

tristanz · on Jan 9, 2014

You currently need an invitation code to register. We're giving these out slowly to make sure everything works smoothly.

mdda · on Jan 9, 2014

Typo : "Distributed POSIX complaint project file system"

apatil · on Jan 9, 2014

Thanks.

ihnorton · on Jan 9, 2014

SageMath Cloud has open signup:

http://cloud.sagemath.com

berto99 · on Jan 9, 2014

Wow, this looks really nice. One thing that isn't clear is how I get the information out. So, for example, if I'm building a recommendation engine, is there some kind of API that my webapp can use to get the information? (Sorry, new to all this.)

tristanz · on Jan 9, 2014

Right now Sense is best for ad-hoc interactive analysis and batch/scheduled jobs. You can run long-running services that expose something like a REST endpoint, but we have plans to make exposing services much easier, so I'd probably hold off until we have an "official" solution.
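
As a rough idea of what such a stopgap service could look like, here is a minimal sketch using only the Python standard library. Nothing in it is Sense-specific: the `/recommend` route, the `user` query parameter, and the `recommend` function with its canned data are all made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def recommend(user_id):
    """Hypothetical stand-in for whatever a trained model would return."""
    canned = {"alice": ["item-3", "item-7"], "bob": ["item-1"]}
    return canned.get(user_id, [])


def handle(path):
    """Map a request path to (status, JSON bytes); no sockets involved."""
    url = urlparse(path)
    if url.path != "/recommend":
        return 404, json.dumps({"error": "unknown route"}).encode("utf-8")
    user = parse_qs(url.query).get("user", [""])[0]
    payload = {"user": user, "items": recommend(user)}
    return 200, json.dumps(payload).encode("utf-8")


class RecommendHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), RecommendHandler)
```

Calling `make_server().serve_forever()` would start the service, and a webapp could then fetch `/recommend?user=alice` and parse the JSON it gets back.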

berto99 · on Dec 13, 2013

That sounds like a problem that could be solved in nodejs pretty easily. I also want to get going with Go but just don't have any pressing need for it right now.

bsaul · on Dec 13, 2013

Indeed, node.js was my other option. But compare deploying another server-side technology to just copy/pasting a binary.