Hacker News: gregorsamza's comments

A dataset with "many long lists of numbers" sounds like an ideal use case for NumPy, have you tried using that?


I haven't. I don't have any complex math in mind (yet), just some simple transformations. The problem is that even something as simple as checking a list for potential duplicates becomes really RAM-intensive for sufficiently large lists. (I'm not even doing deep equality, just comparing metadata.)
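A minimal sketch of one metadata-only duplicate check, assuming each item is a dict and that a couple of metadata fields (the hypothetical `name` and `size` here) identify a duplicate. The helper `find_duplicate_keys` is illustrative, not the commenter's code. It makes a single pass and keeps only small key tuples in a set, which avoids pairwise comparisons; memory still grows with the number of distinct keys.

```python
def find_duplicate_keys(items, key_fields=("name", "size")):
    """Return the set of metadata keys that occur more than once.

    `key_fields` is a hypothetical choice of metadata fields; only the
    small key tuples are stored, never copies of the items themselves.
    """
    seen = set()
    duplicates = set()
    for item in items:
        # Build a hashable key from the metadata fields only.
        key = tuple(item[field] for field in key_fields)
        if key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return duplicates


records = [
    {"name": "a", "size": 1},
    {"name": "b", "size": 2},
    {"name": "a", "size": 1},
]
print(find_duplicate_keys(records))
```

Because it takes any iterable, the same function works on a lazily produced stream of items, not just a list already held in memory.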

I still have plenty more work to do on the project. I think I'll end up fanning out each list iteration into a series of smaller chunks to keep me from blowing through all the RAM on any one request.
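The chunking idea can be sketched with a small generator, assuming the source can be iterated lazily. `chunked` is a hypothetical helper, and how each chunk is dispatched to its own request is left open here. Only one chunk is materialized at a time, so peak memory for the per-chunk work is bounded by `size` rather than by the full list length.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # Pull at most `size` items from the shared iterator.
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


for chunk in chunked(range(10), 4):
    print(chunk)  # each chunk could be handled in a separate request
```

If the whole list is already loaded into memory, chunking only limits the memory used by intermediate results; the bigger savings come when the items are read lazily from the underlying store.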


NumPy supports lots of array math, but another way to think of it is as an API for working directly with memory (with values stored as platform types instead of Python objects).

(Which you may well realize...)


Or even just the standard library's array module.


GAE (Google App Engine) allows only pure Python, so binary modules like NumPy aren't an option.



Python has an array class too:

http://docs.python.org/2/library/array.html
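A rough sketch of why the pure-Python `array` module helps with memory, assuming Python 3 and integer data. `memory_footprints` is a hypothetical helper. A list stores a pointer per element to a separate int object, while `array.array` packs the raw machine values into one contiguous buffer. The byte counts are approximations: small ints are shared by the interpreter, so the list figure slightly overcounts.

```python
import sys
from array import array


def memory_footprints(values, typecode="l"):
    """Return (approx. list bytes, array buffer bytes) for a list of ints."""
    # List: the pointer table itself plus one Python int object per element.
    list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
    # array.array: raw C longs ("l") packed into a single buffer.
    packed = array(typecode, values)
    array_bytes = packed.itemsize * len(packed)
    return list_bytes, array_bytes


list_bytes, array_bytes = memory_footprints(list(range(100000)))
print(list_bytes, array_bytes)
```

Unlike NumPy, `array` offers no vectorized math. Elements come back as ordinary Python objects when accessed, and loops still run in the interpreter, but storage stays compact.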

