
There are about 14B trades per year on the NYSE, which I'm sure could represent 10x that in entities (buyer, seller, broker, etc.) and could easily hit 1000x that in log lines. Shares traded per day are in the billions, so the count hits 1T if each share is represented uniquely.
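
A rough back-of-envelope sketch of that arithmetic, for anyone who wants to check it. The 14B, 10x and 1000x figures are the ones from the comment; the 4B shares per day and 252 trading days per year are illustrative assumptions (the comment only says "in the billions"), and "1000x" is read here as 1000x the trade count.

```python
# Back-of-envelope check of the figures quoted in the comment.
TRADES_PER_YEAR = 14_000_000_000  # "about 14B trades per year" (from the comment)
ENTITY_FACTOR = 10                # "10x that in entities" (from the comment)
LOG_LINE_FACTOR = 1000            # "1000x that in log lines", read as 1000x trades

# Assumptions NOT in the comment, chosen only for illustration:
SHARES_PER_DAY = 4_000_000_000    # hypothetical; the comment says "in the billions"
TRADING_DAYS_PER_YEAR = 252       # assumed typical number of US trading days

TRILLION = 10**12

entities = TRADES_PER_YEAR * ENTITY_FACTOR
log_lines = TRADES_PER_YEAR * LOG_LINE_FACTOR
shares_per_year = SHARES_PER_DAY * TRADING_DAYS_PER_YEAR

for name, count in [
    ("entities", entities),
    ("log lines", log_lines),
    ("shares per year", shares_per_year),
]:
    # Report each count in trillions and whether it crosses the 1T mark.
    print(f"{name}: {count / TRILLION:.2f}T, over 1T: {count >= TRILLION}")
```

Under these assumptions the entity count stays below a trillion, while log lines and uniquely represented shares cross it.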


You don't typically use vector search for trade data, though. It's already ridiculously well structured: assets have identifiers, parties and counterparties have IDs, etc. I'm not sure what nearest neighbors in a vector space would add.


Nonetheless, it’s the example you asked for: a dataset with over a trillion entries.


I asked which dataset they were indexing that was of this size, not whether any such dataset exists in other domains.


I believe I hear the sound of a True Scotsman knocking at the door.


My exact words are still up there:

> What sort of dataset are you indexing that has trillion entries?

It doesn't say:

> What sort of dataset has a trillion entries?



