There are about 14B trades per year on the NYSE, which I'm sure could represent 10x...

marginalia_nu · on May 10, 2023

You don't typically use vector search for trade data though. It's already ridiculously well structured. Assets have identifiers, parties and counterparties have IDs, etc. I'm not sure what nearest neighbors in a vector space would add.

l33t233372 · on May 10, 2023

Nonetheless it’s an example you asked for of a dataset with over a trillion entries.

marginalia_nu · on May 10, 2023

I asked which dataset they were indexing that was of this size, not whether any such dataset exists in other domains.

captaincrowbar · on May 11, 2023

I believe I hear the sound of a True Scotsman knocking at the door.

marginalia_nu · on May 11, 2023

My exact words are still up there:

> What sort of dataset are you indexing that has trillion entries?

It doesn't say:

> What sort of dataset has a trillion entries?