
This sounds like a load of bullshit to me. But is there any merit to this?


Something similar happens in my company too. In one particular place we use the hash of a string as the key in a hashmap instead of the string itself, because the hash is smaller and cheaper to compare once the initial map has been built. It's a 64-bit hash too. I have been crying about this every time it comes up, and the response is always "it will never happen". My other problem is that if it ever does happen, we will never know.
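To make it concrete, the pattern looks roughly like this (a minimal sketch, not our actual code; key64 is a hypothetical helper):

    import hashlib

    def key64(s: str) -> int:
        # Truncate a strong hash to 64 bits -- the pattern in question.
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    index: dict[int, str] = {}
    for name in ["alice", "bob", "carol"]:
        k = key64(name)
        # If two distinct strings ever map to the same 64-bit value, this
        # assignment silently overwrites the earlier entry: no error, no
        # log line, just a wrong answer much later. A check like this at
        # insert time is the cheapest way to at least find out:
        if k in index and index[k] != name:
            raise RuntimeError(f"64-bit collision: {index[k]!r} vs {name!r}")
        index[k] = name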


Isn't a hash map supposed to already be internally hashing the string for you and correctly handling collisions?


Yikes.

It's valid to assume "it will never happen" for 128 bits or more (if the hash function isn't broken), since the chance of a random collision is astronomically small, but a collision in 64 bits is well within the realm of possibility (roughly a 40% chance of hitting a dupe among 2^32 items).
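The standard birthday approximation, p ~= 1 - exp(-n^2 / 2N) with N = 2^bits, makes the gap obvious (quick sketch):

    import math

    def collision_prob(n_items: float, bits: int) -> float:
        # Birthday approximation: p ~ 1 - exp(-n^2 / 2N), N = 2**bits.
        return -math.expm1(-n_items**2 / (2.0 * 2.0**bits))

    print(collision_prob(2**32, 64))   # ~0.39    -- 4 billion items, 64-bit hash
    print(collision_prob(2**32, 128))  # ~2.7e-20 -- same items, 128-bit hash
    print(collision_prob(2**20, 64))   # ~3e-8    -- a million items, 64-bit hash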


> valid, 128 bits

The birthday paradox is a thing. If you have 128 bits of entropy, the 50% collision mark arrives after roughly 2^64 items, not 2^128. 64 bits is a lot, but in my current $WORK project, if I only had 128 bits of entropy, the chance of failure in any given year would be 0.16%. That's not a lot, but it's not a negligible amount either.

Bigger companies care more. Google has a paper floating around about how "64 bits isn't as big as it used to be" or something to that effect, complaining about how they're running out of 64-bit keys and can't blindly use 128-bit random keys to prevent duplication.

> bits of entropy

Consumer-grade hash functions are often the wrong place to look for best-case collision chances. Take, e.g., the default Python hash function, which hashes each integer to itself (mod 2^61 - 1 on 64-bit builds). The collision chance for truly random data is sufficiently low, but every big dictionary I've seen in a real-world Python project has had a few collisions. Other languages usually make similar tradeoffs (almost nobody uses cryptographic hashes by default since they're too slow). I wouldn't, by default, trust even a generic 1-million-bit hash not to have collisions in a program of any size and complexity. 128 bits, even with execution counts low enough to otherwise make sense, is also unlikely to pan out in the real world.
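That Python behavior is easy to verify from a REPL (CPython specifics; other implementations may differ):

    import sys

    # CPython reduces int hashes modulo a Mersenne prime (2**61 - 1 on
    # 64-bit builds); there is no mixing step for integers.
    M = sys.hash_info.modulus           # 2305843009213693951 == 2**61 - 1
    print(hash(42) == 42)               # True: small ints hash to themselves
    print(hash(M) == hash(0) == 0)      # True: a trivially constructible collision
    print(hash(M + 1) == hash(1) == 1)  # True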


I agree that 128 bits is on the lower end of "never", but you still need to store trillions of hashes to have even a one-in-a-trillion chance of seeing a collision (and that's already the overall probability; you don't multiply it by the number of inserts to get a 1:1 chance :) I don't think anybody in the world has ever seen a collision of a cryptographically strong 128-bit hash that wasn't a bug or an attack.

The birthday paradox applies within a single set (it's the chance of collision against any existing item in that set), so overall annual hashing churn isn't what matters: spreading more hashes across smaller independent sets increases your collision probability far more slowly than piling them all into one set.
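Toy numbers to illustrate (same birthday approximation as upthread, nobody's real workload):

    import math

    def p_collide(n: float, bits: int) -> float:
        # Birthday approximation: p ~ 1 - exp(-n^2 / 2N).
        return -math.expm1(-n**2 / (2.0 * 2.0**bits))

    # One big 64-bit-keyed set absorbing 2**34 inserts in a year:
    print(p_collide(2**34, 64))                   # ~0.9997

    # The same annual churn split across 16 independent sets:
    p_each = p_collide(2**30, 64)                 # ~0.031 per set
    print(-math.expm1(16 * math.log1p(-p_each)))  # ~0.39 across all 16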


Based on currently available public estimates, Google stores around 2^75 bytes, most of that backed by a small number of very general-purpose object stores. A lot of that is from larger files, but you're still approaching birthday-paradox numbers for in-the-wild 128-bit hash collisions.


Hash tables have collisions because they don't use all the bits of the hash; they compute index = hash % capacity. It doesn't matter how you calculate the hash: if there are only a few slots to put items in, they will collide.
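E.g., two 64-bit hashes that share nothing but their low bits still land in the same bucket:

    # With capacity 8, only the low 3 bits of the hash pick the bucket.
    capacity = 8
    for h in (0x1A2B3C4D5E6F7081, 0x99F0000000000009):
        print(h % capacity)  # both print 1

The table copes with that by comparing the stored key on a bucket collision, which is exactly the safety net you lose when the stored key is itself a truncated hash.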


Right, but the problem they were describing was storing the "hash" itself as the key in a hash table, not storing the item under a hash. For that use it absolutely matters, and IMO even the fact that it was a 128-bit hash isn't good enough, because the hash function itself likely sucks.


This makes no sense - it’s a hash map because it hashes things for you…


A hash map still stores the entire key, which may be undesirable if the key datum can be large.
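A rough illustration of the saving (hypothetical key; exact byte counts vary by CPython version):

    import hashlib
    import sys

    key = "tenant-42/region-eu/some/long/composite/path/" * 12  # illustrative
    digest = hashlib.sha256(key.encode()).digest()[:8]          # 8-byte stand-in

    print(sys.getsizeof(key))     # ~590 bytes for the full string key
    print(sys.getsizeof(digest))  # ~41 bytes for the truncated digest

Of course, the digest buys that saving at the price of the collision risk discussed upthread.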


Like the other reply said, a monthly link blog would be too much. I find myself ignoring a lot of those links-of-the-month posts because they cover too many topics and I end up not liking most of them. Whereas one with short commentary on a single link is great for me: if I like the commentary enough, I'll add the link to my read-later list.


There is a book called "Talking to Your Daughter About Economics", where the author explains that one of the problems with bitcoin is that you can't print more of it in a crisis. Maybe that's what happened here.


That may be one of the problems with bitcoin as legal tender and official state currency, but it's not a problem with Bitcoin itself; quite the opposite, as evidenced by the genesis block's embedded message: "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks".


Ironically, part of the long-term problem was El Salvador's dollarization, which likewise prevented them from printing in a crisis (that crisis being the COVID pandemic).


Dollarization was unpopular at first in El Salvador, but after 24 years of it, under both right- and left-wing governments, there are no official plans to roll it back. It's way too convenient to use the world's most used currency as our everyday currency.


I hated SAS with a passion when I was forced to work with it for 2 years. One of the biggest problems I faced was that it would take me a long time to find out whether something was doable or nearly impossible in that language.

It wanted to be more than just SQL, but the interoperability with other languages was awful; we couldn't even work with it the way you can with SQLite.


The thing about the notifications, though, is that you can turn every single one of them off in the settings.

When I tried to turn them off, though, I was hit with hundreds of different notification types. I generally like it when apps/sites do this: I can turn off the garbage I don't need and keep the ones I want. But 90% of those categories were garbage. It is really shocking that one can take a good idea this far and make it frustrating.


Could it be because college-level mathematics is meant to prepare extraordinary students to graduate into research? I imagine research often involves the not-so-sexy parts of mathematics.

I hear a lot of similar sentiments about school-level mathematics in the country I live in. The problem comes down to a trade-off: do you cater to the topmost, most promising students, who will end up contributing a lot to science, or do you cater to the ones who need the most help, who will end up getting some useful skills out of it to make a living?


I love how the guy is getting punished for misusing a math term.


'Asien' didn't misuse any math terms. I don't know why they're being downvoted.

'Shaburn' made some claims that sounded like they wanted to be math, but didn't make any sense as math. Hence my confusion.

I recognise that people use terms colloquially, like 'exponential', but given all the (pseudo?) sophisticated, math-y language in the comment

> Given the rate of mutation and transmission in bugs with natural gestation and migration, the probability of catastrophic outcomes is exponential without a similar dataset in other human food sources.

I had hoped that Shaburn actually had a more concrete model in mind that they could explain.


1. Number of people eating bugs (driven by mimesis pushed through a media narrative, and thus typically viral, often exponential).
2. Number of varieties of bugs being eaten (regionality and entrepreneurialism, often referred to as Cambrian explosions in perfectly competitive markets, thus exhibiting exponential growth functions).
3. Number of geographies bugs for consumption are being grown in.
4. Number of production methods and processes.
5. New combinations of genetics of peoples and insects/infectious organisms being consumed. Think Montezuma's revenge or lactose intolerance in certain regions of the world, except possibly contagious and deadly.

Multiply all that by an orders-of-magnitude faster gestation cycle, and thus the chance for mutation (aside from the technology developed to support the existing food chain): more mutations per lifecycle, increasing the chances of a deadly DNA combination by ~12x, so an order of magnitude.

Average lifespan of:
A. Bacteria: 12 hours
B. Insect: 12 months
C. Mammal: 12 years (the shortest being the primary disease harbinger, the rat)


Could you please point to any source for your claim that the number of people eating insects has grown dramatically?

I could find this study from 2015 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552653/ which says:

> At least 2 billion people globally eat insects in over 113 entomophageous countries though this habit is regarded negatively or as revolting by others [4–6]. More than 1900 species are consumed by local populations globally but insect consumption (entomophagy) shows an unequal distribution.

There are roughly 8 billion people on the globe. Between 2 billion and 8 billion, there's not much room left for exponential growth.


That's an excellent point. Insects don't need to be hardened against their pathogens if individual insects don't necessarily have to survive for the species to stay alive.

Also loved the computer program analogy.


You can have the package delivered to an Amazon-approved pickup point (I don't know what they call them; basically a shop that holds your stuff until you come around and pick it up). If you want to anonymize it, you can have it delivered to a pickup point in another state and drive there to collect it. Even better: give a stranger your phone to go fetch it from the store, so that your face isn't visible on any CCTV cameras near the store, and while they're on their way back to deliver it to you, fake a mugging and "steal" your own phone and the gift card while wearing a PPE kit or something, so that they don't know your dimensions.


Do you, by chance, write cheap adventure stories for a living?

Cause this sounds like something I read a few months ago. A pretty silly plan.


Wasn't it clear enough that I meant for it to be silly?


Poe's law strikes again :P


"Officers, I just saw a mugging. Can you please send someone?"

