Hey, I'm Mark Nadal and I was accepted into Mozilla's "Fix the Internet" program - it has been fantastic so far & huge respect to everyone at Mozilla for putting this on.
Note: I doubt anyone will see this comment as HackerNews has shadow banned my account because my Open Source project competes with some of YC's investments (it undermines & destroys the need/market for some of their crypto-coin scams).
I interviewed with YC a few years ago and their entire process was terribly unprofessional - I've gone through 3 accelerator programs now, and interviewed at many more. YC itself seems pretty awesome, but beware of the politics - if a single person (HN mod in my case) feels like you are a threat to their status, that person has enough marketing power to hurt you. Find the GOOD people in YC and work with them, don't leave it to chance or you'll fall out of good graces.
In contrast, look what Mozilla is doing! They're helping move the internet (the world, the community) forward, and are a non-profit foundation. I highly recommend applying to their program. They are GOOD people, doing good work. Please please do everything you can to help them make this a success - the internet depends on it.
We banned you after many years of you promoting your product aggressively on HN, using shady tactics such as voting rings and links to things that looked unrelated but were in fact stealthy ways of promoting the exact same thing over and over.
Meanwhile I can barely get Chrome/NodeJS to parse 20MB in less than 100ms :(.
How useful (or useless) would Simdjson as a Native Addon to V8 be? I assume transferring the object into JS land would kill all the speed gains?
I wrote my own JSON parser just last week, to see if I could improve the NodeJS situation. Discovered some really interesting factoids:
(A) JSON parse is CPU-blocking, so if you get a large object, your server cannot handle any other web request until it finishes parsing, this sucks.
(B) At first I fixed this by using setImmediate/shim, but discovered two annoying issues:
(1) Scheduling too many setImmediates will cause the event loop to block at the "check" cycle, you actually have to load balance across turns in the event loop like so (https://twitter.com/marknadal/status/1242476619752591360)
(2) Doing the above will cause your code to be way slow, so a trick instead is to skip setImmediate and invoke your code directly 3333 times (some divisor of NodeJS's ~11K stack depth limit) or for 1ms before doing a real setImmediate.
(D) I'm seeing this pure JS parser be ~2.5X slower than native for big complex JSON objects (20MB).
(E) Interestingly enough, I'm seeing 10X~20X faster than native, for parsing JSON records that have large values (ex, embedded image, etc.).
(F) Why? This happened when I switched my parser to stop doing per-byte checks once it encounters a `"`, and instead jump straight to the next `"` with indexOf. So it would seem V8's built-in JSON parser is still checking every character for a token, which slows it down?
(G) I hate switch statements, but woah, I got a minor but noticeable speed boost going from if/else token checks to a switch statement.
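For anyone wanting to try the trick from (B)(2), here is a minimal sketch, not the actual parser code. The names `runSlice` and `runAll` are hypothetical, and it uses a loop over a task queue rather than direct recursive calls, so the 3333 figure is only kept as the default call budget from the comment above. It runs tasks synchronously until either the call budget or the time budget is spent, then does one real setImmediate to let the event loop breathe.

```javascript
// Hypothetical helper: run queued tasks synchronously until the call budget
// or the time budget is spent. Always runs at least one task so that every
// slice makes progress. Returns the index of the next task to run.
function runSlice(queue, head, maxCalls, maxMillis) {
  const deadline = Date.now() + maxMillis;
  let calls = 0;
  while (head < queue.length) {
    queue[head++]();
    calls++;
    if (calls >= maxCalls || Date.now() >= deadline) break;
  }
  return head;
}

// Hypothetical driver: one setImmediate per slice instead of one per task.
// Defaults mirror the numbers mentioned above (3333 calls or 1ms).
function runAll(queue, done, maxCalls = 3333, maxMillis = 1) {
  let head = 0;
  function turn() {
    head = runSlice(queue, head, maxCalls, maxMillis);
    if (head < queue.length) {
      setImmediate(turn); // yield so other requests can be served
    } else {
      done();
    }
  }
  turn();
}
```

Usage would look something like `runAll(tasks, () => console.log("done"))`, where `tasks` is an array of small functions, each doing one chunk of the parse.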
Happy to answer any other Qs!
But compared to OP's 2.5GB/s parsing?! Ha, mine is a joke.
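To make (F) concrete, here is a rough sketch of the indexOf idea, assuming the input is already a JS string. `scanString` is a hypothetical name and this is not the real parser: it jumps from the opening quote to the next `"` with indexOf, and only looks at individual characters to check whether that quote is escaped. It does not validate control characters the way a strict JSON parser would.

```javascript
// Hypothetical sketch: read the JSON string that starts at text[start].
// Returns the decoded value and the index just past the closing quote.
function scanString(text, start) {
  if (text[start] !== '"') {
    throw new SyntaxError("expected opening quote at " + start);
  }
  let from = start + 1;
  while (true) {
    // Jump straight to the next quote instead of checking every character.
    const end = text.indexOf('"', from);
    if (end === -1) throw new SyntaxError("unterminated string");

    // Count the backslashes right before the quote: an odd count means the
    // quote is escaped and the string continues.
    let backslashes = 0;
    for (let i = end - 1; i > start && text[i] === "\\"; i--) backslashes++;
    if (backslashes % 2 === 1) {
      from = end + 1;
      continue;
    }

    const raw = text.slice(start + 1, end);
    // Only pay for escape decoding when the string contains a backslash.
    const value = raw.includes("\\") ? JSON.parse(text.slice(start, end + 1)) : raw;
    return { value, end: end + 1 };
  }
}
```

For a record with a huge embedded value (a base64 image, say), almost all of the work happens inside indexOf, which is where I'd guess the speedup comes from.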
The thing is, it really was faster than gnu cat. I suspect it is because gnu cat does other things than just using Linux splice to a file descriptor and has options to count lines and such, and doesn't (didn't?) bother to use SSE. I just thought cat would give me a practical maximum to compare to when reading from disk.
I've also written and tried to optimize a hand-rolled JSON parser for exchange messages, just to see how fast pure JS could go. I tried many different things, but I only ever got near to the native implementation once I started assuming certain offsets in the buffer or optimistically parsing whole keys which were highly unsafe. My verdict was that you will never really get close to native, let alone close to hand-optimized C/C++.
> JSON parse is CPU-blocking, so if you get a large object, your server cannot handle any other web request until it finishes parsing
Well, your CPU core is busy on one request or another, so I don't understand why this is an issue as long as you're guarding against maliciously large bodies. Blocking I/O is different because your core is partially idle while other hardware is doing async work. Using Node.js' cluster module lets you keep more cores busy. Chunking CPU-limited work increases total CPU time and memory required. (This is a pet peeve of mine and a hill I'm willing to die on :-) .)
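A rough sketch of what I mean, assuming Node 16+ (for `cluster.isPrimary`). The names `parseBounded` and `startCluster`, the port and the 1MB cap are all made up for illustration: cap the body size, then just call JSON.parse synchronously and let the cluster module spread requests across cores.

```javascript
const cluster = require("node:cluster");
const http = require("node:http");
const os = require("node:os");

const MAX_BODY_BYTES = 1024 * 1024; // hypothetical cap, tune for your API

// Reject oversized input before JSON.parse gets a chance to block on it.
function parseBounded(body, maxBytes = MAX_BODY_BYTES) {
  if (Buffer.byteLength(body) > maxBytes) {
    throw new RangeError("body too large");
  }
  return JSON.parse(body);
}

// The primary process forks one worker per core; each worker runs a plain
// HTTP server that parses request bodies synchronously.
function startCluster(port = 3000, workers = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workers; i++) cluster.fork();
    return;
  }
  http.createServer((req, res) => {
    const chunks = [];
    let size = 0;
    req.on("data", (chunk) => {
      size += chunk.length;
      if (size > MAX_BODY_BYTES) {
        req.destroy(); // stop reading a maliciously large body
        return;
      }
      chunks.push(chunk);
    });
    req.on("end", () => {
      try {
        parseBounded(Buffer.concat(chunks).toString("utf8"));
        res.end("ok");
      } catch (err) {
        res.statusCode = 400;
        res.end("bad request");
      }
    });
  }).listen(port);
}
```

Calling `startCluster()` at the bottom of the file starts it. A big parse still blocks one worker, but the other cores keep serving requests.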
I think that is a good hill to die on, tho I would rather prioritize UX (browser not freezing) and server responsiveness. Ideally we'd have no CPU chunking & good UX, but if we have to choose one, which would you sacrifice?
There are third party bindings for nodejs https://github.com/luizperes/simdjson_nodejs. As you suspected, converting the entire document to a JS object is not recommended. [0] There is an additional API that allows you to query keys without conversion.
Yes, that is correct. I spent a lot of time on issue #5 to make it as user-friendly as I could, but the only way I found to not have all the C++/JS conversion overhead was to keep the pointer to the external C++-parsed object. There might be other options that I haven't thought of, so if anyone knows of a better approach, let me know.
Hey, my Open Source database is the #1 CRDT rated system on GitHub, and is used in-production by 10M+ people a month, at non-profits like the Internet Archive, and others.
As usual, you continue to mention GUN one way or another in every post that is slightly related to CRDTs or decentralization. Your comment would actually fit this submission but I guess at this point people are so tired of your spamming that they had enough, even when it's relevant.
> To all interested in CRDTs: Marc Shapiro, @anne_biene and I have set up a little CRDT community website. Lots of links to papers, blog posts, talks, and implementations. Contributions welcome!
> It is used in production by HackerNoon, non-profits like Internet Archive, & other large sites.
> Handling 10M+ monthly users.
I'm not sure whether I or you misunderstand "Just to be clear, Martin (site owner) added GUN to the list, not me" but it seems pretty clear that you prompted him to add GUN to the list; he didn't discover GUN on his own and then add it.
Mark, when are you gonna realize that this excessive spamming of GUN is not helping your case?
Yeah Mark is the only username I consistently recognize on HN other than Dang (the moderator), and it's because he promotes GUN without fail on just about every post I see about CRDTs, IPFS or decentralized anything. I wouldn't call it spam but it does get over the top.
Just for the record, I was already aware of GUN previously, and think it is a good project to include in the list. We had just forgotten about it when putting together this list of links. I guess I don't check HN all that often. ;)
That makes sense, thanks for adding that explanation. Just wanted to refute the statement that Mark didn't prompt you to add it, as we seem to be a bunch of HNers that are getting tired of the spam.
I get some hate my Open Source project & actively abuse downvotes/flagging to censor me.
But for every 1 hater there are 100s of hackers that have found, starred, used, or told me (in our chat channel) they were thankful they discovered GUN via HackerNews.
& HN guidelines encourage on-topic submissions & comments (https://news.ycombinator.com/newsguidelines.html) "Anything that good hackers would find interesting ... anything that gratifies one's intellectual curiosity" even you have to admit that GUN discussions have sparked a lot of intellectual & algorithm chats.
Finally, you state I'm not spamming then say I am spamming. Spam is indiscriminate posting of something, yet your very own comment says "you always post GUN in discussions about CRDTs & decentralization." That is indeed on-topic according to HN policy.
Nearly every comment you post has some cunning (or not so cunning) promotional reference baked into it. Examples are legion: https://news.ycombinator.com/item?id=22252497. Your rare comments that don't include something like this are so boilerplate as to come across as shameless padding.
Given that we've banned countless other users for lesser abuses, and given the regularity with which this pattern devolves into user complaints and off-topic flamewars like the current thread and https://news.ycombinator.com/item?id=22499177, I think it's time to bite the bullet and ban your account. I don't really want to—which is no doubt why it's taken so long—but you're clearly not using this site in good faith, and enough is enough.
> I get some hate my Open Source project & actively abuse downvotes/flagging to censor me.
The "hate" (I'd say critique) is not against GUN as a project or its code. It's about you constantly posting about GUN.
> "you always post GUN in discussions about CRDTs & decentralization."
This is a bit unfair, that's not a direct quote of what I wrote, and I'm sure you know this. What I wrote is this: "every post that is slightly related to CRDTs or decentralization". The "slightly" part is important, because many of your comments mentioning GUN are on submissions not related to GUN at all. For example, if a submission is about Mastodon, I'm fairly certain I'll see a comment from you promoting GUN with your usual metrics.
Again, I have nothing against GUN itself, but people do get tired, even if it's interesting the first time, when someone continues to mention their project over and over again.
Others have mentioned this to you in the past as well (one example: https://news.ycombinator.com/item?id=21383815) but it doesn't seem to stick. Maybe it's time for dang or other moderators to have a chat with you, if they haven't already.
Because I'm bored, here are some examples of comments you've made promoting GUN on unrelated posts with low effort content in order to get in a link to GUN's website (in the last 3 months) [maybe this list will help you realize what people are getting tired of]
I think parent was saying the projects using IPFS are relatively unknown, so where do the millions come from?
I have the same question.
In contrast, GUN has 20M+ downloads/month from known sources: Internet Archive, HackerNoon, etc. (https://github.com/amark/gun see jsdelivr download stats).
Gun is currently a pile of shit. Anyone saying otherwise hasn't used it. The sole founder and developer is a very nice person, but it's the reality of Gun tech right now.
Could you expand further what didn't work? We have some pretty heavy load in-production systems running. Definitely some rough edges still, so I'd like to hear what problems you had, so I can focus on fixing them in the future.
It was a month ago I checked it out and it just wasn't usable - i do not remember the details but even basic things did not work. I had a json dataset i just wanted in a local db, and that was proving challenging for simple operations is what I remember, didn't even get into the p2p side yet. If I remember correctly, it wasn't returning accurate results when queried.
try creating a youtube video of getting it to work from scratch, reading and writing data and i think you will see the problems. Or maybe ask a friend that is less familiar than yourself to try it, the problems seemed rather obvious as it just didn't work for a basic use case.
If you ever have time, it'd be great to snag whatever code or data you had, to create a replicable test case for us to make a fix against. Sorry about the wasted time you experienced, I hate that.
lol it's not a HN post about a decentralized project without marknadal shitting on it and shilling GUN. got a running bet it always happens, and you don't disappoint <3 -- keep it up!
I'm not trying to censor anything, lol. Just referencing many hilarious moments in your comment history where the HN mods tell you to quit spamming, and here we are again with more off topic mark spam ;)
I gave a presentation at Berkeley a few years back, right before Bram Cohen (BitTorrent) presented his Chia proof, Proofs of Space and Time (before IPFS ripped the name from Bram).