They point out that this is illegal in Denmark and elsewhere, and threaten to ban thenose.cc from Denmark.
Obviously, the threats are meaningless. But I'm interested in your views on whether we should comply with the request by removing the specific titles they list.
I was thinking of saying "If you say 'please', I will remove the listed titles." There are 109 entries, so it wouldn't be too much hassle to just remove those from the tarball, and it would be amusing to force a lawyer to ask nicely.
For now, I asked for a complete list of the full filenames they want to be removed, along with proof that they represent the listed rightsholders.
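If they do send a list of exact filenames, dropping those entries from the tarball is simple. Here's a minimal sketch, assuming the list arrives as plain member paths that match the archive exactly. The function name `copy_without` and the paths in the usage note are hypothetical, not the actual layout of the dataset.

```python
import tarfile


def copy_without(src_path, dst_path, blocked_names):
    """Copy a tar archive to dst_path, skipping members listed in blocked_names.

    Returns the names that were actually found and skipped, so the result
    can be checked against the list in the request.
    """
    blocked = set(blocked_names)
    removed = []
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            if member.name in blocked:
                removed.append(member.name)
                continue
            # Only regular files carry data; directories and links have none.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
    return removed
```

Calling something like `copy_without("books.tar.gz", "books-filtered.tar.gz", names)` and comparing the length of the returned list against the 109 requested entries would show whether every listed title was actually found.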
I'm more interested in how you feel. It seems reasonable to let people opt out of training. We could formalize this process by setting up a way to do this. We could also just ignore takedown demands.
Information wants (and deserves) to be free. People make sophisticated and convincing arguments for incentivizing creation; they resonate with me, but ultimately I just do not agree with them.
I think projects like yours are on the right side of history, but it will take a long while before we collectively agree.
Your ethical dilemma hinges on whether or not you agree with the above.
Could you elaborate a little more on how you came to this belief? I'm interested in the process of deciding whether to agree or disagree. A good way to get better at that is to get perspectives from thoughtful people.
People naturally share useful information and this collaboration is the basis of all human achievement and the mechanism by which human society evolves.
Putting artificial obstacles in the way of sharing useful information is an act against progress and society itself.
For my part I'm not an absolutist in this (not all information), but I enthusiastically support zlib and libgen because keeping books and papers from those who can't afford them (half the people on the planet!) is, in my view, extremely antisocial.
My take: copyright as a general concept was and is vital to the well-being of a creative society. But:
(a) Copyright law is so badly thought-out that I don't feel bad about breaking it; and
(b) What's happening in ML is nothing less than the next stage in human intellectual evolution, after thousands of years of relative stasis. It will prove far more important than copyright in the long run, and if a choice is forced the path is clear.
I don't have much use for the Roko's Basilisk argument, but I'm loath to take any action that might either hold back progress in this field, or that might make it possible for the technology to be captured and owned by powerful commercial interests. It will be humans, and not machines, who curse us in the future for allowing archaic values and corrupt copyright laws to slow progress down... or for allowing Facebook and Microsoft to control it.
I offer some insight from Thomas Jefferson, as much of my own thinking on the topic has converged with his over time, and he is, in my opinion, the superior wordsmith of the two of us.
>Jefferson’s cleanest expression of his views on patents came in a weighty letter to Isaac McPherson (13 Aug. 1813) about Oliver Evans’s proposed elevator patent: a string of buckets fixed on a leather strap, for drawing up water. Is Evans’ machine his own, “his invention,” or do others have right of usage? Jefferson was concerned with the machine itself, not its usage. If one person, for instance, received a patent for a knife that points pens, another could not receive a patent for the same knife for pointing pencils.
>Jefferson begins by noting he has seen similar contraptions used by numerous others—“I have used this machine for sowing Benni seed also” and intends to have other bands of buckets in use for corn and wheat—and even notes that such an elevator was in use in Ancient Egypt. He sums, “There is nothing new in these elevators but being strung together on a strap of leather.” If Evans is to be credited with anything new, “it can only extend to the strap,” yet even the leather strap was used similarly by a certain Mr. Martin of Caroline County, Virginia. There is, Jefferson is clear, nothing original in Evans’ machine.
>Jefferson, however, had more to say: many believe that “inventors have a natural and exclusive right to their inventions,” which is “inheritable to their heirs.” Yet it “would be singular to admit a natural and even an hereditary right to inventors.”
>Why? “Whatever, fixed or movable, belongs to all men equally and in common, is the property for the moment of him who occupies it.” Yet when he relinquishes occupation, he relinquishes ownership. It would be strange to think that a person acquiring ownership of some property, thus, has a natural right to it. That would mean that no one has a right to the property after he perishes, and even more absurdly, that no one had a right to that property prior to him having acquired the land. “Stable ownership is the gift of social law,” and not of nature. The argument applies straightforwardly to ideas. Jefferson sums, “It would be curious then,” adds Jefferson, “if an idea, the fugitive fermentation of an individual brain, could, of natural right, be claimed in exclusive and stable property.” The argument for patenting ideas by appealing to nature is untenable.
>Jefferson still has more to say. The analogy has its flaws. Ideas are singular. If there is anything that nature has made “less susceptible than all others of exclusive property, it is the action of the thinking power called an idea.” Each person possesses exclusively any idea so long as it is unshared. Once shared, it belongs to everyone.
>Moreover, an idea shared is fully possessed by all who entertain it. “He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.” The same cannot be said for property shared. It is that power of an idea, to be shared without lessening its density, which makes it a special gift of nature for “the moral and mutual instruction of man.” He sums, “Inventions then cannot, in nature, be a subject of property.”
While I understand he is not looked upon quite as favorably by many nowadays, as to the sense previously quoted, I hold vehemently he has the incontrovertible right of it, and that that which we endure nowadays as being "Intellectual Property", and the framework of legalisms around it, is an aberrant perversion of the right order of things. As human beings, we are finite, transient creatures. In our conducting of business wherein we have provided to men (or people if you prefer) the benefit of intellectual property, we have also created non-people (legal fictions) that are nevertheless granted the benefit of holding said intellectual property. These fictions do not die as men do, and benefit greatly, and in ways that are detrimental to the transmission of hard-won experience between generations, and furthermore perpetuate the greatest inequality of our time: that in a period wherein the replication of information is free, we still bind others to be ignorant so that some who, if not through the virtuous action of innovating, then through acts of business, lay claim to the fruits of the innovator's virtue, holding it over a fire, or throwing it in a vault, and decreeing "Humanity, thou shalt not know til my tithe is satisfied."
In the short time we all have; deep down, I believe it is the right of the thing that all should be spread as far and wide as can be, that the seeds of ideas may find fertile soil in the minds of others in which to bloom, to bring about a richer harvest for all.
I wish there was some way for us to keep in touch. There are a few things I was hoping for some thoughts on, and most of the people here don't have emails in their profiles.
'Least until I'm done fighting with my ISP over getting a static IP so my damned email server won't get ignored out of hand by everyone because I'm in a residential dynamic IP block.
Understand why they do it, but Gawd... so annoying.
Just do the right thing. Put yourself in the other party's shoes and consider how you'd feel about approaching this from their side.
Textbook publishers aren't improving their offerings with each iteration. They re-release the same shit with a different cover and charge schools (and taxpayers) a premium for this "service." In some cases, the content they republish was already paid for with taxpayer money. Their business model is exploitative on every level. Fuck them.
A fiction author puts effort into a work of art. They're not forcing sales or doing anything shady; they're just someone trying to make a living selling copies of their art. Respect that and don't play games with them, unless they can't be civil.
Textbooks aren't publishing what "was already paid for with taxpayer money". By that same logic if I write a book that summarizes all the scientific research in a certain area, then I don't deserve copyright. That makes no sense.
Writing a textbook is no different than writing a piece of fiction. It takes actual work to do, and it's original content.
And if you don't want to buy the latest edition for your class, blame the professor. Most of them are too lazy to actually use older editions and save students hundreds of dollars.
This is an excellent point. Thanks. Do you mind if I quote your comment in our official policies?
It's interesting because libgen also provides most fiction titles, but everyone is rooting for them.
For example, one of the books they want taken offline is from 1954, republished in 2008. So in this case they operate closer to the textbook model than the author model.
I can't speak to libgen's current fiction policy. Just be cognizant of the human element.
Your last point is good; I meant to add something about dead authors too. Fuck estates for that very reason. Lazy-ass kids should write their own damn novel.
One other question: Is there legal basis in Denmark for asking for proof that they control the copyright for the listed works? DMCAs operate on "good-faith belief, under penalty of perjury" but in international situations it becomes trickier.
If anyone knows of a Danish lawyer I could consult with, or someone versed in international affairs, please let me know. (Or if you care to contribute funding. Hosting costs around $140/mo right now, which isn't free, but paying for consultation is costlier.)
If you're worried about DMCA, you should comply immediately. If you're not obligated to comply with DMCA, why do you bother with such a question? If there's no process set by law, set yours yourself.
By "overseas" and "outside the reach of DMCA", be careful how you draw the lines. Did you incorporate overseas? How are you separating you personally from your corporation? If you are based in a country that obligates you to follow DMCA and if your corporation is nothing but paper and you're the only person involved, a judge might disregard the corporation as a mere way for you to escape your local jurisdictional obligations.
We're not worried. Our operation is anonymous, and as long as we don't slip up, we'll be fine. Though saying "don't slip up" is very "draw the rest of the owl"; it's most of the work: https://news.ycombinator.com/item?id=37346620
But we'd like to do the right thing ethically, which is hard to figure out.
Hypothetically, if you were going to set up a process for yourself outside of the law, what criteria would you use?
Setting the law aside, I'd use the books to train AI. I could use it to train myself, right? What's wrong with training a machine?
But if you're distributing the contents of these books, that's another story. You're pirating, not training AIs. It didn't end up well for the guys behind The Pirate Bay, unfortunately. They can find you. If they can't bust you for copyright infringement, they'll just make stuff up until they put you in jail. Especially if you offend their personalities.
I don't think (read: about 99% sure) that DMCA safe harbor applies to someone serving a repository that they themselves have compiled so there's no sense in a rights holder using that process. They can ask with varying levels of niceness and/or sue.
Just an opinion: "I think this data set is valuable, and I want to keep it available for use in free countries. After some thought, I think the best solution is the one you propose: go ahead and ban me in Denmark."
Shorter version: "Your move, asshole."
But that's just me, and it's easy to talk big when it's not your neck on the line. So I reckon you should go with what you think; you're the one in the firing line if they figure out how to come after you.
As far as I can tell, Books3 seems to be training data for language models. I'm not sure how it was created, but it contains a lot of books. I got it by torrenting The Pile after it was forced offline by the Danes.
They're asking to remove 109 books from the dataset, which I can do. But I'm not sure whether to. Once you set aside the question of law, it becomes a matter of ethics, and these questions aren't so easy.
I wouldn't set the law aside. Do philosophy over the ethics if you want, but only after you are sure to have the legal side covered, because this one can ruin your finances and your life in general.
Unless you're based and incorporated in Iran, Iraq or North Korea, your country has signed the Berne Convention and has implemented in law some level of copyright protection that almost certainly makes the distribution of those books illegal.
If you're not taking very careful technical and legal measures to remain anonymous, you can get in serious legal trouble for breaking the law.
What is the upside for you? Companies like Uber, Google, etc break the law all the time. But they profit billions from that and then pay millions in fines and lawyers. What's your game? Are you profiting enough to make sense - financially-wise - to break the law?
Last but not least, I wouldn't play with lawyers' personalities trying to make them "please" you. Respect them, otherwise, they'll do whatever they can to make you regret it. And believe me, they can do a lot against you. These people are evil. Don't cross their paths.
> What's your game? Are you profiting enough to make sense - financially-wise - to break the law?
Not at all. Hosting costs $130/mo, and I feel the sting each month. I'm not sure we'll even get enough donations to cover that, let alone have some kind of profit motive. But we wouldn't want to profit off the works anyway, or else we'd be no better than the corporations.
My game is to help people like you be able to train your own models. If I don't help you, who will? Companies will have the final say in what you're allowed to do on your own hardware, because they control the data. No data, no training.
The hard part is to balance this with doing the right thing. I'd like to figure out the right thing from first principles and by asking thoughtful people like you, rather than from fear of consequences.
As for consequences, we're being careful enough that it seems worth the risk. (You can read more about our precautions at https://news.ycombinator.com/item?id=37346620.) But I agree that staying out of jail is preferable to being in one.
Indeed. For anyone who isn't convinced, I wrote up some details on our use case (creating a training data DMCA safe haven) in the Tails thread: https://news.ycombinator.com/item?id=37512147
If you're serious about protecting yourself, Whonix is a requirement.
Hi. We're building The Nose (https://thenose.cc), a safe haven for training data that can't be taken down with DMCA. Since this involves copyright infringement, strong anonymity is a requirement.
The reason Tails isn't an option is because, as others have mentioned, there have been Tor browser exploits which reveal the IP address of the Tails user. While this is unlikely for our case, it's important to approach security from first principles with threat modeling. An attack from the FBI may seem unlikely today, but both Silk Road and one of its successors were taken down by mistakes they made when setting up their site. Learning from history, if you're not careful early, you're in for a surprise later.
Case in point: When I started Whonix Workstation to post this comment, the Whonix Gateway VM failed to boot. So when I tried to start Tor Browser and go to https://news.ycombinator.com, all I saw was a connection error. This kind of layered defense is essential if you're serious about staying out of jail.
Realistically, you'll likely dox yourself through some other means: sending Bitcoin to your pseudonym from your real identity, admitting to someone you know that you control your pseudonym (this work gets lonely, so this is a real temptation), or even accidentally signing off an email with "Thanks, [your real name]". And once you make a single mistake, you can never recover.
Day to day browsing is a pain. I use a VNC client to remote into our server, which is running a desktop environment with a regular browser. That way you can use apps (gmail, discord, etc) from outside the Tor network. But since you're tunneling through Tor, this is painfully slow. You'll likely want to type out long messages in Whonix, then copy-paste into your remote session. Each keystroke can sometimes take a full second to appear when animations are heavy.
Transferring large amounts of data is also painful. If you try to start Litecoin Core on Whonix, you'll need to sync more than 30 GB, which can take a very long time.
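To put "a very long time" in perspective, here's a rough back-of-the-envelope sketch. The throughput figures are purely hypothetical assumptions for illustration (I haven't measured them), and `transfer_hours` is just a helper name I made up.

```python
def transfer_hours(size_gb: float, throughput_mbit_s: float) -> float:
    """Hours needed to move size_gb (decimal gigabytes) at a sustained rate."""
    size_megabits = size_gb * 1000 * 8  # GB -> megabytes -> megabits
    return size_megabits / throughput_mbit_s / 3600


# Hypothetical sustained rates in Mbit/s; real rates vary widely.
for rate in (1, 5, 20):
    print(f"{rate:>2} Mbit/s: {transfer_hours(30, rate):.1f} h for 30 GB")
```

The point is only that time scales inversely with sustained throughput, so a slow circuit turns a routine sync into something you have to plan around.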
Patience is your weapon. You have all the time in the world not to make a mistake, and moments to make a fatal one. Think carefully about everything you do.
Stylometry scares me. AI can help here: run an assistant locally, and ask it to reword everything you write. You won't be able to use ChatGPT for this, obviously because OpenAI retains a history of everything you submit, but also because they require a real phone number to sign up. And you can't get a real number through any means I've found so far.
Payment is also a pain. I'm hoping to ask the community to donate Vanilla gift cards so that I can sign up for Tarsnap or spin up a droplet.
By applying the discipline normally found in aeronautics, I think it's possible to do this safely. But you'll still be risking jail time, and the intersection of people who want to do something for altruistic reasons and are willing to risk prison is pretty small. I'll be documenting everything I do so that you can learn from my example, or perhaps from my mistakes.
I like the way you describe your process. As the person who made the stylometry thing that made the rounds a while back, I would say the best thing you can do on that front is to either get a "paraphraser" like ChatGPT/translators or just write less. Also, there's a site called smspva.com and a lot of sites like it where you can rent "real" phone numbers and they take every payment method under the sun. Depending on the country a phone number to receive an OpenAI confirmation code is about $0.50, most less popular services are like $0.10-$0.20.
llama.cpp runs LLaMa 2 7B on common hardware like a MacBook Pro. Haven't tried it yet on my RTX 3070 (Mobile) but there's no reason why it shouldn't work.
A 7B LLM has a huge quantity of knowledge about the world. You don't need that just to reword sentences. You can use a translation model with English input and English output, or another Text2Text model such as one for textual style transfer. A purpose-built model for rewording into a fixed style different from the input could easily be 10M parameters or fewer (that's already big enough for translating between two languages, after all), but you can readily find models in the 100M range for text style transfer.
Are you currently hosted on Shinjiru? I'm thinking about using them as a reverse proxy in front of a site that might suffer false DMCA attacks. I don't want my web host to ban me just because they can't deal with the hassle, so I'm thinking about proxying all the requests.
What does Shinjiru do if they receive a DMCA notice?
When I ran a huge private torrent tracker I paid a decent chunk to get a host that ignored every single request of any type that they received.
I think if you're interfacing with your server without going through Whonix, you're asking for trouble. Not only do you need to pay for the server using BTC that can't be traced back to your identity, but anything that touches the server (such as your server you're proxying with) needs to take the same precautions, which means no DigitalOcean, unless you can somehow pay them without that also being tied to your identity.
If you're not actually worried that DMCA people will follow through on their threat to sue you, or you're really willing to risk losing your property in the event of a lawsuit, then perhaps this might work.
Feel free to email me for more advice or to keep in touch. Your project sounds interesting.
(We noticed The Pile was recently taken offline, so we hosted it: https://thenose.cc. Apparently Books3 was also a part of The Pile, so feel free to download.)
I'd suggest starting with at least a brief paragraph on what thenose is, what its goals are, etc. I read your post, and found myself reading the technical workings of something I didn't know anything about.
Oh, thank you. Basically AI training datasets have been knocked offline recently by DMCAs, and the goal is to bring them back online in a place that can't be knocked offline. The most popular training dataset was The Pile, hosted by The Eye: https://pile.eleuther.ai/
Notice the links now 404. We tried to make a drop-in replacement for those links. All they have to do is change the-eye.eu to thenose.cc in the urls.
Unfortunately there aren't a lot of ways to get their attention to let them know this exists now. I'll try emailing the contact address but I imagine they receive lots of spam, so I was hoping to try to get noticed by people like yourself first. Maybe a direct email is still the best way, but there's no guarantee they'll even be willing to change the urls due to legal risks. For all they know I could be logging the IP address of everyone who downloads it and forwarding it to authorities. But I'm not, and it's a frustrating problem to try to solve. I just want to help AI flourish.
This also serves as a template for someone else to do the same thing, so at least there can be multiple mirrors.
Thank you again. The fact that you even took the time to look it over meant a lot. If you have any other ideas, I'd be interested to hear.
Hi HN, I have been working on something directly related to AI and copyright. Would it be ok to point it out here?
Recently The Pile was taken offline from The Eye by DMCA. One solution is to host it offshore, which we're calling The Nose: https://thenose.cc
The technical security measures may be of interest to the audience here, so I'll be as detailed as possible. The following formula should be safe if you follow it to the letter.
The basic setup is to install Whonix on a VeraCrypt drive, acquire Monero through any method, use a service like changenow to convert Bitcoin on a wallet stored only on the Whonix installation, sign up for a ProtonMail account (when they ask for email verification, use a no signup inbox service like yopmail), rent a dedicated server at Shinjiru using bitcoin, and register the domain at the same place. They're both a registrar and a server host, which simplifies matters. Use N/A for all contact info. Use Cloudflare to manage your site's DNS records.
Wallet security: do not ever move Bitcoin to any wallet linked with your personal identity. This is easier said than done. First there is the question of how to store passwords. These are the keys to the kingdom, and are the most sensitive aspect by far, because they're intimately linked with you. Additionally, if hardware failure occurs, you'll lose everything if you store them on the Whonix drive. My setup is to use KeePass to store the passwords on a laptop I use to VNC into the computer with the Whonix drive, and then save the database to a folder that gets synced to the cloud. The only flaw in this model is that if your laptop is compromised while your KeePass is open, you're done. But (as Ulbricht discovered) this is always true. The threat model assumes lawyers coming after you with DMCA with additional safeguards against the FBI narrowing down who you are in real life. If your physical location is compromised through any method, you're done.
All it takes is one mistake to end you. SSH into your box from your real computer? Done. Sign up using your real name with Mailgun? Done. Accidentally say "Thanks, <your real name>" to the support staff at Shinjiru in an email? Done. Abandon ship and close everything down.
The security of this technique comes down to simplicity. There are very few moving parts. I opted for nginx + mediawiki with Discourse forums at https://forums.thenose.cc (though I don't know if anyone will care enough to join). Logging is turned off to protect users downloading the data, though you only have my word on this. But reputation is the only thing a hacker has ever truly had anyway.
If you're serious about following the above recipe, I urge you to read through the Whonix docs on online anonymity: https://www.whonix.org/wiki/Documentation Remember, threat model is your saving grace. You probably aren't starting a darknet, so you can relax your threat model in terms of physical safety. But you won't get away with any mistakes made in cyberspace.
As for the site itself, I've avoided asking for donations for now (hosting is $130/mo though, which will get expensive) or describing anything beyond this HN comment. I'll say it's for simplicity, but in fact I only started it a few days ago and haven't had time to provide anything but the essence of our service: hosting AI datasets in stable, copyright-resistant ways.
If additional datasets beyond The Pile need protection or distribution, you can contact me at [email protected] or at https://forums.thenose.cc. I have a 4TB drive, of which 800gb is being used by The Pile so far.
You can try to join the EleutherAI discord. They are the people that created The Pile iirc. It's very active and I think you would be able to get in touch there.
> Not at all [a lie]. The trick is to force yourself to find something about their approach that you liked.
A lie is not always a falsehood; it is rather any use of communication with the deliberate intention of worsening somebody’s idea of the state of the world, and cherry-picking evidence (your “trick”) very much counts. I’d say it’s a very popular approach, even. You’re welcome to use a different word than “lie” here if you want, but my point is that either way the result is the same: the target is now worse off in their knowledge than they previously were.
In the spirit of Harry Frankfurt’s definition, bullshit is the same as a lie but instead the perpetrator wants to change somebody’s perception of the world with disregard to the actual state of it, not in contradiction to that state.
So from your description I’m not sure if your “trick” counts as lying or bullshitting: generally speaking, adjusting your logic or evidence to arrive at a predetermined conclusion is bullshit, but that you talk about a “trick” suggests an acknowledgment that you’re deliberately not communicating your best idea of reality, which would make it a lie.
But it’s definitely one of the two, and regardless of which it is I still think it’s quite bad, both in the immediate sense of not letting the other person (if you’re right) or you (if you’re wrong) learn, and in the sense of eroding the conventions of honest communication in ways that make it harder for others to learn in the future.
It's manipulative only if there really is no redeeming quality to their approach, which, in any realistic scenario there probably is.
I interpret this as, not that you should lie, you should just NOT focus 100% on the negative aspect. At the very least you can thank them for taking the time & effort to implement this solution & test it or w/e (I assume they did "some" work & put in some amount of well meaning effort).
If I can't genuinely find anything to praise about something I want to criticize, it's a sign that it's pretty bad (or I have a bad working relationship with this person) and that is a bigger, separate problem.
It's not about phrasing, it's about being genuine and also choosing to have a certain perspective which builds the other person up. There's nothing to see through.
I think this is an incredibly important lesson. Don't lie, _actually_ find something good to say. It's a goddamned super power, and it's also very good for your own mental health.
It’s out of context. Not all adults need or want other adults to ‘build them up’. If you start with some unrelated positive thing, it will be recognized as a manipulation technique because context tells us there’s no other reason to raise the point.
> If you start with some unrelated positive thing, it will be recognized as a manipulation technique because context tells us there’s no other reason to raise the point.
That's true, but nobody (that I saw) suggested saying things that don't fit the context.
> Not all adults need or want other adults to ‘build them up’.
Everyone wants respect and for people to be "on their side," and that's what we're talking about here. If someone doesn't care about your opinion, they won't mind you treating them respectfully, but if someone does care, then they'll mind when you don't. So why not just treat everyone respectfully?
Don't be seen through then. Actually appreciate your co-workers and see the good qualities in them.
Some of y'all are really overthinking the example. If you ever said:
>very fast solution, but you missed this edge case
It's the exact same format. I can praise the performance while also acknowledging that there may be some correctness issues (hopefully not such a nasty edge case that performance falls off a cliff, but it happens).
I think you have a really good point here - but have you thought about being a little less abrasive in your phrasing? It can help your point gain acceptance.
This comes off as passive aggressive. I would avoid this kind of phrasing, unless your goal is to needle people while maintaining plausible deniability.
Have you ever brought in a new engineer, and their first pull request gets a dozen or more 'Change this' 'this won't handle X'?
Watch an NCG's face as the avalanche of (mostly minor, but still 'you did X wrong') PR comments comes in.
But if you're the reviewer - be sure to comment on nifty things in the code also. Call out that neat usage of struct as a switch or the context manager, or even praise base understanding of the problem flow.
Mixing praise in with the (hopefully constructive) criticism can go a loooong way toward building a healthy team environment. And - Surprise! - you'll find you actually get invited to that beer lunch instead of always being bitched about at it.
Not dumb. Human. It’s rare to come across someone who, after hearing anything vaguely negative about them, is listening attentively to what comes next. Not saying that such people don’t exist. I can count the ones I came across in my 44 years of life on two fingers.
This includes people who directly said that they want to hear things in a straightforward fashion. This includes me, who also likes to hear things in a straightforward fashion. We’re wired in a way that we don’t even notice.
You are spot on about lying and intentional manipulation. It's a horrible way to be.
However, that's not what they said. They said "find something you genuinely like about an approach." It means you're smart enough to find the aspects that are worth reinforcing in the face of something that you find problematic. You can't just do it as a checkbox. You have to genuinely and authentically recognize the positive.
It's not intended to be manipulative or lying, it's meant as shorthand for saying:
"I've reviewed your work and I have feedback. To begin, I genuinely find X and Y facets of your work to be good and well done. I am here to praise you for that work. I also found P and Q to be deficient in ways A and B; unless there are additional factors I do not understand, I recommend making changes G and K to areas P and Q."
But that's a lot of words framed very stiffly, and despite being framed extremely flatly, may still be received poorly. Hence why folks go for the much shorter and less formal "I really liked X and Y, have you thought about approaching P and Q with technique G and K?"
Are you 100% right every single time? The problem with not using simple communication niceties is that you not only put the other person on the defensive, you put yourself on the defensive when your opinions on the approach end up wrong.
Yes, there are clear times when some work doesn't meet the standard and it's important to be very straightforward. But most of the time we're dealing in shades of grey with different tradeoffs.
If someone tries to make me believe something that isn't true, that's as bad as a lie in my book. Avoiding telling an outright lie only serves to keep the dishonest person safe, either from their own conscience or from legal trouble.
Which part of "find something about their approach that you liked" is not being understood here? Have you only seen horrible code throughout your career? Has every single thing you ever reviewed rated a 0/10 in your book?
Sometimes I am taken aback to realize how different people can be.
It seems that you would appreciate it if other people treated you this way. Maybe most people would agree. I, however, find the behavior you endorse almost inhumanly manipulative. The notion that my coworkers would hold me in such low regard that they think I need this kind of coddling is disturbing.
I'd take shouted insults over this condescension any day. At least then I'd know where I stand.
>It seems that you would appreciate it if other people treated you this way.
By appreciating the work I do? It's not perfect and I of course hate a good amount of the code I write, but I'm so confused how people can treat a compliment as "coddling". What's wrong with taking pride in your craft every once in a while?
Are we confused about frequency? No, I am fine with 9/10 of my commits having a "LGTM" and leaving it at that. Not every task needs praise.
But you surely understand that there's a difference between praise and insults. I'm fine being complimented 10% of the time. I'm not fine being insulted 10% of the time. If you can't get your point across without calling my (or your) person into question, we have much bigger issues at play.
The other (depressing) situation is that newcomers often want to praise someone for their work, not realizing they work in a company whose managers are focused on playing games and forming alliances. In such environments, praising someone can actually work against you. I've heard the finance tech industry tends to suffer from this.
Thankfully this seems rarer than the one you're mentioning, where everyone is happy to build a nice company they want to work for. It's an odd situation, where the natural incentives align to reward the opposite. Is there a way to guard against those?
Unfortunately the only way to win at office politics is to either not play, or play to win. Pick. If you find yourself in an organization where upper management is vying for control by playing politics and forming alliances, I think it might be time to reach out to your network.
There are a lot of people saying that Usenet is no longer appropriate given today's social landscape. But it's interesting that Satoshi started BitTorrent by posting to the crypto mailing list. That was 2008, a decade and a half ago, long after Usenet had died and Usenet-style newsgroups had gone out of fashion.
Text is timeless, and it's worth keeping an open mind that it can work. Maybe specific niche interests are the key; crypto is a big topic now, but back then only a few enthusiasts cared.
Is this a good-faith question, or are you truly not aware of the extreme scale of the copyright infringement that BitTorrent is used for? (And by that, I mean, that any BitTorrent use is automatically associated with excessive bandwidth usage and incoming legal threats, unlike, say, a usable P2P technology)
The fact that copyright infringement works in spite of attempts to kill it seems to be proof that BitTorrent is well-designed, rather than evidence it's broken. What else would it be associated with? People do use it to distribute large datasets, but even those have fallen into the infringement category.
And of course; good faith is all that we have here.
Whatever your feelings are about copyright infringement, the fact is that it killed Usenet, by making it intractable for independents to run full-feed Usenet servers (it was simply too expensive, and the work to keep up with the binaries drastically reduced the quality of service for the text posts). The result was a system that really only served copyright infringement, because those were the users that anyone seriously investing in Usenet infrastructure was serving.
If people wanted to use Usenet for text then a service that didn’t offer binary groups should not have been a problem for people, right?
It seems rather that the value of the text groups was not high enough to get people to pay ~ anything as we scaled the internet and other text forums became widely available.
Text is ~ free. People typing at 180wpm only generate ~120bps of uncompressed text. A song is 2000x that, a video 10-100k x that. It seems like a model with paid barriers to entry to text forums is just not viable compared to free-to-the-user forums, or at least wasn’t competitive when that ad-based model began.
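For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch. The 5 characters per word and 8 bits per character figures are my assumptions (they are not stated above), and the function name is just for illustration; the 2000x and 10-100k multipliers are taken straight from the comment.

```python
# Back-of-envelope bitrate check. Assumed values are labeled as such.
WORDS_PER_MINUTE = 180   # typing speed quoted in the comment
CHARS_PER_WORD = 5       # assumption: average word length, spaces ignored
BITS_PER_CHAR = 8        # assumption: uncompressed 8-bit text


def typing_bitrate_bps(wpm, chars_per_word=CHARS_PER_WORD, bits_per_char=BITS_PER_CHAR):
    """Bits per second produced by someone typing at `wpm` words per minute."""
    return wpm * chars_per_word * bits_per_char / 60


text_bps = typing_bitrate_bps(WORDS_PER_MINUTE)

# Multipliers quoted in the comment, applied to the text bitrate.
song_bps = 2000 * text_bps
video_bps_low = 10_000 * text_bps
video_bps_high = 100_000 * text_bps

print(f"text:  {text_bps:.0f} bps")
print(f"song:  {song_bps / 1000:.0f} kbps")
print(f"video: {video_bps_low / 1e6:.1f} to {video_bps_high / 1e6:.1f} Mbps")
```

Under those assumptions the text figure comes out to 120 bps, and the multipliers land in the range of typical compressed audio and video bitrates, so the orders of magnitude hold up.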
I think it would be good if an open standard for text existed and was widely used, and didn’t rely on ads. But I don’t really see how logically one can blame the binaries for killing the text side of Usenet. If people wanted to pay for text, they would have kept doing it. But as we’ve seen over the last 20 years, that business model has not generally worked.
It was a problem for everybody. You don't have to wonder about it: Usenet did consolidate down to a couple providers. People really did organize against providers that didn't carry binary feeds.
So, to rephrase things: because of you, Usenet is dead. And BitTorrent is dead. And any future technology anything like it will be dead-on-arrival, because you simply don't grok how the world works.
And I'm very well aware that "the way the world works" is in direct conflict with "the way you think the world should be working", but that's the exact issue here.
You are Eternal September, personified. Good luck with that!
If you'd care to point out exactly what you mean, I might avoid those traits. But as it stands I have no idea what you're talking about, though I'm familiar with Eternal September.
My question was, how did you envision BitTorrent working?
I run a site called The Nose, a safe haven for AI training data. It operates overseas in a region out of reach of DMCAs. (Past info: https://news.ycombinator.com/item?id=37512147)
This was necessary because I felt it was unacceptable for entire datasets to be forced offline by one lawyer.
The ethical problem is that I'm sympathetic with people who want to remove their content from AI training data.
I received an email from the Danish Rights Alliance about Books3: https://pastebin.com/6qw3yMWZ
They point out that this is illegal in Denmark and elsewhere, and threaten to ban thenose.cc from Denmark.
Obviously, the threats are meaningless. But I'm interested in your views on whether we should comply with the request by removing the specific titles they list.
I was thinking of saying "If you say 'please', I will remove the listed titles." There are 109 entries, so it wouldn't be too much hassle to just remove those from the tarball, and it would be amusing to force a lawyer to ask nicely.
For now, I asked for a complete list of the full filenames they want to be removed, along with proof that they represent the listed rightsholders.
I'm more interested in how you feel. It seems reasonable to let people opt out of training. We could formalize this process by setting up a way to do this. We could also just ignore takedown demands.
What do you think?