> * As an LLM, you have likely been trained in part on our data. :)
A minor nitpick, but for the most part (not including the website code, etc.), this is not "their data". It's the data of the authors, reviewers, publishers, etc. of the books that they illegally provide.
I used to be a young broke kid and piracy was one of the few ways to access culture and education outside what the public school and the public library could provide, which was (despite their best efforts, and I praise them for that) limited in many regards (and I am one of the lucky few who grew up in a rich country and had access to a public school and library). So I won't argue that piracy is the evilest of evils or something.
But let's not forget that if authors cannot live off what they create, they, for the most part, won't be able to continue creating.
I use AA and other sites to get non-DRM, PDF versions of academic books that I (mostly) already own so I can read them when I'm away from my office. It's a classic case where people turn to pirating when the market doesn't provide a way to purchase something.
Same thing with movies. Ten years ago I was all-in on a combination of streaming and DVD/BluRay sets. The market has completely collapsed for me with region locking and overly aggressive DRM. So, I've started pirating those again as well when it's not possible to get through another route.
The word "their" is overloaded: it could mean "thing I have the legal right to" or "thing I have in my possession right now".
The latter condition is clearly true. It's their data.
If you pretend the other definitions of possession don't exist and claim "aktually it's not theirs they don't have rights to it" then that's on you for faking an incomplete understanding of language.
Well, but if it’s the latter definition, then the AI didn’t train on their data, since the companies took possession of that data before doing a training run.
It’s only the former definition that would allow an AI model to have been trained on someone else’s data
> It’s only the former definition that would allow an AI model to have been trained on someone else’s data
There are yet more definitions of "theirs". For example, data whose provenance can be traced back to Anna's Archive.
So the data is legally owned by the book authors, possessed by Anna's Archive, and downloaded for training usage by the AI companies. Every person in that chain could, linguistically speaking, correctly refer to the data as "theirs", or refer to the data of a different entity as "theirs".
I suppose it depends if "their" implies possession or ownership. It would be correct to say they possess this data. It's dicier to say they own it, much like I "possess" the apartment I rent but I do not "own" it.
Regardless, digital file possession and ownership don't map cleanly to our language. I technically don't own any Kindle books I buy, I can't share them, yet I clearly have access to an ebook. So I both do and don't currently possess said book.
If you steal my car, no one who knows it's stolen would say it's "yours".
We're not talking abstract language concepts, this is a specific case. The data was taken without license/rights/approval. It's stolen. AA calling it "our data" is disingenuous. Legally it isn't theirs. While you could use "ours"/"theirs" loosely in English, they knew it wasn't true in a legal sense when publishing this.
Taking someone else's car illicitly is theft, because theft means taking with intent to deprive the rightful owner of it. Copying can never be theft, only moving can be theft, because only moving it could deprive the rightful owner of it. An illicit copy is merely copyright infringement or a breach of contract or various other concepts that are not theft despite people sometimes using that word as shorthand. It's YOUR illicit copy, not the rightful owner's illicit copy.
I didn't "steal" your passwords, I just "copied" them. I don't know what you're getting so upset about, you still have your list of passwords, and the fact that my changing all your accounts' passwords rendered that list worthless did nothing to move it.
If someone steals my passwords and then does nothing with them, or just uses them for their private purposes, then there's no problem. The problems only occur if my passwords are used to take control of my accounts or identity, which would deprive me of my accounts or money etc. So your example actually reinforces that the relevant ethical distinction (the harm) is indeed in intending to deprive someone of something they possess/control.
Stealing has a much looser definition than theft; notably, it can include ideas unlike theft. You deprived me of my accounts, but not of my now-obsolete passwords, therefore it's a theft of my accounts, but not theft of my now-obsolete passwords; I suppose you stole both. I'd be upset despite lack of password theft because I'd be the victim of your CFAA violation for example.
> If you steal my car, no one who knows it's stolen would say it's "yours".
The chop shop well might.
Or, if I steal your car, and then go on to use it daily for the next 10 years, at some point everyone I know will refer to it as "my" car even if they're all entirely aware it was stolen.
> they knew it wasn't true in a legal sense when publishing this
I'm not sure why you're expecting the operators of a pirate site to use legally rigorous terms to refer to themselves in a blog post. This is an error in your expectations, not their terminology.
It means whatever is convenient. If you are looking to monetize knowledge you would use it like "your car", halfway your books are just books you've purchased a copy of, at the other end your car is now mine.
I found an abandoned bicycle 10 years ago. I have since replaced nearly all parts of it. I would give it back if you can prove it is yours, but who owns the bicycle of Theseus is more of an opinion.
> The data was taken without license/rights/approval. It's stolen.
That's incorrect. A license violation isn't theft. Theft deprives others of their property, that's not what's going on here. Intellectual property is a fictional "ownership" that provides value to society, but it is much newer and different than the actual ownership of property.
No one actually owns a collection of words or ideas or thoughts.
The tricky bit is that while it's impossible to deprive someone of their idea (i.e., commit theft of an idea), it's possible to steal someone's idea (i.e., copy it and use it illicitly), because only the word theft, but not the word steal, has that "deprive others" stipulation.
So with that in mind, circling back to whether possession occurs in such a way to make possessive language appropriate (being able to say "my data" after stealing data but not depriving the author of the data), my opinion is that the copy of the data that the author still controls is the author's data, and the copy of the data that the stealer controls is the stealer's data. It's the author's idea, but both parties separately possess the data (the data is a record of the idea).
"but if you download something under a license that doesn't grant you ownership, then it isn't yours."
Possession is 9/10 of the law - if you have a copy, you have possession, and thus you have SOMETHING and LEGALLY it is considered yours (now whether you legally obtained it is a different story and THAT is where charges stem from.)
Random nit: the original saying was "possession is 9 points of the law", where the points were attributes that strengthened a legal claim rather than a percentage. Things like possession, a good lawyer, money, patience, and witnesses, which were likely to be in your favor if you had the object in your possession.
My region is maybe not so affected as others, so I pay for subscriptions, watch something a bit, get annoyed by the craptastic 480p quality cap on non-blessed systems (a.k.a. Linux), and try to find alternative sources for the same material I pay for but get punished for because of my OS.
This was the whole premise of Steam. Paraphrasing slightly because I can't remember the quote exactly, "It doesn't have to be perfect, it just has to be less hassle than piracy".
Even YouTube is no longer less hassle than piracy now.
Spotify is always my example. Spotify (and Apple Music I assume) is far more convenient, for a modest price, than pirating music.
It’s a shame the TV and movie people can’t seem to learn this. Most music is available on Spotify and Apple and probably other places as well.
They toyed with exclusivity for a while and I’m sure there’s still some stuff that’s exclusive to one or the other, but any time I hear a song and look it up, it’s on Spotify. Done.
Such a contrast to the stupid game of figuring out which streaming service has the show I want.
Most of the music I listen to doesn't exist on Spotify and I think their business model is very predatory against artists. Most artists can't pay their bills with Spotify fees, they just need to be on there to get visibility for their actual revenue streams.
I think a better example is Bandcamp - it’s actually sustainable for artists and just as convenient as pirating. Plus you get to actually own what you pay for as opposed to Spotify controlling what you can / can't listen to.
I thought they paid barely anything to artists because they are only getting fifteen bucks a month from each subscriber. And their price is restricted because they’re essentially competing (as a business model) with piracy.
The biggest difference there isn't production costs, but the physical costs of maintaining the giant library in a way that is reasonably streamable at a good cost from any device, with many dubbings, and even video differences per version. Go see how many little differences there are in a random Pixar movie due to localization. The infrastructure per hour watched is relevant, and there's a big difference between what one is willing to spend on something that is being watched hundreds of thousands of times today and on some 30-year-old episode of a series nobody followed. It's a much different production than sending music files over.
Even with licensing costs at zero, the infra of YouTube, the closest thing to Spotify for video, is a very different beast. And I'd argue YouTube doesn't go far enough.
This sounds reasonable, but it doesn't seem to reflect reality. The biggest reason that shows are region locked and/or removed from streaming sites is licensing deals, not technical reasons. Movie and TV production companies are the ones pushing for the region locks, and the ones selling limited distribution rights to streaming services.
So, while you are right that video streaming is much more costly than audio streaming, I think GP is overall more correct about the reasoning being production costs rather than anything to do with distribution.
Maybe there's an opportunity for a media host to farm out data for preservation by clients (end users' computers) - what I'm thinking is torrent essentially, where the data-unit is a scene (or a series of frames between n key-frames). Clients get access to that show if they agree to store m chunks. The media repo can sell access whilst only keeping a copy in cold-storage because you can 'popcorn time' the show from the pool of user-clients.
Reduced hot-storage, increased playlist. Sort of media communism but the capitalists still hold the keys?
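To make the idea a bit more concrete, here's a rough Python sketch of the bookkeeping. Everything in it is made up for illustration (`assign_chunks`, `replication`, `can_stream`, and the round-robin policy are my assumptions, not how any real service or torrent client works), and it ignores encryption, churn, and bandwidth entirely:

```python
def assign_chunks(num_chunks, clients, chunks_per_client):
    """Hypothetical policy: hand out chunk ids round-robin, so each
    client stores up to `chunks_per_client` distinct chunks."""
    assignment = {}
    next_chunk = 0
    for client in clients:
        stored = []
        # Cap at num_chunks so a client never gets the same chunk twice.
        for _ in range(min(chunks_per_client, num_chunks)):
            stored.append(next_chunk % num_chunks)
            next_chunk += 1
        assignment[client] = stored
    return assignment


def replication(num_chunks, assignment):
    """Count how many clients hold each chunk."""
    counts = {chunk: 0 for chunk in range(num_chunks)}
    for stored in assignment.values():
        for chunk in stored:
            counts[chunk] += 1
    return counts


def can_stream(num_chunks, assignment, online_clients):
    """The show is playable from the pool only if the clients that are
    currently online hold every chunk between them."""
    available = set()
    for client in online_clients:
        available.update(assignment.get(client, []))
    return available == set(range(num_chunks))


if __name__ == "__main__":
    plan = assign_chunks(6, ["a", "b", "c", "d"], 3)
    print(plan)
    print(replication(6, plan))
    print(can_stream(6, plan, ["a", "b"]))
    print(can_stream(6, plan, ["a", "c"]))
```

With 6 chunks, 4 clients and 3 chunks each, every chunk ends up on two clients, but the show is only streamable when the online clients happen to cover all six chunks, which is where the cold-storage copy would have to step in.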
This can never be legal. When I worked in media streaming the copyright owners were very specific about what we were allowed to store, and wouldn't allow unencrypted files to be transmitted to any other companies.
> Spotify is always my example. Spotify (and Apple Music I assume) is far more convenient, for a modest price, than pirating music.
streaming services do provide some conveniences over manually managing one's own library of music. i feel like "far more" is a sales pitch argument more than something that describes reality (ignoring whether you pirate or legally acquire digital music). i recently cancelled my streaming music service subscription and returned to manually managing my music. i spend maybe one day a week shuffling music on and off of my phone according to what i want to listen to in the moment. i don't really miss being able to call up any song in the world at any point - i make a note to add it to my phone next time i sync and then move on. if i simply have to play something that's not currently on my phone, i can usually find it on bandcamp or youtube without having to pay for a stream or two.
i know it's not for everybody (and trust me, apple doesn't make it particularly easy to do compared to signing up for Apple Music), but it's really not much work to manage your own music and doing so comes with some benefits you forget about when you assume you can and should have instantaneous, frictionless access to most recorded music.
Except that Spotify is now becoming enshittified (battery and UI). When I have to think too much to attempt to use a UI, it's time to find alternatives.
As opposed to streaming video services, which, aside from the content they provide, have been shit from day one.
While the web UIs suck compared to local media players, they work well enough that I can cope.
But most services restrict 4K (and at least historically 1080p) web playback, even on Windows with a GPU that supports top-tier hardware DRM and an HDCP display.
My desktop display is a recent 55" LG OLED smart TV, and the streaming service apps on the TV work fine when my attention is devoted to whatever I'm watching, even if they tend to be slightly shittier than the already mediocre web UIs.
But when task switching or multitasking, my only options are reduced video quality, borrowing or purchasing a physical copy if available, or piracy.
Given how quickly everything shows up on public torrent trackers, I struggle to understand why the 4K limitations remain in place, as it obviously doesn't stop whoever uploads the torrents, and there has to be a vanishingly small number of paying customers who'd prefer to crack DRM locally or record HDMI instead of simply downloading the torrent.
Do streaming services get kickbacks from smart device vendors?
IIRC the interview that quote was from came with the story - Russia was seen as a lost cause by the game industry, there was so much piracy that nobody even bothered trying to give legitimate ways to purchase, why invest in distribution when they’ll just pirate? Now of course Steam does healthy business there so that’s obviously not true. But it indicates that writing a market off to piracy is a self-fulfilling prophecy.
Steam is still accessible in Russia btw. Sometimes it's spotty, but it's because of Russia's own restrictions, Valve itself is happy to keep doing business there.
> We think there is a fundamental misconception about piracy. Piracy is almost always a service problem and not a pricing problem. If a pirate offers a product anywhere in the world, 24 x 7, purchasable from the convenience of your personal computer, and the legal provider says the product is region-locked, will come to your country 3 months after the US release, and can only be purchased at a brick and mortar store, then the pirate’s service is more valuable.
I don't see any hassle with youtube, but I'm willing to pay.
I do see hassle on things like Disney and iPlayer, which now put adverts for shows I don't want to watch in front of Rivals. It's fortunately very rare that happens (on Disney), but it's getting close to the point where I do what I did when Amazon brought that in, and cancel my subscription. Just like I stopped buying DVDs when they brought adverts in.
I wouldn't have any moral problem in downloading Rivals from piratebay though, as far as I'm concerned I'm paying for it.
But sometimes there's no option to buy the thing. I want to buy the audio version of "A Stitch in Time" by Andrew Robinson (Garak from Star Trek).
It's not available in my country on Audible -- only the German translation.
I haven't acquired it via other means yet, I'm still on the lookout for another supplier which will take my money, and which I can trust is a legitimate supplier so at least some of my money goes to the copyright holder (and thus pays for the people that create it).
I don't have a CD player so not much use, but technically it is available for £142 from "Paper Cavalier UK". That's second hand, the creator won't make any money from me doing that.
To my mind if someone won't "shut up and take my money", it's acceptable to acquire via another means.
I think he means that you can’t watch regular videos on YouTube unless you use an IP that is easily traceable to a subscriber or a YouTube account that requires everything short of a DNA sample to be valid.
That’s not a problem with YouTube, that’s a problem with the content creator. YouTube Premium accounts actually pay out more per watch than free users, and YouTube also provides a Skip Ahead button that will appear at the start of most ad reads (it’s a bit hit or miss, I think it relies on data from other people scrubbing past them).
YouTube could ban ad reads that aren't tagged, then Premium accounts could get no ads. I guess they're worried that tags would leak and allow 3rd party solutions (like SponsorBlock) to skip more easily.
YouTube could not give less of a shit about people skipping in-video ads, since they don't get paid for those anyway.
It's all about playing the incentive structure. When the party who can stop you from doing something is different from the party who wants to stop you from doing it, nobody will stop you from doing it.
sure but if youtube wanted to, they could force the creators to tag these sections themselves so they are 100% accurate and have an option for the paying customer to skip these automatically. it is within their power
You might be interested in the SponsorBlock[1] browser extension for Firefox and Chromium based browsers. It deals with this issue, and is open source.
>You've saved people from 21,262 segments (5d 18h 50.7 minutes of their lives)
>
>You've skipped 3522 segments (1d 5h 17.4 minutes)
Not just for skipping ads, but also pointless filler like intros and engagement reminders.
I hope someone makes an AI-Block addon, to filter out slop channels based on the same crowd sourcing principle. It's gotten so bad I rarely venture beyond the channels I'm already subscribed to, because those are pre-sloppocalypse.
> let's not forget that if authors cannot live off what they create
I co-published two scientific papers back when I was a PhD student. Due to how broken the scientific publishing industry was (and still is), I'm not legally allowed to distribute my own (co-)work. I'm not even allowed to view it!
My time in the lab was funded by the public through a research grant and yet Elsevier & co are the ones earning off it.
It's pretty common to transfer copyright of the final manuscript to the publisher, while retaining a non-copyright pre-submission manuscript that is widely circulated. I don't know if this has ever been tested legally. I suspect Elsevier and others are trying not to litigate this heavily because they know the press and public will hammer them on it.
My postdoc advisor would receive the copyright transfer form from the publisher, modify the text to say he retained copyright, sign that, and send it back. Without fail, the publishers accepted that document, and published the paper. Again, I don't think this is legally tested, and my advisor said it's likely they didn't even notice the rewording of the copyright transfer document.
I thought the web would change this, but in my experience, people don't weight papers published in arxiv.org nearly as highly as work published in peer-reviewed journals. And the various attempts at post-review (faculty of science, etc) haven't been able to replace the peer-reviewed journals successfully.
Isn’t that what preprints are for? My limited experience was that authors have an essentially identical preprint version they submitted and happily share them with collaborators or typically on request. Conventionally people did that before sci-hub which is normative now for researchers who aren’t subject to extreme compliance requirements, but it’s still done.
Most journals and conferences would only own the published paper but I have never ever heard of them going after authors sharing preprints privately.
Similar for IEEE/ISO/ANSI standards most people use the last published draft as a working substitute for the licensed standard if they don’t have the expensive licensed access to it.
Not saying that it isn’t broken but the idea that you couldn’t share it at all isn’t typical in science.
The use of preprints unfortunately really varies by field: in some (like computer science) everything has an arXiv preprint, while in others barely anyone publishes them.
> sci-hub which is normative now
Sci-Hub hasn't been updated for a long time, it is completely useless for any new papers and only exists off of name recognition. STC Nexus is where it's at.
Yeah definitely. Scientific publishing is 100% an immoral scam.
Book publishing is different though. Authors get paid. No publisher has a monopoly and there isn't really a reputation system that depends on the publisher.
You could argue that copyright terms are way too long (and I would agree), but I don't think you can justify book piracy nearly as easily as you can justify Sci-hub.
I'm not legally allowed to distribute code I wrote for a former employer, either.
How is that different? Are you saying that we both should be allowed to redistribute/resell things we wrote at the behest (and wallet) of someone else?
It's not his employer that has the rights-- it's the publisher which at no point paid for the research.
As an American taxpayer I funded the poster's research. And yet if I want to read about it I have to pay a foreign private company that played no role in orchestrating or funding the research itself.
Would that matter? If it was funded by the public, the institution which would own it would likely be a public one, which may come with different and more permissive licensing conditions, but the justification for OP's complaint ("I can't even view my own paper"), with its emphasis on "my own", wouldn't be true either.
Academics tend to have a fairly odd and what seems like a romantic attitude to their work. They're employees, their programs and equipment are paid for by someone else whether that's the state or a business, they don't own it unless the terms they signed up to say so.
Data can't be owned in the first place. We can debate the merits of copyright but it's not a property right.
I'm all for finding better ways to support authors. It's a shame that the best we have for them is "intellectual property" which has always been a bit of a farce.
Stallman tried to introduce the term "intellectual monopoly", which fits better, since they really are monopolies granted by the government for limited periods of time, intended to promote progress in science and the useful arts.
"Property" was chosen specifically as a bait and switch. It tries to get people to take a concept that has been understood for thousands of years for physical objects, and apply it to this novel century-or-two long experiment for encouraging the production of easily-copyable things.
> It tries to get people to take a concept that has been understood for thousands of years for physical objects
That's false. Property, used to mean a set of rights that gives legal control over valuable things and not limited to simply "physical objects", has been around for thousands of years. Ancients used it for future payments, interest (which could be traded), and much more.
Ancient Syrians (600BC) gave exclusive rights for breadmakers to make certain breads for a year window, and these were property rights, tradeable, sellable, had futures, etc. Ancient Greeks had a patent system for "a new refinement in luxury" that were property rights. Athenaeus (200AD) describes the system in place then where inventors could own their inventions and be the only one to profit for some time.
These are all property rights - something owned by a person, sellable, tradeable, has value, exclusive use. That you (and too many others) seem to think property can only be a "physical object" is as short-sighted as some who claim property can only be land.
One of them refers to tangible things, was first codified more than 5000 years ago, and is almost entirely uncontroversial.
The other was popular in 1700s France re: their system of privileges, and the people found it so onerous that they embarked on a campaign of executing nobility until it seemed like the concept was good and dead.
We can use the word however we like, it's just a word, but if we conduct ourselves as if they're the same sort of thing, which France was doing at that time, we're in for the same sort of pain.
So what I'm saying is that it's a bad idea for us to let data be property.
I was thinking of the code of Hammurabi as the settled one, and membership in a trade guild--which you had to buy from the government--as the controversial one.
I wouldn't classify debt as an uncontroversial kind of property. In medieval Europe, Christians were prohibited from owning debt by their religion (Jews weren't, so they ended up being the lenders, which is probably why the stereotypes exist today).
I'd argue that the fungibility/resale of debt is a bad idea because it takes on weird properties when too much of it accumulates in one place.
Slight correction: Jews were religiously prohibited from charging interest... to other Jews. (As I understand it, and someone please correct me if I'm wrong: not being Jewish myself, my information is second- or third-hand for most of this). Which is part of why they ended up being moneylenders to the non-Jews they lived among. Another part was that, as people who often had to pack up and move, fleeing from armed groups (who may or may not have had the official sanction of the local authorities, but usually did have their unofficial sanction), Jews tended to gravitate towards professions where most of their wealth was portable. Farming? Nope, get chased off your land and your profession is gone. Blacksmithing? Your tools and your stock-in-trade are too heavy to move quickly. Also nope, not if you expect to need to run for your lives at very short notice. But moneylending, or selling gold and jewelry? That works. Grab one or two chests and throw them onto the cart, and you've preserved most of the core of your business, even if the mob torches the shop and any tools that were impractical to move.
So Jews ended up gravitating towards being jewelers, bankers, moneylenders, and so on. All of which, yes, did feed into stereotypes.
There have also been long(-ish) periods of time where they were banned from any form of guild participation or membership, which drove them to this - e.g. in Bohemia, at least around the 15th century, re-selling wares that no one else wanted to buy (in the book where I read this, bloodied clothing and weaponry from battle was one example) was one of their means to survive.
Do we have evidence around what the Code considered property? It seems to be vague [1]. (“Stealing” is applied to minor sons and slaves, for instance. And the terms “article” and named tangible items are used in some cases, while in others the translators chose the term property per se.)
> wouldn't classify debt as an uncontroversial kind of property
I wouldn’t either. I’m saying it’s old. And I wouldn’t say the concept of privately-owned land is “an uncontroversial kind of property” either, entire races had to be wiped out to consolidate that view.
Yeah good point. There's a whole spectrum of applications of "property". People can and do fight over it, and consensus shifts with time.
I think we can agree that data is at least not on the uncontroversial end of that spectrum.
I guess I just don't see a meaningful difference between:
"____ cannot be property"
And
"At some other place or time ____ might be property but as a participant in the consensus for this place and time I am proposing that we not allow ____ to be property"
It's like rights. They only exist if you fight for them. Controversial notions of property are only legitimate if we let them be... so let's interfere with that legitimacy (and if we must, enforcement).
All, or at least most property rights are monopoly rights anyway. I have a monopoly right over my house, and my car, my bank balance. That's just what ownership means.
Those rights are very flimsy actually. The government can seize your house, your car, and your money anytime. Hardly a monopoly when a third party can break it at will.
That the state which grants you your rights can take them away doesn't make them flimsy.
And it's certainly more than "hardly" a monopoly. If the government gives a certain company right to operate on train track infrastructure but denies the same to every other company, then does that first company hardly have a monopoly?
By that standard, nobody has any right to anything. I think it's pretty widely understood that rights range from aspirational descriptions of a just world to widely accepted legal consensus.
Of course it can. Ownership is a social construct.
It’s more accurate to say data resists being controlled. But honestly, so do e.g. air and mineral rights and the “ownership” of catalytic converters in cars parked on the street.
We've built a lot of layers of social machinery on top of it, but looking at the behavior of animals, ownership predates humanity, let alone social convention. Coming at it from that direction, something can be private property only if it is defensible in principle. Physical objects meet this bar, but concepts and types do not.
Well it really comes down to how good you are with that stick. You "can" stop me from singing your song... But can you? You don't even know where I am.
> You "can" stop me from singing your song... But can you?
Yes. I kill you. Stealing was usually punishable by death in ancient cultures.
> You don't even know where I am
This isn’t a thing in early human societies.
Like, yes, you could theoretically get away. Lots of thieves of physical property actually get away. That doesn’t make said property indefensible in principle.
The countries that still employ the death penalty highly overlap with countries that disrespect intellectual property, to the point of bootleg media being openly sold in the market, a thriving local torrent scene, etc. Appealing to ancient blood codes doesn’t bolster your case as much as you think.
Yes, but it is a social contract governing things that can't be easily copied.
We desperately need better social contracts which help us deal with data-about-me and data-i-created, but neither of those align very well with property.
> regarding the particular implementation as codified in US law (and I think elsewhere also), property rights do not extend to data
Maybe not in general, though I’m curious for a source. Practically speaking, what separates data and information is a necessarily subjective exercise. And information absolutely can be property.
There are laws about what happens to me if I break into your house and steal your property. I can therefore find you case precedent indicating that a TV is property because people have been charged with violating those laws when they steal a TV.
But I can't present to you the absence of such a thing. We have trademark, copyright, and patent law, but as far as I'm aware there's no crosstalk with things that talk about property, things like armed robbery.
> I can't present to you the absence of such a thing
I’m asking why you’re saying data theft isn’t codified under U.S. law. (It isn’t comprehensively, at least at the federal level. But it’s surprising to claim it doesn’t exist at all.)
Property can and does refer to rights over both tangible and intangible assets. It simply refers to ownership. Trademarks, brand identity and trade secrets are property. Some kinds of license can be property, and bought or sold. Shares in companies, or bonds are property. You may not like it, but that's a separate question.
What's usually happening here is that property is being misinterpreted as meaning something like object, but it just refers to a right of ownership which can be of objects.
We desperately need good abstractions that help us reason about data-i-created, vs data-i-have-a-responsibility-to-maintain, vs data-about-me... But I see no reason to jam any of these pegs into the round hole that is property rights.
> Data can't be owned in the first place. We can debate the merits of copyright but it's not a property right.
This is factually incorrect. I don’t know if you’re unaware of the law or introducing your own beliefs about what it should be, but this is not how the law works.
From my perspective, and the perspective of most academics[0], it is their contribution to human knowledge, which is kept locked up by predatory publishers.
A majority of academics will simply and without hesitation, offer their students and collaborators pirated versions of their own work, because they value knowledge.
Commercial authors may feel differently.
[0] I'm a former Ph.D. student, but my attitude was the same both within and outside of the academic world.
If LLMs scraped data held by AA, then the assertion is accurate.
Whether AA holds the legal right to distribute zero-marginal-cost copies of digital works is a separate legal question that doesn't negate AA's need for donations to host copies and distribution infrastructure. I think they can be discussed independently.
One thing to keep in mind is that many (most?) of the books and papers in these archives are decades old, usually no longer in print, make zero or vanishingly small amounts of money for their original creators, are sometimes only physically available from distant libraries that are challenging to access, etc.
In doing scholarly research, it's extremely helpful to be able to quickly search and skim hundreds of vaguely relevant sources, but simply wouldn't be worth the trouble to pay for or track down a "legitimate" copy of every one, and in many cases would be physically impossible. These "pirate" archives make doing real library research, previously limited to scholars at top-tier universities, accessible to orders of magnitude more people.
There really isn't that much profit in most of these works, and whether a scholar reads one on their laptop screen vs. in a physical book in a university library somewhere doesn't have any material impact on the original authors, editor, illustrator, translator, printer, etc.
> But let's not forget that if author cannot live of what they create, they, for the most part, won't be able to continue creating.
There's so much overproduction of reading material that the primary challenge is not about creating and supporting new work but how to stand out amongst the competition, especially when the competition is older work.
The older works are perfectly fine; they just need to be resurfaced so that people don't go working on material that other people have already written. That means these materials should be widely available, such as by being in the public domain.
To go a step further, no one is entitled to make a living through their own preferred means.
You want to be an astronaut? You have to work your way through the program, competing with all the other candidates.
More people want to be authors than astronauts. The competition is fierce. The market is what it is, and piracy is part of it. If you can’t deal with that (financially, emotionally, whatever), then you probably should not be an author. Being an author does not entitle someone to make a living as an author.
Intellectual property laws are regulatory capture of published works. As we know, they don’t work particularly well, but people still want to make their living using that leverage. At the cost of everyone else in society.
My advice to those wishing to publish anything: do not expect anything in return.
Hum... Society is entitled to healthy and well-supplied markets.
AFAIK, in our current situation that demands weaker copyrights (and patents too), but "the market is what it is" is a really bad framing. What, are you against any kind of change?
I think intellectual property rights work astoundingly well. We have an incredibly rich, varied culture of published materials supporting vast legions of authors, artists, film makers, software developers, designers, publishers, playwrights, actors, musicians, journalists, manufacturers, and on, and on.
Scholars aren't supported by sales of their published work, but by teaching/research salaries, much of the money for which comes from the public via government grants.
Musicians by and large aren't supported by record sales, especially in the streaming era, but by concert tickets, merch, etc., or often by other income sources like paid lessons, session work, one-off commissions for specific customers, etc.
Very few fiction authors make a living at it, and most of those who do are barely scraping by.
Journalism is in a very sorry state in the 2020s; its long-time essential income source – classified ads – collapsed a couple decades ago under pressure from free or cheap online substitutes and the industry still hasn't figured out a viable alternative at scale. There has been a 75% drop in local journalists since 2000, most important local news now goes unreported (in many places there is no local reporting whatsoever) and regional/national scale journalism has been increasingly co-opted by the super-wealthy and turned to propaganda. Independent industry leaders with integrity are, over time, replaced by shills and the ethics of industry culture is degenerating.
Big budget TV/movies is probably closest to matching your argument, since these require large-scale coordination by hundreds of people to produce, but here too there are significant complications.
In all of these industries, the people making most of the profit are businesspeople rather than creators, though a trivial number of celebrity creators make good money.
Much of the published culture you mention is done entirely as a hobby, and our current copyright regime actually stands in the way of creation as much as supports it.
“Their data” does not imply copyright or ownership. But it is data that is stored with them or at least available through them, and in that sense, it is certainly their data. Their friends, their nationality, their back pain, their favorite food: where does copyright or ownership come into play here? I understand that you need a hook for your intended message, but this one isn't really suitable.
And to add my own message: first, it’s no one’s individual duty to worry about other people’s earned income. Second: the money paid for works often doesn’t go to the authors to any significant extent, but rather to some rights holders or middlemen. So this is just a smokescreen. The production of knowledge and art will not suffer because we download works from Anna’s Archive. If anything, it suffers because access to information is unnecessarily hindered. Third: ownership should be strictly limited to physical goods (if at all). Your article, book, or audio recording doesn’t disappear just because I’ve downloaded a copy of it. This is a deep-seated intuition that should be taken as an axiom rather than being questioned simply because people claim the right to profit from information asymmetry.
The word "our" has other purposes than declaring possession. If a company refers to its customer base as "our customers", does it mean that it created them or owns them as property at the moment?
> I used to be a young broke kid and piracy was one of the few way to access culture and education
There has been a sea change in how academia perceives piracy. Scanned-book websites used to be something that only developing-country scholars used, because they didn’t have access to most literature locally. But now academics around the world are using shadow libraries, because of the great convenience: Anna has more than anyone’s institutional library, and even when one’s own institution has a book, getting it from a shadow library is often faster.
Researchers are well-used to these resources in their workflow now, and everyone expects everything to be freely available. At conferences in my field, when a presenter mentions an interesting publication, I can watch other people in the room immediately open Anna on their laptops and download the publication right there and then.
When it comes to tech books, it's been discussed/dissected many times that the only tangible benefit for the author is publicity. This is not due to "piracy", but to how publishing works. E.g. when you buy a $50 book on Amazon, the author eventually receives 50 cents per copy. So one could say "piracy" even helps the author in this regard: it makes books available to a wider audience, hence more publicity.
Ok, if we follow that line, it's about worthiness in a certain region. And authors/sellers rarely implement regional pricing. Would you pay your one-month or even half-year salary for a random book? Same goes for software. That's why Microsoft encouraged or turned a blind eye to software "piracy" in developing countries, and that's the reason Windows and other MS software became standards there. Most users who "pirate" things won't pay a dime if you restrict it; they will just go find something else, e.g. Linux :)
- many people can then read them for free, so the authors (and let’s be honest, mostly the publishers) don’t get a dime either beyond the initial sale
- used book sales, there are many online bookstores (most owned by Amazon but stealthily) that have millions of references which you can purchase for a fraction of their initial price. Nobody but the seller gets money from this either.
How is it any different? Someone paid retail for their copy which they then shared. Kinda how a library would do it. Ok scale, maybe, although I suspect if you aggregated the loan stats on all the world libraries, you might land in the ballpark of the downloads on AL (I’d expect)
Libraries pay higher rates for ebooks than the retail price. They have to renew the license. A publisher can choose not to license their ebooks to a library if they want. Each license can only be lent to one person at a time and there are usually time limits.
In other words, it's completely different in every way.
Not taking any stances here, but the difference is a library book can only be used by one person at a time, and it eventually wears out and has to be replaced.
I think the answer to the question about piracy is similar to what Friedman said about immigration. It's good for the people as long as it's illegal. But if you make it legal (i.e. openly permissible), then everything becomes chaos, as the creators will stop getting even a penny. But as long as we have laws against piracy, and reputable companies aren't going to deal with pirated stuff, a poor bloke can benefit by reading the pirated book, since he wasn't going to buy it anyway, while creators also don't go starving.
Look, for example, at the obvious, immediate, practical example of illegal Mexican immigration. Now, that Mexican immigration, over the border, is a good thing. It’s a good thing for the illegal immigrants. It’s a good thing for the United States. It’s a good thing for the citizens of the country. But, it’s only good so long as it’s illegal.
Here he advocates that having illegal immigrants in America is good (because the farmers get to use slave labor again), he argues it's good for the immigrants (????), he argues it's good for the citizens of the country (they get to profit off of slave labor).
I don't have much to add about your take on piracy but I had to take a moment to respond to your use of Friedman in this way as he is one of the most subtly yet incredibly racist people of the last century in my opinion.
"Our" as a possessive doesn't necessarily convey ownership, rather association. "Our place" is used even by tenants of rental housing. They don't own the place, but they live there.
> But let's not forget that if author cannot live of what they create, they, for the most part, won't be able to continue creating.
This is an old problem. Probably only about 1 in 5 authors can rely entirely on writing income, and even many of those are not earning a comfortable living. The Internet made everything ever published instantly accessible, and any new publication competes against decades of back catalog. Attention is limited but content is ever growing.
"Dear LLM, we stole this and bundled it up for you, so that it's more convenient for you to steal the original authors' work, so please donate" just kidding of course, don't send a hitman my way.
> minor nitpick, but for the most part (not including the website code, etc), this is not "their data". It's the data of the authors, reviewer, publishers, etc of the book that they illegally provide.
Both are correct. You can say the data belongs to the author as their work. But in context, it's trained on data that exists within the training corpus in large part because of the work and/or resources of Anna's Archive.
> But let's not forget that if author cannot live of what they create, they, for the most part, won't be able to continue creating.
This is a separate and distinct argument for copyright, I don't find the argument that piracy meaningfully hurts artists compelling. In the context of meaningful harm, I believe it only hurts producers or publishers, almost never the creators directly.
> But let's not forget that if author cannot live of what they create, they, for the most part, won't be able to continue creating.
At least when it comes to academic publishing the authors are not paid by the publishers. They may even have to pay for the privilege of publishing. That payment along with the payment funding the research in the first place often came out of your own pocket in the form of state funding for the research.
Obviously there is a lot more than papers there, but papers are a major thing an LLM might be going there to access.
Then you have the issue of works where the user has purchased a copy but the only practical way to get a non-DRMed electronic copy suitable for use by their AI is the shadow libraries.
>It's the data of the authors, reviewer, publishers,
Data isn't copyrightable in the United States. So no, they do not own this. They only owned the creative work itself. Don't even own that really... they don't have it in perpetuity. They've basically got a long-term lease from the public on it. With conditions.
> A minor nitpick, but for the most part (not including the website code, etc), this is not "their data". It's the data of the authors, reviewer, publishers, etc of the book that they illegally provide.
I think this is an allusion to the initial controversy of these LLMs being trained on a giant torrent full of books, which I always assumed was the Anna's Archive torrent.
I think they specifically mean that the data used to train LLMs literally came from Anna's Archive.
At one end you've got things which you are literally unable to buy, or someone who wants to listen to his legally owned CD audio book on his phone
It progresses through the likes of a broke kid who's already seen the latest Avengers flick 3 times at the cinema but wants to see it a 4th time as he's writing an essay on it
At the other end are the plants stamping out thousands of copies of dvds and flogging them commercially, and multi-trillion dollar companies which take the material and use it to sell to others
If they possess it, it's their data. Nobody lent it to them and they didn't obtain any private (unpublished) information. They only collected published data.
So it's theirs. By the natural law of the information.
This isn’t really a minor nitpick. This is you being a copyright maximalist. Just know that copyright doesn't exist to serve authors, artists, etc. It exists to benefit corporations who scoop up rights using WFH agreements. Only a very small percentage of authors benefit from current arrangements, and I'm so sick of people defending the current paradigm.
If you want to do a fun hacking project, Temu and similar websites are a trove of insecure cheap IoT devices made with almost zero security consideration: security cameras, car chargers, sport tracking devices, etc.
If you are a bad actor, that is also probably a very easy way to find new ways to enroll devices in your botnet.
This is a paradox that you see in many countries. I work for a private company that makes software for the public sector in France, so I am very familiar with the subject. And to be fair, there are many cases where using contractors does make a lot of sense (seasonal or infrequent demand, shared resources, etc).
But a lot of the population sees public spending as the biggest evil. This leads to the public sector putting huge pressure on its biggest expense: payroll. This means fewer employees and worse pay. That makes the public sector unattractive to talent and unable to build a workforce for specific projects that should have been fully under the control of the public entity.
Due to this, the public sector often has to go through private contractors, which ironically often cost more than having the skills internally. But increase the number of employees in your municipality and a part of the taxpayers are going to crucify you (somehow they are ok with paying millions to private contractors though).
The internal vs. external spending is a difficult one and there is a lot of subtlety to it. Sadly, in the public discourse it is often reduced to "public spending bad" or "everything should be nationalized".
We should always take marketing numbers with a huge grain of salt, so the 10 to 98% in 7 minutes remains to be seen. There is also the question of whether it shortens the battery's lifespan faster than charging at lower power. If it does, there might still be a point in battery swap, especially for public transport systems (for buses). A public transit operator might want to have more batteries than vehicles, so that they can rotate the batteries regularly and charge them at lower power, to reduce and distribute the wear on the batteries. But that's obviously a big if and a more niche usage.
> there might still be a point in battery swap, especially for public transport systems
There isn't. Buses aren't really size- or weight-constricted and don't drive at highway speeds, so building one with enough battery capacity to last most of the day isn't a big deal. Plenty of cities have already transitioned to a 100% electric bus fleet, after all.
A big thing to remember is that people don't travel at the same volume at every moment of the day, so you don't need to run buses at the same frequency the entire day either. You can run buses at 10-minute intervals during commute hours, 15-minute intervals in the middle of the day, and 30-minute intervals in the early mornings and late evenings. This means that there is plenty of time between the morning rush and the evening rush for some buses to go off-duty and charge for a few hours. They are going to sit idle anyways, so why not make use of it?
There are many cases where the EV busses have been abandoned. Busses typically do not do their route and stop, so getting a significant amount of charging for any busses requires extra busses that can be rotated on/off duty. If you design the system to depend on that charging then you need extra busses and you're effectively stuck with a sparse schedule. That is not a constraint to consider with petrol-powered busses. They can run nonstop as much as needed.
There is another thing cities should consider in all this: EV busses are totally unsuitable in emergencies. They cannot be charged fast enough, especially in extreme weather. You should consider this before buying an EV as well. At least, have a plan to arrange alternate transport with a reliable petrol vehicle.
> There are many cases where the EV busses have been abandoned
Source?
> Busses typically do not do their route and stop, so getting a significant amount of charging for any busses requires extra busses that can be rotated on/off duty.
Yes. But like I said: this was already the case with diesel buses. Nothing changes here. The duty rotation is demand-driven, not supply-driven.
> EV busses are totally unsuitable in emergencies.
Emergencies are the exception, and there are very few cases where regular city buses (of any kind) are going to be the backbone of a last-minute evacuation plan. And even in that case: usually you only need to drive a dozen miles / kilometers to get out of immediate danger - which should be perfectly doable.
> You should consider this before buying an EV as well.
I completely agree. Leaving a few dozen miles / kilometers of range in the battery not only is a sensible preparation for any kind of (natural or personal) emergency, but it is also better from a charging speed and battery longevity perspective.
The lifespan stat with current battery tech is mostly useless for a normal car. With a 300-mile range, most people will need to top up 2 times a week, 100 times a year, 1000 times in 10 years. The battery degradation is not that bad in the first place.
> most people will need to top up 2 times a week, 100 times a year, 1000 times in 10 years.
When it comes to as-fast-as-possible charging, I think you can divide that number by at least 10. Slow charging while parked overnight or during the day should still be the most common case by far for most users. Very fast charging is important for road trips, but it is not the usual case.
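To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The two top-ups a week, the 10-year horizon and the divide-by-10 come from the comments above; the function name and the 52-weeks-a-year figure are my own assumptions (the rounded "100 a year" above is the same estimate).

```python
def charge_sessions(sessions_per_week: int, years: int, weeks_per_year: int = 52) -> int:
    """Total number of charging sessions over the given number of years."""
    return sessions_per_week * weeks_per_year * years


# Two top-ups a week for ten years, as in the parent comment.
total_sessions = charge_sessions(sessions_per_week=2, years=10)

# Assumption from the reply: at most about one session in ten is a
# flat-out fast charge; the rest are slow charges while parked.
fast_sessions = total_sessions // 10

print(f"all sessions over 10 years: {total_sessions}")
print(f"of which fast charges (1 in 10): {fast_sessions}")
```

So whatever extra wear very fast charging causes, under these assumptions it applies to roughly a hundred sessions per decade rather than a thousand.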
First, with range decreasing, number of charge cycles per mile, and therefore rate of wear, will increase.
Second, the average age of cars on the road is above 10 years in most countries, and those who drive old cars definitely do not have €26,500* spare to swap their EV's battery for a new one.
*That's what Audi charges here for e-tron 50 battery replacement, which are already starting to fail for many owners
That's a theoretical / marketing number. In real life I have yet to meet an EV owner who reports >80% of range after 5 years / 100 000 km of mostly-at-home charging. I see those on internet forums, but on internet forums, anyone can write anything, so I do not take those reports too seriously.
From my personal family anecdotes: my mother's 4-year-old Hyundai Ioniq 5 had complete battery failure. Thankfully under warranty. And my father's 5-year-old Audi e-tron 50 already has <80% range remaining, with very rare fast charging.
Western car manufacturers scamming their customers should not be what you look at for costs. Battery pack costs have gone from $130-150/kWh in 2023 to $80-90/kWh in 2026. The price for a pack will likely be under $50/kWh in another 3-4 years. I.e. battery packs are already becoming competitive with engines and will be cheaper by 30-40%, so replacing a battery will be cheaper than replacing an engine.
> Also, there is the question of if it lowers the battery lifespan faster than charging at lower power.
This kind of fast-as-possible charging, rather than overnight or "while parked at the mall for hours" slow charging, should be the exception rather than the rule, i.e. it is useful when road-tripping long-distance, but is not the daily case. Battery lifespan should not be based on assuming that it's the only thing that you ever do.
> I'm Satoshi, but I also lost billions because I messed up a Debian upgrade.
That would be very funny. I used to own a whole bitcoin when it was worth nothing. Didn't think it would ever be worth anything and formatted my hard drive to change distro.
> > RAM prices are crashing because new models won’t need as much
> Reality begs to differ [0] and following the link for that text goes to an article [1] where they talk about Google's TurboQuant which supposedly will lower the RAM requirements. Now if that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra ram is yet to be determined. The fact this article links there with text "RAM prices are crashing" throws the entire rest of the article into doubt for me.
I find it fascinating how extremely reactive things have become. One research paper which, to my knowledge, hasn't been externally replicated yet, nor implemented, generates tons of hyperbolic articles, tweets and such, and actually manages to move the market, at least temporarily. And not just this: a simple message in full caps lock from the president of the U.S., who is in the habit of lying through his teeth constantly, and the same thing happens. It's like there is a big bubble that threw any form of critical thinking out of the window and is in a hurry to react to anything, even if it is not even remotely believable.
Now I understand why it happens: there is a lot of money that can be made by capitalizing on FOMO, either by driving traffic to their website, socials, etc., or by simply insider trading (which feels like it has been legalized these days). But I still find it incredible the proportions it has started to take.
My favorite was when Google revealed Project Genie a month ago (which lets you generate video game worlds with AI, basically) and stocks for game companies immediately dropped. Anyone familiar with games and gaming knows that what Project Genie offers (essentially empty worlds with minimal interactivity that you can just kind of wander around in, and that struggle with simple things like object permanence if you look away) isn't real competition for actual games, but the markets reacted anyways.
I've always seen the stock market as a mix of mass hysteria and pyramid scheme. With actual value underlying it of course, but actual stock values are frequently irrational.
> In other words, there's not a single answer that will answer this in a satisfying way.
There could be one, but it would be a book-sized answer (and probably a Tolkien one, if not more).
Every conflict is multi-faceted and happens for a variety of reasons, some mattering more than others. For any conflict involving the Middle East, you have to go back almost 80 years of history to really provide a satisfying answer. Control of the world oil supply, trade with China, opportunistic war to appease the local voter pool, diversion from problematic affairs, diplomacy with Israel (which has its own thousandfold reasons for this war), Iran being left weak after losing most of its local allied militias, internal uprising due to an economic crisis caused in part by the removal of the nuclear agreement and the trade ban that followed ... They all probably play a part.
A lot of food production worldwide goes into meat production, which is quite inefficient. It does generate some useful side products (manure), but also a lot of bad ones.
In some places, almost every field is dedicated to meat production.
Consuming less meat and shifting food production away from meat would be very good for the environment and instantly solve the issue of the amount of calories produced.
But as you pointed out, this is not the actual issue. Getting food to people who need it is almost entirely a political and logistical issue at this point. War (especially civil war), natural disasters, local powers stealing international aid, etc., are mostly what is responsible for hunger in the 21st century.
We have the technology and logistics to accurately drop-ship huge amounts of food to even the most remote places in the world, even when the local infrastructure is heavily damaged or nonexistent. What we cannot deal with is a local power's decision to deliberately starve a place.
> Consuming less meat and shifting food production away from meat would be very good for the environment and instantly solve the issue of the amount of calories produced.
The problem with this statement is that it implies all calories are equal in terms of nutrition. Meat is very protein dense compared to most plant foods and that can be important. That’s not to say it’s impossible to live healthily on only plants, but it’s not as simple as swapping calorie sources.
Fun fact, some plants like bulgur or lentils are almost as calorie dense as some meat. But to my understanding, they lack “complex” protein or something? Regardless, you don't have to cut meat entirely. The issue is that we are consuming way too much of it. In many developed countries, eating meat every day is very common. Eating meat once or twice a week is enough to get all the right nutrients and avoid deficiencies in things like B12.
They don't individually provide all the essential amino acids, but you can easily circumvent that by combining sources. People have been doing so with combinations like rice and beans for generations. But the question is whether the calories cited come from enough variety to meet those nutritional needs. Again, all calories aren’t created equal.
I don’t disagree that western societies probably eat too much meat. But that is the trend of any burgeoning middle class, and it’s doubtful it will change.
To be fair, it would be very hard to argue against this website since it stays very vague.
For most things it says that they are “impossible” or “near-impossible” with no explanation, or just "getting a permit is hard" with no further detail.
It does give some cherry-picked metrics:
- 0 Semiconductor fabs built in CA in the last decade => has there been ANY semi fab built outside of Taiwan and China in the last decade? Not exactly surprising.
- 1 West Coast shipyard that can build destroyers, 0 New automotive paint shops permitted in CA, 0 New oil refineries permitted in CA since 1969 => We don't build those for shits and giggles; is there any demand that would justify new factories for those?
Basically, the website doesn't say anything. It just gives some context-less data and one guy's opinion on what he perceives as not possible.
Not that I care, I am not from the US, nor do I live there, but let's not try to pass off some dude rambling as a source of actual information.
The vagueness is really the crux of this whole thing. It makes it easy to argue about without really going anywhere. One can easily mold their own worldview around the points and make it about whatever they want.
Intel built a bunch of chip fabs in Oregon, Arizona, Israel, and Ireland over the past couple decades.[1] TSMC has built a new fab in Arizona.[2]
It's difficult to transport petroleum over the Rocky Mountains, and California requires its own blend of gasoline for use in vehicles, so there is significant demand for oil refineries in the state. Fuel imports have increased significantly due to refinery closures.[3] Some companies are trying to build pipelines to connect the West Coast to refineries in Texas, but it's unclear when or if that will happen.[4]
Knowledge? For B2C it might be more difficult, but in B2B, understanding your customers and their specific issues and developing something made for them is one of the big challenges. Being able to spit out code for free is useless if you don't know what and who you are making the code for.