I find this problem quite difficult to solve: 1. If I as a human request a websi...

itsdesmond · 2025-08-04T15:08:42 1754320122

Some stores do not welcome Instacart or Postmates shoppers. You can shop there. You can shop with your phone out, scanning every item to price match, something that some bookstores frown on, for example. Third party services cannot send employees to index their inventory, nor can they be dispatched to pick up an item you order online.

Their reasons vary. Some don’t want their businesses perception of quality to be taken out of their control (delivering cold food, marking up items, poor substitutions). Some would prefer their staff service and build relationships with customers directly, instead of disinterested and frequently quite demanding runners. Some just straight up disagree with the practice of third party delivery.

I think that it’s pretty unambiguously reasonable to choose to not allow an unrelated business to operate inside of your physical storefront. I also think that maps onto digital services.

rjbwork · 2025-08-04T15:21:02 1754320862

But I can send my personal shopper and you'll be none the wiser.

Polizeiposaune · 2025-08-04T15:45:01 1754322301

To stretch the analogy to the breaking point: If you send 10,000 personal shoppers all at once to the same store just to check prices, the store's going to be rightfully annoyed that they aren't making sales because legit buyers can't get in.

hombre_fatal · 2025-08-04T16:05:16 1754323516

Your comment and the above comment of course show different cases.

An agent making a request on the explicit behalf of someone else is probably something most of us agree is reasonable. "What are the current stories on Hacker News?" -- the agent is just doing the same request to the same website that I would have done anyways.

But the sort of non-explicit just-in-case crawling that Perplexity might do for a general question where it crawls 4-6 sources isn't as easy to defend. "Are polar bears always white?" -- Now it's making requests I wouldn't have necessarily made, and it could even been seen as a sort of amplification attack.

That said, TFA's example is where they register secretexample.com and then ask Perplexity "what is secretexample.com about?" and Perplexity sends a request to answer the question, so that's an example of the first case, not the second.

bayindirh · 2025-08-04T16:11:18 1754323878

As a person who has a couple of sites out there, and witnesses AI crawlers coming and fetching pages from these sites, I have a question:

What prevents these companies from keeping a copy of that particular page, which I specifically disallowed for bot scraping, and feed it to their next training cycle?

Pinky promises? Ethics? Laws? Technical limitations? Leeroy Jenkins?

Aeolun · 2025-08-04T23:26:25 1754349985

> What prevents these companies from keeping a copy of that particular page, which I specifically disallowed for bot scraping, and feed it to their next training cycle?

What prevents anyone else? robots.txt is a request, not an access policy.

utbabya · 2025-08-05T01:17:28 1754356648

This honor system mostly worked at scale because interests align, which seems to be no longer the case.

Does information no longer wants to be free now? Maybe internet, just like social media was just a social experiment at the end, albeit a successful one. Thanks GenAI.

egypturnash · 2025-08-05T03:23:03 1754364183

“Information Wants To Be Free. Information also wants to be expensive. ...That tension will not go away.” - the full aphorism

https://en.wikipedia.org/wiki/Information_wants_to_be_free

windexh8er · 2025-08-05T03:06:07 1754363167

Can the Terms of Service of individual content creators leverage a "death of a thousand cuts" model to produce a legal honeypot which would require organizations like Perplexity to be bound up in 10s of thousands of conciliation court cases?

Big Tech has hidden behind ToS for years. Now, it seems as though it only works for them, but not against. It seems as though this would be easy to orchestrate and prove forcing these companies into a legal nightmare or risk insolvent business stature due to the high load of cases filed against.

Why couldn't something like this be used to flip the table? A conciliation brigading, of sorts.

Eisenstein · 2025-08-05T07:31:58 1754379118

Because lawyers are expensive and big tech companies have lots of them. Because it takes a ton of time and effort to sue someone. Because you need to show standing, which means you need to be able to demonstrate you lost something of value by their actions. Because the power imbalance is heavily weighted towards a corporation. Because the way to deal with such things should be legislation and not court decisions. And lots more reasons...

windexh8er · 2025-08-05T11:32:45 1754393565

That's exactly why I said conciliation court. None of what you've outlined is required nor is it expensive. But, for each case, the defendant is still required to show up.

I've successfully used conciliation court against large corporations in the past which is why I question it here.

And while this should be able to be handled via legislation it won't be. Beyond that a workaround could force that to happen.

Eisenstein · 2025-08-05T16:06:53 1754410013

> conciliation court

Sorry, I had never heard that term before. You would still have to show standing though. How would you try to prove that their violating your TOS cost you money?

windexh8er · 2025-08-06T14:50:00 1754491800

Is it not viable to produce a work of art and say that this is free for humans, but not for bots and cannot be used for training and said violation cost X?

Again, I can't copy and distribute a game Microsoft rents to me. But if I do I can be found held accountable for a ridiculous amount of money. If it's my work of art the terms can dictate who doesn't need to pay and who does. If an LLM is consuming my work of art and now distributing it within their user base how is that not the same?

Eisenstein · 2025-08-06T23:47:59 1754524079

These are arguments you would tell the judge. And the judge would almost certainly tell you 'this is the wrong venue for that. You are in small claims. I need an itemized list of monetary damages you have suffered before I can make a judgement.'

BrenBarn · 2025-08-06T06:03:44 1754460224

Maybe you could say the increase in traffic increased your hosting costs by a penny or whatever.

accrual · 2025-08-04T18:21:32 1754331692

Thanks for sharing your experience. A little off-topic but I'd like to start hosting some personal content, guides/tutorials, etc.

Do you still see authentic human traffic on your domains, is it easy to discern?

I feel like I missed the bus on running a blog pre-AI.

bayindirh · 2025-08-05T14:47:05 1754405225

I intentionally doesn't keep detailed analytics on my homepage server and my digital garden, because I respect my users and don't want to push unnecessary Javascript on them. The blog platform I use (Mataroa) keeps rudimentary analytics (essentially page hit counters, nothing more) on index, RSS and per post.

Both my blog homepage and posts see mostly human traffic. Sometimes bots crawl the site and they appear as spikes in the analytics.

Looks like my homepage which doesn't have anything but links is pretty popular with crawlers. My digital garden doesn't get much interest from them. All in all, human traffic on my sites are pretty much alive.

I don't believe in missing the bus in anything actually, because I don't write these for others, first. Both my blog (more meta) and digital garden (more technical) are written for myself primarily, and left open. I post links to both when it's appropriate, but they are not made to be popular. If people read it and learn something or solve one of their problems, that's enough for me.

This is why my software is GPLv3, Digital Garden is GFDL and blog is CC BY-NC-SA 2.0. This is why everything is running with absolutely minimum analytics and without any ads whatsoever.

Lastly, this is why I don't want AI crawlers in my site and my data in the models. This thing is made by a human for humans, absolutely for free. It's not OK somebody to sell something designed to be free and make money over it.

accrual · 2025-08-05T16:30:39 1754411439

> I intentionally doesn't keep detailed analytics on my homepage server and my digital garden, because I respect my users and don't want to push unnecessary Javascript on them.

Absolutely, I'm in agreement here. I want to run a JS-free blog, just plain old static HTML. I plan to use GoAccess to parse the access logs but that's it. I think I would find it encouraging to see real human traffic.

> I don't write these for others, first. Both my blog (more meta) and digital garden (more technical) are written for myself primarily, and left open.

That is a great way to view it, thank you.

bayindirh · 2025-08-05T17:07:32 1754413652

> That is a great way to view it, thank you.

You're welcome. I'm glad it helped.

> I want to run a JS-free blog, just plain old static HTML.

If you want to start fast until you find a template you want to work with, I can recommend Mataroa [0]. The blog have almost no JS (it binds a couple of keys for navigation, that's it), and it's $10/year. When you feel right in your self-hosted solution, you can move there. It's all Markdown at the end of the day.

> I plan to use GoAccess to parse the access logs but that's it.

That's the only thing I use, too. Nothing else.

If you want to look at what I do, how I do, and reach out to me, the rabbit hole starts from my profile, here.

Wish you all the best, and you may find bliss and joy you never dreamed of!

[0]: https://www.mataroa.blog

kldg · 2025-08-05T00:44:47 1754354687

if you do analytics, it is not so hard, but then you need to store user data (if not directly, then worse, with a third party), which should be viewed as a liability. I see ~2/3 human traffic, ~1/3 bot traffic (I just parse user agent strings and count whitelisted browsers as human), but my main landing page is all dynamic-populated webgl. I just asked Gemini what it sees on website, and it states "The page appears to be loading, with the text "Loading room data...".[1] There are also labels for "BG", "FG", and "CURSOR", and a background weather animation." -so I can be feel reasonably confident I don't need to worry about AI, for now; it needs a machine-friendly frontend.

you could go proper insanomode, too. remaking The Internet is trivial if you don't care about existing web standards -- replacing HTTP with your own TCP implementation, getting off html/js/css, etc. being greenfield, you can control the protocol, server, and client implementation, and put it in whatever language you want. I made a stateful Internet implementation in Python earlier for proof-of-concept, but I want to port it and expand on it in rust soon (just for fun; I don't do serious biznos). you'll very likely have 100% human traffic then, even if you're the only person curious and trusting enough to run your client.

71bw · 2025-08-05T06:36:51 1754375811

  > I made a stateful Internet implementation in Python earlier for proof-of-concept

Is there a repo or some other form of public access? I'd like to see this.

kldg · 2025-08-05T14:51:29 1754405489

it's not in a shareable state; is unsafe as-is. can share general idea and sample "webpage" files, though.

the server ("lodge") passes JSON to the client from what are called .branch files. the client receives JSON, parses it, then builds the UI and state representation from the JSON, then stored in that client's memory (self.current_doc and self.page_state in python client).

branches can invoke waterwheel (.ww) files hosted on the lodge. waterwheel files on the lodge contain scripts which define how patches (as JSON) are to be sent to the client. the client updates its state based on the JSON patch it receives. sample .branch and .ww from python implementation (in pastebin so to not make everyone have to scroll through this): https://pastebin.com/A0DEZDmR

71bw · 2025-08-06T06:43:19 1754462599

I was right to ask, this seems extremely cool. Hit me up via mail [in bio] if you ever end up polishing it enough to share.

1024core · 2025-08-05T01:55:01 1754358901

It's your server. You're free to do whatever you want. You can serve different versions of the page depending on the UserAgent (has been done many times before).

You can put up a paywall depending on UserAgent or OS (has been done).

In short, it's a 2-way street: the client on the other end of the TCP pipe makes a request, and your server fulfills the request as it sees fit.

tempfile · 2025-08-04T16:38:20 1754325500

The way to prevent people from downloading your pages and using them is to take them off the public internet. There are laws to prevent people from violating your copyright or from preventing access to your service (by excessive traffic). But there is (thankfully) no magical right that stops people from reading your content and describing it.

bayindirh · 2025-08-04T16:46:12 1754325972

Many site operators want people to access their content, but prevent AI companies from scraping their sites for training data. People who think like that made tools like Anubis, and it works.

I also want to keep this distinction on the sites I own. I also use licenses to signal that this site is not good to use for AI training, because it's CC BY-NC-SA-2.0.

So, I license my content appropriately (No derivative, Non-commercial, shareable with the same license with attribution), add technical countermeasures on top, because companies doesn't respect these licenses (because monies), and circumvent these mechanisms (because monies), and I'm the one to suck this up and shut-up (because their monies)?

Makes no sense whatsoever.

zzo38computer · 2025-08-05T02:43:38 1754361818

I don't want AI companies to scrape my sites (or use the files I wrote) for training data either, but that is not specifically what I am trying to stop (unless the files are supposed to be private and unpublished). I should not stop them from using the files for what they want, once they have them. (I also specifically do not want to block use of lynx, curl, Dillo, etc.)

What I want to stop is excessive crawling and scraping of my server. Once they have the file they can do what they want with it. Another comment (44786237) mentions that robots.txt is only for restricting recursive access; I agree and that is what should be blocked. They also should not access the same file several times quickly even though it should be unnecessary to do so, just as much as they should not access all of the files. (If someone wants to make a mirror of the files, there may be other ways, e.g. in case there is a archive file available to download many at once (possibly, in case if the site operator made their own index and then did it this way). If it is a git repository, then it can be cloned.)

tempfile · 2025-08-04T20:47:39 1754340459

Of course some people want that. And at the moment they can prevent it. But those methods may stop working. Will it then be alright to do it? Of course not, so why bother mentioning that they are able to prevent it now - just give a justification.

Your license is probably not relevant. I can go to the cinema and watch a movie, then come on this website and describe the whole plot. That isn't copyright infringement. Even if I told it to the whole world, it wouldn't be copyright infringement. Probably the movie seller would prefer it if I didn't tell anyone. Why should I care?

I actually agree that AI companies are generally bad and should be stopped - because they use an exorbitant amount of bandwidth and harm the services for other users. At least they should be heavily taxed. I don't even begrudge people for using Anubis, at least in some cases. But it is wrong-headed (and actually wrong in fact) to try to say someone may or may not use my content for some purpose because it hurts my feelings or it messes with my ad revenue. We have laws against copyright infringement, and to prevent service disruption. We should not have laws that say, yes you can read my site but no you can't use it to train an LLM, or to build a search index. That would be unethical. Call for a windfall tax if they piss you off so much.

bayindirh · 2025-08-05T15:09:37 1754406577

> I can go to the cinema and watch a movie, then come on this website and describe the whole plot. That isn't copyright infringement.

This is a false analogy. A correct one would be going to a 1000 movies and creating the 1001th movie with scenes cropped from these 1000 movies and assemble it as a new movie, and this is copyright infringement. I don't think any of the studios would applaud and support you for your creativity.

> But it is wrong-headed (and actually wrong in fact) to try to say someone may or may not use my content for some purpose because it hurts my feelings or it messes with my ad revenue.

Why does it have to be always about money? Personally it's not. I just don't want my work to be abused and sold to people to benefit a third party without my consent and will (and all my work is licensed appropriately for that).

> We should not have laws that say, yes you can read my site but no you can't use it to train an LLM, or to build a search index.

This goes both ways. If big corporations can scrape my material without asking me and resell it as an output of a model, I can equally distill their models further and sell it as my own. If companies can scrape my pages to sell my content as theirs, I can scrape theirs and unpaywall them.

But that will be copyright infringement, just because they have more money. What angers me is "all is fair game because you're a small fish, and this is a capitalist marketplace" mentality.

If companies can paywall their content to humans that don't pay, I can paywall AI companies and demand money or push them out of my lawn, just because I feel like that. The inverse is very unethical, but very capitalist, yes.

It's not always about money.

P.S.: Oh, try to claim that you can train a model with medical data without any clearance because it'd be unethical to have laws limiting this. It'll be fun. Believe me.

tempfile · 2025-08-05T17:17:28 1754414248

> This is a false analogy.

I think you are describing something much more like stable diffusion. This article is about Perplexity, which is much closer to "watch a movie and tell me the plot" than it is like "take these 1000 movies and make a collage". The copyright points are different - stable diffusion are on much shakier ground than perplexity.

> Why does it have to be always about money?

Before I mentioned money I said "because it hurts my feelings". I'm sorry I can't give a more charitable interpretation, but I really do see this kind of objection as "I don't want you to have access to this web page because I don't like LLMs". This is not a principled objection, it is just "I don't like you, go away". I don't think this is a good principle to build the web on.

Obviously you can make your website private, if you want, and that would be a shame. But you can't have this kind of pick-and-choose "public when you feel like" option. By the way I did not mention, but I am ok with people using Anubis and the like as a compromise while the situation remains unjust. But the justification is very important.

> If companies can scrape my pages to sell my content as theirs, I can scrape theirs and unpaywall them.

This is probably not a gambit you want to make. You literally can do this, and they would probably like it if you did. You don't want to do that, because the output of LLMs is usually not that good.

In fact, LLM companies should probably be taxed, and the taxes used to fund real human AI-free creations. This will probably not happen, but I am used to disappointment.

> P.S.: Oh, try to claim that you can train a model with medical data

Medical data is not public, for good reasons.

account42 · 2025-08-05T12:31:29 1754397089

> Many site operators want people to access their content, but prevent AI companies from scraping their sites for training data.

That is unfortunately not a distinction that is currently legally enforceable. Until that changes all other "solutions" are pointless and only cause more harm.

> People who think like that made tools like Anubis, and it works.

It works to get real humans like myself to stop visiting your site while scrapers will have people whose entire job is to work around such "protections". Just like traditional DRM inconveniences honest customers and not pirates. And to be clear, what you are advocating for is DRM.

> I also want to keep this distinction on the sites I own. I also use licenses to signal that this site is not good to use for AI training, because it's CC BY-NC-SA-2.0.

If AI crawlers cared about that we wouldn't be talking about this issue. A license and only give more permissions than there are without one.

bayindirh · 2025-08-05T14:58:58 1754405938

> It works to get real humans like myself to stop visiting your site

If we talk about Anubis, it's pretty invisible. You wait a couple of seconds in the first visit, and don't get challenged for a couple of weeks, at least. With more tuning some of the sites using Anubis work perfectly well without ever seeing Anubis' wall while stopping AI crawlers.

> And to be clear, what you are advocating for is DRM.

Yes. It's pretty ironic that someone like me who believes in open access prefers a DRM solution to keep companies abusing the small fish, but life is an interesting phenomenon, and these things happen.

> Until that changes all other "solutions" are pointless and only cause more harm.

As an addendum to above paragraph, I'm not happy that I have to insert draconian measures between the user and the information I want to share, but I need a way to signal that I'm not having their ways to these faceless things. What do you propose? Taking my sites offline? Burning myself in front of one of the HQs?

> If AI crawlers cared about that we wouldn't be talking about this issue. A license and only give more permissions than there are without one.

AI crawlers default to "Public Domain" when they find no licenses. Some of my lamest source code repositories made into "The Stack" because I forgot to add COPYING.md. A fork of a GPLv2 tool I wrote some patches also got into "The Stack", because COPYING.md was not in the root folder of the repository. I'd rather add licenses (which I can accept) to things rather than leave them as-is, because AI companies also eagerly grab things without license.

All licenses I use mandate attribution and continuation of license, at least, and my blog doesn't allow any derivations of from what I have written. So you can't ingest it into a model to be derived and remixed with something else.

account42 · 2025-08-05T15:08:29 1754406509

> If we talk about Anubis, it's pretty invisible. You wait a couple of seconds in the first visit, and don't get challenged for a couple of weeks, at least. With more tuning some of the sites using Anubis work perfectly well without ever seeing Anubis' wall while stopping AI crawlers.

It's not invisible, the sites using it don't work perfectly well for all users and it doesn't stop AI crawlers.

bayindirh · 2025-08-05T15:12:30 1754406750

I haven't seen any problems with any Anubis enabled site I encountered. Can you give examples? This is interesting.

fxtentacle · 2025-08-05T18:09:23 1754417363

I've never seen problems with Anubis.

hombre_fatal · 2025-08-04T19:31:55 1754335915

I guess that's a question that might be answered by the NYT vs OpenAI lawsuit at least on the enforceability of copyright claims if you're a corporation like NYT.

If you don't have the funds to sue an AI corp, I'd probably think of a plan B. Maybe poison the data for unauthenticated users. Or embrace the inevitability. Or see the bright side of getting embedded in models as if you're leaving your mark.

miki123211 · 2025-08-05T06:30:30 1754375430

the fact that it would be discovered almost immediately.

If you give them a URL that does not appear in Google, ask them to visit that URL specifically, and then notice the content from that URL in the training data, it's proof that they're doing this, which would be quite damaging to them.

Freak_NL · 2025-08-05T07:22:50 1754378570

> […] it's proof that they're doing this, which would be quite damaging to them.

Is it? It's damning, but is it damaging at all?

I'm not getting the impression that anyone's data being available for training if some bot can get to it is just how things are now, rather than an unsettled point of contention. There's too much money invested in this thing for any other outcome, and with the present decline of the rule of law…

autoexec · 2025-08-05T16:22:23 1754410943

Nothing, and that's why I expect they all do it.

tintor · 2025-08-04T23:05:54 1754348754

technical limitations / data poisoning measures

AuthAuth · 2025-08-04T23:35:25 1754350525

Hacker news wants you to vist the site, look at the main page, enter threads and participate in discussion.

When you swap in an AI and ask what are the current stories. The AI fetches the front page and every thread and feeds it back to you. You are less likely to participate in discussion because you've already had the info summarized.

jychang · 2025-08-05T00:35:28 1754354128

Who cares what Hacker News wants? You’re not obliged to participate in discussion.

Am I supposed to spend money on Amazon.com when I visit the website just because Amazon wants me to?

egypturnash · 2025-08-05T03:29:23 1754364563

If most people quit spending money on Amazon then Amazon stops being worth running.

If most people stop discussing things on HN, and the discussion is indeed one of the major reasons it’s kept running, then HN stops being worth running.

AuthAuth · 2025-08-05T04:34:05 1754368445

Whats the point of a human coming to a site if all the threads and empty and its front page is a glorified RSS feed for lazy peoples AI agents?

AnthonBerg · 2025-08-05T05:12:28 1754370748

Who cares what you want?

danlitt · 2025-08-05T07:15:04 1754378104

Most humans place the desires of human beings over the desires of companies.

lenkite · 2025-08-10T02:59:03 1754794743

Indeed. But that is a false equivalence - this is conflict of desires between small companies and creators and an AI-corp where the AI-corp wants to steal their content and give it to users with their shop branding.

noboostforyou · 2025-08-06T19:59:58 1754510398

> You’re not obliged to participate in discussion.

Are website owners obligated to serve content to AI agents and/or LLM scrapers?

butlike · 2025-08-06T13:05:54 1754485554

It was a corollary example

ithkuil · 2025-08-05T06:24:09 1754375049

Foo news wants you to visit the site, look at the main page, watch the ads, click on them and buy the products advertised by third parties which will give money to Foo news in exchange for this service.

And yet people install ad blockers and defend their freedom to not participate in this because they don't want to be annoyed by ads.

They claim that since they are free to not buy an advertised product, why would they be forced to see ads for it. But Foo news claims that they are also free to not waste bandwidth to serve their free website to people who declare (by using an ad blocker or the modern alternative: AI aummarizera) they won't participate in the funding of the service

skydhash · 2025-08-05T10:40:47 1754390447

It's not ads. We have ads in paper magazines and newspapers and no one went around with scissors to remove them. It's obnoxious ads, designed to violently grabs your attention and trackers (malware). It's like a newspapers giving your address to a whole crew of salemens that intrudes on your property at 3am and looking at you sleeping and installing cameras in your bathroom. All so that they can jump at you in the street to loudly claim they have the underwear you told your partner you like. If you're going to be that invasive about my person, then I'm going to be that forceful about restrictions.

imtringued · 2025-08-05T12:30:06 1754397006

This is one of the dumbest things about ad networks. Google has enough data about your watching habits on Youtube and their algorithm is basically as good as it gets in terms of showing you what you want to watch and getting you hooked on it, but the moment they show you ads, all that technical expertise appears to have vanished into thin air and all they show you is fake mobile ads?

People hate obnoxious ads because the money that pays for them is essentially a bribe to artificially elevate content above its deserved ranking. It feels like you're being manipulated into an unfavorable trade.

Timwi · 2025-08-07T02:48:24 1754534904

> their algorithm is basically as good as it gets in terms of showing you what you want to watch and getting you hooked on it

It is? Are we talking about the same YouTube? I get absolutely useless recommendations, I get un-hooked within a couple videos, and I even keep getting recommendations for the same videos I've literally watched yesterday. Who in the world gets hooked by this??

autoexec · 2025-08-05T16:14:48 1754410488

> We have ads in paper magazines and newspapers and no one went around with scissors to remove them.

I never saw people bother with scissors but I've seen people pulling the ads out of the newspaper countless times.

remus · 2025-08-05T13:34:58 1754400898

> And yet people install ad blockers and defend their freedom to not participate in this because they don't want to be annoyed by ads.

I think this is a pretty different scenario. Here the user and the news website are talking directly to each other, but then the user is making a choice around what to do with the content the news website send to them. With AI agents, there is a company inserting themselves between the user and the news website and acting as a middleman.

It seems reasonable to me that the news website might say they only want to deal with users and not middlemen.

ithkuil · 2025-08-05T13:56:48 1754402208

I understand; but as an excercise to better understand this problem I'll keep doing devil's advocate and I'll raise with:

What if my executive assistant reading the news website and giving me a digest?

Would the website owners rather prefer me doing my reading directly?

fxtentacle · 2025-08-05T18:04:47 1754417087

Yes. Because they want to own your attention and that only works if they are interfacing directly to you.

I remember that Samsung was at one time offering to play non-skippable full-screen apps on their newest 8K OLED TVs and their argument was precisely that these ads will reach those rich people who normally pay extra to avoid getting spammed with ads. Or going with your executive assistant example, there are situations where it makes sense to bribe them to get access to you and/or your data. E.g. "evil maid attack".

trhway · 2025-08-05T02:28:26 1754360906

With all the crypto development how come we haven't got to

  HTTP/1.1 402 Payment Required
  WWW-price: 0.0000001 BTC, 0.000001 ETH, 0.00001 DOGE

> You are less likely to participate in discussion

you (or AI on your behalf) paid instead. Many sites would probably like it better.

autoexec · 2025-08-05T16:20:51 1754410851

If people were forced to pay for websites by the http request people would demand that websites stop loading a ton of externally hosted JS, stop filling sites with ads, and would demand that websites actually have content worth the price.

There are so many links I click on these days that are such trash I'd be demanding refunds constantly.

trhway · 2025-08-05T18:20:47 1754418047

>There are so many links I click on these days that are such trash

That is why AI "summarization" becomes a necessary intermediate layer. You'd not see nor trash nor ads, and thus the payment instead of being exposed to the ads. AI saves the Internet :)

dns_snek · 2025-08-05T06:53:22 1754376802

It's not a development problem, it's an adoption problem. Publishers are desperate to sell us on a $20+/month subscription, they don't want to offer convenient affordable access to single articles.

skydhash · 2025-08-05T10:48:31 1754390911

$20/month would be nice if it wasn't a tier with less ads. I want no ads, and full-text rss feeds (because I want to use my clients to read). It's like how Netflix refuses to build a basic search and filter, or Spotify refuses to an actual library manager. They don't want you in control of your consumption.

cellis · 2025-08-05T00:41:06 1754354466

Easy "By Appointment only" or "rate limited to authenticated users" done.

p3rls · 2025-08-05T02:13:40 1754360020

That is not the breaking point at all of the analogy-- that literally happens to my custom CMS/wiki/image host I built for my niche, kpopping.com. We are constantly attacked by crawlers. Meanwhile google rewards wordpress slop that buys backlinks with #1 pageranks for years. Welcome to the internet.

sublinear · 2025-08-04T15:52:15 1754322735

Too bad. Build a bigger store or publish this information so we don't need 10,000 personal shoppers. Was this not the whole point of having a website? Who distorted that simple idea into the garbage websites we have now?

recursive · 2025-08-04T15:54:43 1754322883

Weird take. The store doesn't owe your personal shippers anything.

drdaeman · 2025-08-04T21:26:23 1754342783

That's fair, but if there's enough of supply and demand for this to get traction (and online shopping is bug, and autonomous agents are sort of trending), this conflict of interest paired with a no-compromise "we don't own you anything" attitude is bound to escalate in an arms race. And YMMV but I don't like where that race may possibly end.

If store businesses at least partially relies on obscurity of information that can be solved through automated means (e.g. storefronts tend to push visitors towards products they don't want, and buyer agents are fighting that and looking for something buyers instructed them) just playing this cat and mouse game of blocking agents, finding workarounds, and repeating the cycle is only creating perverse technological contraptions that neither party is really interested in - but both are circumstantially forced to invest into.

the_real_cher · 2025-08-04T16:08:29 1754323709

In the same token the personal shoppers don't owe the store anything either.

eddythompson80 · 2025-08-04T16:36:26 1754325386

Surely they owe them money for the goods and service, no? I thought that's how stores worked.

the_real_cher · 2025-08-04T18:12:55 1754331175

Context friend. This article and entire comments sections is about questionable web page access. Context.

eddythompson80 · 2025-08-04T18:51:34 1754333494

You're replying in a store metaphor thread though. Context matters.

recursive · 2025-08-04T17:04:15 1754327055

Then they can't complain if they're barred entry.

the_real_cher · 2025-08-04T18:22:03 1754331723

http is neutral. it's up to the client to ignore robots.txt

You can block IP's at the host level but there's pretty easy ways around that with proxy networks.

eddythompson80 · 2025-08-04T19:50:38 1754337038

> http is neutral.

Who misled you with that statement?

the_real_cher · 2025-08-04T21:29:51 1754342991

Http doesnt have emotions or thought last time I checked.

eddythompson80 · 2025-08-04T21:40:09 1754343609

It seems that a 403 makes you sad though.

the_real_cher · 2025-08-04T22:47:54 1754347674

iproyal.com makes me smile again

eddythompson80 · 2025-08-04T23:01:20 1754348480

And Cloudflare makes you cry. See, it's not neutral. Glad you learned something today. The more one learns everyday, the less stupid you become.

drdaeman · 2025-08-04T21:31:43 1754343103

IETF?

dabockster · 2025-08-04T16:51:46 1754326306

> Who distorted that simple idea into the garbage websites we have now?

Corporate America. Where clean code goes to die.

bradleyjg · 2025-08-04T15:28:58 1754321338

It’s possible to violate all sorts of social norms. Societies that celebrate people that do so are on the far opposite end of the spectrum from high trust ones. They are rather unpleasant.

ToucanLoucan · 2025-08-04T15:41:44 1754322104

Just the Silicon Valley ethos extended to it's logical conclusions. These companies take advantage of public space, utilities and goodwill at industrial scale to "move fast and break things" and then everyone else has to deal with the ensuing consequences. Like how cities are awash in those fucking electric scooters now.

Mind you I'm not saying electric scooters are a bad idea, I have one and I quite enjoy it. I'm saying we didn't need five fucking startups all competing to provide them at the lowest cost possible just for 2/3s of them to end up in fucking landfills when the VC funding ran out.

SoftTalker · 2025-08-04T16:18:40 1754324320

My city impounded them and made them pay a fee to get them back. Now they have to pay a fee every year to be able to operate. Win/win.

account42 · 2025-08-05T12:53:45 1754398425

Do those fees actually improve anything for the citizens who now have to deal with vehicles abandoned on sidewalks everywhere or does it just buy the major a nicer yacht?

pixl97 · 2025-08-04T20:14:38 1754338478

[flagged]

tomhow · 2025-08-05T14:00:23 1754402423

> Oh, this is a bunch of baloney...

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

Please don't fulminate. Please don't sneer, including at the rest of the community.

Eschew flamebait. Avoid generic tangents. Omit internet tropes.

Please don't use Hacker News for political or ideological battle. It tramples curiosity.

https://news.ycombinator.com/newsguidelines.html

p3rls · 2025-08-05T02:19:10 1754360350

[flagged]

tomhow · 2025-08-05T14:02:15 1754402535

You can't comment like this on Hacker News, no matter what you're replying to. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

Workaccount2 · 2025-08-04T21:02:53 1754341373

[flagged]

_proofs · 2025-08-05T01:12:04 1754356324

this is such a wild comment -- there are countless products where regardless of purchase -- the user is still served advertisements. i have no idea what reality, or timeline, this comment belongs in.

broadcast television, paid streaming entertainment is just straight up the most glaringly obvious example of a paid service overflowing with advertisements.

paid radio broadcasts (xm/Sirius).

operating systems (windows serves you ads any chance it gets).

monthly subscriptions to gyms where youre constantly hit with ads, marketing, and promotions be it at the gym or via push notification (you got opted into and therefore have to opt out of intentionally after the service is paid).

mobile phones, especially prepaid come LOADED with ads and bloatware.

i mean the list goes on -- you cannot be serious.

Workaccount2 · 2025-08-05T04:15:01 1754367301

> pay for services in full directly

Those are hybrid subscriptions/subsidies. Not paid in full.

If you are being exposed to ads in something you paid for, you are almost certainly being charged less money. Companies can compete on cost by introducing ads, and it's why the cheaper you go, the more ad infested it gets.

Pure ad-free things tend to be much more expensive then their ad subsidized counterparts. Ad subsidized has become so ubiquitous though, that people think that price is the true price.

_proofs · 2025-08-05T16:54:50 1754412890

this seems like semantics and corporate hand-waving -- that's not what is conveyed to the user in what i have observed as the context of paid services and the promises asserted around what a purchase gets a customer.

in the subsidized example, xm/Sirius is marketed to users as an "ad-free paid radio broadcast"; the marketing literally attempts to leverage the notion of it being ad-free as a consequence of your purchase (power) in order to highlight its supposed competitive edge and usefulness, and provide the user an incentive to spend money, except for the fact that the marketing is false. you still get served promotions and ads, just less "conventional" ads.

i go to a football game and im literally inundated with ads -- the whole game has time stoppage dedicated to serving ads. i guess my season ticket purchase with the hopes of seeing football in person is.. apparently not spending enough money?

i see this as attempting to move the goalposts and gaslight users on their purchase expectations, as a way to offload the responsibility and accountability back onto the user -- "you don't pay enough, you only think that you pay enough, so we are still going to serve you ads because <insert financial justification here around the expectations we'e undermined>.

why then is there any expectation of a service being ad-free upon purchasing?

who the hell actually enjoys sitting through 1.5 hours of advertisements and play stoppage?

over time users have been conditioned to just tolerate it, and over time, the advertising reclaims ground it previously gave up one inch at a time in the same way people are price-gouged in those stadiums -- they don't have much alternative, but apparently the problem is the user should fork up more money for tickets so as to align their expectations with reality? while they're getting strong-armed at the concession stand via proximity and circumstance and lack of competition, no less.

are you really trying to tell me the problem there is, they need to make... more money? and THEN and only THEN we can have ad-free, paid for entertainment otherwise known as american football? is this really about user expectations, or is this about companies wanting their cake and eating it, too?

sublinear · 2025-08-04T15:57:03 1754323023

[flagged]

arrowsmith · 2025-08-04T16:20:09 1754324409

Go spend some time in Brazil or South Africa or other places where no-one trusts anyone (for good reasons), then report back.

bradleyjg · 2025-08-04T15:59:25 1754323165

A place where you can lose you wallet and get it back with all the cash inside.

The horror!!

sublinear · 2025-08-04T16:01:54 1754323314

[flagged]

arrowsmith · 2025-08-04T16:21:16 1754324476

No, you're describing a low-trust society.

Please learn what words mean before you comment on them.

sublinear · 2025-08-04T16:27:24 1754324844

[flagged]

Imustaskforhelp · 2025-08-04T16:40:49 1754325649

Isn't that the system that we are already living in?

Democracy in its american form or even at many others show almost complete paralysis of the entire system basically if bad actors infiltrate it (Looking at ya donald)

It is honestly a little sad since conservatives usually think of their society as this high trust society and they were the ones who primarily voted and are being taken advantaged of by the few untrustworthy individuals.

Politics is a cult/religion and you can't prove me otherwise.

I vote because I vote for lesser evil not for greater good. I do think that frankly, both the parties or just most parties in every nation are just so short of reality but I created a discord server of 100 people and I can see how I can't manage 100 people and so maybe I expect so much from the govt.

I used to focus so much on history and politics but its bloody mess and there is no good or bad. Now I just feel like going into the woods and into the darks living alone, maybe coding.

Workaccount2 · 2025-08-04T21:05:21 1754341521

Let me guess: A low violence society is bad because people get attacked and beat up?

sensanaty · 2025-08-04T17:30:28 1754328628

That's quite literally the opposite of what high trust means...

fireflash38 · 2025-08-04T16:16:48 1754324208

That's a very sad and lonely way to live.

sublinear · 2025-08-04T16:18:25 1754324305

I don't think we're talking about the same thing.

account42 · 2025-08-05T12:56:52 1754398612

Obviously. You should heed the advice of other posters who told you to look up the meaning of the word.

immibis · 2025-08-04T17:59:39 1754330379

[flagged]

Ray20 · 2025-08-04T19:58:16 1754337496

> High trust is prima facie incompatible with capitalism

Quite compatible

> If you want a high trust society, you don't want capitalism.

There is nothing at all in capitalism that would prevent a high level of trust in society.

> Capitalism is inherently low trust

But that's not true. The thing about capitalism is that it's RESILENT to low trust. It does not require low levels of trust, but is capable of functioning in such conditions.

> If the penalty for deceit was greater than the penalty for non-deceit

Who are the judges? Capitalism is the most resistant to deception, deceivers under capitalism receive fewer benefits than under any other economic system. Simply because capitalism is based on the premise that people cheat, act out of greed, try to get the most for themselves at the expense of others. These qualities exist in people regardless of the existence of capitalism, it is just that capitalism ensures prosperity in society even when people have these qualities.

immibis · 2025-08-04T21:25:07 1754342707

https://theonion.com/this-war-will-destabilize-the-entire-mi...

sublinear · 2025-08-04T18:22:57 1754331777

Why bring up capitalism? I don't get it. What's stopping people from lying and cheating under any other system?

dgshsg · 2025-08-04T19:37:40 1754336260

When lying and cheating doesn't get you ahead, there is no reason to do it.

Workaccount2 · 2025-08-04T21:20:56 1754342456

If we look at any communist society, the only way to get ahead was lying and cheating. China was forced to adopt capitalist markets to deal with this, hence why modern China hardly resembles the USSR, Cuba, Venezuela, or Laos.

immibis · 2025-08-05T11:19:33 1754392773

Communist with a capital C.

I've never seen a stateless, classless, moneyless society. It may be impossible.

ghurtado · 2025-08-04T21:13:33 1754342013

You seriously think that mankind wasn't lying and cheating long before inventing capitalism?

dgshsg · 2025-08-04T21:45:16 1754343916

Sure, but the risk/reward ratio was different.

Ray20 · 2025-08-04T20:04:17 1754337857

The problem is that without capitalism ONLY lying and cheating will get you ahead. Look at ANY country that builds its economy on the restriction of people's economic freedom, on the absence of private property rights - these are the most deceitful and disgusting regimes in the world with zero level of public trust.

rapind · 2025-08-04T15:33:05 1754321585

It's all about scale. The impact of your personal shopper is insignificant unless you manage to scale it up into a business where everyone has a personal shopper by default.

nickthegreek · 2025-08-04T15:59:06 1754323146

How is everyone having a personal shopper a problem of scale? I was going to shop myself, but I sent someone else to do it for me.

At this moment I am using Perplexity's Comet browser to take a spotify playlist and add all the tracks to my youtube music playlist. I love it.

SoftTalker · 2025-08-04T16:12:57 1754323977

We'll see more of this sort of thing as AI agents become more popular and capable. They will do things that the site or app should be able to do (or rather, things that users want to be able to do) but don't offer. The YouTube music playlist is a good example. One thing I'd like to be able to do is make a playlist of some specific artists. But you can't. You have to select specific songs.

If sites want to avoid people using agents, they should offer the functionality that people are using the agents to accomplish.

dylan604 · 2025-08-04T19:33:10 1754335990

Let's look at the opposite benefit to a store if a mom that would need to bring her 3 kids to the store vs that mom having a personal shopper. In this case, the personal shopper is "better" for the store as far as physical space. However, I'm sure the store would still rather have the mom and 3 kids physically in the store so that the kids can nag mom into buying unneeded items that are placed specifically to attract those kids' attention.

pixl97 · 2025-08-04T19:44:14 1754336654

>o that the kids can nag mom into buying unneeded items

Excellent. Personal shoppers are 'adblock for IRL'.

>You owe the companies nothing. You especially don't owe them any courtesy. They have re-arranged the world to put themselves in front of you. They never asked for your permission, don't even start asking for theirs.

rapind · 2025-08-04T16:10:02 1754323802

I didn't use the word "problem". In fact I presented no opinion at all. I'm just pointing out that scale matters a lot. In fact, in tech, it's often the only thing that matters. It's naive (or narrative) to think it doesn't.

Everyone having a personal shopper obviously changes the relationship to the products and services you use or purchase via personal shopper. Good, bad, whatever.

mbrumlow · 2025-08-04T15:41:38 1754322098

Well then. Seems like you would be a fool to not allow personal shoppers then.

The point is the web is changing, and people use a different type of browser now. Ans that browser happens to be LLMs.

Anybody complaining about the new browser has just not got it yet, or has and is trying to keep things the old way because they don’t know how or won’t change with the times. We have seen it before, Kodak, blockbuster, whatever.

Grow up cloud flare, some is your business models don’t make sense any more.

goatlover · 2025-08-04T16:00:29 1754323229

Some people use LLMs to search. Other people still prefer going to the actual websites. I'm not going to use an LLM to give me a list of the latest HN posts or NY Times articles, for example.

ToucanLoucan · 2025-08-04T15:46:53 1754322413

> Anybody complaining about the new browser has just not got it yet, or has and is trying to keep things the old way because they don’t know how or won’t change with the times. We have seen it before, Kodak, blockbuster, whatever.

You say this as though all LLM/otherwise automated traffic is for the purposes of fulfilling a request made by a user 100% of the time which is just flatly on-its-face untrue.

Companies make vast amounts of requests for indexing purposes. That could be to facilitate user requests someday, perhaps, but it is not today and not why it's happening. And worse still, LLMs introduce a new third option: that it's not for indexing or for later linking but is instead either for training the language model itself, or for the model to ingest and regurgitate later on with no attribution, with the added fun that it might just make some shit up about whatever you said and be wrong. And as the person buying the web hosting, all of that is subsidized by me.

"The web is changing" does not mean every website must follow suit. Since I built my blog about 2 internet eternities ago, I have seen fad tech come and fad tech go. My blog remains more or less exactly what it was 2 decades ago, with more content and a better stylesheet. I have requested in my robots.txt that my content not be used for LLM training, and I fully expect that to be ignored because tech bros don't respect anyone, even fellow tech bros, when it means they have to change their behavior.

Imustaskforhelp · 2025-08-04T16:30:14 1754325014

Tech bros just respect money. Making money is very easy in the short term if you don't show ethics. Venture capitalism and the whole growth/indie hacking is focused around making money and making it fast.

Its a clear road for disaster. I am honestly surprised by how great Hackernews is, to that comparison where most people are sharing it for the love of the craft as an example. And for that hackernews holds a special place in my heart. (Slightly exaggerating to give it a thematic ending I suppose)

julkali · 2025-08-04T15:48:02 1754322482

Do not conflate your own experience with everyone else's.

tom_m · 2025-08-05T03:33:20 1754364800

Perplexity isn't your personal anything. It's a service just like Postmates and Uber. You want a personal shopper equivalent? You're going to pay more money. It won't say perplexity all over it.

dataflow · 2025-08-05T14:46:18 1754405178

> But I can send my personal shopper and you'll be none the wiser.

They will be quite the wiser if they track/limit how often your shopper enters the store. You probably aren't entering the same store fifteen times every day and neither would be your shopper if they were only doing it on your behalf.

542354234235 · 2025-08-04T15:42:09 1754322129

True, and I would ask, what is your point? Is it that no rule can have 100% perfect enforcement? That all rules have a grey area if you look close enough? Was it just a "gotcha" statement meant to insinuate what the prior commenter said was invalid?

amelius · 2025-08-05T09:00:33 1754384433

But the store owner can ask the personal shopper to leave, if e.g. they find out that they work for a personal shopper service.

account42 · 2025-08-05T13:03:00 1754398980

What the article is advocating for is hiring bouncers that strip all shoppers so they can do just that.

fireflash38 · 2025-08-04T16:14:55 1754324095

And you can be trespassed and prosecuted if you continue to violate.

ghurtado · 2025-08-04T16:06:09 1754323569

Sure. There's lots of things you could do, but you don't do them because they are wrong.

Might does not make right.

rjbwork · 2025-08-04T20:51:10 1754340670

How is it wrong to send my personal shopper? How is it wrong to have an agent act directly on my behalf?

It's like saying a web browser that is customized in any way is wrong. If one configures their browser to eagerly load links so that their next click is instant, is that now wrong?

ghurtado · 2025-08-04T21:17:37 1754342257

Here's a good rule of thumb: if you have to do it without other people knowing, because otherwise they wouldn't let you do it: chances are it's a bad thing to do.

_proofs · 2025-08-05T01:14:20 1754356460

if you send your personal shopper to a store, and the business is... closed for business, or refusing you entry, and you just... go in anyway.

that's called breaking and entering, and generally frowned upon -- by-passing the "closed sign".

itsdesmond · 2025-08-04T15:29:29 1754321369

[flagged]

dang · 2025-08-04T19:14:06 1754334846

Whoa, please don't post like this. We end up banning accounts that do.

https://news.ycombinator.com/newsguidelines.html

itsdesmond · 2025-08-04T22:52:28 1754347948

Aw, alright. I thought it was a funny way to make the point and I figured the yo momma structure was traditional enough to not be taken as a proper insult. Heard tho.

dang · 2025-08-06T06:10:21 1754460621

Thanks for this. Now that you explain your intent, I see the joke. Unfortunately, it's too easy for the intent not to come across in these forsaken little text blobs that we're all limited to here. A lot of it boils down to the absence of voice tone and body language.

indymike · 2025-08-05T12:22:58 1754396578

> I think that it’s pretty unambiguously reasonable to choose to not allow an unrelated business to operate inside of your physical storefront. I also think that maps onto digital services.

The line is drawn for me on my own computer. Even if I am in your building, my phone remains mine.

ugh123 · 2025-08-05T05:45:49 1754372749

What if my local ai model and system crawls, indexes and trains itself on content that only I can see and work with?

cma · 2025-08-04T20:30:47 1754339447

These are more like a store putting up a billboard or catalog and asking people to turn off their meta AI glasses nearby because the store doesn't want AI translating it on your behalf as a tourist.

itsdesmond · 2025-08-04T22:55:37 1754348137

It is not because the store does not expend any resources on the singular instance of the glasses capturing the content of the billboard. Web requests cost money.

tokioyoyo · 2025-08-05T16:14:41 1754410481

> Some stores do not welcome Instacart or Postmates shoppers

First time hearing this. Almost every single grocery store either supports Instacart, or has partnership with a similar service.

jasonjmcghee · 2025-08-04T14:43:24 1754318604

I think it's an issue of scale.

The next step in your progression here might be:

If / when people have personal research bots that go and look for answers across a number of sites, requesting many pages much faster than humans do - what's the tipping point? Is personal web crawling ok? What if it gets a bit smarter and tried to anticipate what you'll ask and does a bunch of crawling to gather information regularly to try to stay up to date on things (from your machine)? Or is it when you tip the scale further and do general / mass crawling for many users to consume that it becomes a problem?

fxtentacle · 2025-08-04T14:54:31 1754319271

Maybe we should just institutionalize and explicitly legalize the Internet Archive and Archive Team. Then, I can download a complete and halfway current crawl of domain X from the IA and that way, no additional costs are incurred for domain X.

But of course, most website publishers would hate that. Because they don't want people to access their content, they want people to look at the ads that pay them. That's why to them, the IA crawling their website is akin to stealing. Because it's taking away some of their ad impressions.

palmfacehn · 2025-08-04T15:32:44 1754321564

https://commoncrawl.org/

>Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.

fxtentacle · 2025-08-05T18:14:21 1754417661

The problem is that many websites and domains are missing from it.

stanmancan · 2025-08-04T15:50:19 1754322619

I have mixed feelings on this.

Many websites (especially the bigger ones) are just businesses. They pay people to produce content, hopefully make enough ad revenue to make a profit, and repeat. Anything that reproduces their content and steals their views has a direct effect on their income and their ability to stay in business.

Maybe IA should have a way for websites to register to collect payment for lost views or something. I think it’s negligible now, there are likely no websites losing meaningful revenue from people using IA instead, but it might be a way to get better buy in if it were institutionalized.

like_any_other · 2025-08-04T22:38:59 1754347139

If magazines and newspapers were once able to be funded by native ads, so can websites. The spying industry doesn't want you to know this, but ads work without spying too - just look at all the IRL billboards still around.

jay_kyburz · 2025-08-05T02:13:53 1754360033

Thanks for pointing this out! This is too often ignored!

stanmancan · 2025-08-05T02:02:47 1754359367

I never said anything about spying.

Magazines and newspapers were able to by funded by native ads because you couldn't auto-remove ads from their printed media and nobody could clone their content and give it away for free.

skeezyboy · 2025-08-05T09:47:11 1754387231

Newspapers sell information. Information is now trivial to copy and send across the globe, when 50 years ago it wasnt. And youre wrong about "nobody could clone their content", because they absolutely could, different editions were pressed throughout the day (morning, lunch, evening newspapers) at the peak of print media. The barrier to entry used to be a printing press, now its just an internet connection, print media has a hard time accepting that

like_any_other · 2025-08-06T07:05:56 1754463956

You can't remove ads that are part of a site's native HTML either - well, not easily, not without an AI determining what is an ad based on the content itself. The few ads I see despite uBlock are like that - something the website author themself included, and not by pulling it in from a different domain.

And those ads don't spy. They tend to be a jpg that functions as a link. That's why I mentioned spying.

theshackleford · 2025-08-05T02:43:47 1754361827

If ads were more respectful I wouldn’t have to remove them. Alas they can’t help themselves and so I do.

When ads were far less invasive, I had a lot more tolerance.

Now they want my data, they want to play audio, video, hijack the content, page etc.

Advertising scum can not be trusted to forever take more and more and more.

stanmancan · 2025-08-05T21:10:09 1754428209

I also have ad-blockers for the same reason. However, if you don't support the people or companies producing the media you consume then don't be surprised when they go out of business.

theshackleford · 2025-08-06T11:13:00 1754478780

> don't be surprised when they go out of business.

I’m ok with this. I support the media I truely want to see, and that media offers alternatives that are not ads.

For instance, I pay for YouTube premium. That said, many will not pay.

ivape · 2025-08-04T15:14:13 1754320453

Or websites can monetize their data via paid apis and downloadable archives. That's what makes Reddit the most valuable data trove for regular users.

ccgreg · 2025-08-04T19:29:39 1754335779

I don't think Reddit pays the people who voluntarily write Reddit content. Valuable to Reddit, I guess.

cj · 2025-08-04T14:54:28 1754319268

Doesn't o3 sort of already do this? Whenever I ask it something, it makes it look like it simultaneously opens 3-8 pages (something a human can't do).

Seems like a reasonable stance would be something like "Following the no crawl directive is especially necessary when navigating websites faster than humans can."

> What if it gets a bit smarter and tried to anticipate what you'll ask and does a bunch of crawling to gather information regularly to try to stay up to date on things (from your machine)?

To be fair, Google Chrome already (somewhat) does this by preloading links it thinks you might click, before you click it.

But your point is still valid. We tolerate it because as website owners, we want our sites to load fast for users. But if we're just serving pages to robots and the data is repackaged to users without citing the original source, then yea... let's rethink that.

Spivak · 2025-08-04T15:45:26 1754322326

You don't middle click a bunch of links when doing research? Of all the things to point to I wouldn't have thought "opens a bunch of tabs" to be one of the differentiating behaviors between browsing with Firefox and browsing with an LLM.

daviddanielng · 2025-08-07T05:43:15 1754545395

> simultaneously opens 3-8 pages (something a human can't do).

Can't you read?

fauigerzigerk · 2025-08-04T18:03:03 1754330583

>Doesn't o3 sort of already do this?

ChatGPT probably uses a cache though. Theoretically, the average load on the original sites could be far less than users accessing them directly.

skeezyboy · 2025-08-05T09:48:50 1754387330

how do you propose we do anything about this? any law you propose would have to be global

tr_user · 2025-08-04T15:56:22 1754322982

I saw someone suggest in another post, if only one crawler was visiting and scraping and everyone else reused from that copy I think most websites would be ok with it. But the problem is every billionaire backed startup draining your resources with something similar to a DOS attack.

npc_anon · 2025-08-05T09:43:09 1754386989

The problem in your logic is that all points starts wit "I".

You're not the only stakeholder in any of those interactions. There's you, a mediator (search or LLM), and the website owner.

The website owner (or its users) basically do all the work and provide all the value. They produce the content and carry the costs and risks.

The pre-LLM "deal" was that at least some traffic was sent their way, which helps with reach and attempts at monetization. This too is largely a broken and asymmetrical deal where the search engine holds all the cards but it's better than nothing.

A full LLM model that no longer sends traffic to websites means there's zero incentive to have a website in the first place, or it is encouraged to put it behind a login.

I get that users prefer an uncluttered direct answer over manually scanning a puzzling web. But the entire reason that the web is so frustrating is that visitors don't want to pay for anything.

danbruc · 2025-08-05T10:12:30 1754388750

But the entire reason that the web is so frustrating is that visitors don't want to pay for anything.

They are already paying, it is the way they are paying that causes the mess. When you buy a product, some fraction of the price is the ad budget that gets then distributed to websites showing ads. Therefore there is also nothing wrong with blocking ads, they have already been paid for, whether you look at them or not. The ad budget will end up somewhere as long as not everyone is blocking all ads, only the distribution will get skewed. Which admittedly might be a problem for websites that have a user base that is disproportionally likely to use ad blockers.

Paying for content directly has the problem that you can only pay for a selected few websites before the amount you have to pay becomes unreasonable. If you read one article on a hundred different websites, you can not realistically pay for a hundred subscriptions that are all priced as if you spent all your time on a single website. Nobody has yet succeeded in creating a web wide payment method that only charges you for the content that you actually consume and is frictionless enough to actually work, i.e. does not force you to make a conscious payment decisions for a few cents or maybe even only fractions of a cent for every link you click and is not a privacy nightmare collecting all the links you click for billing purposes.

Also if you directly pay for content, you will pay twice - you will pay for the subscription and you will still pay into the ad budget with all the stuff you buy.

shortformblog · 2025-08-05T11:30:48 1754393448

Publishers don't get paid a dime if you block the ad unless they are doing a direct ad transaction. Adtech has largely made that transaction a rarity for like 30 years.

It's not like newspapers where advertising is paid in full before publishers put stories online. It has not been that way for a long time.

Your reasoning for not accessing advertising reminds me of that scene in Arrested Development where, to hide the money they've taken out of the till, they throw away the bananas. It doesn't hide the transaction, it compounds the problem.

If publishers were getting paid before any ads ran the publishing business would be a hell of a lot stronger.

danbruc · 2025-08-05T14:39:00 1754404740

Of course, they will not get paid for me visiting the website if I block the ads, but that was not my point. People have already bought stuff and with that paid for the ad budget. And that money will be spent somewhere. Maybe someone else will see the ad that I blocked, someone who would otherwise not have seen it because the ad budget would have been exhausted. Or maybe the prices for ads go up because there are less impressions to sell. Only if companies would lower their ad budgets in response to ad blocking would there be less money to distribute. If that would be the case, then my argument would fail.

shortformblog · 2025-08-05T14:47:57 1754405277

Your point is illogical. It’s like you’ve invented a theory as to how companies advertise that has zero tethering to reality.

It’s especially stupid because it doesn’t include publishers in the equation at all. It’s just you looping over yourself attempting to validate your choice for running an ad blocker.

Admit you’re doing it because you want to callously screw over publishers. You certainly haven’t put their thoughts into consideration here.

To be clear: Run an ad blocker if you want, but stop acting as if you bought those ads. The chicken dinner I ate the other night has no say how I live my life after our transaction has ended.

danbruc · 2025-08-05T15:02:55 1754406175

If I buy an iPhone, does some fraction of the price contribute to Apple's ad budget? If so, where does that money end up? What would change if I did not block Apple ads?

shortformblog · 2025-08-05T15:10:47 1754406647

It’s up to them how they spend their money, not you. You can complain if they somehow damaged your product, they got your money unfairly, or were somehow doing something bad with your data, but at some point it is their money to spend how they see fit. They earned it, and they might spend it on advertising.

If I buy stuff at a grocery store, I can’t get a random bagger fired just because I feel like it. At some point the transaction ends and they ultimately continue to operate with or without your input.

danbruc · 2025-08-05T15:51:37 1754409097

I am neither complaining nor trying them what to do with their money, that looks like a complete deflection to me.

If I am buying Apple products, am I contributing to their ad budget? If so, where does that money end up? Is it likely that some of it will end up as ad revenue on some website? What difference does it make whether or not I block ads? Or the other way around, if I am visiting websites and look at Apple ads but do not buy Apple products, am I contributing to the ad revenue of the websites?

shortformblog · 2025-08-05T16:07:16 1754410036

Maybe in the cosmic sense you are, in that they have a giant pile of money, and you contributed a few pennies to it, but this is not how accounting works. Your transaction and their ad budget are separate things.

Also, advertising does other things than tell you to buy something, and it doesn’t always take the form of banner ads. Apple, for example, does a ton of brand awareness advertising. Affiliate marketing often targets direct transactions. Maybe your goal is to simply start a relationship that might someday lead to a really big purchase.

Often, in the era of SaaS, people advertise to existing customers. Apple does this—they have a TV service and a music service and a cloud service.

There are plenty of reasons for them to advertise after you bought the original product.

But your original point was that customers bought the ads. Maybe they didn’t! Maybe they were given funding by a VC firm and the company decided it wanted to build an audience. Maybe they want to advocate for a political issue.

I think the biggest problem with your argument is that it has tunnel vision and sees advertising as this one dimensional thing, when in reality it takes many forms. Plenty of those forms are bad, but it is not as simple as “I bought a product, now I never want to see an Apple ad ever again.” Many businesses (Amazon, eBay) make most of their money off of customers they’ve already advertised to that they advertise to again and again.

danbruc · 2025-08-05T16:47:13 1754412433

Well, I don't give a shit about the advertising goals of Apple or anyone else, that is why I block ads. And that is also completely irrelevant, the question was whether I am screwing over websites when I am using an ad blocker. I argue not, because as a consumer I still contribute to the ad budgets that become the ad revenue of the websites. What I am not doing when I block ads is influencing how the money gets distributed among all the websites, I can live with that. And if the money is not consumer money, so what? What do I have to do with companies distributing VC money among websites?

shortformblog · 2025-08-05T16:52:16 1754412736

LOL, you don’t. You really don’t. As I told you like four hours ago, ads are impression-based. Just because you bought something that helped them buy an ad doesn’t mean you did shit for my website.

In fact, you did the opposite.

danbruc · 2025-08-05T17:13:04 1754413984

I know that ads are based on impressions as I told you before, but my money still has to end up somewhere even if I am using an ad blocker. So where does it end up if not as ad revenue on some websites? You must not confuse the people paying for the ads and in turn for the ad revenue of websites by buying stuff with the people deciding how that money gets distributed among all the websites by looking at ads.

We can even go one step further, if anyone is screwing over websites, then that is the ad industry by not paying for blocked ads. I buy an iPhone and Apple takes some additional money from me to spend on advertising. I did not ask for that but I am fine with it. Now I expect Apple to spend the money they took from me on ads in order to support websites. But if the guy that Apple wants to show the ad that I paid for does not want to see it and blocks it, then I want Apple to respect that and still pay the website. I know, not going to happen, but do not put the blame on people blocking ads.

shortformblog · 2025-08-05T17:27:16 1754414836

You’re describing socialism (wealth redistribution to be exact). At this point, just make that money a tax and give it to the publishers directly. Cut out the middlemen.

danbruc · 2025-08-05T20:39:21 1754426361

Well, what is the difference, the ad budget fraction of the price is like a tax. I think given a choice most people would prefer to get their stuff a bit cheaper and not contribute to the ad budget. But we pay it and then the companies hand the money out to various parties to display ads creating the possibility of running a business on ad revenue. And in many cases I can ignore ads, I can not look at billboards, I can switch to a different channel during the commercial break, I can flip over the ad pages in newspapers and magazines but they still get paid. Only on the internet have we decided to only pay for ads when somebody actually looks at them. I just asked for the same thing on the internet, pay for the inclusion on the website, whether someone actually sees it or not. Not sure how that is socialism and wealth redistribute.

hahn-kev · 2025-08-05T16:35:36 1754411736

I feel like this could work if the payment was handled by your ISP. Content provider tells the ISP how much their content costs that there subscribers pay, and the ISP pays them. I already pay my ISP. The real problem is that it's kinda too late for this kind of change. And also the ISP would need to prevent their users from running up a bill that the ISP would be responsible for and without tracking them that's not possible.

ricardo81 · 2025-08-05T09:57:53 1754387873

Agreed.

Cloudflare released these insights showing the disparity between crawling/scraping and visits referred from the AI platforms.

https://radar.cloudflare.com/ai-insights#crawl-to-refer-rati...

yojo · 2025-08-04T14:28:06 1754317686

Ads are a problematic business model, and I think your point there is kind of interesting. But AI companies disintermediating content creators from their users is NOT the web I want to replace it with.

Let’s imagine you have a content creator that runs a paid newsletter. They put in lots of effort to make well-researched and compelling content. They give some of it away to entice interested parties to their site, where some small percentage of them will convert and sign up.

They put the information up under the assumption that viewing the content and seeing the upsell are inextricably linked. Otherwise there is literally no reason for them to make any of it available on the open web.

Now you have AI scrapers, which will happily consume and regurgitate the work, sans the pesky little call to action.

If AI crawlers win here, we all lose.

bee_rider · 2025-08-04T14:49:04 1754318944

I think it’s basically impossible to prevent AI crawlers. It is like video game cheating, at the extreme they could literally point a camera at the screen and have it do image processing, and talk to the computer through the USB port emulating, a mouse and keyboard outside the machine. They don’t do that, of course, because it is much easier to do it all in software, but that is the ultimate circumvention of any attempt to block them out that doesn’t also block out humans.

I think the business model for “content creating” is going have to change, for better or worse (a lot of YouTube stars are annoying as hell, but sure, stuff like well-written news and educational articles falls under this umbrella as well, so it is unfortunate that they will probably be impacted too).

yojo · 2025-08-04T15:24:19 1754321059

I don’t subscribe to technological inevitabilism.

Cloudflare banning bad actors has at least made scraping more expensive, and changes the economics of it - more sophisticated deception is necessarily more expensive. If the cost is high enough to force entry, scrapers might be willing to pay for access.

But I can imagine more extreme measures. e.g. old web of trust style request signing[0]. I don’t see any easy way for scrapers to beat a functioning WOT system. We just don’t happen to have one of those yet.

0: https://en.m.wikipedia.org/wiki/Web_of_trust

bee_rider · 2025-08-04T18:50:20 1754333420

> Cloudflare banning bad actors has at least made scraping more expensive, and changes the economics of it - more sophisticated deception is necessarily more expensive. If the cost is high enough to force entry, scrapers might be willing to pay for access.

I think this might actually point at the end state. Scraping bots will eventually get good enough to emulate a person well enough to be indistinguishable (are we there yet?). Then, content creators will have to price their content appropriately. Have a Patreon, for example, where articles are priced at the price where the creator is fine with having people take that content and add it to the model. This is essentially similar to studios pricing their content appropriately… for Netflix to buy it and broadcast it to many streaming users.

Then they will have the problem of making sure their business model is resistant to non-paying users. Netflix can’t stop me from pointing a camcorder at my TV while playing their movies, and distributing it out like that. But, somehow, that fact isn’t catastrophic to their business model for whatever reason, I guess.

Cloudflare can try to ban bad actors. I’m not sure if it is cloudflare, but as someone who usually browses without JavaScript enables I often bump into “maybe you are a bot” walls. I recognize that I’m weird for not running JavaScript, but eventually their filters will have the problem where the net that captures bots also captures normal people.

Barbing · 2025-08-05T06:11:04 1754374264

>Then they will have the problem of making sure their business model is resistant to non-paying users. Netflix can’t stop me from pointing a camcorder at my TV while playing their movies, and distributing it out like that. But, somehow, that fact isn’t catastrophic to their business model for whatever reason, I guess.

Interested to see some LLM-adverserial equivalent of MPAA dots![1]

[1] https://en.wikipedia.org/wiki/Coded_anti-piracy

skeezyboy · 2025-08-05T09:57:29 1754387849

Netflix CAN "stop you from pointing a camera at your TV and distributing it" because of copyright law.

account42 · 2025-08-05T13:25:03 1754400303

Which is also how AI scrapers should be solved. Papering over the issue with technological "solutions" only hurts real users.

skeezyboy · 2025-08-05T15:59:29 1754409569

in the UK at least, that has been recognized as fair use.

immibis · 2025-08-04T17:58:57 1754330337

Beating web of trust is actually pretty easy: pay people to trust you.

Yes, you can identify who got paid to sign a key and ban them. They will create another key, go to someone else, pretend to be someone not yet signed up for WoT (or pay them), and get their new key signed, and sign more keys for money.

So many people will agree to trust for money, and accountability will be so diffuse, that you won't be able to ban them all. Even you, a site operator, would accept enough money from OpenAI to sign their key, for a promise the key will only be used against your competitor's site.

It wouldn't take a lot to make a binary-or-so tree of fake identities, with exponential fanout, and get some people to trust random points in the tree, and use the end nodes to access your site.

Heck, we even have a similar problem right now with IP addresses, and not even with very long trust chains. You are "trusted" by your ISP, who is "trusted" by one of the RIRs or from another ISP. The RIRs trust each other and you trust your local RIR (or probably all of them). We can trace any IP to see who owns it. But is that useful, or is it pointless because all actors involved make money off it? You know, when we tried making IPs more identifying, all that happened is VPN companies sprang up to make money by leasing non-identifying IPs. And most VPN exits don't show up as owned by the VPN company, because they'd be too easy to identify as non-identifying. They pay hosting providers to use their IPs. Sometimes they even pay residential ISPs so you can't even go by hosting provider. The original Internet was a web of trust (represented by physical connectivity), but that's long gone.

Spivak · 2025-08-04T15:50:41 1754322641

It is inevitable, not because of some technological predestination but because if these services get hard-blocked and unable to perform their duties they will ship the agent as a web browser or browser add-on just like all the VSCode forks and then the requests will happen locally through the same pipe as the user's normal browser. It will be functionally indistinguishable from normal web traffic since it will be normal web traffic.

skeledrew · 2025-08-04T17:54:43 1754330083

Then personal key sharing will become a thing, similar to BugMeNot et al.

account42 · 2025-08-05T13:23:16 1754400196

A web of trust is not going to plug the analog hole gp already mentioned.

Meanwhile its going to fuck over real users.

subspeakai · 2025-08-04T16:10:01 1754323801

This is the fascinating case where I think this all goes - At some point costs come down and you can do this and bypass everything

shadowgovt · 2025-08-04T15:24:29 1754321069

> Otherwise there is literally no reason for them to make any of it available on the open web

This is the hypothesis I always personally find fascinating in light of the army of semi-anonymous Wikipedia volunteers continuously gathering and curating information without pay.

If it became functionally impossible to upsell a little information for more paid information, I'm sure some people would stop creating information online. I don't know if it would be enough to fundamentally alter the character of the web.

Do people (generally) put things online to get money or because they want it online? And is "free" data worse quality than data you have to pay somebody for (or is the challenge more one of curation: when anyone can put anything up for free, sorting high- and low-quality based on whatever criteria becomes a new kind of challenge?).

Jury's out on these questions, I think.

yojo · 2025-08-04T16:13:28 1754324008

Any information that requires something approximating a full-time job worth of effort to produce will necessarily go away, barring the small number of independently wealthy creators.

Existing subject-matter experts who blog for fun may or may not stick around, depending on what part of it is “fun” for them.

While some must derive satisfaction from increasing the total sum of human knowledge, others are probably blogging to engage with readers or build their own personal brand, neither of which is served by AI scrapers.

Wikipedia is an interesting case. I still don’t entirely understand why it works, though I think it’s telling that 24 years later no one has replicated their success.

SoftTalker · 2025-08-04T16:25:43 1754324743

Wikipedia works for the same reason open-source does: because most of the contributors are experts in the subject and have paid jobs in that field. Some are also just enthusiasts.

ndriscoll · 2025-08-04T20:02:43 1754337763

OpenStreetMap is basically Wikipedia for maps and is quite successful. Over 10M registered users and millions of edits per day. Lots of information is also shared online on forums for free. The hosting (e.g. reddit) is basically a commodity that benefits from network effects. The information is the more interesting bit, and people share it because they feel like it.

account42 · 2025-08-05T13:34:34 1754400874

> Any information that requires something approximating a full-time job worth of effort to produce will necessarily go away

Many people put more effort into their hobbies than into their "full time" job.

Some of it will go away but perhaps without the expectation that you can earn money more people will share freely.

> While some must derive satisfaction from increasing the total sum of human knowledge, others are probably blogging to engage with readers or build their own personal brand, neither of which is served by AI scrapers.

We don't have to make all business models that someone might want possible though.

> Wikipedia is an interesting case. I still don’t entirely understand why it works, though I think it’s telling that 24 years later no one has replicated their success.

Actually this model is quite common. There are tons of sources of free information curated by volunteers - most are just too niece to get to the scale of Wikipedia.

skeezyboy · 2025-08-05T10:00:54 1754388054

A large portion of "content" these days is copy/pasted shite so they can get views to get ad revenue, quite simply.