> We believe in training our models using diverse and high-quality data. This in...

bitpush · 2025-07-17T21:24:53 1752787493

When Apple inevitably partners with OpenAI or Anthropic, which by their definition isnt doing "ethical crawling", I wonder how I should be reading that.

jhickok · 2025-07-17T21:38:10 1752788290

They already partnered with OpenAI, right?

DSingularity · 2025-07-17T21:49:19 1752788959

To use their APIs at a discount, so what?

JimDabell · 2025-07-17T22:12:55 1752790375

Apple aren’t paying OpenAI anything:

https://www.bloomberg.com/news/articles/2024-06-12/apple-to-...

sharkjacobs · 2025-07-17T22:23:09 1752790989

That's a big discount

aydyn · 2025-07-18T07:31:22 1752823882

They paid in exposure, unironically.

badwolf · 2025-07-17T22:26:04 1752791164

That's quite a discount! ;)

arthurcolle · 2025-07-18T02:58:39 1752807519

The art of the deal

wmf · 2025-07-17T22:29:23 1752791363

In theory Apple could provide their training data to be used by OpenAI/Anthropic.

bitpush · 2025-07-17T22:31:55 1752791515

It isn't "apple proprietary" data to give it to OpenAI.

Also the bigger problem is, you can't train a good model with smaller data. The model would be subpar.

bigyabai · 2025-07-17T23:06:33 1752793593

"Good artists copy; great artists steal"

- Famous Dead Person

brookst · 2025-07-18T00:09:34 1752797374

I mean they also buy from companies with less ethical supply chain practices than their own. I don’t know that I need to feel anything about that beyond recognizing there’s a big difference between exercising good practices and refusing to deal with anyone who does less.

fridder · 2025-07-17T22:23:53 1752791033

Same way as the other parts of their supply chain I suppose.

darkoob12 · 2025-07-18T07:38:42 1752824322

You shouldn't believe Big Tech on their PR statements.

They are decades behind in AI. I have been following AI research for a long time. You can find best papers published by Microsoft, Google, Facebook in past 15 years but not Apple. I don't know why but they didn't care about AI at all.

I would say this is PR to justify their AI state.

ACCount36 · 2025-07-18T08:56:31 1752828991

Apple used to be at the edge of AI. They shipped Siri before "AI assistant" went mainstream, they were one of the first to ship an actual NPU in consumer hardware and put neural networks into features people use. They were spearheading computational photography. They didn't publish research, they're fucking Apple, but they did do the work.

And then they just... gave up?

I don't know what happened to them. When AI breakthrough happened, I expected them to put up a fight. They never did.

lynx97 · 2025-07-18T11:53:43 1752839623

> I don't know what happened to them.

Tim Cook happened. The fish rots from the head down.

LoganDark · 2025-07-21T12:54:41 1753102481

It's more like "Steve Jobs died". I'm not sure anyone could have replaced him.

lynx97 · 2025-07-23T16:48:41 1753289321

Thats a lame excuse. And a strange attempt on deification.

The predecessor is no reason for the current head to slack off.

mlnj · 2025-07-18T09:59:32 1752832772

>I don't know what happened to them. When AI breakthrough happened, I expected them to put up a fight. They never did.

Apple always had the luxury of time. They work heavily on integrating deeply into their ecosystems without worrying about the pace of the latest development. eg. Widgets were a 2023 feature for iOS. They do it late, but do it well.

The development in the LLM space was and is too fast for Apple to compete in. They usually pave their own path and stay in their lane as a leader. The impact on Apple's brand image will be tarnished if Google, Meta, OpenAI, MS all leapfrog Apple's models every 2-3 months. That's just not what the Apple brand is associated with.

simonw · 2025-07-17T22:19:57 1752790797

One problem with Apple's approach here is that they were scraping the web for training data long before they published the details of their activities and told people how to exclude them using robots.txt

dijit · 2025-07-17T22:40:01 1752792001

Uncharitable.

Robots.txt is already the understood mechanism for getting robots to avoid scraping a website.

simonw · 2025-07-17T22:52:57 1752792777

People often use specific user agents in there, which is hard if you don't know what the user agents are in advance!

lxgr · 2025-07-18T02:05:03 1752804303

That seems like a potentially very useful addition to the robots.txt "standard": Crawler categories.

Wanting to disallow LLM training (or optionally only that of closed-weight models), but encouraging search indexing or even LLM retrieval in response to user queries, seems popular enough.

wat10000 · 2025-07-17T22:56:43 1752793003

If you're using a specific user agent, then you're saying "I want this specific user agent to follow this rule, and not any others." Don't be surprised when a new bot does what you say! If you don't want any bots reading something, use a wildcard.

lxgr · 2025-07-18T02:06:46 1752804406

Yes, but given the lack of generic "robot types" (e.g. "allow algorithmic search crawlers, allow archival, deny LLM training crawlers"), neither opt-in nor opt-out seems like a particularly great option in an age where new crawlers are appearing rapidly (and often, such as here, are announced only after the fact).

simonw · 2025-07-17T23:46:28 1752795988

Sure, but I still think it's OK to look at Apple with a raised eyebrow when they say "and our previously secret training data crawler obeys robots.txt so you can always opt out!"

wat10000 · 2025-07-18T13:54:39 1752846879

I've been online since before the web existed, and this is the first time I've ever seen this idea of some implicit obligation to give people advance notice before you deploy a crawler. Looks to me like people are making up new rules on the fly because they don't like Apple and/or LLMs.

simonw · 2025-07-18T16:12:47 1752855167

I stand by what I said.

Apple are saying you can opt out of their training data collection using robots.txt.

But... they collected their training data before they told people how to opt out.

I don't understand why me pointing that out as "eyebrow raising" is controversial here.

hn_go_brrrrr · 2025-07-18T16:44:08 1752857048

It's not controversial, it's just not how the ecosystem works. There has never been an expectation that someone make a notification about impending crawling.

It might be nice if there were categories that well-behaved bots could follow, as noted above, but even then the problem exists for bots doing new things that don't fall into existing categories.

simonw · 2025-07-18T17:08:38 1752858518

My complaint here isn't what they did. It's that they explain it as "here's how to opt out" when the information was too late to allow people to opt out.

I think that's disingenuous of them.

wat10000 · 2025-07-18T18:33:36 1752863616

It's been common knowledge for anyone running a web server since 1994.

simonw · 2025-07-18T19:12:23 1752865943

I don't think you are reading my posts in full.

pjmlp · 2025-07-18T13:27:03 1752845223

Assuming well behaved robots.

conradev · 2025-07-18T04:01:32 1752811292

They documented it in 2015: https://www.macrumors.com/2015/05/06/applebot-web-crawler-si...

astrange · 2025-07-17T23:26:14 1752794774

> Using our web crawling strategy, we sourced pairs of images with corresponding alt-texts.

An issue for anti-AI people, as seen on Bluesky, is that they're often "insisting you write alt text for all images" people as well. But this is probably the main use for alt text at this point, so they're essentially doing annotation work for free.

simonw · 2025-07-17T23:47:50 1752796070

I think it is entirely morally consistent to provide alt text for accessibility even if you personally dislike it being used to train AI models.

astrange · 2025-07-18T00:15:48 1752797748

It's fine if you want to, but I think they should consider that basically nobody is reading it. If it was important for society, photo apps would prompt you to embed it in the image like EXIF.

Computer vision is getting good enough to generate it; it has to be, because real-world objects don't have alt text.

simonw · 2025-07-18T00:54:38 1752800078

I actually use Claude to generate the first draft of most of my alt text, but I still do a manual review of it because LLMs usually don't have enough contents to fully understand the message I'm trying to convey with an image: https://simonwillison.net/2025/Mar/2/accessibility-and-gen-a...

simonw · 2025-07-18T03:21:22 1752808882

Context not content.

lxgr · 2025-07-18T02:01:04 1752804064

Why would photo apps do what's "important for society"?

Annotating photos takes time/effort, and I could totally imagine photo apps being resistant to prompting their users for that, some of which would undoubtedly find it annoying, and many more confusing.

Yet I don't think that one can conclude from that that annotations aren't helpful/important to vision impaired users (at least until very recently, i.e. before the widespread availability of high quality automatic image annotations).

In other words, the primary user base of photo editors isn't the set of people that would most benefit from it, which is probably why we started seeing "alt text nudging" first appear on social media, which has both producer and consumer in mind (at least more than photo editors).

astrange · 2025-07-18T05:32:03 1752816723

> Why would photo apps do what's "important for society"?

One would hope they're responsive to user demands. I should say Lightroom does have an alt text field, but like phone camera apps don't.

Apple is genuinely obsessed with accessibility (but bad at social media) and I think has never once advocated for people to describe their photos to each other.

barbazoo · 2025-07-18T02:25:32 1752805532

> An issue for anti-AI people, as seen on Bluesky, is that they're often "insisting you write alt text for all images" people as well. But this is probably the main use for alt text at this point, so they're essentially doing annotation work for free.

How did you come to the conclusion that those two groups overlap so significantly?

Karrot_Kream · 2025-07-18T21:13:56 1752873236

This is a well known fact. A bunch of AI researchers tried to migrate to the platform from Twitter but got a ton of hate and death threats from other users so they went back. Bluesky has a pretty strong anti-AI bias and the community of folks talking about it despite that is very small.

astrange · 2025-07-18T05:32:45 1752816765

Well that's easy, I read their posts where they say it.

barbazoo · 2025-07-18T15:15:04 1752851704

So you found a couple people expressing this conflicting view and assumed it applies to a larger group? Doesn’t sound very reliable to me but I see this all the time and it makes sense if you look at it as a mechanism to explain the world .

godelski · 2025-07-18T09:34:47 1752831287

  > this is probably the main use for alt text at this point

Alt text gives you 2k characters. All I gotta say is there's quite a bit of poisoned data

aydyn · 2025-07-18T07:30:36 1752823836

Respect, but its going to be terrible compared to every other company. You can only hamstring yourself so much.

epolanski · 2025-07-18T02:14:06 1752804846

Respect actions, not words and PR.

bigyabai · 2025-07-17T23:51:18 1752796278

Gotta polish that fig-leaf to hide Apple's real stance towards user privacy: arstechnica.com/tech-policy/2023/12/apple-admits-to-secretly-giving-governments-push-notification-data/

> Apple has since confirmed in a statement provided to Ars that the US federal government "prohibited" the company "from sharing any information,"

brookst · 2025-07-18T00:11:49 1752797509

I mean if you throw out all contrary examples, I suppose you are left with the simple lack of nuance you want to believe

bigyabai · 2025-07-18T00:49:18 1752799758

All examples contrary to what? Admitting to being muzzled by feds?

Take all the space you need to lay out your contrary case. Did the San Bernadino shooter predict this?

brookst · 2025-07-18T13:26:44 1752845204

You literally said that we should disregard this example and focus on the “real” situation as evidenced by a different example.

It is exactly the same thing as saying “if you ignore the heads, these coins really always come up tails”.

Does the Chewbacca argument method ever work these days?