> Our datasets and models today [2055] look like a joke. Both are somewhere arou...

ramblerman · on Aug 26, 2023

I think you are limiting yourself by thinking of the dataset of the future as just being more and bigger images.

Perhaps it will be trained on whole videos, or on a combination of different inputs from agents that move about in the real world or in a video game.

Lucasoato · on Aug 26, 2023

Maybe the real game changer in the future will be the ability to train the same model on very different kinds of input: video, images, text, audio... Imagine, too, that all the data-cleaning tasks are already automated: you just feed the model PDFs and a support model automatically extracts all the relevant metadata... or you'll probably just be able to select a set of books from an online library and your model will train on them as well (for a non-trivial subscription, of course, lol).

ben_w · on Aug 26, 2023

10e6*400e6/8e9/365/18 = 76 images per person per waking hour; it's not implausible given how many cameras there are and how many moments people might snap to share with remote friends — I can easily believe we'll have always-on video chat with multiple people in AR glasses by that point.
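The back-of-the-envelope estimate above can be reproduced directly. This is a minimal sketch under stated assumptions: the two leading factors (10e6 and 400e6) come from the truncated quote and their meaning isn't spelled out in the thread; 8e9 is read as world population, the division by 365 as spreading the total over one year, and 18 as waking hours per day.

```python
# Reproduce the estimate from the comment.
# Assumptions: 10e6 and 400e6 are the factors from the (truncated) quote,
# spread over one year; 8e9 ~ world population; 18 waking hours per day.
TOTAL_IMAGES = 10e6 * 400e6          # 4e15 images in total
POPULATION = 8e9
DAYS_PER_YEAR = 365
WAKING_HOURS_PER_DAY = 18

per_person_per_waking_hour = (
    TOTAL_IMAGES / POPULATION / DAYS_PER_YEAR / WAKING_HOURS_PER_DAY
)
# The same rate expressed as an interval between images.
seconds_between_images = 3600 / per_person_per_waking_hour

print(f"{per_person_per_waking_hour:.0f} images per person per waking hour")
print(f"about one image every {seconds_between_images:.0f} seconds")
```

That works out to roughly one image every 47 seconds per waking person, which is why always-on video is a plausible source: even a 1 frame-per-second stream yields 3,600 frames an hour.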

anonzzzies · on Aug 26, 2023

Most images are not shared, though; they're just snapped. In the past you had photo albums no one ever looked at, and there weren't that many pictures in them; now people (old and young) take hundreds of pictures at any moment, often on iPhones by holding the shutter button so it fires off hundreds of them in a few seconds.

ben_w · on Aug 26, 2023

> Most images are not shared though

Not yet.

As the joke goes:

People in the 60s:

I better not say that or the government will wiretap my house

People today:

Hey wiretap, do you have a recipe for pancakes?

cma · on Aug 26, 2023

Maybe you won't receive your "world coin" universal income dividend unless you livestream 24/7.

djantje · on Aug 26, 2023

Maybe, but the input in 2055 will more likely take the form of continuous, real-time data streams.

powera · on Aug 26, 2023

No, there won't. I must assume he is exaggerating for the clicks.

sroussey · on Aug 26, 2023

Generate as many as you need.

sroussey · on Aug 26, 2023

Oh, also curious… today, how many individual image frames from video are there just from Tesla vehicles?

alpaca128 · on Aug 26, 2023

Training models on generated content degrades them over time.
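One commonly cited mechanism behind this claim can be shown with a toy simulation of our own choosing (not something proposed in the thread): a Gaussian is repeatedly refit by maximum likelihood to a small sample drawn from the previous generation's fit. The small-sample fit tends to underestimate the spread, so over many generations the distribution narrows and its tails are lost. The sample size, generation count, and seed are arbitrary assumptions.

```python
import random
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood fit: sample mean and (biased, divide-by-n) std."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


def self_training_chain(generations=200, n=10, seed=0):
    """Each generation is fitted only to samples drawn from the previous fit."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the "real" data
    history = [sigma]
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu, sigma = fit_gaussian(samples)
        history.append(sigma)
    return history


history = self_training_chain()
for g in (0, 10, 50, 100, 200):
    print(f"generation {g:3d}: sigma = {history[g]:.3g}")
```

With only 10 samples per generation the standard deviation shrinks by many orders of magnitude over 200 generations; larger samples slow the collapse but do not remove the downward bias.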

Philpax · on Aug 26, 2023

The generated data can come from other sources: for example, pretraining on rendered CG imagery is quite popular in the computer vision world, especially for problems where acquiring ground-truth data in the real world is difficult.
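A minimal sketch of why rendered data is attractive: when you draw the image yourself, the label is known exactly. Here the "renderer" is a hypothetical toy (`render_disk`, `synthetic_dataset`) that rasterizes a filled disk; real pipelines typically use 3D engines, but the principle is the same: the scene parameters used to produce the image are the ground truth.

```python
import random


def render_disk(size, cx, cy, r):
    """Rasterize a filled disk; returns a size x size grid of 0/1 pixels."""
    return [
        [1 if (x - cx) ** 2 + (y - cy) ** 2 <= r * r else 0 for x in range(size)]
        for y in range(size)
    ]


def synthetic_dataset(n, size=32, seed=0):
    """Yield (image, label) pairs; the label is the exact scene parameters."""
    rng = random.Random(seed)
    for _ in range(n):
        r = rng.randint(3, size // 4)
        # Keep the whole disk inside the frame.
        cx = rng.randint(r, size - 1 - r)
        cy = rng.randint(r, size - 1 - r)
        image = render_disk(size, cx, cy, r)
        yield image, {"center": (cx, cy), "radius": r}


for image, label in synthetic_dataset(3):
    area = sum(map(sum, image))
    print(label, "pixels on:", area)
```

No annotator is involved: center and radius are exact by construction, which is the property that makes rendered data useful when real-world labels are expensive or ambiguous.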

pyinstallwoes · on Aug 26, 2023

Yet science fiction pushes civilization towards novelty