This sounds kind of cool, but I'm more worried about the e-waste. A cheap device that sounds cool will get a tiny amount of use after the initial wow factor wears off or it doesn't fit someone's needs, and then it gets "recycled", thrown away, or lost. And is this feature in the new rePebble watches? I would rather have it there with a bigger battery.
From reading that, I'm not quite sure if they have anything figured out.
I actually agree, but her notes are mostly fluff with no real info in there, and I do wonder if they have anything figured out besides "collect spatial data" like ImageNet.
There are actually a lot of people trying to figure out spatial intelligence, but those groups are usually in neuroscience or computational neuroscience.
Here is a summary paper I wrote discussing how the entorhinal cortex, grid cells, and coordinate transformation may be the key: https://arxiv.org/abs/2210.12068 All animals are able to transform coordinates in real time to navigate their world and humans have the most coordinate representations of any known living animal. I believe human level intelligence is knowing when and how to transform these coordinate systems to extract useful information.
I wrote this before the huge LLM explosion and I still personally believe it is the path forward.
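To make the coordinate-transformation idea concrete, here is a toy sketch of my own (plain NumPy, not code from the paper): converting a landmark between egocentric (body-centered) and allocentric (world-centered) coordinates, which is roughly the transform that grid-cell and place-cell accounts revolve around.

    import numpy as np

    def ego_to_allo(landmark_ego, agent_pos, agent_heading):
        """Rotate a body-centered (egocentric) vector by the agent's heading,
        then translate by its position to get world-centered (allocentric) coords."""
        c, s = np.cos(agent_heading), np.sin(agent_heading)
        R = np.array([[c, -s],
                      [s,  c]])              # 2D rotation matrix
        return R @ landmark_ego + agent_pos

    def allo_to_ego(landmark_allo, agent_pos, agent_heading):
        """Inverse transform: world-centered coordinates back into the agent's frame."""
        c, s = np.cos(agent_heading), np.sin(agent_heading)
        R_inv = np.array([[ c, s],
                          [-s, c]])           # transpose = inverse for a rotation
        return R_inv @ (landmark_allo - agent_pos)

    # Toy example: agent at (2, 3) facing 90 degrees; a landmark 1 m straight ahead
    agent_pos = np.array([2.0, 3.0])
    heading = np.pi / 2
    ahead = np.array([1.0, 0.0])              # "1 m in front of me", egocentric
    print(ego_to_allo(ahead, agent_pos, heading))                 # -> approx [2., 4.]
    print(allo_to_ego(np.array([2.0, 4.0]), agent_pos, heading))  # -> approx [1., 0.]

The hard part, as the comment above says, is not the transform itself but knowing which frame to use and when to switch.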
> From reading that, I'm not quite sure if they have anything figured out.
> I actually agree, but her notes are mostly fluff with no real info in there and I do wonder if they have anything figured out besides "collect spatial data" like ImageNet.
Right. I was thinking about this back in the 1990s. That resulted in a years-long detour through collision detection, physically based animation, solving stiff systems of nonlinear equations, and a way to do legged running over rough terrain.
But nothing like "AI". More of a precursor to the analytical solutions of the early Boston Dynamics era.
Work today seems to throw vast amounts of compute at the problem and hope a learning system will come up with a useful internal representation of the spatial world. It's the "bitter lesson" approach. Maybe it will work. Robotic legged locomotion is pretty good now. Manipulation in unstructured situations still sucks. It's amazing how bad it is. There are videos of unstructured robot manipulation from McCarthy's lab at Stanford in the 1960s. They're not that much worse than videos today.
I used to make the comment, pre-LLM, that we needed to get to mouse/squirrel level intelligence rather than trying to get to human level abstract AI. But we got abstract AI first. That surprised me.
There's some progress in video generation which takes a short clip and extrapolates what happens next. That's a promising line of development. The key to "common sense" is being able to predict what happens next well enough to avoid big mistakes in the short term, a few seconds.
How's that coming along? And what's the internal world model, assuming we even know?
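To be clear about what "extrapolates what happens next" means structurally, here is a toy autoregressive rollout of my own, with a naive constant-velocity predictor standing in for the learned video model (real systems learn this mapping rather than hard-coding it):

    import numpy as np

    def predict_next(prev_frame, last_frame):
        # Stand-in for a learned model: naive per-pixel constant-velocity
        # extrapolation, clipped to valid intensities.
        return np.clip(2.0 * last_frame - prev_frame, 0.0, 1.0)

    def rollout(clip, n_future):
        """Autoregressively extend a short clip by n_future predicted frames."""
        frames = list(clip)
        for _ in range(n_future):
            frames.append(predict_next(frames[-2], frames[-1]))
        return frames

    # Toy "video": a bright square drifting right across a 32x32 frame
    clip = []
    for t in range(2):
        f = np.zeros((32, 32))
        f[12:20, 4 + 2 * t: 12 + 2 * t] = 1.0
        clip.append(f)

    extended = rollout(clip, n_future=5)
    print(len(extended))   # 7 frames: 2 observed + 5 predicted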
I share your surprise regarding LLMs. Is it fair to say that it's because language - especially formalised, written language - is a self-describing system?
A machine can infer the right (or expected) answer based on data. I'm not sure the same is true for how living things navigate the physical world - the "right" answer, insofar as one exists for your squirrel, is arguably Darwinian: "whatever keeps the little guy alive today".
> There's some progress in video generation which takes a short clip and extrapolates what happens next. That's a promising line of development. The key to "common sense" is being able to predict what happens next well enough to avoid big mistakes in the short term, a few seconds.
> How's that coming along? And what's the internal world model, assuming we even know?
That GTA demo isn't about control. The user, not the net, is driving.
That's more like the demos where someone trains on a scene and the neural net can make plausible extensions to the scene as you move the viewpoint. It's more spatial imagination, like the tool in Photoshop that fills in plausible but imaginary backgrounds.
It does handle collisions with the edge of the road. Collisions with other cars don't really work; they mostly disappear. One car splits in half in confusion.
The spatial part is making progress, but the temporal part, not so much.
> I used to make the comment, pre-LLM, that we needed to get to mouse/squirrel level intelligence rather than trying to get to human level abstract AI. But we got abstract AI first. That surprised me.
"AI" is not based on physical real world data and models like our brain. Instead, we chose to analyze human formal (written) communication. ("formal": actual face to face communication has tons of dimensions adding to the text representation of what is said, from tone, speed to whole body and facial expressions)
Bio-brains build a model from physical sensor data first and go from there; that's completely missing from "AI".
In hindsight, it's not surprising: we skipped that hard part (for now?). Working with symbols is what we've been doing with IT for a long time.
I'm not sure going all out on trying to base something on human intelligence, i.e. human neural networks, is a winning move. I see it as if we had been trying to create airplanes that flap their wings. For one, human intelligence already exists, and when you lean back and manage to look at how we do on small and large problems from an outside perspective, it has plenty of blind spots and disadvantages.
I'm afraid that if we were to manage hundred-percent human-level intelligence in an AI, we would be disappointed. Sure, it would be able to do a lot, but in the end, nothing we don't already have.
Right now that would also just be the abstract parts. I think the "moving the body" physical parts in relation to abstract commands would be the far more interesting part, but since current AI is not about using physical sensor data at all, never mind combining it with the abstract stuff...
You seem to be suggesting that current frontier models are only trained on text and not "sensor data". Multi-modal models are trained on the entire internet plus vast amounts of synthetic data. Images and videos are key inputs. Camera sensors are capable of capturing much more "sensor data" than the human eye. Neural networks are the worst way to model intelligence, except for all the other models.
As soon as you start a response like that you should just stop. After all, this is written communication, and what I wrote is plain to see right there.
When you need to start a response that way you should become self-aware that you are not responding to what the person you respond to wrote, but to your own ideas.
There is no need to "interpret" what other people wrote.
> Here is a summary paper I wrote discussing how the entorhinal cortex, grid cells, and coordinate transformation may be the key: https://arxiv.org/abs/2210.12068 All animals are able to transform coordinates in real time to navigate their world and humans have the most coordinate representations of any known living animal. I believe human level intelligence is knowing when and how to transform these coordinate systems to extract useful information.
Yes, you and the Mosers who won the Nobel Prize all believe that grid cells are the key to animals understanding their position in the world.
>There's a whole giant gap between grid cells and intelligence.
Please check this recent article on the state machine in the hippocampus based on learning [1]. The findings support the long-standing proposal that sparse orthogonal representations are a powerful mechanism for memory and intelligence.
[1] Learning produces an orthogonalized state machine in the hippocampus:
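As a toy illustration of why sparse codes are appealing here (my own sketch, not from the paper): random sparse high-dimensional patterns are nearly orthogonal to one another, so many of them can coexist with little interference.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, active, n_patterns = 2000, 40, 200    # 2% of units active per pattern

    # Each "memory" is a random sparse binary pattern over the same population
    patterns = np.zeros((n_patterns, dim))
    for p in patterns:
        p[rng.choice(dim, size=active, replace=False)] = 1.0

    # Cosine similarity between distinct patterns is close to 0 (near-orthogonal),
    # while each pattern is of course perfectly similar to itself.
    normed = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
    sims = normed @ normed.T
    off_diag = sims[~np.eye(n_patterns, dtype=bool)]
    print(f"mean overlap between different patterns: {off_diag.mean():.3f}")
    print(f"max overlap between different patterns:  {off_diag.max():.3f}")

With 2% of units active, the mean overlap between distinct patterns comes out around 0.02 here, versus 1.0 for a pattern compared with itself, which is, as I understand it, the kind of separation the "orthogonalized" framing refers to.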
Of course, but the mechanisms “remain obscure”. The entorhinal cortex is but one facet of this puzzle, and placement vs. head direction etc. must be understood beyond mere prediction. There are too many essential parts that are not understood, particularly senses and emotion, which play the tinkering precursors to evolutionary function and are excluded now, as well as the likelihood that prediction error and prediction are but mistaken precursor computational bottlenecks to unpredictability. Pushing AI into the 4% of a process materially identified as entorhinal is way premature.
This approach simply follows suit with the blundering reverse engineering of the brain in cognitive science, where material properties are seen in isolation and processes are deduced piecemeal. The brain can only be understood as a whole first. See Rhythms of the Brain or Unlocking the Brain.
There’s a terrifying lack of curiosity in the paper you posted: a kind of smug, synthetic rush to import code into a part of the brain that’s one directory among directories and whose redundancies stand as a warning that we get along without it.
Your view and theirs (the OSM) is too narrow. E.g., categorization is baked into the whole brain. How? This is one of thousands of processes that generalize materially across the entire brain. Isolating "learning" to the allocortex is incredibly misleading.
I kept reading, waiting for a definition of spatial intelligence, but gave up after a few paragraphs. After years of reading VC-funded startup fluff, writing that contains these words tends to put me off now: transform, revolutionize, next frontier, North Star.
She's funded by fascist oligarchs at Sequoia, not hard to connect the dots. Just listen to all the buzzwords and ancient Greek allegories though, totally not a bubble...
>3. Interactive: World models can output the next states based on input actions
>Finally, if actions and/or goals are part of the prompt to a world model, its outputs must include the next state of the world, represented either implicitly or explicitly. When given only an action with or without a goal state as the input, the world model should produce an output consistent with the world’s previous state, the intended goal state if any, and its semantic meanings, physical laws, and dynamical behaviors. As spatially intelligent world models become more powerful and robust in their reasoning and generation capabilities, it is conceivable that in the case of a given goal, the world models themselves would be able to predict not only the next state of the world, but also the next actions based on the new state.
That's literally just an RNN (not a transformer). An RNN takes a previous state and an input and produces a new state. If you add a controller on top, it is called model predictive control. The most extreme form I have seen is temporal difference model predictive control (TD-MPC). [0]
The question, as always, is: can we get any useful insights from all of that?
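To make the RNN-plus-controller point concrete, here is a bare-bones sketch with toy linear dynamics and a random-shooting planner of my own (this is not TD-MPC and not anything from the essay): the transition function plays the role of the world model, and MPC just rolls it forward under candidate action sequences and executes the best first action.

    import numpy as np

    def world_model(state, action):
        """RNN-style transition: next state from (previous state, action).
        Toy linear point-mass dynamics stand in for a learned network here."""
        A = np.array([[1.0, 0.1],
                      [0.0, 1.0]])    # position/velocity integrator
        B = np.array([0.0, 0.1])      # action pushes on velocity
        return A @ state + B * action

    def mpc_random_shooting(state, goal, horizon=15, n_candidates=256, rng=None):
        """Model predictive control by random shooting: sample action sequences,
        roll the world model forward, score each rollout against the goal,
        and return the first action of the best sequence."""
        if rng is None:
            rng = np.random.default_rng(0)
        candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
        best_cost, best_first_action = np.inf, 0.0
        for seq in candidates:
            s, cost = state.copy(), 0.0
            for a in seq:
                s = world_model(s, a)
                cost += np.sum((s - goal) ** 2)
            if cost < best_cost:
                best_cost, best_first_action = cost, seq[0]
        return best_first_action

    # Receding-horizon loop: drive a point mass from rest at x=0 toward x=1
    state, goal = np.array([0.0, 0.0]), np.array([1.0, 0.0])
    for _ in range(60):
        state = world_model(state, mpc_random_shooting(state, goal))
    print(state)   # position should end up near 1.0

Whether a learned transition of this shape gives us transferable insight about space, rather than just control, is exactly the open question.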
Trying to copy biological systems 1:1 rarely works, and copying biological systems doesn't seem to be required either. CNNs are somewhat brain-inspired, but only somewhat, and LLMs have very little architectural similarity to the human brain - other than being an artificial neural network.
What functional similarity LLMs do have to the human brain doesn't come from reverse-engineered details of how the human brain works - it comes from the training process.
There's nothing similar about LLMs and human brains. They're entirely divergent. Training a machine has nothing remotely to do with biological development.
Hard metrics: LLMs perform NLP, NLU and CSR tasks at humanlike levels.
Research findings: LLMs have and use world models. They use some type of abstract thinking - with internal representations that often correspond to human abstract concepts. Which adds up to a capability profile that's amusingly humanlike.
Humans, however, don't like that. They really don't. The AI effect is too strong, and it demands that humans must be Special. So some humans, when faced with the possibility that an AI might be doing the same thing their own brains do, resort to coping and seething.
These are not examples; they’re narrative (false) equivalencies.
Brains don’t perform natural language, CSR, etc.; those are cultural extensions separate from mental states. There are no functional equivalencies here.
There are many, many empirical disputes over function, e.g.
Aru et al., “The feasibility of artificial consciousness through the lens of neuroscience”, December 2023
This is super cool and I want to read up more on this, as I think you are right insofar as it is the basis for reasoning. However, it does seem more complex than just that. So how do we go from coordinate-system transformations to abstract reasoning with symbolic representations?
> if they have anything figured out besides "collect spatial data" like imagenet
I mean, she launched her whole career with ImageNet, so you can hardly blame her for thinking that way. But on the other hand, there's something bitter-lesson-pilled about letting a model "figure out" spatial relationships just by looking at tons of data. And tbh the recent progress [1] of worldlabs.ai (Dr. Fei-Fei Li's startup) looks quite promising for a model that understands stuff including reflections and stuff.
> looks quite promising for a model that understands stuff including reflections and stuff.
I got the opposite impression when trying their demo... [0]. Even in their examples some of these issues exist, like how objects stay a constant size despite moving, as if the parallax or depth information is missing. Not to mention that they show it walking on water lol
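For reference on the size point, the expectation comes from the standard pinhole-camera relationship (nothing specific to their model): projected size falls off inversely with depth, so an object twice as far away should appear half as large on screen.

    # Pinhole projection: on-screen size = focal_length * real_size / depth
    def projected_size(real_size_m, depth_m, focal_px=800.0):
        return focal_px * real_size_m / depth_m

    print(projected_size(1.0, 5.0))    # 160 px for a 1 m object at 5 m
    print(projected_size(1.0, 10.0))   # 80 px at 10 m: twice as far, half the size

If generated objects keep a constant on-screen size while the camera moves in depth, that is a fairly direct sign the model hasn't internalized this.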
As for reflections, I don't get that impression either. They seem extremely brittle to movement.
No, you just don't understand, don't you see! The ancient Greeks foresaw this centuries ago; we are just on the cusp of a world-changing moment. Can't you feel the buzzwords flow through you! First it's creating 7-second meme videos with too many arms, then it's right on to curing cancer and solving physics! Let the power of buzzwords calm your fears of a bubble.
To decipher whether there is anything like spatial intelligence, which is an oxymoronic term at most and redundant at least, one has to decipher the base units of the processes prior to their materialization in the allocortex, and to assign a careful concatenated/parametric categorization of what is unitized, where the processes focus into thresholds, etc. This frontier propaganda and the few arXiv/Nature papers here are too synthetic to lead anywhere of merit.
On battery life, I would love some kind of dumb-phone / ultra-low-power mode that we could set when we just want watch mode at certain times and nothing else. I imagine that would give us a week of battery.
They used to have this with "Power Reserve Mode" (PRM) which turned off everything except showing the time briefly when you pressed the side button.
If you got below 10% it would ask if you wanted to switch to this mode. You could also turn it on in the battery settings.
I've read that in this mode you could get a week or two on a full battery.
Sometime around watchOS 9 they replaced this with "Low Power Mode" (LPM). LPM reduced things like notifications, background processing, and update frequency enough to get about 50% more life out of a non-Ultra and 100% more on an Ultra 2.
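For a rough sense of what those percentages mean in hours (assuming Apple's advertised baselines of roughly 18 hours for a regular Apple Watch and 36 hours for an Ultra 2, figures not stated above): 18 h × 1.5 ≈ 27 h, and 36 h × 2 ≈ 72 h, i.e. a bit over a day versus roughly three days.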
PRM is gone from watchOS now, but the underlying functionality still works and there is still a way to access it. You have to turn the watch off. While it is off, if you press the crown it will briefly display the time.
If you wanted to frequently switch between normal operation and this low-power, time-only mode, it would be somewhat of a pain, since you'd have to turn the watch off and on to switch modes, and watchOS boots really, really slowly.
If you don't have to quickly switch between modes though it might be reasonable.
On the other hand, I just use my cheap Timex (with years of battery life) unless I need/want the more advanced features of the Apple Watch Ultra. I've been of two minds about the Ultra (and the Apple Watch generally). When I'm really using it like on a hike, I really like it. For more day to day stuff I mostly have notifications turned off and I rarely get legit phone calls when my phone isn't handy.
Mobile phones two decades ago lasted for weeks. I don't remember how many, but long enough that we constantly lost our (expensive) charging cables. I believe battery life went down as fast as screens got bigger (and more colorful).
Full Time and Part Time roles
Distark is hiring. We're an edutainment brand building learning videos for education. Our small, passionate team creates animated shows and learning tools that help kids (ages 3-9) fall in love with curiosity and real-world learning. We use custom automation to speed up everything from story writing to animation. See what we're making: https://www.youtube.com/watch?v=c46VaM_VZGU
Open Roles (REMOTE, non US):
Animators (2D/3D or hybrid—experience with AI tools a plus)
Junior Software Developers (Python, Node, JS, or open to learning new tech)
Interns (all backgrounds, generalists, tech, or creative)
We are also looking for writers (different kind of hacking)
Important: You must be a parent (of a child of any age). We want people who care deeply about kids and learning. Lived experience as a parent is essential; our mission is to build things real families want.
Why join us?
Fully remote, async-friendly
Ship real things that impact how kids learn
Fast, creative, zero-corporate-BS environment
Direct access to founders; real ownership
Opportunity to shape our tools and shows from the ground up
To apply: Email jobs [at] studyturtle.com with your background, a few sentences about your kids, and why this mission excites you. Please include links to any relevant work.
We are the hiring company and will reply to all genuine applicants. No recruiters, no agencies, please.
Some feedback for you: the position seemed interesting to me, but this requirement is a huge turn off. I'm not emailing you information about my kids to apply for a job. If I felt this way, it's possible other people did too. You might want to rethink how you're communicating this.
Nice work on offline.kids and kudos for tackling screen-free play. I've been playing in a similar space. If you want something complementary for the inevitable “but why?” moments, you might like StudyTurtle Ask (https://studyturtle.com/ask). It’s a free, no-signup AI Q&A tuned for 3–9 year-olds with:
Strict age calibration (matching phrasing and examples to each developmental level)
Concrete analogies (“volcanoes are like shaken soda bottles”) and kitchen-table experiments you can actually do
Depends on how the law is written. The primary purpose is to protect parents, and in some jurisdictions favoring parents may actually be welcome exactly because of that.
Either way, it should not be illegal to require parenting experience. If you can demonstrate that experience without having children of your own (because of much younger siblings, fostering, or perhaps working as a caregiver or teacher), then maybe you qualify.