> I ended up with a combo of CPU+GPU on networked PCs, which was better than nothing but not ideal.
The biggest constraint I've seen with scaling up these simulations is maintaining coherence of population dynamics across all of the processing units. The less often population members are exchanged, the more likely you are to wind up with a decoupled population that's stuck in a ditch somewhere. Since modern CPUs can go so damn fast, you need to exchange members quite frequently. Memory bandwidth and latency are the real troublemakers.
You can spread a simulation across many networked nodes, but then your effective cycle time is potentially millions of times greater than if you keep it all in one socket. I think the newest multi-die CPUs hit the perfect sweet spot for these techniques (~100 ns latency domain).
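To make the "exchange frequency" point concrete, here's a minimal island-model GA sketch. It's not anyone's actual simulation: the OneMax fitness (count of 1-bits), tournament selection, ring migration of each island's best `k` members, and all function names are assumptions for illustration. The `interval` parameter is the knob being discussed: how many generations pass between exchanges. `island_divergence` is a rough, hypothetical proxy for how decoupled the islands have become.

```python
import random

def fitness(genome):
    # Toy objective (assumption): OneMax, i.e. count the 1-bits
    return sum(genome)

def mutate(genome, rate, rng):
    # Flip each bit independently with probability `rate`
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def step(island, rng, rate=0.02):
    # One generation: binary tournament selection followed by mutation
    new = []
    for _ in range(len(island)):
        a, b = rng.sample(island, 2)
        parent = a if fitness(a) >= fitness(b) else b
        new.append(mutate(parent, rate, rng))
    return new

def migrate(islands, k):
    # Ring migration: island i-1's best k replace island i's worst k
    bests = [sorted(isl, key=fitness, reverse=True)[:k] for isl in islands]
    out = []
    for i, isl in enumerate(islands):
        incoming = bests[i - 1]  # i == 0 wraps around to the last island
        survivors = sorted(isl, key=fitness, reverse=True)[:len(isl) - k]
        out.append(survivors + [list(g) for g in incoming])
    return out

def island_divergence(islands):
    # Mean pairwise Hamming distance between each island's best genome
    bests = [max(isl, key=fitness) for isl in islands]
    dists = [sum(x != y for x, y in zip(a, b))
             for i, a in enumerate(bests) for b in bests[i + 1:]]
    return sum(dists) / len(dists)

def run_islands(n_islands=4, size=20, length=64, generations=200,
                interval=10, k=2, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [step(isl, rng) for isl in islands]
        # interval=0 disables migration entirely (fully decoupled islands)
        if interval and gen % interval == 0:
            islands = migrate(islands, k)
    return islands
```

Comparing `island_divergence(run_islands(interval=1))` against `island_divergence(run_islands(interval=100))` is one way to poke at the claim. You'd generally expect rarer exchanges to let islands drift further apart, though a toy objective like OneMax may not show it dramatically. In a real multi-node setup, each `migrate` call is where the memory or network latency gets paid.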
> The biggest constraint I've seen with scaling up these simulations is maintaining coherence of population dynamics across all of the processing units
Agreed. For mine, I purposely avoided this problem by keeping the population and world relatively small (<100 creatures and a fairly finite space they could move in). Each PC handled one instance of a world, and after each generation there was a sharing step so that successful brains were available to every world for continued evolution.
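The per-generation sharing step described above can be sketched roughly like this. It's a hypothetical reconstruction, not the original code: `Creature`, `evaluate`, `evolve_world`, and `share_best` are made-up names, a "brain" is assumed to be a flat list of weights, and the world simulation is replaced by a toy scoring function. The worlds run one after another in a single process here; in the setup described, each `evolve_world` call would run on its own PC and `share_best` would be the network sync point.

```python
import random
from dataclasses import dataclass

@dataclass
class Creature:
    brain: list[float]   # hypothetical: brain as a flat weight vector
    fitness: float = 0.0

def evaluate(brain):
    # Toy stand-in (assumption) for a world simulation: reward weights near 0.5
    return -sum((w - 0.5) ** 2 for w in brain)

def evolve_world(world, rng, sigma=0.1):
    # One local generation: keep the top half, refill with mutated copies
    ranked = sorted(world, key=lambda c: c.fitness, reverse=True)
    parents = ranked[: len(ranked) // 2]
    children = []
    for p in ranked[: len(ranked) - len(parents)]:
        brain = [w + rng.gauss(0.0, sigma) for w in p.brain]
        children.append(Creature(brain, evaluate(brain)))
    return parents + children

def share_best(worlds, k):
    # Pool every world's top-k, then give each world copies of the global
    # top-k in place of its own weakest k members
    pool = []
    for world in worlds:
        pool.extend(sorted(world, key=lambda c: c.fitness, reverse=True)[:k])
    shared = sorted(pool, key=lambda c: c.fitness, reverse=True)[:k]
    new_worlds = []
    for world in worlds:
        keep = sorted(world, key=lambda c: c.fitness, reverse=True)[:len(world) - k]
        new_worlds.append(keep + [Creature(list(c.brain), c.fitness) for c in shared])
    return new_worlds

def run_worlds(n_worlds=4, size=10, n_weights=8, generations=50, k=2, seed=0):
    rng = random.Random(seed)
    worlds = []
    for _ in range(n_worlds):
        world = []
        for _ in range(size):
            brain = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
            world.append(Creature(brain, evaluate(brain)))
        worlds.append(world)
    for _ in range(generations):
        worlds = [evolve_world(w, rng) for w in worlds]  # one PC per world
        worlds = share_best(worlds, k)                   # per-generation sync
    return worlds
```

Because the exchange only happens once per generation and only `k` brains cross over, the communication cost stays small relative to the simulation work. That is what makes a small-population, one-world-per-PC layout tolerable over a network. A world can end up holding duplicates of its own best creatures after a sync; a real implementation might deduplicate.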
My dream was to scale this thing up significantly, but life intervened. Maybe someday.