
It's as difficult for a serverless provider to grow as it was for CPUs before GPUs came along.

Many companies overinvest in fully-owned hardware rather than renting from clouds. Owning hardware means you underwrite the cost of unrented inventory, and it limits your ability to scale. H100 rental pricing is now lower than any self-hosted option, even before factoring in TCO & headcount.
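
As a very rough illustration of that math, here's a minimal sketch; the $2.25/hr rate is the one quoted below, but the purchase cost, overhead, depreciation window, and utilization are illustrative assumptions, not real quotes:

    # Rough rent-vs-buy sketch for a single H100. The rental rate comes from
    # this thread; the purchase cost, overhead, depreciation window, and
    # utilization are illustrative assumptions, not real quotes.
    purchase_cost = 30_000.0   # assumed up-front cost per GPU (USD)
    overhead_per_hour = 0.50   # assumed power, colo, and ops overhead (USD/hr)
    rental_rate = 2.25         # on-demand cloud rate quoted below (USD/hr)
    utilization = 0.60         # fraction of hours the GPU does useful work
    hours = 3 * 24 * 365       # 3-year depreciation window

    own = (purchase_cost / hours + overhead_per_hour) / utilization
    rent = rental_rate         # renting: you only pay for the hours you use

    print(f"owning:  ~${own:.2f} per useful GPU-hour")
    print(f"renting:  ${rent:.2f} per useful GPU-hour")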

(Disclaimer: I work at a GPU cloud, Voltage Park -- with 24k H100s as low as $2.25/hr [0] -- but Fly.io is not the only company I've noticed buying hardware when renting might have saved some $$$)

[0] https://dashboard.voltagepark.com/


Absolutely - GPUs are definitely not a liquid asset. As someone who works at a GPU neocloud provider (Voltage Park), I can say server assets at scale face huge slippage: you can buy for $1 and get quotes for $1.50, but only be able to sell for $0.60.


Hello! If you're interested in monetizing those GPUs, I'd be happy to rent them (all 400!) and offer those to customers of the cloud I work at :)

jonathan [at] tensordock.com


If you want 512 H100s connected with infiniband: https://lambdalabs.com/service/gpu-cloud/1-click-clusters


Congrats on the launch! This is huge, and it's really cool to see a cloud provider moving in this direction - auction pricing for customers so that you always know you're getting the best deal on the market, while providing 100% utilization for you :)

I'm curious what some of the numbers mean: e.g. what does 688/1464 GPUs available indicate in the left gray box? What about there being 1040 GPUs in light gray, and 8 in dark gray?


Thanks Jonathan! Awesome to hear that you're excited about the auction.

Great questions on the auction system dashboard. The left gray box indicates how many GPUs are available for on-demand access. So in your specific example, 688/1464 GPUs means that 688 GPUs (out of a total available pool of 1464 H100s) are ready to be rented. The 1040 and 8 GPU figures indicate GPUs that are currently allocated to customers.


Edit to add: the 1040 GPUs are orders that are below the current market price.
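
For anyone curious how a uniform-price auction could clear orders like these, here's a minimal sketch -- the bids are made up, and this is not necessarily the mechanism behind the dashboard above:

    # Minimal uniform-price auction sketch: bids are made up, and this is not
    # necessarily how the dashboard above actually clears orders.
    def clear_auction(bids, capacity):
        """bids: list of (price_per_gpu_hr, num_gpus). Highest bids fill first."""
        filled, queued, remaining = [], [], capacity
        for price, gpus in sorted(bids, key=lambda b: -b[0]):
            take = min(gpus, remaining)
            if take:
                filled.append((price, take))
                remaining -= take
            if gpus - take:
                queued.append((price, gpus - take))       # below market price: waits
        market_price = filled[-1][0] if filled else None  # lowest filled bid clears
        return market_price, filled, queued

    print(clear_auction([(2.40, 256), (2.10, 512), (1.80, 1040)], capacity=768))
    # -> (2.1, [(2.4, 256), (2.1, 512)], [(1.8, 1040)])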


Hmm, I did include a training workload as the second chart. My test workload was relatively small, so I guess that if the workload I ran spends comparatively less time on the GPU than on the CPU, then giving equal CPU to all workloads would be an equalizing factor.

But even looking at the Lambda Labs benchmarks, I am surprised that the H100 PCIe barely outperforms the A100 SXM, for example -- and it is meant to be a replacement for the A100 PCIe. A 20% generational improvement, yes, but I would have expected more?


>> My test workload was relatively small

This is the game changer. More memory and more interconnect speed = better

>> H100 PCIe barely outperforms the A100 SXM

This is the better interconnect... it's only useful if you're using it. If you can fit your workload in the 80 GB of the H100, then the SXM becomes far less useful.
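
To make the "does it fit in 80 GB" point concrete, here's a back-of-the-envelope check; the per-parameter multiplier is a standard rule of thumb and the activation figure is an arbitrary placeholder, not a measurement from the benchmark above:

    # Back-of-the-envelope check of whether a training run fits in one 80 GB GPU.
    # Rule of thumb for mixed-precision Adam: ~16 bytes of state per parameter
    # (bf16 weights + grads, fp32 master weights + two optimizer moments),
    # before activations. The activation figure is an arbitrary placeholder.
    def fits_in_one_gpu(params_billions, activation_gb=10.0, gpu_gb=80.0):
        state_gb = params_billions * 16   # ~16 GB of training state per 1B params
        total_gb = state_gb + activation_gb
        return total_gb, total_gb <= gpu_gb

    for size in (1, 3, 7, 13):
        total, fits = fits_in_one_gpu(size)
        print(f"{size}B params: ~{total:.0f} GB ->",
              "fits" if fits else "needs multiple GPUs (interconnect matters)")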


Oops, just noticed the link isn’t clickable, here you go! https://tensordock.com/benchmarks


Utilization is not going to be 100% -- ever. Discounts are often given, servers break down, etc.


Thanks! :)

We do have a managed container hosting service. It's built on our own backend that auto-scales nodes for you when average GPU utilization surpasses a certain threshold, but it's not K8s -- which would have been a pain to configure given the distributed nature of our servers.

https://dashboard.tensordock.com/deploy_container
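
To make the auto-scaling idea concrete, here's a minimal sketch of a utilization-driven loop; the thresholds and callback names are hypothetical, and this isn't our actual backend code, just the general shape:

    import time
    from typing import Callable, List

    # Hypothetical utilization-driven autoscaler: thresholds, callback names,
    # and overall shape are assumptions for illustration, not the real backend.
    def autoscale(get_utils: Callable[[], List[float]],
                  add_node: Callable[[], None],
                  remove_node: Callable[[], None],
                  scale_up_at: float = 0.85,
                  scale_down_at: float = 0.30,
                  poll_s: float = 60,
                  rounds: int = 3) -> None:
        for _ in range(rounds):
            utils = get_utils()               # per-node average GPU utilization
            avg = sum(utils) / len(utils)
            if avg > scale_up_at:
                add_node()                    # provision another GPU node
            elif avg < scale_down_at and len(utils) > 1:
                remove_node()                 # drain and release an idle node
            time.sleep(poll_s)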


Hi! Jonathan from TensorDock here :)

We have our own supply base (sourced through https://tensordock.com/host), operate some of our own servers, and are not related to any other marketplace :)

We think we have better security & reliability than Vast.ai due to virtualization rather than Dockerization, as well as more strict access controls [1]. Additionally, you can run Windows VMs on us if you want :)

[1] https://tensordock.com/security


Jonathan from TensorDock (https://tensordock.com/) here - we listed two of our A100 and H100 clusters on the site.

The InfiniBand on our clusters (can't speak for others) is 8x 400 Gbps. Most customers training foundation models are able to fully utilize that fabric in parallel.
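
For reference, the arithmetic behind that aggregate figure, assuming it is per node and ignoring protocol overhead:

    # Unit conversion only, assuming the 8x 400 Gbps figure is per node.
    links, gbps_per_link = 8, 400
    aggregate_gbps = links * gbps_per_link   # 3200 Gb/s per node
    print(aggregate_gbps, "Gb/s =", aggregate_gbps / 8, "GB/s per node")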


Which HCAs are enabling that? You're using eight 4-link QSFPs here, presuming this is NDR?

And out of curiosity, is aggregate bandwidth the normal marketing metric in this industry? In my neck of the woods this would be reported as an NDR400 system.

