Ntropy (https://ntropy.com) | ONSITE, Full-time | Offices in San Francisco, CA and London, UK | SWE, ML | $130-200k
Ntropy is building domain-specific caching infrastructure for LLMs that makes it possible to reduce p99 latencies by 1000x and costs by 4-5 orders of magnitude for real-world, high-throughput tasks. Currently we are targeting the financial and banking domain and will expand over time. We are well funded by top investors (QED & Lakestar) and are launching an engineering office in SF this month. We are hiring for the following roles:
- Machine learning engineer (San Francisco, CA)
- Full-stack / LLM engineer (San Francisco, CA)
- (Senior) Backend engineer (San Francisco, CA)
- Front-end engineer (London, UK)
It’s mostly coming from using the Arm NEON intrinsics, not much magic. While working on the library, I was shocked to see how under-vectorized libc is on Arm. There is a lot of improvement potential beyond strings.
Amazon, Microsoft, Nvidia, Ampere, Apple, Qualcomm, and all the other Arm-based CPU vendors should really consider investing more into the ecosystem. The hardware is very capable; they shouldn’t be losing to x86 in so many benchmarks…
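For a concrete sense of what "using the Arm NEON intrinsics" can look like, here is a minimal sketch (not code from the library itself, and the function name is illustrative): counting occurrences of a byte in a buffer, assuming AArch64 where the vaddvq_u8 horizontal add is available. A scalar loop handles the tail.

    #include <arm_neon.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Count occurrences of `needle` in `data` (illustrative sketch, AArch64 only). */
    size_t count_byte_neon(const uint8_t *data, size_t len, uint8_t needle) {
        size_t count = 0;
        const uint8x16_t n = vdupq_n_u8(needle);
        size_t i = 0;
        /* Process 16 bytes per iteration. */
        for (; i + 16 <= len; i += 16) {
            uint8x16_t chunk = vld1q_u8(data + i);
            uint8x16_t eq = vceqq_u8(chunk, n);  /* 0xFF where equal, 0x00 otherwise */
            /* Shift 0xFF down to 0x01, then horizontally add the 16 lanes. */
            count += vaddvq_u8(vshrq_n_u8(eq, 7));
        }
        /* Scalar tail for the remaining len % 16 bytes. */
        for (; i < len; ++i)
            count += (data[i] == needle);
        return count;
    }

The equivalent scalar loop is a few lines and trivial to maintain, which is exactly the trade-off discussed in the reply below.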
I'd say that SIMD knowledge, and even more so knowledge of CPU internals, is not that common, and utmost performance is, I think, not among the highest-priority goals in libc/libc++/libstdc++. The ones who need it will implement it themselves; the ones who don't won't even notice.
The implementation and maintenance effort is several times larger than for the usual "good enough" scalar implementation.
I keep running into issues where updating a selector changes the memory reference and causes performance issues. Have you found any good ways of avoiding that?
We're measuring Google's new INP metric, and we log long event handlers and long tasks. It's really noisy, though, and it's hard to see regressions in the aggregate metrics.
We'll probably share another post about this stuff soon. If anyone has found something that works please let me know! ([email protected])
Ntropy enables companies to build a new generation of products and services on top of financial transaction data.
We recently closed our Series A and are hiring across data-QA, backend, and full-stack roles to work on:
* Optimizing and extending a high-throughput, ML-focused service
* Developing new end-to-end financial insight products on top of our core technology
* Stack: Python / PyTorch / Rust / lots of GPUs / Managed cloud services & k8s