Nanometers

Research, writing, and projects at the intersection of machine learning, systems, and science.


Featured Research

NeurIPS · Accepted

Efficient Sparse Mixture-of-Experts for Sub-Quadratic Inference on Long Contexts

We present a sparse mixture-of-experts architecture that achieves sub-quadratic inference cost on sequences exceeding 128k tokens. By introducing a locality-sensitive routing mechanism that exploits the low-rank structure of attention patterns, our method reduces peak memory by 3.8x while maintaining 98.2% of dense model quality across standard long-context benchmarks. We provide theoretical guarantees on routing stability and demonstrate wall-clock speedups on commodity hardware.

transformers · mixture-of-experts · efficient-inference
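To make the routing idea concrete: the paper's locality-sensitive router is not public here, but the general pattern can be sketched with SimHash-style bucketing, where tokens with similar representations hash to the same bucket and are sent to the same expert. Everything below (`lsh_route`, the hyperplane count, the modulo assignment) is illustrative, not the paper's actual mechanism.

```python
import numpy as np

def lsh_route(tokens, n_experts, n_bits=4, seed=0):
    """Route tokens to experts via SimHash-style locality-sensitive hashing.

    Tokens with similar embeddings produce the same sign pattern against a
    set of random hyperplanes, so they land in the same bucket and are sent
    to the same expert. Routing cost is O(n * d * n_bits), independent of
    sequence-length-squared attention structure.
    """
    rng = np.random.default_rng(seed)
    d = tokens.shape[-1]
    planes = rng.standard_normal((d, n_bits))       # random hyperplanes
    bits = (tokens @ planes > 0).astype(np.int64)   # (n, n_bits) sign pattern
    buckets = bits @ (1 << np.arange(n_bits))       # bucket id per token
    return buckets % n_experts                      # expert id per token

tokens = np.random.default_rng(1).standard_normal((8, 16))
experts = lsh_route(tokens, n_experts=4)
print(experts.shape)  # -> (8,)
```

Because the hash depends only on each token's own embedding, identical or near-identical tokens are guaranteed to share an expert, which is the stability property a locality-sensitive router trades on.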
ICLR · In Review

Differentiable Navier-Stokes Solvers for Turbulence-Aware Neural Surrogate Models

We develop a fully differentiable spectral Navier-Stokes solver that enables end-to-end training of neural surrogate models for turbulent flows. Our approach embeds physical conservation laws directly into the computational graph, allowing gradient-based optimization to respect divergence-free constraints without projection steps. On the Kolmogorov flow benchmark, the resulting surrogates achieve 12x speedup over classical solvers at Reynolds numbers up to 10,000 with bounded error accumulation over 500 rollout steps.

differentiable-physics · turbulence · neural-surrogates
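The "no projection step" claim can be illustrated with the standard vorticity-streamfunction formulation of 2D Navier-Stokes: velocity derived from a streamfunction is divergence-free by construction, so the constraint is built into the computation rather than enforced after the fact. The NumPy sketch below shows one explicit-Euler step on a 2π-periodic grid; it is a toy illustration of the formulation, not the paper's solver (which would be written in an autodiff framework to be differentiable end-to-end).

```python
import numpy as np

def vorticity_step(omega, nu=1e-3, dt=1e-3):
    """One explicit-Euler step of 2D Navier-Stokes in vorticity form.

    Solving -lap(psi) = -omega and taking u = d(psi)/dy, v = -d(psi)/dx
    yields a divergence-free velocity by construction, so no pressure
    projection is needed.
    """
    n = omega.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)              # integer wavenumbers on [0, 2*pi)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2_safe = k2.copy()
    k2_safe[0, 0] = 1.0                           # avoid divide-by-zero at mode 0

    w_hat = np.fft.fft2(omega)
    psi_hat = w_hat / k2_safe                     # lap(psi) = -omega in Fourier space
    psi_hat[0, 0] = 0.0                           # streamfunction has no mean mode

    u = np.real(np.fft.ifft2(1j * ky * psi_hat))  # u =  d(psi)/dy
    v = np.real(np.fft.ifft2(-1j * kx * psi_hat)) # v = -d(psi)/dx
    wx = np.real(np.fft.ifft2(1j * kx * w_hat))   # d(omega)/dx
    wy = np.real(np.fft.ifft2(1j * ky * w_hat))   # d(omega)/dy
    lap_w = np.real(np.fft.ifft2(-k2 * w_hat))    # viscous term
    return omega + dt * (-(u * wx + v * wy) + nu * lap_w)
```

A quick sanity check: a spatially constant vorticity field induces no flow, so `vorticity_step` leaves it unchanged.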
OSDI · Published

Zero-Copy Distributed KV-Cache for Disaggregated LLM Serving

We propose a zero-copy distributed key-value cache architecture for serving large language models across disaggregated GPU clusters. By leveraging RDMA-based memory transfers and a novel page-table abstraction for attention state, our system eliminates serialization overhead during prefill-decode handoffs. Evaluations on a 64-GPU cluster show 2.1x improvement in time-to-first-token and 40% higher throughput compared to state-of-the-art serving frameworks under production trace workloads.

systems · llm-serving · distributed-systems
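The page-table idea behind the handoff can be sketched in a few lines: KV state is stored in fixed-size physical pages, and a per-sequence table maps to page ids, so moving a sequence from a prefill worker to a decode worker only requires transferring the table, not re-serializing the tensors (which travel via RDMA in the real system). The class and names below (`PagedKVCache`, `PAGE_SIZE`, `handoff`) are an illustrative toy, not the paper's implementation.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 16  # tokens per KV page (illustrative)

@dataclass
class PagedKVCache:
    """Toy page-table abstraction for attention KV state.

    Prefill appends tokens into fixed-size physical pages; the decode side
    receives only the page table (a list of physical page ids), standing in
    for the zero-copy handoff where page contents move via RDMA.
    """
    pages: dict = field(default_factory=dict)   # physical page id -> tokens
    tables: dict = field(default_factory=dict)  # sequence id -> page ids
    next_page: int = 0

    def append(self, seq_id, kv_token):
        table = self.tables.setdefault(seq_id, [])
        if not table or len(self.pages[table[-1]]) == PAGE_SIZE:
            self.pages[self.next_page] = []     # allocate a fresh page
            table.append(self.next_page)
            self.next_page += 1
        self.pages[table[-1]].append(kv_token)

    def handoff(self, seq_id):
        """Return what a decode worker needs: page ids, not page contents."""
        return list(self.tables[seq_id])

cache = PagedKVCache()
for t in range(20):
    cache.append("seq-0", t)
print(cache.handoff("seq-0"))  # -> [0, 1]
```

Because the handoff payload is a handful of integers regardless of context length, the prefill-decode boundary never touches the KV tensors themselves, which is where the serialization overhead in conventional serving stacks comes from.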

Latest Writing