Overview

We develop techniques to deploy foundation models on edge devices under tight memory and latency budgets, combining model compression (quantization, distillation, pruning/sparsity) with runtime and system optimizations.
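The following is a minimal illustration of why quantization matters under a tight memory budget. It assumes symmetric per-tensor int8 quantization, which is one common scheme and not necessarily the one used in this work. The sketch quantizes a few weights, measures the round-trip error, and estimates weight storage at different bit widths. The function names and the 7-billion-parameter model size are hypothetical and chosen for illustration only.

```python
import array

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization (assumed scheme).

    scale = max|w| / 127, q = clamp(round(w / scale), -127, 127).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

def weight_memory_bytes(num_params: int, bits: int) -> int:
    """Storage for the weights alone (ignores activations, KV cache, scales)."""
    return num_params * bits // 8

# Round-trip a few example weights.
weights = [0.5, -1.27, 0.0, 0.9, -0.31]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

# Per-weight storage: float32 uses 4 bytes, int8 uses 1 byte (+ one float32 scale).
fp32_bytes = array.array("f", weights).itemsize * len(weights)
int8_bytes = array.array("b", q).itemsize * len(q) + 4

# Hypothetical 7B-parameter model: weight memory at fp16 vs. 4-bit.
fp16_total = weight_memory_bytes(7_000_000_000, 16)  # 14 GB
int4_total = weight_memory_bytes(7_000_000_000, 4)   # 3.5 GB

print(f"scale={scale:.5f} max_err={max_err:.5f}")
print(f"fp32={fp32_bytes} B, int8={int8_bytes} B")
print(f"7B weights: fp16={fp16_total / 1e9:.1f} GB, int4={int4_total / 1e9:.1f} GB")
```

Rounding bounds the per-weight error at half the scale. Dropping from 16 to 4 bits cuts weight storage by 4x. Real deployments usually add finer-grained (per-channel or per-group) scales and calibration. This is one reason quantization is typically combined with the other compression and runtime techniques listed above.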

Techniques

Outcomes