Low-cost LLM Deployment
Illustration of low-cost deployment pipeline (quantization, distillation, KV cache, runtime co-design).

Overview

This project targets cost-efficient deployment of large language models on edge and consumer devices. We study quantization, distillation, structured sparsity, KV cache strategies, prompt compression, and runtime co-design to reduce memory footprint and inference latency under real-world hardware constraints.
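As a concrete illustration of the quantization track, the following is a minimal sketch of symmetric per-channel int8 weight quantization in plain NumPy; the function names (`quantize_per_channel`, `dequantize`) and parameters are illustrative assumptions, not part of this project's codebase or any library API.

```python
# Minimal sketch: symmetric per-channel int8 weight quantization.
# Illustrative only; names and shapes are assumptions, not project code.
import numpy as np

def quantize_per_channel(weight: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Quantize a 2-D float weight matrix to int8, one scale per output row."""
    # Per-row absolute maximum sets the scale so values map into [-127, 127].
    max_abs = np.abs(weight).max(axis=1, keepdims=True)
    scale = max_abs / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 weight matrix from int8 values and scales."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4096, 4096)).astype(np.float32)
    q, s = quantize_per_channel(w)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"int8 storage: {q.nbytes / 2**20:.1f} MiB "
          f"(fp32: {w.nbytes / 2**20:.1f} MiB), mean abs error: {err:.5f}")
```

Per-channel (here, per-row) scales are a common choice because a single tensor-wide scale lets outlier rows dominate and inflate the quantization error of the rest; storing int8 values plus one float scale per row cuts weight memory roughly 4x relative to fp32.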

Technical Tracks

Outputs