All Projects
-
On-device Large Models
Deploying foundation models on edge devices via quantization, distillation, structured sparsity, and system co-design for memory- and latency-efficient inference.
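Of the techniques listed, quantization is the most self-contained to illustrate. The sketch below shows symmetric per-tensor int8 weight quantization in its simplest form; it is a generic illustration, not this project's actual method, and the function names are made up for the example.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = float(np.abs(w).max()) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # reconstruction error is bounded by scale / 2
```

Storing `q` (1 byte/weight) plus one scale instead of float32 weights gives a 4x memory reduction, which is the basic budget argument behind on-device deployment; real systems refine this with per-channel scales, lower bit-widths, and calibration.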
-
Model Training Optimization
General-purpose optimization techniques for training deep models and LLMs.
-
Model Inference Optimization
Efficient LLM inference via KV cache reduction, prompt compression, streaming retention, and token-efficient decoding.
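"Streaming retention" here can be pictured as an eviction policy over the KV cache. A common pattern (as in sink-plus-sliding-window schemes) keeps the first few "attention sink" positions plus a recent window and evicts the middle. The sketch below is a minimal illustration of that policy only, with hypothetical parameter names; it is not necessarily the scheme this project uses.

```python
def retained_indices(seq_len: int, n_sink: int = 4, window: int = 8) -> list[int]:
    """Positions whose KV entries survive sink + sliding-window retention.

    Keeps the first n_sink positions (attention sinks) and the most recent
    `window` positions; everything in between is evicted, so cache size is
    bounded by n_sink + window regardless of sequence length.
    """
    if seq_len <= n_sink + window:
        return list(range(seq_len))  # nothing to evict yet
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

# At step 20 the cache holds only 12 entries instead of 20.
kept = retained_indices(20)
```

The point of the bound `n_sink + window` is that decoding cost and memory stay constant as generation length grows, which is what makes long streaming sessions feasible.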
-
Document Recognition and Intelligence
Layout analysis, structure understanding, and intelligent document systems across scanned and born-digital documents.
-
Language Structure Parsing in the LLM Era
Syntactic and semantic structure induction with and for LLMs: in-context parsing, alignment, and robust structured reasoning.