Latest updates
Streaming Infinite Retentive LLM: streaming inference over long inputs while retaining important past state in a bounded memory (a generic sketch follows below).
Sparse is Enough in Fine-tuning Pre-trained Large Language Models: efficient fine-tuning that updates only a sparse subset of the model's parameters (a generic sketch follows below).
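To make the first entry concrete, here is a minimal, generic sketch of streaming with bounded state retention: a small set of long-lived entries plus a rolling window of recent entries, so memory stays constant as the stream grows. This is an illustrative assumption, not the paper's algorithm; the class name and the "keep a short prefix plus a recent window" retention rule are placeholders.

```python
from collections import deque


class BoundedStreamingCache:
    """Toy streaming cache: a few retained entries plus a rolling recent window.

    Illustration only; not the retention criterion used by the paper above.
    """

    def __init__(self, num_retained: int = 4, window_size: int = 64):
        self.num_retained = num_retained          # entries kept for the whole stream
        self.retained = []                        # long-lived state
        self.recent = deque(maxlen=window_size)   # oldest entries evicted automatically

    def append(self, token_state):
        """Add the state for one newly streamed token."""
        if len(self.retained) < self.num_retained:
            self.retained.append(token_state)
        else:
            self.recent.append(token_state)

    def context(self):
        """State visible to the model at this step: retained + recent."""
        return self.retained + list(self.recent)


if __name__ == "__main__":
    cache = BoundedStreamingCache(num_retained=2, window_size=3)
    for t in range(10):                           # stream 10 tokens
        cache.append(f"tok{t}")
    # Memory stays at num_retained + window_size entries regardless of stream length.
    print(cache.context())                        # ['tok0', 'tok1', 'tok7', 'tok8', 'tok9']
```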
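For the second entry, the sketch below shows the general idea of sparse fine-tuning: freeze most parameters and let gradients flow only through a sparse mask. The random 5% mask and the tiny linear model are assumptions for illustration; the paper's own selection criterion and scale differ.

```python
import torch
import torch.nn as nn

# Tiny model standing in for a pre-trained layer (illustration only).
model = nn.Linear(16, 16)
x, y = torch.randn(8, 16), torch.randn(8, 16)

# Pick a sparse set of trainable entries per parameter tensor.
# Here: a random 5% mask; a real method would choose entries by some criterion.
density = 0.05
masks = {name: (torch.rand_like(p) < density).float()
         for name, p in model.named_parameters()}

# Plain SGD without weight decay, so entries outside the mask never move.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Sparse fine-tuning step: zero gradients outside the mask,
    # so only the selected entries are updated.
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.grad.mul_(masks[name])
    optimizer.step()
```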