The tutorial will start with the basic architecture of LLMs and move on to advanced optimization techniques. Participants will learn how to improve LLM performance at inference time by optimizing the KV cache, yielding quicker response times, longer effective context, and improved answer quality. Furthermore, the tutorial will cover diverse LLM reasoning methods, including Chain-of-Thought (CoT) reasoning, to enhance the interpretability, controllability, and flexibility of LLMs.
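To make the KV-cache idea concrete before the tutorial, the sketch below contrasts autoregressive decoding with and without caching for a single attention head. It is a minimal, illustrative example, not material from the tutorial itself: the dimensions, weight matrices, and function names are all hypothetical, and real LLMs cache keys and values per layer and per head.

```python
# Minimal sketch (illustrative only): single-head self-attention decoding
# with and without a KV cache. Caching reuses past keys/values so each new
# token costs O(seq_len) work instead of recomputing the whole prefix.
import torch

d_model = 64
W_q = torch.randn(d_model, d_model) / d_model**0.5
W_k = torch.randn(d_model, d_model) / d_model**0.5
W_v = torch.randn(d_model, d_model) / d_model**0.5

def attend(q, K, V):
    # Scaled dot-product attention for one query vector over all past positions.
    scores = (K @ q) / d_model**0.5        # (seq_len,)
    weights = torch.softmax(scores, dim=0)
    return weights @ V                     # (d_model,)

def decode_with_cache(token_embeddings):
    # Keys/values for earlier positions are computed once and appended to a cache.
    K_cache, V_cache, outputs = [], [], []
    for x in token_embeddings:             # one decoding step per token
        K_cache.append(x @ W_k)
        V_cache.append(x @ W_v)
        outputs.append(attend(x @ W_q, torch.stack(K_cache), torch.stack(V_cache)))
    return torch.stack(outputs)

def decode_without_cache(token_embeddings):
    # Recomputes every past key/value at every step (quadratic total work).
    outputs = []
    for t in range(1, len(token_embeddings) + 1):
        prefix = token_embeddings[:t]
        outputs.append(attend(prefix[-1] @ W_q, prefix @ W_k, prefix @ W_v))
    return torch.stack(outputs)

tokens = torch.randn(8, d_model)           # stand-in for embedded input tokens
assert torch.allclose(decode_with_cache(tokens), decode_without_cache(tokens), atol=1e-5)
```

Both routines produce identical outputs; the cached variant simply avoids redundant computation, which is the source of the latency savings discussed in the tutorial.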
With a robust understanding of LLM enhancements, we turn to the emerging field of language agents powered by LLMs. The emergence of LLMs has significantly accelerated the evolution of AI agents, pushing the field closer to the long-standing goal of building intelligent, autonomous agents that can learn and act in diverse environments. This session will guide attendees through the concepts of agents, how LLMs empower these agents, and the challenges these agents might face in the future, aiming to equip participants with the knowledge to implement and manage LLMs responsibly and efficiently.
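For readers new to language agents, the following is a minimal sketch of the basic loop in which an LLM drives an agent: the model proposes an action, a tool executes it, and the observation is fed back until the model produces an answer. It is illustrative only; `call_llm`, the tool names, and the action format are hypothetical placeholders, not an API from the tutorial.

```python
# Minimal sketch of an LLM-driven agent loop (illustrative only).
from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; it picks a tool, then answers once
    # an observation is present in the prompt.
    if "OBSERVATION" in prompt:
        return "FINAL: 4"
    return "ACTION: calculator | INPUT: 2 + 2"

TOOLS: Dict[str, Callable[[str], str]] = {
    # Toy tool; eval() is unsafe for untrusted input and used here only for brevity.
    "calculator": lambda expr: str(eval(expr)),
}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        reply = call_llm(history)                      # LLM decides the next step
        if reply.startswith("ACTION:"):
            name, arg = [p.split(":", 1)[1].strip() for p in reply.split("|")]
            observation = TOOLS[name](arg)             # act in the environment
            history += f"\n{reply}\nOBSERVATION: {observation}"
        else:
            return reply                               # anything else is the final answer
    return history

print(run_agent("What is 2 + 2?"))                     # -> FINAL: 4
```

Real agent frameworks add planning, memory, and error handling on top of this loop, which is where the challenges discussed in this session arise.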