This chalk talk delves into optimizing and deploying large language models (LLMs) at scale. Explore large model hosting, optimization techniques, model partitioning, batch processing, and model fine-tuning.
Haowen Huang / Hong Kong
Amazon
Haowen Huang is a senior evangelist at Amazon Web Services, based in Hong Kong. He has more than 20 years of experience in architecture design, technology, and startup management across the telecommunications, internet, and cloud computing industries. He has also worked for renowned companies such as Microsoft, Sun Microsystems, and China Telecom. His current research interests include generative AI, large language models (LLMs), machine learning, and data science.