Artwork

Inhoud geleverd door EDGE AI FOUNDATION. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door EDGE AI FOUNDATION of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.
Player FM - Podcast-app
Ga offline met de app Player FM !

Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University

30:29
 
Delen
 

Manage episode 453994436 series 3574631
Inhoud geleverd door EDGE AI FOUNDATION. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door EDGE AI FOUNDATION of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.

Send us a text

Discover the cutting-edge techniques behind memory optimization for large language models with our guest, Seonyeong Heo from Kyung-Hee University. Join us as we promise to unlock the secrets of deploying 7-billion-parameter models on small devices with limited memory. This episode delves into the intricacies of key-value caching in decoder-only transformers, a crucial innovation that reduces computational overhead by efficiently storing and reusing outputs. Seon-young shares insightful strategies that tackle the high demands of memory management, offering a glimpse into how these models can be more feasible and energy-efficient.
Our conversation also ventures into the world of dynamic compression methods essential for optimizing memory usage. We unpack the challenges of compressing key-value arrays and explore the merits of techniques like quantization, pruning, and dimensionality reduction with autoencoders. Weighted quantization is highlighted as a standout method for achieving remarkable compression rates with minimal errors, provided it's fine-tuned effectively. This episode is a must-listen for those interested in the future of on-device LLMs, as we underscore the significance of efficient memory management in enhancing their performance, especially in resource-constrained settings. Tune in for this enlightening discussion paving the way for innovative advancements in the field.

Support the show

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

  continue reading

Hoofdstukken

1. Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University (00:00:00)

2. Memory Optimization for on-Device LLM (00:00:24)

3. Memory Optimization Techniques for LLMs (00:11:50)

20 afleveringen

Artwork
iconDelen
 
Manage episode 453994436 series 3574631
Inhoud geleverd door EDGE AI FOUNDATION. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door EDGE AI FOUNDATION of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.

Send us a text

Discover the cutting-edge techniques behind memory optimization for large language models with our guest, Seonyeong Heo from Kyung-Hee University. Join us as we promise to unlock the secrets of deploying 7-billion-parameter models on small devices with limited memory. This episode delves into the intricacies of key-value caching in decoder-only transformers, a crucial innovation that reduces computational overhead by efficiently storing and reusing outputs. Seon-young shares insightful strategies that tackle the high demands of memory management, offering a glimpse into how these models can be more feasible and energy-efficient.
Our conversation also ventures into the world of dynamic compression methods essential for optimizing memory usage. We unpack the challenges of compressing key-value arrays and explore the merits of techniques like quantization, pruning, and dimensionality reduction with autoencoders. Weighted quantization is highlighted as a standout method for achieving remarkable compression rates with minimal errors, provided it's fine-tuned effectively. This episode is a must-listen for those interested in the future of on-device LLMs, as we underscore the significance of efficient memory management in enhancing their performance, especially in resource-constrained settings. Tune in for this enlightening discussion paving the way for innovative advancements in the field.

Support the show

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

  continue reading

Hoofdstukken

1. Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University (00:00:00)

2. Memory Optimization for on-Device LLM (00:00:24)

3. Memory Optimization Techniques for LLMs (00:11:50)

20 afleveringen

همه قسمت ها

×
 
Loading …

Welkom op Player FM!

Player FM scant het web op podcasts van hoge kwaliteit waarvan u nu kunt genieten. Het is de beste podcast-app en werkt op Android, iPhone en internet. Aanmelden om abonnementen op verschillende apparaten te synchroniseren.

 

Korte handleiding