Quantization in LLMs - Search Videos

Quantization in modern LLMs - Advanced Quantization Techniques for Large Language Models Video Tutorial | LinkedIn Learning, formerly Lynda.com

Local LLMs on Consumer Hardware: GLM-4.7-Flash Performance | Hammad Armghan, PhD posted on the topic | LinkedIn

1 view · 1 month ago

MLX MiniMax 2.5 running LOCALLY on a single M3 Ultra 512GB! Writing a poem on LLMs at 6bit quantization! 🔥 Let's start some coding, context and distributed tests! Generation: 40.2 tokens-per-sec Peak memory: 186 GB Source: Ivan Fioravanti | Thanh Hoang

1.1K views · 1 month ago

Facebook · Thanh Hoang

What is Quantization? | IBM

LLMs can take gigabytes of memory to store, which limits what can be run on consumer hardware. But quantization can dramatically compress models, making a wider selection of models available to developers. You can often reduce model size by 4x or more while maintaining reasonable performance. In our new short course Quantization Fundamentals taught by Hugging Face's Younes Belkada and Marc Sun, you'll:
- Learn how to quantize nearly any open source model
- Use int8 and bfloat16 (Brain float 16)

6.8K views · Apr 15, 2024

Facebook · Andrew Ng

[LoRA] Unsloth Fine-Tuning: LoRA and QLoRA Guide. Efficient LLM fine-tuning using low-rank adapters

389 views · 1 month ago

YouTube · AI Podcast Series. Byte Goose AI.

LLM Inference on a Budget: Speed vs. Cost! #llm #inference #optimization

YouTube · The Code Architect

Run Giant AI Models on Your Laptop 🚀 (INT8 Explained)

6 views · 2 months ago

YouTube · Forward Logic

🤯 Run LLMs on Your Laptop?! The Quantization Secret! #Shorts

YouTube · CodeTapasya

Quantization Making LLMs Lightning Fast & Tiny

8 views · 2 months ago

YouTube · The Code Architect

What Is Quantization | Quantization | TensorTeach

315 views · Nov 20, 2024

YouTube · TensorTeach

Understanding Symmetric Quantization | Quantization | Tens…

276 views · Nov 20, 2024

YouTube · TensorTeach

Host a AI Server

453 views · Mar 27, 2024

YouTube · AI Arcade

Optimize Your AI - Quantization Explained

406.9K views · Dec 28, 2024

YouTube · Matt Williams

LLM Explained | What is LLM

399.7K views · Aug 22, 2023

YouTube · codebasics

What is LLM quantization?

25.6K views · Nov 6, 2023

YouTube · Airtrain AI

MR-GPTQ: Better FP4 Microscaling for LLMs

109 views · 5 months ago

YouTube · AI Research Roundup

Quantization in Deep Learning (LLMs)

11.5K views · Sep 22, 2023

YouTube · AI Bites

BitNet Distillation: 1.58‑bit LLMs from FP16

171 views · 4 months ago

YouTube · AI Research Roundup

LLM Mastery in 30 Days | Course Introduction

3.2K views · Sep 7, 2024

YouTube · Neural Hacks with Vasanth

AGI Dreams Podcast – October 01, 2025

2 views · 5 months ago

YouTube · Robert Lee

Understanding Double Quantization for LLMs

80 views · 8 months ago

YouTube · Machine Learning Courses

This Training Trick Fixes AI Quantization (3-Bit Secret)

5 views · 4 months ago

YouTube · CollapsedLatents

L 2 Ollama | Run LLMs locally

8.8K views · Jul 15, 2024

YouTube · Code With Aarohi

Quantizing LLMs - How & Why (8-Bit, 4-Bit, GGUF & More)

22.4K views · Nov 18, 2024

YouTube · Adam Lucek

NVIDIA GPU Quantization Support for LLMs

31 views · 3 months ago

YouTube · AIProgrammingHardware

QLoRA: The Gen AI Breakthrough You Need to See

457 views · Jan 20, 2025

YouTube · Super Data Science

LLMs Quantization Crash Course for Beginners

5.7K views · May 19, 2024

YouTube · AI Anytime

Scale-Aware Memory Strategies for Reasoning LLMs

15 views · 5 months ago

YouTube · AI Research Roundup

Ollama.ai: A Developer's Quick Start Guide!

6.3K views · Feb 2, 2024

YouTube · AI Arcade
