My Blog

September 13, 2025

An explanation of how the KV Cache makes Transformer inference fast and efficient.

August 31, 2025

A deep dive into the math and implementation styles of Rotary Positional Embeddings.

August 24, 2025

A practical demonstration of how fused quantization works.

August 23, 2025

A study of Grouped-Query Attention, Sliding Window Attention, and the attention sink, with code.

August 19, 2025

A step-by-step walkthrough of LLM quantization techniques.

All Posts