A collection of my thoughts and writings.
September 13, 2025
An explanation of how the KV Cache makes Transformer inference fast and efficient.
August 31, 2025
A deep dive into the math and implementation styles of Rotary Positional Embeddings.
August 24, 2025
A practical demonstration of how fused quantization works.
August 23, 2025
A study of Grouped-Query Attention, Sliding Window Attention, and attention sinks, with code.
August 19, 2025
A step-by-step walkthrough of LLM quantization techniques.