TinyLM

Building a language model from scratch — pure Python loops to CUDA kernels.

What is this?

TinyLM is a documented journey of building a language model from absolute scratch. Not "from scratch using PyTorch's transformer module" — actually from scratch. Every operation implemented by hand, every gradient derived manually, every optimization measured before and after.
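
To make "by hand" concrete, here is a minimal illustrative sketch (not code from this repo) of a single linear neuron: the forward pass is an explicit loop, and the backward pass is the derivative written out term by term, with no NumPy and no autograd.

# Illustrative only — a tiny linear unit y = w·x + b in pure Python.
def forward(x, w, b):
    y = b
    for xi, wi in zip(x, w):   # explicit loop instead of a vectorized dot product
        y += xi * wi
    return y

def backward(x, w, grad_y):
    # Hand-derived gradients: dL/dw_i = x_i * dL/dy, dL/dx_i = w_i * dL/dy, dL/db = dL/dy
    grad_w = [xi * grad_y for xi in x]
    grad_x = [wi * grad_y for wi in w]
    grad_b = grad_y
    return grad_w, grad_x, grad_b

x, w, b = [1.0, 2.0], [0.5, -0.3], 0.1
y = forward(x, w, b)                                   # 0.0
grad_w, grad_x, grad_b = backward(x, w, grad_y=1.0)    # [1.0, 2.0], [0.5, -0.3], 1.0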

Why?

Most LLM learning resources either:

  • Hand-wave the internals ("the transformer processes tokens")
  • Dump production code without building intuition first

This project does neither. We start with Python loops, break things intentionally, fix them, profile them, and optimize them one step at a time.
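
As a toy illustration of that measure-first loop (a generic micro-benchmark, not one of the repo's experiments), timing a pure-Python matrix multiply against NumPy makes the cost of each phase visible:

# Illustrative micro-benchmark: measure before optimizing.
import time
import numpy as np

def matmul_loops(a, b):
    # Triple-nested loops over plain Python lists
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j] = s
    return out

n = 128
a = [[1.0] * n for _ in range(n)]
b = [[1.0] * n for _ in range(n)]

t0 = time.perf_counter(); matmul_loops(a, b); t1 = time.perf_counter()
an, bn = np.array(a), np.array(b)
t2 = time.perf_counter(); an @ bn; t3 = time.perf_counter()
print(f"loops: {t1 - t0:.4f}s  numpy: {t3 - t2:.6f}s")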

The Journey

Phase | What                                                | Why
------|-----------------------------------------------------|--------------------------------------
0     | Pure Python — forward pass                          | See the naked math
0.1   | Manual backprop                                     | Truly understand gradients
0.2   | NumPy + PyTorch autograd                            | Verify your gradients (sketch below)
0.3   | PyTorch manual ops + GPU                            | Real training
0.4   | PyTorch proper                                      | Establish baseline
1     | Modernization (RoPE, GQA, Flash Attention)          | One change at a time
2     | Hardware optimization (CUDA, Triton, quantization)  | Squeeze the hardware
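
For Phase 0.2, "verify" means checking a hand-derived gradient against PyTorch autograd. A minimal sketch of that check (hypothetical shapes and loss, not the repo's model):

# Illustrative gradient check: manual derivative vs. autograd for y = sum((x @ w)^2).
import torch

torch.manual_seed(0)
x = torch.randn(4, 3)
w = torch.randn(3, 2, requires_grad=True)

y = (x @ w).pow(2).sum()
y.backward()                                   # autograd's gradient lands in w.grad

# Manual derivation: with z = x @ w, dy/dz = 2z, so dy/dw = x^T @ (2z)
manual_grad_w = x.T @ (2 * (x @ w.detach()))

print(torch.allclose(w.grad, manual_grad_w, atol=1e-6))  # True if the derivation is right

If allclose prints False, the manual derivation is wrong, which is exactly what this phase is designed to catch.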

Hardware

  • GPU: NVIDIA RTX 4050 Laptop (6GB VRAM)
  • OS: Ubuntu 24.04.3 LTS
  • CUDA: 12.8 (via PyTorch wheel)

Quick Start

git clone https://github.com/thatAverageGuy/TinyLM.git
cd TinyLM
uv venv --python 3.11.9
source .venv/bin/activate
uv sync
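
After uv sync, a quick sanity check (a generic snippet, not a script shipped with the repo) confirms the wheel's CUDA build can see the GPU:

import torch

print(torch.__version__)              # PyTorch build
print(torch.version.cuda)             # CUDA version bundled with the wheel, e.g. 12.8
print(torch.cuda.is_available())      # True if the RTX 4050 is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    # Total VRAM in GiB — roughly 6 for an RTX 4050 Laptop GPU
    print(torch.cuda.get_device_properties(0).total_memory / 2**30)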