Release Notes
v1.0 - Mono Quant Initial Release (2026-02-03)
Features
Core Quantization
- ✅ INT8 quantization with per-channel scaling
- ✅ INT4 quantization with group-wise scaling (group_size=128)
- ✅ FP16 quantization for memory reduction
- ✅ Dynamic quantization (no calibration data required)
- ✅ Static quantization with calibration
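To make the per-channel and group-wise scaling items above concrete, here is a minimal sketch in NumPy. It is not Mono Quant's implementation: the function names are hypothetical, a symmetric scheme is assumed, and INT4 values are held unpacked in int8 arrays for readability.

```python
import numpy as np

def quantize_int8_per_channel(weight):
    """Symmetric INT8 quantization with one scale per output channel (row).

    Hypothetical helper for illustration only.
    """
    # The largest magnitude in each row determines that row's scale.
    max_abs = np.abs(weight).max(axis=1, keepdims=True)
    # All-zero rows get a scale of 1.0 to avoid dividing by zero.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)
    q = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_int4_groupwise(weight, group_size=128):
    """Symmetric INT4 quantization with one scale per group of values.

    Hypothetical helper for illustration only.
    """
    out_features, in_features = weight.shape
    if in_features % group_size != 0:
        raise ValueError("in_features must be divisible by group_size")
    # Split each row into contiguous groups of `group_size` values.
    groups = weight.reshape(out_features, in_features // group_size, group_size)
    max_abs = np.abs(groups).max(axis=2, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    # Signed 4-bit range is [-8, 7]; values stay in int8 (no bit packing).
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q.reshape(out_features, in_features), scale.squeeze(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)

q8, s8 = quantize_int8_per_channel(w)                 # one scale per row
q4, s4 = quantize_int4_groupwise(w, group_size=128)   # two scales per row
```

Smaller groups track local weight ranges more closely at the cost of storing more scales, which is the usual trade-off behind a default such as group_size=128.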
Calibration
- ✅ MinMaxObserver (default, fast)
- ✅ MovingAverageMinMaxObserver (robust, EMA smoothing)
- ✅ HistogramObserver (outlier-aware, KL divergence)
- ✅ Calibration data from tensors or DataLoader
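The difference between a plain min/max observer and an EMA-smoothed one can be sketched as follows. The class names and the `momentum` parameter are assumptions made for illustration and do not reflect the actual signatures of the observers listed above.

```python
import numpy as np

class MinMaxSketch:
    """Tracks the global minimum and maximum over every batch seen."""

    def __init__(self):
        self.min_val = None
        self.max_val = None

    def observe(self, batch):
        lo, hi = float(np.min(batch)), float(np.max(batch))
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)

class EmaMinMaxSketch(MinMaxSketch):
    """Smooths per-batch min/max with an exponential moving average."""

    def __init__(self, momentum=0.1):
        super().__init__()
        self.momentum = momentum  # hypothetical parameter name

    def observe(self, batch):
        lo, hi = float(np.min(batch)), float(np.max(batch))
        if self.min_val is None:
            # The first batch initializes the running statistics.
            self.min_val, self.max_val = lo, hi
        else:
            m = self.momentum
            self.min_val = (1 - m) * self.min_val + m * lo
            self.max_val = (1 - m) * self.max_val + m * hi

# Two ordinary batches followed by one containing an outlier.
batches = [np.array([-1.0, 1.0]), np.array([-1.0, 1.0]), np.array([-1.0, 100.0])]

plain, ema = MinMaxSketch(), EmaMinMaxSketch(momentum=0.1)
for b in batches:
    plain.observe(b)
    ema.observe(b)
```

In this run the plain observer's maximum jumps to the outlier value, while the EMA observer moves only a fraction of the way toward it, which is why the smoothed variant is described as more robust.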
User Interface
- ✅ Unified quantize() Python API
- ✅ QuantizationResult with .save() and .validate() methods
- ✅ CLI with git-style subcommands (monoquant)
- ✅ Progress bars with CI/TTY auto-detection
Serialization
- ✅ PyTorch format (.pt/.pth) support
- ✅ Safetensors format support
- ✅ Metadata preservation (bits, scheme, scales, zero-points)
- ✅ Model dequantization back to FP32
Validation
- ✅ SQNR (signal-to-quantization-noise ratio) computation
- ✅ Model size comparison
- ✅ Load testing (round-trip validation)
- ✅ Accuracy warnings for aggressive quantization
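SQNR is commonly defined as the ratio of signal power to quantization-noise power, expressed in decibels. The sketch below assumes that definition; the function name `sqnr_db` is hypothetical and the library's own computation may differ in detail.

```python
import numpy as np

def sqnr_db(reference, quantized):
    """Signal-to-quantization-noise ratio in dB (higher is better).

    Assumes SQNR = 10 * log10(sum(x^2) / sum((x - x_q)^2)).
    """
    reference = np.asarray(reference, dtype=np.float64)
    noise = reference - np.asarray(quantized, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    noise_power = np.sum(noise ** 2)
    if noise_power == 0:
        # A lossless reconstruction has no noise at all.
        return float("inf")
    return float(10.0 * np.log10(signal_power / noise_power))

original = np.array([1.0, 1.0, 1.0, 1.0])
approx = np.array([0.9, 0.9, 0.9, 0.9])

score = sqnr_db(original, approx)      # noise power is 1% of signal power
lossless = sqnr_db(original, original)
```

Each additional bit of precision adds roughly 6 dB under the usual uniform-noise assumption, so a sharp SQNR drop after moving from INT8 to INT4 is expected rather than a sign of a bug.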
Advanced Features
- ✅ Model-agnostic design (any PyTorch model)
- ✅ Layer skipping for INT4 (protects sensitive layers)
- ✅ Symmetric and asymmetric quantization schemes
- ✅ Custom exception hierarchy with actionable suggestions
- ✅ Zero-point clamping to prevent runtime errors
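As a rough illustration of the asymmetric scheme and the zero-point clamping mentioned above, the following sketch derives a scale and zero-point from an observed range and clamps the zero-point into the integer range. All function names are hypothetical, and widening the range to include zero is an assumption borrowed from common practice rather than confirmed Mono Quant behavior.

```python
import numpy as np

def affine_params(min_val, max_val, qmin=0, qmax=255):
    """Compute (scale, zero_point) for asymmetric quantization."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_val = min(float(min_val), 0.0)
    max_val = max(float(max_val), 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: all observed values were zero
    zero_point = int(round(qmin - min_val / scale))
    # Clamp so the zero-point always fits the target integer type.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize_affine(x, scale, zero_point, qmin=0, qmax=255):
    q = np.round(np.asarray(x) / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int32)

def dequantize_affine(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

scale, zero_point = affine_params(-0.5, 1.5)
x = np.array([-0.5, 0.0, 0.7, 1.5], dtype=np.float32)
q = quantize_affine(x, scale, zero_point)
x_hat = dequantize_affine(q, scale, zero_point)
```

A symmetric scheme is the special case where the range is forced to be centered on zero and the zero-point is fixed at 0 on a signed integer range, which simplifies kernels but wastes range on skewed distributions.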
Statistics
- Requirements delivered: 30/30 (100%)
- Integration points: 8/8 verified
- E2E flows: 8/8 working
- Lines of code: 5,228 Python
- Files: 26 source files
- Technical debt: None identified
Dependencies
Required:
- Python >= 3.11
- torch >= 2.0
- numpy >= 1.24
Included:
- safetensors >= 0.4 (Safetensors format)
- click >= 8.1 (CLI)
- tqdm >= 4.66 (progress bars)
Optional:
- mkdocs >= 1.6.0 (documentation)
- mkdocs-material >= 9.7.0 (documentation theme)
- mkdocstrings[python] >= 1.0.0 (API documentation)
Documentation
- Installation guide
- Quick start tutorial
- Basic usage examples
- CLI reference
- API documentation
- User guide
- Examples (dynamic INT8, static INT4, custom observer, CI/CD)
Known Limitations
- CLI does not support loading calibration data from files (use Python API)
- INT4 quantization requires calibration data (no dynamic INT4)
- No quantization-aware training (QAT) - build-phase only
- No ONNX/TFLite export (use dedicated conversion tools)
Future Enhancements (v2)
- Genetic optimization for quantization parameters
- Experiment tracking and logging
- Mixed precision (different bits per layer)
- LLM.int8() style outlier detection
- Automatic layer sensitivity analysis
Older Versions
No older versions. v1.0 is the initial release.