Mono Quant

Ultra-lightweight, model-agnostic quantization for PyTorch

What is Mono Quant?

Mono Quant is a simple, reliable model quantization package for PyTorch with minimal dependencies: just torch and numpy, no bloat.

Key Features

Model-Agnostic - Works with any PyTorch model: HuggingFace, local, or custom
Multiple Modes - INT8, INT4, and FP16 quantization
Flexible Calibration - Dynamic (no data) or static (with calibration data)
Robust Validation - SQNR metrics, size comparison, and accuracy warnings
Dual Interface - Python API for automation, CLI for CI/CD
Build-Phase Only - Quantize during build, deploy lightweight models
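
For readers unfamiliar with the SQNR metric listed above, here is a minimal sketch of the conventional definition: signal power divided by quantization-noise power, in decibels. It uses NumPy only for illustration. The function name `sqnr_db` is hypothetical and this is not necessarily how Mono Quant computes its reported value.

```python
import numpy as np


def sqnr_db(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Signal-to-quantization-noise ratio in dB (hypothetical helper)."""
    original = original.astype(np.float64)
    reconstructed = reconstructed.astype(np.float64)
    signal_power = np.mean(original ** 2)
    noise_power = np.mean((original - reconstructed) ** 2)
    if noise_power == 0.0:
        # Perfect reconstruction: no quantization noise at all
        return float("inf")
    return float(10.0 * np.log10(signal_power / noise_power))
```

Higher is better: each additional 10 dB means the quantization noise power is ten times smaller relative to the signal.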

Installation

pip install mono-quant

Quick Start

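import torch.nn as nn
from mono_quant import quantize

# Any torch.nn.Module works; this small model is only a stand-in
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Quantize the model to INT8 using dynamic quantization (no calibration data)
result = quantize(model, bits=8, dynamic=True)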

# Save the quantized model
result.save("model_quantized.pt")

# Check metrics
print(f"Compression: {result.info.compression_ratio:.2f}x")
print(f"SQNR: {result.info.sqnr_db:.2f} dB")

Or use the CLI:

monoquant quantize --model model.pt --bits 8 --dynamic

Why Mono Quant?

Most quantization tools are tied to specific frameworks (HuggingFace, TFLite) or require heavy dependencies. Mono Quant fills the niche of "just quantize the weights, nothing else."

Design Philosophy

Aspect	Approach
Model Loading	You load the model, we quantize it
Dependencies	Only `torch` and `numpy` required
Use Case	Build-phase (CI/CD, local development)
Scope	Quantization only, no runtime or serving

Quantization Modes

Dynamic Quantization (No Calibration)

The fastest option to apply: no calibration data is required. A good fit when the goal is faster inference.

result = quantize(model, bits=8, dynamic=True)
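
As a rough illustration of what 8-bit weight quantization involves, the sketch below applies symmetric per-tensor INT8 quantization to a NumPy array. It is not Mono Quant's implementation: the function names are hypothetical, and symmetric per-tensor scaling is an assumption chosen for simplicity.

```python
import numpy as np


def quantize_int8_symmetric(weights: np.ndarray):
    """Map float weights to int8 with one scale (hypothetical helper)."""
    max_abs = float(np.max(np.abs(weights)))
    # Guard against an all-zero tensor, where any scale would do
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale
```

With this scheme, the round-trip error of any weight is at most half the scale, so tensors with a few large outliers lose more precision than tensors with a narrow range.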

Static Quantization (With Calibration)

Best accuracy, but requires representative input data for calibration.

result = quantize(
    model,
    bits=8,
    dynamic=False,
    calibration_data=calibration_tensors
)
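
To show why static quantization needs data, the sketch below derives an unsigned 8-bit scale and zero-point from the minimum and maximum values seen across calibration batches. This min/max approach is one common calibration strategy and is an assumption here; the function names are hypothetical and Mono Quant may calibrate differently.

```python
import numpy as np


def calibrate_minmax(batches: list):
    """Derive an affine uint8 scale and zero-point (hypothetical helper)."""
    lo = min(float(np.min(b)) for b in batches)
    hi = max(float(np.max(b)) for b in batches)
    # Widen the range to include 0.0 so that zero is exactly representable
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize_affine(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Quantize values to uint8 using the calibrated parameters."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)
```

If the calibration batches do not cover the range seen in production, out-of-range values are clipped, which is why the data should be representative.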

INT4 Quantization

Maximum compression. Scaling is applied group-wise: each group of group_size weights gets its own scale.

result = quantize(
    model,
    bits=4,
    dynamic=False,
    calibration_data=calibration_tensors,
    group_size=128  # Default
)
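
Group-wise scaling can be sketched as follows: the weights are split into groups of `group_size`, and each group gets its own scale. This is an illustration under assumptions (symmetric range of -7 to 7, float32 scales, values kept unpacked in int8), with hypothetical function names; it does not reflect Mono Quant's internal storage format.

```python
import numpy as np


def quantize_int4_groupwise(weights: np.ndarray, group_size: int = 128):
    """Quantize to 4-bit values with one scale per group (hypothetical helper)."""
    flat = weights.reshape(-1).astype(np.float32)
    if flat.size % group_size != 0:
        raise ValueError("number of weights must be divisible by group_size")
    groups = flat.reshape(-1, group_size)
    max_abs = np.max(np.abs(groups), axis=1, keepdims=True)
    # One scale per group; all-zero groups fall back to a scale of 1.0
    scales = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
    # Values fit in 4 bits but are stored in int8 here for readability;
    # a real format would pack two values per byte
    q = np.clip(np.round(groups / scales), -7, 7).astype(np.int8)
    return q, scales


def dequantize_groupwise(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Rebuild float weights in their original shape."""
    return (q.astype(np.float32) * scales).reshape(shape)


def effective_bits_per_weight(bits: int = 4, group_size: int = 128,
                              scale_bits: int = 32) -> float:
    """Storage cost per weight, including the per-group scale overhead."""
    return bits + scale_bits / group_size
```

Under these assumptions, a group size of 128 costs 4.25 bits per weight. Smaller groups track the weight distribution more closely but spend more bits on scales.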

What's Next?

Installation Guide - Set up Mono Quant
Quick Start - Step-by-step tutorial
User Guide - Deep dive into features
CLI Reference - Command-line usage
API Reference - Python API details
Examples - Real-world code samples

License

MIT License - see LICENSE for details.