CLI Reference
Mono Quant includes a command-line interface, monoquant, for quick quantization and CI/CD integration.
Installation
The CLI is installed automatically with Mono Quant:
Verify installation:
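A quick check, sketched as a shell function, that the monoquant executable is on the PATH before asking for its version. It relies only on the --version flag listed under Global Options and on the mono-quant package name used in the CI examples on this page; the function name check_monoquant is a placeholder and not part of Mono Quant.

```shell
# Sketch: confirm the CLI is installed, then print its version.
# check_monoquant is a hypothetical helper name, not a Mono Quant command.
check_monoquant() {
  if ! command -v monoquant >/dev/null 2>&1; then
    echo "monoquant not found; install it with: pip install mono-quant" >&2
    return 1
  fi
  monoquant --version
}
```

Calling check_monoquant at the start of a script fails early with a readable message instead of a bare "command not found".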
Overview
Global Options
| Option | Short | Description |
|---|---|---|
| --version | | Show version and exit |
| --verbose | -v | Enable verbose output |
| --help | -h | Show help message |
Commands
quantize
Quantize a model.
Options
| Option | Short | Default | Description |
|---|---|---|---|
| --model | -m | required | Path to input model file |
| --output | -o | auto | Output file path (auto: {input}_quantized.{ext}) |
| --bits | -b | 8 | Quantization bits: 4, 8, or 16 |
| --dynamic | -d | false | Use dynamic quantization (no calibration) |
| --scheme | -s | symmetric | Quantization scheme: symmetric or asymmetric |
| --observer | | minmax | Calibration observer: minmax, movingaverage, histogram |
| --strict | | false | Exit immediately on errors (CI/CD mode) |
| --help | | | Show command help |
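When --output is left at auto, the table gives the naming pattern {input}_quantized.{ext}. The sketch below reproduces that pattern in shell so a script can predict where the result will land. It assumes the pattern is applied to the full input path and that the path ends in a file extension; default_output_path is a hypothetical helper, not a monoquant subcommand.

```shell
# Sketch of the auto-naming rule {input}_quantized.{ext}.
# Assumes the input path ends in a file extension.
default_output_path() {
  input="$1"
  stem="${input%.*}"   # everything before the last dot
  ext="${input##*.}"   # everything after the last dot
  printf '%s_quantized.%s\n' "$stem" "$ext"
}

default_output_path model.pt   # prints model_quantized.pt
```

This matches the model_quantized.pt artifact name used in the GitLab CI example on this page.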
Examples
Dynamic INT8 quantization:
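A sketch built only from the options table: --bits 8 combined with --dynamic, so no calibration data is involved. It is wrapped in a hypothetical shell function, quantize_dynamic_int8, so the model path can be passed in; model.pt in the usage line is a placeholder.

```shell
# Hypothetical wrapper around documented flags; not part of Mono Quant.
quantize_dynamic_int8() {
  # $1: path to the input model file
  monoquant quantize --model "$1" --bits 8 --dynamic
}

# Usage: quantize_dynamic_int8 model.pt
```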
Static INT4 quantization:
# Note: Calibration data must be prepared via Python API first
monoquant quantize --model model.pt --bits 4 --observer histogram
Custom output path:
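A sketch that passes an explicit --output instead of relying on the auto-generated name. The wrapper name quantize_to and the paths in the usage line are placeholders chosen for illustration; only the flags come from the options table.

```shell
# Hypothetical wrapper: write the quantized model to a caller-chosen path.
quantize_to() {
  # $1: input model path, $2: output path
  monoquant quantize --model "$1" --output "$2" --bits 8 --dynamic
}

# Usage: quantize_to model.pt out/model_int8.pt
```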
CI/CD mode with strict error handling:
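A sketch of a strict-mode run for a pipeline script, using the same flags as the CI examples on this page. The wrapper quantize_strict is hypothetical; it logs the exit status on failure and passes it on so the calling job fails with the same code.

```shell
# Hypothetical wrapper: run in strict mode and propagate the exit status.
quantize_strict() {
  # $1: path to the input model file
  monoquant quantize --model "$1" --bits 8 --dynamic --strict
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "monoquant exited with status $status" >&2
  fi
  return "$status"
}
```

Returning the original status, rather than a generic 1, keeps the codes listed under Exit Codes visible to the CI system.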
validate
Validate an existing quantized model.
Options
| Option | Short | Default | Description |
|---|---|---|---|
| --model | -m | required | Path to quantized model file |
| --reference | -r | | Path to original model (for comparison) |
| --help | | | Show command help |
Examples
monoquant validate --model model_quantized.pt
# With original model for comparison
monoquant validate --model model_quantized.pt --reference model.pt
info
Display information about a model file.
Examples
Output includes:
- Model size
- Quantization parameters (bits, scheme)
- Format (PyTorch/Safetensors)
- Metadata (if available)
compare
Compare two model files.
Examples
Output includes:
- Size comparison
- Compression ratio
- Format differences
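For readers who want to sanity-check the reported compression ratio, the sketch below computes one from file sizes alone. It assumes the conventional definition (original size divided by quantized size); this page does not state which definition compare uses, and compression_ratio is a hypothetical helper that does not call monoquant.

```shell
# Sketch: compression ratio = original bytes / quantized bytes.
# The definition is an assumption, not taken from Mono Quant.
compression_ratio() {
  orig_bytes=$(( $(wc -c < "$1") ))
  quant_bytes=$(( $(wc -c < "$2") ))
  if [ "$quant_bytes" -eq 0 ]; then
    echo "quantized file is empty" >&2
    return 1
  fi
  awk -v o="$orig_bytes" -v q="$quant_bytes" 'BEGIN { printf "%.2f\n", o / q }'
}

# Usage: compression_ratio model.pt model_quantized.pt
```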
calibrate
Prepare calibration data for static quantization.
Options
| Option | Short | Default | Description |
|---|---|---|---|
| --model | -m | required | Path to model file |
| --data | -d | required | Path to calibration data |
| --output | -o | calibration_data.pt | Output calibration data file |
| --samples | -n | 150 | Number of calibration samples |
| --help | | | Show command help |
Note: Calibration data preparation typically requires the Python API. This command provides only basic functionality.
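A sketch of a calibrate invocation assembled from the options table above. The wrapper prepare_calibration and the paths in the usage line are placeholders; the output file and sample count are the documented defaults, written out explicitly.

```shell
# Hypothetical wrapper around the documented calibrate flags.
prepare_calibration() {
  # $1: model path, $2: calibration data path, $3: sample count (default 150)
  monoquant calibrate --model "$1" --data "$2" \
    --output calibration_data.pt --samples "${3:-150}"
}

# Usage: prepare_calibration model.pt calibration_inputs/ 200
```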
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Configuration error |
| 3 | Quantization error |
| 4 | Validation error |
| 5 | I/O error |
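In a script, these codes can be turned back into readable messages. The sketch below mirrors the table in a case statement; describe_exit_code is a hypothetical helper, and any code outside the table is reported as unknown rather than guessed.

```shell
# Sketch: map a monoquant exit code to the meaning listed in the table.
describe_exit_code() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General error" ;;
    2) echo "Configuration error" ;;
    3) echo "Quantization error" ;;
    4) echo "Validation error" ;;
    5) echo "I/O error" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Usage, directly after a monoquant call:
#   monoquant quantize --model model.pt --bits 8 --dynamic --strict
#   echo "monoquant: $(describe_exit_code $?)"
```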
CI/CD Integration
GitHub Actions Example
name: Quantize Model
on: [push]
jobs:
  quantize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install Mono Quant
        run: pip install mono-quant
      - name: Quantize model
        run: monoquant quantize --model model.pt --bits 8 --dynamic --strict
GitLab CI Example
quantize:
  image: python:3.10
  script:
    - pip install mono-quant
    - monoquant quantize --model model.pt --bits 8 --dynamic --strict
  artifacts:
    paths:
      - model_quantized.pt
Progress Bars
The CLI automatically shows progress bars for:
- Calibration processing
- Quantization progress
- Model loading/saving

Progress bars are disabled in CI environments (detected via the CI environment variable).
To manually disable:
# Set CI environment variable
CI=1 monoquant quantize --model model.pt --bits 8
# Or redirect output
monoquant quantize --model model.pt --bits 8 | cat
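Both tips above suggest two conditions: the CI variable is unset, and standard output is a terminal. The sketch below expresses that check in shell so a wrapper script can predict whether bars will appear. It is not Mono Quant's own detection code, and the terminal check is an assumption inferred from the redirect tip; progress_bars_expected is a hypothetical name.

```shell
# Sketch: succeed only when progress bars would plausibly be shown.
# Assumes bars need CI to be unset or empty AND stdout to be a terminal.
progress_bars_expected() {
  [ -z "${CI:-}" ] && [ -t 1 ]
}

# Usage:
#   if progress_bars_expected; then echo "interactive run"; fi
```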
Aliases
The mq command is an alias for monoquant:
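Assuming mq forwards every argument unchanged, any command on this page can be shortened by swapping monoquant for mq. The sketch below is a small check that both entry points report the same version, which is a quick way to confirm they come from the same install; alias_matches is a hypothetical helper name.

```shell
# Sketch: confirm mq and monoquant resolve to the same installed version.
alias_matches() {
  [ "$(mq --version)" = "$(monoquant --version)" ]
}

# Usage:
#   alias_matches && echo "mq and monoquant agree"
```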
Next Steps
- Commands - Detailed command reference
- Examples: CI/CD - CI/CD pipeline examples