User Guide | PyTorch Segmentation Models Trainer

📄️Building a Segmentation Dataset

This guide explains how to prepare data and configure the SegmentationDataset class for semantic segmentation training. The dataset system is built around a CSV index file that points to image and mask pairs on disk.

📄️Building a Frame Field Dataset

Frame field models (such as the Frame Field Learning approach for polygon extraction) require additional auxiliary masks beyond a standard polygon segmentation mask. This guide explains the extended CSV schema, the FrameFieldSegmentationDataset class, and how to configure it.

📄️Sliding-Window Patch Dataset

RasterPatchDataset provides systematic, deterministic sliding-window training directly from full-size raster images (GeoTIFF, etc.) — no pre-generated tiles on disk required.

📄️Building Detection & Instance Segmentation Datasets

This guide covers the ObjectDetectionDataset and InstanceSegmentationDataset classes, which extend the base CSV-driven dataset system with bounding-box and instance mask support.

📄️Building Training Masks from Vector Data

The build-mask CLI mode automates the generation of all raster mask files needed for segmentation and frame field training. Given a set of georeferenced raster images and a vector polygon source, it produces every mask type and writes a ready-to-use CSV dataset index.

📄️Training a Semantic Segmentation Model

This guide walks through setting up and running a full semantic segmentation training job using the Model base class, which wraps a segmentation_models_pytorch architecture inside PyTorch Lightning.

📄️Balanced Dataset Sampling

Class imbalance is a common problem in geospatial segmentation: a dataset of 155 000 patches

📄️CSV Windowed Dataset

CSVWindowedSegmentationDataset reads specific patches from large GeoTIFFs based on coordinates (offsets) defined in a CSV file. It uses rasterio windowed reads to load only the required pixels, which keeps memory use low even for very large images.

📄️Export MBTiles Mask-Aligned Images

Use this tool before training when imagery is stored in MBTiles and labels are

📄️Training a Frame Field Segmentation Model

Frame field segmentation extends standard semantic segmentation with an additional crossfield output that encodes the local orientation of boundaries. This makes the predicted contours geometrically regular and well-suited for building footprint extraction and subsequent polygon reconstruction.

📄️CoreSet Selection

After generating a balanced dataset CSV with build-balanced-dataset, the pool may still be

📄️CSV Windowed Image Dataset

CSVWindowedImageDataset is the image-only counterpart to CSVWindowedSegmentationDataset. It reads specific patches from large images based on coordinates (offsets) defined in a CSV file, without requiring masks.

📄️MBTiles Mask Dataset

MBTilesMaskWindowedDataset trains segmentation models from MBTiles imagery and

📄️MBTiles Crops Dataset

MBTilesCropsGeoTifMaskDataset trains segmentation models from pre-selected crop windows paired with a spatially-aligned mask from any rasterio-readable source.

📄️Training Object Detection & Instance Segmentation Models

This guide covers training object detection and instance segmentation models using ObjectDetectionPLModel and InstanceSegmentationPLModel. Both classes wrap torchvision.models.detection architectures inside PyTorch Lightning using the same Hydra config system.

📄️Advanced Training Features

This guide covers advanced configuration topics that apply across all model types: compound losses, GPU augmentations, mixed precision, gradient clipping, OneCycleLR, multispectral weight adaptation, and checkpointing.

📄️H3 Spatial Val/Test Split

Standard random train/val/test splits leak spatial autocorrelation: patches from the same area

📄️MBTiles Multiclass Mask Builder

Use this tool when you have an MBTiles file that defines the reference grid,

📄️Autoencoder Clustering Losses

Three loss functions designed for Phase-2 DCEC-style fine-tuning of

📄️Running Inference

After training a model, you can run inference on new images using either of two CLI modes: predict for single-image sliding-window processing, or predict-from-batch for batch processing via PyTorch Lightning's Trainer.predict.

📄️Reproducible Training

Adding a seed to your training YAML guarantees that two runs with the same configuration, dataset, and hardware produce byte-identical results. This is essential for ablation studies, debugging, and comparing experiments.

📄️Dataset Builder Tools

The dataset builder tools help you prepare segmentation datasets from raw rasters and vector annotations. They are accessible via the pytorch-smt-tools CLI.

📄️Dataset Distillation (DDOQ)

The Dataset Distillation pipeline in pytorch_smt implements the DDOQ (Dataset Distillation by Optimal Quantization) method. This approach reframes the compression of massive datasets as an "optimal quantization" problem within latent spaces.

📄️Generic Autoencoder

The GenericAutoencoder is a flexible architecture designed for image reconstruction and self-supervised learning tasks. It allows combining encoders from Segmentation Models PyTorch (SMP) or HuggingFace Transformers with a reconstruction decoder.

📄️MBTiles Polygon Dataset

MBTilesPolygonDataset reads paired image and mask tiles directly from

📄️Parquet Support & Caching

The framework supports Apache Parquet for dataset metadata, offering significantly faster loading times and lower memory consumption compared to standard CSV files.

📄️Polygonization: Masks to Vector Polygons

Polygonization converts a raster segmentation mask — a grid of predicted class probabilities or binary labels — into a set of vector polygon geometries. The output can be written as GeoJSON, Shapefile, or directly into a PostGIS database.

📄️Windowed Image Datasets

These datasets are designed to extract patches from full-size rasters using a deterministic sliding-window (grid) approach. Unlike random-crop datasets, they allow you to process the entire area of your images in a fixed grid, which is particularly useful for validation, testing, and consistent performance monitoring.
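
As a rough illustration of what a deterministic grid over a raster looks like, the sketch below enumerates window offsets in row-major order. The function name `sliding_windows` and its parameters are hypothetical and are not part of the library's API. The edge handling (shifting the last window back so it ends exactly at the raster border) is also an assumption; the actual datasets may pad or drop partial windows instead.

```python
from typing import Iterator, List, Tuple


def sliding_windows(
    width: int, height: int, patch: int, stride: int
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, win_width, win_height) in row-major order.

    Hypothetical helper for illustration only, not the library's API.
    """
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")

    def offsets(size: int) -> List[int]:
        # A raster smaller than the patch yields a single, clipped window.
        if size <= patch:
            return [0]
        offs = list(range(0, size - patch + 1, stride))
        # Assumption: shift the final window back so the border is covered.
        if offs[-1] != size - patch:
            offs.append(size - patch)
        return offs

    win_w = min(patch, width)
    win_h = min(patch, height)
    for row_off in offsets(height):
        for col_off in offsets(width):
            yield col_off, row_off, win_w, win_h


if __name__ == "__main__":
    for window in sliding_windows(width=5, height=4, patch=2, stride=2):
        print(window)
```

Because the offsets depend only on the raster size, patch size, and stride, the same image always produces the same patches in the same order, which is what makes this kind of grid suitable for validation and testing.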

📄️Evaluation Pipeline

The evaluation pipeline lets you compare the segmentation quality of one or more trained models side-by-side on a shared test dataset. It runs predictions for each experiment, computes pixel-level metrics from confusion matrices, aggregates results across images, and optionally generates comparison visualizations.
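
To make "pixel-level metrics from confusion matrices" concrete, here is a minimal sketch that derives per-class IoU from a confusion matrix. The helper names `sum_confusions` and `per_class_iou` are hypothetical, not the pipeline's API. It assumes rows are ground-truth classes and columns are predicted classes, and that aggregation across images is done by summing the per-image matrices before computing the metric; the actual pipeline may aggregate differently.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def sum_confusions(matrices: Sequence[Matrix]) -> Matrix:
    """Element-wise sum of equally sized confusion matrices (assumed aggregation)."""
    n = len(matrices[0])
    total = [[0] * n for _ in range(n)]
    for matrix in matrices:
        for r in range(n):
            for c in range(n):
                total[r][c] += matrix[r][c]
    return total


def per_class_iou(confusion: Matrix) -> List[float]:
    """IoU per class: TP / (TP + FP + FN), with rows = truth, columns = prediction."""
    n = len(confusion)
    ious = []
    for cls in range(n):
        tp = confusion[cls][cls]
        fn = sum(confusion[cls]) - tp                       # truth is cls, predicted otherwise
        fp = sum(confusion[r][cls] for r in range(n)) - tp  # predicted cls, truth otherwise
        denom = tp + fp + fn
        # A class absent from both truth and prediction has no defined IoU.
        ious.append(tp / denom if denom else float("nan"))
    return ious


if __name__ == "__main__":
    image_a = [[3, 1], [0, 2]]
    image_b = [[1, 0], [1, 1]]
    print(per_class_iou(sum_confusions([image_a, image_b])))
```

Summing matrices first weights every pixel equally; averaging per-image scores instead would weight every image equally, so the two choices can give different numbers on datasets with uneven image content.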

📄️Raster Utilities

Utility tools for preprocessing raster files. Accessible via the pytorch-smt-tools CLI.

📄️Dataset Conversion

This guide covers converting segmentation datasets into formats required by specialized models. Currently the primary supported conversion target is the Polygon-RNN format, which requires cropped per-object images, normalized polygon files, and a generated CSV index.