Building a Segmentation Dataset
This guide explains how to prepare data and configure the SegmentationDataset class for semantic segmentation training. The dataset system is built around a CSV index file that points to image and mask pairs on disk.
Building a Frame Field Dataset
Frame field models (such as the Frame Field Learning approach for polygon extraction) require additional auxiliary masks beyond a standard polygon segmentation mask. This guide explains the extended CSV schema, the FrameFieldSegmentationDataset class, and how to configure it.
Sliding-Window Patch Dataset
RasterPatchDataset provides systematic, deterministic sliding-window training directly from full-size raster images (GeoTIFF, etc.) — no pre-generated tiles on disk required.
Building Detection & Instance Segmentation Datasets
This guide covers the ObjectDetectionDataset and InstanceSegmentationDataset classes, which extend the base CSV-driven dataset system with bounding-box and instance mask support.
Building Training Masks from Vector Data
The build-mask CLI mode automates the generation of all raster mask files needed for segmentation and frame field training. Given a set of georeferenced raster images and a vector polygon source, it produces every mask type and writes a ready-to-use CSV dataset index.
Training a Semantic Segmentation Model
This guide walks through setting up and running a full semantic segmentation training job using the Model base class, which wraps a segmentation_models_pytorch architecture inside PyTorch Lightning.
Balanced Dataset Sampling
Class imbalance is a common problem in geospatial segmentation: a dataset of 155 000 patches
CSV Windowed Dataset
CSVWindowedSegmentationDataset reads specific patches from large GeoTIFFs based on coordinates (offsets) defined in a CSV file. It uses rasterio windowed reads to load only the required pixels, which keeps memory use low even for very large images.
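As a rough illustration of the CSV-driven offsets idea, the sketch below parses a patch index into window records using only the standard library. The column names (image, col_off, row_off, width, height) and the PatchWindow and read_patch_index names are hypothetical and are not taken from the CSVWindowedSegmentationDataset schema; consult the full guide for the real column layout.

```python
import csv
import io
from typing import List, NamedTuple


class PatchWindow(NamedTuple):
    """One patch to read: source image plus pixel offsets and size (hypothetical layout)."""
    image: str
    col_off: int
    row_off: int
    width: int
    height: int


def read_patch_index(csv_text: str) -> List[PatchWindow]:
    """Parse CSV text with assumed columns image, col_off, row_off, width, height."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        PatchWindow(
            image=row["image"],
            col_off=int(row["col_off"]),
            row_off=int(row["row_off"]),
            width=int(row["width"]),
            height=int(row["height"]),
        )
        for row in reader
    ]


# Illustrative index; the file name is made up for this example.
EXAMPLE_INDEX = (
    "image,col_off,row_off,width,height\n"
    "scene_a.tif,0,0,512,512\n"
    "scene_a.tif,512,0,512,512\n"
)

patches = read_patch_index(EXAMPLE_INDEX)
```

With rasterio, each record would typically be turned into rasterio.windows.Window(col_off, row_off, width, height) and passed to the open dataset's read(window=...) call, so only that patch's pixels are loaded from disk.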
Export MBTiles Mask-Aligned Images
Use this tool before training when imagery is stored in MBTiles and labels are
Training a Frame Field Segmentation Model
Frame field segmentation extends standard semantic segmentation with an additional crossfield output that encodes the local orientation of boundaries. This makes the predicted contours geometrically regular and well-suited for building footprint extraction and subsequent polygon reconstruction.
CoreSet Selection
After generating a balanced dataset CSV with build-balanced-dataset, the pool may still be
CSV Windowed Image Dataset
CSVWindowedImageDataset is the image-only counterpart to CSVWindowedSegmentationDataset. It reads specific patches from large images based on coordinates (offsets) defined in a CSV file, without requiring masks.
MBTiles Mask Dataset
MBTilesMaskWindowedDataset trains segmentation models from MBTiles imagery and
MBTiles Crops Dataset
MBTilesCropsGeoTifMaskDataset trains segmentation models from pre-selected crop windows paired with a spatially-aligned mask from any rasterio-readable source.
Training Object Detection & Instance Segmentation Models
This guide covers training object detection and instance segmentation models using ObjectDetectionPLModel and InstanceSegmentationPLModel. Both classes wrap torchvision.models.detection architectures inside PyTorch Lightning using the same Hydra config system.
Advanced Training Features
This guide covers advanced configuration topics that apply across all model types: compound losses, GPU augmentations, mixed precision, gradient clipping, OneCycleLR, multispectral weight adaptation, and checkpointing.
H3 Spatial Val/Test Split
Standard random train/val/test splits leak spatial autocorrelation: patches from the same area
MBTiles Multiclass Mask Builder
Use this tool when you have an MBTiles file that defines the reference grid,
Autoencoder Clustering Losses
Three loss functions designed for Phase-2 DCEC-style fine-tuning of
Running Inference
After training a model, you can run inference on new images using either of two CLI modes: predict for single-image sliding-window processing, or predict-from-batch for batch processing via PyTorch Lightning's Trainer.predict.
Reproducible Training
Adding a seed to your training YAML guarantees that two runs with the same configuration, dataset, and hardware produce byte-identical results. This is essential for ablation studies, debugging, and comparing experiments.
Dataset Builder Tools
The dataset builder tools help you prepare segmentation datasets from raw rasters and vector annotations. They are accessible via the pytorch-smt-tools CLI.
Dataset Distillation (DDOQ)
The Dataset Distillation pipeline in pytorch_smt implements the DDOQ (Dataset Distillation by Optimal Quantization) method. This approach reframes the compression of massive datasets as an "optimal quantization" problem within latent spaces.
Generic Autoencoder
The GenericAutoencoder is a flexible architecture designed for image reconstruction and self-supervised learning tasks. It allows combining encoders from Segmentation Models PyTorch (SMP) or HuggingFace Transformers with a reconstruction decoder.
MBTiles Polygon Dataset
MBTilesPolygonDataset reads paired image and mask tiles directly from
Parquet Support & Caching
The framework supports Apache Parquet for dataset metadata, offering significantly faster loading times and lower memory consumption compared to standard CSV files.
Polygonization: Masks to Vector Polygons
Polygonization converts a raster segmentation mask — a grid of predicted class probabilities or binary labels — into a set of vector polygon geometries. The output can be written as GeoJSON, Shapefile, or directly into a PostGIS database.
Windowed Image Datasets
These datasets are designed to extract patches from full-size rasters using a deterministic sliding-window (grid) approach. Unlike random-crop datasets, they allow you to process the entire area of your images in a fixed grid, which is particularly useful for validation, testing, and consistent performance monitoring.
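A minimal sketch of how a deterministic sliding-window grid can be generated, assuming a fixed patch size and stride. The grid_windows function and its edge handling (shifting the last window back so the border is still covered) are illustrative choices, not necessarily what the library's windowed datasets do.

```python
from typing import Iterator, List, Tuple


def _starts(size: int, patch: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the whole extent is covered."""
    if size <= patch:
        return [0]
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        # Shift one extra window back so the trailing edge is not skipped.
        starts.append(size - patch)
    return starts


def grid_windows(
    height: int, width: int, patch: int, stride: int
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_off, col_off, win_height, win_width) in row-major order."""
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    win_h = min(patch, height)
    win_w = min(patch, width)
    for row_off in _starts(height, patch, stride):
        for col_off in _starts(width, patch, stride):
            yield row_off, col_off, win_h, win_w


windows = list(grid_windows(height=10, width=10, patch=4, stride=4))
```

Because the grid depends only on the raster size, patch size, and stride, every epoch and every run visits the same windows, which is what makes this approach suitable for validation and consistent monitoring.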
Evaluation Pipeline
The evaluation pipeline lets you compare the segmentation quality of one or more trained models side-by-side on a shared test dataset. It runs predictions for each experiment, computes pixel-level metrics from confusion matrices, aggregates results across images, and optionally generates comparison visualizations.
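To make "pixel-level metrics from confusion matrices" concrete, here is a hedged sketch that sums per-image confusion matrices and derives per-class precision, recall, F1, and IoU. The function names are hypothetical, and the convention that rows are ground truth and columns are predictions is an assumption; the pipeline's own implementation and metric set may differ.

```python
from typing import Dict, List

Matrix = List[List[int]]


def sum_confusion(matrices: List[Matrix]) -> Matrix:
    """Element-wise sum of equally sized confusion matrices (one per image)."""
    n = len(matrices[0])
    total = [[0] * n for _ in range(n)]
    for cm in matrices:
        for i in range(n):
            for j in range(n):
                total[i][j] += cm[i][j]
    return total


def per_class_metrics(cm: Matrix) -> List[Dict[str, float]]:
    """Per-class metrics; assumes rows = ground truth, columns = prediction."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k pixels predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as class k
        results.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
            "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        })
    return results


# Two made-up per-image matrices, aggregated before computing metrics.
aggregated = sum_confusion([[[30, 4], [2, 20]], [[20, 6], [3, 15]]])
metrics = per_class_metrics(aggregated)
```

Summing the matrices first and computing metrics once weights every pixel equally; averaging per-image metrics instead would weight every image equally, which can give noticeably different numbers on images with few foreground pixels.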
Raster Utilities
Utility tools for preprocessing raster files. Accessible via the pytorch-smt-tools CLI.
Dataset Conversion
This guide covers converting segmentation datasets into formats required by specialized models. Currently the primary supported conversion target is the Polygon-RNN format, which requires cropped per-object images, normalized polygon files, and a generated CSV index.
GEE LULC Downloader
Download land-use/land-cover (LULC) rasters from Google Earth Engine for any
Segmentation Visualization
Tools for colorizing segmentation masks and building comparison grids that put ground truth alongside one or more prediction sets.
Experiments Runner
The Experiments Runner lets you repeat a training configuration multiple times
K-Fold Cross-Validation
This guide explains how to run spatially-correct k-fold cross-validation with
AlphaEarth Foundation Embeddings
The soft-label preprocessing tools can blend AlphaEarth Foundation (AEF)
Classic ML Pipeline
The classic_ml subpackage provides a complete GPU-accelerated classic machine
Soft-Label Training
Soft-label training replaces hard, one-hot segmentation masks with
Co-Teaching Training
Co-teaching is a noise-robust training strategy for weakly supervised
LULC Input Dataset
LulcInputDataset, LulcInputWindowedDataset, and