Building a Frame Field Dataset

Frame field models (such as the Frame Field Learning approach for polygon extraction) require auxiliary masks in addition to the standard polygon segmentation mask. This guide explains the extended CSV schema, the FrameFieldSegmentationDataset class, and how to configure it.

What Extra Masks Are Needed

Standard segmentation requires only a polygon mask. Frame field models are also trained on the following masks:

| Mask type | What it encodes |
| --- | --- |
| boundary_mask | Pixels belonging to polygon edges / boundaries |
| vertex_mask | Pixels at polygon corners / vertices |
| crossfield_mask | A two-channel angle map encoding the dominant polygon edge orientation |
| distance_mask | Per-pixel distance transform from the nearest polygon edge |
| size_mask | Per-pixel polygon-size normalisation map |

All of these can be generated automatically from vector polygon data using the build-mask CLI mode. See the Building Training Masks from Vector Data guide.

Extended CSV Schema

The CSV index file extends the base schema with a column for each auxiliary mask. Any column whose corresponding return_* flag is false may be omitted.

| Column | Description |
| --- | --- |
| image | Path to the input image |
| polygon_mask | Path to the polygon segmentation mask (default mask key) |
| boundary_mask | Path to the boundary mask |
| vertex_mask | Path to the vertex mask |
| crossfield_mask | Path to the crossfield angle map |
| distance_mask | Path to the distance transform mask |
| size_mask | Path to the size normalisation mask |

Example CSV

image,polygon_mask,boundary_mask,vertex_mask,crossfield_mask,distance_mask,size_mask
images/tile_001.tif,polygon_masks/tile_001.png,boundary_masks/tile_001.png,vertex_masks/tile_001.png,crossfield_masks/tile_001.png,distance_masks/tile_001.png,size_masks/tile_001.png
images/tile_002.tif,polygon_masks/tile_002.png,boundary_masks/tile_002.png,vertex_masks/tile_002.png,crossfield_masks/tile_002.png,distance_masks/tile_002.png,size_masks/tile_002.png

Tip: The build-mask CLI generates all of these mask files and writes the CSV in one step. You rarely need to populate this CSV manually.
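
If you do edit the CSV by hand, it can help to check that the header contains every column the enabled return_* flags require. The following is a minimal sketch using only the Python standard library. The helpers required_columns and missing_columns are hypothetical names introduced here for illustration and are not part of the library. The sketch assumes the default column names listed in the table above.

```python
import csv
import io

# Column names follow the default *_key values described in this guide.
ALWAYS_REQUIRED = ("image", "polygon_mask")
OPTIONAL_COLUMNS = {
    "return_boundary_mask": "boundary_mask",
    "return_vertex_mask": "vertex_mask",
    "return_crossfield_mask": "crossfield_mask",
    "return_distance_mask": "distance_mask",
    "return_size_mask": "size_mask",
}


def required_columns(**flags):
    """Return the CSV columns needed for the given return_* flags.

    Hypothetical helper. Flags that are not passed are treated as True,
    matching the documented defaults.
    """
    unknown = set(flags) - set(OPTIONAL_COLUMNS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    columns = list(ALWAYS_REQUIRED)
    for flag, column in OPTIONAL_COLUMNS.items():
        if flags.get(flag, True):
            columns.append(column)
    return columns


def missing_columns(csv_file, **flags):
    """Return the required columns that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [c for c in required_columns(**flags) if c not in header]


# In-memory CSV that lacks the distance and size columns.
sample = io.StringIO(
    "image,polygon_mask,boundary_mask,vertex_mask,crossfield_mask\n"
    "images/tile_001.tif,polygon_masks/tile_001.png,boundary_masks/tile_001.png,"
    "vertex_masks/tile_001.png,crossfield_masks/tile_001.png\n"
)
print(missing_columns(sample))  # ['distance_mask', 'size_mask']
sample.seek(0)
print(missing_columns(sample, return_distance_mask=False, return_size_mask=False))  # []
```

With a real file, pass an open file handle (for example, open("train.csv", newline="")) instead of the in-memory sample.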

The FrameFieldSegmentationDataset Class

FrameFieldSegmentationDataset extends SegmentationDataset with logic for loading multiple mask channels and assembling the structured output dictionary expected by frame field model trainers.

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| input_csv_path | Path | required | Path to the CSV index file |
| root_dir | str | None | Root directory for relative paths |
| augmentation_list | list | None | albumentations transform list |
| data_loader | config | None | DataLoader keyword arguments |
| image_key | str | "image" | CSV column name for images |
| mask_key | str | "polygon_mask" | CSV column name for polygon masks |
| multi_band_mask | bool | False | If True, load polygon/boundary/vertex from a single 3-band mask file |
| boundary_mask_key | str | "boundary_mask" | CSV column name for boundary masks |
| return_boundary_mask | bool | True | Whether to load and return the boundary mask |
| vertex_mask_key | str | "vertex_mask" | CSV column name for vertex masks |
| return_vertex_mask | bool | True | Whether to load and return the vertex mask |
| crossfield_mask_key | str | "crossfield_mask" | CSV column name for the crossfield angle map |
| return_crossfield_mask | bool | True | Whether to load and return the crossfield mask |
| distance_mask_key | str | "distance_mask" | CSV column name for the distance transform |
| return_distance_mask | bool | True | Whether to load and return the distance mask |
| size_mask_key | str | "size_mask" | CSV column name for the size mask |
| return_size_mask | bool | True | Whether to load and return the size mask |
| image_width | int | 224 | Width used by the fallback resize transform when a crop yields no objects |
| image_height | int | 224 | Height used by the fallback resize transform |
| n_first_rows_to_read | int | None | Limit CSV rows read |

The multi_band_mask Option

When multi_band_mask is set to true, the dataset expects the mask_key column to point to a single multi-band image with the following band layout:

  • Band 0 = polygon mask
  • Band 1 = boundary mask
  • Band 2 = vertex mask

This can reduce the number of files on disk at the cost of more complex mask preparation.
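
The band layout above can be illustrated with a small pure-Python sketch. It is not the library's loading code: split_multi_band_mask is a hypothetical helper, and the mask is modelled as a nested list of shape H x W x 3 (channel-last), which is an assumption made only for this example.

```python
# Band order as documented: 0 = polygon, 1 = boundary, 2 = vertex.
BAND_NAMES = ("polygon_mask", "boundary_mask", "vertex_mask")


def split_multi_band_mask(mask):
    """Split an H x W x 3 nested-list mask into three H x W masks.

    Hypothetical helper; assumes each pixel is a 3-element sequence.
    """
    for row in mask:
        for pixel in row:
            if len(pixel) != len(BAND_NAMES):
                raise ValueError("expected exactly 3 bands per pixel")
    return {
        name: [[pixel[band] for pixel in row] for row in mask]
        for band, name in enumerate(BAND_NAMES)
    }


# A 2 x 2 tile: every pixel is (polygon, boundary, vertex).
mask = [
    [(1, 1, 1), (1, 1, 0)],
    [(1, 0, 0), (0, 0, 0)],
]
bands = split_multi_band_mask(mask)
print(bands["polygon_mask"])   # [[1, 1], [1, 0]]
print(bands["boundary_mask"])  # [[1, 1], [0, 0]]
print(bands["vertex_mask"])    # [[1, 0], [0, 0]]
```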

Fallback Resize Transform

If an augmented crop contains no object pixels (that is, it is entirely background), the dataset automatically falls back to a resize-based transform: albumentations.Resize to image_height × image_width, followed by Normalize and ToTensorV2. The fallback is intended to guarantee that the returned sample has valid content.
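
The control flow of this fallback can be sketched as follows. This is an illustration under stated assumptions, not the dataset's actual implementation: apply_with_fallback, crop_top_left, and keep_full_tile are hypothetical names, and plain Python callables stand in for the albumentations pipelines.

```python
def apply_with_fallback(image, mask, transform, fallback_transform):
    """Apply transform; if the result has no object pixels, use the fallback.

    Returns (image, mask, used_fallback). Hypothetical helper.
    """
    out_image, out_mask = transform(image, mask)
    if any(value > 0 for row in out_mask for value in row):
        return out_image, out_mask, False
    # The crop was all background, so redo the sample with the fallback.
    out_image, out_mask = fallback_transform(image, mask)
    return out_image, out_mask, True


def crop_top_left(image, mask, size=2):
    """Stand-in for a crop augmentation: keep the top-left size x size window."""
    return [r[:size] for r in image[:size]], [r[:size] for r in mask[:size]]


def keep_full_tile(image, mask):
    """Stand-in for the resize-based fallback: keeps the whole tile."""
    return image, mask


image = [[10, 11, 12], [13, 14, 15], [16, 17, 18]]
mask = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]  # the only object pixel is bottom-right

_, out_mask, used_fallback = apply_with_fallback(
    image, mask, crop_top_left, keep_full_tile
)
print(used_fallback)  # True: the 2 x 2 crop missed the object
print(out_mask)       # [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
```

Because the fallback works on the whole tile rather than a crop, it keeps any object present anywhere in the source image.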

Dataset Output Dictionary

__getitem__ returns a dictionary with the following keys:

| Key | Shape | dtype | Description |
| --- | --- | --- | --- |
| idx | scalar | | Sample index |
| path | | str | Absolute path to the source image |
| image | (C, H, W) | torch.Tensor | Image tensor |
| gt_polygons_image | (3, H, W) | torch.float32 | Stacked polygon / boundary / vertex masks |
| class_freq | (3,) | torch.float32 | Per-class mean pixel frequency |
| gt_crossfield_angle | (1, H, W) | torch.float32 | Crossfield angle map (only if return_crossfield_mask) |
| distances | (1, H, W) | torch.float32 | Distance transform (only if return_distance_mask) |
| sizes | (1, H, W) | torch.float32 | Size map (only if return_size_mask) |

Note: The crossfield, distance, and size masks are loaded as raw floating-point arrays (not binarised), unlike the polygon/boundary/vertex masks which are loaded as binary uint8 arrays.
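
To make the class_freq entry concrete, the sketch below computes a per-channel mean over a tiny stacked mask. It assumes that "per-class mean pixel frequency" means the mean value of each channel of gt_polygons_image. This is an interpretation of the table above, not a statement of how the library computes it. class_frequency is a hypothetical helper, and nested lists stand in for tensors.

```python
def class_frequency(gt_polygons_image):
    """Mean value of each channel of a C x H x W nested-list mask stack.

    Hypothetical helper; assumes every channel has at least one pixel.
    """
    freqs = []
    for channel in gt_polygons_image:
        values = [v for row in channel for v in row]
        freqs.append(sum(values) / len(values))
    return freqs


# Channels in the documented order: polygon, boundary, vertex.
gt_polygons_image = [
    [[1, 1], [1, 0]],  # polygon: 3 of 4 pixels set
    [[1, 1], [0, 0]],  # boundary: 2 of 4 pixels set
    [[1, 0], [0, 0]],  # vertex: 1 of 4 pixels set
]
print(class_frequency(gt_polygons_image))  # [0.75, 0.5, 0.25]
```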

Full YAML Configuration Example

# configs/dataset/frame_field_train.yaml

train_dataset:
  _target_: pytorch_segmentation_models_trainer.dataset_loader.dataset.FrameFieldSegmentationDataset
  input_csv_path: /data/buildings/train.csv
  root_dir: /data/buildings
  # Mask column names (must match CSV header)
  mask_key: polygon_mask
  boundary_mask_key: boundary_mask
  return_boundary_mask: true
  vertex_mask_key: vertex_mask
  return_vertex_mask: true
  crossfield_mask_key: crossfield_mask
  return_crossfield_mask: true
  distance_mask_key: distance_mask
  return_distance_mask: true
  size_mask_key: size_mask
  return_size_mask: true
  # Fallback resize dimensions when a crop has no objects
  image_width: 512
  image_height: 512
  multi_band_mask: false
  augmentation_list:
    - _target_: albumentations.RandomCrop
      height: 512
      width: 512
    - _target_: albumentations.HorizontalFlip
      p: 0.5
    - _target_: albumentations.VerticalFlip
      p: 0.5
    - _target_: albumentations.Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
    - _target_: albumentations.pytorch.ToTensorV2
  data_loader:
    shuffle: true
    num_workers: 4
    pin_memory: true
    batch_size: 4
    drop_last: true

val_dataset:
  _target_: pytorch_segmentation_models_trainer.dataset_loader.dataset.FrameFieldSegmentationDataset
  input_csv_path: /data/buildings/val.csv
  root_dir: /data/buildings
  mask_key: polygon_mask
  boundary_mask_key: boundary_mask
  return_boundary_mask: true
  vertex_mask_key: vertex_mask
  return_vertex_mask: true
  crossfield_mask_key: crossfield_mask
  return_crossfield_mask: true
  distance_mask_key: distance_mask
  return_distance_mask: true
  size_mask_key: size_mask
  return_size_mask: true
  image_width: 512
  image_height: 512
  multi_band_mask: false
  augmentation_list:
    - _target_: albumentations.Resize
      height: 512
      width: 512
    - _target_: albumentations.Normalize
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
    - _target_: albumentations.pytorch.ToTensorV2
  data_loader:
    shuffle: false
    num_workers: 4
    pin_memory: true
    batch_size: 4
    drop_last: false

Disabling Specific Mask Types

If your model does not use distance or size masks, you can disable them to save I/O:

return_distance_mask: false
return_size_mask: false

The corresponding keys (distances, sizes) will then be absent from the output dictionary.
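
Downstream code that consumes samples can guard against missing keys by deriving the expected key set from the same flags. The sketch below covers only the three optional keys documented in the output dictionary. expected_output_keys is a hypothetical helper and not part of the library.

```python
def expected_output_keys(
    return_crossfield_mask=True,
    return_distance_mask=True,
    return_size_mask=True,
):
    """List the output dictionary keys expected for the given flags.

    Hypothetical helper based on the keys documented in this guide.
    """
    keys = ["idx", "path", "image", "gt_polygons_image", "class_freq"]
    if return_crossfield_mask:
        keys.append("gt_crossfield_angle")
    if return_distance_mask:
        keys.append("distances")
    if return_size_mask:
        keys.append("sizes")
    return keys


keys = expected_output_keys(return_distance_mask=False, return_size_mask=False)
print("distances" in keys)  # False
print("sizes" in keys)      # False
print(keys[-1])             # gt_crossfield_angle
```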

Generating the Masks

All auxiliary masks are generated from polygon vector data using the build-mask mode. Refer to the Building Training Masks from Vector Data guide for the complete workflow.