
Monte Carlo Dropout at Test Time

What is MC Dropout?

MC Dropout (Gal & Ghahramani, 2016) produces uncertainty estimates from any model that already uses Dropout — with no changes to training.

The technique keeps Dropout layers active during inference and runs T stochastic forward passes. Each pass samples a fresh random dropout mask, which corresponds to drawing one set of weights from an approximate posterior over the model weights. The mean prediction across the T passes is the final output; the disagreement between passes is a proxy for epistemic uncertainty.

| Phase | Standard inference | MC Dropout |
|---|---|---|
| Training | Dropout active (regularisation) | Identical (no change) |
| Inference | model.eval() disables dropout | Dropout kept active |
| Forward passes | 1 (deterministic) | T (stochastic) |
| Output | Single prediction | Mean prediction + uncertainty map |
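The inference row of the table above can be sketched in plain PyTorch: call model.eval() as usual, then switch only the Dropout modules back to training mode so that every forward pass samples a new mask. This is a minimal sketch, not the processor's own code; the helper names enable_mc_dropout and mc_dropout_predict are hypothetical, and the sketch assumes the model returns per-pixel logits of shape (N, C, H, W) and uses only nn.Dropout, nn.Dropout2d or nn.Dropout3d.

```python
import torch
import torch.nn as nn

# Assumption: the model only uses these dropout variants.
DROPOUT_TYPES = (nn.Dropout, nn.Dropout2d, nn.Dropout3d)


def enable_mc_dropout(model: nn.Module) -> int:
    """Put the model in eval mode, then re-activate only its Dropout layers.

    BatchNorm and all other layers keep their deterministic eval behaviour.
    Returns the number of Dropout layers that were re-activated.
    """
    model.eval()
    n_dropout = 0
    for module in model.modules():
        if isinstance(module, DROPOUT_TYPES):
            module.train()  # sample a fresh random mask on every forward pass
            n_dropout += 1
    return n_dropout


@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 10):
    """Run n_samples stochastic passes over x (hypothetical helper).

    Returns (samples, mean): samples holds the per-pass softmax probabilities
    with shape (T, N, C, H, W); mean is their average over T, shape (N, C, H, W).
    """
    enable_mc_dropout(model)
    samples = torch.stack(
        [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
    )
    return samples, samples.mean(dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy segmentation head: 3 input bands, 5 classes, one Dropout2d layer.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Dropout2d(p=0.2),
        nn.Conv2d(8, 5, kernel_size=1),
    )
    x = torch.randn(1, 3, 32, 32)
    samples, mean = mc_dropout_predict(model, x, n_samples=10)
    prediction = mean.argmax(dim=1)  # class-index map, shape (1, 32, 32)
    print(samples.shape, mean.shape, prediction.shape)
```

Switching the Dropout modules individually, rather than calling model.train(), keeps BatchNorm layers on their running statistics, so the dropout mask is the only source of randomness between passes.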

Reference: Gal, Y. & Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML 2016. https://arxiv.org/abs/1506.02142


Prerequisite: the model must have Dropout layers

Without Dropout layers, all T forward passes are identical and the uncertainty map is uniformly zero. A UserWarning is emitted when the processor is constructed if the model contains no Dropout layers.
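The degenerate case is easy to reproduce. The sketch below is not the processor's implementation; check_mc_dropout_ready and sample_spread are hypothetical names. It warns when a model has no Dropout layers and measures how much T passes disagree, which for a dropout-free model is zero up to floating-point rounding.

```python
import warnings

import torch
import torch.nn as nn

# Assumption: the model only uses these dropout variants.
DROPOUT_TYPES = (nn.Dropout, nn.Dropout2d, nn.Dropout3d)


def check_mc_dropout_ready(model: nn.Module) -> bool:
    """Return True if the model has at least one Dropout layer; warn otherwise."""
    has_dropout = any(isinstance(m, DROPOUT_TYPES) for m in model.modules())
    if not has_dropout:
        warnings.warn(
            "Model has no Dropout layers: all MC Dropout samples will be "
            "identical and the uncertainty map will be zero.",
            UserWarning,
        )
    return has_dropout


@torch.no_grad()
def sample_spread(model: nn.Module, x: torch.Tensor, n_samples: int = 5) -> float:
    """Largest per-element std of the softmax outputs across n_samples passes."""
    model.eval()
    for module in model.modules():
        if isinstance(module, DROPOUT_TYPES):
            module.train()  # keep dropout stochastic, if there is any
    probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    return probs.std(dim=0).max().item()
```

For a model such as nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(), nn.Conv2d(4, 5, 1)), check_mc_dropout_ready returns False with a UserWarning, and sample_spread returns a value that is zero up to rounding.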

For SMP models, set decoder_dropout in the model config:

model:
  _target_: segmentation_models_pytorch.Unet
  encoder_name: resnet50
  encoder_weights: imagenet
  in_channels: 3
  classes: 5
  decoder_dropout: 0.2  # <-- activates Dropout2d in the decoder

Other architectures that already have Dropout2d layers (e.g. HRNetOCR, custom UNet with dropout_2d) work out of the box.


Architecture compatibility

| Model type | MC Dropout compatible? | Notes |
|---|---|---|
| SMP models with decoder_dropout set | ✅ Yes | Set decoder_dropout > 0 in config |
| SMP models without decoder_dropout | ⚠️ Degenerate | All samples identical; uncertainty = 0 |
| Custom UNet with dropout_2d | ✅ Yes | Already has Dropout2d |
| HRNetOCR | ✅ Yes | Internal Dropout2d layers |
| EDL (EvidentialWrapper) | ⚠️ Not recommended | Two separate uncertainty sources with conflicting semantics |
| MoE models | ⚠️ Risky | Router noise + dropout may interact unpredictably |
| Detection / PolygonMapper | ❌ Out of scope | Multi-head outputs; aggregation undefined |

Uncertainty modes

"entropy" (default)

Predictive entropy of the mean probability distribution:

p̄(c|x) = (1/T) Σ_t p_t(c|x) # mean of T samples
H = -Σ_c p̄_c · log(p̄_c) # entropy of the mean

What it measures: Total uncertainty (epistemic + aleatoric combined). High wherever the model is unsure — class boundaries, shadows, out-of-distribution regions.

Range: [0, log(num_classes)]
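The entropy formula above maps directly onto array operations. A minimal NumPy sketch, assuming the T softmax outputs for one tile are stacked into an array of shape (T, C, H, W); predictive_entropy is a hypothetical name, not the processor's API:

```python
import numpy as np

EPS = 1e-12  # guards log(0); the p * log(p) term is 0 when p == 0


def predictive_entropy(samples: np.ndarray) -> np.ndarray:
    """Entropy of the mean distribution, H = -sum_c p_mean_c * log(p_mean_c).

    samples: softmax probabilities of shape (T, C, H, W).
    Returns a float32 map of shape (H, W) with values in [0, log(C)].
    """
    p_mean = samples.mean(axis=0)  # mean over the T samples, shape (C, H, W)
    h = -(p_mean * np.log(np.clip(p_mean, EPS, 1.0))).sum(axis=0)
    return h.astype(np.float32)


if __name__ == "__main__":
    # Two confident samples that disagree, for a single 1x1 pixel and C = 2.
    disagree = np.array([[[[1.0]], [[0.0]]], [[[0.0]], [[1.0]]]])  # (2, 2, 1, 1)
    print(predictive_entropy(disagree)[0, 0])  # ≈ 0.693, i.e. log 2
```

Two confident samples that disagree ([1, 0] and [0, 1]) give p̄ = [0.5, 0.5] and H = log 2, the maximum for two classes; two identical confident samples give H = 0.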


"mutual_information" (BALD)

MI = H[p̄] − (1/T) Σ_t H[p_t]

What it measures: Epistemic uncertainty only — the disagreement between the T samples, regardless of each sample's individual confidence.

High when the samples give very different predictions (the model genuinely does not know). Low when all samples agree, even if they are all uncertain.

Range: [0, log(num_classes)]
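A matching NumPy sketch of the BALD score, under the same (T, C, H, W) assumption; entropy and mutual_information are hypothetical names. It reproduces the behaviour described above: confident samples that disagree score high, while samples that agree on an uncertain distribution score zero.

```python
import numpy as np

EPS = 1e-12  # guards log(0)


def entropy(p: np.ndarray, axis: int) -> np.ndarray:
    """Shannon entropy (natural log) along the class axis."""
    return -(p * np.log(np.clip(p, EPS, 1.0))).sum(axis=axis)


def mutual_information(samples: np.ndarray) -> np.ndarray:
    """BALD score: MI = H[p_mean] - (1/T) * sum_t H[p_t].

    samples: softmax probabilities of shape (T, C, H, W).
    Returns a float32 map of shape (H, W).
    """
    total = entropy(samples.mean(axis=0), axis=0)     # H[p_mean], shape (H, W)
    expected = entropy(samples, axis=1).mean(axis=0)  # mean of per-sample entropies
    # MI >= 0 mathematically; clip tiny negatives caused by rounding.
    return np.maximum(total - expected, 0.0).astype(np.float32)


if __name__ == "__main__":
    # Confident but contradictory samples: MI = log 2 (pure disagreement).
    disagree = np.array([[[[1.0]], [[0.0]]], [[[0.0]], [[1.0]]]])
    # Samples that agree on [0.5, 0.5]: high entropy, but MI = 0.
    agree_uncertain = np.full((2, 2, 1, 1), 0.5)
    print(mutual_information(disagree)[0, 0], mutual_information(agree_uncertain)[0, 0])
```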


When to use which

| Goal | Recommended mode |
|---|---|
| Filter low-quality predictions | entropy |
| Active learning: pick images to label | mutual_information |
| Out-of-distribution detection | mutual_information |
| General quality control | entropy |

The uncertainty map

The uncertainty map is a single-band float32 GeoTIFF with the same resolution and CRS as the input image.

  • Each pixel contains a scalar value in [0, log(num_classes)]
  • Low values (near 0) → model is confident
  • High values → model is uncertain

Visualising in QGIS

  1. Open the _mc_uncertainty.tif file in QGIS
  2. Layer Properties → Symbology → Render type: Singleband pseudocolor
  3. Choose a colormap (e.g. viridis or hot)
  4. Apply: dark regions = confident, bright regions = uncertain

Usage: MCDropoutInferenceProcessor

Without uncertainty map (faster)

Runs T stochastic forward passes and returns the mean prediction — more robust than a single deterministic pass, with no extra output files.

inference_processor:
  _target_: pytorch_segmentation_models_trainer.tools.inference.mc_dropout_inference_processor.MCDropoutInferenceProcessor
  n_samples: 10
  export_uncertainty_map: false  # default: no uncertainty file written
  num_classes: 5
  model_input_shape: [512, 512]
  step_shape: [256, 256]

With uncertainty map

inference_processor:
  _target_: pytorch_segmentation_models_trainer.tools.inference.mc_dropout_inference_processor.MCDropoutInferenceProcessor
  n_samples: 10
  uncertainty_mode: entropy  # or mutual_information
  export_uncertainty_map: true
  num_classes: 5
  model_input_shape: [512, 512]
  step_shape: [256, 256]
  output_uncertainty_dir: /results/uncertainty  # optional; defaults to seg dir

Output files per image:

  • {stem}.tif — class-index segmentation (uint8, 1 band)
  • {stem}_mc_uncertainty.tif — uncertainty map (float32, 1 band)
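The naming rule above can be expressed as a small pathlib helper. This is a hypothetical sketch of the documented convention (segmentation written to the seg directory, uncertainty written to output_uncertainty_dir or, if that is unset, to the seg directory), not the processor's own code; mc_output_paths is an illustrative name.

```python
from pathlib import Path
from typing import Optional, Tuple


def mc_output_paths(
    input_image: str,
    seg_dir: str,
    uncertainty_dir: Optional[str] = None,
) -> Tuple[Path, Path]:
    """Return (segmentation_path, uncertainty_path) for one input image."""
    stem = Path(input_image).stem  # "tile_001.tif" -> "tile_001"
    seg_path = Path(seg_dir) / f"{stem}.tif"
    # output_uncertainty_dir is optional and falls back to the seg directory.
    unc_root = Path(uncertainty_dir) if uncertainty_dir else Path(seg_dir)
    unc_path = unc_root / f"{stem}_mc_uncertainty.tif"
    return seg_path, unc_path
```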

Usage: TTA uncertainty in MultiClassInferenceProcessor

TTA also generates T predictions (one per augmentation). When export_uncertainty_map=True and a tta_mode is set, the processor computes the uncertainty from those T TTA samples and exports a _uncertainty.tif alongside the segmentation.

inference_processor:
  _target_: pytorch_segmentation_models_trainer.tools.inference.inference_processors.MultiClassInferenceProcessor
  tta_mode: d4  # "d4" (8 transforms) or "flip" (4)
  uncertainty_mode: entropy
  export_uncertainty_map: true
  num_classes: 5
  model_input_shape: [512, 512]
  step_shape: [256, 256]
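A minimal NumPy sketch of how the T = 8 TTA samples for tta_mode d4 can be produced: each rotation/flip variant of the input is predicted, and each output is mapped back to the original orientation before the samples are stacked. The predict callable stands in for the model, and it and the function names are assumptions for illustration; the same predictive-entropy formula used for MC Dropout is then applied to the stacked samples.

```python
from typing import Callable, List

import numpy as np

EPS = 1e-12  # guards log(0) in the entropy


def d4_tta_samples(
    predict: Callable[[np.ndarray], np.ndarray], image: np.ndarray
) -> np.ndarray:
    """Predict the 8 D4 variants of `image` and map each output back.

    image: array of shape (bands, H, W).
    predict: hypothetical model wrapper returning class probabilities of
        shape (C, H', W') for an input of shape (bands, H', W').
    Returns an array of shape (8, C, H, W) aligned with the original image.
    """
    samples: List[np.ndarray] = []
    for flipped in (False, True):
        base = np.flip(image, axis=2) if flipped else image  # horizontal flip
        for k in range(4):
            out = predict(np.rot90(base, k, axes=(1, 2)))  # rotate by k * 90 degrees
            out = np.rot90(out, -k, axes=(1, 2))  # undo the rotation
            if flipped:
                out = np.flip(out, axis=2)  # undo the flip
            samples.append(out)
    return np.stack(samples)


def tta_entropy(samples: np.ndarray) -> np.ndarray:
    """Predictive entropy of the mean over the TTA samples, shape (H, W)."""
    p_mean = samples.mean(axis=0)
    return -(p_mean * np.log(np.clip(p_mean, EPS, 1.0))).sum(axis=0)
```

A model that is exactly invariant to rotations and flips (for example a purely per-pixel classifier) yields 8 identical aligned samples, so any spread between TTA samples comes from orientation sensitivity alone.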

MC Dropout vs TTA uncertainty

| Source | Stochasticity origin | What it measures |
|---|---|---|
| MC Dropout | Random dropout masks | Model weight uncertainty |
| TTA | Geometric augmentations | Sensitivity to input orientation |

  • MC Dropout uncertainty is high where the model's internal representations are unstable.
  • TTA uncertainty is high where the model's output changes when the input is rotated or flipped, indicating a lack of geometric invariance.

Both produce a float32 GeoTIFF with values in [0, log(num_classes)], so both maps can be displayed and read in QGIS in the same way.


Sliding-window behaviour

In both processors, the uncertainty map is computed per tile and merged back to full image resolution using the same Gaussian/pyramid importance weighting as the segmentation output. Tile overlap regions receive weighted contributions from all tiles that cover them — the same mechanism that eliminates border artefacts in the segmentation.

For each tile:
  T forward passes → T probability maps
  p_mean → argmax → class prediction (fed to seg TileMerger)
  uncertainty → per-pixel scalar (fed to uncertainty TileMerger)

After all tiles:
  Merge seg tiles → full-image class-index raster
  Merge unc tiles → full-image uncertainty raster
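The final merge step amounts to a weighted average over overlapping tiles. The NumPy sketch below merges per-tile uncertainty maps with Gaussian importance weights; the weighting function, the sigma_scale value and all function names are assumptions for illustration and may differ from the TileMerger the processors actually use. It also assumes the image is at least one tile in size.

```python
from typing import Iterable, List, Tuple

import numpy as np


def gaussian_weight(tile_h: int, tile_w: int, sigma_scale: float = 0.25) -> np.ndarray:
    """2-D Gaussian importance map, highest at the tile centre."""
    ys = np.arange(tile_h) - (tile_h - 1) / 2
    xs = np.arange(tile_w) - (tile_w - 1) / 2
    gy = np.exp(-0.5 * (ys / (sigma_scale * tile_h)) ** 2)
    gx = np.exp(-0.5 * (xs / (sigma_scale * tile_w)) ** 2)
    return np.outer(gy, gx)


def tile_origins(size: int, tile: int, step: int) -> List[int]:
    """Start offsets along one axis; the last tile is aligned to the border."""
    starts = list(range(0, max(size - tile, 0) + 1, step))
    if starts[-1] + tile < size:
        starts.append(size - tile)
    return starts


def merge_uncertainty(
    tiles: Iterable[Tuple[np.ndarray, int, int]],
    image_shape: Tuple[int, int],
) -> np.ndarray:
    """Weighted average of per-tile maps given as (map, row, col) triples."""
    acc = np.zeros(image_shape, dtype=np.float64)
    weight_sum = np.zeros(image_shape, dtype=np.float64)
    for tile_map, row, col in tiles:
        h, w = tile_map.shape
        weight = gaussian_weight(h, w)
        acc[row:row + h, col:col + w] += tile_map * weight
        weight_sum[row:row + h, col:col + w] += weight
    # Every pixel covered by at least one tile gets a convex combination.
    return (acc / np.maximum(weight_sum, 1e-12)).astype(np.float32)
```

Because the weights peak at tile centres, pixels near a tile border take most of their value from a neighbouring tile in which they sit closer to the centre, which is what suppresses border artefacts.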

Choosing n_samples

| n_samples | Inference time | Uncertainty quality |
|---|---|---|
| 5 | ~5× standard | Coarse estimate |
| 10 | ~10× standard | Good for most use cases |
| 30 | ~30× standard | High-quality estimate |
| 50+ | ~50×+ standard | Diminishing returns |

The mean prediction quality also improves with more samples, but converges quickly (around 10–20 for typical segmentation models).
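The diminishing returns follow from ordinary Monte Carlo error. Assuming the T passes are independent draws and writing σ_c(x) for the standard deviation of p_t(c|x) across dropout masks, the sampling error of the mean probability shrinks only as 1/√T while the cost grows linearly in T:

```latex
\bar p(c \mid x) \approx \frac{1}{T}\sum_{t=1}^{T} p_t(c \mid x),
\qquad
\operatorname{SE}\big[\bar p(c \mid x)\big] = \frac{\sigma_c(x)}{\sqrt{T}}

\frac{\operatorname{SE}_{T=10}}{\operatorname{SE}_{T=30}} = \sqrt{3} \approx 1.73,
\qquad
\frac{\operatorname{SE}_{T=10}}{\operatorname{SE}_{T=50}} = \sqrt{5} \approx 2.24
```

Going from 10 to 30 samples therefore triples inference time but reduces the sampling error of the mean by only about 1.7 times.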


Limitations

  • Large images: MCDropoutInferenceProcessor does not use the striped inference path (applied automatically to images larger than 50 MP); it logs a warning and falls back to standard tiled inference. If that runs out of GPU memory, reduce model_input_shape or use a machine with more VRAM.
  • n_samples = 1: Valid, but this is just a single stochastic forward pass. mutual_information is identically zero (with T = 1, p̄ equals the only sample, so both entropy terms cancel), and entropy reduces to the entropy of that one sample rather than a Monte Carlo estimate.
  • No training changes: The model trains exactly as before. You can enable MC Dropout on any existing checkpoint that was trained with Dropout.