DANN — Domain Adversarial Neural Network

DANNMethod is the built-in implementation of the Domain Adversarial Neural Network strategy from:

Ganin, Y. & Lempitsky, V. Unsupervised Domain Adaptation by Backpropagation. ICML 2015.

It adapts a segmentation network trained on a labeled source domain to an unlabeled target domain by forcing the encoder to produce features that a domain classifier cannot distinguish — while simultaneously keeping them discriminative for segmentation.


How it works

The training loop adds an adversarial branch to the normal segmentation training:

Source images → Encoder → GRL(λ) → DomainClassifier → L_domain
Target images → Encoder → GRL(λ) → DomainClassifier ↗

Source images → Encoder → Decoder → L_seg (labeled, source only)

Gradient Reversal Layer (GRL): during the forward pass it acts as identity. During backpropagation it multiplies the incoming gradient by −λ. This forces the encoder to produce features that fool the domain classifier — i.e., features that look the same regardless of which domain they came from.
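
As an illustration of the gradient reversal behaviour described above, here is a minimal PyTorch sketch. The names `GradientReversal` and `grad_reverse` are hypothetical and are not taken from DANNMethod's source; the sketch only shows the identity forward pass and the gradient multiplied by −λ on the way back.

```python
import torch
from torch.autograd import Function


class GradientReversal(Function):
    """Identity in the forward pass; scales gradients by -lambda in backward."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_          # stash the coefficient for backward
        return x.view_as(x)            # identity (returns a view of x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient; lambda_ itself gets no gradient.
        return -ctx.lambda_ * grad_output, None


def grad_reverse(x, lambda_=1.0):
    # Hypothetical convenience wrapper around Function.apply.
    return GradientReversal.apply(x, lambda_)


features = torch.ones(3, requires_grad=True)
grad_reverse(features, lambda_=0.5).sum().backward()
print(features.grad)  # every entry is -0.5 instead of +1.0
```

Without the layer, the gradient of `sum()` with respect to each feature would be +1; with it, the encoder side sees the same gradient with its sign flipped and scaled by λ.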

What each part of the U-Net learns:

| Component | Gradient source | Effect |
|---|---|---|
| Encoder | L_seg (normal) + L_domain via GRL (reversed) | Learns features that are both discriminative and domain-invariant |
| Decoder | L_seg only, source domain only | Learns to decode source-domain features |
| DomainClassifier | L_domain (normal) | Learns to distinguish source from target |

Total loss:

total_loss = L_seg + λ · L_domain

λ starts at 0 and grows following the DANN schedule, so the encoder has time to stabilize before adversarial pressure is applied.
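
For reference, the annealing schedule proposed by Ganin & Lempitsky is shown below, where p is the training progress in [0, 1] and γ corresponds to the `gamma` setting in the config. That DANNScheduler implements exactly this expression is an assumption here; check the scheduler's source if the exact curve matters.

```latex
\lambda_p = \frac{2}{1 + \exp(-\gamma \, p)} - 1, \qquad p \in [0, 1]
```

With γ = 10, this gives λ = 0 at p = 0, about 0.76 at p = 0.2, and about 0.9999 at p = 1, so most of the growth happens in the first third of training.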


Quick start

1. Find in_channels for your encoder

in_channels must match the number of output channels of the hooked encoder layer. For SMP models, inspect model.encoder.out_channels — use the last value:

import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="resnet34", in_channels=3, classes=2)
print(model.encoder.out_channels)
# (3, 64, 64, 128, 256, 512) → use 512

Common values:

| Encoder | in_channels |
|---|---|
| resnet18, resnet34 | 512 |
| resnet50, resnet101 | 2048 |
| efficientnet-b0 | 320 |
| efficientnet-b4 | 448 |
| mit_b0 | 256 |
| mit_b2, mit_b4 | 512 |
| timm-resnest50d | 2048 |

2. Write the config

domain_adaptation:
  # Layer to hook: must appear in both feature_layers and method.feature_layer
  feature_layers:
    - encoder

  method:
    _target_: pytorch_segmentation_models_trainer.domain_adaptation.methods.dann.DANNMethod
    feature_layer: encoder      # dot-separated path to the hooked layer
    in_channels: 512            # channels at that layer (see table above)
    hidden_size: 1024           # domain classifier MLP hidden width
    discriminator_lr: 1.0e-4    # separate LR for the domain classifier
    lambda_da: 1.0              # fallback if no lambda_schedule is configured

    # Recommended: DANN annealing schedule
    lambda_schedule:
      _target_: pytorch_segmentation_models_trainer.domain_adaptation.schedulers.DANNScheduler
      gamma: 10.0

3. Full working config

Copy and adjust conf/examples/dann_domain_adaptation.yaml, which contains a complete example with a U-Net ResNet-34.


Configuration reference

DANNMethod parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| feature_layer | string | | Dot-separated name of the encoder layer to hook, e.g. "encoder". Must also appear in feature_layers. |
| in_channels | int | | Number of output channels of feature_layer. See the table above. |
| hidden_size | int | 1024 | Width of each hidden layer in the domain classifier MLP. |
| discriminator_lr | float | 1e-4 | Learning rate for the domain classifier. Added as a separate optimizer parameter group. |
| lambda_da | float | 1.0 | Global DA loss weight. Used as a constant when lambda_schedule is absent. |
| lambda_schedule | dict | | Optional lambda scheduler config (see Lambda Schedulers). |
| step_mode | "epoch" or "batch" | "epoch" | Granularity at which the lambda schedule is applied. "batch" updates λ every training step for smoother growth (closer to the original Ganin et al. implementation). See Lambda update granularity below. |

Logged metrics

The following scalars are logged every training step, under the da/ and loss/ prefixes:

| Metric | Description |
|---|---|
| da/domain_loss | Cross-entropy loss of the domain classifier |
| da/domain_acc | Accuracy of the domain classifier (0.5 = fully confused = ideal) |
| da/lambda | Current value of λ |
| loss/train_seg | Segmentation loss (source domain) |
| loss/train_da | DA loss before weighting |
| loss/train_total | seg_loss + λ · da_loss |

A domain_acc consistently above 0.9 means the classifier can still distinguish the domains — the adaptation is not working. A value near 0.5 means the encoder has learned domain-invariant features.
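
To make the 0.5 target concrete, the sketch below computes cross-entropy and accuracy for a hypothetical two-domain classifier, assuming equally sized source and target batches. The function names and the sample probabilities are illustrative and not part of the library. A fully confused classifier that outputs 0.5 for every sample has a loss of ln 2 ≈ 0.693, so in this setting a da/domain_loss hovering near that value goes together with da/domain_acc near 0.5.

```python
import math


def domain_cross_entropy(probs_target, labels):
    """Mean binary cross-entropy; probs_target[i] is P(sample i is target)."""
    losses = [
        -math.log(p) if y == 1 else -math.log(1.0 - p)
        for p, y in zip(probs_target, labels)
    ]
    return sum(losses) / len(losses)


def domain_accuracy(probs_target, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(probs_target, labels))
    return hits / len(labels)


labels = [0, 0, 1, 1]                   # 0 = source, 1 = target
confident = [0.05, 0.10, 0.90, 0.95]    # domains are easy to tell apart
confused = [0.5, 0.5, 0.5, 0.5]         # classifier cannot separate domains

print(domain_accuracy(confident, labels))                 # 1.0
print(domain_accuracy(confused, labels))                  # 0.5
print(round(domain_cross_entropy(confused, labels), 3))   # 0.693
```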


Lambda update granularity

By default (step_mode: "epoch"), the GRL coefficient λ is updated once per epoch. Setting step_mode: "batch" updates λ at every training batch, producing smoother growth and matching the original Ganin et al. formulation more closely.

domain_adaptation:
  method:
    _target_: pytorch_segmentation_models_trainer.domain_adaptation.methods.dann.DANNMethod
    feature_layer: encoder
    in_channels: 512
    lambda_da: 1.0
    step_mode: batch   # update λ every batch instead of every epoch
    lambda_schedule:
      _target_: pytorch_segmentation_models_trainer.domain_adaptation.schedulers.DANNScheduler
      gamma: 10.0

When to prefer each mode:

| Mode | Progress counter | Suitable for |
|---|---|---|
| "epoch" (default) | epoch / max_epochs | Long training runs (≥ 50 epochs), simpler to reason about |
| "batch" | global_batch / (max_epochs × batches_per_epoch) | Short training runs or fine-tuning where epoch granularity is too coarse |

In both modes the same DANNScheduler formula is used — only the frequency of updates changes.
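
The sketch below illustrates how the two progress counters differ in practice. It uses the schedule from Ganin & Lempitsky as a stand-in for DANNScheduler and assumes zero-based epoch and batch indices; both are assumptions, and the function names are hypothetical.

```python
import math


def dann_lambda(progress, gamma=10.0):
    """Ganin & Lempitsky schedule: 0 at progress 0, close to 1 at progress 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def progress_epoch(epoch, max_epochs):
    """Progress counter for step_mode "epoch"."""
    return epoch / max_epochs


def progress_batch(global_batch, max_epochs, batches_per_epoch):
    """Progress counter for step_mode "batch"."""
    return global_batch / (max_epochs * batches_per_epoch)


max_epochs, batches_per_epoch = 5, 100

# Halfway through the first epoch (epoch index 0, global batch 50).
lam_epoch = dann_lambda(progress_epoch(0, max_epochs))
lam_batch = dann_lambda(progress_batch(50, max_epochs, batches_per_epoch))
print(lam_epoch)            # 0.0: epoch mode has not moved yet
print(round(lam_batch, 3))  # 0.462
```

In a 5-epoch run, epoch mode changes λ only five times, which is why batch mode suits short runs better.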


Choosing feature_layer

The GRL is applied to the features captured at feature_layer. This determines which encoder layers are affected by the adversarial gradient.

For a U-Net the natural choice is "encoder" (the bottleneck), which exposes the deepest, most semantic features and causes the adversarial gradient to flow through the entire encoder.

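For a model with multiple encoder stages, you may prefer a mid-level layer if you want the adversarial gradient to reach only the layers up to and including that stage. Because the gradient flows backward from the hooked layer, deeper stages (here, layer4) receive no adversarial gradient:

# Adversarial gradient reaches the encoder up to and including layer3 (resnet50)
feature_layers:
  - encoder.layer3
method:
  feature_layer: encoder.layer3
  in_channels: 1024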

To list all available layer names for a given model:

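import segmentation_models_pytorch as smp

# Same constructor arguments as in the quick start; adjust to your model.
model = smp.Unet(encoder_name="resnet50", in_channels=3, classes=2)
for name, _ in model.named_modules():
    print(name)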
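
A dot-separated name such as encoder.layer3 can be resolved to a module by walking attributes one segment at a time. The helper below is a hypothetical illustration that uses plain Python objects as stand-ins for modules; it does not reflect how DANNMethod resolves feature_layer internally.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_layer(root, dotted_path):
    """Follow a dot-separated attribute path, e.g. "encoder.layer3"."""
    return reduce(getattr, dotted_path.split("."), root)


# Stand-in for a model; a real nn.Module exposes submodules the same way.
model = SimpleNamespace(
    encoder=SimpleNamespace(layer3="<layer3 module>", layer4="<layer4 module>")
)

print(resolve_layer(model, "encoder.layer3"))  # <layer3 module>
```

On a real PyTorch model, `model.get_submodule("encoder.layer3")` performs the same lookup and raises an error if the path does not exist, which is a quick way to validate a feature_layer value before training.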

Warm-starting from a source checkpoint

The most effective workflow is to first train the model on the source domain, then run DANN adaptation from that checkpoint:

domain_adaptation:
  pretrained_checkpoint:
    path: /checkpoints/source_model.ckpt
    source_format: pytorch_lightning
    strict_loading: true

Starting from a pretrained encoder means the adversarial branch starts with already-useful features rather than adapting from a randomly initialized state.


Monitoring adaptation

Add DomainAdaptationMonitorCallback to your callbacks to track forgetting and adaptation progress:

callbacks:
  - _target_: pytorch_segmentation_models_trainer.domain_adaptation.callbacks.DomainAdaptationMonitorCallback
    num_classes: 2
    log_every_n_epochs: 1
    forgetting_threshold: 0.05

Watch da/domain_acc alongside iou/target_val:

  • domain_acc near 0.5 → encoder is producing domain-invariant features ✓
  • iou/target_val rising → adaptation is improving target performance ✓
  • iou/source_val stable → no catastrophic forgetting ✓

Limitations

  • Single feature layer: DANNMethod hooks one layer. If you need multi-scale alignment, implement a custom method (see Implementing a DA Method) that combines losses from multiple layers.
  • Symmetric adaptation: DANN does not distinguish which domain is "easier" or "harder". For highly asymmetric domain gaps, methods like ADDA or CycleGAN-based adaptation may perform better.
  • Decoder is not adapted: Only the encoder receives the adversarial gradient. If the domain gap also affects the decoder-level features (e.g., due to different label distributions), consider adding entropy minimization on the target output alongside DANN.

References

  • Ganin, Y. & Lempitsky, V. Unsupervised Domain Adaptation by Backpropagation. ICML 2015.
  • Vega, P.J.S. et al. Weakly Supervised Domain Adversarial Neural Network for Deforestation Detection in Tropical Forests. IEEE JSTARS 2023.