GenSeg-Baselines
Reproducible code for a 2D medical-image segmentation benchmark: 8 methods × 10 datasets × 3 seeds/folds, 7 metrics, evaluated under a unified resolution-fair protocol. Companion to the GenSegDataset.
This is a code-only repository: trained checkpoints and the generated result tables are not hosted here.
Methods: UNet, UNet++, DeepLabV3+ (ResNet-50/ImageNet), Attention-UNet (from scratch), TransUNet (R50-ViT-B/16, input 256), Swin-UNet (Swin-Tiny, input 224), nnU-Net v2 (250 ep), U-Mamba (UMambaBot, 100 ep).
Datasets: cvc_clinicdb, kvasir_seg, fives, busi, refuge2, acdc, idridd, pannuke, isic2018, kits19.
Metrics (computed per image, then aggregated): Dice, IoU, HD95, ASSD, Sensitivity, Specificity, Precision, plus per-class Dice for the multi-class datasets and paired Wilcoxon significance tests on per-image Dice.
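As a rough illustration of "computed per image, then aggregated", the overlap metrics can be sketched as below. This is a minimal sketch and not the code in metrics/: the function names `per_image_metrics` and `aggregate` are hypothetical, masks are assumed to be binary, and the handling of empty masks (Dice/IoU set to 1.0 when both masks are empty, NaN for an undefined ratio) is an assumption rather than the repository's documented behaviour.

```python
import numpy as np


def per_image_metrics(pred, gt):
    """Overlap metrics for one binary prediction / ground-truth pair."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)

    # Confusion-matrix counts for this single image.
    tp = int(np.sum(pred & gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    tn = int(np.sum(~pred & ~gt))

    def ratio(num, den, empty):
        # `empty` is the value used when the denominator is zero (assumption).
        return num / den if den > 0 else empty

    nan = float("nan")
    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn, 1.0),
        "iou": ratio(tp, tp + fp + fn, 1.0),
        "sensitivity": ratio(tp, tp + fn, nan),
        "specificity": ratio(tn, tn + fp, nan),
        "precision": ratio(tp, tp + fp, nan),
    }


def aggregate(per_image):
    """Average each metric over images, ignoring undefined (NaN) entries."""
    keys = per_image[0].keys()
    return {k: float(np.nanmean([m[k] for m in per_image])) for k in keys}


# Two toy 2x2 images: one partial overlap, one perfect match.
scores = [
    per_image_metrics([[1, 1], [0, 0]], [[1, 0], [0, 0]]),
    per_image_metrics([[0, 1], [0, 1]], [[0, 1], [0, 1]]),
]
summary = aggregate(scores)
```

Averaging per-image scores, rather than pooling pixels over the whole dataset, weights every image equally regardless of object size; it also yields the per-image Dice values that a paired test needs.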
Resolution-fair protocol
Convolutional nets are trained at 512; the fixed-input transformers (Swin-UNet 224, TransUNet 256) and nnU-Net / U-Mamba run at their native size. Every prediction and ground truth is resized to a common 512×512 before scoring, so boundary metrics (HD95/ASSD, in pixels) are directly comparable across methods.
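The resize-before-scoring step might look like the following sketch. It is illustrative only and is not taken from eval_at_res.py: the helper names `resize_nearest` and `to_common_resolution` are hypothetical, and nearest-neighbour interpolation is an assumption (chosen here because it keeps label values intact); the repository may use a different interpolation.

```python
import numpy as np

COMMON_SIZE = (512, 512)  # common scoring resolution stated in the protocol


def resize_nearest(mask, size=COMMON_SIZE):
    """Nearest-neighbour resize of a 2D label map (no new label values)."""
    mask = np.asarray(mask)
    h, w = mask.shape[:2]
    out_h, out_w = size
    # Map the centre of each output pixel back to a source row/column index.
    rows = ((np.arange(out_h) + 0.5) * h / out_h).astype(int)
    cols = ((np.arange(out_w) + 0.5) * w / out_w).astype(int)
    return mask[rows[:, None], cols[None, :]]


def to_common_resolution(pred, gt, size=COMMON_SIZE):
    """Bring a prediction and its ground truth to the same scoring grid."""
    return resize_nearest(pred, size), resize_nearest(gt, size)


# Toy example: a 224x224 prediction scored against a 600x400 ground truth.
pred_224 = np.zeros((224, 224), dtype=np.uint8)
pred_224[:112, :] = 1          # top half is foreground
gt_native = np.zeros((600, 400), dtype=np.uint8)
gt_native[:300, :] = 1         # top half is foreground

pred_512, gt_512 = to_common_resolution(pred_224, gt_native)
```

Once both arrays share one grid, a distance of one pixel means the same thing for every method, which is what makes pixel-valued HD95/ASSD comparable between a 224-input and a 512-input model.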
Layout (code only)
code/framework/: training/evaluation framework
    train.py, test.py, eval_at_res.py, nnunet_eval.py
    metrics/: the 7 metrics + boundary distances
    models/: SMP wrappers, Attention-UNet, Swin/TransUNet wrappers, model registry
    report/aggregate.py: builds the summary tables (per-dataset Dice/HD95/IoU, per-class Dice, Sensitivity/Precision, significance)
code/sota/{Swin-Unet,TransUNet}/: upstream network definitions imported by the Swin-UNet / TransUNet wrappers
code/scripts/: reproduction scripts (unified-512 training & evaluation, nnU-Net / U-Mamba pipelines)
code/envs/: conda environments (seggen.yml, nnunet.yml, umamba.yml)