Understanding Variability in Brain Tumor Segmentation

Research presented by Monika Pytlarz at the Radiological Society of North America (RSNA) 2025 Annual Meeting

Expert annotations in brain tumor MRI segmentation are inherently variable, reflecting multiple valid interpretations rather than a single ground truth. Treating this variability as label noise can push models to over-optimize similarity to one reference without improving real-world utility. Instead, multiple expert segmentations can be combined to estimate a consensus reference, and expert-to-expert differences can define a practical performance range for AI, so that a model is judged against what is clinically accurate and achievable rather than against a single annotator's mask.
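To make the idea concrete, here is a minimal sketch in Python/NumPy of how pairwise Dice overlap among expert masks can define an expert-to-expert agreement band against which a model's score could then be judged. The function names and the randomly generated masks are illustrative assumptions, not the team's actual code or data.

```python
import numpy as np
from itertools import combinations

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)

def expert_agreement_band(rater_masks):
    """Pairwise Dice among expert masks; the (min, mean, max) of these
    scores bounds what an AI model can realistically be expected to reach."""
    scores = [dice(a, b) for a, b in combinations(rater_masks, 2)]
    return min(scores), float(np.mean(scores)), max(scores)

# Illustrative usage with random masks standing in for expert segmentations.
rng = np.random.default_rng(0)
raters = [rng.random((64, 64, 64)) > 0.7 for _ in range(3)]
lo, mean, hi = expert_agreement_band(raters)
print(f"expert-to-expert Dice band: {lo:.2f}-{hi:.2f} (mean {mean:.2f})")
```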

As part of the BraTS-METS team, data analyst Fabian Umeh and Sano researcher Monika Pytlarz, a PhD student in Computational Neuroscience, have been investigating this challenge using inter-/intra-rater reliability analysis and probabilistic consensus modeling.

Investigating Reliability through Multi-Annotator Data

Monika had the opportunity to present the BraTS-METS team's latest research findings at the RSNA 2025 Annual Meeting, one of the world's leading conferences dedicated to innovation in medical imaging and radiology.

“At RSNA 2025, I had the opportunity to present our BraTS-METS team’s research on inter- and intra-rater variability in brain tumor segmentation on MRI, an important factor to consider when training and validating AI algorithms to measure and track brain lesions. The quality of annotations impacts how well AI tools learn from training data, and it can also affect surgical planning, radiation treatment targeting, and follow-up comparisons. When radiologists segment tumors on MRI, even experienced specialists can draw slightly different borders.

“Our work focused on brain metastases using cases from the BraTS-METS Lighthouse 2025 Challenge, where each scan was outlined multiple times by expert neuroradiologists (including both fully manual work and AI-assisted refinement). To account for variability, we used a method called STAPLE, which combines multiple expert annotations into a probabilistic consensus map. Our results showed that manual expert segmentation achieves only fair inter- and intra-rater reliability (mean Dice score ≈ 0.81; κ ≈ 0.20). Importantly, AI-assisted segmentations were more consistent than manual-only outlines.

“Why this matters: better estimation of segmentation ground truth improves reproducibility, strengthens quality assurance, and helps build robust AI tools that account for real-world variability, ultimately supporting more reliable clinical decision-making.”
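For readers curious how such a probabilistic consensus can be built, below is a minimal, single-label sketch of the STAPLE expectation-maximization loop (Warfield et al., 2004) in NumPy. It is illustrative only: the array shapes, initial values, and synthetic data are assumptions, and real analyses would typically rely on established implementations such as the STAPLE filter in ITK/SimpleITK rather than this simplified version.

```python
import numpy as np

def staple(masks: np.ndarray, n_iter: int = 50, prior: float | None = None):
    """Minimal STAPLE for binary masks.

    masks: array of shape (n_raters, n_voxels) with values in {0, 1}.
    Returns the per-voxel consensus probability W and each rater's
    estimated sensitivity p and specificity q.
    """
    D = masks.astype(float)                      # rater decisions D[j, i]
    n_raters, n_vox = D.shape
    if prior is None:
        prior = D.mean()                         # global foreground prior (assumption)
    p = np.full(n_raters, 0.9)                   # initial sensitivities
    q = np.full(n_raters, 0.9)                   # initial specificities
    eps = 1e-10

    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly foreground,
        # given the raters' current reliability estimates.
        a = np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = np.prod(np.where(D == 1, 1 - q[:, None], q[:, None]), axis=0)
        W = (prior * a) / (prior * a + (1 - prior) * b + eps)

        # M-step: re-estimate each rater's sensitivity/specificity against W.
        p = (D @ W) / (W.sum() + eps)
        q = ((1 - D) @ (1 - W)) / ((1 - W).sum() + eps)
    return W, p, q

# Illustrative usage: three noisy synthetic raters annotating the same volume.
rng = np.random.default_rng(1)
truth = (rng.random(10_000) > 0.8).astype(int)
raters = np.array([np.where(rng.random(10_000) < 0.9, truth, 1 - truth)
                   for _ in range(3)])
W, p, q = staple(raters)
consensus = (W > 0.5).astype(int)
print("estimated sensitivities:", np.round(p, 2))
```

Each E-step re-weights voxels by how plausible foreground is under the raters' current reliability estimates; the M-step then re-scores the raters against that weighted consensus, so more consistent annotators contribute more to the final probabilistic reference.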

A recording of Monika’s full presentation, where she discusses the results in more detail and reflects on their implications for AI in neuroimaging, is available on the Sano Seminars YouTube channel.

Presentation of Research at RSNA 2025

Monika presented the team’s work in the Neuroradiology (Neoplasms: Diagnosis and Classification) session during RSNA 2025, the annual meeting of the Radiological Society of North America, held from November 30 to December 3, 2025, at McCormick Place in Chicago.

With the theme “Imaging the Individual”, this year’s RSNA highlighted breakthroughs in medical imaging, including artificial intelligence, clinical innovation, and technological progress in precision radiology.