MS Seminar


Name of the Speaker: Ms. Mansi Kakkar (EE21S063)
Guide: Dr. Mohanasankar S
Online meeting link: http://meet.google.com/dze-bwek-iyq
Date/Time: 2nd May 2025 (Friday), 2:15 PM
Title: Multi-modal Anatomy Detection using a Contrastive Language-Image Pretraining Approach

Abstract:

Vision-language models have emerged as powerful tools for tackling complex multi-modal classification challenges in the medical domain. This development has spurred the exploration of automated image description generation for multi-modal clinical scans, particularly for radiology report generation. While existing research has primarily focused on clinical descriptions for specific modalities or anatomical regions, a critical gap remains: developing a model capable of producing whole-body, multi-modal descriptions. In this study, we address this gap by automating the generation of standardized body station(s) and organ list(s) across the whole body in multi-modal MR and CT radiological images. Leveraging the versatility of Contrastive Language-Image Pre-training (CLIP), we refine existing methodologies through extensive experimentation, including baseline model fine-tuning, adding hierarchical station(s) as a superset for better correlation between organs, and advanced image and language augmentations. Our approach demonstrates a 47.6% improvement over baseline PubMedCLIP in organ detection accuracy while maintaining strong station (body region) recognition. The methodology enables robust cross-modal interpretation without requiring pixel-level annotations, significantly advancing automated radiological reporting systems.
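To give a flavour of the CLIP-style approach the talk describes, the sketch below shows how an image embedding can be scored against text-prompt embeddings for candidate station/organ labels via temperature-scaled cosine similarity. The prompts, embeddings, and dimensions here are entirely hypothetical stand-ins (real systems would use a pretrained encoder such as PubMedCLIP); this is a minimal illustration of contrastive scoring, not the speaker's implementation.

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise cosine similarity between two sets of vectors
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def clip_style_scores(image_emb, text_embs, temperature=0.07):
    # CLIP-style ranking: similarities scaled by a temperature, then softmax
    logits = cosine_sim(image_emb[None, :], text_embs)[0] / temperature
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Hypothetical station/organ prompts (illustrative only)
prompts = [
    "CT scan of the abdomen showing the liver",
    "CT scan of the thorax showing the lungs",
    "MR scan of the head showing the brain",
]

# Synthetic embeddings standing in for encoder outputs
rng = np.random.default_rng(0)
dim = 8
text_embs = rng.normal(size=(len(prompts), dim))
image_emb = text_embs[0] + 0.1 * rng.normal(size=dim)  # near the first prompt

probs = clip_style_scores(image_emb, text_embs)
best = prompts[int(np.argmax(probs))]
```

Because the synthetic image embedding is constructed close to the first prompt's embedding, the softmax concentrates its probability mass on that label, which is exactly how zero-shot anatomy labels would be ranked without pixel-level annotations.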