
Awesome-Multimodal-Applications-In-Medical-Imaging
This repository collects resources on applications of multimodal learning in medical imaging, including papers related to <b>large language models (LLMs)</b>. Papers involving LLMs are shown in bold.
Contributing
Please feel free to send me pull requests or email me to add links or to discuss this area.
Markdown format:
- [**Name of Conference or Journal + Year**] Paper Name. [[pdf]](link) [[code]](link)
News
Citation
@article{xia2024cares,
title={CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models},
author={Xia, Peng and Chen, Ze and Tian, Juanxi and Gong, Yangrui and Hou, Ruibo and Xu, Yue and Wu, Zhenbang and Fan, Zhiyuan and Zhou, Yiyang and Zhu, Kangyu and others},
journal={arXiv preprint arXiv:2406.06007},
year={2024}
}
@article{xia2024rule,
title={RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models},
author={Xia, Peng and Zhu, Kangyu and Li, Haoran and Zhu, Hongtu and Li, Yun and Li, Gang and Zhang, Linjun and Yao, Huaxiu},
journal={arXiv preprint arXiv:2407.05131},
year={2024}
}
Overview
Data Source 
Image-Caption Datasets
Dataset | Domain | Images | Texts | Source | Language |
---|---|---|---|---|---|
ROCO | multiple | 87K | 87K | research papers | En |
MedICaT | multiple | 217K | 217K | research papers | En |
PMC-OA | multiple | 1.6M | 1.6M | research papers | En |
ChiMed-VL | multiple | 580K | 580K | research papers | En/Zh |
FFA-IR | fundus | 1M | 10K | medical reports | En/Zh |
PadChest | CXR | 160K | 109K | medical reports | Sp |
MIMIC-CXR | CXR | 377K | 227K | medical reports | En |
OpenPath | histology | 208K | 208K | social media | En |
Quilt-1M | histology | 1M | 1M | research papers<br>social media | En |
Harvard-FairVLMed | fundus | 10K | 10K | medical reports | En |
Visual Question Answering Datasets
Survey 
- [arXiv 2022] Visual Attention Methods in Deep Learning: An In-Depth Survey [pdf]
- [arXiv 2022] Vision+X: A Survey on Multimodal Learning in the Light of Data [pdf]
- [arXiv 2023] Vision Language Models for Vision Tasks: A Survey [pdf] [code]
- [arXiv 2023] A Systematic Review of Deep Learning-based Research on Radiology Report Generation [pdf] [code]
- [Artif Intell Med 2023] Medical Visual Question Answering: A Survey [pdf]
- [arXiv 2023] Medical Vision Language Pretraining: A survey [pdf]
- [arXiv 2023] CLIP in Medical Imaging: A Comprehensive Survey [pdf] [code]
- [arXiv 2024] Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review [pdf] [code]
Medical Report Generation 
2018
- [ACL 2018] On the Automatic Generation of Medical Imaging Reports [pdf] [code]
- [NeurIPS 2018] Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation [pdf]
2019
- [AAAI 2019] Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation [pdf]
- [ICDM 2019] Automatic Generation of Medical Imaging Diagnostic Report with Hierarchical Recurrent Neural Network [pdf]
- [MICCAI 2019] Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment [pdf]
2020
- [AAAI 2020] When Radiology Report Generation Meets Knowledge Graph [pdf]
- [EMNLP 2020] Generating Radiology Reports via Memory-driven Transformer [pdf] [code]
- [ACCV 2020] Hierarchical X-Ray Report Generation via Pathology tags and Multi Head Attention [pdf] [code]
2021
- [NeurIPS 2021 D&B] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark [pdf] [code]
- [EMNLP 2021] Automated Generation of Accurate & Fluent Medical X-ray Reports [pdf] [code]
- [ACL 2021] Competence-based Multimodal Curriculum Learning for Medical Report Generation [pdf]
- [CVPR 2021] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [pdf]
- [MICCAI 2021] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [pdf]
- [NAACL-HLT 2021] Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation [pdf] [code]
- [MICCAI 2021] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting [pdf] [code]
- [MICCAI 2021] Trust It or Not: Confidence-Guided Automatic Radiology Report Generation [pdf]
- [MICCAI 2021] Surgical Instruction Generation with Transformers [pdf]
- [MICCAI 2021] Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [pdf] [code]
- [ACL 2021] Cross-modal Memory Networks for Radiology Report Generation [pdf] [code]
2022
- [CVPR 2022] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [pdf]
- [Nature Machine Intelligence 2022] Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports [pdf] [code]
- [MICCAI 2022] A Self-Guided Framework for Radiology Report Generation [pdf]
- [MICCAI 2022] A Medical Semantic-Assisted Transformer for Radiographic Report Generation [pdf]
- [MIDL 2022] Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [pdf]
- [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
- [ML4H 2022] Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors [pdf]
- [TNNLS 2022] Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty [pdf]
- [MedIA 2022] CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation [pdf]
- [MedIA 2022] Knowledge matters: Chest radiology report generation with general and specific knowledge [pdf] [code]
- [MICCAI 2022] Lesion Guided Explainable Few Weak-shot Medical Report Generation [pdf] [code]
- [BMVC 2022] On the Importance of Image Encoding in