AI Growth Vectors: Healthcare LLMs
Healthcare AI Systems Are Highly Diverse, Ranging from Single- to Multi-modal and from Niche to Generalist
This is Part 6 in the GenAI Growth Vectors series and looks at LLMs and systems developed for healthcare applications. Parts 1 and 3 analyze cost-focused LLMs, Part 2 covers General Purpose models, Part 4 covers LLMs developed for coding, and Part 5 covers math-focused models.
Healthcare is a large market with a complex value chain that spans from patient intake and scheduling, to primary care, specialist care, and follow-up across a large variety of pathologies and treatment options. This opens a wide space of opportunity for AI to make patients’ and doctors’ experiences easier, faster and more comfortable across specialties and geographies.
At the same time, healthcare AI applications must meet a high quality bar: they need to offer consistently accurate information, protect patient privacy and comply with complex regulations. Medicine is a multimodal field, which calls for models that can handle text and images at a minimum. And creating credible expert LLMs across medical specialties requires good-quality training datasets.
The list in this article includes representative models released or published in research papers from 2023 and 2024 that highlight the variety of use cases and opportunities for healthcare LLMs. Read below about:
Polaris, an agent system with a primary LLM and multiple support models, designed to assist with non-diagnostic patient voice conversations
MedAlpaca, a text-only, generalist model for medical conversations
SAM-Med2D, a computer vision model for medical image segmentation
Radiology Llama 2, fine-tuned to generate radiology impressions from findings
Med-Flamingo, a vision-language model for clinical evaluation and reasoning
Med-PaLM M, a multimodal, generalist model
ClinicalGPT, an LLM for medical Q&A
OphGLM, a vision-language model for eye pathologies
RadiologyGPT, an LLM for radiology use
LLaVA-Med, a vision-language model for biomedical images
PMC-Llama, an LLM for medical Q&A
BiomedGPT, a generalist model for biomedical tasks
Polaris is a multi-turn audio agent made up of multiple LLMs and developed for non-diagnostic patient conversations. Created by Hippocratic AI, its objective is to perform low-risk non-diagnostic tasks typically done by nurses, medical assistants, social workers and nutritionists, such as checking on patient wellness, reviewing compliance with prescribed medications, confirming appointment details and ensuring that all tasks are completed.
Polaris is a 1T-parameter system composed of several multi-billion-parameter LLMs that communicate with each other. A stateful primary agent drives the conversation with the patient, while several specialist support agents handle dedicated tasks. The medium-sized assistant models also act as a safety check on the information provided by the main model, providing not just specialization but also redundancy. The models are decoder-only Transformers with 30-100 layers, trained using Grouped Query Attention and Flash Attention 2, with RMSNorm normalization layers, SwiGLU activation functions and Rotary Positional Embeddings. With context windows between 4k and 32k tokens, the models were trained on proprietary data to incorporate fine-grained medical knowledge, reasoning and specialized numerical reasoning.
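The normalization and activation components named above are standard in recent decoder-only models. As a rough illustration of what they compute (a plain-Python sketch, not Hippocratic AI's code):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square of the vector.
    # Unlike LayerNorm, no mean is subtracted and no bias is added.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def swiglu(gate, up):
    # SwiGLU feed-forward activation: SiLU(gate) gates the "up" projection
    # elementwise, replacing a plain ReLU/GELU feed-forward layer.
    silu = lambda v: v / (1.0 + math.exp(-v))
    return [silu(g) * u for g, u in zip(gate, up)]
```

In a full model these run per token inside every Transformer block; the sketch only shows the arithmetic on a single vector.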
The agent was evaluated over 3k conversations with 100 US-licensed Registered Nurses and 130 US-licensed Physicians, who rated it on bedside manner, conversation quality, clinical readiness, patient education and medical safety, among other criteria.
More details on the system and its architecture are in the Polaris: A Safety-Focused LLM Constellation Architecture for Healthcare paper, published in March 2024.
MedAlpaca is a text-only model developed for healthcare conversations by fine-tuning the Llama 7B and 13B models on specialized medical data using the Alpaca instruction-following approach. Published in October 2023 by researchers at the University Hospital Aachen, the Technical University of Munich, the Berliner Hochschule fuer Technik and the Berlin Institute of Health at Charite, MedAlpaca was trained on four datasets:
The Anki Medical Curriculum flashcards created by medical students on topics including anatomy, physiology, pathology and pharmacology
52k question-answer pairs from five StackExchange forums related to biomedical sciences
Question-answer pairs from Wikidoc, a collaborative platform for medical professionals
Medical NLP benchmarks, including the COVID-19 Open Research Dataset Challenge, MedQA and the Pubmed Causal Benchmark
The models are available on Hugging Face and are described in the MedAlpaca - An Open-Source Collection of Medical Conversational AI Models and Training Data paper.
SAM-Med2D is a computer vision model for medical image segmentation across various modalities, anatomical structures and organs. Used to delineate different types of tissues, body components or regions of interest, SAM-Med2D is a fine-tuned version of the Segment Anything Model from Meta AI. Published in April 2023, the base model has a Vision Transformer architecture and contains an image encoder, a flexible prompt encoder and a fast mask decoder.
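The three-part design enables interactive, promptable segmentation: the expensive image encoding happens once, and each user prompt (a click or box) yields a new mask cheaply. The sketch below is a schematic of that flow with illustrative function names, not the real SAM API:

```python
def promptable_segmentation(image, prompts, image_encoder, prompt_encoder, mask_decoder):
    # SAM-style flow: encode the image once, then decode one mask per
    # prompt (e.g., a click point or bounding box). The heavy image
    # encoder is amortized across many interactive prompts.
    embedding = image_encoder(image)
    return [mask_decoder(embedding, prompt_encoder(p)) for p in prompts]
```

Fine-tuning for the medical domain, as SAM-Med2D does, adapts these components to modalities like MRI and CT that the base model never saw.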
SAM-Med2D was released in August 2023 by researchers from Sichuan University and the Shanghai AI Laboratory and was trained on a dataset of 4.6M images and 19.7M masks, covering 10 different medical imaging modalities, including MRI, CT, ultrasound, PET and X-ray. The model is available on GitHub and described in the SAM-Med2D report.
Radiology Llama 2 is a fine-tuned version of Llama 2 7B Chat developed to generate radiology impressions from findings. Created by researchers at 10 universities in the US and China, Radiology Llama 2 was trained on the MIMIC-CXR dataset of 227k chest X-ray imaging studies from 65k patients and the OpenI dataset of 8k images and 4k corresponding radiology reports. The model is described in the August 2023 “Radiology-Llama2: Best-in-Class Large Language Model for Radiology” paper.
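A findings-to-impression fine-tuning record might look like the following. The prompt template here is an assumption for illustration, not the one from the paper:

```python
def make_training_example(findings, impression):
    # Hypothetical instruction-tuning record pairing a report's "Findings"
    # section (the input) with its "Impression" section (the target summary
    # the model learns to generate).
    prompt = (
        "Derive the impression from the findings of this radiology report.\n"
        f"Findings: {findings}\n"
        "Impression:"
    )
    return {"prompt": prompt, "completion": " " + impression}
```

Each MIMIC-CXR or OpenI report would be split into such a pair, and the model is trained to continue the prompt with the completion.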
Med-Flamingo is a vision-language model developed for clinical evaluation and reasoning. Published in July 2023 by researchers at Stanford University, Hospital Israelita Albert Einstein in Sao Paulo, Brazil and Harvard Medical School, the model is based on OpenFlamingo 9B, a General Purpose vision-language model. Med-Flamingo is trained on MTB, a set of 4.7k medical textbooks, and PMC-OA, a biomedical dataset with 1.6M image-caption pairs from PubMed Central’s Open Access category. The model can be accessed on GitHub and is described in the “Med-Flamingo: a Multimodal Medical Few-shot Learner” paper.
Med-PaLM M is a multimodal generalist biomedical model based on PaLM-E. Developed by Google Research and DeepMind, Med-PaLM M has been fine-tuned using MultiMedBench, a multimodal biomedical benchmark that encompasses 1M samples across 14 tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. The base model, PaLM-E, is a General Purpose multimodal model that can process text, vision and sensor signals, trained by combining PaLM and a Vision Transformer. Med-PaLM M is described in more detail in the “Towards Generalist Biomedical AI” paper, published in July 2023.
ClinicalGPT is a language model based on BLOOM 7B and fine-tuned for medical question answering. Developed by researchers at the Beijing University of Posts and Telecommunications in June 2023, it was trained on datasets including cMedQA2, a Chinese medical question-and-answer repository of 120k questions and 226k answers; cMed-K6, based on knowledge graphs of Chinese medical examination questions; and MedDialog, a dataset of 11.1M medical conversations. The base model, the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), was developed by the BigScience collaborative initiative and released in July 2022. ClinicalGPT was published with the report titled “ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation”.
OphGLM is a Large Language and Vision model developed to diagnose eye conditions, including diabetic retinopathy, age-related macular degeneration, pathological myopia, glaucoma and rare fundus conditions. Published in June 2023 by researchers at Tsinghua University, the National Health Commission Capacity Building and Continuing Education Center and the Beijing Tangren Hospital, OphGLM is based on ChatGLM, a 6.8B open-source dialogue language model for English and Chinese, and was fine-tuned using an ophthalmology dataset of 20k image-text pairs. The model is able to accept fundus images as input and provide diagnosis findings as output. It is described in more detail in the “OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue” report.
RadiologyGPT is a fine-tuned version of Alpaca 7B for radiology use. Developed by researchers from 11 universities in the United States and China, it was trained on the MIMIC-CXR dataset, which contains de-identified medical data from over 60k patients who were admitted to the Beth Israel Deaconess Medical Center between 2001 and 2012. Details on the model were published in June 2023 in the report titled “Radiology-GPT: A Large Language Model for Radiology”.
LLaVA-Med is a vision-language model based on LLaVA and fine-tuned for conversations about biomedical images. Developed in June 2023 by Microsoft, it is trained on 600k image-text pairs from the most common medical imaging modalities, including chest X-rays, CT, MRI, histopathology and macroscopic pathology. LLaVA is a General Purpose model with a linear projection layer that connects a vision encoder with a language model. LLaVA-Med is available on GitHub and is described in the “LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day” paper.
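The linear projection that bridges the two models simply maps each vision-encoder patch embedding into the language model's token-embedding space, so images become sequences of "visual tokens" the LLM can attend to. A minimal sketch (names and dimensions are illustrative, not LLaVA's actual code):

```python
def project_patches(patch_embeddings, W, b):
    # Map each d_vision-dimensional patch vector to a d_lm-dimensional
    # visual token via one linear layer: token = W @ patch + b.
    # The resulting tokens are fed to the language model alongside text.
    return [
        [sum(w * v for w, v in zip(row, patch)) + bias
         for row, bias in zip(W, b)]
        for patch in patch_embeddings
    ]
```

Because only this small matrix (plus, in later stages, the language model) is trained, adapting the pairing to a new domain such as biomedicine is comparatively cheap.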
PMC-Llama is a fine-tuned version of Llama 13B for medical question answering. Developed by researchers at Shanghai Jiao Tong University and the Shanghai AI Laboratory in April 2023, it was trained on 4.8M biomedical academic papers and 20k medical textbooks. It is available on GitHub and described in the “PMC-LLaMA: Towards Building Open-source Language Models for Medicine” paper.
BiomedGPT is a generalist model for diverse biomedical tasks, developed to help find the most suitable treatment plans for patients with multiple chronic conditions that require a holistic care approach. Created in May 2023 by researchers at 12 medical and computer science institutions in the US, BiomedGPT comes in three sizes (33M, 93M and 183M parameters) and was trained on 25 datasets that encompass five medical AI tasks: disease classification, medical language understanding, text summarization, image description and visual Q&A. The model can be found on GitHub and is described in the “BiomedGPT: A Unified Biomedical Generative Pre-Trained Transformer for Vision, Language and Multimodal Tasks” paper.
Healthcare AI applications have significant potential to improve how we approach diagnostics, patient interaction, and personalized medicine. As we get better at building the models, healthcare applications that are reliable, accurate and private get even closer to being a part of our medical experiences. The journey towards fully realizing the capabilities of healthcare LLMs is just beginning, and it is an exciting frontier at the intersection of AI and medical care.