Large language models propagate race-based medicine

Summary: The study investigates the potential propagation of harmful, race-based medical content by four commercially available large language models (LLMs) in healthcare scenarios. The LLMs, including ChatGPT and GPT-4, displayed instances of perpetuating race-based medicine and inconsistencies in responses. Questions about kidney function and lung capacity revealed problematic race-based answers. The study emphasizes the risk of LLMs amplifying biases and causing harm in healthcare, urging caution in their use for medical decision-making. The need for further evaluation, transparency, and addressing potential biases before clinical integration is emphasized.