Alejandro Lozano
I am a PhD candidate at the Stanford Artificial Intelligence Laboratory working on vision-language foundation models. I'm fortunate to be advised by Serena Yeung-Levy and to be supported by the Arc Institute. I am also deeply grateful to Nvidia, Amazon, and HAI for generously funding my research.
My work focuses on multimodal learning, multimodal retrieval augmentation, and agent-based systems, as well as the intersection of these topics with real-world applications, with an emphasis on precision medicine. During my free time, I like to ... I don't have free time ;/
Scholar /
Github /
LinkedIn /
Twitter
Recent News:
[March 2025] Awarded an Nvidia grant.
[February 2025] 3 papers accepted to CVPR 2025.
[January 2025] 2 papers accepted to ICLR 2025.
[December 2024] 1 paper accepted to NEJM AI.
[September 2024] 1 paper accepted to NeurIPS 2024.
Selected Publications
(*) denotes co-first authorship. For a full list of publications, please see my Google Scholar.
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano*,
Min Woo Sun*,
James Burgess*,
Liangyu Chen,
Jeffrey J. Nirschl,
Jeffrey Gu,
Ivan Lopez,
Josiah Aklilu,
Anita Rau,
Austin Wolfgang Katzer,
Collin Chiu,
Xiaohan Wang,
Alfred Seunghoon Song,
Robert Tibshirani,
Serena Yeung-Levy
CVPR 2025
project page /
paper /
code /
data
We introduce BIOMEDICA, a framework that transforms the PubMed Central Open Access subset (PMC-OA) into a comprehensive dataset of over 24 million image-text pairs with expert-guided annotations, enabling the training of state-of-the-art biomedical vision-language models across diverse tasks and domains.
Can Large Language Models Match the Conclusions of Systematic Reviews?
Christopher Polzak*,
Alejandro Lozano*,
Min Woo Sun*,
James Burgess,
Yuhui Zhang,
Kevin Wu,
Serena Yeung-Levy
TBD 2025
project page /
paper /
code /
data
Can LLMs match the conclusions of systematic reviews written by clinical experts when given access to the same studies? To explore this question, we present MedEvidence, a human-curated benchmark of 284 questions drawn from 100 open-access systematic reviews across 10 medical specialties.
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
James Burgess*,
Jeffrey J Nirschl*,
Laura Bravo-Sánchez*,
Alejandro Lozano,
Sanket Rajan Gupte,
Jesus G. Galaz-Montoya,
Yuhui Zhang,
Yuchang Su,
Disha Bhowmik,
Zachary Coman,
Sarina M. Hasan,
Alexandra Johannesson,
William D. Leineweber,
Malvika G Nair,
Ridhi Yarlagadda,
Connor Zuraski,
Wah Chiu,
Sarah Cohen,
Jan N. Hansen,
Manuel D Leonetti,
Chad Liu,
Emma Lundberg,
Serena Yeung-Levy
CVPR 2025
paper /
data
MicroVQA is an expert-curated benchmark for research-level reasoning in biological microscopy. We also propose a method for making multiple-choice VQA more challenging.
Time-to-Event Pretraining for 3D Medical Imaging
Zepeng Huo*,
Jason Alan Fries*,
Alejandro Lozano*,
Jeya Maria Jose Valanarasu,
Ethan Steinberg,
Louis Blankemeier,
Akshay S. Chaudhari,
Curtis Langlotz,
Nigam H. Shah
ICLR 2025
paper
We propose the first time-to-event pretraining framework for 3D medical imaging models that leverages large-scale temporal supervision from paired longitudinal electronic health records.
Video Action Differencing
James Burgess,
Xiaohan Wang,
Yuhui Zhang,
Anita Rau,
Alejandro Lozano,
Lisa Dunlap,
Trevor Darrell,
Serena Yeung-Levy
ICLR 2025
paper /
data
We propose Video Action Differencing (VidDiff), a new task aimed at detecting subtle differences in how actions are performed across pairs of videos. To support this task, we introduce a benchmark spanning a diverse set of skilled actions, along with a baseline and agentic workflow to investigate the limitations of current VLMs.
Micro-Bench: A Vision-Language Benchmark for Microscopy Understanding
Alejandro Lozano*,
Jeffrey Nirschl*,
James Burgess,
Sanket Rajan Gupte,
Yuhui Zhang,
Alyssa Unell,
Serena Yeung-Levy
NeurIPS 2024
project page /
paper /
code /
data
Micro-Bench is a vision-language benchmark for microscopy understanding, featuring 17,000 microscopy images sourced from 24 publicly available datasets. As the most diverse microscopy benchmark to date, it spans light (LM), fluorescence (FM), and electron microscopy (EM) across 8 sub-modalities, 91 distinct cell, tissue, and structure types, and 24 staining techniques. Micro-Bench supports tasks including closed-form visual question answering (VQA), object detection, and segmentation.
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L Fleming*, Alejandro Lozano*, William J Haberkorn*, Jenelle A Jindal*, Eduardo Reis*, Rahul Thapa, Louis Blankemeier, Julian Z Genkins, Ethan Steinberg, Ashwin Nayak, Birju Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott Adams, Oluseyi Fayanju, Shreya J Shah, Thomas Savage, Ethan Goh, Akshay S Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A Pfeffer, Percy Liang, Jonathan H Chen, Keith E Morse, Emma P Brunskill, Jason A Fries, Nigam H Shah
AAAI 2024 / ML4H 2023 (Best Paper Award)
project page /
paper /
code /
data
We introduce MedAlign, a benchmark of 983 natural language instructions about EHR data. MedAlign is curated by 15 clinicians (7 specialties), includes human-written reference responses, and provides 276 full longitudinal EHRs for grounding instruction-response pairs.
Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions Using Scientific Literature
Alejandro Lozano,
Scott L Fleming,
Chia-Chun Chiang,
Nigam Shah
Pacific Symposium on Biocomputing 2024 (Oral)
paper /
code
We introduce Clinfo.ai, the first open-source agentic system designed to answer medical questions using scientific literature. Clinfo.ai employs a chain of large language models to convert a question into a search query, retrieve the most relevant literature, and synthesize an up-to-date answer.
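To make the idea concrete, here is a minimal, hypothetical Python sketch of this style of retrieval-augmented chain. The helper name search_pubmed, the prompts, and the generic llm callable are illustrative assumptions rather than the actual Clinfo.ai API; the only external service used is NCBI's public E-utilities search endpoint.

# Hypothetical sketch of a retrieval-augmented QA chain over PubMed.
# Function names, prompts, and the generic `llm` callable are illustrative
# assumptions, not the actual Clinfo.ai implementation.
import json
from typing import Callable, List
from urllib.parse import urlencode
from urllib.request import urlopen

def search_pubmed(query: str, max_results: int = 5) -> List[str]:
    # Return PubMed IDs for a query via NCBI's public E-utilities esearch endpoint.
    params = urlencode({"db": "pubmed", "term": query,
                        "retmode": "json", "retmax": max_results})
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
    with urlopen(url) as resp:
        return json.load(resp)["esearchresult"]["idlist"]

def answer_question(question: str, llm: Callable[[str], str]) -> str:
    # Chain two LLM calls: (1) rewrite the question as a literature search query,
    # (2) answer the question from the retrieved articles, citing their PMIDs.
    query = llm(f"Convert this clinical question into a PubMed search query:\n{question}")
    pmids = search_pubmed(query)
    context = ", ".join(f"PMID:{p}" for p in pmids)
    return llm(
        "Answer the question using only the articles below and cite their PMIDs.\n"
        f"Articles: {context}\nQuestion: {question}"
    )

A full system would also fetch and filter the underlying abstracts before synthesizing the final answer; see the paper and code links above for the actual Clinfo.ai pipeline.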
Orientation-invariant autoencoders learn robust representations for shape profiling of cells and organelles
James Burgess,
Jeffrey J. Nirschl,
Maria-Clara Zanellati,
Alejandro Lozano,
Sarah Cohen,
Serena Yeung-Levy
Nature Communications 2024
paper /
code
Unsupervised shape representations of cells and organelles are often erroneously sensitive to image orientation. We introduce O2VAE, an orientation-invariant autoencoder that mitigates this issue by using equivariant convolutional network encoders.
Teaching
- Lead Mentor, Medical AI, Stanford AI4ALL, 2024 and 2025.
- Head Teaching Assistant, CS 235: Computational Methods for Biomedical Image Analysis and Interpretation, Stanford, 2025.
- Teaching Assistant, CS 235: Computational Methods for Biomedical Image Analysis and Interpretation, Stanford, 2022.
I stole this website template from Jon Barron, who published his source code here.