Senior Research Scientist at Salesforce AI Research.
I work on AI that understands and builds the physical world —
vision-language models, multimodal reasoning, and the gap between pixels and physical reality.
News
About
I am a Senior Research Scientist at Salesforce AI Research. I care about AI systems that reason about physical reality — not just recognizing objects in images, but understanding structural validity, spatial constraints, and how things are actually built.
I did my PhD at the University of Maryland, College Park, advised by Prof. Abhinav Shrivastava and Prof. Larry Davis, working on domain adaptation and model robustness.
Research
DreamHouse: How Far Are Vision-Language Models from Constructing the Real World?
2026
A benchmark for physical generative reasoning using residential timber-frame construction. It comprises 26,000+ verified structures, a 10-test structural validation suite, and three evaluation protocols spanning frontier VLMs (GPT-5, Claude 4.5 Opus, Gemini 3 Flash, Qwen).
GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR
arXiv 2024
* equal contribution
Experience