I am a fourth-year Ph.D. student at MIT, advised by Professor Pulkit Agrawal. My research focuses on reinforcement learning algorithms and their applications, mainly in NLP and robotics. My research has been funded by the Qualcomm Innovation Fellowship and the MIT-Google Collaboration Grant.
I was most recently at Google DeepMind, working on post-training for LLMs. I have a B.Sc. in Computer Engineering from the Technion, where I worked with Aviv Tamar on reinforcement learning research.
Publications
Aligning Language Models From User Interactions
[ArXiv] [Project page]
Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
[ArXiv] [Project page]
Self-Distillation Enables Continual Learning
Best Paper Award at the Lifelong Agents Workshop, ICLR 2026
[ArXiv] [Project page]
Reinforcement Learning via Self-Distillation
[ArXiv] [Project page]
RL's Razor: Why Online Reinforcement Learning Forgets Less
Outstanding Paper Award at the CCFM Workshop, NeurIPS 2025
[ArXiv] [Project page]
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
[ArXiv] [Project page]
Best-of-n through the Smoothing Lens: KL Divergence and Regret Analysis
[ArXiv]
Language Model Personalization via Reward Factorization
[ArXiv] [Project page]
KL-Regularized RLHF with Multiple Reference Models: Exact Solutions and Sample Complexity
[ArXiv]
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
[ArXiv]
The Future of Open Human Feedback
[ArXiv]
Value Augmented Sampling for Language Model Alignment and Personalization
[ArXiv] [Project page]
Curiosity-driven Red-teaming for Large Language Models
[ArXiv] [Project page]
From Imitation to Refinement: Residual RL for Precise Visual Assembly
[ArXiv] [Project page]
Juicer: Data-efficient Imitation Learning for Robotic Assembly
[ArXiv] [Project page]
TGRL: An Algorithm for Teacher Guided Reinforcement Learning
Selected for Oral Presentation at the RRL Workshop, ICLR 2023
[ArXiv] [Project page]
Offline Meta Reinforcement Learning - Identifiability Challenges and Effective Data Collection Strategies
[ArXiv]
