How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1
https://arxiv.org/abs/2602.19526
https://www.alphaxiv.org/ru/overview/2602.19526