Outperforms Qwen2.5-Omni-7B, Kimi-Audio-Instruct-7B on multiple key audio understanding tasks. Although MiDashengLM demonstrates superior audio understanding performance and efficiency compared to ...
Inspired by the success of reinforcement learning (RL) in refining large language models (LLMs), we propose AR-GRPO, an approach to integrate online RL training into autoregressive (AR) image ...