Multi-Modal Learning Project
Jan 11, 2026
张洋
Overview
This project extends the LLaVA-CoT pipeline with inference-time scaling so that compute is allocated more effectively across its intermediate reasoning stages. We introduce two techniques: Monte Carlo Tree Search (MCTS), which explores promising reasoning paths, and Power Sampling, which refines the joint distribution over stage-level sequences. Together they improve accuracy and efficiency in compute-constrained settings, as validated on HallusionBench and MathVista.
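To make the two ideas concrete, here is a minimal sketch, not the project's actual implementation. It assumes that MCTS ranks candidate stage outputs with the standard UCT rule, and that Power Sampling is approximated by importance resampling: candidates are drawn from the base model p and reweighted by p(x)^(alpha - 1) so that the resampled output approximately follows p(x)^alpha. The function names, the exploration constant `c`, and the exponent `alpha` are all hypothetical.

```python
import math
import random


def uct_score(total_value, visits, parent_visits, c=1.4):
    """UCT score of a child node: mean value plus an exploration bonus.

    Hypothetical helper; `c` is an assumed exploration constant.
    """
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_value = total_value / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + bonus


def power_weights(logps, alpha):
    """Normalized weights proportional to p(x)^(alpha - 1).

    `logps` are log-probabilities of candidates that were sampled from p,
    so reweighting by p^(alpha - 1) targets the sharpened distribution p^alpha.
    """
    scaled = [(alpha - 1.0) * lp for lp in logps]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def power_resample(candidates, logps, alpha, rng=random):
    """Pick one candidate according to the power-sampling weights."""
    weights = power_weights(logps, alpha)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With `alpha = 1` the weights are uniform, so resampling leaves the base distribution unchanged; larger values of `alpha` concentrate the choice on stage sequences the model already considers likely.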