Multi-Modal Learning Project

Jan 11, 2026 · 张洋 · 1 min read

Overview

This project extends the LLaVA-CoT pipeline with inference-time scaling to better allocate compute across its intermediate reasoning stages. We introduce two techniques: Monte Carlo Tree Search (MCTS) to explore promising reasoning paths, and Power Sampling to refine the joint distribution over stage-level sequences. The result is improved accuracy and efficiency under constrained compute budgets, validated on HallusionBench and MathVista.
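
For intuition on the Power Sampling step, the sketch below shows one common formulation: rescoring a pool of candidate stage-level outputs by raising the model's distribution to a power α > 1, which sharpens it toward higher-likelihood completions. This is a minimal, self-contained illustration under that assumption, not the project's implementation; the function name, inputs, and the choice of α here are hypothetical.

```python
import math
import random


def power_sample(candidates, log_probs, alpha=2.0, seed=None):
    """Sample one candidate with probability proportional to p(x)^alpha.

    Illustrative only: `candidates` are hypothetical stage-level completions
    and `log_probs` their model log-probabilities. Raising the distribution
    to a power alpha > 1 concentrates mass on higher-likelihood candidates.
    """
    rng = random.Random(seed)
    # Scale log-probabilities by alpha, then normalize with a stable softmax.
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(candidates, weights=probs, k=1)[0]


# Toy usage: three candidate outputs for a single reasoning stage.
stages = ["caption A", "caption B", "caption C"]
logps = [-1.2, -0.4, -2.5]
print(power_sample(stages, logps, alpha=2.0, seed=0))
```

With alpha = 1 this reduces to ordinary sampling from the model; larger alpha trades diversity for fidelity to the model's highest-probability reasoning paths.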

Report