Multi-Modal Learning Project

Jan 11, 2026 · 张洋 · 1 min read

Overview

This project extends the LLaVA-CoT pipeline with inference-time scaling so that compute is allocated more effectively across its intermediate reasoning stages. We introduce two techniques: Monte Carlo Tree Search (MCTS), which explores promising reasoning paths, and Power Sampling, which refines the joint distribution over stage-level sequences. Together they improve accuracy and efficiency in constrained settings, as evaluated on HallusionBench and MathVista.
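To make the two ideas concrete, here is a minimal sketch rather than the project's implementation (the report describes the actual method). It assumes the common UCB1 rule for the MCTS selection step, and it assumes Power Sampling can be illustrated by reweighting a finite set of candidate stage-level sequences in proportion to p(x)^alpha. The function names, the exploration constant `c`, and the finite candidate set are all hypothetical choices made for illustration.

```python
import math
import random


def ucb1(total_value, visits, parent_visits, c=1.4):
    """UCB1 score for the MCTS selection step (assumed rule, not from the report).

    Unvisited children get an infinite score so they are tried first.
    """
    if visits == 0:
        return math.inf
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def power_weights(logprobs, alpha):
    """Normalized weights proportional to p(x)**alpha, computed in log space."""
    scaled = [alpha * lp for lp in logprobs]
    m = max(scaled)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def power_sample(candidates, logprobs, alpha, rng=random):
    """Draw one candidate with probability proportional to p(x)**alpha."""
    weights = power_weights(logprobs, alpha)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical usage: three candidate stage sequences with model log-probs.
candidates = ["path_a", "path_b", "path_c"]
logprobs = [math.log(0.5), math.log(0.25), math.log(0.25)]
sharpened = power_weights(logprobs, alpha=2.0)
choice = power_sample(candidates, logprobs, alpha=2.0)
```

With `alpha > 1` the weights concentrate on sequences the model already considers likely, and with `alpha = 1` they reduce to the original probabilities. A full sampler over long sequences would typically not enumerate candidates like this; the finite set here only keeps the example short.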

Report

Project report (PDF)

Last updated on Jan 11, 2026

Blogs

Authors

张洋

Undergraduate Student
