Toward Effective and Efficient Transformer-Based Multimodal Reasoning

Friday, March 20, 2026 - 3:00pm to 4:00pm
Location: 
32-D463 (Star)
Speaker: 
Junhong Lin
Biography: 
https://junhongmit.github.io/
Seminar group: 
Abstract: 
Transformer-based models have become the dominant architecture for modeling complex relationships across a wide range of data modalities, including text, graphs, and knowledge bases. Their attention mechanism enables flexible interaction modeling and has driven significant progress in large language models and graph learning. However, the same mechanism also implicitly explores a large interaction space, which can lead to inefficient reasoning, noise accumulation, and scalability challenges in tasks that require structured reasoning over relational data.
 
This thesis investigates how structured control over exploration can improve both the effectiveness and efficiency of transformer-based reasoning systems across multiple modalities. We focus on two challenging settings: reasoning over large transaction networks and reasoning over evolving knowledge graphs. First, we present FraudGT, a graph transformer framework that incorporates graph inductive biases to improve fraud detection in financial transaction networks. By guiding attention with graph structure, FraudGT captures complex fraud patterns more reliably while improving computational efficiency. Second, we introduce EvoReasoner, a framework for reasoning over evolving knowledge graphs, where facts change over time. EvoReasoner combines temporal awareness with structured reasoning strategies to improve multi-hop reasoning while suppressing noise from irrelevant reasoning paths.
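The core idea of guiding attention with graph structure can be illustrated with a minimal sketch: attention scores between node pairs that are not connected by an edge are masked out before the softmax, so each node attends only to its graph neighbors. All names, the toy graph, and the single-head formulation below are illustrative assumptions, not the actual FraudGT implementation.

```python
import numpy as np

def masked_attention(X, A, W_q, W_k, W_v):
    """Single-head attention restricted to graph edges.

    X: (n, d) node features; A: (n, n) adjacency matrix (with
    self-loops so every row attends to at least one node);
    W_q, W_k, W_v: (d, d) projection matrices.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # (n, n) pairwise scores
    scores = np.where(A > 0, scores, -np.inf)  # drop non-edge pairs
    # Numerically stable softmax over each node's neighborhood.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy 4-node chain graph with self-loops (hypothetical example data).
rng = np.random.default_rng(0)
n, d = 4, 4
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
A = np.array([[1., 1., 0., 0.],
              [1., 1., 1., 0.],
              [0., 1., 1., 1.],
              [0., 0., 1., 1.]])
out, weights = masked_attention(X, A, W_q, W_k, W_v)
```

Because non-edge scores are set to negative infinity before the softmax, the resulting attention weights are exactly zero outside the graph's edge set, which is one simple way a structural inductive bias can shrink the interaction space a transformer explores.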
 
Together, these contributions demonstrate that incorporating structural guidance into transformer-based systems enables more scalable and reliable reasoning across structured and hybrid knowledge domains.