Event Details
PhD Dissertation Defense - Hanchen Xie
Wed, Apr 02, 2025
12:00 PM - 2:00 PM
Location: RTH 217
Dissertation Title: Mitigating Environment Misalignment And Discovering Intrinsic Relations Via Symbolic Alignment
Committee: Yue Wang (Chair), Wael Abd-Almageed, Aram Galstyan, Emilio Ferrara, Peter Beerel
Abstract: Deep learning models have achieved remarkable success on various computer vision tasks. Modern state-of-the-art methods can not only recognize the visual appearance of objects but also discover their intrinsic relations (e.g., dynamics or causal relations). However, collecting sufficient training data for intrinsic relations can be expensive or infeasible in many scenarios, such as real-world car accident videos. As an alternative, one can generate data in a different environment, such as synthetic data, that depicts the same intrinsic relations. Yet, end-to-end models may suffer from environment misalignment, such as visual domain or environment context shift, which limits model generality. To mitigate these misalignment challenges, we propose symbolic alignment, a novel learning strategy that uses a common symbolic space to align different environments. We first conduct a case study on dynamics prediction to reveal the environment misalignment challenges on our proposed datasets. Next, to gain insight into the challenge, we investigate the implicit position encoding in the dynamics prediction model. We then present a learning framework that separates the learning of appearance recognition from the discovery of dynamics relations to improve the generality of the dynamics prediction model. We further generalize the symbolic alignment strategy and introduce a novel framework, Look, Learn, and Leverage (L3), which decomposes the learning process into three distinct phases and achieves promising results on three intrinsic relations discovery tasks. Finally, we extend the environment misalignment discussion to video classification and demonstrate the potential of symbolic alignment to mitigate video content inconsistency between training and inference.