Siwei Wu

I am a second-year PhD student in the Manchester NLP group at the University of Manchester, supervised by Prof. Chenghua Lin. Since October 2025, I have been interning with the foundation model team at iQuest, where I work on agentic post-training. Before that, I interned at the Institute for Intelligent Computing (通义千问) of Alibaba from February to August 2024. I obtained my bachelor's degree from Jiangnan University and my master's degree from Nanjing University of Science and Technology.

My research focuses on code agents, agentic systems more broadly, and natural language processing (NLP).

Email / Google Scholar / Twitter / GitHub

Recent News

2026-07-17 – Our paper (TerminalTraj) was accepted to ICML 2026 as a Spotlight!
2026-04-21 – Our paper (TACO) was released on arXiv!
2026-03-17 – Our paper (IQuest-Coder-V1 Technical Report) was released on arXiv!
2026 – Our paper (MMRA) was accepted to Findings of EACL 2026!
2026-03 – Our paper (COIG-P) was accepted to Findings of EACL 2026!
2026-02-03 – Our paper (TerminalTraj) was released on arXiv!
2024-10-18 – Our paper (A Comparative Study on Reasoning Patterns of OpenAI's o1 Model) was released on arXiv!
2024-09-13 – I started my PhD in the NLP group at the University of Manchester!
2024-05-17 – Our paper (SciMMIR) was accepted by ACL'24!
2024-02-18 – I started my internship at the Institute for Intelligent Computing (通义千问) of Alibaba!
2023-07-25 – Our paper (Dense-ATOMIC) received the ACL'23 Outstanding Paper Award!
2023-05-25 – Our paper (Dense-ATOMIC) was nominated for ACL'23 Best Paper!
2023-05-04 – 2 papers (Dense-ATOMIC and CPNC) were accepted by ACL'23!
2023-04-05 – Created this homepage.

Selected Papers (* equal contribution)
	A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, et al. arXiv preprint Paper / Code TACO is a plug-and-play, training-free, self-evolving terminal observation compression framework that discovers, refines, and reuses structured compression rules to improve long-horizon terminal agents.
	Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments Siwei Wu, Yizhi Li, Yuyang Song, et al. ICML 2026 Spotlight Paper TerminalTraj presents a large-scale pipeline for constructing dockerized environments, generating terminal agent trajectories, and verifying them with executable checks.
	COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Siwei Wu, Jincheng Ren, Xeron Du, et al. Findings of EACL 2026 Paper We introduce COIG-P, a large-scale Chinese preference dataset designed for alignment with human values, covering diverse domains including chat, coding, and mathematics.
	IQuest-Coder-V1 Technical Report Jian Yang, Wei Zhang, Shawn Guo, et al. arXiv preprint Paper In this technical report, I was responsible for the construction of TerminalBench-related data.
	DocMMIR: A Framework for Document Multi-modal Information Retrieval Zirui Li, Siwei Wu, Yizhi Li, et al. Findings of EMNLP 2025 Paper We introduce DocMMIR, a unified framework for document-level multi-modal information retrieval across Wikipedia articles, scientific papers, and presentation slides.
	A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, et al. ICLR SCI-FM 2025 Paper / Code To investigate the reasoning patterns of o1, we compare it with existing test-time compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine), using OpenAI's GPT-4o as the backbone, on general reasoning benchmarks in three domains: math, code, and commonsense reasoning.
	SongTrans: An Unified Song Transcription and Alignment Method for Lyrics and Notes Siwei Wu, Jinzheng He, Ruibin Yuan, Haojie Wei, et al. arXiv preprint Paper We first design a pipeline that optimizes existing tools and use it to annotate numerous lyric-note pairs of songs. Then, based on the annotated data, we train a unified SongTrans model that directly transcribes lyrics and notes while aligning them simultaneously, without requiring songs to be pre-processed.
	MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, et al. Findings of EACL 2026 Paper / Code We propose the multi-image relational association task and a meticulously curated Multi-granularity Multi-image Relational Association (MMRA) benchmark comprising 1,024 samples. To evaluate current LVLMs systematically and comprehensively, we establish an associational relation system among images that contains 11 subtasks at two granularity levels, following the relations in ConceptNet.
	SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval Siwei Wu, Yizhi Li, Kang Zhu, et al. ACL 2024 Findings Paper / Code To bridge the multi-modal information retrieval gap in the scientific domain, this work develops a specialised scientific MMIR benchmark built from data extracted from open-access paper collections.
	Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths (ACL 2023 Outstanding Paper Award) Xiangqing Shen, Siwei Wu, Rui Xia. ACL 2023 Paper / Code This work addresses the limitations of the bipartite graph structure of the commonsense knowledge graph ATOMIC, mines potential multi-hop paths in ATOMIC, and builds a more complete knowledge graph, Dense-ATOMIC.
	Commonsense Knowledge Graph Completion Via Contrastive Pretraining and Node Clustering Siwei Wu, Xiangqing Shen, Rui Xia. ACL 2023 Findings Paper / Code This work alleviates the problems of edge sparsity and node redundancy in the commonsense knowledge graph, and proposes a new commonsense knowledge graph completion framework.

Selected Awards

National Encouragement Scholarship, Jiangnan University, 2018
Honorable Mention, Mathematical Contest in Modeling, 2020
ACL 2023 Outstanding Paper Award

Last modified in Jul. 2026. Design and source code from Jon Barron.