|
Siwei Wu
I am a second-year PhD student in the Manchester NLP group at the University of Manchester, supervised by Prof. Chenghua Lin. Since October 2025, I have been interning with the foundation model team at iQuest, where I work on agentic post-training. Before that, I interned at the Institute for Intelligent Computing (Tongyi Qianwen, 通义千问) of Alibaba from February to August 2024. I obtained my bachelor's degree from Jiangnan University and my master's degree from Nanjing University of Science and Technology.
My research focuses mainly on code agents, "Agentic Everything", and natural language processing (NLP).
Email / Google Scholar / Twitter / GitHub
|
|
Recent News
2026-03-17 – Our paper (IQuest-Coder-V1 Technical Report) was released on arXiv!
2026 – Our paper (MMRA) was accepted to Findings of EACL 2026!
2026-03 – Our paper (COIG-P) was accepted to Findings of EACL 2026!
2026-02-03 – Our paper (TerminalTraj) was released on arXiv!
2024-10-18 – Our paper (A Comparative Study on Reasoning Patterns of OpenAI's o1 Model) was released on arXiv!
2024-09-13 – I started my PhD in the NLP group at the University of Manchester!
2024-05-17 – Our paper (SciMMIR) was accepted by ACL'24!
2024-02-18 – I started an internship at the Institute for Intelligent Computing (Tongyi Qianwen, 通义千问) of Alibaba!
2023-07-25 – Our paper (Dense-ATOMIC) received the ACL'23 Outstanding Paper Award!
2023-05-25 – Our paper (Dense-ATOMIC) was nominated for ACL'23 Best Paper!
2023-05-04 – 2 papers (Dense-ATOMIC and CPNC) were accepted by ACL'23!
2023-04-05 – Created this homepage.
|
Selected Papers (* equal contribution)
|
Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments
Siwei Wu, Yizhi Li, Yuyang Song, et al.
arXiv preprint
Paper
TerminalTraj presents a large-scale pipeline for constructing dockerized environments, generating terminal agent trajectories, and verifying them with executable checks.
|
|
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Siwei Wu, JinCheng Ren, Xeron Du, et al.
Findings of EACL 2026
Paper
We introduce COIG-P, a large-scale Chinese preference dataset designed for alignment with human values, covering diverse domains including chat, coding, and mathematics.
|
|
IQuest-Coder-V1 Technical Report
Jian Yang, Wei Zhang, Shawn Guo, et al.
arXiv preprint
Paper
In this technical report, I was responsible for the construction of TerminalBench-related data.
|
|
DocMMIR: A Framework for Document Multi-modal Information Retrieval
Zirui Li, Siwei Wu, Yizhi Li, et al.
Findings of EMNLP 2025
Paper
We introduce DocMMIR, a unified framework for document-level multi-modal information retrieval across Wikipedia articles, scientific papers, and presentation slides.
|
|
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, et al.
ICLR SCI-FM 2025
Paper / Code
To investigate the reasoning patterns of o1, we compare it with existing test-time compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine), using OpenAI's GPT-4o as the backbone, on general reasoning benchmarks in three domains: math, code, and commonsense reasoning.
|
|
SongTrans: An unified song transcription and alignment method for lyrics and notes
Siwei Wu, Jinzheng He, Ruibin Yuan, Haojie Wei, et al.
arXiv preprint
Paper
We first design a pipeline by optimizing existing tools and annotating numerous lyric-note pairs of songs. Then, based on the annotated data, we train a unified SongTrans model that can directly transcribe lyrics and notes while aligning them simultaneously, without requiring the songs to be pre-processed.
|
|
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, et al.
Findings of EACL 2026
Paper / Code
We propose the multi-image relation association task and a meticulously curated Multi-granularity Multi-image Relational Association (MMRA) benchmark comprising 1,024 samples. To evaluate current LVLMs systematically and comprehensively, we establish an associational relation system among images that contains 11 subtasks (e.g., UsageSimilarity and SubEvent) at two granularity levels (i.e., “image” and “entity”), based on the relations in ConceptNet.
|
|
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu, Yizhi Li, Kang Zhu, et al.
Findings of ACL 2024
Paper / Code
To bridge the information retrieval gap in the scientific domain, this work develops a specialised scientific MMIR (SciMMIR) benchmark by leveraging open-access paper collections to extract data relevant to the scientific domain.
|
|
Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths (* ACL 2023 Outstanding Paper Award)
Xiangqing Shen, Siwei Wu, Rui Xia.
ACL 2023
Paper / Code
This work addresses the limitations of the bipartite graph structure of the commonsense knowledge graph ATOMIC, mines potential multi-hop paths in ATOMIC, and builds a more complete knowledge graph, Dense-ATOMIC.
|
|
Commonsense Knowledge Graph Completion Via Contrastive Pretraining and Node Clustering
Siwei Wu, Xiangqing Shen, Rui Xia.
Findings of ACL 2023
Paper / Code
This work alleviates the problems of edge sparsity and node redundancy in commonsense knowledge graphs and proposes a new commonsense knowledge graph completion framework.
|
Selected Awards
National Encouragement Scholarship, Jiangnan University, 2018
Honorable Mention, Mathematical Contest in Modeling, 2020
ACL 2023 Outstanding Paper Award
Last modified in Apr. 2026. Design and source code from Jon Barron.
|
|