DeepSoftwareAnalytics/Awesome-Issue-Resolution

✨ Awesome Issue Resolution

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey


📖 Documentation Website | 📄 Full Paper | 📋 Tables & Resources

🤗 HF Paper: https://huggingface.co/papers/2601.11655 (Upvotes appreciated! ⬆️)

🎙️ Interactive Exploration:

NotebookLM Discord Issues


📖 Abstract

Based on a systematic review of 196 papers and online resources, this survey establishes a holistic theoretical framework for Issue Resolution in software engineering. We examine how Large Language Models (LLMs) are transforming the automation of GitHub issue resolution. Beyond the theoretical analysis, we have curated a comprehensive collection of datasets and model training resources, which are continuously synchronized with our GitHub repository and project documentation website.

🔎 Browse & Export: The full paper database is searchable and exportable at deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/admin/ — filter by category, date, or keyword, and export results as CSV.
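The exported CSV can be post-processed with a few lines of Python. This is a minimal sketch; the column names used below (`title`, `category`, `date`) are assumptions — check the header row of your actual export.

```python
import csv
import io

# Hypothetical export snippet; real exports come from the admin interface.
sample = """title,category,date
SWE-agent,Single-Agent Systems,2024-05
MAGIS,Multi-Agent Systems,2024-03
AutoCodeRover,Multi-Agent Systems,2024-04
"""

def filter_rows(csv_text, category=None, since=None):
    """Return rows matching an optional category and minimum date."""
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if category and row["category"] != category:
            continue
        if since and row["date"] < since:  # ISO dates compare lexically
            continue
        out.append(row)
    return out

recent_multi = filter_rows(sample, category="Multi-Agent Systems", since="2024-04")
print([r["title"] for r in recent_multi])  # -> ['AutoCodeRover']
```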

📰 News

This Month's Papers

2 papers — 2026-03

  • BeyondSWE: BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? arXiv
  • SWE-Adept: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution arXiv

Recent Updates

  • Survey Update (2026-02): Added 21 new papers covering the latest advances in LLM-based issue resolution!
  • Survey Launch (2026-01): Our survey paper is now publicly available on arXiv: https://arxiv.org/abs/2601.11655. It covers 175 papers and resources on LLM-based GitHub issue resolution, with continuously updated datasets and leaderboards!

Admin Interface Demo

🔍 Explore This Survey:


📚 Complete Paper List

Total: 196 works across 14 categories

📊 Evaluation Datasets

Benchmarks for evaluating issue resolution systems

  • (2026-02) SWE Context Bench: SWE Context Bench: A Benchmark for Context Learning in Coding arXiv
  • (2025-12) SWE-InfraBench: SWE-InfraBench: Evaluating Language Models on Cloud Infrastructure Code OpenReview
  • (2025-12) SWE-EVO: SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios arXiv
  • (2025-11) SWE-Sharp-Bench: SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks arXiv
  • (2025-11) SWE-fficiency: SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads? arXiv
  • (2025-11) SWE-Compass: SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models arXiv
  • (2025-09) SWE-Bench Pro: SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? arXiv
  • (2025-07) SWE-Perf: SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? arXiv OpenReview
  • (2025-05) SwingArena: SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving arXiv
  • (2025-05) OmniGIRL: Omnigirl: A multilingual and multimodal benchmark for github issue resolution arXiv
  • (2025-05) SWE-bench-Live: SWE-bench Goes Live! arXiv OpenReview
  • (2025-04) Multi-SWE-bench: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving arXiv OpenReview
  • (2025-04) SWE-PolyBench: SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents arXiv
  • (2025-04) SWE-bench Multilingual: SWE-smith: Scaling Data for Software Engineering Agents arXiv OpenReview
  • (2025-03) FEA-Bench: FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation arXiv
  • (2025-02) SWE-Lancer: SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? arXiv
  • (2024-12) Visual SWE-bench: CodeV: Issue Resolving with Visual Data arXiv DOI
  • (2024-10) SWE-bench Multimodal: SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? arXiv OpenReview
  • (2024-08) SWE-bench-java: SWE-bench-java: A GitHub Issue Resolving Benchmark for Java arXiv

🎯 Training Datasets

Datasets for training issue resolution agents

  • (2026-02) SWE-Universe: SWE-Universe: Scale Real-World Verifiable Environments to Millions arXiv
  • (2025-06) Skywork-SWE: Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs arXiv
  • (2025-05) SWELoc: SweRank: Software Issue Localization with Code Ranking arXiv
  • (2025-04) Multi-SWE-RL: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving arXiv OpenReview
  • (2025-04) SWE-Smith: SWE-smith: Scaling Data for Software Engineering Agents arXiv OpenReview
  • (2025-02) LocAgent: OrcaLoca: An LLM Agent Framework for Software Issue Localization arXiv OpenReview
  • (2025-01) SWE-Fixer: SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution arXiv
  • (2023-10) SWE-bench-extra: SWE-bench: Can Language Models Resolve Real-world Github Issues? arXiv

🤖 Single-Agent Systems

Individual autonomous agents for issue resolution

  • (2025-12) Confucius Code Agent: Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases arXiv
  • (2025-10) TOM-SWE: TOM-SWE: User Mental Modeling For Software Engineering Agents arXiv
  • (2025-09) Lita: Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs arXiv
  • (2025-08) Live-SWE-agent: SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents arXiv OpenReview
  • (2025-07) Trae Agent: Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling arXiv
  • (2025-05) LCLM: Putting It All into Context: Simplifying Agents with LCLMs arXiv
  • (2025-02) PatchPilot: PatchPilot: A Cost-Efficient Software Engineering Agent with Early Attempts on Formal Verification arXiv OpenReview
  • (2024-05) SWE-agent: Swe-agent: Agent-computer interfaces enable automated software engineering arXiv
  • (2024-03) Devin: SWE-bench technical report Website
  • (2023-06) Aider Website GitHub

👥 Multi-Agent Systems

Collaborative multi-agent frameworks

  • (2025-08) Meta-RAG: Meta-RAG on Large Codebases Using Code Summarization arXiv
  • (2025-07) SWE-Debate: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution arXiv
  • (2025-06) AgentScope: SWE-Bench - AgentScope Website
  • (2025-05) Devlo: Achieving SOTA on SWE-bench Website
  • (2025-05) Refact.ai Agent: AI Coding Agent for Software Development - Refact.ai Website
  • (2025-03) Lingxi: Lingxi/docs/Lingxi Technical Report 2505.pdf at master · lingxi-agent/Lingxi GitHub
  • (2025-02) OrcaLora: OrcaLoca: An LLM Agent Framework for Software Issue Localization arXiv OpenReview
  • (2025-01) CodeCoR: CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation arXiv
  • (2024-09) MarsCode Agent: MarsCode Agent: AI-native Automated Bug Fixing arXiv
  • (2024-09) HyperAgent: HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale arXiv
  • (2024-08) DEI: Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents arXiv OpenReview
  • (2024-07) OpenHands: OpenHands: An Open Platform for AI Software Developers as Generalist Agents arXiv OpenReview
  • (2024-06) CodeR: CodeR: Issue Resolving with Multi-Agent and Task Graphs arXiv
  • (2024-04) AutoCodeRover: AutoCodeRover: Autonomous Program Improvement arXiv DOI
  • (2024-03) MAGIS: MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution arXiv OpenReview

🔄 Workflow-Based Methods

Structured pipeline approaches

  • (2025-07) SynFix: SynFix: Dependency-Aware Program Repair via RelationGraph Analysis DOI Website
  • (2025-06) GUIRepair: Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing arXiv
  • (2024-12) CodeV: CodeV: Issue Resolving with Visual Data arXiv DOI
  • (2024-10) Conversational Pipeline: Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench arXiv
  • (2024-07) Agentless: Demystifying LLM-Based Software Engineering Agents arXiv Website

🛠️ Tool-Augmented Methods

Methods leveraging external tools

  • (2026-02) Closing the Loop: Closing the Loop: Universal Repository Representation with RPG-Encoder arXiv Website GitHub
  • (2026-01) SWE-Tester: SWE-Tester: Training Open-Source LLMs for Issue Reproduction in Real-World Repositories arXiv
  • (2025-12) GraphLocator: GraphLocator: Graph-guided Causal Reasoning for Issue Localization arXiv
  • (2025-11) InfCode: InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution arXiv
  • (2025-10) BugPilot: BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills arXiv
  • (2025-10) TestPrune: When Old Meets New: Evaluating the Impact of Regression Tests on SWE Issue Resolution arXiv
  • (2025-09) Nemotron-CORTEXA: Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity OpenReview Website
  • (2025-08) Git Context Controller: Git Context Controller: Manage the Context of LLM-based Agents like Git arXiv
  • (2025-07) Prometheus: Prometheus: Unified Knowledge Graphs for Issue Resolution in Multilingual Codebases arXiv
  • (2025-06) SACL: SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization arXiv
  • (2025-06) OpenHands-Versa: Coding Agents with Multimodal Browsing are Generalist Problem Solvers arXiv
  • (2025-06) SemAgent: SemAgent: A Semantics Aware Program Repair Agent arXiv
  • (2025-06) Repeton: Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles arXiv
  • (2025-06) cAST: cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree arXiv
  • (2025-05) InfantAgent-Next: InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction arXiv
  • (2025-05) SWERank: SweRank: Software Issue Localization with Code Ranking arXiv
  • (2025-03) DARS: DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal arXiv
  • (2025-03) Issue2Test: Issue2Test: Generating Reproducing Test Cases from Issue Reports arXiv
  • (2025-03) KGCompass: Enhancing repository-level software repair via repository-aware knowledge graphs arXiv
  • (2025-03) CoSIL: Issue Localization via LLM-Driven Iterative Code Graph Searching arXiv
  • (2025-02) OrcaLoca: OrcaLoca: An LLM Agent Framework for Software Issue Localization arXiv OpenReview
  • (2025-02) Otter: Otter: Generating Tests from Issues to Validate SWE Patches arXiv OpenReview
  • (2025-02) Quadropic Insiders: Quadropic Insiders : Syntheo Tops Swelite Feb Website
  • (2024-12) CoRNStack: CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking arXiv OpenReview
  • (2024-11) AEGIS: AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions arXiv
  • (2024-10) RepoGraph: RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph arXiv
  • (2024-09) SuperCoder2.0: SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer arXiv
  • (2024-08) SpecRover: SpecRover: Code Intent Extraction via LLMs arXiv
  • (2024-06) Alibaba LingmaAgent: Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration arXiv DOI

🧠 Memory-Enhanced Methods

Systems with memory mechanisms

  • (2026-01) MemGovern: MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences arXiv
  • (2025-10) RepoMem: Improving Code Localization with Repository Memory arXiv
  • (2025-09) AgentDiet: Improving the Efficiency of LLM Agent Systems through Trajectory Reduction arXiv
  • (2025-07) Agent KB: Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving arXiv
  • (2025-07) SWE-Exp: SWE-Exp: Experience-Driven Software Issue Resolution arXiv
  • (2025-06) ExpeRepair: EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair arXiv
  • (2025-05) DGM: Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents arXiv
  • (2024-11) Infant Agent: Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage arXiv
  • (2024-11) EvoCoder: LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues arXiv

📚 Supervised Fine-Tuning (SFT)

Models trained via supervised learning

  • (2026-01) SWE-Lego: SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving arXiv
  • (2026-01) SWE-Replay: SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents arXiv
  • (2025-12) SWE-Compressor: Context as a Tool: Context Management for Long-Horizon SWE-Agents arXiv
  • (2025-09) Devstral: Devstral: Fine-tuning Language Models for Coding Agent Applications arXiv
  • (2025-06) MCTS-Refined CoT: MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution arXiv
  • (2025-05) Search for training: Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents arXiv
  • (2025-05) Co-PatcheR: Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models arXiv
  • (2025-05) CGM: Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks arXiv GitHub HuggingFace
  • (2025-03) Thinking Longer: Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute arXiv
  • (2024-12) ReSAT: Repository Structure-Aware Training Makes SLMs Better Issue Resolver arXiv
  • (2024-12) Scaling data collection: Scaling Data Collection for Training SWE Agents Website
  • (2024-12) SWE-Gym: Training Software Engineering Agents and Verifiers with SWE-Gym arXiv
  • (2024-11) Lingma SWE-GPT: SWE-GPT: A Process-Centric Language Model for Automated Software Improvement arXiv DOI GitHub
  • (2024-11) CodeXEmbed: CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval arXiv OpenReview

🎮 Reinforcement Learning (RL)

Models trained via reinforcement learning

  • (2026-02) SWE-Master: SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training arXiv GitHub
  • (2026-02) SWE-Protégé: SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents arXiv
  • (2026-02) SWE-MiniSandbox: SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents arXiv GitHub
  • (2026-01) MiMo-V2-Flash: MiMo-V2-Flash Technical Report arXiv
  • (2025-12) Self-play SWE-RL: Toward Training Superintelligent Software Agents through Self-Play SWE-RL arXiv
  • (2025-12) SWE-Playground: Training Versatile Coding Agents in Synthetic Environments arXiv
  • (2025-12) SWE-RM: SWE-RM: Execution-free Feedback For Software Engineering Agents arXiv
  • (2025-12) One Tool Is Enough: One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents arXiv
  • (2025-12) Let It Flow: Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem arXiv
  • (2025-12) Deepseek V3.2: DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models arXiv
  • (2025-11) TSP: Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair DOI
  • (2025-10) CWM: CWM: An Open-Weights LLM for Research on Code Generation with World Models arXiv
  • (2025-10) FoldGRPO: Scaling Long-Horizon LLM Agent via Context-Folding arXiv
  • (2025-10) GRPO-based Method: A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning arXiv OpenReview Website
  • (2025-10) Supervised RL: Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning arXiv
  • (2025-10) KAT-Coder: KAT-Coder Technical Report arXiv
  • (2025-09) CoreThink: CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs arXiv
  • (2025-09) EntroPO: Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization arXiv
  • (2025-09) Kimi-Dev: Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents arXiv
  • (2025-09) LongCat-Flash-Think: Introducing LongCat-Flash-Thinking: A Technical Report arXiv
  • (2025-08) Tool-integrated RL: Tool-integrated Reinforcement Learning for Repo Deep Search arXiv
  • (2025-08) SWE-Swiss: SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution Website
  • (2025-08) SeamlessFlow: SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling arXiv
  • (2025-08) DAPO: Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning arXiv
  • (2025-08) GLM-4.6: gpt-oss-120b & gpt-oss-20b model card arXiv
  • (2025-07) DeepSWE: DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL Website
  • (2025-07) Kimi-K2-Instruct: Kimi K2: Open Agentic Intelligence arXiv
  • (2025-06) Agent-RLVR: Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards arXiv
  • (2025-06) SWE-Dev2: SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling arXiv
  • (2025-06) Minimax M2: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention arXiv
  • (2025-05) SWE-Dev1: SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development arXiv
  • (2025-05) Satori-SWE: Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering arXiv
  • (2025-05) Qwen3-Coder: Qwen3 Technical Report arXiv
  • (2025-04) Seed1.5-Thinking: Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning arXiv
  • (2025-03) SEAlign: SEAlign: Alignment Training for Software Engineering Agent arXiv DOI
  • (2025-02) SWE-RL: SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution arXiv OpenReview
  • (2025-02) SoRFT: SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning arXiv
  • (2024-10) OSCA: Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation arXiv DOI

⚡ Inference-Time Scaling

Methods for scaling at inference time

  • (2026-01) Agentic Rubrics: Agentic Rubrics as Contextual Verifiers for SWE Agents arXiv Website
  • (2025-10) SIADAFIX: SIADAFIX: issue description response for adaptive program repair arXiv
  • (2025-09) SWE-PRM: When Agents go Astray: Course-Correcting SWE Agents with PRMs arXiv
  • (2025-01) ReasoningBank: CodeMonkeys: Scaling Test-Time Compute for Software Engineering arXiv
  • (2024-10) SWE-Search: SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement arXiv OpenReview

📥 Data Collection Methods

Techniques for collecting training data

  • (2026-02) DockSmith: DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder arXiv HuggingFace
  • (2026-01) MEnvAgent: MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering arXiv GitHub
  • (2025-12) Multi-Docker-Eval: Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering arXiv
  • (2025-08) RepoForge: RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale arXiv
  • (2025-07) SWE-MERA: SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks arXiv
  • (2025-06) SWE-Factory: SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks arXiv
  • (2025-05) SWE-rebench: SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents arXiv OpenReview
  • (2025-05) RepoLaunch: SWE-bench Goes Live! arXiv OpenReview

🔬 Data Synthesis Methods

Approaches for synthetic data generation

  • (2026-02) SWE-World: SWE-World: Building Software Engineering Agents in Docker-Free Environments arXiv GitHub
  • (2025-09) SWE-Mirror: SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories arXiv
  • (2025-06) SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner arXiv OpenReview
  • (2025-04) R2E-Gym: R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents arXiv OpenReview
  • (2025-04) SWE-Synth: SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs arXiv
  • (2025-04) SWE-smith: SWE-smith: Scaling Data for Software Engineering Agents arXiv OpenReview
  • (2025-01) Learn-by-interact: Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments arXiv OpenReview

📈 Data Analysis

Analysis of datasets and benchmarks

  • (2025-12) Data contamination: Does SWE-Bench-Verified Test Agent Ability or Model Memory? arXiv
  • (2025-07) Rigorous agentic benchmarks: Establishing Best Practices for Building Rigorous Agentic Benchmarks arXiv
  • (2025-07) SPICE: SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation arXiv
  • (2025-06) UTBoost: UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench arXiv
  • (2025-06) Trustworthiness: Is Your Automated Software Engineer Trustworthy? arXiv
  • (2025-06) The SWE-Bench Illusion: The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason arXiv
  • (2025-04) Revisiting SWE-Bench: Revisiting SWE-Bench: On the Importance of Data Quality for LLM-Based Code Models DOI
  • (2025-03) Patch Correctness: Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study arXiv
  • (2024-08) SWE-bench Verified: Introducing SWE-bench Verified | OpenAI Website

🔍 Methods Analysis

Comparative analysis of different methods

  • (2025-12) SWEnergy: SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs arXiv
  • (2025-09) Failures analysis: An Empirical Study on Failures in Automated Issue Solving arXiv
  • (2025-07) Security analysis: How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench arXiv
  • (2025-06) Dissecting the SWE-Bench Leaderboards: Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems arXiv
  • (2025-05) GSO: GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents arXiv
  • (2025-05) Strong-Weak Model Collaboration: An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation arXiv
  • (2025-05) Agents in the Wild Website
  • (2025-04) SeaView: SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow arXiv
  • (2025-03) Beyond final code: Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios arXiv
  • (2025-02) Overthinking: The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks arXiv
  • (2024-10) Evaluating software development agents: Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios arXiv DOI
  • (2024-06) Context Retrieval: On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing arXiv

Evaluation & Training Datasets

A comprehensive survey and statistical overview of issue resolution datasets. We categorize these datasets based on programming language, modality support, source repositories, data scale (Amount), and the availability of reproducible execution environments.

| Dataset | Language | Multimodal | Repos | Amount | Environment | Link |
|---|---|---|---|---|---|---|
| **Single-PL Datasets** | | | | | | |
| SWE-Fixer | Python | No | 856 | 115,406 | No | GitHub HuggingFace HuggingFace |
| SWE-smith | Python | No | 128 | 50k | Yes | GitHub HuggingFace |
| SWE-Lego | Python | No | 3,251 | 32,119 | Yes | GitHub HuggingFace |
| SWE-rebench | Python | No | 3,468 | 21,336 | Yes | GitHub HuggingFace |
| SWE-bench-train | Python | No | 37 | 19k | No | GitHub HuggingFace |
| SWE-Flow | Python | No | 74 | 18,081 | Yes | GitHub |
| Skywork-SWE | Python | No | 2,531 | 10,169 | Yes | - |
| R2E-Gym | Python | No | 10 | 8,135 | Yes | GitHub HuggingFace |
| RepoForge | Python | No | - | 7.3k | Yes | - |
| SWE-bench-extra | Python | No | 2k | 6.38k | Yes | HuggingFace |
| SWE-Gym | Python | No | 11 | 2,438 | Yes | GitHub HuggingFace |
| SWE-bench | Python | No | 12 | 2,294 | Yes | GitHub HuggingFace |
| SWE-bench-java | Java | No | 19 | 1,797 | Yes | GitHub HuggingFace |
| FEA-bench | Python | No | 83 | 1,401 | Yes | GitHub HuggingFace |
| SWE-bench-Live | Python | No | 164 | 1,565 | Yes | GitHub HuggingFace |
| Loc-Bench | Python | No | - | 560 | No | GitHub HuggingFace |
| SWE-bench Verified | Python | No | - | 500 | Yes | GitHub HuggingFace |
| SWE-bench Lite | Python | No | 12 | 300 | Yes | GitHub HuggingFace |
| SWE-MERA | Python | No | 200 | 300 | Yes | GitHub HuggingFace |
| SWE-Bench-CL | Python | No | 8 | 273 | Yes | GitHub |
| SWE-Sharp-Bench | C# | No | 17 | 150 | Yes | GitHub HuggingFace |
| SWE-Perf | Python | No | 12 | 140 | Yes | GitHub HuggingFace |
| Visual SWE-bench | Python | Yes | 11 | 133 | Yes | GitHub HuggingFace |
| SWE-EVO | Python | No | 7 | 48 | Yes | GitHub |
| **Multi-PL Datasets** | | | | | | |
| SWE-Mirror | Python, Rust, Go | No | 40 | 60k | Yes | - |
| Multi-SWE-bench | Java, JS, TS, Go, Rust, C, C++ | No | 76 | 4,723 | Yes | GitHub HuggingFace |
| Swing-Bench | Python, Go, C++, Rust | No | 400 | 2,300 | Yes | - |
| SWE-PolyBench | Python, Java, JS, TS | No | 21 | 2,110 | Yes | GitHub HuggingFace HuggingFace |
| SWE-Compass | Python, JS, TS, Java, C, C++, Go, Rust, Kotlin, C# | No | - | 2,000 | Yes | GitHub HuggingFace |
| SWE-Bench Pro | Python, Go, TS | No | 41 | 1,865 | Yes | GitHub HuggingFace |
| SWE-bench++ | Python, Go, TS, JS, Ruby, PHP, Java, Rust, C++, C#, C | No | 3,971 | 1,782 | Yes | GitHub HuggingFace |
| SWE-Lancer | JS, TS | No | - | 1,488 | Yes | GitHub |
| OmniGIRL | Python, TS, Java, JS | Yes | 15 | 959 | Yes | GitHub HuggingFace |
| SWE-bench Multimodal | JS, TS, HTML, CSS | Yes | 17 | 619 | Yes | GitHub HuggingFace |
| SWE-fficiency | Python, Cython | No | 9 | 498 | Yes | GitHub |
| SWE-Factory | Python, Java, JS, TS | No | 12 | 430 | Yes | GitHub HuggingFace |
| SWE-bench-Live-MultiLang & Windows | Python, JS, TS, C, C++, C#, Java, Go, Rust | No | 238 | 418 | Yes | GitHub HuggingFace HuggingFace |
| SWE-bench Multilingual | C, C++, Go, Java, JS, TS, Rust, Python, Ruby, PHP | No | 42 | 300 | Yes | GitHub HuggingFace |
| SWE-InfraBench | Python, TS | No | - | 100 | Yes | - |

Training Trajectory Datasets

A survey of trajectory datasets used for agent training or analysis. We list the programming language, number of source repositories, and total trajectories for each dataset.

| Dataset | Language | Repos | Amount | Link |
|---|---|---|---|---|
| SWE-Fixer | Python | 856 | 69,752 | GitHub HuggingFace |
| SWE-rebench | Python | 1,823 | 67,074 | HuggingFace |
| R2E-Gym | Python | 10 | 3,321 | GitHub HuggingFace |
| SWE-Synth | Python | 11 | 3,018 | GitHub HuggingFace |
| SWE-Factory | Python | 10 | 2,809 | GitHub HuggingFace |
| SWE-Gym | Python | 11 | 491 | GitHub HuggingFace |
| SWE-Lego | Python | 3,251 | 14.6k | GitHub |
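For orientation, a trajectory in these datasets is a sequence of agent turns (thought, action, observation) ending in a resolved or unresolved outcome. The record below is purely illustrative — the field names are assumptions, not the schema of any dataset listed above.

```python
# Illustrative shape of one agent trajectory record; field names
# ("instance_id", "steps", "resolved") are hypothetical.
trajectory = {
    "instance_id": "example__repo-1234",   # hypothetical issue instance
    "steps": [
        {"thought": "Locate the failing function.",
         "action": "search_code('parse_config')",
         "observation": "src/config.py:42"},
        {"thought": "Apply the fix at the located line.",
         "action": "edit('src/config.py', 42, '...')",
         "observation": "ok"},
    ],
    "resolved": True,
}

def trajectory_length(t):
    """Number of agent turns in a trajectory."""
    return len(t["steps"])

print(trajectory_length(trajectory))  # -> 2
```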

SFT-based Methods

Overview of SFT-based methods for issue resolution. This table categorizes models by their base architecture and training scaffold (Sorted by Performance).

| Model Name | Base Model | Size | Arch. | Training Scaffold | Res. (%) | Code | Data | Model |
|---|---|---|---|---|---|---|---|---|
| SWE-rebench-openhands-Qwen3-235B-A22B | Qwen3-235B-A22B | 235B-A22B | MoE | OpenHands | 59.9 | - | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-32B | Qwen3-32B | 32B | Dense | OpenHands | 57.6 | GitHub | HuggingFace | HuggingFace |
| CGM-SWE-PY | Qwen2.5-Coder-72B | 72B | Dense | Graph RAG | 50.4 | GitHub | - | HuggingFace |
| SWE-rebench-openhands-Qwen3-30B-A3B | Qwen3-30B-A3B | 30B-A3B | MoE | OpenHands | 49.7 | - | HuggingFace | HuggingFace |
| Devstral | Mistral Small 3 | 22B | Dense | OpenHands | 46.8 | - | - | HuggingFace |
| Co-PatcheR | Qwen2.5-Coder-14B | 3×14B | Dense | PatchPilot-mini | 46.0 | GitHub | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | Agentless | 45.0 | GitHub | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-8B | Qwen3-8B | 8B | Dense | OpenHands | 44.4 | GitHub | HuggingFace | HuggingFace |
| Lingma SWE-GPT | Qwen2.5-72B-Instruct | 72B | Dense | SWESynInfer | 30.2 | GitHub | - | - |
| SWE-Gym-Qwen-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands, MoatlessTools | 20.6 | GitHub | - | HuggingFace |
| SWE-Gym-Qwen-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands, MoatlessTools | 16.4 | GitHub | - | HuggingFace |
| SWE-Gym-Qwen-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands, MoatlessTools | 10.6 | GitHub | - | HuggingFace |

RL-based Methods

A comprehensive overview of specialized models for issue resolution, categorized by parameter size. The table details each model's base architecture, the training scaffold used for rollout, the type of reward signal employed (Outcome vs. Process), and their performance results (Res. %) on issue resolution benchmarks.

| Model Name | Base Model | Size | Arch. | Train. Scaffold | Reward | Res. (%) | Code | Data | Model |
|---|---|---|---|---|---|---|---|---|---|
| **560B Models (MoE)** | | | | | | | | | |
| LongCat-Flash-Think | LongCatFlash-Base | 560B-A27B | MoE | R2E-Gym | Outcome | 60.4 | GitHub | - | HuggingFace |
| **72B Models** | | | | | | | | | |
| Kimi-Dev | Qwen 2.5-72B-Base | 72B | Dense | BugFixer + TestWriter | Outcome | 60.4 | GitHub | - | HuggingFace |
| SWE-RL | Llama-3.3-70B-Instruct | 70B | Dense | Agentless-mini | Outcome | 41.0 | GitHub | - | - |
| Multi-turn RL (Nebius) | Qwen2.5-72B-Instruct | 72B | Dense | SWE-agent | Outcome | 39.0 | - | - | - |
| Agent-RLVR-RM-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 27.8 | - | - | - |
| Agent-RLVR-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 22.4 | - | - | - |
| **32B Models** | | | | | | | | | |
| OpenHands Critic | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 66.4 | GitHub | - | HuggingFace |
| KAT-Dev-32B | Qwen3-32B | 32B | Dense | - | - | 62.4 | - | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | - | Outcome | 60.2 | GitHub | HuggingFace | HuggingFace |
| FoldAgent | Seed-OSS-36B-Instruct | 36B | Dense | FoldAgent | Process | 58.0 | GitHub | - | - |
| SeamlessFlow-32B | Qwen3-32B | 32B | Dense | SWE-agent | Outcome | 45.8 | GitHub | - | - |
| DeepSWE | Qwen3-32B | 32B | Dense | R2E-Gym | Outcome | 42.2 | GitHub | HuggingFace | HuggingFace |
| SA-SWE-32B | - | 32B | Dense | SkyRL-Agent | - | 39.4 | - | - | - |
| OpenHands LM v0.1 | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 37.2 | GitHub | - | HuggingFace |
| SWE-Dev-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands | Outcome | 36.6 | GitHub | - | HuggingFace |
| Satori-SWE | Qwen2.5-Coder-32B | 32B | Dense | Retriever + Code editor | Outcome | 35.8 | GitHub | HuggingFace | HuggingFace |
| SoRFT-32B | Qwen2.5-Coder-32B | 32B | Dense | Agentless | Outcome | 30.8 | - | - | - |
| Agent-RLVR-32B | Qwen2.5-Coder-32B | 32B | Dense | Localization + Repair | Outcome | 21.6 | - | - | - |
| **14B Models** | | | | | | | | | |
| Agent-RLVR-14B | Qwen2.5-Coder-14B | 14B | Dense | Localization + Repair | Outcome | 18.0 | - | - | - |
| SEAlign-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands | Process | 17.7 | - | - | - |
| **7-8B Models** | | | | | | | | | |
| SeamlessFlow-8B | Qwen3-8B | 8B | Dense | SWE-agent | Outcome | 27.4 | GitHub | - | - |
| SWE-Dev-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Outcome | 23.4 | GitHub | - | HuggingFace |
| SoRFT-7B | Qwen2.5-Coder-7B | 7B | Dense | Agentless | Outcome | 21.4 | - | - | - |
| SWE-Dev-8B | Llama-3.1-8B | 8B | Dense | OpenHands | Outcome | 18.0 | GitHub | - | HuggingFace |
| SEAlign-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Process | 15.0 | - | - | - |
| SWE-Dev-9B | GLM-4-9B | 9B | Dense | OpenHands | Outcome | 13.6 | GitHub | - | HuggingFace |
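The Reward column above distinguishes outcome rewards (one sparse signal from final test execution) from process rewards (dense per-step scores along the trajectory). A minimal sketch of the distinction, with function and field names that are illustrative rather than taken from any listed framework:

```python
def outcome_reward(patch_passes_tests: bool) -> float:
    """Outcome reward: a single sparse signal from running the
    repository's tests against the final patch."""
    return 1.0 if patch_passes_tests else 0.0

def process_reward(step_scores: list[float]) -> float:
    """Process reward: dense per-step scores (e.g. from a process
    reward model), aggregated over the agent's trajectory."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0

print(outcome_reward(True))              # -> 1.0
print(process_reward([0.5, 1.0, 0.0]))   # -> 0.5
```

Outcome rewards are cheap to verify but sparse; process rewards give denser credit assignment at the cost of needing a step-level scorer.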

General Foundation Models

Overview of general foundation models evaluated on issue resolution. The table details the specific inference scaffolds (e.g., OpenHands, Agentless) employed during the evaluation process to achieve the reported results.

| Model Name | Size | Arch. | Inf. Scaffold | Reward | Res. (%) | Code | Model |
|---|---|---|---|---|---|---|---|
| KAT-Coder | - | - | Claude Code | Outcome | 73.4 | - | Website |
| MiMo-V2-Flash | 309B-A15B | MoE | Agentless | Outcome | 73.4 | GitHub | HuggingFace |
| Deepseek V3.2 | 671B-A37B | MoE | Claude Code, RooCode | - | 73.1 | GitHub | HuggingFace |
| Kimi-K2-Instruct | 1T | MoE | Agentless | Outcome | 71.6 | - | HuggingFace |
| Qwen3-Coder | 480B-A35B | MoE | OpenHands | Outcome | 69.6 | GitHub | HuggingFace |
| GLM-4.6 | 355B-A32B | MoE | OpenHands | Outcome | 68.0 | - | HuggingFace |
| gpt-oss-120b | 116.8B-A5.1B | MoE | Internal tool | Outcome | 62.0 | GitHub | HuggingFace |
| Minimax M2 | 230B-A10B | MoE | R2E-Gym | Outcome | 61.0 | GitHub | HuggingFace |
| gpt-oss-20b | 20.9B-A3.6B | MoE | Internal tool | Outcome | 60.0 | GitHub | HuggingFace |
| GLM-4.5-Air | 106B-A12B | MoE | OpenHands | Outcome | 57.6 | - | - |
| Minimax M1-80k | 456B-A45.9B | MoE | Agentless | Outcome | 56.0 | GitHub | Website |
| Minimax M1-40k | 456B-A45.9B | MoE | Agentless | Outcome | 55.6 | GitHub | Website |
| Seed1.5-Thinking | 200B-A20B | MoE | - | Outcome | 47.0 | GitHub | - |
| Llama 4 Maverick | 400B-A17B | MoE | mini-SWE-agent | Outcome | 21.0 | GitHub | HuggingFace |
| Llama 4 Scout | 109B-A17B | MoE | mini-SWE-agent | Outcome | 9.1 | GitHub | HuggingFace |


🚀 Quick Start

```bash
# First time: install dependencies
pip install flask flask-cors sqlalchemy pyyaml requests

# Full update + start admin server
# (refreshes news, re-renders README/docs, builds static site, then serves)
python start.py

# Or force re-import from YAML/CSV first
python start.py --init
```

Open http://localhost:5000/admin to manage papers, datasets, and methods.

| Command | Description |
|---|---|
| `python start.py` | Full update (news + render + build), then start server |
| `python start.py --init` | Re-import from YAML/CSV, then full update + start |
| `python start.py --no-update` | Start server without running update steps |
| `python start.py --port 8080` | Use a custom port |
| `python start.py --news` | Refresh This Month's Papers only and exit |
| `python start.py --render` | Re-render README/docs from DB only and exit |
| `python start.py --build` | Build static site (mkdocs) only and exit |
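The admin interface can also export filtered paper lists as CSV. As a minimal sketch of post-processing such an export with the standard library — the column names (`title`, `category`, `date`) are assumptions for illustration, not the actual export schema:

```python
import csv
import io

# Hypothetical CSV export from the admin interface; the column names
# ("title", "category", "date") are assumptions, not the real schema.
exported = """title,category,date
SWE-Adept,Agentic Methods,2026-03
BeyondSWE,Evaluation Datasets,2026-03
SoRFT,Training,2025-02
"""

def papers_in_category(csv_text: str, category: str) -> list[str]:
    """Return the titles of all rows whose category matches exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["title"] for row in reader if row["category"] == category]

print(papers_in_category(exported, "Agentic Methods"))  # ['SWE-Adept']
```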


🤝 Contributing

We welcome contributions! To add new papers or tables:

  1. Fork this repository
  2. Add entries via the admin interface (run `python start.py`, then open `localhost:5000/admin`)
    — or manually edit the YAML/CSV files in data/
  3. Run `python start.py --init` if you edited files directly
  4. Submit a PR with your changes
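For direct edits to the data/ files, the manual path in the steps above can be sketched with the standard library. The filename (`papers.csv`) and column names below are assumptions about the data/ layout, not the repository's actual schema:

```python
import csv
import os
import tempfile

# Hypothetical columns; check the existing files in data/ for the real ones.
FIELDS = ["title", "arxiv", "category"]

def append_entry(path: str, entry: dict) -> None:
    """Append one paper row, writing a header first if the file is new."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(entry)

with tempfile.TemporaryDirectory() as tmp:  # stand-in for data/
    path = os.path.join(tmp, "papers.csv")
    append_entry(path, {"title": "Example Paper",
                        "arxiv": "2601.00000",   # hypothetical identifier
                        "category": "Agentic Methods"})
    print(open(path).read().splitlines()[0])  # title,arxiv,category
```

After editing the real files, run `python start.py --init` so the database picks up the changes.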

🌟 Related Work

Code Generation

The application of LLMs in the programming domain has witnessed explosive growth. Early research focused primarily on function-level code generation, with benchmarks such as HumanEval serving as the standard yardstick. However, such generic benchmarks often fail to capture the nuances of real-world development. To bridge this gap, recent initiatives have extended evaluation tasks to align more closely with realistic software development scenarios, revealing the limitations of general models in specialized domains. Concurrently, methods are evolving to capture these broader contexts: while foundational approaches relied primarily on supervised fine-tuning (SFT) or standard retrieval-augmented generation, reinforcement learning (RL)-based methods have emerged as a pivotal direction for handling complex coding tasks.

Related:

Automated Software Generation

The primary goal of this task is to autonomously construct complete, executable software systems from high-level natural language requirements. Unlike code completion, it requires covering the full Software Development Life Cycle (SDLC): requirement analysis, system design, coding, and testing. To manage the complexity and potential logic inconsistencies of this process, state-of-the-art frameworks leverage multi-agent collaboration, simulating human development teams to decompose complex tasks into streamlined, verifiable workflows.

Related:

Automated Software Maintenance

Issue resolution is intrinsically linked to the broader domain of automated software maintenance. Methodologies established in this field are frequently encapsulated as callable tools to augment the capabilities of LLMs in software development tasks.

Related:

Automated Environment Setup

Recent initiatives focus on automating the configuration of runtime environments for entire repositories. This capability has developed in parallel with dataset construction for issue resolution, since executable environments are a prerequisite for building and verifying issue-resolution benchmarks.

Related:

Related Surveys

Existing surveys primarily focus on code generation or other tasks within the software engineering domain. This paper bridges this gap by offering the first systematic survey dedicated to the entire spectrum of issue resolution, ranging from non-agent approaches to the latest agentic advancements.

Related:


📄 Citation

If you use this project or related survey in your research or system, please cite the following:

Li, Caihua, Guo, Lianghong, Wang, Yanlin, et al. (2026). Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey. arXiv preprint arXiv:2601.11655.

BibTeX:

@article{li2026advances,
  title={Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey},
  author={Li, Caihua and Guo, Lianghong and Wang, Yanlin and Guo, Daya and Tao, Wei and Shan, Zhenyu and Liu, Mingwei and Chen, Jiachi and Song, Haoyu and Tang, Duyu and Zhang, Hongyu and Zheng, Zibin},
  journal={arXiv preprint arXiv:2601.11655},
  year={2026},
  eprint={2601.11655},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}

🙏 Acknowledgements

We would like to express our sincere gratitude to:

  • The authors of cited papers who provided valuable feedback on how their work is presented in this survey, greatly improving its accuracy and comprehensiveness.

  • All contributors who have helped improve this project through issues, pull requests, and discussions.

  • The open-source community for developing the amazing tools and frameworks that made this project possible.

Special Thanks

  • @chao-peng (Dr. Chao Peng), ByteDance Software Engineering Lab, for providing valuable suggestions on the Challenges and Opportunities section of our survey.

  • @EuniAI/awesome-code-agents for providing an excellent reference on managing survey papers through documentation systems and inspiring our project structure.


📬 Contact

If you have any questions or suggestions, please contact us through:


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


⭐ Star this repository if you find it helpful!

Made with ❤️ by the DeepSoftwareAnalytics team

Documentation | Paper | Tables | About | Cite
