DeepSoftwareAnalytics/Awesome-Issue-Resolution

✨ Awesome Issue Resolution

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey


📖 Documentation Website | 📄 Full Paper | 📋 Tables & Resources

🤗 HF Paper: https://huggingface.co/papers/2601.11655 (Upvotes appreciated! ⬆️)

🎙️ Interactive Exploration:

NotebookLM Discord Issues


📖 Abstract

Based on a systematic review of 196 papers and online resources, this survey establishes a holistic theoretical framework for Issue Resolution in software engineering. We examine how Large Language Models (LLMs) are transforming the automation of GitHub issue resolution. Beyond the theoretical analysis, we have curated a comprehensive collection of datasets and model training resources, which are continuously synchronized with our GitHub repository and project documentation website.

🔎 Browse & Export: The full paper database is searchable and exportable at deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/admin/ — filter by category, date, or keyword, and export results as CSV.
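The exported CSV can be post-processed with a few lines of Python. This is a minimal sketch; the column names used below (`title`, `category`, `date`) are assumptions — check the header row of your actual export.

```python
import csv
import io

# Hypothetical export snippet; real exports come from the admin interface.
sample = """title,category,date
SWE-agent,Single-Agent Systems,2024-05
MAGIS,Multi-Agent Systems,2024-03
AutoCodeRover,Multi-Agent Systems,2024-04
"""

def filter_rows(csv_text, category=None, since=None):
    """Return rows matching an optional category and minimum date."""
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if category and row["category"] != category:
            continue
        if since and row["date"] < since:  # ISO dates compare lexically
            continue
        out.append(row)
    return out

recent_multi = filter_rows(sample, category="Multi-Agent Systems", since="2024-04")
print([r["title"] for r in recent_multi])  # -> ['AutoCodeRover']
```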

📰 News

This Month's Papers

2 papers — 2026-03

  • BeyondSWE: BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? arXiv
  • SWE-Adept: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution arXiv

Recent Updates

  • Survey Update (2026-02): Added 21 new papers covering the latest advances in LLM-based issue resolution!
  • Survey Launch (2026-01): Our survey paper is now publicly available on arXiv: https://arxiv.org/abs/2601.11655. It covers 175 papers and resources on LLM-based GitHub issue resolution, with continuously updated datasets and leaderboards!

Admin Interface Demo

🔍 Explore This Survey:


📚 Complete Paper List

Total: 196 works across 14 categories

📊 Evaluation Datasets

Benchmarks for evaluating issue resolution systems

  • (2026-02) SWE Context Bench: SWE Context Bench: A Benchmark for Context Learning in Coding arXiv
  • (2025-12) SWE-InfraBench: SWE-InfraBench: Evaluating Language Models on Cloud Infrastructure Code OpenReview
  • (2025-12) SWE-EVO: SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios arXiv
  • (2025-11) SWE-Sharp-Bench: SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks arXiv
  • (2025-11) SWE-fficiency: SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads? arXiv
  • (2025-11) SWE-Compass: SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models arXiv
  • (2025-09) SWE-Bench Pro: SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? arXiv
  • (2025-07) SWE-Perf: SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? arXiv OpenReview
  • (2025-05) SwingArena: SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving arXiv
  • (2025-05) OmniGIRL: Omnigirl: A multilingual and multimodal benchmark for github issue resolution arXiv
  • (2025-05) SWE-bench-Live: SWE-bench Goes Live! arXiv OpenReview
  • (2025-04) Multi-SWE-bench: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving arXiv OpenReview
  • (2025-04) SWE-PolyBench: SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents arXiv
  • (2025-04) SWE-bench Multilingual: SWE-smith: Scaling Data for Software Engineering Agents arXiv OpenReview
  • (2025-03) FEA-Bench: FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation arXiv
  • (2025-02) SWE-Lancer: SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? arXiv
  • (2024-12) Visual SWE-bench: CodeV: Issue Resolving with Visual Data arXiv DOI
  • (2024-10) SWE-bench Multimodal: SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? arXiv OpenReview
  • (2024-08) SWE-bench-java: SWE-bench-java: A GitHub Issue Resolving Benchmark for Java arXiv

🎯 Training Datasets

Datasets for training issue resolution agents

  • (2026-02) SWE-Universe: SWE-Universe: Scale Real-World Verifiable Environments to Millions arXiv
  • (2025-06) Skywork-SWE: Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs arXiv
  • (2025-05) SWELoc: SweRank: Software Issue Localization with Code Ranking arXiv
  • (2025-04) Multi-SWE-RL: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving arXiv OpenReview
  • (2025-04) SWE-Smith: SWE-smith: Scaling Data for Software Engineering Agents arXiv OpenReview
  • (2025-02) LocAgent: OrcaLoca: An LLM Agent Framework for Software Issue Localization arXiv OpenReview
  • (2025-01) SWE-Fixer: SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution arXiv
  • (2023-10) SWE-bench-extra: SWE-bench: Can Language Models Resolve Real-world Github Issues? arXiv

🤖 Single-Agent Systems

Individual autonomous agents for issue resolution

  • (2025-12) Confucius Code Agent: Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases arXiv
  • (2025-10) TOM-SWE: TOM-SWE: User Mental Modeling For Software Engineering Agents arXiv
  • (2025-09) Lita: Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs arXiv
  • (2025-08) Live-SWE-agent: SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents arXiv OpenReview
  • (2025-07) Trae Agent: Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling arXiv
  • (2025-05) LCLM: Putting It All into Context: Simplifying Agents with LCLMs arXiv
  • (2025-02) PatchPilot: PatchPilot: A Cost-Efficient Software Engineering Agent with Early Attempts on Formal Verification arXiv OpenReview
  • (2024-05) SWE-agent: Swe-agent: Agent-computer interfaces enable automated software engineering arXiv
  • (2024-03) Devin: SWE-bench technical report Website
  • (2023-06) Aider Website GitHub

👥 Multi-Agent Systems

Collaborative multi-agent frameworks

  • (2025-08) Meta-RAG: Meta-RAG on Large Codebases Using Code Summarization arXiv
  • (2025-07) SWE-Debate: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution arXiv
  • (2025-06) AgentScope: SWE-Bench - AgentScope Website
  • (2025-05) Devlo: Achieving SOTA on SWE-bench Website
  • (2025-05) Refact.ai Agent: AI Coding Agent for Software Development - Refact.ai Website
  • (2025-03) Lingxi: Lingxi/docs/Lingxi Technical Report 2505.pdf at master · lingxi-agent/Lingxi GitHub
  • (2025-02) OrcaLora: OrcaLoca: An LLM Agent Framework for Software Issue Localization arXiv OpenReview
  • (2025-01) CodeCoR: CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation arXiv
  • (2024-09) MarsCode Agent: MarsCode Agent: AI-native Automated Bug Fixing arXiv
  • (2024-09) HyperAgent: HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale arXiv
  • (2024-08) DEI: Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents arXiv OpenReview
  • (2024-07) OpenHands: OpenHands: An Open Platform for AI Software Developers as Generalist Agents arXiv OpenReview
  • (2024-06) CodeR: CodeR: Issue Resolving with Multi-Agent and Task Graphs arXiv
  • (2024-04) AutoCodeRover: AutoCodeRover: Autonomous Program Improvement arXiv DOI
  • (2024-03) MAGIS: MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution arXiv OpenReview

🔄 Workflow-Based Methods

Structured pipeline approaches

  • (2025-07) SynFix: SynFix: Dependency-Aware Program Repair via RelationGraph Analysis DOI Website
  • (2025-06) GUIRepair: Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing arXiv
  • (2024-12) CodeV: CodeV: Issue Resolving with Visual Data arXiv DOI
  • (2024-10) Conversational Pipeline: Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench arXiv
  • (2024-07) Agentless: Demystifying LLM-Based Software Engineering Agents arXiv Website

🛠️ Tool-Augmented Methods

Methods leveraging external tools

  • (2026-02) Closing the Loop: Closing the Loop: Universal Repository Representation with RPG-Encoder arXiv Website GitHub
  • (2026-01) SWE-Tester: SWE-Tester: Training Open-Source LLMs for Issue Reproduction in Real-World Repositories arXiv
  • (2025-12) GraphLocator: GraphLocator: Graph-guided Causal Reasoning for Issue Localization arXiv
  • (2025-11) InfCode: InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution arXiv
  • (2025-10) BugPilot: BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills arXiv
  • (2025-10) TestPrune: When Old Meets New: Evaluating the Impact of Regression Tests on SWE Issue Resolution arXiv
  • (2025-09) Nemotron-CORTEXA: Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity OpenReview Website
  • (2025-08) Git Context Controller: Git Context Controller: Manage the Context of LLM-based Agents like Git arXiv
  • (2025-07) Prometheus: Prometheus: Unified Knowledge Graphs for Issue Resolution in Multilingual Codebases arXiv
  • (2025-06) SACL: SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization arXiv
  • (2025-06) OpenHands-Versa: Coding Agents with Multimodal Browsing are Generalist Problem Solvers arXiv
  • (2025-06) SemAgent: SemAgent: A Semantics Aware Program Repair Agent arXiv
  • (2025-06) Repeton: Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles arXiv
  • (2025-06) cAST: cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree arXiv
  • (2025-05) InfantAgent-Next: InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction arXiv
  • (2025-05) SWERank: SweRank: Software Issue Localization with Code Ranking arXiv
  • (2025-03) DARS: DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal arXiv
  • (2025-03) Issue2Test: Issue2Test: Generating Reproducing Test Cases from Issue Reports arXiv
  • (2025-03) KGCompass: Enhancing repository-level software repair via repository-aware knowledge graphs arXiv
  • (2025-03) CoSIL: Issue Localization via LLM-Driven Iterative Code Graph Searching arXiv
  • (2025-02) OrcaLoca: OrcaLoca: An LLM Agent Framework for Software Issue Localization arXiv OpenReview
  • (2025-02) Otter: Otter: Generating Tests from Issues to Validate SWE Patches arXiv OpenReview
  • (2025-02) Quadropic Insiders: Quadropic Insiders : Syntheo Tops Swelite Feb Website
  • (2024-12) CoRNStack: CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking arXiv OpenReview
  • (2024-11) AEGIS: AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions arXiv
  • (2024-10) RepoGraph: RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph arXiv
  • (2024-09) SuperCoder2.0: SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer arXiv
  • (2024-08) SpecRover: SpecRover: Code Intent Extraction via LLMs arXiv
  • (2024-06) Alibaba LingmaAgent: Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration arXiv DOI

🧠 Memory-Enhanced Methods

Systems with memory mechanisms

  • (2026-01) MemGovern: MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences arXiv
  • (2025-10) RepoMem: Improving Code Localization with Repository Memory arXiv
  • (2025-09) AgentDiet: Improving the Efficiency of LLM Agent Systems through Trajectory Reduction arXiv
  • (2025-07) Agent KB: Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving arXiv
  • (2025-07) SWE-Exp: SWE-Exp: Experience-Driven Software Issue Resolution arXiv
  • (2025-06) ExpeRepair: EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair arXiv
  • (2025-05) DGM: Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents arXiv
  • (2024-11) Infant Agent: Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage arXiv
  • (2024-11) EvoCoder: LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues arXiv

📚 Supervised Fine-Tuning (SFT)

Models trained via supervised learning

  • (2026-01) SWE-Lego: SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving arXiv
  • (2026-01) SWE-Replay: SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents arXiv
  • (2025-12) SWE-Compressor: Context as a Tool: Context Management for Long-Horizon SWE-Agents arXiv
  • (2025-09) Devstral: Devstral: Fine-tuning Language Models for Coding Agent Applications arXiv
  • (2025-06) MCTS-Refined CoT: MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution arXiv
  • (2025-05) Search for training: Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents arXiv
  • (2025-05) Co-PatcheR: Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models arXiv
  • (2025-05) CGM: Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks arXiv GitHub HuggingFace
  • (2025-03) Thinking Longer: Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute arXiv
  • (2024-12) ReSAT: Repository Structure-Aware Training Makes SLMs Better Issue Resolver arXiv
  • (2024-12) Scaling data collection: Scaling Data Collection for Training SWE Agents Website
  • (2024-12) SWE-Gym: Training Software Engineering Agents and Verifiers with SWE-Gym arXiv
  • (2024-11) Lingma SWE-GPT: SWE-GPT: A Process-Centric Language Model for Automated Software Improvement arXiv DOI GitHub
  • (2024-11) CodeXEmbed: CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval arXiv OpenReview

🎮 Reinforcement Learning (RL)

Models trained via reinforcement learning

  • (2026-02) SWE-Master: SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training arXiv GitHub
  • (2026-02) SWE-Protégé: SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents arXiv
  • (2026-02) SWE-MiniSandbox: SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents arXiv GitHub
  • (2026-01) MiMo-V2-Flash: MiMo-V2-Flash Technical Report arXiv
  • (2025-12) Self-play SWE-RL: Toward Training Superintelligent Software Agents through Self-Play SWE-RL arXiv
  • (2025-12) SWE-Playground: Training Versatile Coding Agents in Synthetic Environments arXiv
  • (2025-12) SWE-RM: SWE-RM: Execution-free Feedback For Software Engineering Agents arXiv
  • (2025-12) One Tool Is Enough: One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents arXiv
  • (2025-12) Let It Flow: Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem arXiv
  • (2025-12) Deepseek V3.2: DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models arXiv
  • (2025-11) TSP: Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair DOI
  • (2025-10) CWM: CWM: An Open-Weights LLM for Research on Code Generation with World Models arXiv
  • (2025-10) FoldGRPO: Scaling Long-Horizon LLM Agent via Context-Folding arXiv
  • (2025-10) GRPO-based Method: A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning arXiv OpenReview Website
  • (2025-10) Supervised RL: Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning arXiv
  • (2025-10) KAT-Coder: KAT-Coder Technical Report arXiv
  • (2025-09) CoreThink: CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs arXiv
  • (2025-09) EntroPO: Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization arXiv
  • (2025-09) Kimi-Dev: Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents arXiv
  • (2025-09) LongCat-Flash-Think: Introducing LongCat-Flash-Thinking: A Technical Report arXiv
  • (2025-08) Tool-integrated RL: Tool-integrated Reinforcement Learning for Repo Deep Search arXiv
  • (2025-08) SWE-Swiss: SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution Website
  • (2025-08) SeamlessFlow: SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling arXiv
  • (2025-08) DAPO: Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning arXiv
  • (2025-08) GLM-4.6: gpt-oss-120b & gpt-oss-20b model card arXiv
  • (2025-07) DeepSWE: DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL Website
  • (2025-07) Kimi-K2-Instruct: Kimi K2: Open Agentic Intelligence arXiv
  • (2025-06) Agent-RLVR: Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards arXiv
  • (2025-06) SWE-Dev2: SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling arXiv
  • (2025-06) Minimax M2: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention arXiv
  • (2025-05) SWE-Dev1: SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development arXiv
  • (2025-05) Satori-SWE: Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering arXiv
  • (2025-05) Qwen3-Coder: Qwen3 Technical Report arXiv
  • (2025-04) Seed1.5-Thinking: Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning arXiv
  • (2025-03) SEAlign: SEAlign: Alignment Training for Software Engineering Agent arXiv DOI
  • (2025-02) SWE-RL: SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution arXiv OpenReview
  • (2025-02) SoRFT: SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning arXiv
  • (2024-10) OSCA: Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation arXiv DOI

⚡ Inference-Time Scaling

Methods for scaling at inference time

  • (2026-01) Agentic Rubrics: Agentic Rubrics as Contextual Verifiers for SWE Agents arXiv Website
  • (2025-10) SIADAFIX: SIADAFIX: issue description response for adaptive program repair arXiv
  • (2025-09) SWE-PRM: When Agents go Astray: Course-Correcting SWE Agents with PRMs arXiv
  • (2025-01) ReasoningBank: CodeMonkeys: Scaling Test-Time Compute for Software Engineering arXiv
  • (2024-10) SWE-Search: SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement arXiv OpenReview

📥 Data Collection Methods

Techniques for collecting training data

  • (2026-02) DockSmith: DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder arXiv HuggingFace
  • (2026-01) MEnvAgent: MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering arXiv GitHub
  • (2025-12) Multi-Docker-Eval: Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering arXiv
  • (2025-08) RepoForge: RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale arXiv
  • (2025-07) SWE-MERA: SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks arXiv
  • (2025-06) SWE-Factory: SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks arXiv
  • (2025-05) SWE-rebench: SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents arXiv OpenReview
  • (2025-05) RepoLaunch: SWE-bench Goes Live! arXiv OpenReview

🔬 Data Synthesis Methods

Approaches for synthetic data generation

  • (2026-02) SWE-World: SWE-World: Building Software Engineering Agents in Docker-Free Environments arXiv GitHub
  • (2025-09) SWE-Mirror: SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories arXiv
  • (2025-06) SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner arXiv OpenReview
  • (2025-04) R2E-Gym: R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents arXiv OpenReview
  • (2025-04) SWE-Synth: SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs arXiv
  • (2025-04) SWE-smith: SWE-smith: Scaling Data for Software Engineering Agents arXiv OpenReview
  • (2025-01) Learn-by-interact: Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments arXiv OpenReview

📈 Data Analysis

Analysis of datasets and benchmarks

  • (2025-12) Data contamination: Does SWE-Bench-Verified Test Agent Ability or Model Memory? arXiv
  • (2025-07) Rigorous agentic benchmarks: Establishing Best Practices for Building Rigorous Agentic Benchmarks arXiv
  • (2025-07) SPICE: SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation arXiv
  • (2025-06) UTBoost: UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench arXiv
  • (2025-06) Trustworthiness: Is Your Automated Software Engineer Trustworthy? arXiv
  • (2025-06) The SWE-Bench Illusion: The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason arXiv
  • (2025-04) Revisiting SWE-Bench: Revisiting SWE-Bench: On the Importance of Data Quality for LLM-Based Code Models DOI
  • (2025-03) Patch Correctness: Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study arXiv
  • (2024-08) SWE-bench Verified: Introducing SWE-bench Verified | OpenAI Website

🔍 Methods Analysis

Comparative analysis of different methods

  • (2025-12) SWEnergy: SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs arXiv
  • (2025-09) Failures analysis: An Empirical Study on Failures in Automated Issue Solving arXiv
  • (2025-07) Security analysis: How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench arXiv
  • (2025-06) Dissecting the SWE-Bench Leaderboards: Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems arXiv
  • (2025-05) GSO: GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents arXiv
  • (2025-05) Strong-Weak Model Collaboration: An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation arXiv
  • (2025-05) Agents in the Wild Website
  • (2025-04) SeaView: SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow arXiv
  • (2025-03) Beyond final code: Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios arXiv
  • (2025-02) Overthinking: The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks arXiv
  • (2024-10) Evaluating software development agents: Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios arXiv DOI
  • (2024-06) Context Retrieval: On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing arXiv

Evaluation & Training Datasets

A comprehensive survey and statistical overview of issue resolution datasets. We categorize these datasets based on programming language, modality support, source repositories, data scale (Amount), and the availability of reproducible execution environments.

| Dataset | Language | Multimodal | Repos | Amount | Environment | Link |
|---|---|---|---|---|---|---|
| **Single-PL Datasets** | | | | | | |
| SWE-Fixer | Python | No | 856 | 115,406 | No | GitHub HuggingFace HuggingFace |
| SWE-smith | Python | No | 128 | 50k | Yes | GitHub HuggingFace |
| SWE-Lego | Python | No | 3,251 | 32,119 | Yes | GitHub HuggingFace |
| SWE-rebench | Python | No | 3,468 | 21,336 | Yes | GitHub HuggingFace |
| SWE-bench-train | Python | No | 37 | 19k | No | GitHub HuggingFace |
| SWE-Flow | Python | No | 74 | 18,081 | Yes | GitHub |
| Skywork-SWE | Python | No | 2,531 | 10,169 | Yes | - |
| R2E-Gym | Python | No | 10 | 8,135 | Yes | GitHub HuggingFace |
| RepoForge | Python | No | - | 7.3k | Yes | - |
| SWE-bench-extra | Python | No | 2k | 6.38k | Yes | HuggingFace |
| SWE-Gym | Python | No | 11 | 2,438 | Yes | GitHub HuggingFace |
| SWE-bench | Python | No | 12 | 2,294 | Yes | GitHub HuggingFace |
| SWE-bench-java | Java | No | 19 | 1,797 | Yes | GitHub HuggingFace |
| FEA-bench | Python | No | 83 | 1,401 | Yes | GitHub HuggingFace |
| SWE-bench-Live | Python | No | 164 | 1,565 | Yes | GitHub HuggingFace |
| Loc-Bench | Python | No | - | 560 | No | GitHub HuggingFace |
| SWE-bench Verified | Python | No | - | 500 | Yes | GitHub HuggingFace |
| SWE-bench Lite | Python | No | 12 | 300 | Yes | GitHub HuggingFace |
| SWE-MERA | Python | No | 200 | 300 | Yes | GitHub HuggingFace |
| SWE-Bench-CL | Python | No | 8 | 273 | Yes | GitHub |
| SWE-Sharp-Bench | C# | No | 17 | 150 | Yes | GitHub HuggingFace |
| SWE-Perf | Python | No | 12 | 140 | Yes | GitHub HuggingFace |
| Visual SWE-bench | Python | Yes | 11 | 133 | Yes | GitHub HuggingFace |
| SWE-EVO | Python | No | 7 | 48 | Yes | GitHub |
| **Multi-PL Datasets** | | | | | | |
| SWE-Mirror | Python, Rust, Go | No | 40 | 60k | Yes | - |
| Multi-SWE-bench | Java, JS, TS, Go, Rust, C, C++ | No | 76 | 4,723 | Yes | GitHub HuggingFace |
| Swing-Bench | Python, Go, C++, Rust | No | 400 | 2,300 | Yes | - |
| SWE-PolyBench | Python, Java, JS, TS | No | 21 | 2,110 | Yes | GitHub HuggingFace HuggingFace |
| SWE-Compass | Python, JS, TS, Java, C, C++, Go, Rust, Kotlin, C# | No | - | 2,000 | Yes | GitHub HuggingFace |
| SWE-Bench Pro | Python, Go, TS | No | 41 | 1,865 | Yes | GitHub HuggingFace |
| SWE-bench++ | Python, Go, TS, JS, Ruby, PHP, Java, Rust, C++, C#, C | No | 3,971 | 1,782 | Yes | GitHub HuggingFace |
| SWE-Lancer | JS, TS | No | - | 1,488 | Yes | GitHub |
| OmniGIRL | Python, TS, Java, JS | Yes | 15 | 959 | Yes | GitHub HuggingFace |
| SWE-bench Multimodal | JS, TS, HTML, CSS | Yes | 17 | 619 | Yes | GitHub HuggingFace |
| SWE-fficiency | Python, Cython | No | 9 | 498 | Yes | GitHub |
| SWE-Factory | Python, Java, JS, TS | No | 12 | 430 | Yes | GitHub HuggingFace |
| SWE-bench-Live-MultiLang & Windows | Python, JS, TS, C, C++, C#, Java, Go, Rust | No | 238 | 418 | Yes | GitHub HuggingFace HuggingFace |
| SWE-bench Multilingual | C, C++, Go, Java, JS, TS, Rust, Python, Ruby, PHP | No | 42 | 300 | Yes | GitHub HuggingFace |
| SWE-InfraBench | Python, TS | No | - | 100 | Yes | - |

Training Trajectory Datasets

A survey of trajectory datasets used for agent training or analysis. We list the programming language, number of source repositories, and total trajectories for each dataset.

| Dataset | Language | Repos | Amount | Link |
|---|---|---|---|---|
| SWE-Fixer | Python | 856 | 69,752 | GitHub HuggingFace |
| SWE-rebench | Python | 1,823 | 67,074 | HuggingFace |
| R2E-Gym | Python | 10 | 3,321 | GitHub HuggingFace |
| SWE-Synth | Python | 11 | 3,018 | GitHub HuggingFace |
| SWE-Factory | Python | 10 | 2,809 | GitHub HuggingFace |
| SWE-Gym | Python | 11 | 491 | GitHub HuggingFace |
| SWE-Lego | Python | 3,251 | 14.6k | GitHub |
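For orientation, a trajectory in these datasets is a sequence of agent turns (thought, action, observation) ending in a resolved or unresolved outcome. The record below is purely illustrative — the field names are assumptions, not the schema of any dataset listed above.

```python
# Illustrative shape of one agent trajectory record; field names
# ("instance_id", "steps", "resolved") are hypothetical.
trajectory = {
    "instance_id": "example__repo-1234",   # hypothetical issue instance
    "steps": [
        {"thought": "Locate the failing function.",
         "action": "search_code('parse_config')",
         "observation": "src/config.py:42"},
        {"thought": "Apply the fix at the located line.",
         "action": "edit('src/config.py', 42, '...')",
         "observation": "ok"},
    ],
    "resolved": True,
}

def trajectory_length(t):
    """Number of agent turns in a trajectory."""
    return len(t["steps"])

print(trajectory_length(trajectory))  # -> 2
```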

SFT-based Methods

Overview of SFT-based methods for issue resolution. This table categorizes models by their base architecture and training scaffold (Sorted by Performance).

| Model Name | Base Model | Size | Arch. | Training Scaffold | Res. (%) | Code | Data | Model |
|---|---|---|---|---|---|---|---|---|
| SWE-rebench-openhands-Qwen3-235B-A22B | Qwen3-235B-A22B | 235B-A22B | MoE | OpenHands | 59.9 | - | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-32B | Qwen3-32B | 32B | Dense | OpenHands | 57.6 | GitHub | HuggingFace | HuggingFace |
| CGM-SWE-PY | Qwen2.5-Coder-72B | 72B | Dense | Graph RAG | 50.4 | GitHub | - | HuggingFace |
| SWE-rebench-openhands-Qwen3-30B-A3B | Qwen3-30B-A3B | 30B-A3B | MoE | OpenHands | 49.7 | - | HuggingFace | HuggingFace |
| Devstral | Mistral Small 3 | 22B | Dense | OpenHands | 46.8 | - | - | HuggingFace |
| Co-PatcheR | Qwen2.5-Coder-14B | 3×14B | Dense | PatchPilot-mini | 46.0 | GitHub | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | Agentless | 45.0 | GitHub | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-8B | Qwen3-8B | 8B | Dense | OpenHands | 44.4 | GitHub | HuggingFace | HuggingFace |
| Lingma SWE-GPT | Qwen2.5-72B-Instruct | 72B | Dense | SWESynInfer | 30.2 | GitHub | - | - |
| SWE-Gym-Qwen-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands, MoatlessTools | 20.6 | GitHub | - | HuggingFace |
| SWE-Gym-Qwen-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands, MoatlessTools | 16.4 | GitHub | - | HuggingFace |
| SWE-Gym-Qwen-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands, MoatlessTools | 10.6 | GitHub | - | HuggingFace |

RL-based Methods

A comprehensive overview of specialized models for issue resolution, categorized by parameter size. The table details each model's base architecture, the training scaffold used for rollout, the type of reward signal employed (Outcome vs. Process), and their performance results (Res. %) on issue resolution benchmarks.

| Model Name | Base Model | Size | Arch. | Train. Scaffold | Reward | Res. (%) | Code | Data | Model |
|---|---|---|---|---|---|---|---|---|---|
| **560B Models (MoE)** | | | | | | | | | |
| LongCat-Flash-Think | LongCatFlash-Base | 560B-A27B | MoE | R2E-Gym | Outcome | 60.4 | GitHub | - | HuggingFace |
| **72B Models** | | | | | | | | | |
| Kimi-Dev | Qwen 2.5-72B-Base | 72B | Dense | BugFixer + TestWriter | Outcome | 60.4 | GitHub | - | HuggingFace |
| SWE-RL | Llama-3.3-70B-Instruct | 70B | Dense | Agentless-mini | Outcome | 41.0 | GitHub | - | - |
| Multi-turn RL (Nebius) | Qwen2.5-72B-Instruct | 72B | Dense | SWE-agent | Outcome | 39.0 | - | - | - |
| Agent-RLVR-RM-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 27.8 | - | - | - |
| Agent-RLVR-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 22.4 | - | - | - |
| **32B Models** | | | | | | | | | |
| OpenHands Critic | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 66.4 | GitHub | - | HuggingFace |
| KAT-Dev-32B | Qwen3-32B | 32B | Dense | - | - | 62.4 | - | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | - | Outcome | 60.2 | GitHub | HuggingFace | HuggingFace |
| FoldAgent | Seed-OSS-36B-Instruct | 36B | Dense | FoldAgent | Process | 58.0 | GitHub | - | - |
| SeamlessFlow-32B | Qwen3-32B | 32B | Dense | SWE-agent | Outcome | 45.8 | GitHub | - | - |
| DeepSWE | Qwen3-32B | 32B | Dense | R2E-Gym | Outcome | 42.2 | GitHub | HuggingFace | HuggingFace |
| SA-SWE-32B | - | 32B | Dense | SkyRL-Agent | - | 39.4 | - | - | - |
| OpenHands LM v0.1 | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 37.2 | GitHub | - | HuggingFace |
| SWE-Dev-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands | Outcome | 36.6 | GitHub | - | HuggingFace |
| Satori-SWE | Qwen2.5-Coder-32B | 32B | Dense | Retriever + Code editor | Outcome | 35.8 | GitHub | HuggingFace | HuggingFace |
| SoRFT-32B | Qwen2.5-Coder-32B | 32B | Dense | Agentless | Outcome | 30.8 | - | - | - |
| Agent-RLVR-32B | Qwen2.5-Coder-32B | 32B | Dense | Localization + Repair | Outcome | 21.6 | - | - | - |
| **14B Models** | | | | | | | | | |
| Agent-RLVR-14B | Qwen2.5-Coder-14B | 14B | Dense | Localization + Repair | Outcome | 18.0 | - | - | - |
| SEAlign-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands | Process | 17.7 | - | - | - |
| **7-8B Models** | | | | | | | | | |
| SeamlessFlow-8B | Qwen3-8B | 8B | Dense | SWE-agent | Outcome | 27.4 | GitHub | - | - |
| SWE-Dev-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Outcome | 23.4 | GitHub | - | HuggingFace |
| SoRFT-7B | Qwen2.5-Coder-7B | 7B | Dense | Agentless | Outcome | 21.4 | - | - | - |
| SWE-Dev-8B | Llama-3.1-8B | 8B | Dense | OpenHands | Outcome | 18.0 | GitHub | - | HuggingFace |
| SEAlign-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Process | 15.0 | - | - | - |
| SWE-Dev-9B | GLM-4-9B | 9B | Dense | OpenHands | Outcome | 13.6 | GitHub | - | HuggingFace |
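The Reward column above distinguishes outcome rewards (one sparse signal from final test execution) from process rewards (dense per-step scores along the trajectory). A minimal sketch of the distinction, with function and field names that are illustrative rather than taken from any listed framework:

```python
def outcome_reward(patch_passes_tests: bool) -> float:
    """Outcome reward: a single sparse signal from running the
    repository's tests against the final patch."""
    return 1.0 if patch_passes_tests else 0.0

def process_reward(step_scores: list[float]) -> float:
    """Process reward: dense per-step scores (e.g. from a process
    reward model), aggregated over the agent's trajectory."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0

print(outcome_reward(True))              # -> 1.0
print(process_reward([0.5, 1.0, 0.0]))   # -> 0.5
```

Outcome rewards are cheap to verify but sparse; process rewards give denser credit assignment at the cost of needing a step-level scorer.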

General Foundation Models

Overview of general foundation models evaluated on issue resolution. The table details the specific inference scaffolds (e.g., OpenHands, Agentless) employed during the evaluation process to achieve the reported results.

| Model Name | Size | Arch. | Inf. Scaffold | Reward | Res. (%) | Code | Model |
|---|---|---|---|---|---|---|---|
| KAT-Coder | - | - | Claude Code | Outcome | 73.4 | - | Website |
| MiMo-V2-Flash | 309B-A15B | MoE | Agentless | Outcome | 73.4 | GitHub | HuggingFace |
| Deepseek V3.2 | 671B-A37B | MoE | Claude Code, RooCode | - | 73.1 | GitHub | HuggingFace |
| Kimi-K2-Instruct | 1T | MoE | Agentless | Outcome | 71.6 | - | HuggingFace |
| Qwen3-Coder | 480B-A35B | MoE | OpenHands | Outcome | 69.6 | GitHub | HuggingFace |
| GLM-4.6 | 355B-A32B | MoE | OpenHands | Outcome | 68.0 | - | HuggingFace |
| gpt-oss-120b | 116.8B-A5.1B | MoE | Internal tool | Outcome | 62.0 | GitHub | HuggingFace |
| Minimax M2 | 230B-A10B | MoE | R2E-Gym | Outcome | 61.0 | GitHub | HuggingFace |
| gpt-oss-20b | 20.9B-A3.6B | MoE | Internal tool | Outcome | 60.0 | GitHub | HuggingFace |
| GLM-4.5-Air | 106B-A12B | MoE | OpenHands | Outcome | 57.6 | - | - |
| Minimax M1-80k | 456B-A45.9B | MoE | Agentless | Outcome | 56.0 | GitHub | Website |
| Minimax M1-40k | 456B-A45.9B | MoE | Agentless | Outcome | 55.6 | GitHub | Website |
| Seed1.5-Thinking | 200B-A20B | MoE | - | Outcome | 47.0 | GitHub | - |
| Llama 4 Maverick | 400B-A17B | MoE | mini-SWE-agent | Outcome | 21.0 | GitHub | HuggingFace |
| Llama 4 Scout | 109B-A17B | MoE | mini-SWE-agent | Outcome | 9.1 | GitHub | HuggingFace |


🚀 Quick Start

```bash
# First time: install dependencies
pip install flask flask-cors sqlalchemy pyyaml requests

# Full update + start admin server
# (refreshes news, re-renders README/docs, builds static site, then serves)
python start.py

# Or force re-import from YAML/CSV first
python start.py --init
```

Open http://localhost:5000/admin to manage papers, datasets, and methods.

| Command | Description |
|---|---|
| `python start.py` | Full update (news + render + build), then start server |
| `python start.py --init` | Re-import from YAML/CSV, then full update + start |
| `python start.py --no-update` | Start server without running update steps |
| `python start.py --port 8080` | Use a custom port |
| `python start.py --news` | Refresh This Month's Papers only and exit |
| `python start.py --render` | Re-render README/docs from DB only and exit |
| `python start.py --build` | Build static site (mkdocs) only and exit |
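The admin interface can also export filtered paper lists as CSV. As a minimal sketch of post-processing such an export with the standard library — the column names (`title`, `category`, `date`) are assumptions for illustration, not the actual export schema:

```python
import csv
import io

# Hypothetical CSV export from the admin interface; the column names
# ("title", "category", "date") are assumptions, not the real schema.
exported = """title,category,date
SWE-Adept,Agentic Methods,2026-03
BeyondSWE,Evaluation Datasets,2026-03
SoRFT,Training,2025-02
"""

def papers_in_category(csv_text: str, category: str) -> list[str]:
    """Return the titles of all rows whose category matches exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["title"] for row in reader if row["category"] == category]

print(papers_in_category(exported, "Agentic Methods"))  # ['SWE-Adept']
```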


🤝 Contributing

We welcome contributions! To add new papers or tables:

  1. Fork this repository
  2. Add entries via the admin interface (run `python start.py`, then open `localhost:5000/admin`)
    — or manually edit the YAML/CSV files in data/
  3. Run `python start.py --init` if you edited files directly
  4. Submit a PR with your changes
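For direct edits to the data/ files, the manual path in the steps above can be sketched with the standard library. The filename (`papers.csv`) and column names below are assumptions about the data/ layout, not the repository's actual schema:

```python
import csv
import os
import tempfile

# Hypothetical columns; check the existing files in data/ for the real ones.
FIELDS = ["title", "arxiv", "category"]

def append_entry(path: str, entry: dict) -> None:
    """Append one paper row, writing a header first if the file is new."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(entry)

with tempfile.TemporaryDirectory() as tmp:  # stand-in for data/
    path = os.path.join(tmp, "papers.csv")
    append_entry(path, {"title": "Example Paper",
                        "arxiv": "2601.00000",   # hypothetical identifier
                        "category": "Agentic Methods"})
    print(open(path).read().splitlines()[0])  # title,arxiv,category
```

After editing the real files, run `python start.py --init` so the database picks up the changes.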

🌟 Related Work

Code Generation

The application of LLMs in the programming domain has witnessed explosive growth. Early research focused primarily on function-level code generation, with benchmarks such as HumanEval serving as the standard yardstick. However, such generic benchmarks often fail to capture the nuances of real-world development. To bridge this gap, recent initiatives have extended evaluation tasks to align more closely with realistic software development scenarios, revealing the limitations of general models in specialized domains. Concurrently, methods are evolving to capture these broader contexts: while foundational approaches relied primarily on supervised fine-tuning (SFT) or standard retrieval-augmented generation, reinforcement learning (RL)-based methods have emerged as a pivotal direction for handling complex coding tasks.

Related:

Automated Software Generation

The primary goal of this task is to autonomously construct complete, executable software systems from high-level natural language requirements. Unlike code completion, it requires covering the full Software Development Life Cycle (SDLC): requirement analysis, system design, coding, and testing. To manage the complexity and potential logic inconsistencies of this process, state-of-the-art frameworks leverage multi-agent collaboration, simulating human development teams to decompose complex tasks into streamlined, verifiable workflows.

Related:

Automated Software Maintenance

Issue resolution is intrinsically linked to the broader domain of automated software maintenance. Methodologies established in this field are frequently encapsulated as callable tools to augment the capabilities of LLMs in software development tasks.

Related:

Automated Environment Setup

Recent initiatives focus on automating the configuration of runtime environments for entire repositories. This capability has developed in parallel with dataset construction for issue resolution, since executable environments are a prerequisite for building and verifying issue-resolution benchmarks.

Related:

Related Surveys

Existing surveys primarily focus on code generation or other tasks within the software engineering domain. This paper bridges this gap by offering the first systematic survey dedicated to the entire spectrum of issue resolution, ranging from non-agent approaches to the latest agentic advancements.

Related:


📄 Citation

If you use this project or related survey in your research or system, please cite the following:

Li, Caihua, Guo, Lianghong, Wang, Yanlin, et al. (2026). Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey. arXiv preprint arXiv:2601.11655.

BibTeX:

@article{li2026advances,
  title={Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey},
  author={Li, Caihua and Guo, Lianghong and Wang, Yanlin and Guo, Daya and Tao, Wei and Shan, Zhenyu and Liu, Mingwei and Chen, Jiachi and Song, Haoyu and Tang, Duyu and Zhang, Hongyu and Zheng, Zibin},
  journal={arXiv preprint arXiv:2601.11655},
  year={2026},
  eprint={2601.11655},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}

🙏 Acknowledgements

We would like to express our sincere gratitude to:

  • The authors of cited papers who provided valuable feedback on how their work is presented in this survey, greatly improving its accuracy and comprehensiveness.

  • All contributors who have helped improve this project through issues, pull requests, and discussions.

  • The open-source community for developing the amazing tools and frameworks that made this project possible.

Special Thanks

  • @chao-peng (Dr. Chao Peng), ByteDance Software Engineering Lab, for providing valuable suggestions on the Challenges and Opportunities section of our survey.

  • @EuniAI/awesome-code-agents for providing an excellent reference on managing survey papers through documentation systems and inspiring our project structure.


📬 Contact

If you have any questions or suggestions, please contact us through:


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


⭐ Star this repository if you find it helpful!

Made with ❤️ by the DeepSoftwareAnalytics team

Documentation | Paper | Tables | About | Cite
