Small and Efficient Mathematical Reasoning LLMs
[ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)
Reproducible Language Agent Research
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
Short-CoT distilled GSM8K dataset generated with OpenAI gpt-oss-120b.
GSM8K-Consistency is a benchmark database for analyzing the consistency of Arithmetic Reasoning on GSM8K.
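Consistency analyses like this one hinge on comparing a model's final answer against the GSM8K gold answer, which by the dataset's convention appears after a `####` delimiter at the end of each solution. A minimal extraction sketch (the helper name is illustrative, not from the repo):

```python
import re

def extract_gold_answer(solution: str) -> str:
    # GSM8K gold solutions end with "#### <answer>"; pull out the final
    # number and strip thousands separators so "1,000" compares as "1000".
    match = re.search(r"####\s*([\-0-9.,]+)", solution)
    if match is None:
        raise ValueError("no gold answer found")
    return match.group(1).replace(",", "")

print(extract_gold_answer("She sells 16 - 3 - 4 = 9 eggs.\n#### 18"))  # 18
```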
An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.
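Of the techniques above, Self-Consistency is the simplest to sketch: sample several chain-of-thought completions for the same question, extract each final answer, and take a majority vote. A minimal, model-agnostic sketch (the sampled answers here are placeholder strings, not real Mistral-7B outputs):

```python
from collections import Counter

def self_consistency_vote(sampled_answers: list[str]) -> str:
    # Majority vote over the final answers extracted from several
    # independently sampled reasoning chains; ties resolve to the
    # answer seen first, per Counter.most_common ordering.
    return Counter(sampled_answers).most_common(1)[0][0]

# e.g. four sampled chains, three of which agree on "18"
print(self_consistency_vote(["18", "20", "18", "18"]))  # 18
```

In practice the list would come from repeated stochastic decoding (temperature > 0) of the same prompt, which is what makes the vote informative.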
Multi-path reasoning with dynamic chains and consensus scoring for improved GSM8K benchmark performance.
Hard Reasoning Benchmark filtered with disagreement scores
Nano R1 is a reasoning model trained with reinforcement learning, focused on decision-making and adaptability in dynamic environments. Developed in Python and hosted on Hugging Face.
A minimal JEPA-based language model demonstrating latent-space reasoning on GSM8K using a single decoder-only Transformer.
AlphaZero-style RL training for LLMs using MCTS on mathematical reasoning tasks (GSM8K). Student model explores reasoning paths guided by teacher ensembles and reward signals.
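In AlphaZero-style search, the student's next reasoning step is typically chosen with a UCT-style score that trades off average reward against an exploration bonus. A small sketch of that selection rule under the usual UCT formulation (parameter names are illustrative):

```python
import math

def uct_score(value_sum: float, visits: int, parent_visits: int,
              c: float = 1.41) -> float:
    # UCT = mean value (exploitation) + c * sqrt(ln(N_parent) / N_child)
    # (exploration); unvisited nodes are prioritized unconditionally.
    if visits == 0:
        return float("inf")
    exploit = value_sum / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

# A node with no visits always wins selection over a visited one.
print(uct_score(0.0, 0, 10))        # inf
print(uct_score(5.0, 10, 10, c=0))  # 0.5 (pure exploitation)
```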
Transforming weak prompts into reasoning machines using Textual Gradients and AdalFlow. Runs on Colab.
Analysis of CoT and standard prompting techniques on standard datasets (GSM8K, AQuA, SVAMP) using the PaLM (text-bison-001), Llama-2-7B-Chat-GPTQ, and Llama-2-13B-Chat-GPTQ language models.
Fine-tuning GPT-OSS-20B on OpenAI's GSM8K dataset.
Comprehensive benchmarking framework for RLVR/RLHF libraries on GSM8K mathematical reasoning dataset
Developing an autonomous system for prompt selection for Large Language Models (LLMs), enhancing performance across tasks by balancing generality and specificity. This project automates diverse, high-quality prompt creation and selection, reducing manual intervention and maximizing LLM utility across applications.