Small and Efficient Mathematical Reasoning LLMs
[ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)
Reproducible Language Agent Research
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
Short-CoT distilled GSM8K dataset generated with OpenAI gpt-oss-120b.
GSM8K-Consistency is a benchmark database for analyzing the consistency of Arithmetic Reasoning on GSM8K.
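Consistency analyses like this one hinge on comparing a model's final answer against the GSM8K gold answer, which by the dataset's convention appears after a `####` delimiter at the end of each solution. A minimal extraction sketch (the helper name is illustrative, not from the repo):

```python
import re

def extract_gold_answer(solution: str) -> str:
    # GSM8K gold solutions end with "#### <answer>"; pull out the final
    # number and strip thousands separators so "1,000" compares as "1000".
    match = re.search(r"####\s*([\-0-9.,]+)", solution)
    if match is None:
        raise ValueError("no gold answer found")
    return match.group(1).replace(",", "")

print(extract_gold_answer("She sells 16 - 3 - 4 = 9 eggs.\n#### 18"))  # 18
```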
An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.
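Of the techniques above, Self-Consistency is the simplest to sketch: sample several chain-of-thought completions for the same question, extract each final answer, and take a majority vote. A minimal, model-agnostic sketch (the sampled answers here are placeholder strings, not real Mistral-7B outputs):

```python
from collections import Counter

def self_consistency_vote(sampled_answers: list[str]) -> str:
    # Majority vote over the final answers extracted from several
    # independently sampled reasoning chains; ties resolve to the
    # answer seen first, per Counter.most_common ordering.
    return Counter(sampled_answers).most_common(1)[0][0]

# e.g. four sampled chains, three of which agree on "18"
print(self_consistency_vote(["18", "20", "18", "18"]))  # 18
```

In practice the list would come from repeated stochastic decoding (temperature > 0) of the same prompt, which is what makes the vote informative.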
Multi-path reasoning with dynamic chains and consensus scoring for improved GSM8K benchmark performance.
Hard Reasoning Benchmark filtered with disagreement scores
Nano R1 is a reasoning model trained with reinforcement learning, focused on decision-making and adaptability in dynamic environments. Developed in Python and hosted on Hugging Face.
A minimal JEPA-based language model demonstrating latent-space reasoning on GSM8K using a single decoder-only Transformer.
AlphaZero-style RL training for LLMs using MCTS on mathematical reasoning tasks (GSM8K). Student model explores reasoning paths guided by teacher ensembles and reward signals.
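In AlphaZero-style search, the student's next reasoning step is typically chosen with a UCT-style score that trades off average reward against an exploration bonus. A small sketch of that selection rule under the usual UCT formulation (parameter names are illustrative):

```python
import math

def uct_score(value_sum: float, visits: int, parent_visits: int,
              c: float = 1.41) -> float:
    # UCT = mean value (exploitation) + c * sqrt(ln(N_parent) / N_child)
    # (exploration); unvisited nodes are prioritized unconditionally.
    if visits == 0:
        return float("inf")
    exploit = value_sum / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

# A node with no visits always wins selection over a visited one.
print(uct_score(0.0, 0, 10))        # inf
print(uct_score(5.0, 10, 10, c=0))  # 0.5 (pure exploitation)
```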
Transforming weak prompts into reasoning machines using Textual Gradients and AdalFlow. Runs on Colab.
Analysis of CoT and standard prompting techniques on standard datasets (GSM8K, AQuA, SVAMP) using the PaLM (text-bison-001), Llama-2-7B-Chat-GPTQ, and Llama-2-13B-Chat-GPTQ language models.
Fine-tuning GPT-OSS-20B on OpenAI's GSM8K dataset.
Comprehensive benchmarking framework for RLVR/RLHF libraries on GSM8K mathematical reasoning dataset
Developing an autonomous system for prompt selection for Large Language Models (LLMs), enhancing performance across tasks by balancing generality and specificity. This project automates diverse, high-quality prompt creation and selection, reducing manual intervention and maximizing LLM utility across applications.