Deliver safe & effective language models
Python SDK for running evaluations on LLM-generated responses
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
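As a hedged illustration of how multi-aspect, GPT-based judging of this kind typically works, the sketch below asks a model to rate each aspect separately and collects the ratings. The aspect list, prompt wording, and model name are assumptions for this example, not this tool's actual design.

```python
# Hedged sketch of multi-aspect LLM-as-judge scoring. The aspect list, prompt
# wording, and model name are illustrative assumptions, not this tool's design.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ASPECTS = ["relevance", "factuality", "fluency"]

def judge(question: str, answer: str, aspect: str) -> str:
    """Ask the model to rate one aspect of an answer on a 1-5 scale."""
    prompt = (
        f"Rate the {aspect} of the answer below on a 1-5 scale. "
        f"Reply with the number only.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

scores = {a: judge("What is the capital of France?", "Paris.", a) for a in ASPECTS}
print(scores)  # e.g. {'relevance': '5', 'factuality': '5', 'fluency': '4'}
```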
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
An easy-to-use Python package for running quick, basic QA evaluations. It bundles standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting the OpenAI and Anthropic APIs.
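For reference, the two most common of these QA metrics are easy to state in code. The sketch below shows standard exact match and token-level F1; the function names are illustrative, not necessarily this package's actual API.

```python
# Minimal sketch of two standard QA metrics: exact match and token-level F1.
# Function names are illustrative, not necessarily this package's actual API.
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the reference, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))         # 1.0
print(token_f1("in Paris", "Paris France"))  # 0.5 (only "paris" is shared)
```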
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
[ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"
scalexi is a versatile open-source Python library, optimized for Python 3.11+, that facilitates low-code development and fine-tuning of diverse Large Language Models (LLMs).
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain
A measure of estimated confidence that outputs generated by Transformer-based language models are not hallucinated.
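One common signal for confidence estimates of this kind is the model's own token probabilities. The sketch below shows that general idea, assuming per-token log-probabilities are available; it is not necessarily this repository's method.

```python
# Hedged sketch of one common confidence signal: the geometric mean of token
# probabilities (exp of the mean log-probability). This illustrates the general
# idea only; it is an assumption, not necessarily this repository's method.
import math

def mean_logprob_confidence(token_logprobs: list[float]) -> float:
    """Map average per-token log-probability to (0, 1]; higher = more confident."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Per-token log-probs as exposed by some model APIs (example values are made up).
print(round(mean_logprob_confidence([-0.05, -0.20, -0.10]), 2))  # 0.89
```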
TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
LLM-based chatbot that uses RAG to guide people around the KIT campus via natural language
A PHP package for evaluating LLM outputs. Test your prompts, validate responses, and ensure your AI features work correctly.
Tools for systematic large language model evaluations
Indic evals for quantised models (AWQ / GPTQ / EXL2)
Production-ready LLM evaluation & guardrails toolkit (provider-agnostic). Generate explainable metrics and ALLOW/WARN/BLOCK recommendations.
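As a hedged sketch of how ALLOW/WARN/BLOCK routing can work, the snippet below maps a risk score in [0, 1] to a recommendation via two thresholds. The threshold values and function name are assumptions for illustration, not this toolkit's API.

```python
# Minimal sketch of threshold-based ALLOW/WARN/BLOCK routing over a risk score.
# The thresholds and function name are illustrative, not this toolkit's API.

def recommend(risk_score: float, warn_at: float = 0.4, block_at: float = 0.8) -> str:
    """Map a risk score in [0, 1] to an action recommendation."""
    if risk_score >= block_at:
        return "BLOCK"
    if risk_score >= warn_at:
        return "WARN"
    return "ALLOW"

print(recommend(0.15))  # ALLOW
print(recommend(0.55))  # WARN
print(recommend(0.92))  # BLOCK
```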