OntoLearner is a modular and extensible Python library for ontology learning powered by Large Language Models (LLMs). It provides a unified framework covering the full workflow, from loading and modularizing ontologies to training, predicting, and evaluating learner models across multiple ontology learning tasks.
The framework is built around three core components:
- Ontologizers: load, parse, and modularize ontologies from 150+ ready-to-use sources across 20+ domains.
- Learning Tasks: support for Term Typing, Taxonomy Discovery, Non-Taxonomic Relation Extraction, and Text2Onto.
- Learner Models: plug-and-play LLM, Retriever, and RAG-based learners with a consistent `fit → predict → evaluate` interface.
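
To make the shared interface concrete, here is a minimal sketch of what a `fit → predict → evaluate` contract can look like. The names `BaseLearner` and `MajorityTypeLearner` are hypothetical and are not part of OntoLearner's API; the toy learner simply predicts the most frequent type seen in training.

```python
from abc import ABC, abstractmethod
from collections import Counter


class BaseLearner(ABC):
    """Hypothetical base class: every learner exposes fit, predict, evaluate."""

    @abstractmethod
    def fit(self, train_data):
        ...

    @abstractmethod
    def predict(self, test_data):
        ...

    def evaluate(self, y_true, y_pred):
        # Plain accuracy; a real evaluation would be task-specific.
        if not y_true:
            return 0.0
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return correct / len(y_true)


class MajorityTypeLearner(BaseLearner):
    """Toy term-typing learner: always predicts the most frequent training type."""

    def __init__(self):
        self.majority_type = None

    def fit(self, train_data):
        # train_data is a list of (term, type) pairs.
        counts = Counter(term_type for _, term_type in train_data)
        self.majority_type = counts.most_common(1)[0][0]
        return self

    def predict(self, test_data):
        # test_data is a list of terms.
        return [self.majority_type for _ in test_data]


train = [("merlot", "RedWine"), ("syrah", "RedWine"), ("riesling", "WhiteWine")]
test_terms = ["malbec", "chardonnay"]
test_types = ["RedWine", "WhiteWine"]

learner = MajorityTypeLearner().fit(train)
predictions = learner.predict(test_terms)
accuracy = learner.evaluate(test_types, predictions)
print(predictions, accuracy)
```

Because every learner answers to the same three methods, swapping one learner for another should not require changes to the surrounding code.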
OntoLearner is available on PyPI and can be installed with pip:
```bash
pip install ontolearner
```

Verify the installation:

```python
import ontolearner
print(ontolearner.__version__)
```

For additional installation options (e.g., from source or with optional dependencies), see the Installation Guide.
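
The import-based check raises `ImportError` when the package is missing. As an alternative sketch, the standard library's `importlib.metadata` can report the installed version without importing the package. The helper name `installed_version` is an assumption introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("ontolearner")
print(found if found is not None else "ontolearner is not installed")
```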
| Resource | Description |
|---|---|
| Documentation | Full documentation website. |
| Datasets on Hugging Face | Curated, machine-readable ontology datasets. |
| Quickstart | Get started in minutes. |
| Learning Tasks | Term Typing, Taxonomy Discovery, Relation Extraction, and Text2Onto. |
| Learner Models | LLM, Retriever, and RAG-based learner models. |
| Ontologies Documentation | Browse 150+ benchmark ontologies across 20+ domains. |
| Ontologizer Guide | How to modularize and preprocess ontologies. |
| Metrics Dashboard | Explore benchmark ontology metrics and complexity scores. |
- 150+ Ontologizers across 20+ domains (biology, medicine, agriculture, chemistry, law, finance, and more).
- Multiple learning tasks: Term Typing, Taxonomy Discovery, Non-Taxonomic Relation Extraction, and Text2Onto.
- Three learner paradigms: LLM-based, Retriever-based, and Retrieval-Augmented Generation (RAG).
- Hugging Face integration: auto-download ontologies and models directly from the Hub.
- Unified API: consistent `fit → predict → evaluate` interface across all learners.
- LearnerPipeline: end-to-end pipeline in a single call.
- Extensible: easily plug in custom ontologies, learners, or retrievers.
Load any of the 150+ built-in ontologies and extract task datasets in just a few lines:
```python
from ontolearner import Wine

# Initialize an ontologizer
ontology = Wine()

# Auto-download from Hugging Face and load
ontology.load()

# Extract learning task datasets
data = ontology.extract()

# Inspect ontology metadata
print(ontology)
```

Explore 150+ ready-to-use ontologies or learn how to work with ontologizers.
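
As a rough illustration of what extracting task datasets means, the sketch below sorts RDF-style triples into the three kinds of data the learning tasks use. It is not OntoLearner's extraction logic: the function `split_triples_by_task`, the dictionary keys, and the `ex:` identifiers are hypothetical. It also simplifies by treating every `rdf:type` triple as a term typing, although in a real ontology some of them are class declarations.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def split_triples_by_task(triples):
    """Group (subject, predicate, object) triples by the task they feed."""
    datasets = {"term_typing": [], "taxonomy": [], "non_taxonomic": []}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            # Instance-of statements become (term, type) pairs.
            datasets["term_typing"].append((subject, obj))
        elif predicate == RDFS_SUBCLASS_OF:
            # Subclass statements become (parent, child) pairs.
            datasets["taxonomy"].append((obj, subject))
        else:
            # Everything else is kept as a non-taxonomic relation.
            datasets["non_taxonomic"].append((subject, predicate, obj))
    return datasets


# Hypothetical wine-flavoured triples, used only for illustration.
example_triples = [
    ("ex:Merlot", RDF_TYPE, "ex:RedWine"),
    ("ex:RedWine", RDFS_SUBCLASS_OF, "ex:Wine"),
    ("ex:Merlot", "ex:madeFromGrape", "ex:MerlotGrape"),
]

datasets = split_triples_by_task(example_triples)
for task_name, rows in datasets.items():
    print(task_name, rows)
```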
Use a dense retriever model to perform non-taxonomic relation extraction:
```python
from ontolearner import AutoRetrieverLearner, AgrO, train_test_split, evaluation_report

# Load and extract ontology data
ontology = AgrO()
ontology.load()
ontological_data = ontology.extract()

# Split into train and test sets
train_data, test_data = train_test_split(ontological_data, test_size=0.2, random_state=42)

# Initialize and load a retriever-based learner
task = 'non-taxonomic-re'
ret_learner = AutoRetrieverLearner(top_k=5)
ret_learner.load(model_id='sentence-transformers/all-MiniLM-L6-v2')

# Fit on training data and predict on test data
ret_learner.fit(train_data, task=task)
predicts = ret_learner.predict(test_data, task=task)

# Evaluate predictions
truth = ret_learner.tasks_ground_truth_former(data=test_data, task=task)
metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
print(metrics)
```

The other learner paradigms, LLM-based and RAG-based, expose the same `fit → predict → evaluate` interface.
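
To show what the split, retrieval, and evaluation steps in the example above do conceptually, here is a pure-Python sketch. It is not how OntoLearner implements them: a bag-of-words cosine similarity stands in for the dense sentence embeddings, and the helpers `seeded_split`, `top_k`, and `precision_recall_f1` are hypothetical names.

```python
import math
import random
from collections import Counter


def seeded_split(items, test_size=0.2, random_state=42):
    """Shuffle a copy with a fixed seed, then hold out the last test_size share."""
    shuffled = list(items)
    random.Random(random_state).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[:-n_test], shuffled[-n_test:]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, candidates, k=5):
    """Return the k candidates most similar to the query."""
    query_vec = Counter(query.lower().split())
    scored = [(cosine(query_vec, Counter(c.lower().split())), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:k]]


def precision_recall_f1(y_true, y_pred):
    """Set-based precision, recall, and F1 over predicted items."""
    true_set, pred_set = set(y_true), set(y_pred)
    hits = len(true_set & pred_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


train_items, test_items = seeded_split(range(10), test_size=0.2, random_state=42)
nearest = top_k(
    "crop rotation",
    ["crop rotation plan", "soil moisture", "rotation of crops"],
    k=2,
)
scores = precision_recall_f1(
    y_true=[("a", "r", "b"), ("c", "r", "d")],
    y_pred=[("a", "r", "b"), ("x", "r", "y")],
)
print(len(train_items), len(test_items), nearest, scores)
```

Fixing the seed makes the split repeatable across runs, which is what `random_state=42` is for in the example; `top_k` bounds how many candidates the retriever returns per query.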
`LearnerPipeline` consolidates the entire workflow (initialization, training, prediction, and evaluation) into a single call:
```python
from ontolearner import LearnerPipeline, AgrO, train_test_split

# Load ontology and extract data
ontology = AgrO()
ontology.load()

train_data, test_data = train_test_split(
    ontology.extract(),
    test_size=0.2,
    random_state=42
)

# Initialize the pipeline with a dense retriever
pipeline = LearnerPipeline(
    retriever_id='sentence-transformers/all-MiniLM-L6-v2',
    batch_size=10,
    top_k=5
)

# Run: fit → predict → evaluate
outputs = pipeline(
    train_data=train_data,
    test_data=test_data,
    evaluate=True,
    task='non-taxonomic-re'
)

print("Metrics:", outputs['metrics'])
print("Elapsed time:", outputs['elapsed_time'])
```

We welcome contributions of all kinds: bug reports, new features, documentation improvements, or new ontologies!
Please review our guidelines before getting started:
- CONTRIBUTING.md: contribution guidelines
- MAINTENANCE.md: ongoing maintenance notes
For bugs or questions, please open an issue in the GitHub Issue Tracker.
If OntoLearner is useful in your research or work, please consider citing one of our publications:
```bibtex
@inproceedings{babaei2023llms4ol,
  title        = {LLMs4OL: Large Language Models for Ontology Learning},
  author       = {Babaei Giglou, Hamed and D'Souza, Jennifer and Auer, S{\"o}ren},
  booktitle    = {International Semantic Web Conference},
  pages        = {408--427},
  year         = {2023},
  organization = {Springer}
}
```

```bibtex
@software{babaei_giglou_2025_15399783,
  author    = {Babaei Giglou, Hamed and D'Souza, Jennifer and Aioanei, Andrei
               and Mihindukulasooriya, Nandana and Auer, Sören},
  title     = {OntoLearner: A Modular Python Library for Ontology Learning with LLMs},
  month     = may,
  year      = 2025,
  publisher = {Zenodo},
  version   = {v1.3.0},
  doi       = {10.5281/zenodo.15399783},
  url       = {https://doi.org/10.5281/zenodo.15399783}
}
```

This software is archived on Zenodo under DOI 10.5281/zenodo.15399783; its license is stated in the repository.
