Note
This documentation may contain AI-generated content. While we strive for accuracy, there might be inaccuracies. Please report any issues via:
- GitHub Issues
- Community contribution (PRs welcome!)
- do_translate_async_stream is the low-level async entrypoint that translates a single PDF and yields a stream of events (progress/error/finish).
- It is suitable for building your own UI or CLI where you want real-time progress and full control over results.
- It accepts a validated SettingsModel and a file path and returns an async generator of dict events.
- Import:
  from pdf2zh_next.high_level import do_translate_async_stream
- Call:
  async for event in do_translate_async_stream(settings, file): ...
- Parameters:
  - settings: SettingsModel. Must be valid; the function will call settings.validate_settings().
  - file: str | pathlib.Path. The single PDF to translate. Must exist.
Note:
- settings.basic.input_files is ignored by this function; only the given file is translated.
- If settings.basic.debug is True, translation runs in the main process; otherwise it runs in a subprocess. The event schema is identical in both cases.
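Because the function translates only the given file and requires it to exist, a caller may want to check the path before starting the stream. The following is a minimal sketch; check_pdf_path is a hypothetical helper name and is not part of pdf2zh_next.

```python
from pathlib import Path
from typing import Union


def check_pdf_path(file: Union[str, Path]) -> Path:
    """Return `file` as a Path, or raise if it is not an existing regular file.

    Hypothetical pre-flight helper for illustration; not a pdf2zh_next API.
    """
    path = Path(file)
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    return path
```

The returned Path can then be passed as the file argument, so a missing input is reported before any translation work begins.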
The async generator yields JSON-like dict events with the following types:
- Stage summary event: stage_summary (optional, may appear first)
  - Fields
    - type: "stage_summary"
    - stages: list of objects { "name": str, "percent": float } describing the estimated work distribution
    - part_index: may be 0 for this summary event
    - total_parts: total number of parts (>= 1)
- Progress events: progress_start, progress_update, progress_end
  - Common fields
    - type: one of the above
    - stage: human-readable stage name (e.g., "Parse PDF and Create Intermediate Representation", "Translate Paragraphs", "Save PDF")
    - stage_progress: float in [0, 100] indicating progress within the current stage
    - overall_progress: float in [0, 100] indicating overall progress
    - part_index: current part index (typically 1-based for progress events)
    - total_parts: total number of parts (>= 1). Large documents may be split automatically.
    - stage_current: current step within the stage
    - stage_total: total steps within the stage
- Finish event: finish
  - Fields
    - type: "finish"
    - translate_result: an object providing final outputs (NOTE: not a dictionary, but a class instance)
      - original_pdf_path: Path to the input PDF
      - mono_pdf_path: Path to the monolingual translated PDF (or None)
      - dual_pdf_path: Path to the bilingual translated PDF (or None)
      - no_watermark_mono_pdf_path: Path to monolingual output without watermark (if produced), otherwise None
      - no_watermark_dual_pdf_path: Path to bilingual output without watermark (if produced), otherwise None
      - auto_extracted_glossary_path: Path to auto-extracted glossary CSV (or None)
      - total_seconds: elapsed seconds (float)
      - peak_memory_usage: approximate peak memory usage during translation (float; implementation-dependent units)
- Error event: error
  - Fields
    - type: "error"
    - error: human-readable error message
    - error_type: one of BabeldocError, SubprocessError, IPCError, SubprocessCrashError, etc.
    - details: optional details (e.g., original error or traceback)
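Since translate_result is a class instance rather than a dictionary, code that logs or serializes the outcome has to read its attributes explicitly. The sketch below copies the fields listed above into a plain dict. result_to_dict and RESULT_FIELDS are hypothetical names introduced for illustration, and the SimpleNamespace object only stands in for a real result.

```python
import os
from types import SimpleNamespace
from typing import Any, Dict

# Attribute names as listed for translate_result in the finish event.
RESULT_FIELDS = (
    "original_pdf_path",
    "mono_pdf_path",
    "dual_pdf_path",
    "no_watermark_mono_pdf_path",
    "no_watermark_dual_pdf_path",
    "auto_extracted_glossary_path",
    "total_seconds",
    "peak_memory_usage",
)


def result_to_dict(result: Any) -> Dict[str, Any]:
    """Copy the documented attributes of a result object into a plain dict.

    Hypothetical helper, not a pdf2zh_next API. Missing attributes become
    None, and path-like values are converted to strings.
    """
    out: Dict[str, Any] = {}
    for name in RESULT_FIELDS:
        value = getattr(result, name, None)
        if isinstance(value, os.PathLike):
            value = os.fspath(value)
        out[name] = value
    return out


if __name__ == "__main__":
    # Stand-in object; a real one arrives in event["translate_result"].
    fake = SimpleNamespace(mono_pdf_path="out/table.mono.pdf", total_seconds=42.83)
    print(result_to_dict(fake))
```

Using getattr with a default keeps the helper tolerant of outputs that were not produced.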
Important behavior:
- An optional stage_summary may be emitted before progress begins.
- On certain failures, the generator will first yield an error event and then raise an exception derived from TranslationError. You should both check for error events and be prepared to catch exceptions.
- progress_update events may repeat with identical values; consumers should debounce if necessary.
- Stop consuming the stream when you receive a finish event.
import asyncio
from pathlib import Path

from pdf2zh_next.high_level import do_translate_async_stream

# Assume you already have a valid SettingsModel instance named `settings`
# and a PDF file path


async def translate_one(settings, pdf_path: str | Path):
    try:
        async for event in do_translate_async_stream(settings, pdf_path):
            etype = event.get("type")
            if etype == "stage_summary":
                # Optional pre-flight summary of stages
                stages = event.get("stages", [])
                print("Stage summary:", ", ".join(f"{s['name']}:{s['percent']:.2f}" for s in stages))
            elif etype in {"progress_start", "progress_update", "progress_end"}:
                stage = event.get("stage")
                stage_prog = event.get("stage_progress")  # 0..100
                overall = event.get("overall_progress")  # 0..100
                part_i = event.get("part_index")
                part_n = event.get("total_parts")
                print(f"[{etype}] {stage} | stage {stage_prog:.1f}% | overall {overall:.1f}% (part {part_i}/{part_n})")
            elif etype == "error":
                # You will also get a raised exception after this yield
                print("[error]", event.get("error"), event.get("error_type"))
            elif etype == "finish":
                result = event["translate_result"]
                print("Done in", getattr(result, "total_seconds", None), "s")
                print("Mono:", getattr(result, "mono_pdf_path", None))
                print("Dual:", getattr(result, "dual_pdf_path", None))
                print("No-watermark Mono:", getattr(result, "no_watermark_mono_pdf_path", None))
                print("No-watermark Dual:", getattr(result, "no_watermark_dual_pdf_path", None))
                print("Glossary:", getattr(result, "auto_extracted_glossary_path", None))
                print("Peak memory:", getattr(result, "peak_memory_usage", None))
                break
    except Exception as exc:
        # Catch exceptions raised by the stream after an error event
        print("Translation failed:", exc)


# asyncio.run(translate_one(settings, "/path/to/file.pdf"))

You can cancel the task consuming the stream. Cancellation is propagated to the underlying translation process.
import asyncio

from pdf2zh_next.high_level import do_translate_async_stream


async def cancellable(settings, pdf):
    task = asyncio.create_task(_consume(settings, pdf))
    await asyncio.sleep(1.0)  # let it start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        print("Cancelled")


async def _consume(settings, pdf):
    async for event in do_translate_async_stream(settings, pdf):
        if event["type"] == "finish":
            break

Stage summary event (example):
{
  "type": "stage_summary",
  "stages": [
    {"name": "Parse PDF and Create Intermediate Representation", "percent": 0.1086},
    {"name": "DetectScannedFile", "percent": 0.0188},
    {"name": "Parse Page Layout", "percent": 0.1079}
    // ... more stages ...
  ],
  "part_index": 0,
  "total_parts": 1
}

Progress event (example):
{
  "type": "progress_update",
  "stage": "Translate Paragraphs",
  "stage_progress": 2.04,
  "stage_current": 1,
  "stage_total": 49,
  "overall_progress": 53.44,
  "part_index": 1,
  "total_parts": 1
}

Finish event (example):
{
  "type": "finish",
  "translate_result": {
    "original_pdf_path": "pdf2zh_files/<session>/table.pdf",
    "mono_pdf_path": "pdf2zh_files/<session>/table.zh-CN.mono.pdf",
    "dual_pdf_path": "pdf2zh_files/<session>/table.zh-CN.dual.pdf",
    "no_watermark_mono_pdf_path": "pdf2zh_files/<session>/table.no_watermark.zh-CN.mono.pdf",
    "no_watermark_dual_pdf_path": "pdf2zh_files/<session>/table.no_watermark.zh-CN.dual.pdf",
    "auto_extracted_glossary_path": "pdf2zh_files/<session>/table.zh-CN.glossary.csv",
    "total_seconds": 42.83,
    "peak_memory_usage": 4651.55
  }
}

Error event (example):
{
  "type": "error",
  "error": "Babeldoc translation error: <message>",
  "error_type": "BabeldocError",
  "details": "<optional original error or traceback>"
}

- Always handle both error events and exceptions from the generator.
- Break the loop on finish to avoid unnecessary work.
- Ensure the file exists and settings.validate_settings() passes before calling.
- Large documents may be split; use part_index/total_parts and overall_progress to drive your UI.
- Debounce progress_update if your UI is sensitive to repeated, identical updates.
- report_interval (SettingsModel): controls only the emission rate of progress_update events. It does not affect stage_summary, progress_start, progress_end, or finish. The default is 0.1s and the minimum allowed is 0.05s. As per the progress monitor logic, when stage_total <= 3, updates are not throttled by report_interval.