Adding EHR Generation task + gpt baseline #877
chufangao wants to merge 2 commits into sunlabuiuc:master
Conversation
… visit sequences and optional ICD-9 truncation. Updated imports and documentation accordingly.
Hey Andy, sorry for the late reply. Can you have your models inherit BaseModel? Here's the abstract API:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/models/base_model.py
task_name: str = "EHRGenerationMIMIC3"
input_schema: Dict[str, str] = {"conditions": "nested_sequence"}
output_schema: Dict[str, str] = {}
In generation, we do next-sequence prediction, right?
If I'm understanding it correctly, the output_schema should probably contain a sequence of codes (or whatever the output tensor is), rather than being empty.
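A minimal sketch of what that might look like. The `"next_conditions"` key and the `"sequence"` type tag are illustrative assumptions, not PyHealth's confirmed schema vocabulary:

```python
# Hypothetical generation-task schema: the model consumes all past visits'
# condition codes and must emit the codes of the next visit, so the output
# is itself a sequence instead of an empty dict.
task_name = "EHRGenerationMIMIC3"
input_schema = {"conditions": "nested_sequence"}    # past visits' ICD codes
output_schema = {"next_conditions": "sequence"}     # codes to generate next
```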
# ── Main model class ───────────────────────────────────────────────────────────
class EHRGPTBaseline(nn.Module):
Can we have this inherit BaseModel?
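A sketch of the requested change. `BaseModel` is stubbed here so the snippet is self-contained; in PyHealth you would instead do `from pyhealth.models import BaseModel` (the real class subclasses `nn.Module`). The constructor arguments are illustrative:

```python
class BaseModel:
    """Stand-in for pyhealth.models.BaseModel, used only to make this
    sketch runnable on its own."""
    def __init__(self, dataset=None):
        self.dataset = dataset

class EHRGPTBaseline(BaseModel):
    """GPT baseline reworked to inherit BaseModel instead of nn.Module directly."""
    def __init__(self, dataset=None, model_name="gpt2"):
        super().__init__(dataset=dataset)
        self.model_name = model_name

    def forward(self, **kwargs):
        # A real implementation would run the GPT backbone and return
        # loss/logits; omitted in this sketch.
        raise NotImplementedError
```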
# ── PyTorch Dataset ────────────────────────────────────────────────────────────
class EHRTextDataset(Dataset):
What's super cool about the PyHealth task is that you can do all of the tokenization in parallel with the task calls — basically, you can pre-cache all of the tokenized data in a very efficient way. Let me know if you need help understanding that.
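The idea can be illustrated with a toy example: tokenization happens once, when the task processes each sample, and repeated inputs hit a cache instead of being re-tokenized in the training loop. The whitespace "tokenizer" below is a stand-in for a real HF tokenizer, and `process_sample` is a hypothetical helper, not PyHealth API:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def tokenize(note: str) -> tuple:
    # Stand-in for a real tokenizer; returns a hashable, cacheable result.
    return tuple(note.lower().split())

def process_sample(note: str) -> dict:
    # Called once per sample at task-processing time. Workers can run this
    # in parallel, and the cached tokens are reused across epochs.
    return {"tokens": tokenize(note)}
```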
Can you give an example of this in a task? Would be helpful to understand
Yeah, here's a PR from our Multimodal project with a tokenizer:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/processors/tuple_time_text_processor.py
https://github.com/Multimodal-PyHealth/PyHealth/blob/wp-dev/pyhealth/tasks/multimodal_mimic4.py
"discharge_note_times": (
    "tuple_time_text",
    {"tokenizer_model": "bert-base-uncased", "type_tag": "note"},
),
"radiology_note_times": (
    "tuple_time_text",
    {"tokenizer_model": "bert-base-uncased", "type_tag": "note"},
),
If you look in the task there, you can see that you can pass args to your schemas.
One other thing: don't forget to add docs/ and the like. It's really helpful to have nice docstrings like this: https://github.com/sunlabuiuc/PyHealth/pull/392/changes
Thank you Claude!
python -m pytest tests/core/test_transformer_ehr_helpers.py tests/core/test_mimic3_ehr_generation.py -v 2>&1 | tail -25
Looking for coding guidelines from @jhnwu3 and team!
@jalengg @ethanrasmussen