
Adding EHR Generation task + gpt baseline #877

Open
chufangao wants to merge 2 commits into sunlabuiuc:master from chufangao:master

Conversation

chufangao (Collaborator) commented Mar 1, 2026

Thank you Claude!

python -m pytest tests/core/test_transformer_ehr_helpers.py tests/core/test_mimic3_ehr_generation.py -v 2>&1 | tail -25

Looking for coding guidelines from @jhnwu3 and team!

@jalengg @ethanrasmussen

… visit sequences and optional ICD-9 truncation. Updated imports and documentation accordingly.
chufangao changed the title from "init commit for gpt baseline (thanks to claude)" to "Adding EHR Generation task + gpt baseline" on Mar 1, 2026
chufangao requested a review from jhnwu3 on Mar 1, 2026 at 11:03
chufangao marked this pull request as ready for review on Mar 1, 2026 at 11:06
Collaborator:

Hey Andy, sorry for the late reply. Can you have your models inherit BaseModel? Here's the abstract API:

https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/models/base_model.py


task_name: str = "EHRGenerationMIMIC3"
input_schema: Dict[str, str] = {"conditions": "nested_sequence"}
output_schema: Dict[str, str] = {}
Collaborator:

In generation, we do next-sequence prediction, right?

If I'm understanding it correctly, the output_schema should probably contain a sequence of codes, or whatever the output tensor is, rather than be empty.
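A hedged sketch of what this suggestion might look like; the "sequence" processor key and the "next_conditions" field name are illustrative assumptions, not confirmed PyHealth identifiers:

```python
# Hypothetical sketch of the reviewer's suggestion: for next-visit code
# generation the target is itself a sequence of codes, so output_schema
# should not be empty. The "sequence" processor key and "next_conditions"
# field name are assumptions for illustration.
task_name = "EHRGenerationMIMIC3"
input_schema = {"conditions": "nested_sequence"}  # full visit history
output_schema = {"next_conditions": "sequence"}   # codes of the next visit
```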

# ── Main model class ───────────────────────────────────────────────────────────


class EHRGPTBaseline(nn.Module):
Collaborator:

Can we have this inherit BaseModel?
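A minimal pure-Python sketch of the requested change (torch and pyhealth are omitted so the snippet is self-contained); BaseModelStub stands in for pyhealth.models.BaseModel, which in the real library subclasses torch.nn.Module:

```python
from abc import ABC, abstractmethod

# BaseModelStub stands in for pyhealth.models.BaseModel (linked above); the
# real class subclasses torch.nn.Module and wires up dataset handling.
class BaseModelStub(ABC):
    def __init__(self, dataset=None):
        self.dataset = dataset

    @abstractmethod
    def forward(self, **kwargs):
        ...

# The baseline inherits the PyHealth base class instead of nn.Module
# directly, picking up the shared dataset plumbing and the abstract
# forward() contract.
class EHRGPTBaseline(BaseModelStub):
    def forward(self, **kwargs):
        # real implementation: embed codes, run GPT blocks, return logits
        return {"logits": None}

model = EHRGPTBaseline(dataset=None)
```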


# ── PyTorch Dataset ────────────────────────────────────────────────────────────


class EHRTextDataset(Dataset):
Collaborator:

What's super cool about the PyHealth task is that you can do all of the tokenization in parallel with the task calls. Let me know if you need help understanding that.

But, basically, you can pre-cache all of the tokenized stuff in a very efficient way.
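A hypothetical sketch of the pattern being described: tokenize inside the task call so the tokenized samples can be cached once, rather than re-tokenizing on every Dataset.__getitem__; toy_tokenize stands in for a real tokenizer (e.g. a HuggingFace one):

```python
# toy_tokenize stands in for a real tokenizer; 0 marks an unknown code.
def toy_tokenize(codes):
    vocab = {"428.0": 1, "250.00": 2}
    return [vocab.get(c, 0) for c in codes]

def ehr_generation_task(patient_visits):
    """Task call: emits one sample per patient with codes already
    tokenized, so the cached samples never need re-tokenizing inside
    Dataset.__getitem__."""
    samples = []
    for visits in patient_visits:
        token_ids = [toy_tokenize(visit) for visit in visits]
        samples.append({"conditions": token_ids})
    return samples

samples = ehr_generation_task([[["428.0"], ["250.00", "999"]]])
# samples[0]["conditions"] == [[1], [2, 0]]
```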

Collaborator (Author):

Can you give an example of this in a task? Would be helpful to understand

Collaborator:

Yeah, here's a PR from our Multimodal project with a tokenizer:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/processors/tuple_time_text_processor.py

https://github.com/Multimodal-PyHealth/PyHealth/blob/wp-dev/pyhealth/tasks/multimodal_mimic4.py
"discharge_note_times": (
    "tuple_time_text",
    {"tokenizer_model": "bert-base-uncased", "type_tag": "note"},
),
"radiology_note_times": (
    "tuple_time_text",
    {"tokenizer_model": "bert-base-uncased", "type_tag": "note"},
),

If you look in the task here, you can pass args to your schemas.
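A small self-contained sketch of that pattern: a schema value is either a bare processor name or a (processor_name, kwargs) tuple, so the task can hand args like the tokenizer model down to the processor. The key and processor names here mirror the linked task; how PyHealth consumes the tuple internally is not shown:

```python
# Schema values: bare string = processor with defaults,
# tuple = processor name plus its keyword arguments.
input_schema = {
    "conditions": "nested_sequence",  # plain processor, no args
    "discharge_note_times": (         # processor plus kwargs
        "tuple_time_text",
        {"tokenizer_model": "bert-base-uncased", "type_tag": "note"},
    ),
}

# Downstream code can unpack the tuple form to recover the args.
name, kwargs = input_schema["discharge_note_times"]
```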

jhnwu3 (Collaborator) commented Mar 4, 2026

One other thing: don't forget to add docs/ and things like that. It's really helpful to have nice docstrings like this: https://github.com/sunlabuiuc/PyHealth/pull/392/changes
