Adding EHR Generation task + gpt baseline #877
chufangao wants to merge 2 commits into sunlabuiuc:master
Conversation
… visit sequences and optional ICD-9 truncation. Updated imports and documentation accordingly.
Hey Andy, sorry for the late reply. Can you have your models inherit BaseModel? Here's the abstract API:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/models/base_model.py
task_name: str = "EHRGenerationMIMIC3"
input_schema: Dict[str, str] = {"conditions": "nested_sequence"}
output_schema: Dict[str, str] = {}
In generation, we do next-sequence prediction, right?
If I'm understanding it correctly, the output_schema should probably contain a sequence of codes (or whatever the output tensor is), rather than being empty.
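A minimal sketch of what that might look like. The `"next_conditions"` key and the `"sequence"` type tag are illustrative assumptions, not PyHealth's confirmed schema vocabulary:

```python
# Hypothetical generation-task schema: the model consumes all past visits'
# condition codes and must emit the codes of the next visit, so the output
# is itself a sequence instead of an empty dict.
task_name = "EHRGenerationMIMIC3"
input_schema = {"conditions": "nested_sequence"}    # past visits' ICD codes
output_schema = {"next_conditions": "sequence"}     # codes to generate next
```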
# ── Main model class ───────────────────────────────────────────────────────────
class EHRGPTBaseline(nn.Module):
Can we have this inherit BaseModel?
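A sketch of the requested change. `BaseModel` is stubbed here so the snippet is self-contained; in PyHealth you would instead do `from pyhealth.models import BaseModel` (the real class subclasses `nn.Module`). The constructor arguments are illustrative:

```python
class BaseModel:
    """Stand-in for pyhealth.models.BaseModel, used only to make this
    sketch runnable on its own."""
    def __init__(self, dataset=None):
        self.dataset = dataset

class EHRGPTBaseline(BaseModel):
    """GPT baseline reworked to inherit BaseModel instead of nn.Module directly."""
    def __init__(self, dataset=None, model_name="gpt2"):
        super().__init__(dataset=dataset)
        self.model_name = model_name

    def forward(self, **kwargs):
        # A real implementation would run the GPT backbone and return
        # loss/logits; omitted in this sketch.
        raise NotImplementedError
```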
# ── PyTorch Dataset ────────────────────────────────────────────────────────────
class EHRTextDataset(Dataset):
What's super cool about the PyHealth task is that you can do all of the tokenization in parallel with the task calls — basically, you can pre-cache all of the tokenized data in a very efficient way. Let me know if you need help understanding that.
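The idea can be illustrated with a toy example: tokenization happens once, when the task processes each sample, and repeated inputs hit a cache instead of being re-tokenized in the training loop. The whitespace "tokenizer" below is a stand-in for a real HF tokenizer, and `process_sample` is a hypothetical helper, not PyHealth API:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def tokenize(note: str) -> tuple:
    # Stand-in for a real tokenizer; returns a hashable, cacheable result.
    return tuple(note.lower().split())

def process_sample(note: str) -> dict:
    # Called once per sample at task-processing time. Workers can run this
    # in parallel, and the cached tokens are reused across epochs.
    return {"tokens": tokenize(note)}
```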
Can you give an example of this in a task? Would be helpful to understand
Yeah, here's a PR from our Multimodal project with a tokenizer:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/processors/tuple_time_text_processor.py
https://github.com/Multimodal-PyHealth/PyHealth/blob/wp-dev/pyhealth/tasks/multimodal_mimic4.py
"discharge_note_times": (
    "tuple_time_text",
    {"tokenizer_model": "bert-base-uncased", "type_tag": "note"},
),
"radiology_note_times": (
    "tuple_time_text",
    {"tokenizer_model": "bert-base-uncased", "type_tag": "note"},
),
If you look in the task there, you can see that you can pass args to your schemas.
One other thing: don't forget to add docs/ and the like. It's really helpful to have nice docstrings like this: https://github.com/sunlabuiuc/PyHealth/pull/392/changes
Thank you Claude!
python -m pytest tests/core/test_transformer_ehr_helpers.py tests/core/test_mimic3_ehr_generation.py -v 2>&1 | tail -25
Looking for coding guidelines from @jhnwu3 and team!
@jalengg @ethanrasmussen