# Model Details
##### Model Name: NumericBERT
##### Model Type: Transformer
##### Architecture: BERT
##### Training Method: Masked Language Modeling (MLM)
##### Training Data: MIMIC-IV lab values
##### Training Hyperparameters:
- Optimizer: AdamW
- Learning Rate: 5e-5
- Masking Rate: 20%
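
The original training script is not included in this card. As a rough, non-authoritative sketch, the hyperparameters above correspond to the following standard Hugging Face masked-language-modeling setup; the base checkpoint name and the use of `DataCollatorForLanguageModeling` are assumptions for illustration, not the exact original pipeline.

```python
# Illustrative MLM training setup using the hyperparameters stated above.
# The base checkpoint and collator choice are assumptions, not the original script.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast, DataCollatorForLanguageModeling

model = BertForMaskedLM.from_pretrained("bert-base-uncased")        # assumed base checkpoint
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # applied after the numeric-to-text mapping

# Masking Rate: 20% of input tokens are selected for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.20)

# Optimizer: AdamW with the stated learning rate.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
```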
### Tokenization
Tokenizer: Custom numeric-to-text mapping using the TextEncoder class
### Text Encoding Process:
The encoder converts non-negative integers into uppercase letter-based representations, so that numerical values can be expressed as sequences of letters.
Each lab value is first scaled and then converted into its corresponding letters using this predefined mapping.
Finally, the text encoding step attaches the corresponding lab ID to the encoded value for each of the specified columns ('Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc').
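
The TextEncoder class itself is not reproduced in this card. The snippet below is a minimal sketch of the kind of mapping described above; the scaling factor, base-26 alphabet, and output format are illustrative assumptions rather than the exact implementation.

```python
# Minimal sketch of a numeric-to-letter encoding in the spirit of the TextEncoder
# described above. Scaling factor, alphabet, and output format are illustrative
# assumptions, not the exact implementation.

def int_to_letters(n: int) -> str:
    """Map a non-negative integer to an uppercase letter sequence (base-26)."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    letters = ""
    while True:
        n, rem = divmod(n, 26)
        letters = chr(ord("A") + rem) + letters
        if n == 0:
            return letters

def encode_lab_value(lab_id: str, value: float, scale: int = 10) -> str:
    """Scale a lab value, convert it to letters, and prefix the lab ID."""
    scaled = int(round(value * scale))  # e.g. 4.3 -> 43
    return f"{lab_id} {int_to_letters(scaled)}"

# Example: a potassium ('Pot') value of 4.3 mmol/L
print(encode_lab_value("Pot", 4.3))  # -> "Pot BR" (43 in base-26 letters)
```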
### Training Data Preprocessing
- Column Selection: Numerical values are taken from the following lab columns: 'Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc'.
- Text Encoding: The numeric values are encoded into text as described above.
- Masking: 20% of the encoded input is randomly masked during training (a simplified sketch of this flow follows below).
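
In the sketch below, the column names come from this card; the sequence format, masking granularity, and helper functions are assumptions for illustration only.

```python
# Sketch of the preprocessing flow: select lab columns, build a text sequence,
# and randomly mask ~20% of tokens. Details are simplified assumptions.
import random
import pandas as pd

LAB_COLUMNS = ["Bic", "Crt", "Pot", "Sod", "Ure", "Hgb", "Plt", "Wbc"]

def row_to_text(row: pd.Series) -> str:
    """Join each lab ID with its (already letter-encoded) value into one sequence."""
    return " ".join(f"{col} {row[col]}" for col in LAB_COLUMNS)

def mask_tokens(text: str, rate: float = 0.20, mask_token: str = "[MASK]") -> str:
    """Randomly replace ~20% of whitespace-separated tokens with the mask token."""
    return " ".join(mask_token if random.random() < rate else tok for tok in text.split())

# Toy example with already-encoded values
df = pd.DataFrame([{"Bic": "Y", "Crt": "BR", "Pot": "AQ", "Sod": "FJ",
                    "Ure": "Z", "Hgb": "DN", "Plt": "IH", "Wbc": "CB"}])
sequence = row_to_text(df.iloc[0])
print(mask_tokens(sequence))
```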
### Model Output
During training, the model predicts the tokens at masked positions.
The predictions are produced in the same encoded text representation as the inputs.
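
As a hypothetical usage example, a masked prediction could be obtained with the standard fill-mask pipeline; the checkpoint path and input format below are assumptions, and the returned predictions are in the encoded text representation rather than raw numeric values.

```python
# Hypothetical inference sketch: checkpoint path and input format are assumptions.
# The pipeline fills the masked position; predictions come back as encoded text,
# not as raw numeric lab values.
from transformers import pipeline

fill = pipeline("fill-mask", model="path/to/numericbert")  # hypothetical local checkpoint
masked_input = "Bic Y Crt BR Pot [MASK] Sod FJ Ure Z Hgb DN Plt IH Wbc CB"

for prediction in fill(masked_input, top_k=3):
    print(prediction["token_str"], prediction["score"])
```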
### Limitations and Considerations
- Numeric Data Representation: The model relies on a custom text representation of numeric data, which may not capture all of the complex patterns present in the original numeric values.
- Training Data Source: The model is trained on MIMIC-IV numeric data, so its performance may be influenced by the characteristics and biases present in that dataset.
### Contact Information
For inquiries or additional information, please contact:
David Restrepo
[email protected]
MIT Critical Data
---
license: mit
---