---
license: mit
---

# Model Details

##### Model Name: NumericBERT
##### Model Type: Transformer
##### Architecture: BERT
##### Training Method: Masked Language Modeling (MLM)
##### Training Data: MIMIC-IV lab values data

##### Training Hyperparameters

- Optimizer: AdamW
- Learning Rate: 5e-5
- Masking Rate: 20%

### Tokenization

Tokenizer: Custom numeric-to-text mapping using the `TextEncoder` class.

### Text Encoding Process

The encoder converts non-negative integers into uppercase letter-based representations, allowing numerical values to be expressed as sequences of letters. Each numerical value is first scaled and then converted into its corresponding letters according to a predefined mapping. Finally, the text encoding step attaches the corresponding lab ID to the encoded value for each of the specified columns ('Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc'). A minimal sketch of one possible implementation is included at the end of this card.

### Training Data Preprocessing

- Column Selection: Numerical values are taken from the following lab value columns: 'Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc'.
- Text Encoding: The numeric values are encoded into text using the process described above.
- Masking: 20% of the data is randomly masked during training (see the illustrative training setup at the end of this card).

### Model Output

During training, the model outputs predictions for the masked values. The output is expressed in the encoded text representation.

### Limitations and Considerations

- Numeric Data Representation: The model relies on a custom text representation of numeric data, which may not capture all of the complex patterns present in the original numeric values.
- Training Data Source: The model is trained on MIMIC-IV numeric data, so its performance may be influenced by the characteristics and biases of that dataset.

### Contact Information

For inquiries or additional information, please contact:

David Restrepo
davidres@mit.edu
MIT Critical Data
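
### Illustrative Example: Text Encoding

To make the encoding step concrete, the sketch below shows one way a numeric-to-text encoder of this kind could work. The base-26 letter mapping, the scaling factor of 10, and the helper names (`int_to_letters`, `encode_value`, `encode_row`) are illustrative assumptions, not the exact `TextEncoder` implementation used for NumericBERT.

```python
# Illustrative sketch only: the base-26 letter mapping, the scaling factor, and the
# helper names below are assumptions for demonstration, not the exact TextEncoder
# implementation used to train NumericBERT.

LAB_COLUMNS = ['Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc']


def int_to_letters(n: int) -> str:
    """Map a non-negative integer to an uppercase letter sequence (base 26, A = 0)."""
    if n < 0:
        raise ValueError("Only non-negative integers can be encoded.")
    letters = []
    while True:
        n, remainder = divmod(n, 26)
        letters.append(chr(ord('A') + remainder))
        if n == 0:
            break
    return ''.join(reversed(letters))


def encode_value(value: float, scale: int = 10) -> str:
    """Scale a lab value to a non-negative integer, then convert it to letters."""
    return int_to_letters(int(round(value * scale)))


def encode_row(row: dict) -> str:
    """Encode one row of lab values as 'lab ID + letters' tokens joined by spaces."""
    return ' '.join(f"{col}{encode_value(row[col])}" for col in LAB_COLUMNS if col in row)


# Example with made-up lab values:
sample = {'Bic': 24.0, 'Crt': 1.1, 'Pot': 4.2, 'Sod': 140.0,
          'Ure': 18.0, 'Hgb': 13.5, 'Plt': 250.0, 'Wbc': 7.8}
print(encode_row(sample))  # -> "BicJG CrtL PotBQ SodCBW UreGY HgbFF PltDSE WbcDA"
```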
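
### Illustrative Example: MLM Training Setup

Similarly, the sketch below shows one way the 20% masking rate and the listed hyperparameters could be wired into an MLM training loop with the Hugging Face `transformers` library. The tokenizer, vocabulary file, and dataset handling are assumptions; the actual training pipeline may differ.

```python
# Illustrative sketch only: the tokenizer, vocabulary file, and dataset handling are
# assumptions; this shows how the 20% masking rate and the listed hyperparameters
# could be combined with Hugging Face transformers, not the exact pipeline
# used to train NumericBERT.
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Assumed: a WordPiece vocabulary built over the letter-encoded lab strings.
tokenizer = BertTokenizerFast(vocab_file="numeric_vocab.txt")
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))

# 20% of tokens are randomly masked for the masked language modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.2)

args = TrainingArguments(
    output_dir="numericbert-mlm",
    learning_rate=5e-5,  # Trainer uses AdamW by default
)

# `encoded_dataset` stands in for the tokenized, letter-encoded MIMIC-IV lab strings.
# trainer = Trainer(model=model, args=args, data_collator=collator, train_dataset=encoded_dataset)
# trainer.train()
```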