---
language:
- en
license: mit
tags:
- generated_from_trainer
datasets:
- glue
metrics:
- matthews_correlation
model-index:
- name: deberta-v3-xsmall-CoLA
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE COLA
      type: glue
      config: cola
      split: validation
      args: cola
    metrics:
    - name: Matthews Correlation
      type: matthews_correlation
      value: 0.5894856058137782

widget:
  - text: 'The cat sat on the mat.'
    example_title: Correct grammatical sentence
  - text: 'Me and my friend going to the store.'
    example_title: Incorrect subject-verb agreement
  - text: 'I ain''t got no money.'
    example_title: Incorrect verb conjugation and double negative
  - text: 'She don''t like pizza no more.'
    example_title: Incorrect verb conjugation and double negative
  - text: 'They is arriving tomorrow.'
    example_title: Incorrect verb conjugation
---


# deberta-v3-xsmall-CoLA

This model is a fine-tuned version of [microsoft/deberta-v3-xsmall](https://huggingface.co/microsoft/deberta-v3-xsmall) on the GLUE CoLA dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4237
- Matthews Correlation: 0.5895

## Model description

This fine-tune aims for a reasonable trade-off between accuracy and inference speed. The full evaluation summary:


```json
{
    "epoch": 3.0,
    "eval_loss": 0.423,
    "eval_matthews_correlation": 0.589,
    "eval_runtime": 5.0422,
    "eval_samples": 1043,
    "eval_samples_per_second": 206.853,
    "eval_steps_per_second": 51.763
}
```

## Intended uses & limitations

This model is intended for classifying English sentences as grammatically acceptable or unacceptable (the CoLA task), as illustrated by the widget examples above. As with any CoLA fine-tune, its judgments reflect the annotation conventions of the corpus and may not transfer well to informal or domain-specific text.
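
A minimal inference sketch using the `transformers` `pipeline`; the Hub repository path is not stated in this card, so `MODEL_ID` below is a placeholder:

```python
from transformers import pipeline

# "MODEL_ID" is a placeholder for this model's Hub path.
classifier = pipeline("text-classification", model="MODEL_ID")

print(classifier("The cat sat on the mat."))
# e.g. [{'label': 'LABEL_1', 'score': 0.98}]; label names depend on the repo's config
```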

## Training and evaluation data

The model was fine-tuned on the GLUE CoLA training split and evaluated on the CoLA validation split (1,043 sentences, per the evaluation summary above).
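
For reference, the CoLA splits can be loaded directly from the Hub (a sketch; the exact tokenization and preprocessing used for this run are not documented here):

```python
from datasets import load_dataset

# GLUE CoLA: sentences labeled 0 (unacceptable) or 1 (acceptable)
cola = load_dataset("glue", "cola")
print(cola)                   # train / validation / test splits
print(cola["validation"][0])  # {'sentence': ..., 'label': ..., 'idx': ...}
```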

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 32
- eval_batch_size: 4
- seed: 16105
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3.0
- mixed_precision_training: Native AMP
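
For reproducibility, the list above maps roughly onto the `TrainingArguments` sketched below; the output directory and any argument not listed above are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-xsmall-CoLA",  # assumed; not stated in the card
    learning_rate=6e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=4,
    seed=16105,
    gradient_accumulation_steps=4,  # 32 x 4 gives the total train batch size of 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3.0,
    fp16=True,                      # "Native AMP" mixed precision
)
```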

### Training results

| Training Loss | Epoch | Step | Validation Loss | Matthews Correlation |
|:-------------:|:-----:|:----:|:---------------:|:--------------------:|
| 0.3945        | 1.0   | 67   | 0.4323          | 0.5778               |
| 0.3214        | 2.0   | 134  | 0.4237          | 0.5895               |
| 0.3059        | 3.0   | 201  | 0.4636          | 0.5795               |
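
Matthews correlation, the metric reported above, ranges from -1 to 1 (0 is chance level) and can be reproduced with scikit-learn; the labels below are illustrative only:

```python
from sklearn.metrics import matthews_corrcoef

y_true = [1, 0, 1, 1, 0]  # illustrative reference labels
y_pred = [1, 0, 1, 0, 0]  # illustrative predictions
print(matthews_corrcoef(y_true, y_pred))  # 0.666...
```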


### Framework versions

- Transformers 4.27.0.dev0
- Pytorch 1.13.1+cu117
- Datasets 2.8.0
- Tokenizers 0.13.1