NECOUDBFM/Jellyfish-8B

@@ -68,28 +68,29 @@ If you find our work useful, please give us credit by citing:
 _For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. However, for Jellyfish-13B and Jellyfish-Interpreter, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._
 _Accuracy as the metric for data imputation and the F1 score for other tasks._
-| Task | Type   | Dataset              | Best of non-LLM | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
-|------|--------|-------------|-----------|-------|---------|-------|--------|-----------|--------------|--------------|---------------|
-| Error Detection   | Seen   | Adult                | *99.10*          | 99.10 | 92.01   | 92.01 | 83.58  | --        | 77.40        | 73.74        | **99.33**       |
-|      |        | Hospital             | 94.40           | **97.80** | 90.74  | 90.74 | 44.76  | --        | 94.51        | 93.40        | *95.59*        |
-|      | Unseen | Flights              | 81.00           | --     | --     | **83.48** | 66.01 | --        | 69.15        | 66.21        | *82.52*        |
-|      |        | Rayyan               | 79.00           | --     | --     | *81.95* | 68.53 | --        | 75.07        | 81.06        | **90.65**       |
-| Data Imputation   | Seen   | Buy                  | 96.50           | 98.50  | 98.46   | **100**  | **100** | --        | 98.46        | 98.46        | **100**         |
-|      |        | Restaurant           | 77.20           | 88.40  | *94.19*  | **97.67** | 90.70 | --        | 89.53        | 87.21        | 89.53         |
-|      | Unseen | Flipkart             | 68.00           | --     | --     | **89.94** | 83.20 | --        | 87.14        | *87.48*       | 81.68         |
-|      |        | Phone                | 86.70           | --     | --     | **90.79** | 86.78 | --        | 86.52        | 85.68        | *87.21*        |
-| Schema Matching   | Seen   | MIMIC-III            | 20.00           | --     | --     | 40.00  | 29.41 | --        | **53.33**      | *45.45*       | 40.00         |
-|      |        | Synthea              | 38.50           | 45.20  | *57.14* | **66.67** | 6.56  | --        | 55.56        | 47.06        | 56.00         |
-|      | Unseen | CMS                  | *50.00*          | --     | --     | 19.35  | 22.22 | --        | 42.86        | 38.10        | **59.29**       |
-| Entity Matching   | Seen   | Amazon-Google        | 75.58           | 63.50  | 66.50  | 74.21  | 70.91 | 70.10     | **81.69**      | *81.42*       | 81.34         |
-|      |        | Beer                 | 94.37           | **100**  | 96.30  | **100**  | 90.32 | 96.30     | **100.00**     | **100.00**     | 96.77         |
-|      |        | DBLP-ACM             | **98.99**         | 96.60  | 96.99  | 97.44  | 95.87 | 93.80     | 98.65        | 98.77        | *98.98*        |
-|      |        | DBLP-GoogleScholar   | *95.70*          | 83.80  | 76.12  | 91.87  | 90.45 | 92.40     | 94.88        | 95.03        | **98.51**       |
-|      |        | Fodors-Zagats        | **100**           | **100**  | **100**  | **100**  | 93.62 | **100**     | **100**        | **100**        | **100**         |
-|      |        | iTunes-Amazon        | 97.06           | *98.20* | 96.40  | **100**  | 98.18 | 94.30     | 96.30        | 96.30        | 98.11         |
-|      | Unseen | Abt-Buy              | 89.33           | --     | --     | **92.77** | 78.73 | --        | 86.06        | 88.84        | *89.58*        |
-|      |        | Walmart-Amazon       | 86.89           | 87.00  | 86.17  | **90.27** | 79.19 | 82.40     | 84.91        | 85.24        | *89.42*        |
-| Avg  |        |                      | 80.44           | -      | -      | *84.17* | 72.58 | -         | 82.74        | 81.55        | **86.02**       |
+| Task            | Type   | Dataset           | Best of non-LLM | GPT-3  | GPT-3.5 | GPT-4  | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
+|-----------------|--------|-------------------|-----------------|--------|---------|--------|--------|-----------|--------------|--------------|---------------|
+| Error Detection | Seen   | Adult             | *99.10*         | 99.10  | 92.01   | 92.01  | 83.58  | --        | 77.40        | 73.74        | **99.33**     |
+| Error Detection | Seen   | Hospital          | 94.40           | **97.80** | 90.74  | 90.74  | 44.76  | --        | 94.51        | 93.40        | *95.59*       |
+| Error Detection | Unseen | Flights           | 81.00           | --     | --     | **83.48** | 66.01  | --        | 69.15        | 66.21        | *82.52*       |
+| Error Detection | Unseen | Rayyan            | 79.00           | --     | --     | *81.95* | 68.53  | --        | 75.07        | 81.06        | **90.65**     |
+| Data Imputation | Seen   | Buy               | 96.50           | 98.50  | 98.46   | **100** | **100** | --        | 98.46        | 98.46        | **100**       |
+| Data Imputation | Seen   | Restaurant        | 77.20           | 88.40  | *94.19* | **97.67** | 90.70  | --        | 89.53        | 87.21        | 89.53         |
+| Data Imputation | Unseen | Flipkart          | 68.00           | --     | --     | **89.94** | 83.20  | --        | 87.14        | *87.48*      | 81.68         |
+| Data Imputation | Unseen | Phone             | 86.70           | --     | --     | **90.79** | 86.78  | --        | 86.52        | 85.68        | *87.21*       |
+| Schema Matching | Seen   | MIMIC-III         | 20.00           | --     | --     | 40.00   | 29.41  | --        | **53.33**    | *45.45*      | 40.00         |
+| Schema Matching | Seen   | Synthea           | 38.50           | 45.20  | *57.14* | **66.67** | 6.56   | --        | 55.56        | 47.06        | 56.00         |
+| Schema Matching | Unseen | CMS               | *50.00*         | --     | --     | 19.35   | 22.22  | --        | 42.86        | 38.10        | **59.29**     |
+| Entity Matching | Seen   | Amazon-Google     | 75.58           | 63.50  | 66.50   | 74.21  | 70.91  | 70.10     | **81.69**    | *81.42*      | 81.34         |
+| Entity Matching | Seen   | Beer              | 94.37           | **100** | 96.30  | **100** | 90.32  | 96.30     | **100.00**   | **100.00**   | 96.77         |
+| Entity Matching | Seen   | DBLP-ACM          | **98.99**       | 96.60  | 96.99   | 97.44  | 95.87  | 93.80     | 98.65        | 98.77        | *98.98*       |
+| Entity Matching | Seen   | DBLP-GoogleScholar| *95.70*         | 83.80  | 76.12   | 91.87  | 90.45  | 92.40     | 94.88        | 95.03        | **98.51**     |
+| Entity Matching | Seen   | Fodors-Zagats     | **100**         | **100** | **100** | **100** | 93.62  | **100**   | **100**      | **100**      | **100**       |
+| Entity Matching | Seen   | iTunes-Amazon     | 97.06           | *98.20*| 96.40   | **100** | 98.18  | 94.30     | 96.30        | 96.30        | 98.11         |
+| Entity Matching | Unseen | Abt-Buy           | 89.33           | --     | --     | **92.77** | 78.73  | --        | 86.06        | 88.84        | *89.58*       |
+| Entity Matching | Unseen | Walmart-Amazon    | 86.89           | 87.00  | 86.17   | **90.27** | 79.19  | 82.40     | 84.91        | 85.24        | *89.42*       |
+| Avg             |        |                   | 80.44           | -      | -      | *84.17* | 72.58  | -         | 82.74        | 81.55        | **86.02**     |
 ## Performance on unseen tasks