HCZhang committed on
Commit 2c0c57f · verified · 1 parent: 4a16424

Update README.md

Files changed (1)
  1. README.md +23 -22
README.md CHANGED
@@ -68,28 +68,29 @@ If you find our work useful, please give us credit by citing:
  _For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. However, for Jellyfish-13B and Jellyfish-Interpreter, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._
  _Accuracy as the metric for data imputation and the F1 score for other tasks._
 
- | Task | Type | Dataset | Best of non-LLM | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
- |------|--------|-------------|-----------|-------|---------|-------|--------|-----------|--------------|--------------|---------------|
- | Error Detection | Seen | Adult | *99.10* | 99.10 | 92.01 | 92.01 | 83.58 | -- | 77.40 | 73.74 | **99.33** |
- | | | Hospital | 94.40 | **97.80** | 90.74 | 90.74 | 44.76 | -- | 94.51 | 93.40 | *95.59* |
- | | Unseen | Flights | 81.00 | -- | -- | **83.48** | 66.01 | -- | 69.15 | 66.21 | *82.52* |
- | | | Rayyan | 79.00 | -- | -- | *81.95* | 68.53 | -- | 75.07 | 81.06 | **90.65** |
- | Data Imputation | Seen | Buy | 96.50 | 98.50 | 98.46 | **100** | **100** | -- | 98.46 | 98.46 | **100** |
- | | | Restaurant | 77.20 | 88.40 | *94.19* | **97.67** | 90.70 | -- | 89.53 | 87.21 | 89.53 |
- | | Unseen | Flipkart | 68.00 | -- | -- | **89.94** | 83.20 | -- | 87.14 | *87.48* | 81.68 |
- | | | Phone | 86.70 | -- | -- | **90.79** | 86.78 | -- | 86.52 | 85.68 | *87.21* |
- | Schema Matching | Seen | MIMIC-III | 20.00 | -- | -- | 40.00 | 29.41 | -- | **53.33** | *45.45* | 40.00 |
- | | | Synthea | 38.50 | 45.20 | *57.14* | **66.67** | 6.56 | -- | 55.56 | 47.06 | 56.00 |
- | | Unseen | CMS | *50.00* | -- | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | **59.29** |
- | Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 66.50 | 74.21 | 70.91 | 70.10 | **81.69** | *81.42* | 81.34 |
- | | | Beer | 94.37 | **100** | 96.30 | **100** | 90.32 | 96.30 | **100.00** | **100.00** | 96.77 |
- | | | DBLP-ACM | **98.99** | 96.60 | 96.99 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | *98.98* |
- | | | DBLP-GoogleScholar | *95.70* | 83.80 | 76.12 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | **98.51** |
- | | | Fodors-Zagats | **100** | **100** | **100** | **100** | 93.62 | **100** | **100** | **100** | **100** |
- | | | iTunes-Amazon | 97.06 | *98.20* | 96.40 | **100** | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
- | | Unseen | Abt-Buy | 89.33 | -- | -- | **92.77** | 78.73 | -- | 86.06 | 88.84 | *89.58* |
- | | | Walmart-Amazon | 86.89 | 87.00 | 86.17 | **90.27** | 79.19 | 82.40 | 84.91 | 85.24 | *89.42* |
- | Avg | | | 80.44 | - | - | *84.17* | 72.58 | - | 82.74 | 81.55 | **86.02** |
+ | Task | Type | Dataset | Best of non-LLM | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
+ |-----------------|--------|-------------------|-----------------|--------|---------|--------|--------|-----------|--------------|--------------|---------------|
+ | Error Detection | Seen | Adult | *99.10* | 99.10 | 92.01 | 92.01 | 83.58 | -- | 77.40 | 73.74 | **99.33** |
+ | Error Detection | Seen | Hospital | 94.40 | **97.80** | 90.74 | 90.74 | 44.76 | -- | 94.51 | 93.40 | *95.59* |
+ | Error Detection | Unseen | Flights | 81.00 | -- | -- | **83.48** | 66.01 | -- | 69.15 | 66.21 | *82.52* |
+ | Error Detection | Unseen | Rayyan | 79.00 | -- | -- | *81.95* | 68.53 | -- | 75.07 | 81.06 | **90.65** |
+ | Data Imputation | Seen | Buy | 96.50 | 98.50 | 98.46 | **100** | **100** | -- | 98.46 | 98.46 | **100** |
+ | Data Imputation | Seen | Restaurant | 77.20 | 88.40 | *94.19* | **97.67** | 90.70 | -- | 89.53 | 87.21 | 89.53 |
+ | Data Imputation | Unseen | Flipkart | 68.00 | -- | -- | **89.94** | 83.20 | -- | 87.14 | *87.48* | 81.68 |
+ | Data Imputation | Unseen | Phone | 86.70 | -- | -- | **90.79** | 86.78 | -- | 86.52 | 85.68 | *87.21* |
+ | Schema Matching | Seen | MIMIC-III | 20.00 | -- | -- | 40.00 | 29.41 | -- | **53.33** | *45.45* | 40.00 |
+ | Schema Matching | Seen | Synthea | 38.50 | 45.20 | *57.14* | **66.67** | 6.56 | -- | 55.56 | 47.06 | 56.00 |
+ | Schema Matching | Unseen | CMS | *50.00* | -- | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | **59.29** |
+ | Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 66.50 | 74.21 | 70.91 | 70.10 | **81.69** | *81.42* | 81.34 |
+ | Entity Matching | Seen | Beer | 94.37 | **100** | 96.30 | **100** | 90.32 | 96.30 | **100.00** | **100.00** | 96.77 |
+ | Entity Matching | Seen | DBLP-ACM | **98.99** | 96.60 | 96.99 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | *98.98* |
+ | Entity Matching | Seen | DBLP-GoogleScholar| *95.70* | 83.80 | 76.12 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | **98.51** |
+ | Entity Matching | Seen | Fodors-Zagats | **100** | **100** | **100** | **100** | 93.62 | **100** | **100** | **100** | **100** |
+ | Entity Matching | Seen | iTunes-Amazon | 97.06 | *98.20*| 96.40 | **100** | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
+ | Entity Matching | Unseen | Abt-Buy | 89.33 | -- | -- | **92.77** | 78.73 | -- | 86.06 | 88.84 | *89.58* |
+ | Entity Matching | Unseen | Walmart-Amazon | 86.89 | 87.00 | 86.17 | **90.27** | 79.19 | 82.40 | 84.91 | 85.24 | *89.42* |
+ | Avg | | | 80.44 | - | - | *84.17* | 72.58 | - | 82.74 | 81.55 | **86.02** |
+
 
  ## Performance on unseen tasks
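This commit only changes the table's layout. In the old version, continuation rows left the Task and Type cells blank to mean "same as above"; the new version repeats them on every row, so each row can be read on its own. The minimal Python sketch below shows what that means for anyone parsing the README. It assumes the table has been copied out as a markdown string (only the GPT-4 and Jellyfish-8B columns are excerpted here) and that the Avg row is a plain mean over all 19 datasets. Under that assumption, the recomputed means for these two columns match the stored 84.17 and 81.55. The helper names (`parse_table`, `recompute_avg`, `parse_score`, `split_row`) are hypothetical and not part of the repository.

```python
"""Sketch: forward-fill the README results table and recompute its Avg row.

Assumptions (not stated in the repository): the table is available as a
markdown string, "-" / "--" mark a missing score, and the Avg row is an
unweighted mean over every dataset row. All names here are illustrative.
"""
import statistics

# Excerpt of the table in its OLD layout (blank Task/Type cells mean "same as
# the row above"); only the GPT-4 and Jellyfish-8B columns are kept for brevity.
OLD_LAYOUT = """
| Task | Type | Dataset | GPT-4 | Jellyfish-8B |
|------|------|---------|-------|--------------|
| Error Detection | Seen | Adult | 92.01 | 73.74 |
| | | Hospital | 90.74 | 93.40 |
| | Unseen | Flights | **83.48** | 66.21 |
| | | Rayyan | *81.95* | 81.06 |
| Data Imputation | Seen | Buy | **100** | 98.46 |
| | | Restaurant | **97.67** | 87.21 |
| | Unseen | Flipkart | **89.94** | *87.48* |
| | | Phone | **90.79** | 85.68 |
| Schema Matching | Seen | MIMIC-III | 40.00 | *45.45* |
| | | Synthea | **66.67** | 47.06 |
| | Unseen | CMS | 19.35 | 38.10 |
| Entity Matching | Seen | Amazon-Google | 74.21 | *81.42* |
| | | Beer | **100** | **100.00** |
| | | DBLP-ACM | 97.44 | 98.77 |
| | | DBLP-GoogleScholar | 91.87 | 95.03 |
| | | Fodors-Zagats | **100** | **100** |
| | | iTunes-Amazon | **100** | 96.30 |
| | Unseen | Abt-Buy | **92.77** | 88.84 |
| | | Walmart-Amazon | **90.27** | 85.24 |
| Avg | | | *84.17* | 81.55 |
"""

MISSING = {"", "-", "--"}


def parse_score(cell):
    """Drop *italic* / **bold** markers; return a float, or None if the score is missing."""
    text = cell.strip().strip("*").strip()
    return None if text in MISSING else float(text)


def split_row(line):
    """Split one markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(markdown):
    """Return (dataset records, stored Avg record) with blank Task/Type cells forward-filled.

    Forward-filling makes the old layout yield the same records as the new one,
    where Task and Type are repeated on every row.
    """
    lines = [ln for ln in markdown.strip().splitlines() if ln.strip()]
    header = split_row(lines[0])
    records, avg, previous = [], None, {}
    for line in lines[2:]:  # skip the header row and the |---| separator
        record = dict(zip(header, split_row(line)))
        if record["Task"] == "Avg":
            avg = record
            continue
        for key in ("Task", "Type"):
            record[key] = record[key] or previous[key]  # inherit from the row above
        previous = record
        records.append(record)
    return records, avg


def recompute_avg(records, column):
    """Assumed definition of Avg: plain mean over all datasets, None if any score is missing."""
    scores = [parse_score(r[column]) for r in records]
    if None in scores:
        return None  # consistent with the "-" shown for models lacking some scores
    return statistics.mean(scores)


if __name__ == "__main__":
    records, stored_avg = parse_table(OLD_LAYOUT)
    hospital = records[1]
    print(hospital["Task"], hospital["Type"], hospital["Dataset"])
    for column in ("GPT-4", "Jellyfish-8B"):
        print(f"{column}: recomputed {recompute_avg(records, column):.2f}, "
              f"README {parse_score(stored_avg[column]):.2f}")
```

Running the script prints the forward-filled Task and Type of the Hospital row, then each recomputed average beside the value stored in the Avg row. Feeding it the new layout, which already repeats Task and Type on every row, yields the same records, because the forward-fill step then has nothing to fill.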