# Llamipa: An Incremental Discourse Parser

Llamipa is Llama3-8B finetuned on the Minecraft Structured Dialogue Corpus (MSDC): https://huggingface.co/datasets/linagora/MinecraftStructuredDialogueCorpus.

| Test setting | Link F1 | Link+Rel F1 |
|--------------|---------|-------------|
| **Llamipa + gold structure** | 0.9004 | 0.8154 |
| **Llamipa + predicted structure** (incremental) | 0.8830 | 0.7951 |

For a given speaker turn, Llamipa was trained to predict the discourse relations that connect the elementary discourse units (EDUs) of that turn to the units of previous dialogue turns, given the text of those turns and the previous discourse structure, i.e. the relations that already connect them. Training used the gold annotated structure, and the model was then tested using gold structure, giving state-of-the-art results on the MSDC (see the table above). However, for a discourse parser to be truly incremental, it should predict the relations for each new turn using the structure it predicted in previous steps. We tested the model using its own predicted structure and found the results robust to this change.
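
The loop below sketches this incremental protocol. It is an illustration only, with hypothetical names (`predict`, the triple format); `parse_incremental.py` in this repository is the reference implementation.

```python
# Sketch of the incremental protocol described above; hypothetical names,
# not the repository's actual code (see parse_incremental.py for that).
def parse_dialogue(turns, predict):
    structure = []  # (source_edu, target_edu, relation) triples predicted so far
    for i, turn in enumerate(turns):
        # The model conditions on the dialogue up to this turn and on its OWN
        # earlier predictions rather than gold structure, so errors can propagate.
        structure += predict(context=turns[: i + 1], structure=structure)
    return structure
```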

### Model Description

- **Language(s) (NLP):** English
- **Finetuned from model:** Llama3-8B

### Running Llamipa

#### Training from scratch
The training data are provided in the `data/` folder; each example contains a context window of at most 15 EDUs. For the training parameters, see the paper cited below.
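
As a rough illustration of the finetuning setup, a LoRA configuration with Hugging Face `peft` might look like the sketch below. The adapter values and base checkpoint name are placeholders, not the parameters from the paper.

```python
# Minimal LoRA finetuning skeleton; illustrative values only, see the
# paper for the actual training parameters.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B"  # assumed base checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # placeholder values
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then train with your preferred trainer on the examples in data/.
```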

#### Reproducing test results
The `model/` folder contains the adapters for the parser trained on Llama3-8B, as well as the scripts for generating structures using gold structure (`parse_gold.py`) and predicted structure (`parse_incremental.py`). Be sure to use the matching version of the test data, gold or incremental, found in `data/`.
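
If you prefer to load the adapters directly, a minimal sketch with `transformers` and `peft` follows. The adapter path and generation settings here are assumptions; the provided scripts remain the authoritative way to reproduce the reported numbers.

```python
# Minimal adapter-loading sketch; the adapter path and generation settings
# are assumptions, not taken from parse_gold.py / parse_incremental.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Meta-Llama-3-8B"  # assumed base checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, "model/")  # adapters from this repo
model.eval()

prompt = "..."  # one formatted example from the test data in data/
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```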

#### Using Llamipa on new data
To regenerate the Llamipa data from the original MSDC files, or to format new dialogue data to be parsed with Llamipa, use the data formatting scripts and instructions provided in the `bespoke/` folder.
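
To give a feel for what "formatting" means here, the toy function below linearizes a dialogue into indexed, speaker-tagged EDUs plus the relations known so far. The exact serialization Llamipa expects is defined by the scripts in `bespoke/`; this layout is an invented stand-in, not the real one.

```python
# Toy linearization; the real input format is defined in bespoke/, and this
# layout (indices, "Speaker: text", "src tgt Relation") is an invented stand-in.
def format_example(edus, relations):
    lines = [f"{i} {speaker}: {text}" for i, (speaker, text) in enumerate(edus)]
    struct = " ".join(f"{src} {tgt} {rel}" for src, tgt, rel in relations)
    return "\n".join(lines) + "\nStructure: " + struct

edus = [("Builder", "ok, what next?"),
        ("Architect", "place a red block on the blue one")]
print(format_example(edus, [(0, 1, "Question-answer_pair")]))
```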

#### Evaluation
Get F1 scores using `evaluation/evaluation.py`, and produce a more readable version of Llamipa's output using `evaluation/output_formatter.py`.
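
For reference, the two scores in the results table are standardly computed as follows: Link F1 compares predicted (source, target) attachment pairs against gold, while Link+Rel F1 additionally requires the relation label to match. A self-contained sketch (not the repository's `evaluation.py`):

```python
# F1 over attachment triples; a generic sketch, not evaluation/evaluation.py.
def f1(pred: set, gold: set) -> float:
    tp = len(pred & gold)
    if not pred or not gold:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

gold = {(0, 1, "Result"), (1, 2, "Elaboration")}
pred = {(0, 1, "Result"), (1, 2, "Continuation")}  # label error on one edge

link_f1 = f1({(s, t) for s, t, _ in pred}, {(s, t) for s, t, _ in gold})  # 1.0
link_rel_f1 = f1(pred, gold)                                              # 0.5
print(link_f1, link_rel_f1)
```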

### Citations

**Paper:** https://aclanthology.org/2024.findings-emnlp.373/

**Video:** https://www.youtube.com/watch?v=yerUotx3QZY

Please cite the EMNLP Findings paper if you use Llamipa in your work:

```bibtex
@inproceedings{thompson-etal-2024-llamipa,
    title = "Llamipa: An Incremental Discourse Parser",
    author = "Thompson, Kate and Chaturvedi, Akshay and Hunter, Julie and Asher, Nicholas",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.373/",
}
```

We acknowledge support from the National Interdisciplinary Artificial Intelligence Institute, ANITI (Artificial and Natural Intelligence Toulouse Institute), funded by the French ‘Investing for the Future–PIA3’ program under Grant agreement ANR-19-PI3A-0004. We also thank the ANR project COCOBOTS (ANR-21-FAI2-0005), the ANR/DGA project DISCUTER (ANR-21-ASIA-0005), and the COCOPIL “Graine” project funded by the Région Occitanie of France. This work was granted access to the HPC resources of the CALMIP supercomputing center under allocation 2016-P23060.