---
datasets:
- allenai/qasper
license: apache-2.0
widget:
- text: "Here is the the abstract for a scientific paper:\n<paste abstract here>\nWhat would be some questions that the paper could answer?\n"
---

# Model Card for TinyLlama-abs2qa

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

<!-- Provide a quick summary of what the model is/does. -->

This model was an experiment to see if I could get a model to generate useful questions from a scientific paper's abstract. The answer was yes!

## Model Details

The base model is TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T; thanks to the TinyLlama devs for training and releasing it!

As such, it has a context size of 2048 tokens.

Training data was a modified form of the QASPER train split, which contains 1169 examples of abstracts and suitable questions for NLP papers.

### Model Description

I modified the QASPER dataset a little for this training. The original has the abstract plus a set of questions and their answers. For this test I only wanted to see if I could generate questions from abstracts, so I extracted just those parts and formulated them as an Alpaca-style instruction:

    {"instruction":"Here is the the abstract for a scientific paper:
    It has been shown that word embeddings derived from large corpora
    tend to incorporate biases present in their training data. Various
    methods for mitigating these biases have been proposed, but recent
    work has demonstrated that these methods hide but fail to truly
    remove the biases, which can still be observed in word
    nearest-neighbor statistics. In this work we propose a probabilistic
    view of word embedding bias. We leverage this framework to present a
    novel method for mitigating bias which relies on probabilistic
    observations to yield a more robust bias mitigation algorithm.
    We demonstrate that this method effectively reduces bias according
    to three separate measures of bias while maintaining embedding quality
    across various popular benchmark semantic tasks
    What would be some questions that the paper could answer?",
    "output":"How is embedding quality assessed?
    What are the three measures of bias which are reduced in experiments?
    What are the probabilistic observations which contribute to the more robust algorithm?"}

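For reference, here is a minimal sketch of how pairs like this could be rebuilt from the Hugging Face copy of allenai/qasper. The field names ("abstract", "qas", "question") and the output filename are my assumptions about the dataset schema, not the exact script used for this model:

```python
# Sketch only: assumes each QASPER record exposes an "abstract" string and a
# "qas" mapping whose "question" entry is the list of annotated questions.
import json
from datasets import load_dataset

# The prompt mirrors the training instruction above verbatim.
PROMPT = (
    "Here is the the abstract for a scientific paper:\n{abstract}\n"
    "What would be some questions that the paper could answer?\n"
)

def to_alpaca(example):
    # One pair per paper: abstract in the instruction, all annotated
    # questions joined with newlines in the output.
    return {
        "instruction": PROMPT.format(abstract=example["abstract"]),
        "output": "\n".join(example["qas"]["question"]),
    }

train = load_dataset("allenai/qasper", split="train")
with open("qasper_abs2qa.jsonl", "w") as f:  # hypothetical filename
    for example in train:
        f.write(json.dumps(to_alpaca(example)) + "\n")
```
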
I'm not sure how critical the instruction phrasing is, but with the instructions phrased as in training, this tiny model actually does a pretty good job on totally unseen abstracts in NLP.

Training this model with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) took only 3 minutes on an A100. Wrangling the environment to get axolotl working took a lot longer; if you can, I highly recommend using their Docker image.

- **Developed by:** Andrew Green
- **Model type:** Llama 2 architecture, 1.1B parameters
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

I intend to use this model or a derivative of it to screen papers for inclusion in literature summarisation tools in the future.

Another thing I want to try is using this model to augment QASPER for other fields.

Since it is so fast to train, I think it will also be a useful testbed for trying out some other techniques like DPO and SPIN that I want to learn.

### Direct Use

Directly using this model should be possible, though some testing of how sensitive it is to slightly different prompting styles would be needed, and I think it will generate ad infinitum because I didn't use a chat template - adding one is on my to-do list and should be quick enough.

From a few quick tests, the generated questions look at least plausible, though they may have questionable utility in the real world.

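For illustration, here is a minimal inference sketch along the lines of those quick tests; the repository id is a placeholder and the generation settings are assumptions on my part rather than tested values:

```python
# Hedged sketch: prompt format copied from the training data above; the repo
# id, max_new_tokens cap and greedy decoding are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama-abs2qa"  # placeholder: use the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

abstract = "<paste abstract here>"
prompt = (
    "Here is the the abstract for a scientific paper:\n"
    f"{abstract}\n"
    "What would be some questions that the paper could answer?\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
# Cap new tokens, since without a chat template the model tends to run on.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
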
### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

The model was finetuned on scientific articles about NLP and on questions about those articles written by NLP experts. As such, it is quite likely the model will not work well on other fields. In my limited testing, however, it does seem to generalise OK.

The same risks for misuse and malicious use apply as they would for any LLM, but in particular this model has the potential to generate questions from an abstract, which could lead to it being misused in academia (e.g. to partially automate peer review). I think this would be a violation of most publishers' terms.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

This model is based on the TinyLlama foundation model, so all the same out-of-scope-use risks apply here.

The model is biased towards NLP abstracts, because those are contained in the QASPER dataset on which it was trained.

This is a very small model, so it is likely to be quite limited in its reasoning capabilities, which may lead to nonsense or irrelevant questions being generated.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. |