| This has the advantage of identifying high-probability | |
| sequences that start with lower-probability initial tokens and would have been ignored by greedy search. | |
| do_sample: if set to True, this parameter enables sampling-based decoding strategies such as multinomial sampling, | |
| beam-search multinomial sampling, Top-K sampling, and Top-p sampling. |
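
The two ideas above can be illustrated with a minimal, dependency-free sketch. It does not use any real library API: `TOY_MODEL`, `greedy_search`, `beam_search`, and `sample_next` are hypothetical names, and the probabilities are made up for illustration. The first part shows a case where beam search finds a sequence that greedy search misses because its first token is less likely. The second part shows a simplified view of Top-K and Top-p filtering before sampling, which real implementations typically apply to logits rather than to a small dictionary.

```python
import random

# Hypothetical toy "model": next-token probabilities keyed by the prefix so far.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}


def greedy_search(model, steps):
    """Pick the single most probable token at every step."""
    seq, prob = (), 1.0
    for _ in range(steps):
        token, p = max(model[seq].items(), key=lambda kv: kv[1])
        seq, prob = seq + (token,), prob * p
    return seq, prob


def beam_search(model, steps, num_beams):
    """Keep the num_beams most probable partial sequences at every step."""
    beams = [((), 1.0)]
    for _ in range(steps):
        candidates = [
            (seq + (tok,), prob * p)
            for seq, prob in beams
            for tok, p in model[seq].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]  # highest-probability full sequence


def sample_next(probs, top_k=None, top_p=None, rng=random):
    """Sample one token after optional Top-K and Top-p (nucleus) filtering."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most probable tokens
    if top_p is not None:
        kept, total = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            total += p
            if total >= top_p:  # smallest set whose mass reaches top_p
                break
        ranked = kept
    tokens, weights = zip(*ranked)
    # random.choices renormalizes the remaining weights internally.
    return rng.choices(tokens, weights=weights, k=1)[0]


greedy_seq, greedy_prob = greedy_search(TOY_MODEL, steps=2)
beam_seq, beam_prob = beam_search(TOY_MODEL, steps=2, num_beams=2)
print("greedy:", greedy_seq, greedy_prob)
print("beam:  ", beam_seq, beam_prob)
print("sample:", sample_next(TOY_MODEL[()], top_k=2, top_p=0.9))
```

In this toy setup greedy search commits to "A" because it is the most likely first token, while beam search with two beams also keeps "B" alive and ends up with the more probable overall sequence. Setting `num_beams=1` in the sketch reduces beam search to greedy search.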