Overview:

Decoding:

Question: How do we generate the most probable continuation? We want to find

$$ \arg \max_{w_1,\ldots,w_n} \prod_{i=1}^n p(w_i \mid w_1,\ldots,w_{i-1}, \text{prefix}) $$
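To make the objective concrete, here is a minimal sketch on a made-up two-token model. The table `TOY_MODEL` and the helpers `sequence_log_prob`, `exact_argmax`, and `greedy` are hypothetical names invented for this illustration, and the probabilities are arbitrary. It scores a continuation as a sum of log conditionals (the log of the product above), finds the exact maximizer by brute force, and compares it with picking the locally best token at each step.

```python
import itertools
import math

# Hypothetical toy model: next-token distributions keyed by the tokens
# generated so far. The prefix is left implicit.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.9, "b": 0.1},
}


def sequence_log_prob(tokens):
    """Sum of log p(w_i | w_1..w_{i-1}, prefix) over the sequence."""
    return sum(
        math.log(TOY_MODEL[tuple(tokens[:i])][tok])
        for i, tok in enumerate(tokens)
    )


def exact_argmax(n):
    """Brute force over all |V|^n continuations; only feasible for toy sizes."""
    vocab = sorted(TOY_MODEL[()])
    return max(itertools.product(vocab, repeat=n), key=sequence_log_prob)


def greedy(n):
    """Pick the single most probable next token at every step."""
    tokens = []
    for _ in range(n):
        dist = TOY_MODEL[tuple(tokens)]
        tokens.append(max(dist, key=dist.get))
    return tuple(tokens)


if __name__ == "__main__":
    for name, seq in [("exact", exact_argmax(2)), ("greedy", greedy(2))]:
        print(name, seq, round(math.exp(sequence_log_prob(seq)), 4))
```

In this toy setting the exact maximizer is `("b", "a")` with probability 0.36, while the step-by-step choice yields `("a", "a")` with probability 0.33. Brute force costs |V|^n evaluations, which is why practical decoding relies on approximations such as the strategies listed next.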

Greedy decoding

Beam search decoding

Sampling

Nucleus sampling

Temperature in LLMs

Locally typical sampling