Overview:

Evaluation is important and difficult

Evaluation by people

Evaluation by string-overlap metrics

Evaluation using embeddings

Evaluation using metrics trained on human evaluations