<aside> 🗣️ Background:

Three factors help large language modeling:

  1. More compute
  2. More training data
  3. Larger models

Can we be more rigorous about the relationship among these three factors?

Scaling laws answer this question.

</aside>
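As a concrete sketch of such a relationship, the Chinchilla-style parametric fit (Hoffmann et al., 2022) models loss as a function of parameter count N and training-token count D. The functional form is from that paper; the constants below are its reported fits, reproduced here purely for illustration, not re-derived:

```python
import math


def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pretraining loss for N parameters trained on D tokens.

    Functional form L(N, D) = E + A / N^alpha + B / D^beta, where E is the
    irreducible loss floor. Constants are the fits reported by Hoffmann et
    al. (2022), used here for illustration only.
    """
    return E + A / N**alpha + B / D**beta


def training_flops(N, D):
    """Standard C ~= 6 * N * D approximation for training compute."""
    return 6 * N * D


# Scaling both model size and data by 10x lowers the predicted loss,
# but it can never drop below the irreducible floor E.
small = chinchilla_loss(1e9, 2e10)    # ~1B params, ~20B tokens
large = chinchilla_loss(1e10, 2e11)   # ~10B params, ~200B tokens
assert large < small
assert large > 1.69
```

The two loss terms fall off as power laws in N and D independently, which is why, at a fixed compute budget C ≈ 6ND, there is a compute-optimal trade-off between model size and data rather than "bigger model always wins".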

Scaling Laws

GPU Hardware

Double Descent