<aside> 🗣️ Background:
Three parameters help large language modelling:
Can we be a bit more rigorous in finding the relationship between these three parameters?
Scaling laws can answer this question.
</aside>