Question 1: Softmax Function
- a. What is the purpose of the softmax function?
- b. What is the purpose of the expression in the numerator?
- c. What is the purpose of the expression in the denominator?
Now consider how a neural language model with a softmax output layer compares with a classic n-gram language model. Because n-gram models assign zero probability to n-grams unseen in training, we typically use techniques like smoothing or backoff in conjunction with them.
- d. Does this sparsity problem arise in the neural model? Why or why not?
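As a concrete reference point for parts (a)–(c), here is a minimal softmax sketch; the max-subtraction step is a standard numerical-stability trick and is an addition beyond the bare formula:

```python
import math

def softmax(scores):
    """Map raw scores (logits) to a probability distribution.

    The numerator exp(s_i) makes every score positive and
    preserves ordering; the denominator (the sum of all the
    exponentials) normalizes the values so they sum to 1.
    Subtracting max(scores) first avoids overflow and does
    not change the result.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Note that the output is a valid distribution: every entry is positive and the entries sum to 1.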
Question 2: Parameters of Neural Nets
- a. How would you express the number of model parameters in terms of $|V|$, $n$, $d_1$, and $d_2$?
- b. An effective size for the hidden dimension of a neural NLP model is often in the hundreds. For $n$ from 2 to 5, how many parameters would your model have if $d_1/(n-1) = d_2 = 100$? What if $d_1/(n-1) = d_2 = 1000$?
- c. What do you conclude about the relative memory efficiency of classic n-gram and feedforward neural language models? If you increased n even further, what would happen?
- d. How would you expect the number of parameters in an RNN model to scale with $n$ and $|V|$?
- e. Can you think of any strategies to substantially reduce the number of parameters?
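To sanity-check an answer to parts (a) and (b), the sketch below counts parameters under one common parameterization (a Bengio-style feedforward LM with an embedding table, one hidden layer, and a softmax output, biases included); the vocabulary size $|V| = 10{,}000$ is a purely illustrative assumption, not given in the question:

```python
def ffnn_lm_params(V, n, d_emb, d_hidden):
    """Parameter count for an assumed Bengio-style feedforward LM.

    - embedding table: V * d_emb
    - hidden layer: the input is the (n-1) concatenated context
      embeddings, so d_1 = (n-1) * d_emb weights feed each of the
      d_hidden units, plus d_hidden biases
    - output layer: d_hidden -> V softmax weights, plus V biases
    """
    d1 = (n - 1) * d_emb
    return V * d_emb + d1 * d_hidden + d_hidden + d_hidden * V + V

V = 10_000  # hypothetical vocabulary size, for illustration only
for n in range(2, 6):
    print(n, ffnn_lm_params(V, n, 100, 100), ffnn_lm_params(V, n, 1000, 1000))
```

Notice how weakly the count grows in $n$: only the $d_1 \cdot d_2 = (n-1)\,d_{\text{emb}}\,d_2$ term depends on it, whereas an n-gram table can grow as $|V|^n$ in the worst case.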
Question 3: Model Design
- a. Design a feedforward neural network to model $P(y_i \mid x)$. Identify any independence assumptions you make. Draw a diagram that illustrates how the model computes probabilities for the tag of the word “with”: What is the input, and how is the output distribution computed from the input? Write out the basic equations of the model, and explain your choices.
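One possible shape for such a model is sketched below, purely as a starting point: the fixed context window (an explicit independence assumption, since the tag depends only on that window), the tanh nonlinearity, and all dimensions are illustrative choices, not the required design:

```python
import math
import random

def tagger_probs(window_embeddings, W1, b1, W2, b2):
    """Feedforward tagger sketch for P(y_i | x).

    Concatenate the embeddings of a fixed window around position i
    (independence assumption: the tag ignores the rest of x), apply
    one tanh hidden layer, then a softmax over the tag set.
    """
    h_in = [v for emb in window_embeddings for v in emb]  # concatenation
    hidden = [math.tanh(sum(w * x for w, x in zip(row, h_in)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    m = max(scores)  # stability shift before exponentiating
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy usage with random weights: 3-word window, 4-dim embeddings,
# 5 hidden units, 3 tags -- all sizes hypothetical.
random.seed(0)
rand_vec = lambda k: [random.uniform(-1, 1) for _ in range(k)]
window = [rand_vec(4) for _ in range(3)]
W1 = [rand_vec(12) for _ in range(5)]   # 12 = 3 * 4 concatenated inputs
b1 = [0.0] * 5
W2 = [rand_vec(5) for _ in range(3)]
b2 = [0.0] * 3
probs = tagger_probs(window, W1, b1, W2, b2)
```

In equation form this is $\mathbf{h} = \tanh(W_1 \mathbf{e} + \mathbf{b}_1)$ and $P(y_i \mid x) = \operatorname{softmax}(W_2 \mathbf{h} + \mathbf{b}_2)$, where $\mathbf{e}$ is the concatenated window embedding.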