Question 1: Softmax Function
- a. What is the purpose of the softmax function?
- b. What is the purpose of the expression in the numerator?
- c. What is the purpose of the expression in the denominator?
Now consider how a neural language model with a softmax output layer compares with a classic n-gram language model. Because n-gram models assign zero probability to n-grams unseen in training, we typically use techniques like smoothing or backoff in conjunction with them.
- d. Does this sparsity problem arise in the neural model? Why or why not?
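As a concrete reference point for parts (a)–(c), here is a minimal softmax sketch; the max-subtraction step is a standard numerical-stability trick and is an addition beyond the bare formula:

```python
import math

def softmax(scores):
    """Map raw scores (logits) to a probability distribution.

    The numerator exp(s_i) makes every score positive and
    preserves ordering; the denominator (the sum of all the
    exponentials) normalizes the values so they sum to 1.
    Subtracting max(scores) first avoids overflow and does
    not change the result.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Note that the output is a valid distribution: every entry is positive and the entries sum to 1.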
Question 2: Parameters of Neural Nets
- a. How would you express the number of model parameters in terms of $|V|$, $n$, $d_1$, and $d_2$?
- b. An effective size for the hidden dimension of a neural NLP model is often in the hundreds. For $n$ from 2 to 5, how many parameters would your model have if $d_1/(n-1) = d_2 = 100$? What if $d_1/(n-1) = d_2 = 1000$?
- c. What do you conclude about the relative memory efficiency of classic n-gram and feedforward neural language models? If you increased n even further, what would happen?
- d. How would you expect the number of parameters in an RNN model to scale with $n$ and $|V|$?
- e. Can you think of any strategies to substantially reduce the number of parameters?
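To sanity-check an answer to parts (a) and (b), the sketch below counts parameters under one common parameterization (a Bengio-style feedforward LM with an embedding table, one hidden layer, and a softmax output, biases included); the vocabulary size $|V| = 10{,}000$ is a purely illustrative assumption, not given in the question:

```python
def ffnn_lm_params(V, n, d_emb, d_hidden):
    """Parameter count for an assumed Bengio-style feedforward LM.

    - embedding table: V * d_emb
    - hidden layer: the input is the (n-1) concatenated context
      embeddings, so d_1 = (n-1) * d_emb weights feed each of the
      d_hidden units, plus d_hidden biases
    - output layer: d_hidden -> V softmax weights, plus V biases
    """
    d1 = (n - 1) * d_emb
    return V * d_emb + d1 * d_hidden + d_hidden + d_hidden * V + V

V = 10_000  # hypothetical vocabulary size, for illustration only
for n in range(2, 6):
    print(n, ffnn_lm_params(V, n, 100, 100), ffnn_lm_params(V, n, 1000, 1000))
```

Notice how weakly the count grows in $n$: only the $d_1 \cdot d_2 = (n-1)\,d_{\text{emb}}\,d_2$ term depends on it, whereas an n-gram table can grow as $|V|^n$ in the worst case.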
Question 3: Model Design
- a. Design a feedforward neural network to model $P(y_i \mid x)$. Identify any independence assumptions you make. Draw a diagram that illustrates how the model computes probabilities for the tag of the word “with”: What is the input, and how is the output distribution computed from the input? Write out the basic equations of the model, and explain your choices.
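One possible shape for such a model is sketched below, purely as a starting point: the fixed context window (an explicit independence assumption, since the tag depends only on that window), the tanh nonlinearity, and all dimensions are illustrative choices, not the required design:

```python
import math
import random

def tagger_probs(window_embeddings, W1, b1, W2, b2):
    """Feedforward tagger sketch for P(y_i | x).

    Concatenate the embeddings of a fixed window around position i
    (independence assumption: the tag ignores the rest of x), apply
    one tanh hidden layer, then a softmax over the tag set.
    """
    h_in = [v for emb in window_embeddings for v in emb]  # concatenation
    hidden = [math.tanh(sum(w * x for w, x in zip(row, h_in)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    m = max(scores)  # stability shift before exponentiating
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy usage with random weights: 3-word window, 4-dim embeddings,
# 5 hidden units, 3 tags -- all sizes hypothetical.
random.seed(0)
rand_vec = lambda k: [random.uniform(-1, 1) for _ in range(k)]
window = [rand_vec(4) for _ in range(3)]
W1 = [rand_vec(12) for _ in range(5)]   # 12 = 3 * 4 concatenated inputs
b1 = [0.0] * 5
W2 = [rand_vec(5) for _ in range(3)]
b2 = [0.0] * 3
probs = tagger_probs(window, W1, b1, W2, b2)
```

In equation form this is $\mathbf{h} = \tanh(W_1 \mathbf{e} + \mathbf{b}_1)$ and $P(y_i \mid x) = \operatorname{softmax}(W_2 \mathbf{h} + \mathbf{b}_2)$, where $\mathbf{e}$ is the concatenated window embedding.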