# NPTEL Introduction to Machine Learning Assignment 5 Answers 2022

## What is Introduction to Machine Learning?

With the increased availability of data from varied sources, there has been increasing attention paid to the various data-driven disciplines such as analytics and machine learning. In this course, we intend to introduce some of the basic concepts of machine learning from a mathematically well-motivated perspective. We will cover the different learning paradigms and some of the more popular algorithms and architectures used in each of these paradigms.

CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of the best 8 assignments out of the total 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >=10/25 AND EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
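The scoring rule above can be checked with a short sketch. It assumes each assignment is scored out of 100 (the rule only says "average"), and the function name `certificate_result` is hypothetical. It shows why a final score of 40 or more is not enough on its own.

```python
def certificate_result(assignment_scores, exam_score):
    """Apply the scoring rule described above (illustrative sketch).

    assignment_scores: scores for all 12 assignments, each assumed out of 100.
    exam_score: proctored exam score out of 100.
    """
    best_eight = sorted(assignment_scores, reverse=True)[:8]
    assignment_part = 0.25 * sum(best_eight) / len(best_eight)  # out of 25
    exam_part = 0.75 * exam_score                               # out of 75
    final = assignment_part + exam_part
    # Both individual thresholds must hold, regardless of the final total.
    eligible = assignment_part >= 10 and exam_part >= 30
    return assignment_part, exam_part, final, eligible


# 80/100 on eight assignments, 38/100 on the exam:
# final = 20 + 28.5 = 48.5, but 28.5 < 30, so no certificate.
print(certificate_result([80] * 8 + [0] * 4, 38))
```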

## NPTEL Introduction to Machine Learning Assignment 5 Answers July 2022

Q1. If the step size in gradient descent is too large, what can happen?

a. Overfitting
b. The model will not converge
c. We can reach maxima instead of minima
d. None of the above

`Answer:- b`
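A minimal illustration, assuming the toy loss f(x) = x², whose gradient is 2x. Each update multiplies x by (1 - 2η). With step size η = 0.1 the iterate shrinks towards the minimum at 0. With η = 1.1 the factor is -1.2, so x oscillates with growing magnitude and never converges.

```python
def gradient_descent(step_size, x0=1.0, iters=50):
    """Run plain gradient descent on f(x) = x**2 (toy example)."""
    x = x0
    for _ in range(iters):
        x = x - step_size * 2 * x  # gradient of x**2 is 2x
    return x


print(gradient_descent(0.1))  # close to 0: converges
print(gradient_descent(1.1))  # huge magnitude: diverges
```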

Q2. Recall the XOR example (tabulated below) from class, where we transformed the features to make the data linearly separable. Which of the following transformations can also work?

`Answer:- c, d`
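The table and options for this question are not reproduced here. The sketch below shows the general idea with one hypothetical transformation, appending the product feature x1·x2. It is not necessarily one of options (c) or (d). In the transformed 3D space, a single hyperplane separates the XOR classes.

```python
# XOR truth table: inputs (x1, x2) -> label
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def add_product_feature(x1, x2):
    # Hypothetical transformation: keep x1, x2 and append their product.
    return (x1, x2, x1 * x2)


def linear_classifier(features, weights=(1, 1, -2), bias=-0.5):
    # A single hyperplane in the transformed 3D feature space.
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


predictions = {x: linear_classifier(add_product_feature(*x)) for x in XOR}
print(predictions == XOR)  # True
```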

Q3. What is the effect of using the activation function f(x) = x for the hidden layers of an ANN?

a. No effect. It’s as good as any other activation function (sigmoid, tanh, etc.).
b. The ANN is equivalent to doing multi-output linear regression.
c. Backpropagation will not work.
d. We can model highly complex non-linear functions.

`Answer:- b`
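Why answer (b) holds: with the identity activation, stacking layers only composes linear maps. The product W2·W1 is itself a single linear map, so the network behaves like one linear (multi-output) regression. The sketch uses hypothetical 2×2 weights and omits biases. With biases the composition is still affine.

```python
def matmul(A, B):
    # Matrix product of A (m x k) and B (k x n).
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


W1 = [[1.0, 2.0], [3.0, -1.0]]   # hypothetical hidden-layer weights
W2 = [[0.5, -2.0], [1.0, 1.0]]   # hypothetical output-layer weights
x = [2.0, -1.0]

two_layer = matvec(W2, matvec(W1, x))   # identity activation between layers
collapsed = matvec(matmul(W2, W1), x)   # one linear map W = W2 W1
print(two_layer, collapsed)             # [-14.0, 7.0] [-14.0, 7.0]
```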

Q4. Which of the following functions can be used on the last layer of an ANN for classification?

a. Softmax
b. Sigmoid
c. Tanh
d. Linear

`Answer:- a, b, c`
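A small sketch of the two most common output functions for classification. Softmax turns a vector of scores into a probability distribution over classes. Sigmoid maps a single score into (0, 1). Tanh maps into (-1, 1), so it can be thresholded at 0 when labels are encoded as ±1. The logits below are arbitrary example values.

```python
import math


def softmax(logits):
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))   # class probabilities summing to 1
print(sigmoid(3.0))        # a single probability in (0, 1)
```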

Q5. Statement: A threshold function cannot be used as an activation function for hidden layers.
Reason: Threshold functions do not introduce non-linearity.

a. Statement is true and reason is false.
b. Statement is false and reason is true.
c. Both are true and the reason explains the statement.
d. Both are true and the reason does not explain the statement.

`Answer:- a`
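One common way to read answer (a): the step (threshold) function *is* non-linear, so the reason is false. The practical problem for hidden layers is that its derivative is zero almost everywhere, so backpropagation receives no gradient. The sketch checks both points. It assumes a step that outputs 1 for z ≥ 0, and `numerical_derivative` is a hypothetical helper.

```python
def step(z):
    return 1.0 if z >= 0 else 0.0


# A linear function f satisfies f(a + b) == f(a) + f(b); the step function does not.
a, b = 1.0, 1.0
print(step(a + b), step(a) + step(b))  # 1.0 2.0


def numerical_derivative(f, z, h=1e-6):
    # Central finite difference.
    return (f(z + h) - f(z - h)) / (2 * h)


print(numerical_derivative(step, 0.7))  # 0.0: no gradient signal for backpropagation
```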

Q6. We use several techniques to keep the weights of a neural network small (such as random initialization around 0, or regularisation). What conclusions can we draw if the weights of our ANN are high?

a. Model has overfitted.
b. It was initialized incorrectly.
c. At least one of (a) or (b).
d. None of the above.

`Answer:- d`

Q7. On different initializations of your neural network, you get significantly different values of loss. What could be the reason for this?

a. Overfitting
b. Some problem in the architecture
c. Incorrect activation function
d. Multiple local minima

`Answer:- d`
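To illustrate answer (d), here is gradient descent on a hypothetical non-convex 1D loss, f(x) = x⁴ - 3x² + x, which has two local minima. Starting from x = -2 ends near x ≈ -1.30. Starting from x = 2 ends near x ≈ 1.13, where the loss is noticeably higher. A real network's loss surface has far more dimensions, but the mechanism is the same.

```python
def loss(x):
    # A hypothetical non-convex 1D "loss" with two local minima.
    return x**4 - 3 * x**2 + x


def grad(x):
    return 4 * x**3 - 6 * x + 1


def train(x_init, lr=0.01, iters=1000):
    x = x_init
    for _ in range(iters):
        x -= lr * grad(x)
    return x


for x_init in (-2.0, 2.0):
    x_final = train(x_init)
    print(f"init={x_init:+.1f} -> x={x_final:.3f}, loss={loss(x_final):.3f}")
```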

Q8. The likelihood L(θ|X) is given by:

`Answer:- b`
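The answer options are not reproduced here. For reference, the standard definition, assuming the data X = {x_1, …, x_n} are i.i.d. samples, is the probability of the observed data viewed as a function of θ. In practice its logarithm is usually maximised:

$$
L(\theta \mid X) = p(X \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta),
\qquad
\log L(\theta \mid X) = \sum_{i=1}^{n} \log p(x_i \mid \theta)
$$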

Q9. You are trying to estimate the probability of it raining today using maximum likelihood estimation. Given that in $n$ days it rained $n_r$ times, what is the probability of it raining today?

`Answer:- a`
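A sketch of the standard result, assuming each of the n observed days is an independent Bernoulli trial that rained on n_r days. Setting the derivative of the log-likelihood n_r·log p + (n - n_r)·log(1 - p) to zero gives p̂ = n_r / n. The function names are illustrative.

```python
import math


def bernoulli_log_likelihood(p, n, n_r):
    # log L(p) for n_r rainy days out of n independent days
    return n_r * math.log(p) + (n - n_r) * math.log(1 - p)


def mle_rain_probability(n, n_r):
    # d/dp log L = n_r/p - (n - n_r)/(1 - p) = 0  =>  p = n_r / n
    return n_r / n


print(mle_rain_probability(10, 3))  # 0.3
```

A brute-force search over p in (0, 1) lands on the same value, which is a quick sanity check on the closed form.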

Q10. Choose the correct statement(s) (multiple may be correct):

a. MLE is a special case of MAP when prior is a uniform distribution.
b. MLE acts as regularisation for MAP.
c. MLE is a special case of MAP when prior is a beta distribution.
d. MAP acts as regularisation for MLE.

`Answer:- a, d`
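A sketch of both correct statements, using the rain example with a Beta(α, β) prior on the rain probability (the prior choice is an assumption for illustration). The MAP estimate is the posterior mode (n_r + α - 1) / (n + α + β - 2). With α = β = 1 (the uniform prior) it equals the MLE n_r / n. A stronger prior pulls the estimate towards 0.5, which acts as regularisation on the MLE.

```python
def mle(n, n_r):
    return n_r / n


def map_beta(n, n_r, alpha, beta):
    # Mode of the Beta(n_r + alpha, n - n_r + beta) posterior
    # (valid when both posterior parameters exceed 1).
    return (n_r + alpha - 1) / (n + alpha + beta - 2)


print(mle(10, 9))               # 0.9
print(map_beta(10, 9, 1, 1))    # 0.9: uniform prior reproduces the MLE
print(map_beta(10, 9, 5, 5))    # about 0.722: shrunk towards 0.5
```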

## NPTEL Introduction to Machine Learning Assignment 5 Answers Jan 2022

Q1. The last layer of an ANN is linear for _________ and softmax for __________.

Q2. Consider the following statement and answer True/False with the corresponding reason:

The class outputs of a classification problem with an ANN cannot be treated independently.

Q3. Below are two views of an error surface of an ANN (these are 3D plots of the same error function, shown twice from different 2D viewpoints for clarity). Given these plots, what do you need to keep in mind while training?

Q4. What happens if we do not use an activation function in an ANN?

Q5. Given below is a simple ANN with 2 inputs X1, X2 ∈ {0, 1} and edge weights -3, +2, +2
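The figure for this question is not reproduced here. The sketch assumes that -3 is the weight on a constant bias input of 1, that +2 and +2 are the weights on X1 and X2, and that the unit uses a threshold activation (output 1 when the weighted sum is positive). Under those assumptions the network computes logical AND. If the figure uses a different activation, such as sigmoid, the outputs change.

```python
def step(z):
    return 1 if z > 0 else 0


def tiny_ann(x1, x2, bias=-3, w1=2, w2=2):
    # Hypothetical reading of the figure: bias weight -3, input weights +2, +2.
    return step(bias + w1 * x1 + w2 * x2)


truth_table = {(x1, x2): tiny_ann(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(truth_table)  # only (1, 1) gives 1, i.e. AND
```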

Q6. Consider the following function.

Q7. Using the notation from class, evaluate the value of the neural network with a 3-3-1 architecture (2-dimensional input with 1 node for the bias term in both layers). The parameters are as follows:
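The actual parameters are not reproduced here, so this is only a sketch of how such a forward pass is computed. It assumes sigmoid activations in both layers and a bias-first weight layout; `forward_3_3_1` and its argument layout are hypothetical. The input layer has 3 nodes (bias + 2 inputs), the hidden layer has 3 nodes (bias + 2 units), and there is 1 output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_3_3_1(x, W_hidden, w_out):
    """x: 2-dim input.
    W_hidden: 2 rows of 3 weights (bias weight first), one row per hidden unit.
    w_out: 3 weights (bias weight first) for the single output unit.
    """
    a0 = [1.0] + list(x)                      # prepend bias node to the input
    hidden = [sigmoid(sum(w * a for w, a in zip(row, a0))) for row in W_hidden]
    a1 = [1.0] + hidden                       # prepend bias node to the hidden layer
    return sigmoid(sum(w * a for w, a in zip(w_out, a1)))


# With all-zero (placeholder) weights every unit outputs sigmoid(0) = 0.5.
print(forward_3_3_1([1.0, 2.0], [[0, 0, 0], [0, 0, 0]], [0, 0, 0]))
```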

Q8. Logistic regression is a special case of an ANN with:
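For reference, logistic regression computes σ(w·x + b). This is exactly a network with no hidden layer and a single sigmoid output unit. A minimal sketch (function name illustrative):

```python
import math


def logistic_regression(x, w, b):
    # A single sigmoid unit with no hidden layers: sigma(w . x + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


print(logistic_regression([2.0, 1.0], [1.0, 1.0], 0.0))  # sigma(3), about 0.95
```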

Q9. Which of these are limitations of the backpropagation algorithm?

Q10. Which of these are true about the learning rate (multiple may be correct)?