# NPTEL Introduction to Machine Learning Assignment 8 Answers

## About Introduction To Machine Learning

With the increased availability of data from varied sources, there has been increasing attention paid to the various data-driven disciplines such as analytics and machine learning. In this course, we intend to introduce some of the basic concepts of machine learning from a mathematically well-motivated perspective. We will cover the different learning paradigms and some of the more popular algorithms and architectures used in each of these paradigms.

## NPTEL Introduction to Machine Learning Assignment 8 Answers 2022 {July – Dec}

1. The figure below shows a Bayesian Network with 9 variables, all of which are binary.

Which of the following is/are always true for the above Bayesian Network?

2. Consider the following data for 20 budget phones, 30 mid-range phones, and 20 high-end phones:

Consider a phone with 2 SIM card slots and NFC but no 5G compatibility. Calculate the probabilities of this phone being a budget phone, a mid-range phone, and a high-end phone using the Naive Bayes method. What is the correct ordering of the phone types from the highest to the lowest probability?

a. Budget, Mid-Range, High End
b. Budget, High End, Mid-Range
c. Mid-Range, High End, Budget
d. High End, Mid-Range, Budget
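
The calculation this question asks for can be sketched as follows. The data table is not reproduced on this page, so the per-class feature counts in `counts` are hypothetical placeholders; only the class sizes (20, 30, 20) come from the question. The feature names (`dual_sim`, `nfc`, `5g`) are likewise assumptions made for illustration. Substitute the real counts to obtain the actual ordering.

```python
# Class sizes are taken from the question text.
class_sizes = {"Budget": 20, "Mid-Range": 30, "High End": 20}

# HYPOTHETICAL counts: number of phones in each class that have the feature.
# The original table is missing, so these values are placeholders only.
counts = {
    "Budget":    {"dual_sim": 15, "nfc": 5,  "5g": 2},
    "Mid-Range": {"dual_sim": 20, "nfc": 15, "5g": 12},
    "High End":  {"dual_sim": 8,  "nfc": 18, "5g": 16},
}


def naive_bayes_scores(query, class_sizes, counts):
    """Return the unnormalized score P(class) * prod P(feature_i | class)."""
    total = sum(class_sizes.values())
    scores = {}
    for cls, n in class_sizes.items():
        score = n / total  # class prior
        for feature, present in query.items():
            p = counts[cls][feature] / n  # P(feature present | class)
            score *= p if present else (1.0 - p)
        scores[cls] = score
    return scores


def normalize(scores):
    """Divide by the sum over classes to obtain posterior probabilities."""
    z = sum(scores.values())
    return {cls: s / z for cls, s in scores.items()}


# The phone in the question: 2 SIM slots, NFC, no 5G.
query = {"dual_sim": True, "nfc": True, "5g": False}
posteriors = normalize(naive_bayes_scores(query, class_sizes, counts))
ranking = sorted(posteriors, key=posteriors.get, reverse=True)
print(ranking)
```

Because the counts are placeholders, the printed ranking says nothing about the correct option; it only illustrates the mechanics of multiplying the prior by one conditional probability per feature.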

3. Consider the following dataset where outlook, temperature, humidity, and wind are independent features, and play is the dependent feature.

Find the probability that the student will not play given that x = (Outlook=sunny, Temperature=66, Humidity=90, Windy=True) using the Naive Bayes method. (Assume the continuous features are represented as Gaussian distributions).

a. 0.0001367
b. 0.0000358
c. 0.0000236
d. 1
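
With continuous features modeled as Gaussians, each class score is the prior multiplied by the categorical probabilities and the Gaussian densities of the continuous values. The sketch below shows that computation. The dataset is not reproduced on this page, so every number in `stats_no` (prior, conditional probabilities, means, and standard deviations) is a hypothetical placeholder, and the dictionary layout itself is an assumption made for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) evaluated at x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))


# HYPOTHETICAL statistics for the class play = no; replace them with values
# estimated from the actual dataset.
stats_no = {
    "prior": 0.4,             # P(play = no)
    "p_sunny": 0.5,           # P(Outlook = sunny | play = no)
    "p_windy": 0.5,           # P(Windy = True | play = no)
    "temp": (70.0, 8.0),      # (mean, std) of Temperature given play = no
    "humidity": (85.0, 10.0), # (mean, std) of Humidity given play = no
}


def class_score(stats, temperature, humidity):
    """Unnormalized Naive Bayes score for x = (sunny, temperature, humidity, windy)."""
    return (
        stats["prior"]
        * stats["p_sunny"]
        * stats["p_windy"]
        * gaussian_pdf(temperature, *stats["temp"])
        * gaussian_pdf(humidity, *stats["humidity"])
    )


score_no = class_score(stats_no, temperature=66, humidity=90)
print(score_no)
```

To turn this score into a posterior probability, compute the same quantity for play = yes and divide each score by their sum.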

4. Which among Gradient Boosting and AdaBoost is less susceptible to outliers considering their respective loss functions?

c. On average, both are equally susceptible.

5. How do you prevent overfitting in random forest models?

a. Increasing Tree Depth.
b. Increasing the number of variables sampled at each split.
c. Increasing the number of trees.
d. All of the above.

6. A dataset with two classes is plotted below.

Does the data satisfy the Naive Bayes assumption?

a. Yes
b. No
c. The given data is insufficient
d. None of these

7. Ensembling in random forest classifier helps in achieving:

a. reduction of bias error
b. reduction of variance error
c. reduction of data dimension
d. none of the above

## NPTEL Introduction to Machine Learning Assignment 8 Answers 2022 {Jan – June}

Q1. Consider the two statements:
Statement 1: Gradient Boosted Decision Trees can overfit easily.
Statement 2: It is easy to parallelize Gradient Boosted Decision Trees.
Which of these are true?

a. Both the statements are True.
b. Statement 1 is true, and statement 2 is false.
c. Statement 1 is false, and statement 2 is true.
d. Both the statements are false.

Answer:- b. Statement 1 is true, and statement 2 is false.
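
A minimal sketch of gradient boosting with squared loss helps illustrate both statements. It uses one-feature decision stumps written from scratch rather than any particular library, and the function names, toy data, and hyperparameter values are assumptions chosen for illustration. Each stage is fitted to the residuals left by all previous stages, so the stages cannot be trained independently of one another, and adding more stages keeps pushing the training error toward zero.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a 1-D feature by least squares."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left)
        err += sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_stages=20, learning_rate=0.5):
    """Boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_stages):
        # The residuals depend on every earlier stage, so stage k cannot
        # start until stage k-1 has finished.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy data (hypothetical): a single step between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
print(model(1), model(6))
```

On noisy data, the same mechanism that drives the training error down will eventually start fitting the noise, which is why the number of stages and the learning rate are usually tuned on held-out data.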

Q2. A company hires you to look at their classification system for whether a given customer would potentially buy their product.
When you check the existing classifier on different folds of the training set, you find that it manages a low accuracy of usually around 60%.
Sometimes, it’s barely above 50%.

With this information in mind, and without using additional classifiers, which of the following ensemble methods would you use to increase the classification accuracy effectively?

• Committee Machine
• Bagging
• Stacking

Q3. Which of the following algorithms don’t use learning rate as a hyperparameter?

• Random Forests
• KNN
• PCA

Q4. Consider the following data for 500 instances of home, 600 instances of office, and 700 instances of factory-type buildings:

Suppose a building has a balcony and power backup but is not multi-storied. According to the Naive Bayes algorithm, it is of type:

• Home
• Office
• Factory

Q5. A dataset with two classes is plotted below.

Q6. Which of these statements is/are True about Random Forests?

Q7. Consider the below dataset:

Q8. Consider the two statements:
Statement 1: Bayesian Networks need not always be Directed Acyclic Graphs (DAGs).
Statement 2: Each node in a Bayesian network represents a random variable, and each edge represents conditional dependence.
Which of these are true?