## What is Introduction to Machine Learning?

With the increased availability of data from varied sources, there has been increasing attention paid to the various data-driven disciplines such as analytics and machine learning. In this course, we intend to introduce some of the basic concepts of machine learning from a mathematically well-motivated perspective. We will cover the different learning paradigms and some of the more popular algorithms and architectures used in each of these paradigms.

## CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of the best 8 of the 12 assignments given in the course.

Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

**YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >= 10/25 AND THE EXAM SCORE >= 30/75. If either criterion is not met, you will not get the certificate even if the final score >= 40/100.**

## NPTEL Introduction to Machine Learning Assignment 10 Answers 2022

**Q1.** The pairwise distances between 6 points are given below. Which of the options shows the hierarchy of clusters created by the single-link clustering algorithm?

- (a)
- (b)
- (c)
- (d)

**Answer: (b)**
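To see how single-link clustering builds its hierarchy, here is a naive pure-Python sketch. The question's own distance matrix is an image that is not reproduced in this post, so `HYPOTHETICAL_DIST` below is an invented 4-point matrix used purely for illustration; substitute the real matrix to check the options.

```python
# Naive single-link agglomerative clustering on a precomputed distance matrix.
# HYPOTHETICAL_DIST is an invented example, NOT the matrix from the assignment.
HYPOTHETICAL_DIST = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]


def single_link_merges(dist):
    """Return the merge sequence as (members_a, members_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: the distance between two clusters is the
                # distance between their closest pair of members.
                d = min(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        # Replace the two merged clusters with their union.
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


for a, b, d in single_link_merges(HYPOTHETICAL_DIST):
    print(f"merge {a} + {b} at distance {d}")
```

For this made-up matrix, points 0 and 1 merge first (distance 1), then points 2 and 3 (distance 2), and the two pairs join at distance 3, the smallest distance between any member of one pair and any member of the other.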

**Q2.** For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete-link clustering algorithm?

- (a)
- (b)
- (c)
- (d)

**Answer: (b)**
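Complete link differs from single link only in using the largest pairwise distance between two clusters instead of the smallest. A minimal sketch with SciPy's `linkage`, run on an invented 4-point matrix (again, the assignment's matrix is not reproduced here), makes the difference visible in the merge heights:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Invented 4-point distance matrix for illustration only; it is NOT the
# matrix from the assignment.
dist = np.array([
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
], dtype=float)

# linkage() expects a condensed (upper-triangular) distance vector.
condensed = squareform(dist)

single = linkage(condensed, method="single")
complete = linkage(condensed, method="complete")

# Each row of a linkage matrix is [cluster_i, cluster_j, merge_height, size].
print("single-link heights:  ", single[:, 2])
print("complete-link heights:", complete[:, 2])
```

Both methods merge {0, 1} and {2, 3} first here, but complete link joins the two pairs at height 6 (the largest cross-pair distance) rather than 3, which is why the two dendrograms can differ in height and, for other matrices, in shape.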

**Q3.** In BIRCH, using the number of points **N**, the sum of points **SUM** and the sum of squared points **SS**, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster (in terms of **N**, **SUM** and **SS** of both clusters A and B)?

**Answer: c**
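BIRCH summarises a cluster by its clustering feature (N, SUM, SS). The three components are additive, so the feature of A ∪ B is (N_A + N_B, SUM_A + SUM_B, SS_A + SS_B), and the radius follows as R = sqrt(SS/N - ||SUM/N||^2). The sketch below assumes that "radius" means the root-mean-square distance of the points from the centroid and that SS is the scalar sum of squared norms; it checks the merged-feature radius against a direct computation on random points.

```python
import numpy as np


def clustering_feature(points):
    """Return the BIRCH clustering feature (N, SUM, SS) of a set of points."""
    points = np.asarray(points, dtype=float)
    # SS is taken here as the scalar sum of squared norms of the points.
    return len(points), points.sum(axis=0), float((points ** 2).sum())


def merge_cf(cf_a, cf_b):
    """Clustering features are additive, so merging is component-wise addition."""
    n_a, sum_a, ss_a = cf_a
    n_b, sum_b, ss_b = cf_b
    return n_a + n_b, sum_a + sum_b, ss_a + ss_b


def cf_radius(cf):
    """Root-mean-square distance from the centroid: sqrt(SS/N - ||SUM/N||^2)."""
    n, s, ss = cf
    centroid = s / n
    # Clamp tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(ss / n - centroid @ centroid, 0.0)))


rng = np.random.default_rng(0)
a = rng.normal(size=(10, 2))
b = rng.normal(loc=3.0, size=(15, 2))

r_from_cf = cf_radius(merge_cf(clustering_feature(a), clustering_feature(b)))
both = np.vstack([a, b])
r_direct = float(np.sqrt(((both - both.mean(axis=0)) ** 2).sum(axis=1).mean()))
print(r_from_cf, r_direct)  # the two values agree up to round-off
```

The point of the design is that the combined radius never needs the raw points: the stored (N, SUM, SS) of A and B are enough.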

**Q4.** Statement 1: CURE is robust to outliers.

Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.

**Answer: a**
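In CURE, each cluster is represented by a few well-scattered points that are then shrunk toward the cluster mean by a fraction α, i.e. p' = p + α(mean - p). A point far from the mean, such as an outlier, is moved a proportionally larger distance, which is the dampening Statement 2 describes. The sketch below assumes a farthest-point heuristic for picking the scattered points and uses illustrative values `n_rep=4` and `alpha=0.3`; it covers only this representative-point step, not sampling, partitioning or the hierarchical merging of full CURE.

```python
import numpy as np


def cure_representatives(points, n_rep=4, alpha=0.3):
    """Pick well-scattered points of one cluster and shrink them toward its mean."""
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    # First representative: the point farthest from the mean.
    chosen = [int(np.argmax(np.linalg.norm(points - mean, axis=1)))]
    while len(chosen) < min(n_rep, len(points)):
        # Next representative: the point farthest from all already-chosen ones.
        gaps = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
        chosen.append(int(np.argmax(gaps.min(axis=1))))
    scattered = points[chosen]
    # Multiplicative shrinkage: move each point a fraction alpha of the way to the mean.
    return scattered + alpha * (mean - scattered)


# Four tightly grouped points plus one outlier at (10, 10).
cluster = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [10, 10]], dtype=float)
reps = cure_representatives(cluster)
print(reps)
```

Here the mean is (2.4, 2.4), so the outlier's representative lands at about (7.72, 7.72): it moves roughly 3.2 units, while the representative taken from (0, 0) moves only about 1.0.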

**Q5.** Run K-means on the input features of the iris dataset using the following initialization:

**Answer: b**
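A minimal sketch of running K-means from a fixed initialization with scikit-learn. The initialization given in the question is not reproduced in this post, so `hypothetical_init` below (the first sample of each Iris species) is only a placeholder; replace it with the centres from the question. Passing an explicit array as `init` together with `n_init=1` makes scikit-learn run a single K-means pass from exactly those centres.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Placeholder only: NOT the initialization from the assignment.
hypothetical_init = X[[0, 50, 100]]

kmeans = KMeans(n_clusters=3, init=hypothetical_init, n_init=1)
y_pred = kmeans.fit_predict(X)

print(np.bincount(y_pred))  # number of points in each cluster
print(kmeans.cluster_centers_)
```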

**Q6.** For the same clusters obtained in the previous question, calculate the rand-index. Formula for rand-index:

**Answer: a**
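The rand-index formula in the question is an image that is not reproduced here. The sketch below assumes the standard definition RI = (a + b) / C(n, 2), where a counts pairs of points placed in the same cluster by both labelings, b counts pairs placed in different clusters by both, and C(n, 2) is the total number of pairs; this agrees with how a and b are described in the next question.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Rand index by explicit pair counting: (a + b) / number of pairs."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    if len(labels_true) < 2:
        raise ValueError("need at least two points to form a pair")
    a = b = total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1  # pair grouped together in both labelings
        elif not same_true and not same_pred:
            b += 1  # pair separated in both labelings
        total += 1
    return (a + b) / total


# Cluster ids are arbitrary, so a relabelled but identical partition scores 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Applied to the Iris species labels and the K-means cluster labels from the previous question, this gives the requested value. scikit-learn 0.24 and later also provide `sklearn.metrics.rand_score`, which implements the same quantity.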

**Q7.** *a* in the rand-index can be viewed as true positives (pairs of points belonging to the same cluster) and *b* as true negatives (pairs of points belonging to different clusters). How, then, are the rand-index and accuracy from the previous two questions related?

**Answer: d**
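Following the question's framing, every pair of points can be treated as a binary "same cluster or not" prediction. Counting true positives (a), true negatives (b), false positives and false negatives over pairs shows that the rand-index equals accuracy computed on pairs of points rather than on individual points. A small pure-Python sketch:

```python
from itertools import combinations


def pair_confusion(labels_true, labels_pred):
    """Count TP, FP, FN, TN over all unordered pairs of points."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_pred and same_true:
            tp += 1  # "a" in the rand-index formula
        elif same_pred:
            fp += 1  # grouped together, but truly in different classes
        elif same_true:
            fn += 1  # split apart, but truly in the same class
        else:
            tn += 1  # "b" in the rand-index formula
    return tp, fp, fn, tn


tp, fp, fn, tn = pair_confusion([0, 0, 1, 1], [0, 1, 0, 1])
pair_accuracy = (tp + tn) / (tp + fp + fn + tn)  # this is the rand-index
print(tp, fp, fn, tn, pair_accuracy)
```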

**Q8.** Run BIRCH on the input features of the iris dataset using **Birch(n_clusters=3, threshold=1)**. What is the rand-index obtained?

**Answer: c**
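A sketch of this run with scikit-learn, assuming version 0.24 or later for `sklearn.metrics.rand_score`. `load_iris(return_X_y=True)` returns the four input features and the species labels; only the features are clustered, and the labels are used solely to score the result.

```python
from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)

# Settings from the question: 3 final clusters, subcluster threshold of 1.
birch = Birch(n_clusters=3, threshold=1)
y_pred = birch.fit_predict(X)

ri = rand_score(y, y_pred)
print(f"rand-index: {ri:.4f}")
```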

**Q9.** Run BIRCH with each of the following values of the *threshold* parameter: [0.01, 0.02, 0.03, …, 0.99, 1.00], using the same command as in the previous question. Which threshold value achieves the best rand-index?

**Answer: b**
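The sweep can be sketched as a loop over the 100 thresholds, again assuming scikit-learn 0.24 or later for `rand_score`. The thresholds are rounded to two decimals so that values like 0.3 are not represented as 0.30000000000000004 in the results.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)

# 0.01, 0.02, ..., 1.00
thresholds = np.round(np.arange(1, 101) * 0.01, 2)

scores = {}
for t in thresholds:
    labels = Birch(n_clusters=3, threshold=float(t)).fit_predict(X)
    scores[float(t)] = rand_score(y, labels)

# max() returns the first (smallest) threshold reaching the top score if there are ties.
best_threshold = max(scores, key=scores.get)
print(best_threshold, scores[best_threshold])
```

If several thresholds tie for the best score, this reports the smallest one, so it is worth printing the full `scores` dictionary before matching against the options.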

**Q10.** Run PCA on the Iris dataset input features with *n_components=2*. Then run DBSCAN using **DBSCAN(eps=0.5, min_samples=5)** on both the original features and the PCA features. How many outliers/noisy points does DBSCAN detect in each case?

As an extra, you can plot the PCA features on a 2D plot using *matplotlib.pyplot.scatter* with the parameter *c=y_pred* (where *y_pred* is the cluster prediction) to visualise the clusters and outliers.

**Answer: b**
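A sketch with scikit-learn using the parameters stated in the question. DBSCAN assigns the label -1 to points it treats as noise, so counting those labels gives the number of outliers. PCA is fitted on the unscaled features here, on the assumption that the question intends no scaling step.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)


def count_noise(features):
    """Run DBSCAN with the question's settings and count points labelled -1 (noise)."""
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(features)
    return int((labels == -1).sum())


noise_original = count_noise(X)
noise_pca = count_noise(X_pca)
print(noise_original, noise_pca)
```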

