# NPTEL Introduction to Machine Learning Assignment 10 Answers

## What is Introduction to Machine Learning?

With the increased availability of data from varied sources, there has been increasing attention paid to the various data-driven disciplines such as analytics and machine learning. In this course, we intend to introduce some of the basic concepts of machine learning from a mathematically well-motivated perspective. We will cover the different learning paradigms and some of the more popular algorithms and architectures used in each of these paradigms.

## CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of best 8 assignments out of the total 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >=10/25 AND EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.

## NPTEL Introduction to Machine Learning Assignment 10 Answers 2022

Q1. The pairwise distance between 6 points is given below. Which of the options shows the hierarchy of clusters created

• (a)
• (b)
• (c)
• (d)

Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm?

• (a)
• (b)
• (c)
• (d)

Q3. In BIRCH, using the number of points N, the sum of points SUM, and the sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster (in terms of N, SUM, and SS of the two clusters A and B)?
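As a rough illustration of the clustering-feature arithmetic that Q3 refers to, the sketch below merges two (N, SUM, SS) triples by component-wise addition and derives the centroid and radius from the merged triple alone. It assumes SS is the scalar sum of squared norms of the points and that the radius is the root-mean-square distance of the points from the centroid; the helper names `make_cf`, `merge_cf`, `centroid` and `radius` are made up for this example.

```python
import math


def make_cf(points):
    """Build a clustering feature (N, SUM, SS) from raw points.

    Assumption: SS is the scalar sum of squared norms of the points.
    """
    n = len(points)
    dim = len(points[0])
    linear_sum = [sum(p[d] for p in points) for d in range(dim)]
    squared_sum = sum(x * x for p in points for x in p)
    return n, linear_sum, squared_sum


def merge_cf(cf_a, cf_b):
    """Merge two clustering features by adding them component-wise."""
    n_a, sum_a, ss_a = cf_a
    n_b, sum_b, ss_b = cf_b
    merged_sum = [x + y for x, y in zip(sum_a, sum_b)]
    return n_a + n_b, merged_sum, ss_a + ss_b


def centroid(cf):
    """Centroid = SUM / N."""
    n, linear_sum, _ = cf
    return [x / n for x in linear_sum]


def radius(cf):
    """Radius = sqrt(SS / N - ||SUM / N||^2), the RMS distance to the centroid."""
    n, linear_sum, squared_sum = cf
    centroid_sq_norm = sum((x / n) ** 2 for x in linear_sum)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(squared_sum / n - centroid_sq_norm, 0.0))


# Two small illustrative clusters (made-up data).
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(0.0, 2.0), (2.0, 2.0)]

merged = merge_cf(make_cf(cluster_a), make_cf(cluster_b))
print("merged CF:", merged)
print("centroid:", centroid(merged))
print("radius:", radius(merged))
```

The point of the sketch is that the merged radius needs only N_A + N_B, SUM_A + SUM_B and SS_A + SS_B, so the raw points never have to be revisited.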

Q4. Statement 1: CURE is robust to outliers.

Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.

Q5. Run K-means on the input features of the iris dataset using the following initialization:

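Q6. For the same clusters obtained in the previous question, calculate the rand-index. Formula for rand-index: $R = \frac{a + b}{\binom{n}{2}}$, where $n$ is the number of points, $a$ is the number of pairs of points placed in the same cluster by both the predicted and the true labelling, and $b$ is the number of pairs placed in different clusters by both.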

Q7. In the rand-index, a can be viewed as true positives (pairs of points belonging to the same cluster) and b as true negatives (pairs of points belonging to different clusters). How, then, are rand-index and accuracy from the previous two questions related?
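The pair-counting view described in Q7 can be sketched in plain Python. This is only an illustrative implementation of the standard rand-index, (a + b) divided by the total number of pairs; the function name `rand_index` and the toy labels are assumptions for the example, not part of the assignment.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Rand-index = (a + b) / (number of pairs of points).

    a: pairs grouped together in both labellings ("true positives").
    b: pairs separated in both labellings ("true negatives").
    """
    a = 0
    b = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif not same_true and not same_pred:
            b += 1
        total_pairs += 1
    return (a + b) / total_pairs


# Toy labels (made up); cluster ids need not match the class ids.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 2, 2]
score = rand_index(y_true, y_pred)
print("rand-index:", score)
```

Because only agreement on pairs is counted, relabelling the clusters (for example swapping cluster ids 0 and 1) leaves the score unchanged.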

Q8. Run BIRCH on the input features of the iris dataset using Birch(n_clusters=3, threshold=1). What is the rand-index obtained?

Q9. Run BIRCH with each of the following values of the threshold parameter: [0.01, 0.02, 0.03, …, 0.99, 1.00], using the same command as in the previous question. What value of threshold achieves the best rand-index?
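One possible way to set up the experiment in Q8 and Q9 is sketched below. It assumes scikit-learn 0.24 or later (for `sklearn.metrics.rand_score`) and loads iris through `load_iris`; the helper `birch_rand_index` is a hypothetical name, and the results may vary slightly across library versions, so no expected values are shown.

```python
import warnings

from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

# Input features and true class labels of the iris dataset.
X, y = load_iris(return_X_y=True)


def birch_rand_index(threshold):
    """Fit BIRCH with the given threshold and score it against the true labels."""
    with warnings.catch_warnings():
        # A large threshold can yield fewer than 3 subclusters, which makes
        # scikit-learn emit a warning; it is silenced here for readability.
        warnings.simplefilter("ignore")
        labels = Birch(n_clusters=3, threshold=threshold).fit_predict(X)
    return rand_score(y, labels)


# Thresholds 0.01, 0.02, ..., 1.00 built from integers to avoid float drift.
thresholds = [i / 100 for i in range(1, 101)]
scores = {t: birch_rand_index(t) for t in thresholds}

# On ties, max() keeps the first (smallest) threshold with the best score.
best_threshold = max(scores, key=scores.get)
print("rand-index at threshold=1.0:", scores[1.0])
print("best threshold:", best_threshold, "rand-index:", scores[best_threshold])
```

Several thresholds may tie for the best score, so it is worth inspecting `scores` rather than relying only on `best_threshold`.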

Q10. Run PCA on the Iris dataset input features with n_components=2. Now run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. What are the respective numbers of outliers/noisy points detected by DBSCAN?

As an extra, you can plot the PCA features on a 2D plot using matplotlib.pyplot.scatter with parameter c=y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.
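A minimal sketch of the Q10 experiment, assuming scikit-learn is available and that iris is loaded via `load_iris`, could look like the following. DBSCAN marks noisy points with the label -1, so counting that label gives the number of outliers; `count_noise` is a hypothetical helper name, and the counts are not stated here because they should be checked by running the code.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Original 4-dimensional input features of iris.
X, _ = load_iris(return_X_y=True)

# Project onto the first two principal components.
X_pca = PCA(n_components=2).fit_transform(X)


def count_noise(features):
    """Run DBSCAN and return (number of noise points, cluster labels)."""
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(features)
    # DBSCAN assigns the label -1 to points it treats as noise.
    return int((labels == -1).sum()), labels


noise_original, _ = count_noise(X)
noise_pca, y_pred = count_noise(X_pca)

print("noise points on original features:", noise_original)
print("noise points on PCA features:", noise_pca)
```

The `y_pred` array from the PCA run can then be passed as `c` to `matplotlib.pyplot.scatter(X_pca[:, 0], X_pca[:, 1], c=y_pred)` to draw the plot suggested above.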