## NPTEL Data Science for Engineers Assignment 7 Answers 2022 (July – Dec)

**1. What are the dimensions of the dataframe?**

a. (150, 5)

b. (150, 4)

c. (50, 5)

d. None of the above

Answer:-a

**2. What can you comment on the distribution of the independent variables in the dataframe?**

a. All the variables are normally distributed

b. The variables Sepal Length and Sepal Width are not normally distributed

c. The variable Petal Length alone is normally distributed

d. None of the above

Answer:-a

**3. How many rows in the dataset contain missing values?**

a. 10

b. 5

c. 25

d. 0

Answer:-d

**4. What can be interpreted from the plot shown below?**

a. Sepal widths of Versicolor flowers are less than 3 cm

b. Sepal lengths of Setosa flowers are less than 6 cm

c. Sepal lengths of Virginica flowers are greater than 6 cm

d. Sepals of Setosa flowers are relatively wider than those of Versicolor flowers

Answer:-b, d

**5. Which of the following code blocks can be used to summarize the data (finding the mean of the columns PetalLength and PetalWidth), similar to the one given below?**

a. lapply(iris_data[, 3:4], mean)

b. sapply(iris_data[, 3:4], 2, mean)

c. apply(iris_data[, 3:4], 2, mean)

d. apply(iris_data[, 3:4], 1, mean)

Answer:-a, c
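In R, `apply(x, 2, mean)` applies `mean` down each column and `apply(x, 1, mean)` applies it across each row, while `lapply` and `sapply` iterate over the columns of a data frame. That is why options a and c give per-column means, and option d gives one mean per flower instead. Option b fails because `sapply` takes the function as its second argument, so `2` is treated as the function. A minimal Python sketch of the column-versus-row distinction, using made-up measurements rather than the actual iris data:

```python
from statistics import mean

# Hypothetical values, NOT the iris data: each row is (PetalLength, PetalWidth).
rows = [
    (1.0, 0.25),
    (4.0, 1.0),
    (7.0, 1.75),
]

# Column-wise means: the analogue of apply(x, 2, mean) or lapply(x, mean) in R.
column_means = [mean(col) for col in zip(*rows)]

# Row-wise means: the analogue of apply(x, 1, mean) in R.
row_means = [mean(row) for row in rows]

print(column_means)  # one value per column
print(row_means)     # one value per row
```

The column-wise result has two entries (one per variable), whereas the row-wise result has as many entries as there are rows, which is not the requested summary.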

**6. Which of the following packages must be imported to use the logistic regression function glm()?**

a. ROCR

b. dplyr

c. Tools

d. None of the above

Answer:-d

**7. Which of the following parameters are significant at the 95% confidence level?**

a. Sepal Length

b. Intercept

c. Petal Width

d. Petal Length

Answer:-b, c, d

**8. What is the coefficient of the variable Sepal Width in the fitted model?**

a. 1.24

b. -0.15

c. -0.13

d. 0.66

Answer:-c

**9. State whether the following statement is TRUE or FALSE.**

**Logistic Regression tends to overfit when we have a large number of independent variables present.**

a. True

b. False

Answer:-a

**10. An ROC curve is plotted between:**

a. Sensitivity and Specificity

b. Sensitivity and (1 – Specificity)

c. (1 – Sensitivity) and Specificity

d. None of the above

Answer:-b
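An ROC curve plots sensitivity (true positive rate) on the y-axis against 1 – specificity (false positive rate) on the x-axis, with one point per classification threshold. The sketch below computes those points by hand in Python. The labels, scores, and thresholds are hypothetical and are not taken from the assignment's model.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (1 - specificity, sensitivity) pair per threshold."""
    points = []
    for t in thresholds:
        tp = fp = tn = fn = 0
        for y, s in zip(y_true, scores):
            predicted = 1 if s >= t else 0
            if predicted == 1 and y == 1:
                tp += 1
            elif predicted == 1 and y == 0:
                fp += 1
            elif predicted == 0 and y == 0:
                tn += 1
            else:
                fn += 1
        sensitivity = tp / (tp + fn)   # true positive rate
        specificity = tn / (tn + fp)   # true negative rate
        points.append((1 - specificity, sensitivity))
    return points


# Hypothetical true classes and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores, thresholds=[0.0, 0.5, 1.1])
for fpr, tpr in points:
    print(fpr, tpr)
```

A threshold of 0 labels everything positive and lands at the top-right corner of the plot, while a threshold above every score labels everything negative and lands at the origin.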

## NPTEL Data Science for Engineers Assignment 7 Answers 2022

Q1. Which of the following algorithms is/are suitable for the below problem statement?

Problem Statement: To classify the severity of the infection on covid-19 patients as critical/ not critical based on their comorbidity, habits, and some other demographic data.

a. Linear Regression

b. Logistic Regression

c. K-Means

d. Decision tree

**Answer:- b, d**

**Q2.** Which parameter determines the goodness of fit of a Logistic Regression model?

**Answer:- b, d**

**Q3.** Which of the following functions is used to bind the probability of x between 0 and 1?

**Answer:-** **b**
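The option texts are not reproduced here, but the function conventionally used in logistic regression to keep a predicted probability between 0 and 1 is the sigmoid (logistic) function, 1 / (1 + e^(-z)). A minimal Python sketch, with the function name `sigmoid` chosen only for illustration:

```python
import math


def sigmoid(z):
    """Map any real number z to the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-5, 0, 5):
    print(z, sigmoid(z))
```

The output rises monotonically with `z`, equals 0.5 at `z = 0`, and approaches but never reaches 0 or 1.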

**Q4.** Which of the following strategies do we apply to obtain the best fit line for data in Linear Regression?

**Answer:-** **c**
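The options for this question are not shown above. The standard strategy for fitting a line in linear regression is ordinary least squares, which picks the intercept and slope that minimize the sum of squared residuals. The following Python sketch applies the closed-form solution for a single predictor to hypothetical data; `fit_line` is an illustrative name, not a function from the course.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```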

**Refer to the confusion matrix below to answer question 5. The matrix results from a classification model that tests the working condition of refrigerator parts based on some parameters.**

**Q5.** **What is the value of specificity?**

**Answer:-** **b**
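Specificity is the share of actual negatives that the model correctly labels as negative: TN / (TN + FP). Since the confusion matrix for this question is not reproduced here, the Python sketch below uses hypothetical counts purely to show the calculation.

```python
def specificity(tn, fp):
    """True negative rate: correctly rejected negatives over all actual negatives."""
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """True positive rate: correctly detected positives over all actual positives."""
    return tp / (tp + fn)


# Hypothetical counts, NOT the values from the assignment's matrix.
tn, fp, fn, tp = 40, 10, 5, 45

print(specificity(tn, fp))
print(sensitivity(tp, fn))
```

Substituting the four cell counts from the actual matrix into `specificity` gives the value the question asks for.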

**Q6.** **The R-Squared for the built linear model is ___________**

**Answer:-** **a**
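The fitted model for this question is not shown here. In general, R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares, so it measures how much of the variation in the response the model explains. A minimal Python sketch on hypothetical observed and predicted values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observations and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(y_true, y_pred)
print(round(r2, 3))
```

Here the residual sum of squares is 1 and the total sum of squares is 5, so the result is about 0.8. Predictions that match the observations exactly give 1.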

**Q7.** According to the built model, which variable has no contribution to the model?

**Answer:-** **c**

