Data Analytics With Python Week 11 Answers


Q1. ________ is used for calculating distance measures in clustering using Python.

a. distance_matrix 
b. spatial_matrix 
c. scipy_matrix 
d. distance.matrix

Answer: A
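
As an illustration of option (a), here is a minimal sketch using `distance_matrix` from `scipy.spatial`. The sample points are hypothetical and not part of the quiz; the call assumes SciPy and NumPy are installed.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Hypothetical sample points (not from the quiz): three observations, two features.
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Pairwise Minkowski distances between every row of `points`.
# p=2 (the default) gives the Euclidean distance.
dist = distance_matrix(points, points, p=2)

# dist[0, 1] is 5.0 here, because (0, 0) and (3, 4) form a 3-4-5 triangle.
print(dist)
```

The result is a symmetric 3 x 3 matrix with zeros on the diagonal, which is the usual input form for distance-based clustering.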

Q2. The formula for dissimilarity computation between two objects for categorical variables is:
Here p is the total number of categorical variables and m denotes the number of matches (variables on which the two objects take the same value).

  • D(i,j) = (p - m) / p
  • D(i,j) = (p - m) / m
  • D(i,j) = (m - p) / p
  • D(i,j) = (m - p) / m

Answer: A
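
The simple-matching formula D(i,j) = (p - m) / p can be sketched in a few lines. The function name `categorical_dissimilarity` and the two sample records are hypothetical, chosen only for illustration.

```python
def categorical_dissimilarity(obj_i, obj_j):
    """Return (p - m) / p for two records of categorical values.

    p is the number of variables, m the number of positions where both
    records hold the same value.
    """
    if len(obj_i) != len(obj_j):
        raise ValueError("both objects must have the same number of variables")
    p = len(obj_i)
    if p == 0:
        raise ValueError("at least one variable is required")
    m = sum(a == b for a, b in zip(obj_i, obj_j))
    return (p - m) / p


# Hypothetical records described by three categorical variables.
a = ("red", "small", "round")
b = ("red", "large", "round")

d = categorical_dissimilarity(a, b)  # one mismatch out of three variables
print(d)
```

Identical records give 0 and records that differ on every variable give 1, so the measure always lies between 0 and 1.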

Q3. Select the correct option for a data set with 7 objects and an interval-scaled variable ‘f’ we have the following measurements: f = (1, 2, 3, 4, 5, 8, 50) containing one outlying value.

  • Std deviation (std_f) and mean absolute deviation (s_f) are equally affected 
  • Mean absolute deviation (s_f) is more affected by the outlier 
  • Std deviation (std_f) is more affected by the outlier 
  • None of these

Answer: C
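
The effect of the outlier on each measure can be checked numerically. The sketch below compares both measures with and without the value 50. It assumes the population standard deviation (`statistics.pstdev`); the helper `mean_absolute_deviation` is a hypothetical name, not a library function.

```python
from statistics import fmean, pstdev


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    centre = fmean(values)
    return fmean([abs(v - centre) for v in values])


f = [1, 2, 3, 4, 5, 8, 50]   # measurements from the question
f_clean = f[:-1]             # same data with the outlier 50 removed

# How many times larger each measure becomes once the outlier is included.
std_ratio = pstdev(f) / pstdev(f_clean)
mad_ratio = mean_absolute_deviation(f) / mean_absolute_deviation(f_clean)

print(f"std grows by a factor of {std_ratio:.2f}")
print(f"mean absolute deviation grows by a factor of {mad_ratio:.2f}")
```

Under these assumptions the standard deviation grows by a larger factor than the mean absolute deviation, because squaring the deviations gives the outlier more weight.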

Q4. Which of the following is true for K-means clustering?

  • It comes under the partitioning method 
  • The number of clusters is predefined for this method 
  • Cluster similarity is measured with respect to the mean value of the objects in a cluster 
  • All of the above

Answer: D

Q5. Which of the following can act as possible termination conditions in K-Means?

  1. For a fixed number of iterations.
  2. Assignment of observations to clusters does not change between iterations, except for cases with a bad local minimum.
  3. Centroids do not change between successive iterations.
  4. Terminate when Residual Sum of Squares (RSS) falls below a threshold.

  • 1, 3 and 4 
  • 1, 2, 3 and 4 
  • 2 and 3 
  • None of these

Answer: B
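
The four termination conditions listed in the question can be seen together in a small K-means loop. This is a minimal sketch on one-dimensional data, not a production implementation; the function `kmeans_1d`, its parameters and the sample data are hypothetical.

```python
def kmeans_1d(data, centroids, max_iter=100, rss_threshold=0.0):
    """Lloyd-style K-means on 1-D data.

    Returns (centroids, labels, reason), where reason names the
    termination condition that stopped the loop.
    """
    labels = None
    for _ in range(max_iter):  # condition 1: fixed number of iterations
        # Assign every observation to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
            for x in data
        ]
        if new_labels == labels:  # condition 2: assignments unchanged
            return centroids, labels, "assignments unchanged"
        labels = new_labels

        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:  # condition 3: centroids unchanged
            return centroids, labels, "centroids unchanged"
        centroids = new_centroids

        rss = sum((x - centroids[lab]) ** 2 for x, lab in zip(data, labels))
        if rss < rss_threshold:  # condition 4: RSS below a threshold
            return centroids, labels, "RSS below threshold"
    return centroids, labels, "iteration limit reached"


# Hypothetical data with two obvious groups and deliberately poor starting centroids.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, final_labels, reason = kmeans_1d(data, centroids=[1.0, 2.0])
print(final_centroids, final_labels, reason)
```

With the default `rss_threshold=0.0` the RSS check never fires, so this run stops on whichever of the other conditions is met first.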

Q6. In the figure below, if you draw a horizontal line at y = 2 on the y-axis, how many clusters will be formed?

Answer: B

Q7. Which of the following clustering methods requires a merging approach?

Answer: C

Q8. State True or False: Hierarchical clustering should primarily be used for exploration

Answer: A

Q9. State True or False: For finding dissimilarity between two clusters in hierarchical clustering, average-link is the only metric used

Answer: B
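
Average-link is one of several common ways to measure dissimilarity between two clusters; single-link and complete-link are others. The sketch below computes all three for two small clusters. The function names and the sample points are hypothetical.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance between every point in cluster_a and every point in cluster_b."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    """Distance between the two closest members."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    """Distance between the two farthest members."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    """Mean distance over all cross-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters in two dimensions.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

print("single  :", single_link(cluster_a, cluster_b))
print("complete:", complete_link(cluster_a, cluster_b))
print("average :", average_link(cluster_a, cluster_b))
```

For any pair of clusters the single-link value is the smallest of the three and the complete-link value the largest, with average-link in between.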

Q10. State True or False: Hierarchical clustering can be either an agglomerative or a divisive algorithm

Answer: A