## 11. How do you handle missing or corrupted data in a dataset?

- Drop missing rows or columns
- Replace missing values with mean/median/mode
- Assign a unique category to missing values
- All of the above

Answer : D

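Explanation: All of the above are valid ways to handle missing or corrupted data. Dropping rows or columns removes the affected records, while replacing values with the mean, median, or mode and assigning a dedicated category are forms of imputation.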
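The three options can be sketched in a few lines of standard-library Python. The `ages` and `cities` lists are hypothetical toy data, with `None` standing in for a missing value; a real pipeline would more likely use a dataframe library.

```python
from statistics import mean, median, mode

# Hypothetical toy records; None marks a missing value.
ages = [25, None, 31, 40, None, 31]
cities = ["Pune", None, "Delhi", "Pune", "Delhi", None]

# Option 1: drop every row that has a missing value in any column.
rows = list(zip(ages, cities))
complete_rows = [row for row in rows if None not in row]

# Option 2: replace missing numeric values with a summary statistic
# computed from the observed values only.
observed = [a for a in ages if a is not None]
ages_mean = [mean(observed) if a is None else a for a in ages]
ages_median = [median(observed) if a is None else a for a in ages]
ages_mode = [mode(observed) if a is None else a for a in ages]

# Option 3: give missing categorical values their own category.
cities_filled = ["Missing" if c is None else c for c in cities]
```

Which option fits depends on how much data is missing and why; dropping rows is simplest but discards information from the other columns.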

## 12. The most widely used metrics and tools to assess a classification model are:

- Confusion matrix
- Cost-sensitive accuracy
- Area under the ROC curve
- All of the above

Answer : D

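Explanation: All three are standard tools for evaluating a classifier. A confusion matrix tabulates correct and incorrect predictions per class, cost-sensitive accuracy weights errors by their cost, and the area under the ROC curve summarizes performance across all decision thresholds.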
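As a rough illustration, the confusion-matrix counts and the area under the ROC curve can be computed by hand for a small binary problem. The labels, scores, and the 0.5 threshold below are made-up values, and `roc_auc` uses the pairwise-ranking formulation of AUC rather than any particular library's implementation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
auc = roc_auc(y_true, scores)
```

The confusion matrix depends on the chosen threshold, while the AUC is computed from the raw scores and does not.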

## 13. Which of the following is not one of the categories in a model of language?

- Language units
- Structural units
- Role structure of units
- System constraints

Answer : B

Explanation: A model of language consists of language units, the role structure of units, and system constraints; structural units are not one of its categories.

## 14. Suppose we would like to perform clustering on spatial data such as the geometrical locations of houses. We wish to produce clusters of many different sizes and shapes. Which of the following methods is the most appropriate?

- Decision Trees
- Model-based clustering
- K-means clustering
- Density-based clustering

Answer : D

Explanation: Density-based clustering methods identify clusters as regions where data points are densely packed. They connect neighboring regions of sufficiently high density into a single cluster, so they can recover clusters of arbitrary shape and size.
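To make the idea concrete, here is a minimal sketch of DBSCAN, one well-known density-based algorithm (the question itself does not name a specific one). The house coordinates and the `eps` and `min_pts` values are assumptions chosen for illustration, and the brute-force neighbor search is only suitable for tiny inputs.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search; the point itself counts as its own neighbor.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical house locations: a street, a compact block, one isolated house.
houses = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0)]
labels = dbscan(houses, eps=1.5, min_pts=3)
```

With these inputs the elongated street and the square block come out as two separate clusters and the isolated house is labeled as noise, without the number of clusters being specified in advance.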

## 15. Which of the following is a disadvantage of decision trees?

- Factor analysis
- Decision trees are robust to outliers
- Decision trees are prone to overfitting
- None of the above

Answer : C

Explanation: If a decision tree is allowed to keep splitting down to very fine-grained partitions, it can fit every training point, up to perfect classification of the training set. That is overfitting.

## 16. Which of the following is true about Naive Bayes?

- Assumes that all the features in a dataset are equally important
- Assumes that all the features in a dataset are independent
- Both A and B
- None of the above options

Answer : C

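Explanation: Naive Bayes assumes that the features are independent of one another given the class and that each feature contributes equally to the prediction.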
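The independence assumption shows up directly in how a Naive Bayes score is computed: the class prior is multiplied by one likelihood factor per feature, with no weights and no interaction terms. The sketch below is a bare-bones categorical version on made-up weather data; it applies no smoothing, so an unseen feature value zeroes out the score.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, y)][value] += 1
    return priors, counts


def score(row, y, priors, counts):
    """Unnormalized P(y) * product of P(x_i | y), one factor per feature."""
    total = sum(priors.values())
    result = priors[y] / total
    for i, value in enumerate(row):
        result *= counts[(i, y)][value] / priors[y]
    return result


# Hypothetical training data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]
priors, counts = train(rows, labels)

query = ("rainy", "mild")
scores = {y: score(query, y, priors, counts) for y in priors}
prediction = max(scores, key=scores.get)
```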

## 17. Which of the following is not a Horn clause?

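- p → ¬q
- p
- p → q
- ¬p ∨ q

Answer : A

Explanation: p → ¬q is equivalent to ¬p ∨ ¬q, which contains no positive literal, so it is not a definite Horn clause (a clause with exactly one positive literal). Each of the other options has exactly one positive literal. Under the broader definition of a Horn clause (at most one positive literal), ¬p ∨ ¬q would still qualify as a goal clause.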
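A quick way to check the options is to rewrite each one as a disjunction of literals and count the positive ones. In the sketch below a clause is a list of strings and a leading `~` marks negation; this encoding and the function names are assumptions for illustration. It reports both the broad definition (at most one positive literal) and the stricter definite-clause reading (exactly one), which is the reading under which option A is the odd one out.

```python
def positive_literals(clause):
    """Return the literals in the clause that are not negated."""
    return [lit for lit in clause if not lit.startswith("~")]


def is_horn(clause):
    """Broad definition: at most one positive literal."""
    return len(positive_literals(clause)) <= 1


def is_definite(clause):
    """Strict definition: exactly one positive literal."""
    return len(positive_literals(clause)) == 1


# The four options in clause form (an implication a -> b becomes ~a v b).
options = {
    "A": ["~p", "~q"],  # p -> not q
    "B": ["p"],         # p
    "C": ["~p", "q"],   # p -> q
    "D": ["~p", "q"],   # not p or q
}

definite = {name: is_definite(clause) for name, clause in options.items()}
horn = {name: is_horn(clause) for name, clause in options.items()}
```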

## 18. Which of the following techniques cannot be used for normalization in text mining?

- Stop Word Removal
- Stemming
- Lemmatization
- None of the above

Answer : A

Explanation: Stemming and lemmatization are keyword normalization techniques; stop word removal is not.

## 19. Which of the following is a reasonable way to select the number of principal components “k”?

- Choose k to be the smallest value so that at least 99% of the variance is retained
- Use the elbow method
- Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer)
- Choose k to be the largest value so that 99% of the variance is retained

Answer : A

Explanation: Choosing k as the smallest value that retains at least 99% of the variance preserves the structure of the data while reducing its dimension.
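Given the variance captured by each principal component, the rule can be applied with a running sum. The `variances` list below is hypothetical and stands in for the eigenvalues of the data's covariance matrix; `smallest_k` is an illustrative helper, not a library function.

```python
from itertools import accumulate


def smallest_k(variances, retain=0.99):
    """Smallest number of components whose cumulative variance ratio >= retain."""
    ordered = sorted(variances, reverse=True)  # largest components first
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= retain:
            return k
    return len(ordered)


# Hypothetical per-component variances for a 6-dimensional dataset.
variances = [500, 300, 150, 45, 4, 1]
k = smallest_k(variances)  # first 4 components retain 995 / 1000 of the variance
```

Picking the largest such k instead would always return the full dimension, which defeats the purpose of the reduction.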

## 20. In which of the following cases will K-means clustering fail to give good results?

1. Data points with outliers
2. Data points with different densities
3. Data points with nonconvex shapes

- 1 & 2
- 1, 2, & 3
- 2 & 3
- 1 & 3

Answer : B

Explanation: K-means clustering fails to give good results when the data contains outliers, when the density of points varies across the data space, and when the clusters have nonconvex shapes.
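The outlier case is easy to reproduce with a small one-dimensional example. The sketch below runs plain Lloyd iterations from fixed initial centroids; the data points and the initial centroids are assumptions chosen to make the effect visible, and real implementations use better initialization and a convergence check.

```python
def kmeans_1d(points, centroids, iters=20):
    """Lloyd's algorithm in one dimension, starting from the given centroids."""
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Update step: each centroid moves to the mean of its points.
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Two well-separated groups are recovered cleanly.
clean_centroids, clean_clusters = kmeans_1d([1, 2, 3, 8, 9, 10], [1, 10])

# A single outlier at 100 drags one centroid away and merges the real groups.
noisy_centroids, noisy_clusters = kmeans_1d([1, 2, 3, 8, 9, 10, 100], [1, 10])
```

Because each centroid is a mean, one extreme value is enough to pull it away from the group it started in, leaving the other centroid to cover both genuine groups.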
