
10 Machine Learning Interview Questions

A flavour of logistic regression known as One-vs-Rest (OvR) logistic regression can be used to solve multi-class classification problems.

The following example gives a rough explanation of how the algorithm works:


Suppose you have to classify records into three classes: dog, cat and cow.

Instead of treating this as a single multi-class classification problem, we can treat it as three independent binary classification problems (dog vs. rest, cat vs. rest and cow vs. rest), as the following diagram depicts:


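Now we can simply rank the probabilities returned by the three binary classifiers and pick the highest. For example, for a particular record:

DOG Yes = 0.8

CAT Yes = 0.5

COW Yes = 0.4

Then the record is classified as a dog, because the dog-vs-rest classifier gives the highest probability. Note that the three probabilities need not sum to 1, since each classifier is trained independently.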
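The procedure above can be sketched in plain Python. This is a minimal, illustrative sketch, not the author's implementation: it assumes each binary model is an ordinary logistic regression fitted by batch gradient descent, and the 2-D feature values, learning rate, epoch count and helper names (`train_binary_logreg`, `fit_ovr`, `predict_ovr`) are all hypothetical. Each class gets its own "this class vs. the rest" model, and prediction picks the class whose model returns the highest probability.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    # w[0] is the bias, w[1:] are the feature weights
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_binary_logreg(X, y, lr=0.1, epochs=2000):
    """Fit one binary logistic regression by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def fit_ovr(X, labels):
    # One binary model per class: "this class" (1) vs "all the rest" (0)
    classes = sorted(set(labels))
    return {c: train_binary_logreg(X, [1 if lab == c else 0 for lab in labels]) for c in classes}

def predict_ovr(models, x):
    # Score x with every binary model and pick the class with the highest probability
    scores = {c: predict_proba(w, x) for c, w in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical 2-D toy features: three well-separated clusters
X = [(0, 5), (1, 6), (-1, 5.5), (5, -3), (6, -2), (5.5, -4), (-5, -3), (-6, -2), (-5.5, -4)]
labels = ["dog"] * 3 + ["cat"] * 3 + ["cow"] * 3
models = fit_ovr(X, labels)

label, scores = predict_ovr(models, (0.5, 5.5))
print(label, {c: round(p, 3) for c, p in scores.items()})
```

In practice a library wrapper such as scikit-learn's `OneVsRestClassifier` handles this bookkeeping; the hand-written version is only meant to make the "three independent binary problems" idea concrete.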


2. What is the significance of Gini Index = 0?
A Gini Index of 0 signifies that the dataset (or node) consists entirely of a single class, i.e., it is perfectly pure.
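As a worked check, using the usual definition of the Gini Index for a node whose k classes have proportions p_1 to p_k: if one class has proportion 1 and every other class has proportion 0, the sum of squares is exactly 1 and the index is 0.

```latex
$$\text{Gini} = 1 - \sum_{i=1}^{k} p_i^2
\qquad\text{pure node: } p_1 = 1,\; p_{i \neq 1} = 0 \;\Rightarrow\; \text{Gini} = 1 - 1^2 = 0$$
```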


3. What are the differences and similarities between CART and CHAID?




4. Random Forest is an instance of which ensemble technique?


Random Forest is an instance of the Bagging ensemble technique.
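Bagging (bootstrap aggregating) trains each model on a bootstrap sample, i.e. a sample drawn with replacement, and combines the models by majority vote. The sketch below is a hedged illustration of plain bagging only: the one-feature threshold "stump" learner, the toy data and all function names are hypothetical. A real Random Forest uses full decision trees and additionally picks a random subset of features at each split, which a single-feature example cannot show.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw len(data) rows *with replacement*: the "bootstrap" in bootstrap aggregating
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Hypothetical weak learner: the one-feature threshold rule with the fewest training errors."""
    best = None
    for threshold, _ in sample:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def fit_bagging(data, n_models=25, seed=0):
    # Each model sees a different bootstrap sample of the training data
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def predict(models, x):
    # Aggregate the individual predictions by majority vote
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Hypothetical toy data: (feature value, class label)
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
ensemble = fit_bagging(data)
print(predict(ensemble, 3), predict(ensemble, 8))
```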


5. What is sensitivity in the context of a confusion matrix?


Sensitivity, recall, hit rate and true positive rate (TPR) are synonyms and are defined by the following ratio:


$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{P}} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = 1 - \mathrm{FNR}$$


6. What does it mean when the model performs well on the train data but not on the test data?

It means that the model is overfitting: it has learned the training data too closely (including its noise) and is not able to generalise to unseen data.


Let us first look at the formulas for precision and recall.

Confusion Matrix:


Precision = TP/(TP+FP)

Recall = TP/(TP+FN)

The only difference between the two is that the denominator of precision contains FP, whereas the denominator of recall contains FN.

FP: Falsely classified as positive

FN: Falsely classified as negative
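The two formulas can be checked on a small set of labels. This is a minimal sketch; the label lists are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not something the formulas themselves specify.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Count TP, FP, FN, TN for a binary problem
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # FP in the denominator
    recall = tp / (tp + fn) if tp + fn else 0.0     # FN in the denominator
    return precision, recall

# Hypothetical ground truth and predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 4)
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```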

Assume that you run a hospital and use a machine learning algorithm to predict cancer from MRI scans. Only the patients whom the algorithm flags as possibly having cancer are sent for a proper medical test for cancer.

It is acceptable for the model to predict that a person has cancer even when they actually don't (a false positive), because that person will be tested further anyway.

But if the model predicts that a person doesn't have cancer when in reality they do, you are in trouble, because that patient is never tested. This case is a false negative (FN) for cancer.

So you need to reduce the number of false negatives, and since FN appears in the denominator of recall, reducing FN increases recall.

Recall should be the preferred evaluation metric when there is little tolerance for false negatives.
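One common way to reduce false negatives, which the text above does not prescribe, is to lower the decision threshold on the model's score. The sketch below assumes a model that outputs a cancer-risk score between 0 and 1; the scores, labels and thresholds are hypothetical. Lowering the threshold removes the false negatives and raises recall, at the cost of more false positives (lower precision), which is exactly the trade-off the hospital scenario accepts.

```python
def recall_precision_at(scores, y_true, threshold):
    # Flag a patient as "may have cancer" (1) whenever the score reaches the threshold
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return fn, recall, precision

# Hypothetical model scores; 1 = the patient actually has cancer
scores = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.32, 0.1]
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.5, 0.3):
    fn, recall, precision = recall_precision_at(scores, y_true, threshold)
    print(f"threshold={threshold}: FN={fn}, recall={recall:.2f}, precision={precision:.2f}")
# threshold=0.5: FN=2, recall=0.50, precision=0.67
# threshold=0.3: FN=0, recall=1.00, precision=0.57
```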

8. What are support vectors?

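Support vectors are the data points closest to the separating hyperplane; they lie on the margin boundaries (or, in a soft-margin SVM, inside the margin or on the wrong side of it). They determine the position and orientation of the hyperplane: the classifier is chosen to maximize the margin to these points, and removing any other training point would not change it.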
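In one dimension the idea can be shown without any optimizer. Assuming the data are linearly separable with class -1 entirely to the left of class +1 (an assumption of this sketch, as is the helper name `max_margin_threshold_1d`), the maximum-margin "hyperplane" is a threshold at the midpoint between the rightmost negative point and the leftmost positive point, and those two points are the support vectors. The toy coordinates are hypothetical.

```python
def max_margin_threshold_1d(xs, ys):
    """Hard-margin linear SVM in 1-D, assuming class -1 lies to the left of class +1."""
    neg = max(x for x, y in zip(xs, ys) if y == -1)  # rightmost negative point
    pos = min(x for x, y in zip(xs, ys) if y == +1)  # leftmost positive point
    if neg >= pos:
        raise ValueError("classes are not separable with -1 on the left")
    threshold = (neg + pos) / 2  # the 1-D "hyperplane" sits in the middle of the gap
    margin = (pos - neg) / 2     # distance from the threshold to each support vector
    return threshold, margin, (neg, pos)

xs = [1.0, 2.0, 3.5, 6.5, 8.0, 9.0]
ys = [-1, -1, -1, +1, +1, +1]
print(max_margin_threshold_1d(xs, ys))  # (5.0, 1.5, (3.5, 6.5))

# Dropping points that are not support vectors leaves the boundary unchanged...
print(max_margin_threshold_1d([3.5, 6.5], [-1, +1]))  # (5.0, 1.5, (3.5, 6.5))
# ...but moving a support vector moves the boundary
print(max_margin_threshold_1d([1.0, 2.0, 4.5, 6.5, 8.0, 9.0], ys))  # (5.5, 1.0, (4.5, 6.5))
```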

9. Calculate the Gini Index for a sample that contains 40 males and 60 females?

$$\text{Gini Index} = 1 - \left(\frac{40}{100}\right)^2 - \left(\frac{60}{100}\right)^2 = 1 - 0.16 - 0.36 = 0.48$$
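The same calculation generalises to any list of class labels. This is a minimal sketch; the function name `gini_index` is hypothetical. It also confirms the earlier point that a pure node has a Gini Index of 0.

```python
from collections import Counter

def gini_index(labels):
    # Gini = 1 - sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

sample = ["male"] * 40 + ["female"] * 60
print(round(gini_index(sample), 2))  # 0.48
print(gini_index(["female"] * 100))  # 0.0 (a pure node)
```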

                  

10. How can we define/explain recall and precision in an effective way to someone who is not from the data science field?

In an exam that contains only true/false questions, think of answering "true" as a positive prediction:

  • Recall: the number of questions you correctly answered as true, divided by the total number of questions whose correct answer is true
  • Precision: the number of questions you correctly answered as true, divided by the total number of questions you answered as true





