Machine Learning Interview Questions To Know

This set of Machine Learning interview questions deals with scenario-based Machine Learning questions.

‘People who bought this also bought…’ recommendations seen on Amazon are based on which algorithm?

E-commerce websites like Amazon use Machine Learning to recommend products to their customers. The basic idea behind this kind of recommendation is collaborative filtering: comparing users with similar shopping behavior in order to recommend products to a new user who shops in a similar way.

To better understand this, let’s look at an example. Say a user A, a sports enthusiast, bought pizza, pasta, and a coke. A couple of weeks later, another user B, who rides a bicycle, buys pizza and pasta but not the coke. Amazon recommends a bottle of coke to user B because his shopping behavior and lifestyle are quite similar to user A’s. This is how collaborative filtering works; a minimal sketch of the idea follows.
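
Below is a minimal user-based collaborative-filtering sketch in plain NumPy, assuming a toy purchase matrix that mirrors the example above (all users, items, and purchases are hypothetical). Neighbors are compared by cosine similarity, and items the target user hasn’t bought are ranked by the similarity-weighted purchases of those neighbors.

```python
import numpy as np

# Toy purchase matrix (rows: users A, B, C; columns: pizza, pasta, coke).
purchases = np.array([
    [1, 1, 1],  # user A bought pizza, pasta, and a coke
    [1, 1, 0],  # user B bought pizza and pasta
    [0, 0, 1],  # user C bought only a coke
])
items = ["pizza", "pasta", "coke"]

def cosine_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = 1  # recommend something for user B
neighbors = [i for i in range(len(purchases)) if i != target]

# Score every item by the similarity-weighted purchases of the neighbors.
scores = np.zeros(purchases.shape[1])
for i in neighbors:
    scores += cosine_sim(purchases[target], purchases[i]) * purchases[i]

scores[purchases[target] == 1] = -np.inf  # exclude items already bought
print("Recommend:", items[int(np.argmax(scores))])  # -> coke
```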

You’re asked to build a random forest model with 10000 trees. During training, you get a training error of 0.00, but on the validation set the error is 34.23. What is going on? Haven’t you trained your model perfectly?

  • The model is overfitting the data.
  • A training error of 0.00 means that the classifier has memorized the training data patterns almost perfectly.
  • But when this classifier runs on unseen samples, those memorized patterns do not generalize, and it returns predictions with far more errors.
  • In Random Forest, this usually happens when the individual trees are grown deeper (or more numerous) than necessary. Hence, to avoid such situations, we should tune hyperparameters such as tree depth and the number of trees using cross-validation (see the sketch below).
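
A short scikit-learn sketch of the symptom and the fix, on an assumed synthetic data set: an unconstrained forest drives training error toward 0.00 while validation error stays much higher, and cross-validation guides the choice of a less complex model (the hyperparameter grid here is purely illustrative).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Fully grown trees can memorize the training set (train error ~0.00)
# while validation error stays much higher.
rf = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
rf.fit(X_tr, y_tr)
print("train error:", 1 - rf.score(X_tr, y_tr))
print("val error:  ", 1 - rf.score(X_val, y_val))

# Use cross-validation to pick a less complex forest (e.g. shallower trees).
for depth in (2, 5, None):
    cv = cross_val_score(
        RandomForestClassifier(n_estimators=100, max_depth=depth,
                               random_state=0),
        X_tr, y_tr, cv=5).mean()
    print(f"max_depth={depth}: CV accuracy={cv:.3f}")
```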

You are given a data set consisting of variables having more than 30% missing values. Let’s say, out of 50 variables, 8 variables have missing values higher than 30%. How will you deal with them?

  • Assign a unique category to the missing values; who knows, the pattern of missingness itself might uncover some trend.
  • Or we can simply remove those variables.
  • Or, more sensibly, we can check their distribution against the target variable; if we find a pattern, we keep those missing values and assign them a new category, while removing the variables that show none (see the pandas sketch below).
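
A brief pandas sketch of that decision process on a hypothetical frame (the column names and values are made up): flag columns with more than 30% missing values, treat missingness as its own category, and inspect its relationship with the target before choosing to keep or drop.

```python
import pandas as pd

# Hypothetical frame: 'f1' has >30% missing values, 'target' is binary.
df = pd.DataFrame({
    "f1": ["a", None, None, "b", None, "a"],
    "target": [1, 0, 1, 0, 0, 1],
})

missing_share = df.isna().mean()
high_missing = missing_share[missing_share > 0.30].index.tolist()
print("columns >30% missing:", high_missing)

# Option 1: treat "missing" as its own category...
df["f1_cat"] = df["f1"].fillna("Missing")

# ...then check whether missingness carries signal about the target.
print(df.groupby("f1_cat")["target"].mean())

# Option 2: if no pattern is found, drop the column instead.
# df = df.drop(columns=high_missing)
```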

You are asked to build a multiple regression model, but your model’s R² isn’t as good as you wanted. For improvement, you remove the intercept term, and now your model’s R² jumps from 0.3 to 0.8. Is it possible? How?

Yes, it is possible.

  • The intercept term corresponds to the model’s prediction without any independent variable, in other words, the mean prediction:
    R² = 1 − ∑(Y − Ŷ)² / ∑(Y − Ȳ)², where Ŷ is the predicted value and Ȳ is the mean of Y.
  • In the presence of the intercept term, the R² value evaluates your model with respect to this mean model.
  • In the absence of the intercept term, there is no mean model to compare against, so the denominator becomes ∑Y² instead of ∑(Y − Ȳ)².
  • With that larger denominator, the ratio ∑(Y − Ŷ)² / ∑Y² becomes smaller than before, thereby producing a higher value of R² (illustrated below).
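
A small NumPy demonstration of the effect, under the assumption of synthetic data with a large constant offset and a weak linear signal: the fit through the origin is actually worse, yet the conventional uncentered R² reported without an intercept comes out much higher.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 50 + 0.1 * x + rng.normal(0, 1, 100)   # big offset, weak linear signal

# Model WITH intercept: compared against the mean model.
slope, intercept = np.polyfit(x, y, 1)
sse = np.sum((y - (intercept + slope * x)) ** 2)
r2_with = 1 - sse / np.sum((y - y.mean()) ** 2)

# Model WITHOUT intercept: the usual software convention replaces the
# mean-based denominator with the raw sum of squares, sum(Y^2).
slope0 = (x @ y) / (x @ x)                 # least squares through the origin
sse0 = np.sum((y - slope0 * x) ** 2)
r2_without = 1 - sse0 / np.sum(y ** 2)

print(f"R^2 with intercept:    {r2_with:.3f}")    # modest
print(f"R^2 without intercept: {r2_without:.3f}") # inflated, much higher
```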

You are given a data set. The data set contains many variables, some of which are highly correlated and you know about it. Your manager has asked you to run PCA. Would you remove correlated variables first? Why?

You might be tempted to say no, but that would be incorrect. Yes, discard the correlated variables first: their presence has a substantial effect on PCA, because the variance explained by a particular component gets inflated when the same underlying information is counted several times (demonstrated below).
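
A quick scikit-learn illustration on assumed synthetic data: appending two near-copies of one variable inflates the share of variance captured by the first principal component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))              # three independent variables

# Add two near-copies of the first column -> strong correlation.
X_corr = np.column_stack([X,
                          X[:, 0] + 0.01 * rng.normal(size=500),
                          X[:, 0] + 0.01 * rng.normal(size=500)])

print(PCA().fit(X).explained_variance_ratio_)       # ~[0.33, 0.33, 0.33]
print(PCA().fit(X_corr).explained_variance_ratio_)  # first component ~0.6
```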

Suppose you found that your model is suffering from low bias and high variance. Which algorithm do you think could tackle this situation, and why?

Type 1: Tackle high variance with bagging

  • Low bias occurs when the model’s predicted values are close to the actual values, i.e., the model fits the training data very tightly.
  • In this case, we can use a bagging algorithm (e.g., Random Forest) to tackle the high-variance problem.
  • A bagging algorithm divides the data set into subsets via repeated randomized sampling with replacement (bootstrap sampling).
  • Once divided, these samples are used to train a set of models with a single learning algorithm, and the model predictions are combined by voting (classification) or averaging (regression), as in the sketch after this list.
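
A compact scikit-learn sketch on an assumed synthetic data set: a single deep decision tree versus 100 trees trained on bootstrap samples and combined by voting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1,
                           random_state=0)

# A single deep tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0)
# Bagging: train many trees on bootstrap samples, combine by voting.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=0)

print("tree CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("bag  CV accuracy:", cross_val_score(bag, X, y, cv=5).mean())
```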

Type 2: Tackle high variance by reducing model complexity

  • Lower the model complexity by using regularization techniques, in which large model coefficients get penalized.
  • You can also use only the top n features from a variable-importance chart. With all the variables in the data set, the algorithm may have difficulty finding the meaningful signal (see the sketch below).
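
A minimal regularization sketch with scikit-learn, assuming synthetic data in which only the first feature carries signal: ridge (L2) shrinks all coefficients, while lasso (L1) pushes the noise coefficients to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 3 + rng.normal(size=100)     # only the first feature matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)        # L2: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)         # L1: zeroes out noise features

print("OLS:  ", np.round(ols.coef_, 2))
print("Ridge:", np.round(ridge.coef_, 2))
print("Lasso:", np.round(lasso.coef_, 2))
```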

You are working on a time series data set. Your manager has asked you to build a high accuracy model. You start with the decision tree algorithm since you know it works fairly well on all kinds of data. Later, you tried a time series regression model and got higher accuracy than the decision tree model. Can this happen? Why?

  • Time series data often carries a strong linear (trend) component, while a decision tree algorithm is known to work best at detecting non-linear interactions.
  • The decision tree fails to provide robust predictions here. Why?
  • The reason is that it cannot map the linear relationship as well as a regression model does; in particular, a tree cannot extrapolate a trend beyond the values it saw during training.
  • We also know that a linear regression model can provide robust predictions only if the data set satisfies its linearity assumptions (see the comparison sketch below).
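
A short comparison under an assumed linear trend with noise: when forecasting beyond the training range, the tree can only replay values it has already seen, so its error on the held-out tail is far larger than the linear model’s.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

t = np.arange(100).reshape(-1, 1)          # time index
y = 2.0 * t.ravel() + np.random.default_rng(0).normal(0, 5, 100)

lin = LinearRegression().fit(t[:80], y[:80])
tree = DecisionTreeRegressor(random_state=0).fit(t[:80], y[:80])

# Forecasting the last 20 points: the tree cannot extrapolate the trend,
# while the linear model follows it.
future = t[80:]
print("linear MAE:", np.abs(lin.predict(future) - y[80:]).mean())
print("tree MAE:  ", np.abs(tree.predict(future) - y[80:]).mean())
```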

You are given a cancer detection data set. Let’s suppose when you build a classification model you achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?

Cancer detection data sets are highly imbalanced: if, say, only 4% of cases are positive, a model that labels every case ‘healthy’ is already 96% accurate while missing every cancer, so accuracy alone says very little. Evaluate with a confusion matrix, sensitivity/recall, precision, F1, and ROC-AUC instead (see the sketch after this list). To improve the model, you can do the following:

  • Add more data
  • Treat missing outlier values
  • Feature Engineering
  • Feature Selection
  • Multiple Algorithms
  • Algorithm Tuning
  • Ensemble Method
  • Cross-Validation
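
A sketch of why 96% accuracy can hide total failure, assuming a 96/4 class split and a degenerate model that predicts ‘healthy’ for everyone: accuracy is 96%, but recall on the cancer class is zero.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical: 96 healthy cases (0), 4 cancer cases (1);
# the model predicts "healthy" for every patient.
y_true = [0] * 96 + [1] * 4
y_pred = [0] * 100

print(confusion_matrix(y_true, y_pred))
# Accuracy is 96%, yet recall for class 1 is 0.00 -- the model misses
# every cancer case, which is exactly what matters here.
print(classification_report(y_true, y_pred, zero_division=0))
```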

Suppose you are given a data set that has missing values spread within 1 standard deviation of the median. What percentage of the data would remain unaffected, and why?

Since the data is spread around the median, let’s assume a normal distribution, in which the mean, median, and mode coincide.
In a normal distribution, ~68% of the data lies within 1 standard deviation of the mean (or median), and that is where the missing values sit. Therefore, ~32% of the data would remain unaffected by missing values (a quick numeric check follows).
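
A one-line check of the 68% figure using the standard normal CDF (SciPy assumed available):

```python
from scipy.stats import norm

# Fraction of a normal distribution within 1 standard deviation of the
# mean (which equals the median, since the distribution is symmetric).
within = norm.cdf(1) - norm.cdf(-1)
print(f"within 1 SD: {within:.4f}")      # ~0.6827, the affected share
print(f"unaffected:  {1 - within:.4f}")  # ~0.3173, i.e. roughly 32%
```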

A jar has 1000 coins, of which 999 are fair and 1 is double headed. Pick a coin at random, and toss it 10 times. Given that you see 10 heads, what is the probability that the next toss of that coin is also a head?

  • There are two ways of choosing a coin: pick a fair coin, or pick the one with two heads.
  • Probability of selecting a fair coin = 999/1000 = 0.999
    Probability of selecting the double-headed coin = 1/1000 = 0.001
  • P(10 heads in a row) = P(fair) × P(10 heads | fair) + P(double-headed) × P(10 heads | double-headed)
  • P(A) = 0.999 × (1/2)¹⁰ = 0.999 × (1/1024) = 0.000976
    P(B) = 0.001 × 1 = 0.001
    P(fair | 10 heads) = P(A) / (P(A) + P(B)) = 0.000976 / 0.001976 = 0.4939
    P(double-headed | 10 heads) = P(B) / (P(A) + P(B)) = 0.001 / 0.001976 = 0.5061
  • Probability that the next toss is a head = 0.4939 × 0.5 + 0.5061 × 1 = 0.7531 (verified in the sketch below).
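
The arithmetic above is just a Bayes update, and a few lines of plain Python confirm it:

```python
# Bayes update for the two-headed-coin puzzle.
p_fair, p_double = 999 / 1000, 1 / 1000
lik_fair, lik_double = (1 / 2) ** 10, 1.0    # P(10 heads | coin type)

evidence = p_fair * lik_fair + p_double * lik_double
post_fair = p_fair * lik_fair / evidence         # ~0.4939
post_double = p_double * lik_double / evidence   # ~0.5061

p_next_head = post_fair * 0.5 + post_double * 1.0
print(f"P(next toss is a head) = {p_next_head:.4f}")  # ~0.7531
```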

How do you map nicknames (Pete, Andy, Nick, Rob, etc) to real names?

  • This problem can be solved in many ways. Let’s assume that you’re given a data set containing thousands of Twitter interactions. You would begin by studying the relationship between two people by carefully analyzing the words used in the tweets.
  • This kind of problem statement can be solved by implementing text mining with Natural Language Processing techniques, wherein each word in a sentence is broken down and correlations between various words are found.
  • NLP is actively used for understanding customer feedback and performing sentiment analysis on Twitter and Facebook. Thus, one way to solve this problem is through text mining and Natural Language Processing techniques; a toy lookup-based sketch follows.
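
In the simplest case, before any NLP is involved, the mapping can start from a hand-built nickname dictionary with fuzzy string matching as a fallback. Everything below (the names, the dictionary, the similarity cutoff) is a hypothetical illustration, not a full text-mining pipeline.

```python
import difflib

# Hypothetical nickname dictionary and roster of known real names.
NICKNAMES = {"pete": "Peter", "andy": "Andrew", "nick": "Nicholas",
             "rob": "Robert"}
KNOWN_NAMES = ["Peter", "Andrew", "Nicholas", "Robert", "Thomas"]

def to_real_name(name: str) -> str:
    key = name.lower()
    if key in NICKNAMES:          # exact dictionary hit
        return NICKNAMES[key]
    # Fallback: closest known real name by string similarity.
    match = difflib.get_close_matches(name, KNOWN_NAMES, n=1, cutoff=0.6)
    return match[0] if match else name

print(to_real_name("Pete"))  # -> Peter (dictionary)
print(to_real_name("Tom"))   # -> Thomas (fuzzy match)
```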

How would you predict who will renew their subscription next month? What data would you need to solve this? What analysis would you do? Would you build predictive models? If so, which algorithms?

  • Let’s assume that we’re trying to predict the renewal rate for a Netflix subscription. So our problem statement is to predict which users will renew their subscription plan for the next month.
  • Next, we must understand the data that is needed to solve this problem. In this case, we need to check the number of hours the channel is active for each household, the number of adults in the household, number of kids, which channels are streamed the most, how much time is spent on each channel, how much has the watch rate varied from last month, etc. Such data is needed to predict whether or not a person will continue the subscription for the upcoming month.
  • After collecting this data, it is important that you find patterns and correlations. For example, we know that if a household has kids, then they are more likely to subscribe. Similarly, by studying the watch rate of the previous month, you can predict whether a person is still interested in a subscription. Such trends must be studied.
  • The next step is analysis. For this kind of problem statement, you must use a classification algorithm that classifies customers into 2 groups:

* Customers who are likely to renew their subscription next month

* Customers who are not likely to renew their subscription next month

  • Would you build predictive models? Yes; to achieve this you must build a predictive model that classifies the customers into the 2 classes mentioned above.
  • Which algorithms to choose? You can choose classification algorithms such as Logistic Regression, Random Forest, Support Vector Machine, etc.
  • Once you’ve chosen the right algorithm, you must perform model evaluation to measure the performance of the model. This is followed by deployment; a minimal end-to-end sketch follows.
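
A minimal end-to-end sketch with an entirely hypothetical feature set (hours watched, number of kids, month-over-month change in watch rate) standing in for the data discussed above; a real model would use far more data and far more careful validation.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical subscriber records for the renewal-prediction example.
df = pd.DataFrame({
    "hours_watched":    [40, 2, 25, 0, 60, 5, 30, 1],
    "num_kids":         [2, 0, 1, 0, 3, 0, 1, 0],
    "watch_rate_delta": [0.1, -0.8, 0.0, -0.9, 0.2, -0.5, 0.05, -0.7],
    "renewed":          [1, 0, 1, 0, 1, 0, 1, 0],
})

X, y = df.drop(columns="renewed"), df["renewed"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# Binary classifier: likely to renew vs not likely to renew.
model = LogisticRegression().fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, model.predict(X_te)))
```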

We have two options for serving ads within Newsfeed:
1 — out of every 25 stories, one will be an ad
2 — every story has a 4% chance of being an ad

For each option, what is the expected number of ads shown in 100 news stories?
If we go with option 2, what is the chance a user will be shown only a single ad in 100 stories? What about no ads at all?

  • The expected number of ads shown in 100 news stories for option 1 is 4 (100/25 = 4).
  • Similarly, for option 2, each story is an ad with probability 4/100 = 1/25, so the expected number of ads in 100 news stories is 100 × 0.04 = 4.
  • Therefore, for each option, the expected number of ads shown in 100 news stories is 4.
  • The second part of the question can be solved using the binomial distribution, which is defined by two parameters:

* The number of trials n, which in our case is the 100 stories.

* The probability of success p, which in our case is the 4% chance that any given story is an ad.

From these we can compute the probability of any particular outcome, such as the chance that a user is shown exactly one ad in 100 stories.

  • p(ad in one particular position and nowhere else) = (0.04)¹ × (0.96)⁹⁹

(note: 0.04 is the probability that the chosen story is an ad, and (0.96)⁹⁹ is the probability that none of the other 99 stories is an ad)

  • In total, there are 100 possible positions for the single ad. Therefore, P(exactly one ad) = 100 × (0.04)¹ × (0.96)⁹⁹ ≈ 7.03%.
  • For no ads at all, every one of the 100 stories must fail to be an ad: P(no ads) = (0.96)¹⁰⁰ ≈ 1.69%. Both figures are checked in the sketch below.
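
A direct check with SciPy’s binomial distribution (n = 100 trials, p = 0.04):

```python
from scipy.stats import binom

n, p = 100, 0.04  # 100 stories, each an ad with probability 4%
print("expected ads:   ", n * p)               # 4.0
print("P(exactly 1 ad):", binom.pmf(1, n, p))  # ~0.0703 -> 7.03%
print("P(no ads):      ", binom.pmf(0, n, p))  # ~0.0169 -> 1.69%
```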

There’s a game where you are asked to roll two fair six-sided dice. If the sum of the values on the dice equals seven, then you win $21. However, you must pay $5 to play each time you roll both dice. Do you play this game? And in the follow-up: If he plays 6 times what is the probability of making money from this game?

  • The rules state that if the sum of the values on the 2 dice equals 7, you win $21, but you must pay $5 to play every game, win or lose.
  • First, let’s count the possible outcomes. Since we have two 6-sided dice, the total number of outcomes = 6 × 6 = 36.
  • Out of 36 outcomes, we must count the ones in which the values on the 2 dice sum to 7.
  • The combinations that produce a sum of 7 are (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1). All 6 of these combinations generate a sum of 7.
  • This means that out of 36 outcomes, only 6 produce a sum of 7. Taking the ratio, we get 6/36 = 1/6.
  • So we expect to win $21 once in every 6 games, and the expected value per game is (1/6) × $21 − $5 = −$1.50. You should not play: over 6 games you expect to pay $30 in entry fees and win $21 once, a net loss of $9.
  • For the follow-up: with k wins in 6 games the net result is 21k − 30 dollars, which is positive only for k ≥ 2. By the binomial distribution, P(at least 2 wins in 6 games) = 1 − (5/6)⁶ − 6 × (1/6) × (5/6)⁵ ≈ 26.3% (checked in the sketch below).
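
A quick check of the expected value and the follow-up probability with SciPy:

```python
from scipy.stats import binom

p_win = 6 / 36                       # P(sum of two dice == 7)
ev_per_game = p_win * 21 - 5         # you pay $5 every game
print("EV per game:", ev_per_game)   # -1.5 -> don't play

# Over 6 games the net is 21*wins - 30 dollars, positive only for >= 2 wins.
p_profit = 1 - binom.cdf(1, 6, p_win)
print(f"P(making money in 6 games) = {p_profit:.3f}")  # ~0.263
```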
