Machine Learning Interview Questions and Answers
1. What is the difference between inductive machine learning and deductive machine learning?
Answer: In inductive machine learning, the model learns from a set of observed examples and draws a generalized conclusion from them, whereas in deductive learning the model starts with a conclusion and then deduces what is right or wrong about it. For example, suppose you have to teach a kid that playing with fire can cause burns. There are two ways to do this. You can show the kid training examples of fire accidents, or images of people with burns, and label them as “Hazardous”. In this case, the kid learns from the examples not to play with fire. This is referred to as inductive machine learning. The other way is to let the kid play with fire and wait to see what happens. If the kid gets burned, they will learn not to play with fire and will avoid going near it in the future. This is referred to as deductive learning.
2. Why is the harmonic mean used to calculate the F1 score and not the arithmetic mean?
Answer: Because the harmonic mean gives more weight to lower values. Thus, we only get a high F1 score if both precision and recall are high.
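The effect is easy to see with a lopsided pair of values. The following is a minimal sketch; the function names and the sample precision and recall values are made up for illustration.

```python
def arithmetic_mean(precision, recall):
    # Plain average: a high value can mask a low one.
    return (precision + recall) / 2


def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values: perfect precision, very poor recall.
precision, recall = 1.0, 0.1
print("arithmetic mean:", arithmetic_mean(precision, recall))
print("F1 (harmonic mean):", f1_score(precision, recall))
```

With these values the arithmetic mean is about 0.55, while the F1 score is only about 0.18, so the weak recall is not hidden.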
3. Does 100% precision mean that our model predicts all the values correctly?
Answer: No. We can get perfect precision in many ways, but it doesn’t mean that our model predicts every value accurately. For example, if we make one single positive prediction and make sure it is correct, our precision reaches 100%. Generally, precision is used together with other metrics (such as recall) to measure performance.
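The single-prediction case can be sketched as follows. The labels are made-up toy data (1 is the positive class), and the helper names are only for this example.

```python
def precision(y_true, y_pred):
    # Fraction of predicted positives that are actually positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    # Fraction of actual positives that the model found.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Four actual positives, but the model dares to make only one positive prediction.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]

print("precision:", precision(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
```

Precision is 1.0 here, yet the model misses three of the four positives, which shows up as a recall of 0.25.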
4. What are the similarities and differences between machine learning and human learning?
Answer: Machine learning and human learning are actually quite similar. Machine learning is about an algorithm or computer engaging with its environment through data and adapting its behavior based on what it learns. Suppose a program fails to make the right predictions: it adjusts itself in some sense in order to make better predictions next time. That is very similar to the way a human learns. A human engages with the environment and learns from it. So machine learning has an evolutionary aspect to it, which I think is quite new to the field of artificial intelligence.
5. How do you decide whether a problem is a machine learning problem or not?
Answer: When you analyze a problem, check whether it contains patterns and whether those patterns can be captured with mathematical equations that you can write down yourself. If the problem has patterns that you cannot express this way, you need machine learning to extract them from lots of data. These two features help you decide whether a problem is a machine learning problem or not.
Example 1: We need to find whether a number is even or odd. This problem is very simple because we know the logic and the mathematics behind it. Divide the number by 2: if the remainder is 1, the number is odd; if the remainder is 0, the number is even. So this problem has a pattern, but we can solve it with a mathematical rule and do not need a lot of data. You could turn it into a machine learning problem by feeding the machine lots of numbers, each labeled as odd or even, and letting it learn to classify them. But because we already know the logic and mathematics behind the problem, it does not qualify as a machine learning problem.
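Because the rule is known, it can be written directly instead of being learned from data. A minimal sketch (the function name is arbitrary):

```python
def is_even(n):
    # The whole "model" is one known rule: the remainder after dividing by 2.
    return n % 2 == 0


for number in [4, 7, 0, 13]:
    print(number, "is even" if is_even(number) else "is odd")
```

No training data is involved at any point, which is exactly why this is not a machine learning problem.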
Example 2: Let’s say you have a lot of photos, and you need to find whether a particular photo contains a human face or not. There is a pattern here: human faces across all the photos. Can we capture this pattern with mathematical equations written by hand? That is very difficult. Instead, we can take a large amount of data and feed it to an algorithm as training data, which means training the machine on this data. After training, we get a mathematical model based on the patterns found in the data, but humans cannot write this logic on their own. So this is definitely a machine learning problem. Here the machine automatically forms a rule based on the training data, and that rule detects whether a photo contains a human face or not.
6. What is the AdaGrad algorithm in machine learning?
Answer: A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate.
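The rescaling can be sketched in a few lines. This is a simplified illustration of the usual AdaGrad update (divide each gradient by the square root of that parameter's accumulated squared gradients), not a production optimizer; the learning rate, epsilon, and toy gradients are assumptions chosen for the example.

```python
import math


def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Apply one AdaGrad update and return (new_params, new_accum)."""
    # Accumulate squared gradients separately for every parameter.
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    # Each parameter's step is scaled by its own gradient history.
    new_params = [
        p - lr * g / (math.sqrt(a) + eps)
        for p, g, a in zip(params, grads, new_accum)
    ]
    return new_params, new_accum


# Toy case: two parameters whose gradients differ by a factor of 100.
params = [1.0, 1.0]
accum = [0.0, 0.0]
grads = [10.0, 0.1]

params, accum = adagrad_step(params, grads, accum)
print(params)
```

After this first step both parameters have moved by roughly the same amount (about `lr`), even though their raw gradients differ by two orders of magnitude. That is the sense in which each parameter gets its own effective learning rate.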
7. What is a binary classification in machine learning?
Answer: A type of classification task that outputs one of two mutually exclusive classes. For example, a machine learning model that evaluates email messages and outputs either “spam” or “not spam” is a binary classifier.
8. What is Rectified Linear Unit (ReLU) in Machine learning?
Answer: An activation function with the following rules:
If the input is negative or zero, the output is 0.
If the input is positive, the output is equal to the input.
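The two rules translate directly into code. A minimal sketch for scalar inputs:

```python
def relu(x):
    # Negative or zero input gives 0; positive input passes through unchanged.
    return x if x > 0 else 0.0


for value in [-2.0, 0.0, 3.5]:
    print(value, "->", relu(value))
```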
9. What is the function of unsupervised learning?
Answer: Find clusters of the data
Find low-dimensional representations of the data
Find interesting directions in data
Find interesting coordinates and correlations
Find novel observations (database cleaning)
10. Why does overfitting happen?
Answer: Overfitting can happen because the criterion used for training the model is not the same as the criterion used to judge the efficacy of the model.
11. What is a recommendation system?
Answer: Anyone who has used Spotify or shopped at Amazon will recognize a recommendation system: It’s an information filtering system that predicts what a user might want to hear or see based on choice patterns provided by the user.
12. Explain what precision and recall are.
Answer: Recall is also known as the true positive rate. It is the number of positives your model has correctly identified compared to the actual number of positives present in the data.
Precision is also known as the positive predictive value. It is based on the predictions: it measures the number of accurate positives the model claims compared to the total number of positives it claims.
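In terms of prediction counts, recall and precision can be written as below. TP, FP, and FN (true positives, false positives, and false negatives) are the standard count names, not notation taken from the text above.

```latex
\[
\text{Recall} = \frac{TP}{TP + FN},
\qquad
\text{Precision} = \frac{TP}{TP + FP}
\]
```

For instance, with 8 true positives, 2 false positives, and 4 false negatives, recall is 8/12 (about 0.67) and precision is 8/10 = 0.8.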
13. Pick an algorithm and write the pseudocode for it.
Answer: This question tests your understanding of the algorithm. Answering it calls for some creativity and, first and foremost, in-depth knowledge of the algorithm you pick. A good way to answer this question is to start off with Web Sequence Diagrams.
14. List out some important methods of reducing dimensionality?
Answer: Combine features with feature engineering.
Use some form of algorithmic dimensionality reduction like ICA or PCA.
Remove collinear features to reduce dimensionality.
15. Explain the Bias-Variance Tradeoff?
Answer: Predictive models have a tradeoff between bias (how well the model fits the data) and variance (how much the model changes based on changes in the inputs).
Simpler models are stable (low variance) but they don’t get close to the truth (high bias).
More complex models are more prone to overfitting (high variance) but they are expressive enough to get close to the truth (low bias).
The best model for a given problem usually lies somewhere in the middle.
16. When is Ridge regression favorable over Lasso regression?
Answer: You can quote the ISLR authors (who include Hastie and Tibshirani), who assert that in the presence of a few variables with medium or large effects you should use lasso regression, and in the presence of many variables with small or medium effects you should use ridge regression.
Conceptually, lasso regression (L1) does both variable selection and parameter shrinkage, whereas ridge regression (L2) only does parameter shrinkage and ends up including all the coefficients in the model. In the presence of correlated variables, ridge regression might be the preferred choice. Ridge regression also works best in situations where the least squares estimates have high variance. Therefore, the choice depends on our model objective.
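The difference between "shrinkage only" and "shrinkage plus selection" shows up clearly in a simplified special case. The sketch below assumes an orthonormal design, where both penalized solutions have closed forms in terms of the least squares coefficients: ridge (penalty lam times the sum of squared coefficients) divides each coefficient by 1 + lam, and lasso (penalty lam times the sum of absolute coefficients) soft-thresholds each coefficient at lam / 2. The function names, coefficient values, and lam are made up for illustration; real problems rarely have an orthonormal design.

```python
def ridge_shrink(beta_ols, lam):
    # Ridge: every coefficient is scaled toward zero but never becomes exactly zero.
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    # Lasso: soft-thresholding; coefficients within lam / 2 of zero are set to exactly zero.
    threshold = lam / 2.0
    shrunk = []
    for b in beta_ols:
        if b > threshold:
            shrunk.append(b - threshold)
        elif b < -threshold:
            shrunk.append(b + threshold)
        else:
            shrunk.append(0.0)
    return shrunk


beta_ols = [3.0, -0.4, 0.2, -2.0]  # hypothetical least squares estimates
lam = 1.0

print("ridge:", ridge_shrink(beta_ols, lam))   # [1.5, -0.2, 0.1, -1.0]
print("lasso:", lasso_shrink(beta_ols, lam))   # [2.5, 0.0, 0.0, -1.5]
```

Ridge keeps all four variables with smaller coefficients, while lasso drops the two small ones entirely, which is the variable selection behavior described above.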
17. What is the convex hull?
Answer: In the case of linearly separable data, the convex hull represents the outer boundary of each of the two groups of data points. Once the convex hulls are created, we get the maximum margin hyperplane (MMH) as a perpendicular bisector between the two convex hulls. The MMH is the line that attempts to create the greatest separation between the two groups.
18. What’s the difference between a generative and discriminative model?
Answer: A generative model will learn categories of data while a discriminative model will simply learn the distinction between different categories of data. Discriminative models will generally outperform generative models on classification tasks.
19. What is your training in machine learning and what types of hands-on experience do you have?
Answer: Your answer to this question will depend on your training in machine learning. Be sure to emphasize any direct projects you’ve completed as part of your education. Don’t fail to mention any additional experience that you have including certifications and how they have prepared you for your role in the machine learning field.
20. How do bias and variance play out in machine learning?
Answer: Both bias and variance are errors. Bias is an error due to flawed assumptions in the learning algorithm. Variance is an error resulting from too much complexity in the learning algorithm.
21. What is supervised versus unsupervised learning?
Answer: Supervised learning is a process of machine learning in which the model is trained on labeled examples, so the known outputs are fed to the software for it to learn from and give more accurate results the next time. With supervised learning, the “machine” receives this initial training to start. In contrast, unsupervised learning means a computer learns from unlabeled data, without such initial training.
22. How will you know which machine learning algorithm to choose for your classification problem?
Answer: If accuracy is a major concern for you when deciding on a machine learning algorithm, then the best way to go about it is to test a couple of different ones (trying different parameters within each algorithm) and choose the best one by cross-validation. A general rule of thumb for choosing a good enough machine learning algorithm for your classification problem is based on how large your training set is. If the training set is small, then low variance/high bias classifiers like Naïve Bayes have an advantage over high variance/low bias classifiers like k-nearest neighbors, since the latter might overfit. High variance/low bias classifiers tend to win when the training set grows in size.
23. Logistic regression gives probabilities as a result then how do we use it to predict a binary outcome?
Answer: A logistic model outputs a value between 0 and 1. To convert these probabilities into classes we use decision boundaries. We can set equal or unequal boundaries depending upon the requirement.
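As a minimal sketch of this conversion: the logistic (sigmoid) function maps a raw score to a probability, and a threshold turns the probability into a class. The scores and the 0.9 threshold below are made-up values for illustration.

```python
import math


def sigmoid(z):
    # Logistic function: maps any real-valued score to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(probability, threshold=0.5):
    # Class 1 if the probability reaches the threshold, otherwise class 0.
    return 1 if probability >= threshold else 0


scores = [-2.0, 0.0, 1.5]  # hypothetical linear-model outputs
probabilities = [sigmoid(z) for z in scores]

print([predict_class(p) for p in probabilities])                 # [0, 1, 1]
print([predict_class(p, threshold=0.9) for p in probabilities])  # [0, 0, 0]
```

Raising the threshold makes the classifier more conservative about predicting the positive class, which is useful when false positives are costly.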
24. What are some common unsupervised tasks other than clustering?
Answer: Visualization, Dimensionality reduction, association rule learning.
25. What is the difference between A.I. and machine learning, and has A.I. been oversold for decades because of sci-fi?
Answer: More people have probably heard of A.I., meaning artificial intelligence, than of machine learning. Artificial intelligence goes back to Alan Turing, whose aim was to somehow make a machine have the sort of intelligence that a human might have; in particular, a program that could convince you it is human if you chat with it. But I think artificial intelligence has evolved since then toward a distinct sort of intelligence that a machine might have. Machine learning has a slightly different quality. It is a more specific part of artificial intelligence, built on the idea that a program changes itself as it interacts with the world, much as humans do. In the end, the programmer might not actually know how the program works, because it has been changing as it has been interacting. So when we look at the final program, we do not necessarily know why it makes its decisions in a particular way. That is why there are a lot of connections between artificial intelligence and machine learning.
26. Differentiate between inductive and deductive machine learning?
Answer: In inductive machine learning, the model learns through examples obtained from a set of observed instances to draw generalized conclusions, whereas in deductive machine learning certain statements are combined in a logical order as per some predefined rules to obtain new statements. Basically, inductive learning generalizes from examples, while deductive learning derives conclusions from rules that are already known.
27. What is the “Curse of Dimensionality”?
Answer: The term “Curse of Dimensionality” refers to the difficulty of searching through a space with many dimensions: the more dimensions, the greater the difficulty. In the context of machine learning in particular, it has to do with the non-intuitive properties of data observed when working in a high-dimensional space.
28. Can you name some popular machine learning algorithms?
Answer: Yes, they are:
- Nearest Neighbour
- Neural Networks
- Decision Trees
- Support vector machines
29. What do you understand by decision tree classification?
Answer: Decision tree classification in machine learning refers to a tree-like classification model where the data is continuously split as per certain parameters. There are two primary entities in this model, namely decision nodes and leaves. The leaves denote the final outcome of decisions, while the nodes signify the point where the data is split. A decision tree classification greatly facilitates a visual and explicit representation of the decisions and decision-making process.
30. What is the difference between supervised and unsupervised learning?
Answer: In the supervised learning process, labeled outputs are fed into the system so that the software can learn from them and produce more accurate results on subsequent inputs; this acts as initial training for the system. On the other hand, unsupervised learning draws inferences on its own from an unlabeled data set, without any external aid or input.
31. What is the activation function in Machine Learning?
Answer: A function (for example, ReLU or sigmoid) that takes in the weighted sum of all of the inputs from the previous layer and then generates and passes an output value (typically nonlinear) to the next layer.
32. What is the baseline in machine learning?
Answer: A simple model or heuristic used as a reference point for comparing how well a model is performing. A baseline helps model developers quantify the minimal, expected performance on a particular problem.
33. What is the batch in machine learning?
Answer: The set of examples used in one iteration (that is, one gradient update) of model training.
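A minimal sketch of splitting a training set into batches follows; the helper name `iter_batches`, the batch size, and the toy examples are assumptions for illustration.

```python
def iter_batches(examples, batch_size):
    # Yield consecutive slices of the training set; the last one may be smaller.
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


examples = list(range(10))  # stand-in for 10 training examples
for batch in iter_batches(examples, batch_size=4):
    # In real training, one gradient update would be computed from each batch.
    print(batch)
```

With 10 examples and a batch size of 4, this produces three batches of sizes 4, 4, and 2.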
34. What is the bias in machine learning?
Answer: An intercept or offset from an origin. Bias (also known as the bias term) is referred to as b or w0 in machine learning models.
35. What is the calibration layer in machine learning?
Answer: A post-prediction adjustment, typically to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.
36. What is class-imbalanced data set in machine learning?
Answer: A binary classification problem in which the labels for the two classes have significantly different frequencies. For example, a disease data set in which 0.0001 of examples have positive labels and 0.9999 have negative labels is a class-imbalanced problem, but a football game predictor in which 0.51 of examples label one team winning and 0.49 label the other team winning is not a class-imbalanced problem.
37. What is the confusion matrix in machine learning?
Answer: An NxN table that summarizes how successful a classification model’s predictions were; that is, the correlation between the label and the model’s classification. One axis of a confusion matrix is the label that the model predicted, and the other axis is the actual label. N represents the number of classes.
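A small sketch of building such a table by hand is shown below. Rows are actual labels and columns are predicted labels; that orientation is a convention chosen here, and the labels and predictions are made-up toy data.

```python
def confusion_matrix(y_true, y_pred, labels):
    # matrix[i][j] counts examples whose actual label is labels[i]
    # and whose predicted label is labels[j].
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


labels = ["spam", "not spam"]
y_true = ["spam", "spam", "not spam", "not spam", "spam"]
y_pred = ["spam", "not spam", "not spam", "spam", "spam"]

for label, row in zip(labels, confusion_matrix(y_true, y_pred, labels)):
    print(label, row)
```

Here the result is [[2, 1], [1, 1]]: two spam messages were caught, one was missed, one legitimate message was wrongly flagged, and one was correctly passed.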
38. How do you choose an algorithm for a classification problem?
Answer: The answer depends on the degree of accuracy needed and the size of the training set. If you have a small training set, you can use a low variance/high bias classifier. If your training set is large, you will want to choose a high variance/low bias classifier.
39. What is ‘Overfitting’ in Machine learning?
Answer: In machine learning, ‘overfitting’ occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting is normally observed when a model is excessively complex, because it has too many parameters relative to the number of training examples. A model that has been overfit exhibits poor predictive performance.
40. What is the difference between Type 1 and Type 2 errors?
Answer: A Type 1 error is a false positive. That is, the error claims that something has happened when in fact nothing has happened. It is like a false fire alarm: the alarm rings but there is no fire.
A Type 2 error is a false negative. That is, the error claims that nothing has happened when in fact something did happen.
The best way to differentiate a Type 1 error from a Type 2 error is:
Telling a man that he is pregnant: this is a Type 1 example
Telling a pregnant woman that she isn’t carrying a baby: this is a Type 2 example
41. What are parametric models? Give an example.
Answer: Parametric models are those with a finite number of parameters. To predict new data, you only need to know the parameters of the model. Examples include linear regression, logistic regression, and linear SVMs.
Non-parametric models are those with an unbounded number of parameters, allowing for more flexibility. To predict new data, you need to know the parameters of the model and the state of the data that has been observed. Examples include decision trees, k-nearest neighbors, and topic models using latent Dirichlet allocation.
42. What is the difference between covariance and correlation?
Answer: Correlation is the standardized form of covariance.
Covariances are difficult to compare. For example, if we calculate the covariance of salary ($) and age (years), its value depends on the units of both variables, so it can’t be compared with covariances measured on other scales. To combat this, we calculate the correlation, which gives a value between -1 and 1 irrespective of the variables’ scales.
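The standardization can be checked numerically: correlation is the covariance divided by the product of the two standard deviations. The sketch below uses made-up age and salary figures and expresses the same salaries in two different units.

```python
import math


def covariance(xs, ys):
    # Sample covariance (divides by n - 1).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    # Covariance standardized by both standard deviations.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


age_years = [25, 32, 41, 50, 58]                      # hypothetical data
salary_usd = [40000, 52000, 61000, 75000, 83000]      # hypothetical data
salary_thousands = [s / 1000 for s in salary_usd]     # same salaries, different unit

print("cov (USD):      ", covariance(age_years, salary_usd))
print("cov (thousands):", covariance(age_years, salary_thousands))
print("corr (USD):      ", correlation(age_years, salary_usd))
print("corr (thousands):", correlation(age_years, salary_thousands))
```

Changing the salary unit changes the covariance by a factor of 1000 but leaves the correlation unchanged.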
43. How would you evaluate a logistic regression model?
Answer: You have to demonstrate an understanding of what the typical goals of a logistic regression are (classification, prediction, etc.) and bring up a few examples and use cases.
44. How will you explain machine learning to a layperson?
Answer: Machine learning is all about making decisions based on previous experience with a task with the intent of improving its performance. There are multiple examples that can be given to explain machine learning to a layperson –
Imagine a curious kid who sticks his palm
You have observed from your connections that obese people often tend to get heart diseases thus you make the decision that you will try to remain thin otherwise you might suffer from heart disease. You have observed a ton of data and come up with a general rule of classification.
You are playing blackjack and based on the sequence of cards you see, you decide whether to hit or to stay. In this case, based on the previous information you have and by looking at what happens, you make a decision quickly.
45. What is decision tree classification?
Answer: A decision tree builds classification (or regression) models as a tree structure, with datasets broken up into ever-smaller subsets while developing the decision tree, literally in a tree-like way with branches and nodes. Decision trees can handle both categorical and numerical data.
46. What are some methods of reducing dimensionality?
Answer: You can combine features with feature engineering, use some form of algorithmic dimensionality reduction such as ICA or PCA, or remove collinear features.
47. How do classification and regression differ?
Answer: Classification predicts group or class membership. Regression involves predicting a continuous numeric response. Classification is the better technique when you need a definite, categorical answer.
48. What is kernel SVM?
Answer: Kernel SVM is the abbreviated version of kernel support vector machine. Kernel methods are a class of algorithms for pattern analysis and the most common one is the kernel SVM.
49. How do we separate one dimensional, two dimensional and three-dimensional data?
Answer: One-dimensional data can be separated using a point, two-dimensional data using a line, and three-dimensional data using a plane. In general, the separating boundary is called a hyperplane.
50. What kinds of problems are solved by regularization?
Answer: In machine learning, regularization is basically a process of introducing additional information with the purpose of solving an ill-posed problem or avoiding overfitting. In regression, it typically takes the form of a penalty that regularizes, or shrinks, the coefficient estimates towards zero. Regularization discourages learning a more complex or flexible model, which reduces the risk of overfitting.