# Machine Learning Interview Questions and Answers

#### 1. What is machine learning?

**Answer:** In answering this question, show that you understand the broad applications of machine learning and how it fits into AI. Put it in your own words, but convey that machine learning is a form of AI that automates data analysis so that computers can learn and adapt through experience to perform specific tasks without being explicitly programmed.

**2. Which one would you prefer to choose: model accuracy or model performance?**

**Answer:** Model accuracy is only one aspect of model performance, not the be-all and end-all of it. This question tests whether you know how to balance accuracy against the other aspects of model performance.

**3. How will you set the threshold for a credit card fraud detection model?**

A machine learning interview is a compound process, and its outcome is determined by multiple factors, not just by the number of right answers the candidate gives. If you really want that machine learning job, it will take time and dedication as you practice multiple ways of answering these interview questions, but hopefully it is the enjoyable kind of work. You will learn a lot while preparing for your next machine learning interview with the help of these questions.

The machine learning interview questions on this blog have been collected from various sources, such as actual interview experiences of data scientists and discussions on Quora, Facebook, job portals, and other forums.

**4. Give a drawback of gradient descent.**

**Answer:** It does not always converge to the same point: depending on where it starts, it can settle in a local minimum instead of the global optimum.
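
To make this concrete, here is a minimal sketch that runs plain gradient descent from two starting points on a non-convex function. The function `f`, the learning rate, and the step count are illustrative assumptions, not taken from the answer above.

```python
def f(x):
    # Illustrative non-convex function with two minima (an assumption for this demo).
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytical derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, learning_rate=0.01, steps=2000):
    """Run plain gradient descent from x0 and return the final position."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # ends near the global minimum
x_right = gradient_descent(2.0)   # ends in a shallower local minimum
print(f"start=-2.0 -> x={x_left:.3f}, f(x)={f(x_left):.3f}")
print(f"start=+2.0 -> x={x_right:.3f}, f(x)={f(x_right):.3f}")
```

The two runs use the same update rule and differ only in the starting point, yet they finish at different minima with different loss values.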

**5. When should one use mean absolute error over root mean square error as a performance measure for regression problems?**

**Answer:** When the data contains many outliers, mean absolute error is the better choice, because root mean square error squares each error and so lets a few large errors dominate the metric.
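
The effect of a single outlier on the two metrics can be sketched as follows. The target and prediction values are made up for illustration.

```python
import math


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical data: the last target value is an outlier the model misses badly.
y_true = [10, 12, 11, 13, 100]
y_pred = [11, 11, 12, 12, 12]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_square_error(y_true, y_pred)
print(f"MAE  = {mae:.2f}")
print(f"RMSE = {rmse:.2f}")
```

Four of the five errors are exactly 1, but the single error of 88 pulls RMSE to more than twice the MAE.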

**6. What are the three stages of building a model in machine learning?**

**Answer:** There are three stages in building a model in machine learning:

Model building: Choose a suitable algorithm for the model and train it according to the requirements of your problem.

Model testing: Check the accuracy of the model using the test data.

Applying the model: Make the required changes after testing and apply the final model.

**7. Why is it important for the Royal Society to be doing a project about machine learning?**

**Answer:** I think it is very important for the Royal Society to run a project on machine learning, so that people can understand how much impact machine learning is going to have in the future. Some people have not even heard of machine learning yet, and that is going to change in the near future. The project helps society take stock of where it stands now, while the world moves forward with these cutting-edge technologies. I think it is all about transparency: explaining the potential of these technologies and where they can take us. It is about looking into the future to make predictions.

**8. How do we know which machine learning algorithm is better for solving our problem?**

**Answer:** If accuracy is the main concern, test different algorithms and cross-validate them to compare their accuracy. As a rule of thumb, if the training dataset is small, use models with low variance and high bias; if the training dataset is large, use models with high variance and low bias. Following this approach makes it easier to see which algorithm is better suited to your problem.


**9. How will you explain machine learning to a layperson in an easily comprehensible manner?**

**Answer:** Machine learning is a technology that enables computer-based systems to make decisions based on prior experience with an activity, with the aim of continuously improving their performance. This can be understood through examples such as the following:

Imagine about a curious kid who sticks his palm

Suppose you have observed that obese people are more prone to heart disease than thinner people, so you decide to stay slim to reduce your risk of heart disease. You have gone through a lot of information on the topic and come up with a general rule of classification.

Suppose you are playing blackjack and, based on the sequence of cards you see, you decide whether to hit or not. Here you decide on your course of action based on your prior experience and on what you observe.

Machines learn in the same way, with the aid of technology.

**10. How will you choose the most appropriate machine learning algorithm for your classification problem?**

**Answer:** If accuracy is the priority, the best approach is to test a few different algorithms (trying different parameters within each one) and choose the one that best meets the requirement. As a rule of thumb, choose a classification algorithm based on the size of your training set. If the training set is small, low variance/high bias classifiers such as Naïve Bayes are beneficial, while for large training sets high variance/low bias classifiers such as k-nearest neighbors serve the purpose best.

**11. What is backpropagation in machine learning?**

**Answer:** The primary algorithm for performing gradient descent on neural networks. First, the output values of each node are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is calculated in a backward pass through the graph.

The Area Under the ROC curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.
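
This probabilistic definition of the area under the ROC curve can be checked directly by comparing every positive/negative pair. The sketch below assumes a small made-up set of labels and scores, and counts ties as half a win, which is a common convention that the definition above does not state.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: a tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier outputs: one positive (0.6) ranks below one negative (0.7).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = pairwise_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Eight of the nine positive/negative pairs are ranked correctly, so the result is 8/9.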

**12. What is candidate sampling in machine learning?**

**Answer:** A training-time optimization in which a probability is calculated for all the positive labels, using, for example, softmax, but only for a random sample of negative labels. For example, if we have an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for the beagle and dog class outputs in addition to a random subset of the remaining classes (cat, lollipop, fence).

**13. What is the classification threshold in machine learning?**

**Answer:** A scalar-value criterion that is applied to a model's predicted score in order to separate the positive class from the negative class. It is used when mapping logistic regression results to binary classification.
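
As a rough illustration of how the threshold changes the outcome, the sketch below applies two thresholds to the same scores and reports precision and recall. The labels, scores, and threshold values are made up, and the helper name `precision_recall` is hypothetical.

```python
def precision_recall(labels, scores, threshold):
    """Apply a classification threshold and return (precision, recall)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores for six examples (1 = positive class).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

for threshold in (0.5, 0.75):
    precision, recall = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Raising the threshold here removes the false positive but also misses one true positive, so the right value depends on which kind of error is more costly for the application.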

**14. What is the Naïve Bayes classifier?**

**Answer:** Naïve Bayes is a widely used classification algorithm that has proved effective for analyzing text data. It is grounded in conditional probability: it applies Bayes' theorem to estimate how likely each class is, given the observed features.

**15. What is the bias-variance trade-off in machine learning?**

**Answer:** The bias-variance trade-off is the dilemma of minimizing, at the same time, errors that stem from two different sources. Bias is the error caused by preconceived assumptions built into the learning algorithm, while variance is the error caused by the model's sensitivity to fluctuations in the training data (in statistics, variance measures how far a set of values spreads from its average). Balancing these two sources of error is central to choosing and tuning a machine learning algorithm.

**16. What is the difference between artificial intelligence and machine learning?**

**Answer:** Machine learning: designing and developing algorithms that derive their behavior from empirical data.

Artificial intelligence: in addition to machine learning, it also covers other areas such as knowledge representation, natural language processing, planning, and robotics.

**17. What is deep learning?**

**Answer:** This might or might not apply to the job you are going after, but your answer will help show that you know more than just the technical aspects of machine learning. Deep learning is a subset of machine learning. It refers to using multi-layered neural networks to process data in increasingly complex ways, enabling the software to train itself to perform tasks like speech and image recognition through exposure to vast amounts of data. The machine thus continually improves its ability to recognize and process information. Layers of neural networks stacked on top of each other for use in deep learning are called deep neural networks.

**18. What is genetic programming?**

**Answer:** Genetic programming is a technique used in machine learning. The model is based on testing a set of candidate results and selecting the best among them.

**19. Why is the Naïve Bayes machine learning algorithm naïve?**

**Answer:** The Naïve Bayes algorithm is considered naïve because the assumptions it makes are virtually impossible to find in real-life data. Conditional probability is calculated as a pure product of the individual probabilities of the components. This means the algorithm assumes that, given the class variable, the presence or absence of one feature is unrelated to the presence or absence of any other feature (absolute independence of features). For instance, a fruit may be considered to be a banana if it is yellow, long, and about 5 inches in length. Even if these features depend on each other or on the existence of other features, a Naïve Bayes classifier treats all of them as contributing independently to the probability that the fruit is a banana. The assumption that all features in a dataset are equally important and independent rarely holds in real-world data.

**20. You are given a data set. The data set has missing values which spread along 1 standard deviation from the median. What percentage of data would remain unaffected? Why?**

**Answer:** This question has enough hints for you to start thinking! Since the data is spread around the median, let us assume it follows a normal distribution. In a normal distribution, ~68% of the data lies within 1 standard deviation of the mean (which coincides with the mode and median), which leaves ~32% of the data outside that range. Therefore, ~32% of the data would remain unaffected by missing values.
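
The ~68% figure can be reproduced from the normal distribution's cumulative function. This is a minimal check that relies on the same normality assumption as the answer; the helper name `fraction_within` is hypothetical.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


within_one_sd = fraction_within(1)
unaffected = 1 - within_one_sd
print(f"within 1 SD:  {within_one_sd:.4f}")
print(f"outside 1 SD: {unaffected:.4f}")
```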

**21. What is a Fourier transform?**

**Answer:** A Fourier transform is a generic method for decomposing generic functions into a superposition of symmetric functions. Or, as one intuitive tutorial puts it: given a smoothie, it is how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes, and phases that match any time signal. It converts a signal from the time domain to the frequency domain, which makes it a very common way to extract features from audio signals or other time series such as sensor data.
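
The "recipe" idea can be sketched with a naive discrete Fourier transform. This is the textbook O(n²) definition written with the standard library only, not an optimized FFT, and the test signal (two sine waves at 5 and 12 cycles) is an assumption made for the demo.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), straight from the definition."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical time signal: a strong 5-cycle wave plus a weaker 12-cycle wave.
n = 64
signal = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]

spectrum = dft(signal)
# Only the first half of the spectrum is needed for a real-valued signal.
magnitudes = [abs(c) for c in spectrum[: n // 2]]
dominant = sorted(range(n // 2), key=lambda k: magnitudes[k], reverse=True)[:2]
print("dominant frequencies (cycles per window):", dominant)
```

The two largest magnitudes sit at the frequencies that were mixed into the signal, and their relative heights reflect the amplitudes 1 and 0.5.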

**22. You are given a dataset where the number of variables (p) is greater than the number of observations (n) (p > n). Which is the best technique to use and why?**

**Answer:** When the number of variables is greater than the number of observations, the dataset is high dimensional. In such cases, it is not possible to calculate a unique least-squares coefficient estimate. Penalized regression methods like LARS, Lasso, or Ridge seem to work well under these circumstances, as they tend to shrink the coefficients to reduce variance. Whenever the least-squares estimates have high variance, Ridge regression seems to work best.

**23. When will you use classification over regression?**

**Answer:** Classification is about identifying group membership, while regression involves predicting a response. Both techniques are related to prediction: classification predicts membership of a class, whereas regression predicts a value from a continuous set. Classification is preferred over regression when the model needs to assign the data points in a dataset to specific, explicit categories (for instance, when you want to find out whether a name is male or female, rather than how strongly it is correlated with male and female names).

**24. If a highly positively skewed variable has missing values and we replace them with the mean, do we underestimate or overestimate the values?**

**Answer:** Since the mean is greater than the median in positively skewed data, we overestimate the values of the missing observations.
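
A small numeric sketch of why mean imputation overshoots on positively skewed data. The income-like values are made up for illustration.

```python
import statistics

# Hypothetical positively skewed variable: most values are small, a few are very large.
observed = [20, 22, 25, 27, 30, 32, 35, 40, 150, 400]

mean_value = statistics.mean(observed)
median_value = statistics.median(observed)
below_mean = sum(1 for v in observed if v < mean_value)

print(f"mean   = {mean_value}")
print(f"median = {median_value}")
print(f"{below_mean} of {len(observed)} observed values lie below the mean")
```

Here the mean sits above 8 of the 10 observed values, so filling a gap with it assigns a value larger than what a typical observation looks like.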

**25. What does linear in ‘linear regression’ actually mean?**

**Answer:** It means that the dependent variable should be a linear function of the parameters. For the same reason, polynomial regression is classified as linear even though it fits a non-linear relationship between the dependent and independent variables.

**26. What type of learning is needed when the system needs to adapt to rapidly changing data?**

**Answer:** Online learning, because each learning step is fast and cheap, and the system can be trained by feeding it data instances sequentially.
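
A minimal sketch of the idea, assuming a one-parameter linear model updated by one stochastic gradient step per incoming example. The class name, learning rate, and synthetic data stream are all hypothetical.

```python
class OnlineLinearModel:
    """One-parameter model y = weight * x, updated one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def learn_one(self, x, y):
        # Single gradient step on the squared error of this one example.
        error = y - self.predict(x)
        self.weight += self.learning_rate * error * x


def stream(slope, n):
    """Synthetic data stream following y = slope * x (an assumption for this demo)."""
    xs = [0.5, 1.0, 1.5]
    for i in range(n):
        x = xs[i % len(xs)]
        yield x, slope * x


model = OnlineLinearModel()
for x, y in stream(slope=2.0, n=200):
    model.learn_one(x, y)
weight_before_change = model.weight

# The underlying relationship changes; the model keeps learning from new instances.
for x, y in stream(slope=5.0, n=200):
    model.learn_one(x, y)
weight_after_change = model.weight

print(f"weight before change: {weight_before_change:.3f}")
print(f"weight after change:  {weight_after_change:.3f}")
```

Each update touches only the current example, so the model tracks the new relationship without being retrained on the full history.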

**27. What kind of problems lend themselves to machine learning?**

**Answer:** I think machine learning has become such a big deal because of big data. We now have access to so much data for machines to work with, so problems with plenty of data are where machine learning is going to make great progress. One of the big challenges for artificial intelligence is computer vision, which is something humans do incredibly well.

Example: when humans look at a picture, they interpret it very well. For computers, this is very difficult.

That is because we used to try to program such tasks from the bottom up. Now we can instead expose an algorithm to many pictures and let it learn as it goes. So the ability of a machine to view its environment and interpret it is an area where a lot of progress can be made. More generally, machine learning will be successful wherever there is data. Examples include recommendations on the internet and navigation: whenever we drive, we feed in new information that is used to adapt and improve the system. Likewise, health is one of the biggest fields, because a machine can study far more data than doctors can study or keep track of.

**28. What are false positives and false negatives in machine learning?**

**Answer:** Suppose you run a test or experiment whose actual outcome is negative, but you predicted it as positive. Such cases are false positives. A false negative is the opposite: the actual outcome is positive, but you predicted it as negative.
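
The two error types can be counted directly from actual and predicted labels. The label lists and the helper name `confusion_counts` below are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1  # predicted positive, actually negative
        elif p == 0 and a == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # predicted negative, actually positive
    return counts


# Hypothetical test outcomes.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```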

**29. What do you mean by parametric models? Give some examples.**

**Answer:** Parametric models are models with a fixed, limited number of parameters. To predict new data, you only need to know the parameters of the model. Examples include logistic regression, linear regression, and linear SVMs.

**30. What is a neural network and what are some advantages and disadvantages of such a network?**

**Answer:** In information technology, a neural network is a system of hardware and/or software modeled on the pattern of neurons in the human brain; it is an important part of deep learning. The greatest advantage of neural networks is that they have led to performance breakthroughs on unstructured data such as audio, video, and images. Their high flexibility enables them to learn patterns that other ML algorithms would be unable to manage. The disadvantage of neural networks is that they need a huge volume of training data to work effectively. It is also difficult to pick the right architecture for these networks, because their internal layers are hard to interpret.

**31. What is the sigmoid function in machine learning?**

**Answer:** A function that maps logistic or multinomial regression output (log odds) to probabilities, returning a value between 0 and 1.
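
A minimal sketch of the sigmoid, written in a form that avoids overflow for large negative inputs. The sample log-odds values are arbitrary.

```python
import math


def sigmoid(z):
    """Map a log-odds value z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form that never exponentiates a large positive number.
    e = math.exp(z)
    return e / (1.0 + e)


for log_odds in (-4.0, 0.0, 4.0):
    print(f"log odds {log_odds:+.1f} -> probability {sigmoid(log_odds):.3f}")
```

A log-odds value of 0 maps to a probability of 0.5, and values of equal size but opposite sign map to probabilities that sum to 1.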

**32. What is batch size in machine learning?**

**Answer:** The number of examples in a batch. For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1000. Batch size is usually fixed during training and inference.

**33. What is bucketing in machine learning?**

**Answer:** Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete bins. Given temperature data sensitive to a tenth of a degree, all temperatures between 0.0 and 15.0 degrees could be put into one bin, 15.1 to 30.0 degrees could be a second bin, and 30.1 to 50.0 degrees could be the third bin.
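
The temperature example above can be sketched as a one-hot bucketing function. The function name and the decision to reject out-of-range values are assumptions made for this sketch.

```python
import bisect

# Upper edges of the first two bins from the temperature example; the third bin ends at 50.0.
BIN_UPPER_EDGES = [15.0, 30.0]
NUM_BINS = 3


def bucketize_temperature(temperature):
    """Return a one-hot list marking which of the three bins the temperature falls into."""
    if not 0.0 <= temperature <= 50.0:
        raise ValueError("temperature outside the 0.0 to 50.0 range covered by the bins")
    # bisect_left keeps a value equal to an edge (e.g. 15.0) in the lower bin.
    index = bisect.bisect_left(BIN_UPPER_EDGES, temperature)
    return [1 if i == index else 0 for i in range(NUM_BINS)]


for t in (15.0, 15.1, 42.3):
    print(t, "->", bucketize_temperature(t))
```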

**34. What is a checkpoint in machine learning?**

**Answer:** Data that captures the state of the variables of a model at a particular time. Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.

**35. What is collaborative filtering in machine learning?**

**Answer:** Making predictions about the interests of one user based on the interests of many other users. Collaborative filtering is often used in recommendation systems.

**36. What is supervised and unsupervised machine learning?**

**Answer:** In supervised machine learning, the algorithm is taught from a prepared dataset in which the expected output for each input is already provided. The model is guided step by step by this readily available data.

Unsupervised machine learning refers to a process in which the machine analyzes the input without such guidance, so the outcome is not known in advance.

**37. How do you choose important variables while working on a data set?**

**Answer:** Removing correlated variables is the first step before selecting the important ones, since correlation hinders uniqueness among the variables. Tools such as linear regression, random forest, and Lasso regression then help select variables in a machine learning process.

**38. What are the ‘training set’ and the ‘test set’?**

**Answer:** Training set: the set of data used to discover potentially predictive relationships, in areas of information science such as machine learning. It is the set of examples given to the learner.

Test set: the set of examples held back from the learner, used to test the accuracy of the hypotheses the learner generates.

**39. What is the standard approach to supervised learning?**

**Answer:** The standard approach to supervised learning is to split the set of examples into a training set and a test set.

**40. What is model selection in machine learning?**

**Answer:** Model selection is the process of choosing among diverse mathematical models that are used to describe the same data set. It is applied in the fields of statistics, data mining, and machine learning.

**41. Explain the two components of a Bayesian logic program.**

**Answer:** A Bayesian logic program consists of two components. The first is a logical component: a set of Bayesian clauses that captures the qualitative structure of the domain. The second is a quantitative component, which encodes the quantitative information about the domain.

**42. List some use cases where classification machine learning algorithms can be used.**

**Answer:**

Natural language processing (the best example is spoken language understanding)

Market segmentation

Text categorization (spam filtering)

Bioinformatics (Classifying proteins according to their function)

Fraud Detection

Face detection

**43. How much data will you allocate for your training, validation and test sets?**

**Answer:** There is no single correct answer to this question, but there needs to be a balance when allocating data to the training, validation, and test sets.

If you make the training set too small, the estimated model parameters may have high variance. If the test set is too small, the estimate of model performance may be unreliable. A general rule of thumb is to use an 80:20 train/test split. After this, the training set can be split further to set aside a validation set.
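
One way to sketch this split with the standard library is shown below. The 20% validation fraction taken from the training portion and the fixed seed are assumptions, since the answer only gives the 80:20 train/test ratio.

```python
import random


def train_val_test_split(examples, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Shuffle the examples, hold out a test set, then carve a validation set out of the rest."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    # val_fraction is applied to the remaining training portion (an assumption of this sketch).
    n_val = round(len(rest) * val_fraction)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Shuffling before splitting avoids any ordering in the data leaking into one of the sets, and the three sets do not overlap.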

**44. What is the most frequent metric to assess model accuracy for classification problems?**

**Answer:** Percent Correct Classification (PCC). It measures overall accuracy irrespective of the kind of errors that are made; all errors are considered to have the same weight.

**45. “People who bought this, also bought…” recommendations on Amazon are a result of which machine learning algorithm?**

**Answer:** Recommender systems usually implement the collaborative filtering machine learning algorithm, which considers user behavior when recommending products to users. Collaborative filtering algorithms exploit the behavior of users and products through ratings, reviews, transaction history, browsing history, selection, and purchase information.

**46. Name some feature extraction techniques used for dimensionality reduction.**

**Answer:**

Independent Component Analysis

Principal Component Analysis

Kernel-Based Principal Component Analysis

**47. What kind of problems does regularization solve?**

**Answer:** Regularization is used to address overfitting problems, as it penalizes the loss function by adding a multiple of an L1 (LASSO) or an L2 (Ridge) norm of your weights vector w.
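
The penalty term can be sketched as follows. The weights, the data loss of 1.0, and the multiplier `lam` are made-up values, and the Ridge penalty is written as the squared L2 norm, which is the usual convention.

```python
def l1_penalty(weights):
    """L1 norm of the weights, as used by LASSO."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Squared L2 norm of the weights, as conventionally used by Ridge."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam, penalty):
    """Loss on the data plus lam times the chosen penalty on the weights."""
    return data_loss + lam * penalty(weights)


# Hypothetical weight vector and unregularized loss.
w = [0.5, -2.0, 0.0, 3.0]
lasso_loss = regularized_loss(1.0, w, lam=0.1, penalty=l1_penalty)
ridge_loss = regularized_loss(1.0, w, lam=0.1, penalty=l2_penalty)
print(f"LASSO loss: {lasso_loss:.3f}")
print(f"Ridge loss: {ridge_loss:.3f}")
```

Because large weights raise the loss, the optimizer is pushed toward smaller weights, which limits how closely the model can fit noise in the training data.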

**48. Why is Manhattan distance not used in the kNN machine learning algorithm to calculate the distance between nearest neighbors?**

**Answer:** Manhattan distance measures distance only along the axes, horizontally or vertically, like moves on a grid. Euclidean distance measures the straight-line distance between data points, whatever space they are represented in, which is why it is the usual choice in kNN for calculating the distance between nearest neighbors. Manhattan distance is still a valid metric and can be used in kNN when movement along the axes is the natural notion of distance.

**49. Why do we convert categorical variables into factors? Which function is used in R to do this?**

**Answer:** Most machine learning algorithms require numbers as input. Converting categorical values to factors gives us numerical values, and we do not have to deal with dummy variables ourselves.

We can use both factor() and as.factor() to convert variables to factors.

**50. What are standardization and normalization? Give one advantage of each over the other.**

**Answer:** Both are feature scaling techniques.

Standardization is less affected by outliers than normalization.

Normalization bounds values to a specific range, which suits algorithms that expect bounded inputs. Standardization does not bound values to a range, which may be a problem for such algorithms.
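
The difference in output range between the two techniques can be sketched as follows, taking normalization to mean min-max scaling to [0, 1] and standardization to mean z-scores. The sample values are made up.

```python
import statistics


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]


def min_max_normalize(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature with one large value.
values = [1, 2, 3, 4, 100]
standardized = standardize(values)
normalized = min_max_normalize(values)
print("standardized:", [round(v, 3) for v in standardized])
print("normalized:  ", [round(v, 3) for v in normalized])
```

The normalized values always fall between 0 and 1, while the standardized values are centered on 0 with no fixed upper or lower limit.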

**51. How is machine learning used at the moment?**

**Answer:** As far as I know, many people already use machine learning in their everyday lives. When you use the internet, you express your preferences, likes, and dislikes through your searches, and all of this is picked up by cookies stored on your computer. From that data, a user's behavior can be evaluated, which helps improve the user's experience on the internet. Navigation is another example, where machine learning and optimization techniques are used to find the distance between two places. I think the area where people will engage with machine learning more in the near future is health.

**Example:** Watson is already being used in health care. It looks at body scan data and tries to recognize the symptoms of cancer. These are some of the ways machine learning is used at the moment.