10 Technical Data Science Interview Questions

Mavis Loh

These are unprecedented times, and many of us are looking to switch jobs or land a new one, so interview preparation is in the limelight. Interviews are a big deal for everyone. Uncertainty, randomness, and human error make them scary: with adrenaline rushing through your veins, it can feel like you are on the verge of messing it all up. Preparation is the best way to minimise your losses during an interview. So, here are 10 technical data science interview questions to help you prepare for your next data science interview.

1. What is the difference between supervised and unsupervised machine learning?

In supervised learning, the algorithm learns from a labeled dataset, which acts as an answer key the algorithm can use to evaluate its accuracy on the training data. In unsupervised learning, by contrast, the algorithm is given unlabeled data and tries to make sense of it by extracting features and patterns on its own.
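As a rough illustration of the contrast, the sketch below applies both approaches to the same one-dimensional data. The nearest-centroid classifier and the two-cluster k-means routine are techniques chosen here for brevity, and the data and function names are made up for the example.

```python
from statistics import mean

def fit_nearest_centroid(points, labels):
    """Supervised: use the labels to compute one centroid per class."""
    classes = sorted(set(labels))
    return {c: mean(p for p, l in zip(points, labels) if l == c) for c in classes}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def two_means(points, iterations=10):
    """Unsupervised: split unlabeled points into two clusters (1-D k-means, k=2)."""
    lo, hi = min(points), max(points)  # initial cluster centres
    for _ in range(iterations):
        left = [p for p in points if abs(p - lo) <= abs(p - hi)]
        right = [p for p in points if abs(p - lo) > abs(p - hi)]
        lo, hi = mean(left), mean(right)
    return left, right

# Hypothetical data: the same points, with and without labels.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_nearest_centroid(points, labels)  # needs the answer key
prediction = predict(centroids, 4.5)

clusters = two_means(points)  # never sees the labels
print(prediction, clusters)
```

The supervised model can name the class of a new point because it was given names to learn from. The unsupervised routine only discovers that the points fall into two groups.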

2. What is Selection Bias?

Selection bias occurs when the sample obtained is not representative of the population intended to be analysed.
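A small sketch of the effect, using an invented population of monthly gym visits: surveying only the people found at the gym leaves out those who rarely go, so the sample mean overstates the population mean. All numbers here are hypothetical.

```python
from statistics import mean

# Hypothetical population: monthly gym visits for 10 members.
population = [0, 0, 1, 2, 2, 3, 8, 12, 15, 20]

# Biased sample: only members who visit often enough to be met at the gym.
biased_sample = [v for v in population if v >= 8]

# A more representative sample: every second member from the full list.
systematic_sample = population[::2]

population_mean = mean(population)
biased_mean = mean(biased_sample)
systematic_mean = mean(systematic_sample)

print(population_mean, biased_mean, systematic_mean)
```

The biased sample is not wrong about the people in it. It is wrong as an estimate for the whole population, which is the essence of selection bias.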

3. What is a confusion matrix?

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to measure the error rate, accuracy, specificity, sensitivity (also called recall) and precision of a classification model. A binary classifier predicts every instance of a test dataset as either positive or negative. This produces four outcomes:

  1. True positive (TP): Correct positive prediction
  2. False positive (FP): Incorrect positive prediction
  3. True negative (TN): Correct negative prediction
  4. False negative (FN): Incorrect negative prediction
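The outcome counts and the metrics derived from them can be sketched in a few lines. The labels below are invented for illustration, and `confusion_counts` is a hypothetical helper rather than a library function.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four outcomes of a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correct positive prediction
        elif p == positive:
            fp += 1  # incorrect positive prediction
        elif a == positive:
            fn += 1  # incorrect negative prediction
        else:
            tn += 1  # correct negative prediction
    return tp, fp, tn, fn

# Hypothetical test set: 4 positives and 6 negatives.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)        # also called sensitivity
specificity = tn / (tn + fp)

print(tp, fp, tn, fn, accuracy, precision, recall, specificity)
```

This sketch assumes at least one positive prediction and at least one actual instance of each class. Otherwise the divisions would need a guard against zero denominators.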

4. Explain Decision Tree algorithm in detail

A decision tree is a supervised machine learning algorithm used mainly for regression and classification. It breaks a data set down into smaller and smaller subsets while incrementally building the associated tree. The final result is a tree with decision nodes and leaf nodes. Decision trees can handle both categorical and numerical data.
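To make the "smaller and smaller subsets" idea concrete, here is a minimal sketch of how a single decision node might choose its split on one numeric feature. It assumes Gini impurity as the splitting criterion, which is one common choice and not the only one. The data and function names are illustrative.

```python
def gini(labels):
    """Gini impurity: 0.0 when all labels in the subset are identical."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try each candidate threshold and keep the one with the purest subsets."""
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # the largest value would leave one side empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        # Weighted average impurity of the two subsets.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Hypothetical data: does a customer of a given age buy the product?
ages = [22, 25, 30, 35, 40, 50]
buys = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(ages, buys)
print(threshold, impurity)
```

A full tree would apply the same search recursively to each subset until the subsets are pure or a stopping rule is reached. The resulting pure subsets become the leaf nodes.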

5. What is pruning in a Decision Tree?

Pruning is the process of removing the sub-nodes of a decision node. It is the opposite of splitting, and it is typically used to keep a tree from overfitting the training data.
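As an illustration only, the sketch below represents a tree as nested dictionaries and prunes one decision node by collapsing everything beneath it into a single leaf. The tree layout, the field names and the `prune` helper are all assumptions made for this example. Real libraries decide which nodes to prune using a validation set or a complexity penalty.

```python
from collections import Counter

def leaf_labels(node):
    """Collect the training labels stored in all leaves under a node."""
    if "labels" in node:
        return list(node["labels"])
    return leaf_labels(node["left"]) + leaf_labels(node["right"])

def prune(node):
    """Replace a decision node with one leaf holding all labels beneath it."""
    return {"labels": leaf_labels(node)}

def predict(node, x):
    """Walk down to a leaf and return its majority label."""
    while "labels" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return Counter(node["labels"]).most_common(1)[0][0]

# Hypothetical tree with two decision nodes and three leaves.
tree = {
    "feature": "age", "threshold": 30,
    "left": {"labels": ["no", "no", "no"]},
    "right": {
        "feature": "income", "threshold": 40,
        "left": {"labels": ["no"]},
        "right": {"labels": ["yes", "yes", "yes"]},
    },
}

# Prune the "income" decision node: its two leaves merge into one.
pruned = dict(tree, right=prune(tree["right"]))

sample = {"age": 45, "income": 30}
before = predict(tree, sample)
after = predict(pruned, sample)
print(before, after)
```

The pruned tree is simpler and no longer carves out a leaf for a single training example, at the cost of changing the prediction for inputs that used to reach that leaf.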

6. What is logistic regression? Or, state an example of when you have used logistic regression recently.

Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win an election. In this case, the outcome of the prediction is binary, i.e. 0 or 1 (lose or win). The predictor variables would be the amount of money the candidate spent on election campaigning, the amount of time spent campaigning, and so on.
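The election example can be sketched as follows. The weights and bias are invented numbers standing in for values that would normally be learned from historical data, and the 0.5 cut-off is a conventional default that may not suit every problem.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def win_probability(spend_millions, campaign_months, weights, bias):
    """Linear combination of the predictors, passed through the sigmoid."""
    z = bias + weights[0] * spend_millions + weights[1] * campaign_months
    return sigmoid(z)

# Hypothetical, hand-picked parameters (not fitted to any real data).
weights = (0.8, 0.3)
bias = -4.0

p = win_probability(3.0, 6.0, weights, bias)
outcome = 1 if p >= 0.5 else 0  # 1 = win, 0 = lose

print(round(p, 3), outcome)
```

The model itself outputs a probability. The binary prediction comes from comparing that probability with a threshold.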

7. What do you understand by the term Normal Distribution?

Data can be distributed in many ways: skewed to the left, skewed to the right, or jumbled up with no clear pattern. When the data is distributed around a central value with no bias to the left or right, it may follow a normal distribution, in which the values of the random variable form a symmetrical bell-shaped curve.
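The symmetry and the bell shape can be checked numerically with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The standard normal parameters, mean 0 and standard deviation 1, are chosen here for convenience.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the density is the same at equal distances from the mean.
left = dist.pdf(-1.5)
right = dist.pdf(1.5)

# Bell shape: the density peaks at the mean.
peak = dist.pdf(0.0)

# Share of values within one standard deviation of the mean.
within_one_sigma = dist.cdf(1.0) - dist.cdf(-1.0)

print(left, right, peak, round(within_one_sigma, 4))
```

Roughly 68% of the values fall within one standard deviation of the mean, a property often quoted in interviews as part of the 68-95-99.7 rule.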

8. What is deep learning?

Deep learning is a subfield of machine learning built on artificial neural networks, which are inspired by the structure and function of the brain. Machine learning includes many algorithms, such as linear regression, SVMs and neural networks, and deep learning is an extension of neural networks. A conventional neural net uses a small number of hidden layers, whereas a deep learning model uses many hidden layers to better capture the relationship between input and output.
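To show what stacking hidden layers means in practice, here is a minimal forward pass through a tiny network in plain Python. The layer sizes, weights and the ReLU activation are arbitrary choices for illustration. A real deep network would have far more layers and units, and its weights would be learned by training.

```python
def relu(values):
    """Activation function: negative values become zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output unit, plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass the input through every hidden layer, then the output layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return dense(x, weights, biases)  # no activation on the output layer

# Hypothetical network: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # hidden layer 1
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]),   # hidden layer 2
    ([[1.0, 2.0]], [0.5]),                     # output layer
]

output = forward([2.0, 1.0], layers)
print(output)
```

Making the network deeper amounts to adding more entries to `layers`. Each extra hidden layer transforms the previous layer's output again before the final prediction.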

9. What is the difference between machine learning and deep learning?

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. It can be divided into the following three categories:

  1. Supervised machine learning
  2. Unsupervised machine learning
  3. Reinforcement learning

Deep learning, however, is a subset of machine learning concerned with artificial neural networks (ANNs), algorithms inspired by the structure and function of the brain.

10. What is the difference between regression and classification ML techniques?

Both regression and classification techniques fall under supervised machine learning. In supervised machine learning, we train the model on a labelled data set: during training we explicitly provide the correct labels, and the algorithm tries to learn the pattern that maps input to output. If the labels are discrete values, e.g. A, B, etc., it is a classification problem. If the labels are continuous values, e.g. 1.23, 1.333, etc., it is a regression problem.
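The difference shows up clearly when the same method is applied to both kinds of label. The sketch below uses k-nearest neighbours, picked here only because it handles both cases with a one-line change. The data sets and function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def nearest_targets(train, x, k=3):
    """Return the targets of the k training points closest to x."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]

def knn_classify(train, x, k=3):
    """Classification: discrete labels, so take a majority vote."""
    return Counter(nearest_targets(train, x, k)).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Regression: continuous labels, so take the average."""
    return mean(nearest_targets(train, x, k))

# Hypothetical labelled data: (input, label) pairs.
class_data = [(1.0, "A"), (1.5, "A"), (2.0, "A"), (6.0, "B"), (6.5, "B"), (7.0, "B")]
reg_data = [(1.0, 1.1), (1.5, 1.4), (2.0, 2.1), (6.0, 6.2), (6.5, 6.4), (7.0, 7.3)]

label = knn_classify(class_data, 6.2)  # a discrete class
value = knn_regress(reg_data, 6.2)     # a continuous number

print(label, round(value, 3))
```

Both functions learn from labelled examples, which is what makes them supervised. Only the type of label, and therefore the way neighbours are combined, differs.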