Hany Hossny
January 6, 2023    3 minutes

50 Data Science Interview Questions

I needed a list of questions like this when I was interviewing for data science and ML engineering roles. This list aims to help data science leaders interview DS/ML engineers, and to help DS/ML engineers study what is important and ace their interviews.

  1. What is the difference between supervised and unsupervised learning?
  2. What is the difference between regression, classification, clustering and ranking?
  3. What metrics will you use to evaluate a regression problem?
  4. What does it mean to have low MAE and high MSE?
  5. What metrics will you use to evaluate a classification model?
  6. Why is accuracy a bad metric for classification?
  7. How can you tackle data imbalance?
  8. Can you describe a situation where precision is more important than recall and F-score?
  9. Is the F-score a statistically significant metric?
  10. Can you explain what the area under the ROC curve (AUC-ROC) is?
  11. Is AUC-ROC immune to data imbalance?
  12. What is the area under the PR curve (AUC-PR) metric?
  13. How will you measure the association/correlation between two numerical variables?
  14. How will you measure the association/correlation between two categorical variables?
  15. How will you measure the association/correlation between one numerical variable and one categorical variable?
  16. What assumptions about the data does Pearson correlation require?
  17. If we have zero Pearson correlation, what does this imply?
  18. Can you explain Spearman’s Correlation?
  19. Can you explain how the decision tree works?
  20. Can you explain how logistic regression works? Why is it called regression despite being a classifier?
  21. Can you explain the bias-variance trade-off?
  22. What is cross-validation and when is it important?
  23. What is A/B Testing?
  24. Can you explain the curse of dimensionality?
  25. Can you explain how PCA works?
  26. Can you explain any feature reduction methods other than PCA, such as Independent Component Analysis, Canonical Correlation Analysis or Common Spatial Patterns?
  27. Can you explain how deep learning works?
  28. What is gradient descent?
  29. How can you detect outliers?
  30. What is the difference between a deterministic model and a stochastic model?
  31. Do distance-based algorithms require orthogonality of the features? Why?
  32. What is feature scaling? What is the difference between normalization and standardization?
  33. What is discretization? When is it important?
  34. What is hyperparameter optimization?
  35. What is survivorship bias, and why is it a problem?
  36. How can you specify the number of clusters to use in a clustering problem?
  37. What is regularization? Why is it important? What are the different types of regularization?
  38. Describe the most interesting project you have worked on during your career.
  39. Can you explain reinforcement learning?
  40. How would you evaluate a clustering model?
  41. Can you explain self-selection bias, undercoverage bias and survivorship bias?
  42. Which sampling techniques do you know (random sampling, systematic sampling, stratified sampling, cluster sampling, etc.)?
  43. Does correlation imply causality? Does correlation imply common causality?
  44. What are confounding variables?
  45. How do you handle missing values?
  46. Can you explain what discriminative bias is?
  47. Can you explain what under-represented segments are?
  48. Can you explain what data drift and concept drift are?
  49. What is the difference between data science, data engineering and data analytics?
  50. What is the difference between artificial intelligence, machine learning and deep learning?
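
As a quick illustration of the kind of reasoning question 4 is after, here is a minimal sketch in plain Python. The numbers are made up for the example and the helper names `mae` and `mse` are my own, not from any library. Two sets of predictions have the same MAE, but the one that concentrates its error in a single large miss has a much higher MSE, because squaring penalizes large errors more heavily.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data only: ten targets, all equal to 10.
y_true = [10.0] * 10

# Case A: every prediction is off by 1.
even_errors = [11.0] * 10

# Case B: nine perfect predictions and one that is off by 10.
one_outlier = [10.0] * 9 + [20.0]

print("Case A  MAE:", mae(y_true, even_errors), " MSE:", mse(y_true, even_errors))
print("Case B  MAE:", mae(y_true, one_outlier), " MSE:", mse(y_true, one_outlier))
```

Both cases have the same total absolute error, so MAE cannot tell them apart. A low MAE combined with a high MSE therefore usually points to a few large errors (outliers) rather than many small ones.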

Connect on LinkedIn: https://www.linkedin.com/in/hanyhossny/

Read other articles: https://hany-hossny.medium.com/

Follow on Twitter: https://twitter.com/_HanyHossny

I wrote many of these questions myself; the others are taken from various websites, which I have listed as references in the links below.




© 2024, Copyrights, by dataworks.ai