Every time we train a model, we should check whether its performance beats some baseline: a trivial model that ignores the inputs. Comparing our model against such a baseline tells us whether it has actually learned anything from the data.
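As a minimal sketch of the idea, the simplest baseline for a classifier always predicts the majority class; the labels below are hypothetical:

```python
from collections import Counter

def baseline_accuracy(y_true):
    """Accuracy of a trivial model that always predicts the majority class."""
    majority_count = Counter(y_true).most_common(1)[0][1]
    return majority_count / len(y_true)

# A trained model is only worth keeping if it beats this number.
y = [0, 0, 0, 1, 0, 1, 0, 0]   # hypothetical labels
print(baseline_accuracy(y))    # 6/8 = 0.75
```

If your classifier scores 0.76 accuracy on a dataset where 75% of labels belong to one class, it has learned almost nothing.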
Is your dataset imbalanced?
Dealing with imbalanced datasets is always hard for a data scientist. Such datasets can cause trouble for our machine learning models if we don't handle them properly, so measuring how imbalanced a dataset is matters before taking the proper precautions. In this article, I suggest some possible techniques.
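One possible way to quantify imbalance (an assumption here, not the only metric) is the normalized Shannon entropy of the class distribution:

```python
import math
from collections import Counter

def balance_score(labels):
    """Normalized Shannon entropy of the class distribution:
    1.0 for a perfectly balanced dataset, approaching 0.0 as a
    single class dominates."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    if k < 2:
        return 0.0  # a single class carries no balance at all
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)

print(balance_score([0, 1] * 50))      # 1.0: perfectly balanced
print(balance_score([0] * 99 + [1]))   # close to 0: heavily imbalanced
```

A score well below 1 is a signal to consider resampling, class weights, or imbalance-aware metrics.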
When to retrain a machine learning model?
Training a model is a complex process that requires much effort and analysis. Once a model is deployed, we know it won't stay valid forever and that we'll need to train it again. How can we decide when a model needs to be retrained? There are some techniques that can help us.
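One such technique is performance monitoring: compare the model's metric on recent data against the value measured at deployment time, and flag it when the drop exceeds a tolerance. The threshold below is a hypothetical choice:

```python
def needs_retraining(recent_accuracy, reference_accuracy, tolerance=0.05):
    """Flag a model for retraining when its accuracy on recent data
    drops more than `tolerance` below the accuracy measured at
    deployment time. The 0.05 tolerance is a hypothetical choice."""
    return recent_accuracy < reference_accuracy - tolerance

print(needs_retraining(0.88, 0.90))  # False: within tolerance
print(needs_retraining(0.80, 0.90))  # True: degradation detected
```

In practice this check would run on a schedule, with the tolerance tuned to how costly retraining is versus how costly degraded predictions are.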
Which models require normalized data?
Data pre-processing is an important part of every machine learning project, and normalization is one of the most useful transformations to apply. Some models require it in order to work properly. Let's see some of them.
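One common form of normalization is standardization (z-score scaling), which rescales a feature to zero mean and unit variance; a minimal sketch:

```python
def standardize(values):
    """Z-score normalization: subtract the mean, divide by the
    (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

print(standardize([10.0, 20.0, 30.0]))  # symmetric around 0
```

Distance-based and gradient-based models (k-NN, SVMs, neural networks) are typical examples where features on very different scales distort the result.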
Which models are interpretable?
Model explanation is an essential task in supervised machine learning. Explaining how a model represents the information is crucial to understanding the dynamics that rule our data. Let's see some models that are easy to interpret.
How To Run A/B Tests
Online marketing and startup growth benefit from continuously testing different ideas. Statistics comes to our aid when we need to perform A/B tests: the results you can achieve with a proper analysis may give your project a great boost.
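A common statistical tool for an A/B test on conversion rates is the two-proportion z-test; here is a minimal sketch using only the standard library (the counts below are made up for illustration):

```python
import math

def ab_test_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test.
    conv_*: conversion counts; n_*: sample sizes."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Variant B converts 15% vs 10% for A, 1000 users each:
print(ab_test_pvalue(100, 1000, 150, 1000))  # well below 0.05
```

A p-value below your chosen significance level (commonly 0.05) suggests the difference between the two variants is unlikely to be due to chance alone.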
Are your training and test sets comparable?
Data scientists usually split a dataset into a training set and a test set. The model is trained on the former, and its performance is then checked on the latter. But if these sets are sampled incorrectly, model performance may be affected by bias.
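One way to check whether a feature is distributed comparably in the two sets is the two-sample Kolmogorov-Smirnov statistic; a minimal pure-Python sketch (in practice `scipy.stats.ks_2samp` also gives you a p-value):

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the two empirical CDFs. Values near 0 suggest the samples
    come from the same distribution; values near 1 suggest they don't."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

print(ks_statistic([1, 2, 3], [1, 2, 3]))      # 0.0: identical
print(ks_statistic([0, 1, 2], [10, 11, 12]))   # 1.0: disjoint
```

Running this per feature over the training and test sets gives a quick sanity check that the split did not introduce a distributional shift.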
Visualize the predictive power of a numerical feature in a classification problem
Measuring the predictive power of a feature in a supervised machine learning problem is always a hard task to accomplish. Before using any correlation metric, it's important to visualize whether a feature is informative or not. In this article, we're going to apply data visualization to a classification problem.
How to measure outlier probability
Outliers are a big problem for a data scientist. They are "strange" points in a dataset that must be checked in order to verify whether they are errors or real phenomena.
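Under the assumption that a feature is roughly Gaussian (a strong assumption, stated up front), one way to attach a probability to a point is the two-tailed probability of drawing a value at least as extreme from the fitted normal:

```python
import math

def outlier_probability(x, mean, std):
    """Two-tailed probability of drawing a value at least as extreme
    as x from Normal(mean, std). Small values flag likely outliers.
    Assumes the feature is roughly Gaussian."""
    z = abs(x - mean) / std
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(outlier_probability(0.0, 0.0, 1.0))  # 1.0: the mean itself
print(outlier_probability(4.0, 0.0, 1.0))  # tiny: a likely outlier
```

A very small probability does not prove the point is an error, only that it deserves a closer look; skewed or heavy-tailed features need more robust approaches.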