Gianluca Malato, Author at Your Data Teacher

June 27, 2022

Is your dataset imbalanced?

Dealing with imbalanced datasets is always hard for a data scientist. Such datasets can create trouble for our machine learning models if we don’t handle them properly. So, it’s important to measure how imbalanced our dataset is before taking the proper precautions. In this article, I suggest some possible techniques.

June 20, 2022

When to retrain a machine learning model?

Training a model is a complex process requiring much effort and analysis. Once a model is ready, we know that it won’t be valid forever and that we’ll need to train it again. How can we decide if a model needs to be retrained? There are some techniques that help us.

June 13, 2022

Which models require normalized data?

Data pre-processing is an important part of every machine learning project. A very useful transformation to apply to data is normalization. Some models require it in order to work properly. Let’s see some of them.

June 6, 2022

Which models are interpretable?

Model explanation is an essential task in supervised machine learning. Explaining how a model can represent the information is crucial to understanding the dynamics that rule our data. Let’s see some models that are easy to interpret.

May 20, 2022

How To Run A/B Tests

Online marketing and startup growth benefit from continuously testing different ideas. Statistics comes to our aid when we have to perform A/B tests. With the proper analysis, the results can give your project a great boost.

May 2, 2022

Are your training and test sets comparable?

Data scientists usually split a dataset into training and test sets. Their model is trained on the former and then its performance is checked on the latter. But if these sets are sampled incorrectly, model performance may be affected by biases.

April 13, 2022

A language detection model in pure JavaScript

Models are powerful tools that can be used in many web applications. The general approach is to make them accessible through REST APIs. In this article, I’ll talk about a model that can be created in Python and then deployed in pure JavaScript.

March 30, 2022

Visualize the predictive power of a numerical feature in a classification problem

Measuring the predictive power of a feature in a supervised machine learning problem is always a hard task. Before using any correlation metric, it’s important to visualize whether a feature is informative or not. In this article, we’re going to apply data visualization to a classification problem.

March 15, 2022

How to measure outlier probability

Outliers are a major problem for a data scientist. They are “strange points” in a dataset that must be checked to verify whether they are errors or real phenomena.

March 10, 2022

Why SQL is still important for data analysis

Data science mixes different skills, and some old skills are still useful. One such skill is SQL.