## Is there any difference between data science and machine learning?

Data science and machine learning are two exciting disciplines that play a large part in our lives. People sometimes confuse them, but they are quite different things.

## Free course! Exploratory Data Analysis in Python

I’m very glad to announce that I’ve published my new free course! The topic is Exploratory Data Analysis using …

## What managers should expect from Data Scientists

Data science has entered the world of big companies, where the data is. Managers of such companies often ask for things they don’t actually need and forget to demand the only things that are really useful to have.

## Monthly payments and more news

I’ve just introduced monthly payments for my learning paths and have other plans and ideas for the last part of this year.

## A Python library to remove collinearity

Collinearity is a very common problem in machine learning projects. It is the correlation between the features of a dataset, and it can reduce the performance of our models because it increases variance and the number of dimensions. It becomes worse when you have to work with unsupervised models. To solve this problem, I’ve created a Python library that removes the collinear features.
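
As a rough illustration of the idea (not the library mentioned above, whose API is not shown here), the sketch below greedily drops any feature whose absolute Pearson correlation with an already kept feature reaches a threshold. The function name `drop_collinear`, the 0.9 threshold and the toy data are all assumptions made for the example.

```python
import numpy as np

def drop_collinear(X, names, threshold=0.9):
    """Keep a column only if its absolute Pearson correlation with
    every column kept so far is below `threshold` (hypothetical helper)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]

# Toy data: "a_scaled" is almost a linear function of "a", "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])

X_reduced, kept_names = drop_collinear(X, ["a", "a_scaled", "b"])
print(kept_names)
```

The greedy order matters: the first feature of a correlated group is the one that survives, so a real selection procedure may prefer a smarter criterion for choosing which feature to keep.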

## How to use Q-Q plot for checking the distribution of our data

Data scientists usually need to check the distribution of their data, either against a known distribution or against another dataset. There are several hypothesis tests we can run for this goal, but I often prefer a simple, graphical representation. I’m talking about the Q-Q plot.
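
A minimal sketch of what a Q-Q plot compares, assuming we check a sample against a normal distribution fitted with the sample mean and standard deviation. The helper name `normal_qq_points` and the simulated data are assumptions for illustration; only the Python standard library is used.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs against a normal
    distribution fitted to the sample (hypothetical helper)."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    observed = sorted(sample)
    # Plotting positions (i - 0.5) / n avoid the undefined 0 and 1 quantiles.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

random.seed(0)
data = [random.gauss(10, 2) for _ in range(500)]
theoretical, observed = normal_qq_points(data)
```

Plotting `observed` against `theoretical` as a scatter plot gives the Q-Q plot: the closer the points lie to the line y = x, the better the sample agrees with the reference distribution. For real work, `scipy.stats.probplot` provides a ready-made version.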

## Are you still using 0.5 as a threshold?

In binary classification problems, we usually convert the score given by a model into a predicted class by applying a threshold. If the score is greater than the threshold, we predict 1; otherwise, we predict 0. This threshold is usually set to 0.5, but is that the right choice?
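
One common alternative to a fixed 0.5, shown here only as a sketch, is to pick the threshold that maximizes Youden's J statistic (true positive rate minus false positive rate) on held-out data. The function name `best_threshold` and the toy labels and scores are assumptions for the example; other criteria, such as maximizing F1, are equally plausible.

```python
def best_threshold(y_true, scores):
    """Return the (threshold, J) pair maximizing Youden's J = TPR - FPR,
    predicting 1 when score > threshold (hypothetical helper)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_t, best_j = 0.5, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s > t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s > t)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy, imbalanced data: most positive examples score well below 0.5.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40, 0.80]

threshold, j = best_threshold(y_true, scores)
print(threshold, round(j, 3))
```

On this toy data the selected threshold is 0.25, which recovers all three positives at the cost of one false positive, whereas a threshold of 0.5 would find only one of them.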

## Precision, recall, accuracy. How to choose?

When we work on a binary classification problem, we often have to choose the performance metric that best represents the generalization capability of our model. There’s no universal metric we can use, since the choice strongly depends on the problem we are building our model for.
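
To make the difference between the three metrics concrete, here is a small sketch that computes them from true and predicted labels. The helper name `classification_metrics` and the imbalanced toy data are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted or labeled positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# 95 negatives and 5 positives; the model finds only one of the positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1] + [0] * 4

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.96 and precision is 1.0, yet recall is only 0.2: on imbalanced data, a model can look excellent on one metric while missing most of the cases we care about.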

## How accurate is your accuracy?

In binary classification models, we often measure performance with proportions such as accuracy, precision and recall. But how can we calculate the error on these estimates? Are two models with 95% accuracy actually equivalent? Well, the answer is no. Let’s see why.
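
One way to see this, sketched under the assumption that the normal approximation to the binomial proportion is acceptable, is to attach a confidence interval to the accuracy: the half-width is z · sqrt(p(1 − p) / n), so it shrinks as the test set grows. The function name `accuracy_interval` and the test set sizes are assumptions for the example.

```python
from math import sqrt
from statistics import NormalDist

def accuracy_interval(accuracy, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion
    measured on n test examples (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(accuracy * (1 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)

# Same 95% accuracy, measured on very different test set sizes.
small = accuracy_interval(0.95, 100)
large = accuracy_interval(0.95, 10_000)
print(small, large)
```

With 100 test examples the 95% interval is roughly 0.907 to 0.993, while with 10,000 examples it is roughly 0.946 to 0.954, so the same headline accuracy carries very different uncertainty. The normal approximation becomes unreliable for small samples or proportions close to 0 or 1, where an interval such as Wilson's is usually preferred.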

## Don’t start learning data science with neural networks

I often meet students who start their journey towards data science with Keras, TensorFlow and, generally speaking, deep learning. They build tons of neural networks like crazy, but in the end their models fail because they don’t know machine learning well enough, nor are they able to apply the pre-processing techniques needed to make neural networks work.