## Visualize the predictive power of a numerical feature in a classification problem

Measuring the predictive power of a feature in a supervised machine learning problem is always a hard task. Before using any correlation metric, it’s important to visualize whether a feature is informative or not. In this article, we’re going to apply data visualization to a classification problem.

## How to measure outlier probability

Outliers are a big problem for a data scientist. They are “strange points” in a dataset that must be checked to verify whether they are errors or real phenomena.

## Why SQL is still important for data analysis

Data science mixes different skills, and some old skills are still useful. One such skill is SQL.

## Do companies really need Deep Learning?

We’ve seen strong growth of Deep Learning techniques in the past few years. Thanks to technologies like TensorFlow and Keras, neural networks have become accessible to anybody in the world. But is Deep Learning really useful for you?

## How my degree in Physics helped me become a better Data Scientist

I have a Master’s Degree cum laude in Theoretical Physics. When I started my journey into data science, I found out how useful that background is for this kind of job.

## How to choose the bins of a histogram?

Histograms are a very useful tool when we want to get a quick look at the shape of our data. However, we always have to choose the right number of bins.
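As a rough illustration of what choosing the number of bins can look like, here is a minimal sketch of two common rules of thumb, Sturges' rule and the Freedman-Diaconis rule, using only the standard library. The function names are hypothetical, and these are not necessarily the rules the article itself recommends.

```python
import math
import statistics


def sturges_bins(values):
    """Sturges' rule: k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(len(values))) + 1


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread: fall back to Sturges' rule (an assumption of this sketch).
        return sturges_bins(values)
    width = 2 * iqr / len(values) ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))


data = list(range(100))
print(sturges_bins(data), freedman_diaconis_bins(data))
```

Sturges' rule depends only on the sample size, while Freedman-Diaconis also uses the spread of the data, so the two can disagree noticeably on skewed or heavy-tailed samples.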

## How to calculate confidence intervals in Python

When we measure something, we always have to calculate the uncertainty of the result. Confidence intervals are a very useful tool for calculating a range in which we can find the true value of the observable with a certain confidence.
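To make the idea concrete, here is a minimal sketch of a confidence interval for the mean, assuming a normal approximation. The function name `mean_confidence_interval` and the sample values are hypothetical, and for small samples a Student's t quantile would be the more appropriate choice.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the mean, using a normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided z quantile, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


measurements = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0]
low, high = mean_confidence_interval(measurements)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

Raising `confidence` widens the interval, which reflects the trade-off between precision and certainty.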

## Outlier identification using Interquartile Range

Identifying outliers is a very common task in data pre-processing. A simple method for identifying them is using the Interquartile Range.
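A minimal sketch of the usual IQR rule, assuming the conventional 1.5 × IQR fences, might look like this. The function name `iqr_outliers`, the multiplier `k`, and the sample data are illustrative choices, not taken from the article.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 14, 95]
print(iqr_outliers(data))  # -> [95]
```

Different quartile conventions (here `method="inclusive"`) can shift the fences slightly, so borderline points may be flagged by one implementation and not by another.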

## Increase model stability using Bagging in Python

Data scientists usually search for the model with the highest possible accuracy. However, they should focus on another property too: stability. In this article, I explain what it is and how to increase it using a technique called “bagging”.

## Feature selection with Random Forest

Feature selection has always been a major challenge in machine learning.
In this article, I’ll show how to perform feature selection using a random forest model in Python.