A great strategy for a tricky data analysis problem is to reshape the dataset into a format that makes the problem easy. In this article we will look at an example involving a simple calculation and extensive reshaping.
Feb 11, 2021
Storing string columns as categories can result in massive memory savings when working with large dataframes. However, those savings can surprisingly disappear when we start concatenating dataframes.
Feb 11, 2021
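As a rough illustration of the category teaser above, here is a minimal sketch assuming two small, made-up series whose category sets differ. The colour values are invented for the example, and `union_categoricals` is one possible way to keep the categorical dtype, not necessarily the approach the article takes.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two categorical series with different category sets (made-up data).
a = pd.Series(["red", "green", "red"], dtype="category")
b = pd.Series(["blue", "green", "blue"], dtype="category")

# Plain concatenation: because the category sets differ, the result
# is no longer categorical, so the memory savings are lost.
combined = pd.concat([a, b], ignore_index=True)
print(combined.dtype)

# One way to keep the categorical dtype: merge the category sets first.
merged = pd.Series(union_categoricals([a, b]))
print(merged.dtype)
```

Comparing `combined.memory_usage(deep=True)` with `merged.memory_usage(deep=True)` on a larger dataframe would show the size of the effect.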
Real-life datasets, especially ones that have been manually curated, often contain mixed data types. The first step to fixing invalid values is to get an idea of their distribution.
Feb 11, 2021
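One way to get that distribution, sketched here on an invented column, is to map each value to the name of its Python type and count the results. The sample values are assumptions for illustration only.

```python
import pandas as pd

# A made-up column of the kind found in hand-curated data.
s = pd.Series([1, "2", 3.5, None, "n/a", 4], dtype=object)

# Map every value to its type name, then count how often each occurs.
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

The counts show at a glance how many values need converting and which types they currently have.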
Distance matrices are rarely useful in themselves, but are often used as part of workflows involving clustering.
Feb 11, 2021
Reducing numerical precision is a way to save memory in pandas, but does it make a difference to the conclusions that we might draw from real-world datasets?
Feb 11, 2021
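A small sketch of the trade-off, assuming synthetic normally distributed data rather than a real dataset: downcasting a `float64` column to `float32` halves its storage, and the summary statistic can then be compared before and after. The column name `value` and the distribution parameters are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for a real measurement column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.normal(loc=100.0, scale=15.0, size=10_000)})

# Downcast from float64 to float32.
small = df.astype({"value": "float32"})

# Memory used by the column before and after, in bytes.
mem64 = df.memory_usage(deep=True)["value"]
mem32 = small.memory_usage(deep=True)["value"]
print(mem64, mem32)

# How far the mean moves as a result of the lower precision.
mean_diff = abs(df["value"].mean() - small["value"].mean())
print(mean_diff)
```

Whether a shift of that size matters depends on the dataset and the question being asked, which is the point the article explores.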
The pandas groupby method is a powerful problem-solving tool, but that power can make it confusing. Let's take a look at the three most common ways to use it.
Feb 11, 2021
Rotating axis labels is the classic example of something that seems like an obvious tweak, but can be tricky.
Feb 11, 2021
One of the most basic elements of a chart is the size (and shape). Given all the different ways in pandas/seaborn/matplotlib to draw a chart, there are also a few different ways to set these properties.
Feb 11, 2021
Once we get used to pandas' ability to vectorize code, we want to use it all the time. Doing this with string columns requires a bit more ceremony.
Feb 11, 2021
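The extra ceremony for strings mostly means going through the `.str` accessor. A minimal sketch, using invented names as sample data:

```python
import pandas as pd

# Made-up string column with inconsistent whitespace and case.
s = pd.Series(["  Alice ", "BOB", "carol"])

# String methods are applied element-wise via the .str accessor
# and can be chained like ordinary vectorized operations.
cleaned = s.str.strip().str.lower()
lengths = cleaned.str.len()

print(cleaned.tolist())
print(lengths.tolist())
```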