Real-life datasets, especially ones that have been manually curated, often contain mixed data types. The first step in fixing invalid values is to get an idea of their distribution.
Jan 6, 2021
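As a rough sketch of what "getting an idea of the distribution" can look like, one option is to count the Python type of every value in a column. The column contents below are made up for illustration, and this is just one possible approach, not necessarily the one the post takes.

```python
import pandas as pd

# Hypothetical column with mixed types, as might come from a hand-edited spreadsheet
values = pd.Series([1, 2.5, "three", None, 4, "5"], dtype=object)

# Map each value to the name of its Python type, then count how often each occurs
type_counts = values.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

Keeping the column as `object` dtype matters here: it stops pandas from coercing the values to a single type before we get a chance to inspect them.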
Distance matrices are rarely useful in themselves, but are often used as part of workflows involving clustering.
Jan 6, 2021
Reducing numerical precision is a way to save memory in pandas, but does it make a difference to the conclusions that we might draw from real-world datasets?
Jan 6, 2021
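A minimal sketch of the trade-off, assuming a single synthetic column of random numbers rather than a real dataset: downcasting from `float64` to `float32` halves the memory used by the values, while a summary statistic such as the mean barely moves.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; real datasets may behave differently
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100_000)})  # float64 by default

# Downcast the column to 32-bit floats
small = df.astype({"x": "float32"})

# Memory used by the values themselves, ignoring the index
bytes64 = df["x"].memory_usage(index=False)
bytes32 = small["x"].memory_usage(index=False)

# How far apart are the two means?
mean_diff = abs(df["x"].mean() - float(small["x"].mean()))
print(bytes64, bytes32, mean_diff)
```

Whether a difference of this size matters depends on the analysis, which is exactly the question the post raises.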
The pandas groupby method is a very powerful problem-solving tool, but that power can make it confusing. Let's take a look at the three most common ways to use it.
Jan 6, 2021
Rotating axis labels is the classic example of something that seems like an obvious tweak, but can be tricky.
Jan 6, 2021
One of the most basic elements of a chart is its size (and shape). Given all the different ways to draw a chart in pandas/seaborn/matplotlib, there are also a few different ways to set these properties.
Jan 6, 2021
Once we get used to pandas' ability to vectorize code, we want to use it all the time. Doing this with string columns requires a bit more ceremony.
Jan 6, 2021
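To give a flavour of that extra ceremony: string methods on a pandas Series live behind the `.str` accessor rather than on the Series itself. The names below are invented for illustration.

```python
import pandas as pd

# Hypothetical messy name column, including a missing value
names = pd.Series(["  Alice ", "BOB", None, "carol"])

# Vectorized string methods are chained through the .str accessor;
# missing values pass through as missing rather than raising an error
cleaned = names.str.strip().str.title()
print(cleaned)
```

Each call in the chain returns a new Series, so `.str` has to be repeated before every string method.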