Real life datasets, especially ones that have been manually curated, often contain mixed data types. The first step to fixing invalid values is to get an idea of their distribution.
Jan 6, 2021
Distance matrices are rarely useful in themselves, but are often used as part of workflows involving clustering.
Reducing numerical precision is a way to save memory in pandas, but does it make a difference to the conclusions that we might draw from real world datasets?
The pandas groupby method is a very powerful problem solving tool, but that power can make it confusing. Let's take a look at the three most common ways to use it.
Rotating axis labels is the classic example of something that seems like an obvious tweak, but can be tricky.
One of the most basic elements of a chart is the size (and shape. Given all the different ways in pandas/seaborn/matplotlib to draw a chart, there are also a few different ways to set these properties.
Once we get used to pandas' ability to vectorize code, we want to use it all the time. Doing this with string columns requires a bit more ceremony.