For my chart, I’m using the Olympic Historical Dataset from Olympedia.org, which Joseph Cheng shared on Kaggle with a public domain license.
It contains event to athlete level Olympic Games results from Athens 1896 to Beijing 2022. After an EDA (Exploratory Data Analysis) I transformed it into a dataset that details the number of female athletes in each sport/event per year. My bubble chart idea is to show which sports have a 50/50 female to male athlete ratio and how it has evolved over time.
My plotting data consists of two different datasets, one for each year: 2020 and 1996. For each dataset I’ve computed the total sum of athletes that participated in each event (athlete_sum) and how much that sum represents compared to the number of total athletes (male + female) (difference). See a screenshot of the data below:
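As a rough illustration of this step, here is a minimal pandas sketch that counts athletes per sport for one year and computes the female share. The input column names (`year`, `sport`, `sex`), the sample rows and the `female_share` column are assumptions for the example, not the actual Kaggle schema, and the `difference` column described above may be defined differently.

```python
import pandas as pd

# Hypothetical athlete-level rows; column names and values are made up.
rows = pd.DataFrame({
    "year":  [2020, 2020, 2020, 2020, 2020, 2020],
    "sport": ["Hockey", "Hockey", "Hockey", "Hockey", "Baseball", "Baseball"],
    "sex":   ["Female", "Male", "Female", "Male", "Male", "Male"],
})

def summarise(df, year):
    """Count athletes per sport for one year and the female share of each."""
    one_year = df[df["year"] == year]
    counts = one_year.groupby(["sport", "sex"]).size().unstack(fill_value=0)
    # Make sure both columns exist even if one sex is absent from the data
    counts = counts.reindex(columns=["Female", "Male"], fill_value=0)
    total = counts["Female"] + counts["Male"]
    out = pd.DataFrame({
        "athlete_sum": total,                      # all athletes in the sport
        "female_share": counts["Female"] / total,  # 0.5 means a 50/50 split
    })
    return out.reset_index()

summary_2020 = summarise(rows, 2020)
```

Calling `summarise(rows, 1996)` on the full data would give the second plotting dataset in the same shape.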
This is my approach to visualise it:
Size proportion. Using the radius of the bubbles to compare the number of athletes per sport. Bigger bubbles will represent highly competitive events, such as Athletics.
Multi-variable interpretation. Applying colors to represent female representation. Light green bubbles will represent events with a 50/50 split, such as Hockey.
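The two encodings can be sketched as a pair of small helper functions. This is only an illustration: making the bubble area (rather than the radius itself) proportional to the athlete count is a common choice that I am assuming here, and the bin edges and hex colors below are placeholders, not the palette used in the final chart.

```python
import bisect
import math

def bubble_radius(athletes):
    """Radius of a circle whose area equals the athlete count."""
    return math.sqrt(athletes / math.pi)

# Hypothetical bins: upper bound of the female share -> color.
BIN_EDGES = [0.1, 0.3, 0.45, 0.55, 1.0]
BIN_COLORS = ["#66c2a5", "#fc8d62", "#8da0cb", "#a6d854", "#e78ac3"]

def share_color(female_share):
    """Pick the color of the first bin whose upper bound covers the share."""
    idx = bisect.bisect_left(BIN_EDGES, female_share)
    return BIN_COLORS[min(idx, len(BIN_COLORS) - 1)]

# A sport with 400 athletes and an even split
radius = bubble_radius(400)
color = share_color(0.5)
```

With an area-based radius, a sport with four times as many athletes gets a bubble twice as wide, which keeps very large sports from visually swamping the rest.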
Here is my starting point (using the code and approach from above):
Some easy fixes: increasing the figure size and changing labels to empty if the size isn’t over 250, to avoid having words outside the bubbles.
fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
# Labels edited directly in dataset
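If you would rather not edit the labels by hand, the same rule can be applied in pandas. This is an alternative sketch, not what I did above: the `bubbles` DataFrame, its `sport` and `label` columns and the numbers in it are made up for illustration.

```python
import pandas as pd

# Hypothetical plotting data with made-up athlete counts
bubbles = pd.DataFrame({
    "sport": ["Athletics", "Hockey", "Skateboarding"],
    "athlete_sum": [2000, 400, 80],
})

MIN_SIZE_FOR_LABEL = 250  # bubbles at or below this size get no text

# Keep the sport name where the bubble is big enough, otherwise use ""
bubbles["label"] = bubbles["sport"].where(
    bubbles["athlete_sum"] > MIN_SIZE_FOR_LABEL, ""
)
```

The `label` column can then be passed to the plotting code in place of the raw sport names.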
Well, now at least it’s readable. But why is Athletics pink and Boxing blue? Let’s add a legend to illustrate the relationship between colors and female representation.
Because it’s not your average bar plot, plt.legend() doesn’t do the trick here.
Using matplotlib’s AnnotationBbox we can create rectangles (or circles) to show the meaning behind each color. We can also do the same thing to show a bubble scale.
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, DrawingArea, TextArea, HPacker
from matplotlib.patches import Circle, Rectangle

# This is an example for one section of the legend
# Define where the annotation (legend) will be
xy = [50, 128]

# Create your colored rectangle or circle
da = DrawingArea(20, 20, 0, 0)
p = Rectangle((10, 10), 10, 10, color="#fc8d62ff")
da.add_artist(p)

# Add text
text = TextArea("20%", textprops=dict(color="#fc8d62ff", size=14, fontweight="bold"))

# Combine rectangle and text
vbox = HPacker(children=[da, text], align="top", pad=0, sep=3)

# Annotate both in a box (change alpha if you want to see the box)
ab = AnnotationBbox(vbox, xy, xybox=(1.005, xy[1]), xycoords="data", boxcoords=("axes fraction", "data"), box_alignment=(0.2, 0.5), bboxprops=dict(alpha=0))

# Add to your bubble chart
ax.add_artist(ab)
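Repeating that snippet for every color gets long, so here is a sketch of how it could be wrapped in a helper and reused for a bubble-scale entry. The `legend_entry` function, the axis limits, the "50%" color and the positions are all hypothetical. Note that the circle radius inside a `DrawingArea` is in display units, not data units, so matching it to the real bubble sizes would need an extra conversion that is not shown here.

```python
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, DrawingArea, TextArea, HPacker
from matplotlib.patches import Circle, Rectangle

def legend_entry(ax, x, y, label, color, shape="rect", radius=5):
    """Place one swatch plus label just right of the axes, at data height y."""
    size = 2 * radius + 10 if shape == "circle" else 20
    da = DrawingArea(size, size, 0, 0)
    if shape == "circle":
        patch = Circle((size / 2, size / 2), radius, color=color)
    else:
        patch = Rectangle((10, 10), 10, 10, color=color)
    da.add_artist(patch)
    text = TextArea(label, textprops=dict(color=color, size=14, fontweight="bold"))
    row = HPacker(children=[da, text], align="center", pad=0, sep=3)
    ab = AnnotationBbox(row, (x, y), xybox=(1.005, y), xycoords="data",
                        boxcoords=("axes fraction", "data"),
                        box_alignment=(0, 0.5), bboxprops=dict(alpha=0),
                        annotation_clip=False)
    ax.add_artist(ab)
    return ab

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
ax.set_xlim(0, 200)
ax.set_ylim(0, 200)

# One row per color; only the 20% color comes from the snippet above
entries = [("20%", "#fc8d62"), ("50%", "#a6d854")]
artists = [legend_entry(ax, 50, 128 - 20 * i, label, color)
           for i, (label, color) in enumerate(entries)]

# A circle entry as a rough bubble scale (label text is a placeholder)
artists.append(legend_entry(ax, 50, 60, "bubble scale", "grey",
                            shape="circle", radius=12))
```

Each call adds one row to the right of the plot, so extending the legend is just a matter of adding another `(label, color)` pair to `entries`.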
I’ve also added a subtitle and a text description under the chart just by using plt.text()
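For reference, a minimal sketch of that idea is below. The wording, positions and font sizes are placeholders rather than the ones in my chart. Passing `transform=ax.transAxes` makes the coordinates relative to the axes (0 to 1), so the text stays in place whatever the data limits are.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8))

# Placeholder title, pushed up a little to leave room for the subtitle
ax.set_title("Female representation at the Olympics", fontsize=18, pad=30)

# Subtitle just above the plotting area
subtitle = plt.text(0.5, 1.02, "Bubble size shows the number of athletes per sport",
                    transform=ax.transAxes, ha="center", va="bottom", fontsize=12)

# Short description under the chart
caption = plt.text(0.0, -0.08, "Data: Olympedia.org via Kaggle",
                   transform=ax.transAxes, ha="left", va="top", fontsize=10)
```

Negative y values in axes coordinates place the text below the plot area, which is what puts the description under the chart.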
Simple and user-friendly interpretations of the graph:
The majority of bubbles are light green → green means 50% females → the majority of Olympic competitions have an even 50/50 female to male split (yay🙌)
Only one sport (Baseball), in dark green, has no female participation.
3 sports have only female participation, but the number of athletes is fairly low.
The biggest sports in terms of athlete numbers (Swimming, Athletics and Gymnastics) are very close to having a 50/50 split.