Katharina Eickhoff is a data consultant at Inviso by Devoteam and a lifelong supporter of Schalke 04.
While Tableau is a great tool for visualising data to gain better insights into business processes, it can also be used outside of a business context.
In this blog post, I want to share how I used Tableau to create an overview of the different seasons of the German 1st Bundesliga. I am into both football and data, and I wasn't satisfied with the overviews provided by the football media. So, to get a better understanding of the teams' performance and to find patterns in the data, I decided to build my own.
The first step was to acquire the football data and convert it into the right format for visualization. This process is described below.
Disclaimer: If you are not interested in the nitty-gritty technical details, just jump to the video at the end of this article or check out the Tableau dashboard online.
Football Data Acquisition and Preparation Process
API-Football and Python
The goal of the data acquisition and preparation is to build a dataset that gives a good overview of the German 1st Bundesliga. The data comes from the web API provider API-Football, which offers detailed information about teams, matches and players going back as far as 2008. I used three different API endpoints to acquire fixtures, standings and top scorer data.
To access the data, I used the Python packages requests, pandas and json. A GET request to the API returns the data in JSON* format with nested objects. I explored and processed this data to extract a set of attributes suited to analysing the German 1st Bundesliga, loaded the extracted data into pandas DataFrames (tables), and stored them as CSV files for further preparation in Alteryx.
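The acquisition step can be sketched in a few lines of Python. This is a minimal sketch, not the original script: it uses the standard library (urllib, json, csv) instead of requests and pandas so it runs without extra packages. The base URL, the `x-apisports-key` header, league ID 78 for the Bundesliga and the nested field names (`fixture`, `league`, `teams`, `goals`) are assumptions about API-Football's v3 response format that should be checked against the provider's documentation. The helper names `fetch_fixtures`, `flatten_fixtures` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumption: API-Football v3 base URL, auth header and league ID 78 (Bundesliga).
# Verify all three against the provider's documentation before use.
BASE_URL = "https://v3.football.api-sports.io"


def fetch_fixtures(api_key, league=78, season=2020):
    """Send a GET request to the (assumed) fixtures endpoint and decode the JSON."""
    query = urllib.parse.urlencode({"league": league, "season": season})
    request = urllib.request.Request(
        f"{BASE_URL}/fixtures?{query}",
        headers={"x-apisports-key": api_key},  # assumed header name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten_fixtures(payload):
    """Turn the nested fixture objects into flat rows, one per match."""
    rows = []
    for item in payload.get("response", []):
        rows.append({
            "fixture_id": item["fixture"]["id"],
            "date": item["fixture"]["date"],
            "round": item["league"]["round"],
            "home_team": item["teams"]["home"]["name"],
            "away_team": item["teams"]["away"]["name"],
            "home_goals": item["goals"]["home"],
            "away_goals": item["goals"]["away"],
        })
    return rows


def rows_to_csv(rows):
    """Serialise flat rows to CSV text, ready to be written to a file for Alteryx."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With pandas, `pd.json_normalize(payload["response"])` produces a similar flat table with dotted column names such as `teams.home.name`, and `DataFrame.to_csv` writes it to disk.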
Alteryx
With the help of Alteryx, a low-code data science tool for (automated) advanced analytics and data manipulation, I cleaned the data and created new variables. The different files are merged, and columns that are duplicates, not needed, or contain only NULL values are excluded.
The top scorers file lists the 20 best scorers per season based on the number of goals scored. Its counter, however, starts at 0 rather than 1, so the Multi-Row Formula tool is used with the expression [Place]+1 to add 1 to each Place.
Furthermore, the rank of each club on every match day should be shown to better understand the dynamics during the season. Since the data does not include this rank, it must be calculated for each team and match day. To do this, I used the fixture data and various transformations and calculations: based on the German 1st Bundesliga rules (3 points for a win, 1 for a draw, 0 for a loss), the points are calculated and summed up, leading to a rank from 1 to 18 for each match day. The graphic below shows a high-level view of the transformation process, including the data visualization part described next.
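The match-day rank calculation can also be approximated outside Alteryx. The following Python sketch is an illustration, not the author's workflow: it assumes the fixtures have been reduced to hypothetical `(matchday, home, away, home_goals, away_goals)` tuples, awards 3 points for a win and 1 for a draw, and orders the table by cumulative points, then goal difference, then goals scored. The official rules have further tiebreakers (such as head-to-head results); this sketch falls back to alphabetical order instead, which is a simplification. Postponed matches are counted in their scheduled match day.

```python
from collections import defaultdict


def points(goals_for, goals_against):
    """Bundesliga points: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def ranks_per_matchday(fixtures):
    """Return {matchday: {team: rank}} from (matchday, home, away, hg, ag) tuples.

    Order: points, goal difference, goals scored, then team name
    (the alphabetical fallback is a simplification of the official rules).
    """
    totals = defaultdict(lambda: {"pts": 0, "gf": 0, "ga": 0})
    by_day = defaultdict(list)
    for day, home, away, hg, ag in fixtures:
        by_day[day].append((home, away, hg, ag))

    def sort_key(team):
        t = totals[team]
        # Negate so that higher values sort first.
        return (-t["pts"], -(t["gf"] - t["ga"]), -t["gf"], team)

    ranks = {}
    for day in sorted(by_day):
        # Add this match day's results to the cumulative totals.
        for home, away, hg, ag in by_day[day]:
            for team, gf, ga in ((home, hg, ag), (away, ag, hg)):
                totals[team]["pts"] += points(gf, ga)
                totals[team]["gf"] += gf
                totals[team]["ga"] += ga
        table = sorted(totals, key=sort_key)
        ranks[day] = {team: pos for pos, team in enumerate(table, start=1)}
    return ranks
```

Because the totals are cumulative, each match day's ranking reflects all results up to and including that day, which is what the season-dynamics view needs.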
Data Visualization with Tableau Desktop
Developing the Tableau Dashboard
With the acquired and prepared data, I developed two football dashboards for the seasons 2010/11 to 2020/21: one with an overview of each season and one with a more granular view at team level.
The first dashboard gives an overview of the 34 match days with important KPIs such as home and away goals, and the ranks per match day. To better distinguish the teams at each rank, the club logos are used as shapes. This dashboard shows the dynamics and patterns of how ranks change during the season. For instance, in every analysed season, home teams scored more goals on average than away teams.
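To cross-check the home-advantage observation outside Tableau, a Python sketch like the following compares the average home and away goals per season. The input layout, hypothetical `(season, home_goals, away_goals)` tuples, and the function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def avg_goals_by_season(matches):
    """Return {season: (avg_home_goals, avg_away_goals)} from
    (season, home_goals, away_goals) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for season, home_goals, away_goals in matches:
        grouped[season][0].append(home_goals)
        grouped[season][1].append(away_goals)
    return {s: (mean(h), mean(a)) for s, (h, a) in grouped.items()}


def home_advantage_every_season(matches):
    """True if home teams average more goals than away teams in every season."""
    return all(h > a for h, a in avg_goals_by_season(matches).values())
```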
The second dashboard gives a more detailed view at team level per season. To give a better understanding of a team's performance, it includes KPIs such as rank, total points, average goals and the outcome of the last five games. The visualizations show (1) the match outcomes, (2) the points gained and the rank during the season, (3) the goals scored and conceded, and finally (4) a detailed table with every match outcome.
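The team-level KPIs (average goals scored and conceded, plus the outcome of the last five games) can be sketched in Python as well. This is an illustration under assumptions: matches are hypothetical `(matchday, home, away, home_goals, away_goals)` tuples, the helper names are hypothetical, and the form string lists results oldest first (W = win, D = draw, L = loss).

```python
from statistics import mean


def team_results(matches, team):
    """Yield (goals_for, goals_against) for `team`, ordered by match day."""
    for _day, home, away, hg, ag in sorted(matches):
        if team == home:
            yield hg, ag
        elif team == away:
            yield ag, hg


def team_summary(matches, team, last_n=5):
    """Average goals for/against and the last `last_n` results, e.g. "WDLWW".

    Raises statistics.StatisticsError if the team has played no matches.
    """
    games = list(team_results(matches, team))
    form = "".join("W" if gf > ga else "D" if gf == ga else "L" for gf, ga in games)
    return {
        "avg_goals_for": mean(gf for gf, _ in games),
        "avg_goals_against": mean(ga for _, ga in games),
        "last_five": form[-last_n:],
    }
```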
In the following video, I describe the dashboard in detail:
*JSON (JavaScript Object Notation) is a language-independent, human-readable data format for data exchange. It is more lightweight than XML because it carries less markup overhead (for example, no closing tags), which typically makes it faster to transmit and parse.