How I build my own Covid-19 Dashboard with Power BI Desktop

A practical introduction to data analysis and visualisation in Power Bi for everyone

Image for post
Image for post
Collage of photos by Carlos Muza on Unsplash and by CDC on Unsplash

At first I have to admit that I am not a data scientist, computer scientist or programmer. I am just a tax advisor. It may therefore be surprising that I am writing about a data analysis tool, after all my job is more of a legal one, right?

And maybe you also don’t work as a data scientist, programmer, or IT department in your company. So why bother with Power Bi — after all, you have Excel, don’t you?

At least that’s what I thought for a long time too.

However, over the years I have been more and more confronted with evaluating, analyzing, aggregating and visualizing larger amounts of data. This can happen regularly e.g. for monthly, weekly or quarterly reports or ad-hoc —occasionally tax authorities, auditors or management come up with the idea quite suddenly that they need certain information, reports or forecasts. So I’ve to deliver.

The challange we’re facing in such situations is: much of our data comes from completely different sources (e.g. SAP, investment management programs, sub-ledgers, public sources) and in completely different formats. Sometimes it comes as an Excel file, often as a CSV or TXT file, occasionally in JSON or XML format or sometimes as a simple HTML page.

I know that I am not alone with this problem. The situation is similar in many jobs. We have to deal with it, even without being a trained data scientist or programmer.

And far too often the solution is seen in Excel (or at least what I call the old Excel world). With a lot of effort, copy and paste orgies are organized, gigantic monster formulas are fabricated, Excel is misused as a database and VLOOKUP is heavily used. Stop doing this! 💥

There are easier ways to do most of these tasks quickly, robustly, maintainable and reusable.

And this is where Power Bi Desktop comes in.

With the following article I want to show some quiet interesting aspects of Power BI Desktop, I want to show that it is also very easy for non-computer scientists to efficiently query, evaluate and visualize large amounts of data.

I further want to demonstrate that you can come very far without knowing anything (or at least only a little) about coding, just by using the right tools.

However, the article is not ment to be a tutorial. So it doesn’t matter if you don’t understand some things right away. It’s about demonstrating what Power Bi is and what the basic ideas are.

Let’s take a practical, current scenario that basically everyone can understand and that everyone should be confronted with. The Covid-19 pandemic.

Here is the short background of what I did and what we will do in this article:

Image for post
Image for post
Photo by Jonathan Brinkhorst on Unsplash

In early spring my wife and I planned to spend our summer vacation in Sweden. One of the most beautiful countries in Europe! Forget France or Italy!

The problem was that the number of infections in Sweden was suddenly very high compared to the rest of Europe.

The second problem was that the countries of the EU had agreed to put travel warnings into effect if a country reports more than 50 new infections per 100,000 inhabitants in the last seven days — so called incidence group and incidence value.

And that’s exactly what I wanted to evaluate and what I want to explain in this article. I was looking for a way to get a quick overview of the infections in Europe.

The goal was/is the following dashboard with Covid-19 data:

Image for post
Image for post
Image for post
Image for post
WPhoto by Ashim D’Silva on Unsplash

As already mentioned, even if we are not data scientists, in many jobs we are more and more embarrassed to process larger amounts of data.

However, the less we have programming knowledge or the larger our company and the more restrictive the IT department, the fewer tools we have available for these purposes and the more important it becomes to know the right tool.

It is clear that not everyone can master Python, Julia or other programming languages alongside their job.

And as many VBA macros show, it is not advisable when novices half-heartedly try to put together code snippets for their business processes.

And so it unfortunately happens again and again that the analysis and processing of data takes place in a highly inefficient way in Excel. Mostly with the same methods as 20 years ago and often with stupid repetitions, every week, every month, every quarter …

And that’s where Power Bi Desktop comes in. Because Power Bi Desktop offers the following advantages:

  • the tool is free
  • it is a tool that every IT department can release without having to release and configure thousands of packages, package managers and virtual environments etc.
  • Power Bi makes it possible to read in data from many different sources even without programming knowledge
  • Power Bi can transform data without (much) programming knowledge
  • We can do repetitive tasks again at any time just by clicking on “Refresh”
  • We can join and combine data simply with a click
  • Power Bi is partially self-documenting
  • Power Bi never changes our original data
  • Power Bi is highly customizable
  • Power BI files are easily portable
  • We can create dashboards and interactive reports
  • The knowledge from Power Bi Desktop also helps us in Excel, because Power Query and Power Pivot are deeply integrated in the new Excel versions and at the same time both are the heart of Power BI Desktop; we can also reuse this knowledge if we want to work with the Power Apps platform
  • Oh, and did I say you don’t have to code? Yep, but if you do want — yes you can! M and Dax are naturally integrated in Power BI and Python and R can be easily integrated.

Talked enough, let’s begin!

Image for post
Image for post
Power Bi guides you in importing data

Importing data into Power Bi is very easy. After opening Power Bi, it is practically impossible not to find the data import options. Here is a brief overview of the most important ones.

Image for post
Image for post
data import options (only a few)

In particular, we can retrieve data from

  • Excel
  • Text/CSV Files
  • SQL Server
  • Folders, so we can combine multiple files
  • the Web

If you are interested, I can introduce these possibilities in more detail in future articles. And of course there is a heap of more import options available.

So if you’re new to Power BI, the first big takeaway is that Power BI is great when you need to import data from multiple or different data sources.

For our Covid-19 statistics I will use two data sources. The first one is published by the EU and can be found here.

Image for post
Image for post
Covid-19 data published by EU

So we could download the file and import it into Power Bi. The whole thing would be almost self-explanatory.

However, we don’t want to download a new updated CSV file every day, we want to automate it. So we need a link that I got this way.

Image for post
Image for post
Copy Link-Adress (screenshot is German but I think you’ll get it :-) )

Our first link to use in Power BI soon is:

https://opendata.ecdc.europa.eu/covid19/casedistribution/csv

And that’s the second important thing to know if you’re new to Power BI. It’s always about automating. Everything you do now or later with your data in Power BI can be repeated at any time, simply by clicking on the “Refresh” button, even if the data from the data source changes completely.

Since I originally created the Covid statistics with a view to my planned vacation in Sweden, I wanted to evaluate the data from the Swedish authorities, too. I hoped (and was true) that Swedisch statistics would also include details about the different Swedisch regions. Further, I was curious to see if there would be any significant difference to the EU figures.

For this reason, we will also use the Swedish data in this article. With the help of the Swedish data, I can also better show later a different way to work with data in Power BI.

Finding these statistics is not that easy, we have to ask Google for “Sjukdomsfall per dag JSON”, which (according to my rudimentary Swedish knowledge) should mean something like “Disease cases per day JSON” and should give us the data as a JSON file. I once found the term “Sjukdomsfall per dag” in an article on a Swedish website — so sometimes you have to search a little. 😃

The result is not a JSON File but CSV File again:

Image for post
Image for post
click on CSV button
Image for post
Image for post
copy the link

In this way we now have the second link for our queries: https://free.entryscape.com/store/360/resource/15

To get the data from the web into Power BI, we need one query for each source. To do this, I suggest to proceed as follows.

  1. Step
Image for post
Image for post
Get data -> Web

2. Step

Image for post
Image for post
paste link to EU data

As you can see in the following picture, Power BI allows us to make numerous settings for web queries. I also think, that you can see how easy it is to grab data from a source. Power Bi will mostly guide you.

Image for post
Image for post
Options for web queries
Image for post
Image for post
preview of our data — import settings

If our query was successfull, Power BI shows a preview of our data in the next window.

We can see here that the EU is also providing statistical data for other countries, cool.

But what is much more exciting is that Power Bi has already structured the huge mess of characters (see screenshot A-1 below) from the csv file for us here.

Power Bi guessed the encoding and delimiter. But of course you can adjust that here (or later). In this case I set encoding to UTF-8.

Image for post
Image for post
Fig. A-1 — the original csv-file

Then just click on “Transform data” (not load) ❗️ and the magic begins, the Power Query Editor opens.

Another point to keep in mind if you’re new to Power BI. Power BI makes it very, very easy to import data from CSV or TXT files. Think about how time-consuming this can be in Excel.

An important part of Power BI is the Power Query Editor, which does all of the import and transformation work.

Image for post
Image for post
Power Query Editor in Power BI — just powerful :-)

Power Query is a very powerful tool (that you will also find in many newer Excel versions), and I will go into this in more detail in later articles. For the moment we want to be satisfied with the following absolute basics.

In Power Query, many transformations can be done just by clicking. In our case, we’ll first simply remove the columns that we won’t need.

You can see in the context menu (below Fig I-1) that Power BI offers many options here to work very productively. We can apply one command to several columns at the same time, or we can apply one command to all other columns except the selected ones. 👏

Image for post
Image for post
Fig. I-1

Let’s also rename the columns, this is quite intuitive. Just click the header!

Image for post
Image for post
Renaming the columns

In the below figure I’ll quickly show you some really cool and helpful features. We can display information about our columns at any time, in particular whether our data contains errors or duplicates. Esp., if you have to work with unknown data, this can be very helpful to get a first impression.

Image for post
Image for post
data about data…

We can (and should) dispaly the formula bar. The bar shows us nothing less than the function (in M language) that each of our commands triggered in the background.

By the way, all the magic becomes visible when we display the Advanced Editor.

Image for post
Image for post
opening the advanced editor
Image for post
Image for post
advanced editor

You don’t need to understand this Advanced Editor right now. In fact, you can get very, very far in Power Bi or Power Query without ever opening this editor or worrying about the M syntax. Keep in mind: you can code, but you don’t have to!

I just wanted to show what’s going on behind the scenes here.

Everything we do with our data with a click in the Power Query Editor is saved in the background as M code, that we can inspect and change in the Advanced Editor.

In this way, data queries can be reused very easily, are traceable, maintainable and partly self-documenting.

Image for post
Image for post
applied steps

So that this works very well without us having to go into the Advanced Editor, Power Query has a bar on the right-hand side that also shows all the steps that have been applied to our data so far. It’s practically a visual implementation of the code. 🆒

We can use it to look at the result of each step at any time, we can delete, edit, move steps, give them meaningful names and we can add comments. Our original data gets never ever manipulated directly!

Image for post
Image for post
many transformations out of the box

I can only encourage you to try out the numerous commands in the “Transform” tab. An incredible number of things can be done with just a few clicks.

Image for post
Image for post
Overview: Power Query Editor

Back to the task.

We have now loaded the EU data into the Power Query Editor and adapted it a little bit to our needs. The full power of Power Query is — as mentioned — worth a small series of articles, but this should be enough for an initial understanding.

As can be seen in fig. W-1, I have given our query a name on the right, which is then immediately displayed on the left in the overview of all queries.

At this point we have already achieved great things. We can open Power Bi tomorrow or the day after tomorrow and would just press “Refresh” and the program would independently retrieve the data from the web and process it exactly as we have specified here. 👏

Image for post
Image for post
Fig. W-1

The last thing to do in the Power Query Editor is to press “Close & Apply”.

Image for post
Image for post
Back in the main Power Bi window, now with imported data

As a result, our query is now finally added to our Power Bi project and is available as a table in the main Power Bi window.

Of course, we can always go back to the Power Query Editor to adjust our query if necessary.

What is going on here cannot be fully explained at this point, the article otherwise would degenerate into a book. The main important thing to note at the moment is that we can see our queries on the right-hand side and that we can receive information about the fields (columns) there. We see that Power BI has recognized numerical values ​​and we see that Power Bi has found a date column in our data.

We can now use this data and/or the data fields to carry out all possible visualizations, joins or other evaluations.

Btw. we just imported 56,361 rows and cleaned them a little with only a few clicks! 👏

By clicking the “Data” button on the left, we can preview our data and start modeling on them.

There is one point that is important to understand. For data scientists and programmers this might be obvious, but I would like to emphasize this again for many users coming from the Excel world.

We should not try to edit individual values ​​here — usually we even cannot. We should not try to enter individuel formula in rows — we definitly cannot.

Power Bi is data modeling. Power Bi exists to define abstractly how our data should be processed. Power Bi is not Excel! A formula for a calculated column has to fit on every row, in any case Power Bi will use it in every row.

Image for post
Image for post

By clicking on “New Column”, we can now create calculated columns.

What comes next feels a bit like Excel. The formulas that we can enter here are so-called data analysis expressions (DAX) and in many cases are very close to the Excel syntax.

What we could do to warm up would be to create a new column “test” and add the number of new infections and deaths. It has no statistical worth, but it demonstrates the basic principle.

Image for post
Image for post
first time unsing DAX

The main difference to Excel is that we do not have cell references, but enter column names in our formulas. The name befor [ is the table/query name, the name between [ and ] refers to the column name. In this simple case the short version would suffer:

test = [deaths] + [new_cases]

Even conditional evaluations are similar to Excel:

Image for post
Image for post
Looks familiar, doesn’t it

For newcomers, DAX is the second important component of Power BI. After transforming our data in Power Query (with M language in the background), we use DAX to analyze and evaluate our data.

DAX is a very extensive collection of functions as we know it from Excel. With DAX, for example, we can aggregate our data very quickly or carry out year-to-date evaluations.

By the way, DAX is also available in newer Excel versions as Power Pivot.

However, I have to throw you in the deep end and start right away with a rather complex function.

What do we want to achieve now?

We want to compute the so-called 7 day incidence for each day and each country, i.e. the number of new infections in the last 7 days. If we wanted to do this manually or with Excel, we would have to:

  1. sort our data by country
  2. sort the country data again by date
  3. we would have to sum up each days (rows) number of new infections with the number of new infections from the previous 6 days.

Without sorting, the task would be practically impossible to solve in Excel. But even with sorting, it gets nasty when our data is updated.

In Power BI we don’t want to sort (and would have to go back to the Power Query Editor to do so), we just don’t want to be dependent on the order of our records at all.

Even in Python we would have to write quite a bit of iterations and / or sorting algorithms here.

In those cases — where we have to access and work with the rest of the data in every row, Power BI with DAX is extremely strong.

Below you can see the formula and the result.

Image for post
Image for post
Calculate and Filter — the universal weapon of Power Bi

And here is the function in case you want to copy & paste it.

infections_last_7_days = 
var currentDate = EU_Data[reportingDate]
var currentCountry = EU_Data[country]
Return CALCULATE(
SUM(EU_Data[new_cases]),
Filter(
EU_Data,
EU_Data[country] == currentCountry &&
EU_Data[reportingDate]<= currentDate &&
EU_Data[reportingDate] > currentDate-7
)
)

What stands out here compared to Excel is that we get syntax-highlighting, line numbers and auto-complete out of the box. With the Alt key you can even set multiple cursors and revise formulas extremely quickly.

What happens in the formula might be confusing but is quickly explained.

Basically, Power BI only ever accesses the current row for calculated columns. So only the respective values ​​of the current line / current record can be part of the calculation.

But we need the values ​​of the last 7 days for the respective country, so our formula has to evaluate other lines. For this we need access to other lines (records) — in Power BI language: we need a different context.

This is basically the only hurdle you have to take in Power Bi: Understanding the context principle of Power Bi.

Many functions in PowerBi allow you to use a different context, i.e. accessing other data or parts of it. So does the Calculate function used here.

The function performs a calculation over all rows that are found on the basis of a filter.

The calculation here is Sum([new_cases]) and just sums up all [new_cases] fields that our filter has left.

The advantage is: we don’t have to force our rows into a dedicated sorting order and we do need to copy any formulas — as in Excel. The Calculate function simply goes into each line, then filters all other lines according to the filter entered and finally executes the Sum([new_cases]) function and assings the result to the calculated field of this line.

The filter must be dynamic and not a static expression (in our case), since we only ever need the data records from one country and from the last 7 days. Therfore the filter function is the function of choice. In simple cases we could have simply entered a filter condition, but with the filter function we have more power.

Specifically, I filter for each row in the entire table for all data records in which the country corresponds to the country of the respective row and the date is in the last 7 days.

I am sure there are even more elegant solutions, nevertheless, on this basis we get our calculated column without any problems, which we can use for further calculation steps.

Please don’t be shocked or frustrated if the formula is not very clear at the moment. I promise that Power Bi is much faster to learn than programming. And if you can do reasonably well with Excel, you will understand Power Bi quickly.

Image for post
Image for post

Above I divided the new infections by the number of inhabitants. For this purpose there is the terrific function Divide, which solves the old “Division by 0 problem” in Excel by allowing us to declare a “fall-back value”.
In Excel this would have resulted in an if function:

if(population = 0,0,infections / population)

In Dax this would work as well, but easier is:

divide(infections,population,0)

Measures are one of the most powerful features of Power BI. They’re a bit like calculated columns, except that they’re much more dynamic to use.

The topic of measures should also be reserved for a separate article. To get an first small idea about measures, look at the below screenshot. It shows some predefined often needed quick measures.

Image for post
Image for post

Measures ultimately give us the opportunity to analyze our data independently of the calculated columns and depending on a dynamic context.

In my covid statistics I don’t need measures. I just misuse measures here as global constants. And only for the sake of clarity, I organize the measures in a separate table. This is how it works. But you can ignore this, if you’re total new to Power BI.

  1. step:
Image for post
Image for post

2. step:

Image for post
Image for post

3. step:

Image for post
Image for post

Here we can now create a new measure:

Image for post
Image for post
creating a new measure
Image for post
Image for post
measure formula

We will name the measure incidence group and assign the fixed value 100,000 to it, so we can easily react should the governments once define a different reference quantity, i.e. no longer count per 100,000 inhabitants. Otherwise we would not have needed a measure, but could have entered 100,000 hard in our calculated columns, the measure is only used to keep the dashboard more maintainable.

Image for post
Image for post
calculated columns by using an external Measure

In our data table we have access to the measure and can calculate the new infections per 100,000 (or whatever value we have stored in the measure).

Basically that was all! For every day and every country we do now have the relevant value that European governments use (better to say: used in early summer 2020) for their travel warning decisions.

What comes next is freestyle.

Image for post
Image for post
Photo by Sharon Pittaway on Unsplash

Later in the visuals I want to highlight in color whether we are above, below or near the critical point of incidence value 50. Therefore, I also store the critical number of 50 as a measure.

Again, this is a little misuse of measures, but it helps! Should this value once be adjusted, we are be able to easily adjust our statistics, too — in just one place.

Image for post
Image for post

In a calculated column I now define a color values as hex codes.

0–40 new infections = green = #27f27f

40–50 new infections = yellow / orange = #f2a427

> 50 new infections = red = #eb345b

At this point, too, I have to admit that the following function is not beginner-friendly. But if you have ever created a nested “if” in Excel, you would intuitively come to a correct solution, because I could have done that here, too. However, the switch function seems clearer to me in the long term.

Image for post
Image for post
Color = SWITCH(
TRUE(),

[new infections per incidence group last 7 days]
>[critical incidence value],"#eb345b",
[new infections per incidence group last 7 days]
< [critical incidence value] &&
[new infections per incidence group last 7 days]
> [critical incidence value]-10, "#f2a427",
[new infections per incidence group last 7 days]
< [critical incidence value]-10, "#27f27f"
)
Image for post
Image for post

For time series it is usually a good idea to have a separate date table.

We can derive this dynamically from our data as a new table with the following formula.

Image for post
Image for post

These are also two important points. With Power BI, it’s very easy to work with dates. The second point is that we can always derive new ones from our data tables.

Power BI lives from the relationships between our data. Therefore I will now create one between the new date table and the data from our EU statistics. You see how easy it is to define relationships between data? No SQL joins, just clicking.

Image for post
Image for post

Relationships can also be displayed visually and created or changed by drag and drop (image below from another project).

Image for post
Image for post
Image for post
Image for post

Visuals are next great strength of Power BI. Visuals are used to represent our data graphically. In fact, our brains are still very good at recognizing patterns and relationships in such graphics.

Selecting data for the visual is easy.

Image for post
Image for post

Note: It is important to select the correct setting for the date, otherwise (Date Hierarchy) Power BI tries to group according to months, years, etc. This is often a cool feature, but not here!

Image for post
Image for post
Image for post
Image for post

Up to this point, Power BI unfortunately sums up the new infections of the last 7 days per 100,000 inhabitants of all countries and outputs them on a daily basis.

That’s not what we want.
Wouldn’t it be nice if we could select a country?

OK, we that’s easy. Let’s create a data slicer and see what happens.

Image for post
Image for post

Cool or cool? 🆒

But….

…wouldn’t it be nice if we could narrow down the reporting date? OK. Let’s do that too. We just create a second slicer and now it’s time to use our date table.

Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post
Image for post

The cool thing: Whenever we want to update our evaluation, we only need to click on Refresh and we always get the latest values. Everything that we have set so far just runs in the background. Starting with querying the data, clearing up unneeded columns, calculating columns, determining the color code and displaying it as a visual, all of this now runs automatically.

Otherwise there are no limits to our creativity in terms of colors. For my Sweden tab i used the yellow and blue similar to the Swedish flag.

Image for post
Image for post

Visualize on a Bing map #1

Image for post
Image for post

Visualize on a Bing map #2

Image for post
Image for post

Schwedische Daten

Trust is good, control is better

OK. now we can pull the data from the Swedish authorities for Sweden. For this we create a new query and just import it, as descibed for the EU data above.

Image for post
Image for post
the swedish data

We can easily create new tables from existing tables in Power BI. DAX enables us to do this, so I will also briefly show it here by isolating the most important data with the SELECTCOLUMNS function.

I could have removed columns that were not needed in the query, but I might want to do an analysis by region again, so I have largely adopted the original data.

Image for post
Image for post
Image for post
Image for post

Above I created a calculated column again, the Todos are similar to the EU data.

Image for post
Image for post

But if we want to divide by the population here, we will find that we are missing this number. We can look it up in the EU data. In Excel, VLOOKUP would be the method of choice, DAX has a similar but better function.

Image for post
Image for post

One advantage of Lookupvalue is that we can search for multiple search criteria. Here we are looking for the country “Sweden” and the date.

Of course, we quickly establish a relationship to our date table.

Image for post
Image for post
Image for post
Image for post

To use the visual that we had already created, we can simply duplicate the page with the visual of the EU data and then adjust the data fields and slices a little.

And here is the result.

Image for post
Image for post

We can easily display the data from the EU and Sweden in one visual to compare them.

Image for post
Image for post
compairing data provided by EU and by Sweden

We can change the diagram type just as quickly in order to have the critical value displayed as an additional line. Here with Italy as an example.

Image for post
Image for post

Give Power Bi a chance, it might come in handy.

If you liked my article, please let me know. If you’re interested in smaller but more detailed articles about Power BI, please let me know too.

But most importantly: stay healthy

Tax consultant and IT-Nerd

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store