Power BI * Power Query
The strange “each” and “_” in Power Query
Introduction to Power Query functions for advanced users
German version of this article can be found here.
The strange “each”
The powerful M-language
Power Query is often described as a no-code or low-code tool. And basically this is not wrong, because you can get damn far in Power Query without writing a single line of code. Many functionalities are simply selected with a single mouse click and then they just work in an almost magical way.
But if you want to create really complex queries and use Power Query efficiently, you can’t get around Power Query’s powerful formula language. For each step we create via mouse click or dialog box, Power Query generates a “piece” of source code in the background, in this language called M.
We can view and edit this code at any time.
For the first steps in M, however, it is in my opinion not necessary to deal with the editor and all the amazing possibilities of M. For the beginning it is sufficient to activate the Formula Bar and to analyze from time to time which code Power Query has generated for the respective step. In this way, the basic features of M can be learned quite easily.
The formula bar allows you to quickly and efficiently make small adjustments to formulas that would have been impossible or difficult to make in the traditional way (i.e., by mouse click).
“Each” everywhere
One thing you’ll see over and over again in the formula bar is the magic word “each”.
In the image above, for example, I have filtered a table for persons of male gender (“gender” = Male). The easiest way to do this is by mouse click (right mouse click on Male).
Power Query thereby generates the code shown below.
Well, and there it is again: “each” — what’s it doing there? Why isn’t [gender] = “Male”
sufficient — without each
?
If we take a look at the M documentation, we unfortunately only find the following — unhelpful — statement.
Each keyword
The each keyword is used to easily create simple functions. “each …” is syntactic sugar for a function signature that takes the _ parameter “(_) => …”
- source Microsoft
What the hell is syntactic sugar? What is a function signature that takes the _ parameter? And again, why isn’t simply [gender] = “Male”
without each sufficient?
Table.SelectRows()
To answer these questions, we first need to go back to the Table.SelectRows()
function. Because this is the built-in M function that Power Query creates/uses to filter a table. Microsoft describes the syntax of this function as follows.
And with respect, the documentation for M is just crap and if I had a wish at this point, I would wish that someone would fundamentally revise the whole documentation. However, we will solve the puzzle anyway.
Table.SelectRows()
is therefore a function that takes as its first parameter a table to be filtered.
This first part is easy to fulfill.
In the above example, the last step was called “sorting results” and the result of this step was known to be a table, namely the table we want to filter. We therefore only need to write the name of the last step with a prefixed #
in the Table.SelectRows()
function and thus pass a table to the function.
But now let’s move on to the second parameter. Here Table.SelectRows()
expects, according to the above syntax description, a function. So we need a function as parameter of another function.
And that answers the first question: Why isn’t simply [gender] = “Male”
without each
sufficient? — Because [gender] = “Male”
is not a function.
Functions in M
Functions are, somewhat roughly, a block of code that can be executed repeatedly.
We know functions from mathematics: y = x²
or from other programming languages like JS: let y = (x) => x ** 2
or function powerTwo(x){return x ** 2}
.
A function typically has one or more input values (so-called parameters). In addition, a function has a return value (a result). Ideally, the result will depend on the parameters. In the above cases, the function has the parameter x, and returns the result (return value) x².
In M, functions have the following syntax:
(Parameter_1, Parameter_2, ...) => do_operations_and_return_the_result
z.B.
(Number_1, Number_2) => Number_1 > Number_2
That means, we define the parameters of the function in the brackets. So in the example above Number_1
and Number_2
.
After the arrow (=>) comes the function body, in which we determine the return value. In the above example, the return value is a Boolean value (true/false). This is because the function checks whether Number_1
is greater than Number_2
and returns true or false accordingly.
In many cases in M we will have to deal with only one parameter. For example, if we want to compare whether our input value is greater than 5.
(myNumber) => myNumber > 5
In these cases, instead of (myNumber) => myNumber > 5
, we can also express the one and only parameter as “_” and we end up with the following function.
(_) => _ > 5
By the way, this is the function quoted above, which has a function signature that takes the _ parameter “(_) => ….”. (As I said, I wish the documentation for M would be fundamentally improved).
And that brings us right to syntactic sugar. Syntactic sugar is a simplification of the syntax in certain cases.
If we are dealing with only one parameter, as here, the syntax of M allows us to choose a different/simplifying notation.
each _ > 5
That means, we can simply omit the function header (the brackets and their content) and instead use each
to directly access the implicit (single) parameter _.
(myNumber) => myNumber > 5
(_) => _ > 5
each _ > 5
All of the aforementioned notations do exactly the same thing: they take a number, compare whether it is greater than 5, and return either true
or false
.
What is this good for? In more complex and nested functions, this notation can significantly increase the readability of the code.
The search operator
We are not yet finished with our simplifications. A function in Power Query can receive not only numbers or text (strings) as parameter. In fact, with the built-in M function Table.SelectRows(table as table, condition as function)
, we have already seen that a whole table or even a function can be a function parameter.
Besides whole tables, we will often have to deal with table rows in M. That means that we often need functions that take a row as a parameter.
(row) => do_something_with_row
Now let’s assume further that the table has a row “Price”, so each row has a field with the name “Price”, which we want to increase by 5 and return this as a result. How can we access a field of a row?
By using the so called search operator: [ ]
or to be more specific: line[fieldname]
.
So our function should look like this.
(row) => row[Price] + 5
Armed with our knowledge from the sections above, we can simplify this function.
(_) => _[Price] + 5
O.K. let’s simplify it even more by using each
each _[Price] + 5
Now, in combination with the search operator, the simplification goes even further. If we have a single parameter that is a row, we can directly access a field of this row with the search operator. Thefunction then looks like this.
each [Preis] + 5
Everything fits together
At this point, admittedly, we now somehow have a lot of loose threads of information in our hands that we need to tie together. Unfortunately, we are still missing one piece of information. And that brings us to a problem that basically runs through the entire documentation of M: The documentation is incomplete and misleading!
We can only guess what is meant by “condition”. But if we think about what is actually happening here, we will also find this last piece of the puzzle.
Table.SelectRows()
is, as already stated several times, a function that requires a table as the first parameter; the table that should be filtered. The function now walks through that table row by row.
For each row, Table.SelectRows()
then executes the second parameter, which is our function (yet to be found). This function is then passed the current row as a parameter. So our yet to be found function must accept a row as parameter. Finally, our function to be found must contain some search criterion, and return true
or false
if the search criterion is found or not. If false
is returned, the current row is ignored by Table.SelectRows()
. If the result is true
, the row is added to a new table. In this way, Table.SelectRows()
finally returns a new filtered table after all rows have been processed.
So how should our filter function look like?
- Parameter must be a row:
(row) =>...
- Inside the function we need to access the field “Gender”:
(row) => row[Gender]
- The result of the function must be a boolean value:
(row) => row[Gender] = “Male”
- For functions with only one parameter, we can use the _ notation to simplify:
(_) => _[Gender] = “Male”
- Functions with a _ parameter can be simplified with
each
:each _[Gender] = "Male"
- We can omit the underscore when using the search operator in the context of
each
:each [Gender] = "Male"
- Thus, the result is:
Table.SelectRows(#"sorting rows", each [Gender] = "Male")
O.K. and now one last time all together:
Table.SelectRows()
takes the table from the step #”sorting rows” and runs through this table row by row. For each row it calls the function each [Gender] = "Male"
, passes the current row as parameter, looks for the field “Gender” in this row and checks if the value in this field is “Male”. If the value is “Male”, the function returns true
to Table.SelectRows()
and Table.SelectRows()
keeps this row in the filtered table. If the value is not “Male”, the function returns false
to Table.SelectRows()
and Table.SelectRows()
makes this row not appear in the filtered table.
And so we finally know what each
and _ mean, what functions in M look like, and why all the variants below are completely identical and equal. Try it out!
Notes: If someone is coming from a programming background, he might criticize this article as sloppy, because I didn’t exactly define the term function, because I didn’t go into optional parameters, because I used the word “parameter” even where the word “argument” would be correct, because I was otherwise linguistically unclean with regard to quite a lot of other terminology around the topic of functions. I know! And yes, this was intentional. Many beginners in Power Query do not have a programming background, but a background in Excel. And for them, I didn’t want to add more complexity than the topic already brings.