Microsoft Power BI Series 101 — Extract, Transform and Load (ETL)

Shachi Kaul
5 min read · Dec 15, 2022

Power BI is a Business Intelligence tool offering reporting and analytical capabilities. It could be defined as:

“The Power BI Platform is a Microsoft cloud solution with various software services, connectors and apps that help to consolidate and ingest data from multiple unrelated sources, perform pre-processing and transformations, and finally deliver useful reports.”

It comes with various services such as Power BI Desktop, Power BI Service (the web app) and Power BI Mobile. This blog covers how ETL can be done via the web app and the Desktop version. Check this page for more elaborate info.

What is ETL?

ETL stands for Extract, Transform and Load: the process of ingesting (extracting) data, transforming it, and then loading the transformed data into a warehouse. In short, it moves data from any kind of source to a destination.

Extraction: Data in varied formats such as relational (RDBMS), NoSQL, text, files and XML can be extracted using Power BI connectors.

Transformation: Process of converting the extracted raw data into a certain standard that can be used for creating a data model and reporting.

  • Cleaning, sorting and filtering operations
  • Feature engineering and selection
  • Standardizing column values or formats

Load: The transformed data is loaded into the data warehouse.
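To make the three phases concrete, here is a minimal sketch in Power BI's M (Power Query) language. The file path, sheet name and column names are hypothetical, purely for illustration:

```m
let
    // Extract: read a worksheet from an Excel workbook (path and sheet are hypothetical)
    Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), null, true),
    SalesSheet = Source{[Item = "Sales", Kind = "Sheet"]}[Data],
    Promoted = Table.PromoteHeaders(SalesSheet, [PromoteAllScalars = true]),

    // Transform: fix types, filter rows, standardize a column
    Typed = Table.TransformColumnTypes(Promoted, {{"Amount", type number}, {"Region", type text}}),
    Filtered = Table.SelectRows(Typed, each [Amount] > 0),
    Standardized = Table.TransformColumns(Filtered, {{"Region", Text.Upper, type text}})
in
    // Load: the final table is what Power BI loads into the data model
    Standardized
```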

Is Power BI an ETL?

Microsoft Power BI is not an ETL tool; it is a Business Intelligence and Analytics tool that follows the ETL methodology to create interactive reports and dashboards.

So, after the loading activity, the destination in Power BI is not a warehouse as in a classic ETL process; instead, it is always a Tabular model. This means the loaded data will always be in a structured format, and loading data creates a dataset in Power BI.

ETL in Power BI

The data extraction phase is achieved via Power BI connectors such as SQL Server, Text/CSV, JSON, Excel, Azure SQL Database, etc.

Power BI provides some ETL roles and features:

  • Connects to a wide variety of data sources
  • Provides a dedicated visual editor (the Power Query Editor) for data transformation
  • Records each visual transformation step in the M query language, which can be seen via the Advanced Editor

Unlike a full ETL tool, however, Power BI (using Dataflows and Power Query) can only move data into Power BI itself, creating datasets. Let’s see the practical steps for both ways.

  • ETL with Power Query Editor (Power BI Desktop)

Power Query maintains a step-by-step record of every action taken to ingest, transform and load data. This enables us to do a wide range of transformations, from removing unnecessary columns to adding new computed ones.

Implementation

  1. Clicking on “Get Data” opens a window asking you to choose a type of Power BI connector and fetch the respective source data

  2. After loading the data, you are navigated to the “Power Query Editor” window, where you can perform various transformation techniques

Each of these transformation steps is captured under the Applied Steps pane, located on the right-hand side.

Also, all these applied steps are recorded in the M query language, which can be seen in the Advanced Editor.
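For illustration, here is the kind of M query the Advanced Editor might show after a few UI actions. The source file and column names are made up; the point is that each applied step (e.g. “Removed Columns”, “Filtered Rows”) appears as one named step in the `let` expression:

```m
let
    // Each UI action in the Power Query Editor becomes one named step below
    Source = Csv.Document(File.Contents("C:\Data\orders.csv"), [Delimiter = ",", Encoding = 65001]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    #"Removed Columns" = Table.RemoveColumns(#"Promoted Headers", {"InternalId"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each [Status] = "Shipped")
in
    #"Filtered Rows"
```

Deleting a step in the Applied Steps pane simply removes the corresponding line from this expression.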

  • ETL with the online Power Query Editor via Power BI Dataflows (Power BI Service)

Dataflows are another way to ingest your data: a simple data pipeline that connects to a data source such as a SQL Server database, a SharePoint file, etc. A dataflow cannot be created in the Desktop tool, only in the cloud-service tool inside your workspace; after creation, however, it can be used in the Desktop app too. To avail the Dataflow services, you need a Premium license.

Where is the data from dataflows stored?
A dataflow is a data transformation item independent of your report. The tables produced by dataflows are stored either in the internal Azure Data Lake Storage Gen2 (ADLS Gen2) or in Dataverse.

Dataflow is the data transformation in the cloud independent of the Power BI dataset.

Schedule the refresh of dataflows to keep the report updated with the source data.

Merits

  • Reusing the same transformational logic from one report in another is difficult, since reusing applied steps isn’t available yet. A dataflow containing these steps can be created and then directly imported into other reports.
  • Whenever new logic is created, the dataflow just needs to be refreshed.

Implementation

As said earlier, dataflows can only be created via the Power BI Service.

  • Go to your workspace and select the Dataflow option
  • There are 4 ways to create a dataflow; for this demonstration, use the “Define new tables” option
  • Select the appropriate data source. I have chosen Excel workbooks stored in my OneDrive
  • Click Next and load the data
  • You can also add more transformation steps here, such as filtering rows or other processing
  • Save & Close the window, then give the dataflow a name; it now resides in your workspace

The transformation steps, visible via the Advanced Editor, are stored in the dataflow. These steps can be applied to any report by simply importing this dataflow.
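As a sketch, a dataflow can then be consumed from Power BI Desktop through the Power BI Dataflows connector. The workspace, dataflow and table names below are placeholders, and the exact navigation fields Power BI generates may differ in your tenant:

```m
let
    // Connect to Power BI dataflows (the connector function is real; navigation below is illustrative)
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    // Drill into a hypothetical workspace, dataflow and table
    MyWorkspace = Workspaces{[workspaceName = "Sales Workspace"]}[Data],
    MyDataflow = MyWorkspace{[dataflowName = "Sales Dataflow"]}[Data],
    SalesTable = MyDataflow{[entity = "Sales", version = ""]}[Data]
in
    SalesTable
```

In practice, Power BI Desktop generates this navigation for you when you pick the dataflow in the Get Data dialog.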

Feel free to follow this author if you liked the blog; she assures you she will be back with more interesting Data Science content.
Thanks,
Happy Learning! 😄


Shachi Kaul

Data Scientist by profession and a keen learner. Fascinated by photography and scribbling other non-tech stuff too @shachi2flyyourthoughts.wordpress.com