
Jul 09, 2020

Monitoring Azure Data Factory v2 Using Power BI

Posted by Evan O'Neill

Azure Data Factory (ADF) is one of the most useful services in the Microsoft Azure modern data platform. Using ADF’s seamless user interface, you can design and orchestrate complex data movement and transformation from just about any source. This is extremely valuable for data-rich companies that need to manage large ETL processes in both the on-prem and serverless spaces. One of the challenges of scaling ADF for larger implementations is monitoring its execution. The larger your process becomes, the harder it can be to monitor.


Now don’t get me wrong: there have been noticeable enhancements from ADFv1 to ADFv2 in terms of monitoring features, but the UI still has its limitations. First, you can only see 100 items per page, which is fine unless your ETL performs multiple actions on hundreds of tables (as is usually the case). Second, wouldn’t it be nice if you could filter on more than just timeframe, status, or pipeline name? What if you wanted to find pipelines with long execution times, create aggregates, or drill down and slice using even more parameters? Lastly, you might be able to get by with simple pie or Gantt-style bar charts, but what if you wanted to add your own visuals to see the executions in a way that makes the most sense to you?

My solution is to leverage the flexibility and simplicity of the comprehensive Power BI business analytics suite to create a fully customizable report for monitoring your ADFv2 processes. This post will show you how this can be accomplished using the Azure Data Factory v2 REST API to query your data factory via Power BI.

Prerequisites

To make API calls using Power BI, you will need to create a Service Principal App with Contributor Access, then authenticate to the ADF Service using the App’s ID and Secret Key. There are two ways to create the Service Principal and I will go over each of them. The first is to use the Azure CLI:

  1. Execute the following command in the Azure CLI to create the Service Principal:

az ad sp create-for-rbac --name ServicePrincipalName

  2. Your new Service Principal will be created with Contributor access by default. If you would like to check the role assignment, you can run the following command in the CLI:

az role assignment list --assignee APP_ID

It is quite possible that your personal account does not have permission to create a Service Principal at the subscription level. In that case, the second approach is to create an App registration in the Azure Active Directory UI and then assign it a role at the resource group or ADF level (as long as you have Contributor access to those resources):

  1. Navigate to Azure Active Directory for your subscription.
  2. Select App registrations to create a New registration.
  3. Choose a Service Principal name (you don’t need to worry about the redirect URI in this case).
  4. Click Register.

After the application and Service Principal have been created, you need to manually assign a role to the application. Assuming you don’t have access to assign the role at the subscription level, let’s assign it at the resource group level.

  1. Select the resource group that houses your ADF instance.
  2. Navigate to Access control (IAM).
  3. Choose + Add at the top, then Add role assignment.
  4. Select Contributor from the Role dropdown.
  5. Search for the Service Principal you created in the last section.
  6. Click Save and the permissions should be set up just as they would be if you used the CLI method.

Regardless of your method for creating the Service Principal, you will need to give it a new application secret to allow it to make API calls programmatically.

  1. Return to App registrations under Azure Active Directory.
  2. Select Certificates & secrets then choose New client secret.
  3. Set a description and duration for the secret.
  4. Click Add.
  5. Once the secret has been generated, it is important to copy it to your clipboard immediately and keep it on hand because we will need it for the next section.

NOTE: It is also a good idea to store the new application secret in Azure Key Vault or somewhere your application can retrieve it, otherwise you won’t be able to view it later. For more information, see https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal#create-a-new-application-secret.

Making the Connection

Congrats! You have successfully set up the Service Principal and its permissions, giving you a means of accessing ADF programmatically through a dedicated identity. Next, we will walk through using Power BI to authenticate to the data factory and query your pipelines.

  1. Open Power BI Desktop and select Get Data. Select Blank Query from the data source options and click Connect.
  2. Click Advanced Editor in the Home ribbon.
  3. Paste the code below into the editor window and replace the placeholder values with your Azure Tenant ID, Subscription ID, Resource Group Name, and ADF Name.
  4. Retrieve the Application (client) ID from the App’s Overview page in Azure Active Directory and replace the remaining placeholders with your Service Principal’s App ID and secret (which you should have saved somewhere).
let
  //Set variables for authenticating to Azure using Service Principal and making API request
  tenantID = "<Azure Tenant ID Here>",
  subscriptionId = "<Azure Subscription ID Here>",
  resourceGroupName = "<Resource Group Name Here>",
  factoryName = "<Data Factory Name Here>",
  apiVersion = "2018-06-01",
  appId = "<Service Principal App ID Here>",
  clientSecrets = "<Service Principal Application Secret Here>",
  uri = "https://login.microsoftonline.com/" & tenantID & "/oauth2/token",
  res = "https://management.azure.com/",
  today = DateTime.LocalNow() as datetime,
  prev = Date.AddMonths(today, - 1),
  startDt = DateTime.Date(prev),
  startDtText
    = "'" & Text.From(Date.Year(startDt)) & "-" & Text.From(Date.Month(startDt)) & "-" & Text.From(
      Date.Day(startDt)
    )
      & "T00:00:00.0000000Z'",
  endDt = DateTime.Date(today),
  endDtText
    = "'" & Text.From(Date.Year(endDt)) & "-" & Text.From(Date.Month(endDt)) & "-" & Text.From(
      Date.Day(endDt)
    )
      & "T00:00:00.0000000Z'",
  //Obtain Authorization token for Service Principal
  authBody = [
    grant_type = "client_credentials", 
    client_id = appId, 
    client_secret = clientSecrets, 
    resource = res
  ],
  authQueryString = Uri.BuildQueryString(authBody),
  authHeaders = [#"Accept" = "application/json"],
  auth = Json.Document(
    Web.Contents(uri, [Headers = authHeaders, Content = Text.ToBinary(authQueryString)])
  ),
  token = auth[access_token],
  //Build request URL & Body using variables
  url
    = "https://management.azure.com/subscriptions/" & subscriptionId & "/resourceGroups/"
      & resourceGroupName
      & "/providers/Microsoft.DataFactory/factories/"
      & factoryName
      & "/queryPipelineRuns?api-version="
      & apiVersion,
  reqBody = "{ ""lastUpdatedAfter"": " & startDtText & ", ""lastUpdatedBefore"": " & endDtText
    & " }",
  //Make API Call to query Data Factory
  reqHeaders = [#"Content-Type" = "application/json", #"Authorization" = "Bearer " & token],
  result = Json.Document(
    Web.Contents(url, [Headers = reqHeaders, Content = Text.ToBinary(reqBody)])
  )
in
  result
 

The first section of the code above sets the necessary variables and makes a call to Microsoft’s login endpoint to obtain an authorization token. That token is then passed to the second section, which queries the Azure Data Factory REST API using the request body. Notice that the start and end dates in the body create a window covering roughly the last 30 days. There are other useful fields you can filter on in the request body (such as “PipelineName”), but for demo purposes I decided to query the entire data factory. For more information on the capabilities of the ADF REST API, see Microsoft’s documentation for the Pipeline Runs - Query By Factory operation.
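For readers more comfortable outside of M, the same two-step flow (request a token, then POST to queryPipelineRuns) can be sketched in Python. This is purely illustrative and not part of the report; the helper names are my own, and actually sending the requests requires network access to Azure:

```python
import json
from datetime import datetime, timedelta, timezone

RESOURCE = "https://management.azure.com/"
API_VERSION = "2018-06-01"

def token_request(tenant_id, app_id, client_secret):
    """Build the OAuth2 client-credentials request (POSTed as a form body)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = {
        "grant_type": "client_credentials",
        "client_id": app_id,
        "client_secret": client_secret,
        "resource": RESOURCE,
    }
    return url, body

def pipeline_runs_request(subscription_id, resource_group, factory_name):
    """Build the queryPipelineRuns URL and a JSON body covering the last ~30 days."""
    url = (
        f"{RESOURCE}subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"/queryPipelineRuns?api-version={API_VERSION}"
    )
    now = datetime.now(timezone.utc)
    body = json.dumps({
        "lastUpdatedAfter": (now - timedelta(days=30)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
    })
    return url, body
```

You would POST the token body, read access_token from the JSON response, and send it as a Bearer header on the second request, exactly as the M query does.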

  5. Finally, click Done and your M query will be saved.

NOTE: Once you save your M query, Power BI will ask you to set privacy levels for the data in this dashboard. It is a good idea to consult your IT department or data governance team to determine the recommended settings, but if your data factory is not deemed sensitive then you can check the box to ignore privacy restrictions on the API results.


Power BI will call the API to query the data for your ADF pipelines, but there are a couple more steps we need to take to turn the JSON result into readable data.

  1. Click List in the query preview window.

The first time you connect to the Azure API, Power BI will ask you to select your authentication settings. I have found that it is best to authenticate at the Subscription scope; that way you can use the API to connect to other resources within the subscription without changing your settings. To set the scope to the subscription level:

  1. Click Edit Credentials.
  2. Choose the third option from the top in the dropdown to select your specific subscription.
  3. Click Connect.

Once this is done, you will need to add a few more steps to parse the JSON and transform the API response into a table in your data model:

  1. Click To Table in the Transform ribbon.
  2. Click OK.
  3. Click the icon at the top right of the column to expand the JSON records into unique rows and columns.
  4. Uncheck the box Use original column name as prefix.
  5. Click OK.
  6. Repeat steps 3-5 on the parameters column to expand the ADF pipeline details.
  7. Click Close & Apply to add the query to the data model.

Hooray! Now you have a functioning data source that can seamlessly load pipeline execution data from ADF into the Power BI data model. But wait a minute… there are only 100 records. Didn’t the ETL execute more than 100 pipelines last night? What you’re seeing is the API’s limit of 100 records per JSON response. But have no fear, there is a workaround that involves invoking a Power BI function to retrieve all the records.

Creating the API Function

Whenever an API call to Data Factory returns more than 100 records, the JSON response includes a “ContinuationToken” field which can be passed as a parameter in a successive API call. If you do not change any of the other parameters (e.g. date windows, pipeline names, or other filters), you can add this token to the body of another API request and get the next batch of records. Doing this recursively, using the next ContinuationToken from each response, will give you the full dataset. To make these recursive API calls, you just need to wrap the API call in a simple Power BI function that takes the ContinuationToken as a parameter and loops through the code, merging each batch together until you have a full master dataset. Once you create the function, you can add a separate M query to invoke the function and execute the loop whenever the report is refreshed!
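Outside of M, the same idea can be sketched as a loop rather than recursion (the result is identical: keep requesting pages until no ContinuationToken comes back). The post_query and fake_api names below are my own stand-ins for the authenticated POST, not part of the ADF API:

```python
def query_all_runs(post_query, body):
    """Page through queryPipelineRuns responses by following continuationToken.

    `post_query` stands in for whatever issues the authenticated POST and
    returns the parsed JSON response: a dict with a "value" list and, when
    more pages remain, a "continuationToken" field.
    """
    runs = []
    token = None
    while True:
        page_body = dict(body)
        if token is not None:
            # Unchanged filters plus the token fetch the next batch of records.
            page_body["continuationToken"] = token
        response = post_query(page_body)
        runs.extend(response.get("value", []))
        token = response.get("continuationToken")
        if token is None:
            return runs

# A fake two-page API to demonstrate the loop (no Azure account needed):
def fake_api(body):
    if "continuationToken" not in body:
        return {"value": [{"runId": 1}, {"runId": 2}], "continuationToken": "page2"}
    return {"value": [{"runId": 3}]}
```

Calling query_all_runs(fake_api, {...}) merges both pages into a single list of three runs, just as the recursive M function merges each batch into one master dataset.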

  1. Create a new Blank Query in the editor.
  2. Right-click the query and select Create Function.
  3. Click OK to create a function with no parameters.
  4. Name the function “pipe_runs_recursive” and select OK.
  5. Open the function in the Advanced Editor and paste in the code below which allows the Continuation Token to be passed to a subsequent API call (don’t forget to insert the values for the Tenant ID, Subscription ID, Resource Group Name, Data Factory Name, and App ID/Secret variables just like before):
    let
      pipe_runs_recursive = (optional contText as text) as list => 
        let
          //Set variables for authenticating to Azure using Service Principal and making API request
          tenantID = "<Tenant ID Here>",
          subscriptionId = "<Subscription ID Here>",
          resourceGroupName = "<Resource Group Name Here>",
          factoryName = "<Data Factory Name Here>",
          apiVersion = "2018-06-01",
          appId = "<Service Principal App ID Here>",
          clientSecrets = "<Service Principal App Secret Here>",
          uri = "https://login.microsoftonline.com/" & tenantID & "/oauth2/token",
          res = "https://management.azure.com/",
          today = DateTime.LocalNow() as datetime,
          prev = Date.AddMonths(today, - 1),
          startDt = DateTime.Date(prev),
          startDtText
            = "'" & Text.From(Date.Year(startDt)) & "-" & Text.From(Date.Month(startDt)) & "-"
              & Text.From(Date.Day(startDt))
              & "T00:00:00.0000000Z'",
          endDt = DateTime.Date(today),
          endDtText
            = "'" & Text.From(Date.Year(endDt)) & "-" & Text.From(Date.Month(endDt)) & "-" & Text.From(
              Date.Day(endDt)
            )
              & "T00:00:00.0000000Z'",
          //Obtain authorization token for Service Principal
          authBody = [
            grant_type = "client_credentials", 
            client_id = appId, 
            client_secret = clientSecrets, 
            resource = res
          ],
          authQueryString = Uri.BuildQueryString(authBody),
          authHeaders = [#"Accept" = "application/json"],
          auth = Json.Document(
            Web.Contents(uri, [Headers = authHeaders, Content = Text.ToBinary(authQueryString)])
          ),
          token = auth[access_token],
          //Build request URL & Body using variables
          url
            = "https://management.azure.com/subscriptions/" & subscriptionId & "/resourceGroups/"
              & resourceGroupName
              & "/providers/Microsoft.DataFactory/factories/"
              & factoryName
              & "/queryPipelineRuns?api-version="
              & apiVersion,
          reqBody = 
            if contText = null
            then "{ ""lastUpdatedAfter"": " & startDtText
              & ", ""lastUpdatedBefore"": " & endDtText
              & " }"
            else "{ ""lastUpdatedAfter"": " & startDtText
              & ", ""lastUpdatedBefore"": " & endDtText
              & ", ""continuationToken"": '"
              & contText
              & "' }",
          //Make API call to query Data Factory
          reqHeaders = [#"Content-Type" = "application/json", #"Authorization" = "Bearer " & token],
          result = Json.Document(
            Web.Contents(url, [Headers = reqHeaders, Content = Text.ToBinary(reqBody)])
          ),
          //Get continuation token from previous call if available
          contToken = try Record.Field(result, "continuationToken") otherwise null,
          //Run function for each new continuation token
          CombinedResults = 
            if contToken = null
            then {result}
            else List.Combine({{result}, pipe_runs_recursive(contToken)})
        in
          CombinedResults
    in
      pipe_runs_recursive
  6. Open the Blank Query you created in step 1 and enter the following code to trigger the function and add your data to one big result set:
let
  Source = pipe_runs_recursive(null)
in
  Source
 
The function takes the Continuation Token as a parameter, but the parameter is optional. When your query runs, it will invoke the function with null as the token for the first set of 100 records, then pass each token it receives to subsequent calls. You will still need to expand the JSON like we did earlier, but once that’s done you will have a functioning query that returns the entire result set to your data model without the 100-record limit.
  1. Click Close & Apply to save your queries and look at your new Power BI data model!
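The expand steps essentially flatten each response’s value list into one table of run records. In Python terms the reshaping looks roughly like this; the sample data is made up for illustration, though field names such as pipelineName, status, and durationInMs do match the ADF pipeline-run schema:

```python
# Each API response is a dict with a "value" list of pipeline-run records.
responses = [
    {"value": [
        {"pipelineName": "LoadCustomers", "status": "Succeeded", "durationInMs": 61000},
        {"pipelineName": "LoadOrders", "status": "Failed", "durationInMs": 425000},
    ]},
    {"value": [
        {"pipelineName": "LoadProducts", "status": "Succeeded", "durationInMs": 98000},
    ]},
]

# Flatten every page into one list of rows, mirroring the expand steps in Power BI.
rows = [run for page in responses for run in page["value"]]

# A derived column, similar to an execution-time-in-minutes measure in the model.
for row in rows:
    row["durationMinutes"] = round(row["durationInMs"] / 60000, 1)
```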

Once you have the connection set up to pull raw pipeline data, you can start creating personalized measures and dimensions to visualize it. One thing I found helpful was building a date picker to filter the last 30 days of runs down to more manageable slices. I also created measures that calculate execution times in hours and minutes, which can help you see the big picture in terms of average ETL duration or long-running pipelines. For more in-depth data visualization tips and tricks, consider attending one of our Power BI “Dashboard-in-a-Day” webinars, or subscribing to our YouTube channel.


While Azure Data Factory’s comprehensive integration and orchestration capabilities offer data transformation at cloud-scale speed, Power BI simplifies data visualization and interaction. Whether you’re considering streamlining and modernizing your data platform, or want to further explore Azure’s robust capabilities, check out BlueGranite’s complimentary Modern Data Platform Fundamentals webinar. Using our unique BlueGranite Catalyst for MDP, we guide companies through digital transformations from start to finish. Contact us today to discover how we can help!


About The Author

Evan O'Neill

Evan is a BlueGranite Consultant with business intelligence experience in banking, retail, and healthcare. As a Microsoft Certified Azure Data Engineer Associate, he enjoys leveraging the Azure stack to create unique data solutions for our clients. Evan received his Bachelor of Arts in Economics from Ohio University.
