CICD Automation in Synapse Analytics: taking advantage of custom parameters in Workspace Templates

Published Jun 21 2022 08:00 AM
Microsoft

Introduction

 

Azure Synapse Analytics is an integrated analytics platform, which combines data warehousing, big data analytics, data integration, and visualization into a single environment. Azure Synapse Studio is the primary tool to interact with the many components in the platform, allowing a wide range of activities against your data.
 
When working collaboratively in Synapse Studio, it is recommended that you integrate your Synapse Workspace with your Git repository to take advantage of source control, as it will allow developers to collaborate on code and track changes. Working in a multi-environment scenario that requires the implementation of Continuous Integration and Continuous Delivery processes is also an important reason for integrating your Synapse Workspace with a source control system.
 
Automating the delivery of your Synapse artifacts (pipelines, notebooks, SQL scripts, etc.) across multiple environments requires these artifacts to adapt to the different resources referenced in each environment. Here’s an example: a Linked Service that uses an Azure Key Vault (AKV) to get a connection string from a Secret. Different environments use different AKVs and different Secrets, so when you deliver this Linked Service across environments, you want it to refer to the appropriate AKV and Secret in each one. You can achieve this by taking advantage of custom template parameterization in your Synapse CICD lifecycle. The goal of this article is to show how you can use custom parameters in your Synapse Workspace templates to ease the delivery of your code in a multi-environment scenario.

 

Before we start…

The Continuous Integration process in Synapse Workspace starts from the moment you integrate your Workspace with your Git repository, and you begin writing, testing, and reviewing your code, building a shared codebase with your team members. The culmination of this process happens when you Publish your code and generate the ARM templates into your Git repository.
 
These ARM templates are the input for your Continuous Delivery process, as they contain all the workspace artifacts that you want to deploy into a target environment. Although these ARM templates contain all the workspace artifacts and their properties, not every property is exposed as a parameter by default. The default parameter template covers only a few artifact properties, and as you create different artifacts in your workspace, you will find that you need to expose other properties as well. This is where custom template parameterization comes in.
 
This article outlines how to use a custom parameters template to expose and override properties that are not parameterized by default in your Synapse Workspace ARM templates, enhancing the Continuous Delivery process for your Synapse Workspace artifacts.

 

 

 

Synapse Workspace ARM Templates

 

When you integrate your Synapse Workspace with your Git repository, you need to define the organizational structure of your source control system (Organization -> Project -> Repository), the branch that holds your workspace shared codebase (collaboration branch), and the branch that receives the workspace ARM templates (publish branch).

 

Once you integrate your workspace with your Git repository, you no longer author your code directly against the Synapse service. Instead, all changes are first committed to your Git repository before being published to the Synapse service (Live Mode).

Figure: The Continuous Integration process in Synapse Workspace

The Publish operation is divided into two stages: in the first stage, all the pending changes from your Git collaboration branch are stored in your workspace (Live Mode); in the second stage, the workspace ARM templates are generated and saved in the workspace publish branch. These two ARM templates represent the outcome of the Continuous Integration process in your Synapse Workspace:

TemplateForWorkspace.json is the ARM template containing all the workspace artifacts and resources

TemplateParametersForWorkspace.json is the ARM parameters file, containing only the parameters exposed for those artifacts and their values.
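Because TemplateParametersForWorkspace.json follows the standard ARM parameters-file layout (a top-level "parameters" object whose entries each carry a "value"), a few lines of Python are enough to list what it exposes. This is a minimal sketch; the sample content, the workspace name "myworkspace", and the parameter names are hypothetical placeholders, not output copied from a real publish.

```python
import json


def list_parameters(parameters_json: str) -> dict:
    """Return {parameter name: value} from the text of an ARM parameters file."""
    doc = json.loads(parameters_json)
    # Each entry under "parameters" is an object of the form {"value": ...}
    return {name: entry.get("value") for name, entry in doc.get("parameters", {}).items()}


# Hypothetical content shaped like a TemplateParametersForWorkspace.json file
SAMPLE = """{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": {"value": "myworkspace"},
    "myworkspace-WorkspaceDefaultSqlServer_connectionString": {"value": ""},
    "myworkspace-WorkspaceDefaultStorage_properties_typeProperties_url": {"value": "https://mystorage.dfs.core.windows.net"}
  }
}"""

if __name__ == "__main__":
    for name, value in list_parameters(SAMPLE).items():
        print(f"{name} = {value!r}")
```

Against a real file, pass the text read from TemplateParametersForWorkspace.json in a local clone of your publish branch.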

 

 

The default parameters template

 

After integrating your Synapse Workspace, your first Publish will generate a TemplateParametersForWorkspace.json file containing a global parameter for your workspace name and a parameter for each workspace default linked service: the default SQL Server and the default Storage account.

 

Where are these parameters coming from?

In Synapse Studio, go to the "Manage" hub, select "Linked services", hover over one of the workspace default linked services (in the example below, the default SQL server), and select “Code” {}.

The highlighted properties below are the ones exposed by the workspace default parameter template.

Now let’s create a new Linked Service, using the Azure Key Vault connector, and publish the pending changes to generate the new ARM templates.

Check the TemplateParametersForWorkspace.json in your publish branch to confirm that the new AKV linked service "baseUrl" property is also being exposed by the default parameter template.

Now let’s create and publish a Notebook attached to an existing Spark Pool.

 

Check the TemplateParametersForWorkspace.json in your publish branch. No sign of any notebook property, right?

 

But if you check the TemplateForWorkspace.json in your publish branch, you will find several notebook properties! This is a clear example of an artifact whose properties are not exposed by the default workspace parameters template.

 

Let's use a different kind of artifact, a Dataset, and see if the default template will expose its properties.

 

Again, no sign of these Dataset properties in the parameters file:

 

However, these Datasets are part of the main template file, with several properties associated with them:
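One way to spot artifacts like these, whose properties appear in TemplateForWorkspace.json but are never parameterized, is to scan each resource for ARM parameters('...') references. The sketch below assumes the resources sit in a top-level "resources" array with names such as [concat(parameters('workspaceName'), '/NotebookA')]; the sample template and the AzureKeyVault1 parameter name are hypothetical.

```python
import json
import re

# Matches ARM expressions such as [parameters('SomeParameter')]
PARAM_REF = re.compile(r"parameters\('([^']+)'\)")


def parameter_refs_by_resource(template_json: str, ignore=("workspaceName",)) -> dict:
    """Map each resource name in an ARM template to the parameters it references."""
    template = json.loads(template_json)
    result = {}
    for resource in template.get("resources", []):
        # Serialize the whole resource so nested properties are searched too
        refs = set(PARAM_REF.findall(json.dumps(resource))) - set(ignore)
        result[resource.get("name", "<unnamed>")] = sorted(refs)
    return result


# Hypothetical, heavily trimmed TemplateForWorkspace.json content
SAMPLE = json.dumps({
    "resources": [
        {
            "name": "[concat(parameters('workspaceName'), '/NotebookA')]",
            "type": "Microsoft.Synapse/workspaces/notebooks",
            "properties": {"bigDataPool": {"referenceName": "mysparkpooldev"}},
        },
        {
            "name": "[concat(parameters('workspaceName'), '/AzureKeyVault1')]",
            "type": "Microsoft.Synapse/workspaces/linkedServices",
            "properties": {"typeProperties": {
                "baseUrl": "[parameters('AzureKeyVault1_properties_typeProperties_baseUrl')]"}},
        },
    ]
})

if __name__ == "__main__":
    for name, refs in parameter_refs_by_resource(SAMPLE).items():
        # An empty list means no value in this resource can be overridden via parameters
        print(name, "->", refs or "no parameters")
```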

 

Using a customized parameters template

 

The use case for a customized parameters template in Synapse is simple: you need one when you are automating your CICD process in Synapse and must override an artifact property that the default parameters template does not parameterize.

 

How does it work?

 

After you publish your pending changes from the collaboration branch to the Synapse service (Live Mode), Synapse checks whether a custom template file with the exact name “template-parameters-definition.json” is stored in the root folder of your collaboration branch. If this file exists, Synapse uses its configuration to generate the ARM template parameters; if it does not exist, Synapse uses the default parameters template.
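The lookup described above can be mirrored as a local pre-publish check against a clone of the collaboration branch. This sketch only approximates the documented behavior (exact file name, root folder of the branch); it is not Synapse's implementation, and flagging wrong-case or misplaced copies is an extra precaution added here.

```python
import tempfile
from pathlib import Path

# Exact file name Synapse looks for in the root folder of the collaboration branch
CUSTOM_TEMPLATE_NAME = "template-parameters-definition.json"


def parameter_template_source(branch_root: str) -> str:
    """Report which parameter template a publish would be expected to use."""
    root = Path(branch_root)
    # Compare names exactly, so the check also works on case-insensitive file systems
    if CUSTOM_TEMPLATE_NAME in {p.name for p in root.iterdir() if p.is_file()}:
        return "custom"
    # Wrong-case or wrong-folder copies would not be picked up, so point them out
    near_misses = sorted(str(p.relative_to(root)) for p in root.rglob("*")
                         if p.is_file() and p.name.lower() == CUSTOM_TEMPLATE_NAME)
    if near_misses:
        return "default (check: " + ", ".join(near_misses) + ")"
    return "default"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as clone:
        print(parameter_template_source(clone))  # default
        (Path(clone) / CUSTOM_TEMPLATE_NAME).write_text("{}")
        print(parameter_template_source(clone))  # custom
```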

 

Creating your custom parameters template

 

In your Azure DevOps collaboration branch, select the “More actions” button and then select + New -> File to create a new file in the root folder of the branch.

 

Important: Create a new file with this exact name: template-parameters-definition.json

 

Select “Create” and copy the parameters template definition JSON example from the Microsoft public documentation: Continuous integration & delivery in Azure Synapse Analytics - Azure Synapse Analytics | Microsoft D...

 

Paste the JSON content into the new template-parameters-definition.json file. Don’t forget to select “Commit” to save your changes.
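Because the file is edited by hand in the browser, a stray comma is easy to commit. A quick local syntax check before committing can save a publish cycle. The sketch below only checks that the file is valid JSON made of objects; requiring every section to be an object is a heuristic based on the documented example, and the command-line usage is a hypothetical convenience, not part of any Synapse tooling.

```python
import json
import sys


def check_definition_text(text: str) -> list:
    """Return a list of problems found in a template-parameters-definition.json body."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(doc, dict):
        return ["the top-level value should be a JSON object"]
    # Heuristic: each section (e.g. "Microsoft.Synapse/workspaces/notebooks") maps to an object
    return [f"section {key!r} should be a JSON object"
            for key, value in doc.items() if not isinstance(value, dict)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # utf-8-sig tolerates a byte-order mark left by some editors
    with open(sys.argv[1], encoding="utf-8-sig") as handle:
        problems = check_definition_text(handle.read())
    print("\n".join(problems) or "OK")
    sys.exit(1 if problems else 0)
```

For example: python check_definition.py template-parameters-definition.json (the script name is hypothetical).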

 

Now that we have saved the custom template file, it’s time to generate the new Synapse Workspace ARM templates.

Switch to Synapse Studio and make a minor change in your code to force a new commit, then publish this change to the live service to generate the new ARM template files.

Once the ARM templates are generated, check the TemplateParametersForWorkspace.json file in your publish branch. Its content will now look quite different from the original, because many more properties are exposed as parameters.
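To see exactly what changed, you can compare the parameter names in the parameters file before and after adding the custom template (for example, from two commits of the publish branch). A minimal sketch; the before/after contents and the StorageDataset parameter name are hypothetical.

```python
import json


def parameter_names(parameters_json: str) -> set:
    """Return the set of parameter names in an ARM parameters file."""
    return set(json.loads(parameters_json).get("parameters", {}))


def diff_parameters(before_json: str, after_json: str) -> tuple:
    """Return (added, removed) parameter names, each sorted alphabetically."""
    before, after = parameter_names(before_json), parameter_names(after_json)
    return sorted(after - before), sorted(before - after)


# Hypothetical before/after contents of TemplateParametersForWorkspace.json
BEFORE = '{"parameters": {"workspaceName": {"value": "myworkspace"}}}'
AFTER = ('{"parameters": {"workspaceName": {"value": "myworkspace"}, '
         '"StorageDataset_properties_typeProperties_fileName": {"value": "data.csv"}}}')

if __name__ == "__main__":
    added, removed = diff_parameters(BEFORE, AFTER)
    print("added:", added)
    print("removed:", removed)
```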

 

You may ask: if we published both the Storage Account dataset and the Synapse SQL pool dataset, why are the properties of the latter missing from TemplateParametersForWorkspace.json?

 

Let’s take a look at the template-parameters-definition.json file and check the /datasets section:

 

We are exposing any key-value pairs that are included under the “properties” -> “typeProperties” object.

 

Let’s analyze the JSON code associated with each dataset.

Starting with the Storage Account dataset, there are four key-value pairs under the “typeProperties” object:

 

Looking at the TemplateParametersForWorkspace.json file, we can confirm that these properties are present.

 

Now let's look at the Synapse SQL pool dataset JSON.

Since there are no key-value pairs under the “typeProperties” object, no properties will be exposed in the ARM template to parameterize.
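The effect of a rule that exposes every key under “typeProperties” can be approximated in a few lines. This sketch assumes the generated names follow the <artifact>_<path> pattern seen later in this article (for example NotebookA_properties_metadata_a365ComputeOptions_id) and handles only top-level keys; how Synapse names parameters for nested objects is not modeled. The dataset names and keys are hypothetical.

```python
def expand_type_properties(artifact_name: str, artifact: dict) -> dict:
    """Approximate a rule that exposes every key under properties.typeProperties.

    Returns {parameter name: current value}; an empty or missing typeProperties
    object produces no parameters at all.
    """
    type_props = artifact.get("properties", {}).get("typeProperties") or {}
    return {f"{artifact_name}_properties_typeProperties_{key}": value
            for key, value in type_props.items()}


# Hypothetical dataset definitions mirroring the two cases above
storage_dataset = {"properties": {"typeProperties": {
    "folderPath": "raw", "fileName": "sales.csv",
    "columnDelimiter": ",", "firstRowAsHeader": True}}}
sql_pool_dataset = {"properties": {"typeProperties": {}}}

if __name__ == "__main__":
    print(expand_type_properties("StorageDataset", storage_dataset))  # four parameters
    print(expand_type_properties("SqlPoolDataset", sql_pool_dataset))  # {}
```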

 

Example: Using a custom parameter template to attach a Notebook to a different Spark Pool

 

Microsoft strongly recommends that you prepare your pools before migrating the workspace artifacts, making sure you use the same name for your pools across your environments. In some circumstances, you may need to attach your artifacts to a different pool in your target environment. Using a custom parameter template can help you achieve this goal.

 

In this example, I’m going to show how you can take advantage of custom parameterization in your parameters template to attach a Notebook to a different Spark Pool, when deploying this Notebook to a target environment hosting a Spark pool with a different name.

Here, there are two environments, each hosting a Spark pool with a different name.

 

Figure: Spark pools in the DEV environment and in the UAT environment

Here’s an example of a Notebook in the DEV environment that is attached to a Spark Pool named “mysparkpooldev”.

 

Taking a closer look at the Notebook JSON code, there are multiple properties where this Spark Pool is being referenced.

Now we need to edit the template definition file (template-parameters-definition.json) and find the Microsoft.Synapse/workspaces/notebooks section to expose these additional properties. The code below shows this section with these properties exposed.

"Microsoft.Synapse/workspaces/notebooks": {
    "properties": {
        "bigDataPool": {
            "referenceName": "="
        },
        "metadata": {
            "a365ComputeOptions": {
                "id": "=",
                "name": "=",
                "endpoint": "="
            }
        }
    }
}
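Before publishing, it can help to predict which parameter names this section will produce. The sketch below walks the rule and joins the artifact name with each property path, the pattern that yields NotebookA_properties_metadata_a365ComputeOptions_id later in this article; it handles only plain "=" leaves and assumes the other three names follow the same pattern.

```python
import json

# The notebooks rule from template-parameters-definition.json, as shown above
NOTEBOOK_RULE = json.loads("""{
  "properties": {
    "bigDataPool": {"referenceName": "="},
    "metadata": {"a365ComputeOptions": {"id": "=", "name": "=", "endpoint": "="}}
  }
}""")


def expected_parameter_names(artifact_name: str, rule: dict, path: tuple = ()) -> list:
    """Return <artifact>_<path...> for every leaf of the rule whose value is exactly "="."""
    names = []
    for key, value in rule.items():
        if isinstance(value, dict):
            # Descend into nested objects, extending the property path
            names.extend(expected_parameter_names(artifact_name, value, path + (key,)))
        elif value == "=":
            names.append("_".join((artifact_name,) + path + (key,)))
    return names


if __name__ == "__main__":
    for name in expected_parameter_names("NotebookA", NOTEBOOK_RULE):
        print(name)
```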

 

Don’t forget to select "Commit" to save your changes.

 

Switch to Synapse Studio, make a minor change in your notebook to force a commit, and publish your changes. This generates the ARM templates based on the new template definition file.

Once the template generation is finished, check the TemplateParametersForWorkspace.json in your workspace publish branch to confirm that the new notebook parameters are now being exposed.

 

Once you confirm that the necessary properties are exposed, you can configure the Workspace Deployment task in your Release Pipeline and add these new parameters in the “OverrideParameters” section.

 

As an example, I’m overriding the parameter “NotebookA_properties_metadata_a365ComputeOptions_id” with the resource URI of the target Spark pool:

/subscriptions/<target_workspace_subscription>/resourceGroups/<target_workspace_RG>/providers/Microsoft.Synapse/workspaces/<target_workspace_name>/bigDataPools/<target_spark_pool>
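Assembling the resource URI and the override string by hand is error-prone across environments. The sketch below builds the URI in the format shown above and joins overrides as space-separated -name value pairs. That pair syntax is an assumption based on how ARM-style deployment tasks usually accept overrides, so check it against the task you use. All IDs and names, including the referenceName parameter, are hypothetical.

```python
def spark_pool_resource_id(subscription: str, resource_group: str,
                           workspace: str, pool: str) -> str:
    """Build a Spark pool resource URI in the format shown above."""
    return (f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Synapse/workspaces/{workspace}/bigDataPools/{pool}")


def override_string(overrides: dict) -> str:
    """Join overrides as '-name value' pairs, quoting values that contain spaces."""
    parts = []
    for name, value in overrides.items():
        text = f'"{value}"' if " " in value else value
        parts.append(f"-{name} {text}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical target environment values
    pool_id = spark_pool_resource_id("00000000-0000-0000-0000-000000000000",
                                     "rg-synapse-uat", "myworkspaceuat", "mysparkpooluat")
    print(override_string({
        "NotebookA_properties_metadata_a365ComputeOptions_id": pool_id,
        "NotebookA_properties_bigDataPool_referenceName": "mysparkpooluat",
    }))
```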

 

After executing your Release Pipeline in Azure DevOps, go to your target Synapse Workspace and open the Notebook to confirm that it is now attached to a Spark pool with a different name.

 

Parameter renaming

 

To simplify the parameter overriding operation and code maintenance, you can take advantage of a custom parameters template to provide shorter names to your parameters.

Let’s take this parameter name as example: NotebookA_properties_metadata_a365ComputeOptions_id. Lengthy name, right?

Let's make this parameter name shorter, like “NotebookA_meta_id”.

 

You just need to edit the notebooks section in your template-parameters-definition.json and use the custom parameter syntax described in the Microsoft documentation.

 

Use the format <action>:<name>:<stype>

<action> -> we use the “=” character to keep the current value as the default value for the parameter.

<name> -> we use the “-” character (because we don’t want to keep the default name) followed by the new name.

<stype> -> we don’t want to change the default type, so we omit this value (the parameter type defaults to string).
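The three parts of the format can be illustrated with a small resolver. This is a sketch of one reading of the rules above, not Synapse's parser. In this reading, a blank name keeps the generated name, and a leading “-” keeps the artifact name as a prefix and appends the new suffix, which matches the shortening example in the Microsoft documentation. A missing type defaults to string. The spec "=:-meta_id" is a guess at an entry that yields NotebookA_meta_id, and treating any other name literally is an assumption.

```python
def resolve_parameter(artifact_name: str, property_path: list, spec: str) -> tuple:
    """Return (action, parameter name, type) for an "<action>:<name>:<stype>" spec."""
    action, _, rest = spec.partition(":")
    name, _, stype = rest.partition(":")
    if not name:
        # Blank name: keep the generated <artifact>_<path> name
        param_name = "_".join([artifact_name] + property_path)
    elif name.startswith("-"):
        # Leading "-": shortened name, artifact prefix plus the new suffix
        param_name = f"{artifact_name}_{name[1:]}"
    else:
        # Assumption: any other name is used as-is
        param_name = name
    # An omitted type falls back to string
    return action, param_name, stype or "string"


if __name__ == "__main__":
    path = ["properties", "metadata", "a365ComputeOptions", "id"]
    print(resolve_parameter("NotebookA", path, "="))
    print(resolve_parameter("NotebookA", path, "=:-meta_id"))
```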

 

Here’s what the notebooks section looks like:

 

Now if you switch back to Synapse Studio and publish any pending changes from your collaboration branch to generate the new ARM templates, you will see that this Notebook parameter has been renamed from “NotebookA_properties_metadata_a365ComputeOptions_id” to “NotebookA_meta_id”.

 

Known Limitations

 

At the time of this writing, Synapse fails to generate the ARM templates if either of them exceeds the 20 MB limit.

If you are hitting this limitation and the ARM templates fail to generate during the publish operation, consider whether using a custom parameters template to give your parameters shorter names reduces the ARM file size enough for template generation to succeed.
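Whether renaming is worth it comes down to simple arithmetic: each occurrence of a parameter name shrinks by the difference in length between the old and new names. The sketch below checks file sizes against the limit and estimates the saving for one rename. Reading “20 MB” as 20 × 1024 × 1024 bytes is an assumption, and the estimate counts characters, which equals bytes only for ASCII names.

```python
import os
import tempfile

# Assumption: the 20 MB limit interpreted as 20 MiB
LIMIT_BYTES = 20 * 1024 * 1024


def over_limit(paths, limit: int = LIMIT_BYTES) -> dict:
    """Map each template path to True if its size exceeds the limit."""
    return {path: os.path.getsize(path) > limit for path in paths}


def renaming_savings(template_text: str, old_name: str, new_name: str) -> int:
    """Estimate characters saved if every occurrence of old_name becomes new_name."""
    return template_text.count(old_name) * (len(old_name) - len(new_name))


if __name__ == "__main__":
    old, new = "NotebookA_properties_metadata_a365ComputeOptions_id", "NotebookA_meta_id"
    # Hypothetical snippet standing in for TemplateForWorkspace.json content
    text = f'"id": "[parameters(\'{old}\')]", "{old}": {{"type": "string"}}'
    print(renaming_savings(text, old, new))  # 2 occurrences x 34 characters = 68
    with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as handle:
        handle.write(text.encode("utf-8"))
    print(over_limit([handle.name]))
    os.remove(handle.name)
```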

 

Conclusion

 

When using automated CI/CD in Azure Synapse Analytics, users can take advantage of custom parameters to extend the capabilities of the default Workspace template, allowing the exposure and the overriding of any artifact property that is not parameterized by default.

 

Source control in Synapse Studio - Azure Synapse Analytics | Microsoft Docs

Learn how to configure source control in Synapse Studio

 

Create custom parameters in the workspace template

Learn how to use custom parameters in Synapse CICD

 

Best practices for CI/CD in Azure Synapse Analytics

If you're using Git integration with your Azure Synapse workspace we recommend these best practices

 

Last update: Jun 20 2022 03:03 PM