Welcome to the April 2022 update for Azure Synapse Analytics! This month, you’ll find a highlight of the Spark 3.2 Public Preview, the new Dataverse connector added to Synapse data flows, a revamped exploration experience in database templates, and how to clone a lake database. Other new features are in SQL, Spark, data integration, and developer experience.
Check out our companion video below!
Being able to restore across subscriptions has long been one of your top requested features. Previously, it took many undocumented steps to restore a dedicated SQL pool to another subscription. You often align subscriptions with the internal cost center that should be financially responsible for a resource.
Now, with the PowerShell Az.Sql module 3.8 update, the Restore-AzSqlDatabase cmdlet can be used for cross-subscription restore. This feature unlocks billing separation, as well as other scenarios like dev/test restores. There is support for both Azure logical SQL Servers with dedicated SQL pools, as well as the dedicated pools in Azure Synapse workspaces.
To learn more and for full end-to-end PowerShell examples, read Restore a dedicated SQL pool (formerly SQL DW) to a different subscription.
Sometimes accidents happen. Occasionally, you dropped a server or workspace through automation tasks or in the portal before you realized that there was an attached SQL pool that had data. When a server or workspace was dropped, a support ticket needed to be filed with Microsoft to recover the SQL pool.
With the PowerShell Restore cmdlets in Az.Sql and Az.Synapse modules, you can now restore from a deleted server or workspace without filing a support ticket. The Source Database Id must be in the Microsoft.Sql format, as shown in the example below. These cmdlets will work for both Restore-AzSqlDatabase as well as Restore-AzSynapseSqlPool.
```PowerShell # construct the resource ID of the sql pool you wish to recover. The format required Microsoft.Sql. This includes the approximate date time the server was dropped. $SourceDatabaseID = "/subscriptions/"+$SubscriptionID+"/resourceGroups/"+$ResourceGroupName+"/providers/Microsoft.Sql/servers/"+$WorkspaceName+"/databases/"+$DatabaseName # Restore to the target workspace with the source SQL pool. $RestoredDatabase = Restore-AzSynapseSqlPool -FromDroppedSqlPool -DeletionDate $DroppedDateTime -TargetSqlPoolName $TargetDatabaseName -ResourceGroupName $TargetResourceGroupName -WorkspaceName $TargetWorkspaceName -ResourceId $SourceDatabaseID ```
Database templates make Azure Synapse Analytics industry-aware and help you standardize your data lakes by adding a business-context rich metadata layer for data in the lake. Database templates provide industry-specific schema for pre-defined business areas that can be used as blueprints or accelerators for creating the foundation of an enterprise data lake or data platform. Using the no-code database designer, these templates can be easily customized and/or extended to create a lake database.
Several quality-of-life improvements have been shipped as a part of GA. Below are some of the key updates:
Based on popular customer feedback, we’ve made significant improvements to our exploration experience when creating a lake database using an industry template. Previously, customers did not have an easy way to find all the relationships to a given table. The option to 'Add related tables' added hundreds of tables to the canvas that were related to a given table, making it difficult to navigate and remove the unnecessary tables from the canvas (see image below).
We’ve now added the relationship toggle, which enables users to view all relationships of a given table in a list view and select specific tables and relationships to be included in the canvas. This significantly improves the user experience and makes it easy to search for the related tables that matter the most and add them to your lake database. You can also undo a single last action if you've added all related tables by mistake without having to start over from the beginning. Users can also add or remove an entire business area on the left with a single click instead of having to go through and click on every single table within the business area. Each of these features helps to simplify the exploration experience and allows customers to easily create a subset of the database template suited for their business needs.
To learn more, read Quickstart: Create a new Lake database leveraging database templates
We’ve also added the option to create an exact copy of a lake database. This unlocks additional opportunities to manage new versions of databases or support schemas that evolve in discrete steps. You can quickly clone a database using the action menu available on the lake database. This feature is particularly useful when you have defined a basic enterprise lake database and require each of the departments such as Finance or Marketing to start with this enterprise definition and extend the lake databases to include tables relevant to their business functions. Each of these teams can now clone the enterprise lake database and then begin adding additional tables as needed. Similarly, you can also clone individual tables within a lake database, if required.
To clone a lake database, select the ellipse next to the database you want to clone and select ‘Clone’.
To learn more, read How-to: Clone a lake database
Lake databases sit on top of data that is in the lake and this data can live in nested folders that don’t fit into clean partition patterns. Previously, querying lake databases required that your data exists in a simple directory structure that you could browse using the folder icon without the ability to manually specify directory structure or use wildcard characters.
To allow support for custom or highly nested folder structures in your data lake, you can now leverage the pencil icon to freely enter the input folder information. You can specify wildcard paths to ensure Synapse can traverse your directories correctly and bring back query results without any issues. Being able to specify wildcard characters enables you to traverse complex directory structures without having to redefine the hierarchies present in your data lake.
To learn more, read How-to: Modify a datalake
We are excited to announce the preview availability of Apache Spark™ 3.2 on Synapse Analytics. This new version incorporates user-requested enhancements and resolves 1,700+ Jira tickets.
Highlighted improvements include:
Below is an extended summary of key new features:
Assigning parameters dynamically based on variables, metadata, or specifying Pipeline specific parameters has been one of your top feature requests. Now, with the release of parameterization for the Spark job definition activity, you can do just that. With this new feature, you can directly choose the Spark job definition and update the main definition and reference files. You can also select a different Spark pool and specify resource settings based on the specific pipeline needs. You now have more control over the Spark activity parameters and have more flexibility to change the value of parameters on the fly, either through static value or dynamic content.
For more details, read Transform data using Apache Spark job definition
We often receive customer requests to access the snapshot of the Notebook when there is a Pipeline Notebook run failure or there is a long-running Notebook job. With the release of the Synapse Notebook snapshot feature, you can now view the snapshot of the Notebook activity run with the original Notebook code, the cell output, and the input parameters. You can also access the snapshot of the referenced Notebook from the referencing Notebook cell output if you refer to other Notebooks through Spark utils.
This new feature helps you to better troubleshoot and conduct performance tuning by allowing you to access the snapshot of the Notebook activity at the time of the Notebook run.
Since the GA of Synapse, customers have asked for a fine-grained RBAC (role-based access control) role that allows a user persona to monitor the execution of Synapse Pipelines and Spark applications without having the ability to run or cancel the execution of these applications.
Now, customers can assign the Synapse Monitoring Operator role to such monitoring personas. This allows organizations to stay compliant while having flexibility in the delegation of tasks to individuals or teams.
Learn more by reading Synapse RBAC Roles.
Microsoft has added Dataverse as a source and sink connector to Synapse Data Flows so that you can now build low-code data transformation ETL jobs in Synapse directly accessing your Dataverse environment. Dataverse is a Microsoft Power Platform service that lets you securely store and manage data that's used by business applications. Data within Dataverse is stored within a set of tables and has had Linked Service and Dataset support in Synapse Analytics for use in pipelines and other areas of the Synapse workspace.
For more details on how to use this new connector, read Mapping data flow properties.
We heard from you that a 1-minute timeout for Web activity was not long enough, especially in cases of synchronous APIs. Now, with the response timeout property 'httpRequestTimeout', you can define timeout for the HTTP request up to 10 minutes.
The Web activity in Synapse Pipelines is helpful when invoking an external endpoint and helps provide extensibility when building data pipelines that span across different services. As default action, the web activity will timeout within 1 minute if it does not receive any response. Web activities work exceptionally well with APIs that follow the asynchronous request-reply pattern, a suggested approach for building scalable web APIs/services. In such scenarios, it automatically polls for the status endpoint (when it receives HTTP 202) until the API returns HTTP 200. It behaves synchronously and waits until the response is HTTP 200 before kicking off the downstream activities.
Most Azure-based APIs are async request-reply pattern and work with the existing web activity behavior. But in case you have API endpoints that do not follow the above pattern and expect the caller to wait for a response until the processing/actions are done, the response timeout property 'httpRequestTimeout' in the Web activity will now let you provide larger timeouts for such synchronous APIs.
Learn more by reading Web activity response timeout improvements.
Previously, if you wanted to reference a notebook in another notebook, you could only reference published or committed content. Now, when using %run notebooks, you can enable ‘unpublished notebook reference’ which will allow you to reference unpublished notebooks. When enabled, notebook run will fetch the current contents in the notebook web cache, meaning the changes in your notebook editor can be referenced immediately by other notebooks without having to be published (Live mode) or committed (Git mode).
Referencing unpublished notebooks is very helpful when you want to debug “locally” and you don’t want to commit or publish changes in the referenced notebook. By leveraging this approach, you can easily avoid common libraries getting polluted during the development or debugging process. Referencing unpublished notebooks also makes debugging much easier and streamlined, saving you time.
You can enable Reference unpublished notebook from Properties panel:
To learn more, read Reference unpublished notebook.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.