The Dremio Open Lakehouse Platform and Microsoft Provide a Full-Stack Solution for Cloud Data Analytics
Cloud data lakes represent the primary storage destination for a growing volume and variety of data. For Microsoft customers, Azure Data Lake Storage (ADLS) provides a flexible, scalable, cost-effective, secure, cloud-native analytics file system for a variety of data sources. ADLS powers thousands of Microsoft and third-party applications with data.
The challenge for many organizations is making that data available for Business Intelligence (BI) and reporting. In this article, I’ll share how the Dremio Open Lakehouse Platform simplifies data architectures, accelerates access to insights on ADLS, and enables ad hoc analysis and exploration with Power BI.
Data Architectures Are Complex and Brittle
Traditional BI & reporting workloads rely on proprietary data warehouses to manage and run SQL queries on structured data. Companies made significant investments in this architecture, and it made sense when the primary data sources for BI & reporting were structured data from business systems housed in the data center. As the volume of data increased, businesses turned to the data lake as a cost-effective repository for a variety of data sources. The data lake never replaced the data warehouse as the de facto solution for BI & reporting workloads, though. When data consumers needed access to an important source of data in the data lake, data teams built Extract, Transform, & Load (ETL) pipelines to move that data into the data warehouse in a proprietary format.
Over the next several years, the most important sources of customer and operational data will shift significantly from structured data in business systems housed in the data center to semi-structured and unstructured data from a variety of sources outside of the data center. Sensor data, social media, and the Internet of Things (IoT) will represent the largest and fastest growing sources of data for businesses. A consequence of this shift is that data consumers will increasingly require access to data residing in the data lake for BI & reporting.
This scenario creates a bottleneck for data consumers, who rely on data engineering teams to build manual, ad hoc ETL pipelines to access data. Further, to get around the performance limitations of many proprietary data warehouse platforms, data teams typically construct data copies in the form of BI cubes and extracts. An enterprise organization could have hundreds, or even thousands, of data copies.
Dremio Provides Direct Access to the Data Lake
Dremio is a SQL query engine, a query accelerator, and a semantic layer providing ad hoc and exploratory analytics capabilities directly on ADLS. Dremio eliminates the need to move or copy data from the data lake into proprietary formats, and the semantic layer provides self-service capabilities, so data consumers are more self-sufficient. Dremio dramatically reduces the effort required to provide data consumers with access to the newest sources of customer and operational data.
Dremio features a direct connector to Power BI, so data consumers can select Dremio as a data source and immediately access the data lake. Dremio’s integration with Azure Active Directory (AAD) provides fine-grained access control for a seamless and more secure user experience.
Together, Dremio, ADLS, and Power BI create a simple, yet powerful, architecture that gets insights into the hands of data consumers fast. It provides a number of benefits:
It reduces the management and maintenance effort associated with ETL pipelines and data copies. Data engineers and data architects can spend their time innovating with data, instead of responding to a backlog of data requests.
Data teams experience improved data governance and a reduced risk position with the no-copy architecture.
Data remains in place and in an open table format, such as Apache Iceberg, so it stays accessible to multiple engines and to a variety of different tools. Microsoft’s Raji Easwaran shared in her Subsurface session that this was the top lesson Microsoft learned while implementing its own data lake.
Dremio can query data in ADLS and across disparate data stores, such as relational and NoSQL databases, as well as on-premises object storage and HDFS, so data consumers have access to a unified view of their data as they transition to the cloud. Customers like NCR accelerated their cloud adoption by building a semantic layer that bridged the gap between their legacy and modern data architectures.
Technical and non-technical data consumers are more self-sufficient. Dremio’s semantic layer enables SQL users to create and leverage virtual datasets on their own. Power BI and Dremio give data consumers a powerful self-service platform to explore and discover actionable insights.
Dremio’s accelerated query performance at any data scale, combined with direct access to the data lake, improves the Power BI user experience. That, in turn, expands adoption and increases the return on the customer’s investment in the platform.
Customers often experience a significantly reduced Total Cost of Ownership compared to traditional data warehouse architectures.
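To make the semantic-layer benefit above more concrete, here is a minimal sketch of the idea behind a virtual dataset: business logic is defined once as a view over data that stays in place, and every consumer queries the view instead of a copy. This is an illustration only. It uses Python's built-in SQLite as a stand-in engine rather than Dremio, and the table, view, and KPI names (`shipments`, `fill_rate_by_site`) are hypothetical.

```python
import sqlite3

# Stand-in engine: in-memory SQLite, NOT Dremio. In the architecture described
# here, the physical table would live in ADLS and the view would be a Dremio
# virtual dataset. All names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (site TEXT, units_shipped INTEGER, units_ordered INTEGER);
    INSERT INTO shipments VALUES
        ('Plant A', 90, 100),
        ('Plant A', 50, 50),
        ('Plant B', 30, 60);

    -- The "virtual dataset": the KPI is defined once and no data is copied.
    CREATE VIEW fill_rate_by_site AS
    SELECT site,
           SUM(units_shipped) * 1.0 / SUM(units_ordered) AS fill_rate
    FROM shipments
    GROUP BY site;
""")


def fill_rates(connection):
    """Return {site: fill_rate} by querying the shared view."""
    rows = connection.execute(
        "SELECT site, fill_rate FROM fill_rate_by_site ORDER BY site"
    ).fetchall()
    return dict(rows)


rates = fill_rates(conn)
for site, rate in rates.items():
    print(f"{site}: {rate:.2f}")
```

Because every consumer reads `fill_rate_by_site`, a change to the KPI definition happens in one place instead of in each BI tool or extract.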
A Global Consumer Products Company Improved Supply Chain Analytics
A multinational consumer products company with 150 manufacturing sites in 30 countries needed accurate and timely insights into its supply chain and consumer demand across its sprawling operations. The company had large volumes of data, but the data was siloed and fragmented, and analysis was inefficient. Hundreds of analysts across the company were using different BI tools, each recreating the same business logic and Key Performance Indicators (KPIs).
The company launched a strategic initiative to centralize data access and data analytics across the company with a data lake. It used Azure Data Factory to ingest data from a variety of sources into ADLS. Dremio served as a query accelerator and semantic layer, providing fast, centralized access to the data in the cloud, as well as to data in a variety of relational databases that the company had yet to migrate. Dremio connected Power BI users directly to the data stored in ADLS and in those other repositories.
The company gained wider visibility into its supply chain, including additional factors like weather patterns and traffic information. Dremio accelerated adoption of the company’s modern data platform by allowing users of other relational databases to join that data with data in ADLS. Ultimately, the solution gave data consumers across the business access to a consistent and unified view of their data, eliminated the effort and costs associated with manually duplicating business logic across the organization, and accelerated access to analytic insights.
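The cross-source join described in this case study can be sketched in a few lines. The example below is a rough illustration under stated assumptions, not the company's actual setup: two separate in-memory SQLite databases stand in for ADLS and for an unmigrated relational database, and the table and column names (`weather`, `deliveries`, `storm_days`, `late_deliveries`) are hypothetical.

```python
import sqlite3

# Two independent in-memory SQLite databases stand in for two source systems.
# "main" plays the role of the data lake; "legacy" plays the role of a
# relational database that has not been migrated. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS legacy")

conn.executescript("""
    CREATE TABLE main.weather (site TEXT, storm_days INTEGER);
    INSERT INTO main.weather VALUES ('Plant A', 1), ('Plant B', 6);

    CREATE TABLE legacy.deliveries (site TEXT, late_deliveries INTEGER);
    INSERT INTO legacy.deliveries VALUES ('Plant A', 2), ('Plant B', 9);
""")

# One query joins both sources in place; neither table is copied into the other.
joined = conn.execute("""
    SELECT d.site, d.late_deliveries, w.storm_days
    FROM legacy.deliveries AS d
    JOIN main.weather AS w ON w.site = d.site
    ORDER BY d.site
""").fetchall()

for site, late, storms in joined:
    print(f"{site}: {late} late deliveries, {storms} storm days")
```

The point of the sketch is the shape of the query: consumers address both sources through one engine and one SQL statement, which is the role the article attributes to Dremio between ADLS and the remaining relational databases.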
A Powerful Modern Data Stack for BI & Reporting
As data sources have expanded, data teams have essentially relied on a series of workarounds in order to provide access to that data for insights. If they haven’t already, those workarounds will become a burden that will limit an organization’s ability to address analytics use cases with insights derived from the newest and fastest growing sources of data.
Dremio and Microsoft provide a data architecture that simplifies and accelerates access to data for analytics in ADLS. It leverages open technologies, ANSI-SQL functionality, and tight integration between the storage layer, the query engine, and the presentation layer to empower data consumers to explore and discover insights quickly and easily.
Get Started with Dremio Today
Register for Dremio Community Edition on the Azure Marketplace. You can start exploring ADLS with Dremio in minutes and connect your Power BI instance directly to Dremio as a data source.