Data intelligence at the speed of light: Microsoft Fabric – Simple Talk (2023)

This article is based on exciting information just released by Microsoft at the Build conference on 23 May 2023.

What we have today

When Synapse Analytics was created, technical sessions inspired me with some comparisons and explanations, and I reproduced them in my own technical sessions and writing.

Synapse was created at the request of many Microsoft customers. They required the ability to use a single tool for the entire data intelligence platform: collect data, store it, process it, query it, apply data science and create reports.

Synapse is a real Swiss army knife: we can ingest data using Synapse Data Factory; query and process data using different methods, the serverless SQL pool or the dedicated SQL pool; and apply data science using Spark pools and additional ML frameworks. Finally, Synapse is also connected to Power BI, so we can use some shortcuts to create visualizations.

These unique features have always been great, far better than the isolated tools we had before. But in light of Microsoft Fabric, we can see the missing points in Synapse:

  • The integration of the various tools was limited. It was the best at the time, but the integration was limited compared to Microsoft Fabric.
  • We still have to choose between different infrastructure resources like serverless SQL pool and dedicated SQL pool instead of sharing all data.
  • We still need to make infrastructure decisions, specifically the size of the dedicated SQL pool. Decisions were often largely based on guesswork.
  • It does not completely isolate storage and processing. When you use a dedicated SQL pool, processing and storage are linked.

Synapse was considered so advanced that few people noticed these problems, and those who did rarely noticed all of them. Microsoft Fabric, the new product announced during Build, exposes these gaps and more.

What is Microsoft Fabric?

Like Synapse, Microsoft Fabric brings together all the services required for a data intelligence environment, but it is highly integrated and built in a way that requires far less technical effort to implement.

The image below shows the services included in Microsoft Fabric.

[Image: the services included in Microsoft Fabric]

In the following sections, I will introduce these new concepts.

Data Intelligence and Software as a Service (SaaS)

Microsoft Fabric arrives breaking standards and establishing new ones. In the cloud environment, we are used to classifying services as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and SaaS. Synapse is classified as PaaS, while Microsoft Fabric is officially classified as SaaS. The following diagram shows, in general terms, what the provider manages at each service level.

[Diagram: what is managed for you at each service level: IaaS, PaaS and SaaS]

Undoubtedly, the level of managed services provided by Microsoft Fabric is far above Synapse. Many tasks in Synapse would require careful configuration, but in Microsoft Fabric there is a kind of automatic setting that works out of the box.

Usually, when we think of a SaaS service, we think of an end-user application like Office 365 or many other applications where the user simply uses them. It is a concept not typically associated with software used to collect, transform, model and generate intelligent results from data.

This is Microsoft Fabric, software that pushes the boundaries of what we know about cloud software and services.

Microsoft Fabric is not in the Azure environment; it is in the Power BI portal. This results in a very different environment from the one we have in Synapse.

But the new environment looks like something we know well: the Power BI portal. The environment is designed around different experiences: you choose an experience according to the type of task you want to perform, and the environment adapts to the usual tasks associated with that experience.

The following experiences are available:

  • Power BI: The typical Power BI environment and tools.
  • Data Factory: This persona allows you to create and manage dataflows and data pipelines, as in Data Factory.
  • Data Activator: This is a brand new feature that allows you to create triggers on your visuals in Power BI.
  • Data Engineering: This experience covers several tasks. It is responsible for creating and managing lakehouses, but also allows you to create notebooks and orchestrate them with pipelines.
  • Data Science: With this experience, you can apply Azure ML techniques to your data.
  • Data Warehouse: This experience allows you to model your data as you would in a SQL database and use SQL over your data. It's hard to compare this to anything else. We can create many star schemas over our data lake, and these models will be reused by our Power BI datasets, making it easier to have a central model for all our reports.
  • Real-Time Analytics: This persona is somewhat comparable to Power BI streaming dataflows and allows for real-time data ingestion.

Switching between ingestion (Data Factory), processing (Data Engineering), modeling and SQL (Data Warehouse), and more is just a matter of choosing the right experience to work on the same data.

Switching personas is a way to focus the environment on the type of activity you want to perform. The objects themselves are still created in a Power BI workspace.

In addition, the main new objects, the lakehouse and the data warehouse, have their own way of switching work between the two.

Microsoft Fabric and OneLake

OneLake is the core feature of Microsoft Fabric. It exposes a data lake as a service, so we can build our data lake without having to provision it first. It is the central data store for all data in Microsoft Fabric, and it is provisioned for the tenant the first time a Microsoft Fabric artifact is created.

The name OneLake also fits the shortcut feature in OneLake very well: we can create shortcuts to external files and access them directly, as if they were in our own lake.

The image below shows how OneLake relates to the other Microsoft Fabric features.

[Image: how OneLake relates to the other Microsoft Fabric features]

OneLake, Lakehouse and Workspaces

The lakehouse is one of the core objects that we can create in Microsoft Fabric. We create the lakehouse using the Data Engineering persona, and the lakehouse is contained in a workspace, the same kind of workspace we know from Power BI.

Once we have created a lakehouse, we can use Data Factory to load data into the Files area or the Tables area.

The Files area is the unmanaged area of the lake, which accepts any type of file. This is where we place raw files for further processing. The Tables area, however, only contains data in Delta format.

The lakehouse optimizes the Tables area with a special structure that can make a regular Delta table up to 10x faster, while maintaining full compliance with the Delta format.
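
To make the split between the two areas concrete, here is a minimal sketch of how paths into a lakehouse could be composed. It assumes OneLake's ADLS-style URI layout (`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...`), which is not described in this article. The workspace and lakehouse names and the helper function are hypothetical.

```python
from posixpath import join

# Assumed OneLake endpoint; not taken from the article.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, lakehouse: str, area: str, *parts: str) -> str:
    """Build an abfss:// URI into the Files or Tables area of a lakehouse."""
    if area not in ("Files", "Tables"):
        raise ValueError("area must be 'Files' or 'Tables'")
    relative = join(f"{lakehouse}.Lakehouse", area, *parts)
    return f"abfss://{workspace}@{ONELAKE_HOST}/{relative}"


# Hypothetical names, for illustration only.
raw_csv = onelake_uri("Sales", "SalesLake", "Files", "raw", "orders.csv")
orders_table = onelake_uri("Sales", "SalesLake", "Tables", "orders")
print(raw_csv)        # unmanaged area: any file type
print(orders_table)   # managed area: Delta tables only
```

In a Fabric notebook, a URI such as `orders_table` could then be handed to Spark, for example with `spark.read.format("delta").load(orders_table)`. The helper does no escaping, so names containing spaces or special characters would need extra care.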

However, the lakehouse is not the largest data structure we have. This position is reserved for OneLake. It is an invisible, automatically provisioned data store that contains all data for data warehouses, lakehouses, datasets and more.

This allows us to build an enterprise architecture that uses workspaces as department lakes. Lakehouse shortcuts make it possible to share data across departments. This simultaneously ensures domain ownership of the data and a relationship between domains.
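
The idea of sharing data between department lakes by reference can be sketched with a toy model. This is not the OneLake API: the `Lake` class, its methods and the department names are all hypothetical. The sketch only shows that the consuming department stores a pointer to data that stays with its owner, instead of a copy.

```python
from dataclasses import dataclass, field


@dataclass
class Lake:
    """Toy stand-in for one department's lakehouse (not a real Fabric object)."""
    name: str
    files: dict = field(default_factory=dict)      # path -> data owned by this lake
    shortcuts: dict = field(default_factory=dict)  # path -> (owning lake, path there)

    def add_shortcut(self, path: str, target: "Lake", target_path: str) -> None:
        # Only a reference is stored; the data stays with its owner.
        self.shortcuts[path] = (target, target_path)

    def read(self, path: str):
        if path in self.shortcuts:
            target, target_path = self.shortcuts[path]
            return target.read(target_path)  # follow the pointer to the owner
        return self.files[path]


sales = Lake("Sales", files={"Tables/orders": ["order-1", "order-2"]})
finance = Lake("Finance")
finance.add_shortcut("Tables/sales_orders", sales, "Tables/orders")

# Sales keeps ownership; Finance sees the change because nothing was copied.
sales.files["Tables/orders"].append("order-3")
print(finance.read("Tables/sales_orders"))
```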

This is just the starting point for an enterprise architecture: OneLake ensures consistent control and management of the data. Data lineage, privacy, certification, catalog integration and more are unified features that OneLake brings to every lake house created in an organization.

All these features are managed by the Power BI environment, ensuring an enterprise governance environment for the company.

OneLake and processing isolation

When you use Synapse, the Synapse Dedicated Pool stores and processes the data. This is a scenario where storage and processing are linked.

In OneLake, storage and processing are independent of each other. The same data in OneLake can be processed using many different methods, which ensures independence in storage and processing.

Let's analyze the different methods we have to process the data in OneLake.

Spark notebooks

Every workspace enabled for Microsoft Fabric has a feature called Live Pool. The Live Pool allows notebooks to run without prior Spark cluster configuration.

As soon as the first block of code runs in a notebook, the Spark Live Pool starts within seconds and performs the execution.

We can process the OneLake data using Spark notebooks, with the benefit of Live Pools.

Data Factory

Data Factory objects such as pipelines and dataflows in the Power BI environment are the beginning of a union of ETL tools: we have pipelines and dataflows from Data Factory and dataflows from Power BI.

These two are now united and work together under Microsoft Fabric. We have an additional advantage: Dataflows Gen2.

Dataflows Gen2 are a step up from the Power BI dataflows or wrangling dataflows we are used to. One of the most interesting features, in my opinion, is the ability to define the destination of a transformation, something we were never able to do with Power BI dataflows (or wrangling dataflows).

SQL queries

Microsoft Fabric provides two different methods of accessing data using SQL as if the data were in a regular database.

One of the methods is to use the Lakehouse object. This object provides us with a SQL endpoint that allows us to model the tables and query the data using SQL.

The second method uses a data warehouse object that provides a complete SQL processing environment for the data in OneLake.

The table below shows all the differences between the Lakehouse SQL Endpoint and the data warehouse. Some of these differences are available in the documentation. Some of these are my personal conclusions.

Microsoft Fabric offering: Warehouse compared with the Lakehouse SQL endpoint

Processing engine
  • SQL MPP – Polaris

Optimization engine
  • Warehouse: Vertipaq
  • Lakehouse SQL endpoint: Vertipaq for tables

Storage layer
  • Open data format - Delta

Primary capabilities
  • Warehouse: ACID compliant, full data warehousing with transaction support in T-SQL
  • Lakehouse SQL endpoint: read-only, system-generated SQL endpoint for the lakehouse, for T-SQL querying and serving. Only supports queries and views over lakehouse Delta tables

Recommended use case, Warehouse
  • Data warehousing for enterprise use
  • Data warehousing to support departmental, business unit or self-service use
  • Structured data analysis in T-SQL with tables, views, procedures and functions, and advanced SQL support for BI

Recommended use case, Lakehouse SQL endpoint
  • Exploring and querying Delta tables from the lakehouse
  • Staging data and archival zone for analysis
  • Medallion architecture with zones for bronze, silver and gold analysis
  • Pairing with the warehouse for enterprise analytics use

Development experience, Warehouse
  • Warehouse editor with full support for T-SQL data ingestion, modeling, development and querying, plus UI experiences for data ingestion, modeling and querying
  • Read/write support for 1st and 3rd party tooling

Development experience, Lakehouse SQL endpoint
  • Lakehouse SQL endpoint with limited T-SQL support for views, table-valued functions and SQL queries
  • UI experiences for modeling and querying
  • Limited T-SQL support for 1st and 3rd party tooling

T-SQL capabilities
  • Warehouse: full DQL, DML and DDL T-SQL support. Full transaction support
  • Lakehouse SQL endpoint: full DQL, no DML, limited DDL T-SQL support such as SQL views and TVFs

Data loading
  • Warehouse: SQL, pipelines, dataflows
  • Lakehouse SQL endpoint: Spark, pipelines, dataflows, shortcuts

Delta table support
  • Warehouse: reads and writes Delta tables
  • Lakehouse SQL endpoint: reads Delta tables
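
The T-SQL part of this comparison can be summarized as a small lookup. The sketch below only illustrates the comparison above and does not describe any Fabric API. Classifying a statement by its first keyword is a deliberately rough assumption, the "limited DDL" of the SQL endpoint is simplified to `CREATE VIEW` and `CREATE FUNCTION`, and the function and offering names are hypothetical.

```python
DQL = {"SELECT"}
DML = {"INSERT", "UPDATE", "DELETE"}
DDL = {"CREATE", "ALTER", "DROP"}


def statement_kind(sql: str) -> str:
    """Classify a T-SQL statement by its first keyword (rough heuristic)."""
    words = sql.split()
    if not words:
        return "OTHER"
    keyword = words[0].upper()
    for kind, keywords in (("DQL", DQL), ("DML", DML), ("DDL", DDL)):
        if keyword in keywords:
            return kind
    return "OTHER"


def is_supported(offering: str, sql: str) -> bool:
    """Return True if the statement fits the support level listed for the offering."""
    kind = statement_kind(sql)
    if offering == "warehouse":
        # Full DQL, DML and DDL support.
        return kind in ("DQL", "DML", "DDL")
    if offering == "lakehouse_sql_endpoint":
        if kind == "DQL":
            return True  # full DQL
        if kind == "DDL":
            # Limited DDL, simplified here to views and table-valued functions.
            return sql.upper().split()[:2] in (["CREATE", "VIEW"], ["CREATE", "FUNCTION"])
        return False  # no DML
    raise ValueError(f"unknown offering: {offering}")


print(is_supported("warehouse", "INSERT INTO sales VALUES (1)"))
print(is_supported("lakehouse_sql_endpoint", "INSERT INTO sales VALUES (1)"))
print(is_supported("lakehouse_sql_endpoint", "CREATE VIEW v AS SELECT 1 AS x"))
```

A check like this could help decide where a script belongs: anything that writes data goes to the warehouse (or to Spark on the lakehouse side), while read-only queries and views can target either.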

Power BI

Microsoft Fabric is closely tied to the Power BI environment. At many points in our work, whether from a lakehouse or a warehouse, we can start creating a Power BI report.

The access method is the best part: Power BI has a new method for accessing OneLake, called Direct Lake.

Direct Lake is a new connection method between Power BI datasets and OneLake.

If we use DirectQuery, every refresh of a visual requires a new query against the source, which slows things down. On the other hand, if we use Import, the data is stored in memory and performance is better, but when the source data is updated, a refresh of the dataset is required. Updates are not immediately visible in datasets and reports.

The Direct Lake connection combines the best of both scenarios: it offers the performance of Import mode, keeping the data in memory, and the real-time data updates provided by DirectQuery.
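
The trade-off between the three modes can be illustrated with a toy model. This is a conceptual sketch only and says nothing about how Power BI actually implements any of these modes. The classes, the version counter and the cheap version check are assumptions made for illustration.

```python
class Source:
    """Toy table in the lake: every write bumps a version number."""
    def __init__(self):
        self.rows, self.version, self.full_reads = [], 0, 0

    def write(self, row):
        self.rows.append(row)
        self.version += 1

    def read_all(self):
        self.full_reads += 1  # count the expensive reads
        return list(self.rows)


class DirectQueryMode:
    """Always current, but every query goes back to the source."""
    def __init__(self, source):
        self.source = source

    def query(self):
        return self.source.read_all()


class ImportMode:
    """Fast in-memory answers, but stale until refresh() is called."""
    def __init__(self, source):
        self.source = source
        self.cache = source.read_all()

    def refresh(self):
        self.cache = self.source.read_all()

    def query(self):
        return self.cache


class DirectLakeLike:
    """Serves from memory, reloading only when the table version changed."""
    def __init__(self, source):
        self.source = source
        self.cache, self.cached_version = None, None

    def query(self):
        if self.cached_version != self.source.version:  # cheap metadata check
            self.cache = self.source.read_all()
            self.cached_version = self.source.version
        return self.cache


table = Source()
table.write("a")
imported, lake = ImportMode(table), DirectLakeLike(table)
table.write("b")
print(imported.query())  # snapshot taken at import time
print(lake.query())      # picks up the new row without an explicit refresh
lake.query()             # answered from memory, no further full read
```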

What about Azure Synapse and Data Factory?

Customers using Data Factory and Synapse Dedicated Pool can also expect easy ways to migrate to Microsoft Fabric. Microsoft is focused on making the transition as smooth as possible.

Data Factory users even benefit from Dataflows Gen2, which Azure Data Factory does not support. So you benefit from developing dataflows and pipelines in Microsoft Fabric and have an easy migration path from one to the other.

Conclusion

Microsoft Fabric seems to be the beginning of a new era. In an era of OpenAI/ChatGPT and Co-Pilot, we get an extremely powerful tool that makes complex data solutions available to all companies, and we can imagine a future Co-Pilot for Microsoft Fabric.

Article information

Author: Laurine Ryan

Last Updated: 05/31/2023
