In this document, I share my initial encounter with Fabric, providing a starting point for your data engineering journey. Although there is a vast amount of information on Fabric to explore and learn from, I focus on my first-hand experience with the platform.
Follow the steps below to start your Fabric journey.
1. Sign up for the trial at https://aka.ms/try-fabric
Alternatively, you can log in to https://app.powerbi.com/
2. From the user (account) section, start the 60-day free trial of Fabric
3. Once the free trial is activated, you can choose the experience you are interested in. (The experience switcher is at the bottom-left corner.)
Here, either choose the experience you are interested in or click the “Microsoft Fabric” link to open Fabric in a new tab
4. You will land on the Microsoft Fabric Data Engineering home page
5. Here you can create your first lakehouse from the “Lakehouse (Preview)” option
6. Once the lakehouse is created, you can ingest or load data into it
7. There are multiple options for uploading test data to the lakehouse.
While the first three options are self-explanatory, creating a shortcut is new in Fabric. A shortcut lets you connect to data in your existing storage services (such as ADLS Gen2 or Amazon S3) without copying it. To create a shortcut, you need the URL of the storage location, and several authentication methods are available.
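As a rough illustration of the URLs a shortcut asks for, the endpoint formats for the two supported external stores look like the following. All account, bucket, and region names here are hypothetical examples, not real endpoints:

```python
# Sketch of the storage endpoint URL formats used when creating a shortcut.
# The account/bucket/region names are made-up placeholders.

def adls_gen2_url(account: str) -> str:
    """DFS endpoint of an ADLS Gen2 storage account."""
    return f"https://{account}.dfs.core.windows.net"

def s3_url(bucket: str, region: str) -> str:
    """HTTPS endpoint of an Amazon S3 bucket."""
    return f"https://{bucket}.s3.{region}.amazonaws.com"

# Examples (hypothetical names):
print(adls_gen2_url("mystorageacct"))
print(s3_url("my-test-bucket", "us-east-1"))
```

The exact fields the shortcut dialog asks for may vary by connection type; the point is simply that you paste the storage endpoint URL and then pick an authentication method.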
8. Once the data is uploaded, create a notebook to process it
9. From here, you can load data into a Spark DataFrame with just a few clicks. Click the three dots next to the file you want to load -> Load data -> Spark, and Fabric generates the Spark code to read the Parquet file into a DataFrame. (This is similar to Synapse Spark.)
If your data sits in a directory, modify the generated path so it points at the directory instead of a single file.
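As a sketch, the generated code looks roughly like this. The path `Files/sales.parquet` is a hypothetical example relative to the attached lakehouse, and `spark` and `display` are pre-defined inside a Fabric notebook session:

```python
# Roughly what Fabric generates from "Load data -> Spark".
# "Files/sales.parquet" is a hypothetical path relative to the attached
# lakehouse; `spark` and `display` already exist in the notebook session.
FILE_PATH = "Files/sales.parquet"  # a single Parquet file
DIR_PATH = "Files/sales"           # drop the file name to read every file in the directory

def load_parquet(spark, path):
    """Read the Parquet file(s) at `path` into a Spark DataFrame."""
    return spark.read.parquet(path)

# In the Fabric notebook you would run:
# df = load_parquet(spark, FILE_PATH)   # or DIR_PATH for a whole directory
# display(df)
```

Switching from `FILE_PATH` to `DIR_PATH` is the only change needed for the directory case mentioned above.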
10. When you run the code cell, the data is loaded into the DataFrame. Creating the cluster and running the Spark code is very fast: in my run, starting the session, loading the data, and running a display action on the DataFrame took about nine seconds.