As every tech professional knows, cloud providers are always updating their services, adding new solutions and coming up with new ways to manage, process and transform data. Microsoft is no different: they've recently brought a new service into the mix, Microsoft Fabric, and today I'm going to show you how to build an Azure-native data platform solution with Fabric at the helm of your data ecosystem.
The Solution
Here's the full end-to-end architecture with, as I mentioned above, Fabric at the helm. To make things easier to follow, I'll quickly cover the key points:
Ingestion Services - All services related to landing data from multiple sources, including streaming data, event data and 3rd parties landing data directly into storage using SAS keys (see the SAS sketch after this list).
Landing Zone - Storage account where all data is landed, split into containers relating to the service/product the data is ingested from.
Microsoft Fabric Workspace - Dedicated workspace for moving data through the necessary layers and surfacing it to Data Science and Business Intelligence/Insight teams (see the notebook sketch after this list).
Extract Service - Provides API endpoints for 3rd parties to pull data extracts (see the endpoint sketch after this list).
Extended ML & AI Capabilities: Usage of Azure ML (AML) is optional but provides a lot of good features that Fabric doesn't at the moment. As well as Azure AI foundry for creating custom GPT models. applications and individual AI resources like computer vision. I'll explain this a bit more further down the article.
Outside of the above, we have operational practices that encompass the entire solution:
SecOps - Entra covers authentication, role-based access control (RBAC) and service principals/managed identities for interacting with services across the platform. Defender is a good tool for spotting insecure deployments and attack paths. Sentinel handles security logging and enhanced threat detection. Key Vault is the service for storing secrets and keys used across the platform (see the Key Vault sketch after this list).
DataOps - DevOps for CI/CD implementation, management of Fabric assets, and general version control, testing and IaC deployments. Fabric Data Pipelines automate data flows across the medallion architecture. Lastly, Purview enables governance, data cataloging and lineage tracking across the entire platform for auditing and discovery.
FinOps - Azure Cost Management tracks and analyses resource consumption using tagging strategies, and surfaces suggestions for optimisation and spending trends. Azure Budgets is a key tool for making sure you have cost control across the platform. Power BI dashboards visualise spend, usage and metrics at workspace, service and subscription level (see the tag-audit sketch after this list).
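As a quick example of the SecOps side, here's a minimal sketch of pulling a secret from Key Vault using the azure-identity and azure-keyvault-secrets SDKs. The vault URL and secret name are placeholders; DefaultAzureCredential will pick up a managed identity when running inside Azure, so nothing sensitive has to live in code or config.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault and secret names -- replace with your own.
VAULT_URL = "https://my-platform-kv.vault.azure.net"

# Uses a managed identity (or developer credentials locally) to authenticate.
credential = DefaultAzureCredential()
client = SecretClient(vault_url=VAULT_URL, credential=credential)

landing_zone_key = client.get_secret("landing-zone-account-key").value
```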
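And on the FinOps side, tagging strategies only work if they're enforced, so a small audit script can flag resources that are missing cost-allocation tags. This sketch uses the azure-mgmt-resource SDK; the subscription ID and the required tag names are assumptions for illustration, not part of the reference architecture.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
REQUIRED_TAGS = {"cost-centre", "owner", "environment"}  # example tagging policy

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Flag any resource missing one of the required cost-allocation tags.
for resource in client.resources.list():
    tags = resource.tags or {}
    missing = REQUIRED_TAGS - tags.keys()
    if missing:
        print(f"{resource.name} ({resource.type}) is missing tags: {sorted(missing)}")
```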
This is a practical, real-world example of a platform that supports high-scale use cases like marketing pipelines, sales system support, ML workflows and self-service BI, all wrapped in operational best practice and tooling to make sure you're not handing out ridiculous access levels and you're keeping an eye on costs that can quickly get out of control.
To expand on the AI Foundry part of the solution: I mentioned earlier that you can use AI Foundry to create custom GPT models and also individual AI resources like computer vision. As of right now you can't directly create AI services like computer vision and document intelligence from AI Foundry Studio, but the AI Foundry service within the Azure Portal does contain all of the provisioned AI services. So it makes more sense to group everything under AI Foundry rather than cluttering the diagram with icons for every service you could potentially deploy.
You Don't Have to Use EVERYTHING in the Above Solution
Organisations have data requirements of all shapes and sizes: some stream most of their data, some land most of it via CSV and 3rd-party tooling, and some cover it all or employ dedicated data teams that look only at marketing data. So I wouldn't look at that solution and think:
"So many services, the costs are going to be insane"
Instead, look at it from the perspective of getting your core services in place, like:
Storage
Fabric
Power BI
DataOps - CI/CD
SecOps - Entra & Key Vault
FinOps - Cost management and budget control
Building a scalable data platform is about iterative improvements, not a big-bang of mass deployments. Start with one project, template it and apply it to everything else. Add small improvements and new features as new projects become more complex than the previous ones.
The main goal of a modern data platform, regardless of cloud provider, is to engineer for AI-readiness. You don't want to build purely with reporting in mind and then figure out further down the line that you need to completely redesign your data model, metadata layers, and semantic models to be able to utilise ML and AI initiatives.
This means you need to allow for testing, incremental upgrades, and effective planning. There's a saying I've seen on LinkedIn and it's stuck with me ever since:
"You can have it fast, cheap, or good... but you can only pick two."
Fast & Cheap... low quality.
Fast & Good... expensive.
Cheap & Good.. slow.
This should ultimately provide you with the foundational knowledge and direction to follow when it comes to developing an Azure-native data platform, as well as the operational standards you need to make sure everything is in line with best practice from the get-go. If you're interested in having this broken down into key segments, let me know in the comments or drop me a direct message; I'm more than happy to chat through this in more detail.
I’ve also created a freemium eBook, “Data Engineering for AI-Readiness”, that covers the operational principles you need to apply within your data function to aid AI-readiness and foster collaboration across teams. Give it a download here; it’s free, but if you feel it’s worth something then drop me whatever you think it’s worth.