Breaking data silos and consolidating disparate data is one of the hardest tasks for any organization. Despite efforts to establish a single source of truth, consumers continue to maintain their own sources of truth because doing so gives them the flexibility and freedom to “tend” to their data.

The downside to this freedom is missed opportunity: inaccessible data can’t be incorporated into decision making, and centralizing it later becomes very expensive because multiple copies of the same dataset must be maintained.

This is where data lakes, data marts, and data warehouses came into the picture. They solved centralization, security, and accessibility, but they took away the freedom.

What if data remained in its original source? No ETLs, no duplication, no heavy infrastructure to host all the data. Is that even possible? Yes, yes, yes, and yes.

This is what’s known as data virtualization: you access the data from its source without moving it.

Benefits

  1. Accelerated Development
  2. Reduced Data Migration and Hosting Costs
  3. Simplified Data Management
  4. Centralized Access – data is accessible from a central location with control and governance
  5. Improved Collaboration

Use Cases

Eliminate Data Silos

A data silo forms when a specific dataset is controlled by one business unit or department and isolated from the rest of the organization. When an organization has limited interoperability, silos tend to grow, and they cause problems for business operations and analytics by limiting data-driven decision making. Data virtualization breaks the silos down by exposing those datasets in place.

Eliminate the Need for a Data Fabric

To improve data accessibility where silos exist, organizations needed a foundation that could at least circumvent the glaring security and accessibility issues. Data fabric became that foundation: it unified the architecture and helped organizations manage data as it moved throughout the organization’s “fabric”. It carried permissions with it and helped comply with regulations, but it still hasn’t eliminated the age-old problem – cost.

Virtualize Data Lakes and Data Warehouses

Traditional and even modern data warehousing involves consolidating data into a central location through data pipelines and extract, transform, and load (ETL) processes. Data warehouses have been an integral part of business intelligence (BI) solutions for decades and have evolved over time, moving from on-premises hosting (the traditional data warehouse) to the cloud (the modern data warehouse) and a more scalable environment. The evolution has been timely, because data has continued to grow, the need for analytics has increased, and with it the need for scalability.

The problem with the data warehouse is the development effort and cost required to build a robust environment that fits the entire organization’s needs. Eliminating the need to ETL data enhances productivity and reduces cost, which is exactly what data virtualization achieves.

The picture should now be clear on where we can employ data virtualization, but how do we get there?

Solutions (how to)

Data virtualization follows a simple three-step concept – Connect, Combine, Consume:

  1. Connect – connect to all types of data sources: databases, cloud services, big data applications, and end-user files.
  2. Combine – combine related data into a single view, irrespective of data format (structured or unstructured).
  3. Consume – enable the business to make decisions by consuming that view with your favorite visualization or analytical tools.
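
To make the three steps concrete, here is a minimal T-SQL sketch using the OPENROWSET approach covered in the tools list below. The storage URL, file path, and the dbo.Customers table are hypothetical placeholders, and it assumes the instance has already been granted read access to the storage account:

```sql
-- Connect + Combine: query Parquet files in Azure Blob Storage in place
-- and join them to a local table, all in one statement.
-- (All names below are placeholders; credential setup is omitted.)
SELECT c.CustomerName,
       s.OrderTotal
FROM OPENROWSET(
         BULK 'https://mystorageacct.blob.core.windows.net/sales/orders/*.parquet',
         FORMAT = 'PARQUET'
     ) AS s                          -- external data, never copied into the database
JOIN dbo.Customers AS c              -- local relational data
  ON c.CustomerId = s.CustomerId;
-- Consume: wrap this query in a view and point Power BI or any other tool at it.
```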

Tools in the Market

  1. Azure SQL Managed Instance (Data virtualization – Azure SQL Managed Instance | Microsoft Docs)
    • OPENROWSET Syntax
    • External Tables
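
For repeatable access, the same files can be exposed as an external table instead of an ad-hoc OPENROWSET call. Below is a minimal sketch assuming Parquet files in Azure Blob Storage; the data source name, container, column list, and paths are illustrative, and the database scoped credential setup for authentication is again omitted:

```sql
-- One-time setup: where the files live and how they are formatted.
CREATE EXTERNAL DATA SOURCE SalesBlobStorage
WITH (LOCATION = 'abs://sales@mystorageacct.blob.core.windows.net');

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- The external table: schema on read, no data is moved or duplicated.
CREATE EXTERNAL TABLE dbo.OrdersExternal
(
    OrderId     INT,
    CustomerId  INT,
    OrderTotal  DECIMAL(18, 2)
)
WITH (
    LOCATION    = '/orders/',
    DATA_SOURCE = SalesBlobStorage,
    FILE_FORMAT = ParquetFormat
);

-- From here it behaves like any other table for BI and analytical tools.
SELECT TOP (10) * FROM dbo.OrdersExternal;
```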

Next Steps

Partner with us on your data, analytics, and AI strategies; we will provide a comprehensive business intelligence roadmap to help you get from 0 to 60 in the shortest time possible.
