Case Study

30% performance improvement by converting Netezza and Informatica to Azure-Databricks stack

30% performance improvement by converting Netezza and Informatica to Azure-Databricks stack

140 Informatica ETL scripts transformed using automation and operationalized in 16 weeks


Business needs

An American retail company wanted to migrate their legacy data warehouse from Netezza to a cloud-native centralized architecture to improve performance, reduce maintenance cost and architectural complexity, and ensure business continuity.

They wanted to:

  • Sunset Netezza and migrate all their data warehouse workloads to Azure Synapse
  • Migrate Informatica ETL workflows to Azure Databricks for processing
  • Move Tableau and SAP BO reports to Power BI on the cloud for reporting

 

Transformed 140 Informatica ETL scripts to Databricks-native PySpark code

 

Solution

The Impetus team used LeapLogic for end-to-end automated workload assessment to strategize the migration plan of Netezza schema and data, Informatica ETL workflows, and BI reports.

End-to-end lineage
End-to-end lineage

The solution used Azure Data Factory (ADF) pipelines for the one-time migration of 27 TB compressed historical data and ~100 TB of uncompressed data from Netezza to Azure Synapse. The incremental migration of 10GB data per day was performed using Databricks ADF pipelines. The team also transformed 140 Informatica ETL scripts with medium-high complexity to Databricks-native PySpark code and 20 SAP BO reports and 16 Tableau reports to Power BI reports.

 

27 TB compressed data and ~100 TB of uncompressed data migration from Netezza to Azure Synapse
Informatica to Databricks transformation
Informatica to Databricks transformation

 

A high-level functional component architecture of the implemented solution is given below:

Informatica Databricks Architecture

Highlights

  • Used multiple Synapse instances for data segregation and better maintenance
  • Brought data from DB2 and a SQL Server external source into Azure Data Lake Storage (ADLS)
  • Replaced Tivoli scheduler with ADF
  • Used HDInsight for data validation and cell-to-cell comparison between Netezza and Azure Synapse tables
  • Built 55 pipelines to ingest data in ADLS via File Sweeper
  • Cleansed data residing in ADLS raw layer using custom utilities of LeapLogic

 

25% cost reduction by sunsetting Netezza

 

The team also provided end-to-end production transition assistance, enabled operationalization on the Azure Synapse platform, and recommended areas for optimization on the target data warehouse.

 

Impact

LeapLogic helped the client sunset Netezza and realize the following benefits:

  • Reduced cost by 25%
  • Improved performance by 30% using automation
  • Improved scalability by transforming workloads to a modern, cloud-native stack
  • Ensured data availability in a single layer across the enterprise, enabling innovation and eradicating data silos
  • Operationalized Power BI reporting for clients