A Fortune 500 bank migrated 3.4 billion records and 7000 Vertica DDLs in 3 months

Released 20% capacity and reduced 40% TCO using automation for end-to-end transformation and validation

Challenge

The Fortune 500 multinational bank wanted to accelerate time-to-market and reduce TCO by transforming from Vertica to a modern data platform. Their current architecture was complicated, had data size limitations, and was proving to be expensive
with large data sizes.

The bank wanted to minimize data storage costs and simplify their data lake architecture. They also wanted real-time insights from customer interactions to offer them recommendations based on their interests, risk appetite, and capital commitment.

50% reduction in conversion and validation time

Discovery

Transforming legacy Vertica workloads to a modern platform involved identification and transformation of costly and resource-consuming ETL and analytical workloads. The bank was struggling to convert complicated and mission-critical logic. They
were looking for a solution that would simplify and accelerate the transformation process with minimum risk to ensure data integrity and quality. The solution scope involved assessment, identification, recommendations, and transformation of
workloads to be offloaded to a modern data platform.

Solution

The Impetus team used LeapLogic to automatically identify and migrate resource-intensive Vertica workloads like DDLs and KSH scripts to Hive, thereby offering expanded capabilities and opportunities for data exploration and analytics.

Workload profiling

Workload assessment for schema optimization

Schema creation and data migration

Logic transformation

Workload validation and execution

Analytical data moved to EDW

An end-to-end solution was delivered by automatically converting:

Approximately 7000 Vertica DDLs and 1500 views to Hive
Around 3000 KSH scripts
Modularized 2500 VSQL and embodied Vertica SQL (VSQL) queries in Shell-compatible HiveQL/Spark SQL
Awk scripts
50 Autosys scheduling jobs
Migrated around 100 tables from Vertica to Hive which contained about 200TB of compressed data

Released 20% capacity and reduced TCO by 40%

Impact

With fast and reliable code transformation, the solution enabled the bank to leverage their existing investments; shorten the development, testing, and validation cycles; and improved developer productivity. The delivery of the solution included:

SQOOP-based data migration scripts for all Vertica tables
Migrated and validated sample data for 140 tables used in 8 VAPs (3.4 billion records)
High-level solution architecture and design
Partitioning, bucketing, and clustering recommendations for all tables
Functional testing support and defect fixing
Hive UDF support for the native function of Vertica, which were not available in Hive