Case Study

Data platform modernization on AWS significantly reduces passenger wait time for a major US airline

Established a future-ready data platform on AWS with integrated analytics, built-in governance, and intelligent data profiling capabilities

Business needs:

A global Fortune 500 airline wanted to migrate its 20-year-old legacy data warehouse to an AWS data lake to personalize the user experience with contextual content. In the COVID-19 era, when airlines must check passengers’ travel readiness in real time, the airline also needed an intelligent data platform to meet modern business and technical needs.

  • Support AI and ML-based use cases to swiftly provide actionable insights at scale
  • Cater to diverse and complex use cases leveraging real-time and batch analytics
  • Process structured, semi-structured, and unstructured data
  • Provide a single source of truth

Technical needs:

  • Ingest, cleanse, catalog, optimize, and analyze data from diverse sources in real time
  • A self-service framework to facilitate:
    • Data onboarding and AWS pipeline creation
    • Automated resource creation
    • Adequate data quality assurance
    • Metadata capture and cataloging
    • Cloud cost monitoring and optimization
  • Intelligent monitoring and alerting to ensure:
    • Scalability
    • High availability
    • AWS cost optimization
    • Operational readiness
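The case study does not describe how the self-service framework works internally. As a rough illustration only, onboarding a feed could start from a small declarative descriptor that is validated and then used to derive the names of the resources to create automatically. The `FeedSpec` fields, the supported-source list, and the naming scheme below are hypothetical, not the framework's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical source identifiers, mirroring the systems named in this case study.
SUPPORTED_SOURCES = {"teradata", "oracle", "mssql", "hadoop", "rest_api", "google_analytics"}


@dataclass(frozen=True)
class FeedSpec:
    """Hypothetical descriptor a team would submit to onboard a new feed."""
    name: str
    source: str
    mode: str                     # "batch" or "realtime"
    schedule: str | None = None   # cron expression, required for batch feeds


def validate(spec: FeedSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec can be onboarded."""
    errors = []
    if spec.source not in SUPPORTED_SOURCES:
        errors.append(f"unsupported source: {spec.source}")
    if spec.mode not in {"batch", "realtime"}:
        errors.append(f"unknown mode: {spec.mode}")
    elif spec.mode == "batch" and not spec.schedule:
        errors.append("batch feeds need a schedule")
    return errors


def plan_resources(spec: FeedSpec, env: str) -> dict[str, str]:
    """Derive names for the resources to create automatically (naming is illustrative)."""
    prefix = f"{env}-{spec.name}".lower().replace("_", "-")
    resources = {
        "raw_prefix": f"raw/source={spec.source}/feed={spec.name}/",
        "catalog_table": f"{env}_raw.{spec.name}",
    }
    if spec.mode == "realtime":
        resources["stream"] = f"{prefix}-stream"
    else:
        resources["batch_job"] = f"{prefix}-batch-job"
    return resources
```

For example, `plan_resources(FeedSpec("bookings", "oracle", "batch", "0 2 * * *"), "dev")` yields the raw prefix `raw/source=oracle/feed=bookings/` and the job name `dev-bookings-batch-job`. A real framework would more likely pass such names into an infrastructure template (for instance, a CloudFormation stack) than return a dictionary.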


Support for 200+ batch and 30+ real-time multi-TB data feeds with automated infrastructure setup and deployment



The data platform modernization journey involved three steps:

  • Rearchitecting the platform
  • Using frameworks and accelerators
  • Onboarding key use cases

The Impetus team automated common patterns and built templates and reusable components to rearchitect ~50% of the workloads onto an Amazon Redshift/S3-based data lake. The remaining workloads were migrated to AWS as-is, automatically, using LeapLogic, Impetus’ cloud transformation accelerator. Compared to a manual migration, LeapLogic’s automation saved 22% of the time and 70% of the effort.

The team also created ingestion, monitoring, and validation frameworks to ingest data feeds from sources such as Teradata, Oracle, MS SQL, Hadoop, REST APIs, and Google Analytics into Amazon S3. For certain batch and real-time use cases, the framework leveraged Gathr, Impetus’ all-in-one data pipeline platform. The data lake architecture was built using AWS services such as Glue, Kinesis, Athena, Redshift Spectrum, EMR, and SageMaker.
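Because the feeds land in Amazon S3 and are queried with Athena and Redshift Spectrum, the object layout matters. A common convention, assumed here rather than taken from the case study, is Hive-style `key=value` partition prefixes, which AWS Glue crawlers recognize as partitions and which query engines can then prune when a query filters on them. The `raw/` prefix, the partition names, and the Parquet file naming below are illustrative.

```python
from __future__ import annotations

from datetime import datetime, timezone


def raw_object_key(source: str, feed: str, ingested_at: datetime, part: int) -> str:
    """Build an S3 key with Hive-style (key=value) date partitions.

    `ingested_at` must be timezone-aware; it is normalized to UTC so that
    partition values do not depend on the ingesting host's local time zone.
    """
    if ingested_at.tzinfo is None:
        raise ValueError("ingested_at must be timezone-aware")
    ts = ingested_at.astimezone(timezone.utc)
    return (
        f"raw/source={source}/feed={feed}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"part-{part:05d}.parquet"
    )


def partitions_of(key: str) -> dict[str, str]:
    """Recover partition values from a key, the way a catalog crawler would read them."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)
```

Normalizing to UTC is a design choice: a record ingested at 21:00 in UTC-5 lands in the next day's partition, which keeps partitions consistent across feeds ingested from different regions.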

The centralized data platform offers the following capabilities:

  • Support for 200+ batch and 30+ real-time multi-TB data feeds
  • Unified data catalog and governance to authorize, manage, and audit access to data
  • Automated data quality checks (null, regex, data type, etc.) to separate good records from bad ones
  • Single-click deployment for data pipelines and platforms using AWS CloudFormation templates and AWS CodeDeploy
  • Serverless data pipelines leveraging Lambda, Managed Airflow, Glue, EMR, and CloudWatch
  • TeamCity pipelines to onboard Spring Boot and Java applications as Docker containers
  • End-to-end DevOps for data platform and use cases
  • Intelligent data profiling and data quality checks
  • Unified consumption layer for seamless onboarding of diverse use cases
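The automated data quality checks listed above (null, regex, and data type) can be sketched as small rule functions that route each record to a good or a bad set, keeping the failure reasons for later inspection. This is a minimal sketch, not the platform's implementation: the column names, the six-character record-locator pattern, and the `_dq_errors` field are hypothetical.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
# A check returns an error message, or None when the record passes.
Check = Callable[[Record], Optional[str]]


def not_null(col: str) -> Check:
    def check(r: Record) -> Optional[str]:
        return f"{col} is null" if r.get(col) is None else None
    return check


def matches(col: str, pattern: str) -> Check:
    rx = re.compile(pattern)

    def check(r: Record) -> Optional[str]:
        value = r.get(col)
        if value is None:
            return None  # missing values are reported by not_null, not here
        return None if rx.fullmatch(str(value)) else f"{col} does not match {pattern}"
    return check


def of_type(col: str, expected: type) -> Check:
    def check(r: Record) -> Optional[str]:
        value = r.get(col)
        if value is None or isinstance(value, expected):
            return None
        return f"{col} is not {expected.__name__}"
    return check


def split_records(records: List[Record], checks: List[Check]) -> Tuple[List[Record], List[Record]]:
    """Route each record to the good or bad set; bad records keep their failure reasons."""
    good, bad = [], []
    for record in records:
        errors = [msg for msg in (check(record) for check in checks) if msg is not None]
        if errors:
            bad.append({**record, "_dq_errors": errors})
        else:
            good.append(record)
    return good, bad


# Hypothetical rules for a booking feed: a six-character record locator and an integer bag count.
BOOKING_CHECKS = [not_null("pnr"), matches("pnr", r"[A-Z0-9]{6}"), of_type("bags", int)]
```

For example, a record with `pnr` set to `"abc"` and `bags` set to `"two"` lands in the bad set with both a regex failure and a type failure. In a Spark-based Glue or EMR job, the same rules would more likely be written as DataFrame column predicates; plain Python keeps the routing logic easy to follow.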


Saved 70% of workload translation effort using LeapLogic



The AWS-based data platform enabled easy onboarding of use cases and ensured data accuracy and quality for downstream applications. It helped the airline improve the passenger experience by:

  • Personalizing bundle offers for customers using ML models
  • Reducing wait time at airport gates by predicting passengers’ adherence to COVID-19 travel norms
  • Auto-approving travel readiness by verifying the validity of COVID-19 documents such as test reports, vaccination records, and government forms

The flexible platform enabled unified governance and consumption, equipped the airline’s business teams to process data faster for real-time decision-making, and improved cost efficiency.
