Big Data Platform Migration – Testing

How do you plan your testing activities when it comes to migrating a big data platform?

The purpose of testing

The purpose of testing is to ensure that the end-to-end data flow executes as expected and that all the key phases are assessed and tested:

  1. Ingestion – the platform is able to receive and consume data
  2. Enrichment – data is processed, transformed and saved for later use by other components
  3. Extraction – the platform is able to send data out from enriched sources

Therefore, it is paramount that a clear test plan is discussed, agreed and produced so that it fits within the overall migration plan.
Such a test plan should answer the following questions.

Questions to answer

  • What is in scope?
  • What is the testing approach?
  • What are the test phases?
  • What are the deliverables?
  • What is the schedule?

Each and every question requires effort, adequate planning and preparation. Get the answers wrong, or overlook any of the above, and you can seriously compromise your testing exercise and its outcome.

Testing phases

Once the scope and requirements have been collected and agreed, you can expect the following generic phases:

  • Smoke testing
  • Connectivity testing
  • E2E Regression testing
  • Data validation / comparison of volumes
  • Performance testing (NFT)
  • Failover testing
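
As an illustration of the data validation phase, here is a minimal sketch of a volume comparison between the old and the new platform. It assumes the row counts per table have already been collected on each side (for example with a COUNT(*) query per table) and passed in as plain dictionaries. The function name, the VolumeMismatch type and the tolerance parameter are hypothetical and used only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VolumeMismatch:
    """One table whose row counts differ between the two platforms."""
    table: str
    source_rows: Optional[int]  # None means the table is missing on the old platform
    target_rows: Optional[int]  # None means the table is missing on the new platform


def compare_volumes(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
    tolerance: float = 0.0,
) -> List[VolumeMismatch]:
    """Return the tables whose row counts differ by more than `tolerance`.

    `tolerance` is a fraction of the source row count (0.01 means 1%).
    Tables present on only one side are always reported.
    """
    mismatches = []
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        tgt = target_counts.get(table)
        if src is None or tgt is None:
            mismatches.append(VolumeMismatch(table, src, tgt))
            continue
        if abs(src - tgt) > abs(src) * tolerance:
            mismatches.append(VolumeMismatch(table, src, tgt))
    return mismatches
```

An empty result means the volumes match within the agreed tolerance. A matching row count does not prove that the content is identical, so a check like this is best treated as a first pass before any deeper comparison.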

Challenges to overcome

Even if you follow all these steps, you will still face challenges and blockers. These typically take the form of connectivity issues (e.g. omitted firewall rules), security and access issues (e.g. user accounts not yet granted the right level of access) and missing test data (for instance, you might not have copied all the sample Hive tables).
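
Connectivity blockers such as missing firewall rules can often be caught early with a simple pre-flight check. The sketch below is one possible approach, assuming that a plain TCP connection attempt is a good enough signal: it tries to open a socket to each endpoint and records whether it succeeded. The function name and the endpoint names are hypothetical, and a successful connection says nothing about authentication or access rights.

```python
import socket
from typing import Dict, Tuple


def check_endpoints(
    endpoints: Dict[str, Tuple[str, int]],
    timeout: float = 3.0,
) -> Dict[str, bool]:
    """Try a TCP connection to each (host, port) and report reachability.

    Returns a mapping of endpoint name to True (connection opened)
    or False (refused, timed out, or host name not resolved).
    """
    results = {}
    for name, (host, port) in endpoints.items():
        try:
            # create_connection resolves the host and opens a TCP socket
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # Covers refused connections, timeouts and DNS failures
            results[name] = False
    return results
```

It could be called with something like check_endpoints({"hive": ("new-cluster.example.com", 10000)}), where the host name and port are placeholders for your own environment. Running it from each application host before the connectivity testing phase gives the network team a concrete list of rules to fix.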

It is also important to establish clear lines of communication with application teams and anyone else impacted by the migration of the platform, as they will ultimately be involved in connecting to the new environments and signing off their testing activities. Get them involved as early as possible and keep them updated on project progress and key timescales, so that they can line up the required resources for when the time comes to switch over to the new cluster.