Migrating your data warehouse workloads is undoubtedly one of the most challenging yet important tasks for any organization. Whether the motivation is the growth of your business and its scalability requirements, or reducing the high license and hardware costs of your existing legacy systems, migrating is not as simple as moving files. At Databricks, our Professional Services (PS) team has worked with hundreds of customers and partners on migration projects and has a strong track record of successful migrations. This blog post explores best practices and lessons learned that any data professional should consider when scoping, designing, building, and executing a migration.
5 phases for a successful migration
At Databricks, we have developed a five-phase process for our migration projects based on our experience and expertise.

Before starting any migration project, we begin with the discovery phase. During this phase, we aim to understand the reasons behind the migration and the challenges of the existing legacy system. We also highlight the benefits of migrating workloads to the Databricks Data Intelligence Platform. The discovery phase involves collaborative Q&A sessions and architectural discussions with key stakeholders from the customer and Databricks. Additionally, we use an automated discovery profiler to gain insights into the legacy workloads and estimate the consumption costs of the Databricks Platform to calculate the TCO reduction.
After completing the discovery phase, we move on to a more in-depth assessment. During this stage, we use automated analyzers to evaluate the complexity of the existing code and obtain a high-level estimate of the effort and cost required. This process provides valuable insights into the architecture of the current data platform and the applications it supports. It also helps us refine the scope of the migration, eliminate obsolete tables, pipelines, and jobs, and begin considering the target architecture.
In the migration strategy and design phase, we finalize the details of the target architecture and the detailed design for data migration, ETL, stored procedure code translation, and report and BI modernization. At this stage, we also map the technologies between the source and target estates. Once we have finalized the migration strategy, including the target architecture, migration patterns, tooling, and chosen delivery partners, Databricks PS, together with the chosen SI partner, prepares a migration Statement of Work (SOW) for the pilot (Phase I) or for multiple phases of the project. Databricks has several certified Migration Brickbuilder SI partners who provide automated tooling to ensure successful migrations. Additionally, Databricks Professional Services can provide migration assurance services alongside an SI partner.
After the Statement of Work (SOW) is signed, Databricks Professional Services (PS) or the chosen delivery partner carries out a production pilot phase. In this phase, a clearly defined end-to-end use case is migrated from the legacy platform to Databricks. The data, code, and reports are modernized to Databricks using automated tools and code converter accelerators. Best practices are documented, and a sprint retrospective captures all the lessons learned to identify areas for improvement. A Databricks onboarding guide is created to serve as the blueprint for the remaining phases, which are typically executed in parallel sprints by agile Scrum teams.
Finally, we progress to the full-fledged migration execution phase. We repeat our pilot execution approach, integrating all the lessons learned. This helps establish a Databricks Center of Excellence (CoE) within the organization and scale the teams by collaborating with customer teams, certified SI partners, and our Professional Services team to ensure migration expertise and success.
Lessons learned
Think Big, Start Small
It is crucial during the strategy phase to thoroughly understand your enterprise's data landscape. Equally important is to test a few specific end-to-end use cases during the production pilot phase. No matter how well you plan, some issues will only surface during implementation, and it is better to face them early and find solutions. A great way to choose a pilot use case is to start from the end goal: for example, pick a reporting dashboard that is critical to your business, work out the data and processes needed to create it, and then try building the same dashboard on your target platform as a test. This will give you a good idea of what the migration process will involve.
Automate the discovery phase
We begin by using questionnaires and interviewing the database administrators to understand the scope of the migration. Additionally, our automated platform profilers scan the data dictionaries of databases and Hadoop system metadata to give us actual data-driven numbers on CPU utilization, the percentage split between ETL and BI usage, and usage patterns across users and service principals. This information is very useful for estimating Databricks costs and the resulting TCO savings. Code complexity analyzers are also valuable, as they report the number of DDLs, DMLs, stored procedures, and other ETL jobs to be migrated, along with their complexity classification. This helps us determine migration costs and timelines.
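To make this concrete, here is a minimal sketch of the kind of aggregation a discovery profiler performs, assuming a Teradata source whose query log is exposed as DBC.DBQLogTbl. The connection details and the ETL-versus-BI heuristic are illustrative placeholders, not part of any specific Databricks tool.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read the legacy query log over JDBC (URL and credentials are placeholders;
# store real credentials in a secret scope).
qlog = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://legacy-edw.example.com/DATABASE=DBC")
    .option("dbtable", "DBC.DBQLogTbl")
    .option("user", "profiler")
    .option("password", "<from-secret-scope>")
    .load()
)

# Crude ETL-vs-BI split: write statements count as ETL, everything else as BI.
profile = (
    qlog.withColumn(
        "workload",
        F.when(
            F.col("StatementType").isin("Insert", "Update", "Delete", "Merge Into"),
            "ETL",
        ).otherwise("BI"),
    )
    .groupBy("UserName", "workload")
    .agg(
        F.count("*").alias("query_count"),
        F.sum("AMPCPUTime").alias("cpu_seconds"),
    )
)
profile.show()
```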
Leverage Automated Code Converters
Using automated code conversion tools is essential to expedite the migration and minimize expenses. These tools help convert legacy code, such as stored procedures or ETL, to Databricks SQL. This ensures that no business rules or functions implemented in the legacy code are overlooked due to a lack of documentation. Moreover, the conversion process typically saves developers over 80% of development time, enabling them to promptly review the converted code, make the necessary adjustments, and focus on unit testing. It is crucial to ensure that the automated tooling can convert not only the database code but also the ETL code from legacy GUI-based platforms.
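As a toy illustration of what such converters do under the hood, the sketch below rewrites a couple of hypothetical Teradata-style constructs into Databricks SQL. Production-grade converters parse the SQL properly; simple regex rules like these break down on nested expressions.

```python
import re

# Each rule maps a legacy (Teradata-style) construct to a Databricks SQL
# equivalent. These rules are illustrative examples only.
RULES = [
    (re.compile(r"\bSEL\b", re.IGNORECASE), "SELECT"),  # Teradata shorthand
    (re.compile(r"\bZEROIFNULL\s*\(([^()]*)\)", re.IGNORECASE), r"COALESCE(\1, 0)"),
    (re.compile(r"\bNULLIFZERO\s*\(([^()]*)\)", re.IGNORECASE), r"NULLIF(\1, 0)"),
]

def convert(legacy_sql: str) -> str:
    """Apply each rewrite rule in order and return the converted statement."""
    converted = legacy_sql
    for pattern, replacement in RULES:
        converted = pattern.sub(replacement, converted)
    return converted

print(convert("SEL customer_id, ZEROIFNULL(balance) FROM edw.accounts;"))
# -> SELECT customer_id, COALESCE(balance, 0) FROM edw.accounts;
```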
Beyond Code Conversion: Data Matters Too
Migrations often create a misleading impression of a clearly defined project. When we think about migration, we usually focus on converting code from the source engine to the target. However, it is important not to overlook the other details that are crucial to making the new platform usable.

For example, you need to finalize the approach for data migration, just as you do for code migration and conversion. Data migration can be accomplished effectively by using Databricks LakeFlow Connect where applicable or by choosing one of our CDC ingestion partner tools. Initially, during the development phase, it may be necessary to carry out historical and catch-up loads from the legacy EDW while simultaneously building the data ingestion from the actual sources into Databricks. In addition, it is important to have a well-defined orchestration strategy using Databricks Workflows, Delta Live Tables, or similar tools. Finally, your migrated data platform should align with your software development and CI/CD practices before the migration is considered complete.
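As a rough sketch of that historical-plus-catch-up pattern, assuming a JDBC-accessible legacy EDW with an orders table, an order_id key, and an updated_at watermark column (all placeholder names):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
JDBC_URL = "jdbc:teradata://legacy-edw.example.com/DATABASE=SALES"  # placeholder

def read_source(predicate: str):
    """Read rows from the legacy EDW, pushing the predicate down over JDBC."""
    return (
        spark.read.format("jdbc")
        .option("url", JDBC_URL)
        .option("dbtable", f"(SELECT * FROM sales.orders WHERE {predicate}) src")
        .load()
    )

# 1) One-time historical load: full copy of the legacy table into Delta.
read_source("1=1").write.format("delta").mode("overwrite").saveAsTable("bronze.orders")

# 2) Catch-up loads: merge only rows changed since the last high-water mark,
#    so both systems can run in parallel during development.
last_mark = spark.table("bronze.orders").agg(F.max("updated_at")).first()[0]
changes = read_source(f"updated_at > TIMESTAMP '{last_mark}'")

(
    DeltaTable.forName(spark, "bronze.orders").alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```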
Don't ignore governance and security
Governance and security are two more aspects that are often overlooked when designing and scoping a migration. Regardless of your existing governance practices, we recommend using Unity Catalog on Databricks as your single source of truth for centralized access control, auditing, lineage, and data discovery. Migrating to and enabling Unity Catalog increases the effort required for the overall migration, so account for it when scoping. Also, explore the unique capabilities that some of our governance partners provide.
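For illustration, centralizing access control in Unity Catalog can be as simple as the following sketch, run from a Databricks notebook where spark is predefined; the catalog, schema, and group names are placeholders:

```python
spark.sql("CREATE CATALOG IF NOT EXISTS edw_migrated")
spark.sql("CREATE SCHEMA IF NOT EXISTS edw_migrated.finance")

# Read-only access for analysts; full access for the ETL engineers' group.
spark.sql("GRANT USE CATALOG ON CATALOG edw_migrated TO `analysts`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA edw_migrated.finance TO `analysts`")
spark.sql("GRANT ALL PRIVILEGES ON SCHEMA edw_migrated.finance TO `etl-engineers`")
```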
Data Validation and User Testing are essential for a successful migration
Proper data validation and the active participation of business Subject Matter Experts (SMEs) during the User Acceptance Testing phase are crucial to the success of the project. The Databricks migration team and our certified System Integrators (SIs) use parallel testing and data reconciliation tools to ensure that the data meets all data quality standards without any discrepancies. Strong executive alignment ensures timely and focused participation of business SMEs during user acceptance testing, facilitating a quick transition to production and agreement on decommissioning the older systems and reports once the new system is in place.
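A basic reconciliation check of the kind these tools automate could look like the sketch below, which compares row counts and an order-independent checksum between the legacy extract and the migrated table. The table names are illustrative, and both tables are assumed to share the same columns.

```python
from pyspark.sql import functions as F

legacy = spark.table("legacy_staging.orders")          # extract from the old EDW
migrated = spark.table("edw_migrated.finance.orders")  # migrated Delta table

def fingerprint(df):
    """Row count plus an order-independent checksum built from per-row hashes."""
    return df.agg(
        F.count("*").alias("row_count"),
        F.sum(F.xxhash64(*df.columns)).alias("checksum"),
    ).first()

lf, mf = fingerprint(legacy), fingerprint(migrated)
assert lf == mf, f"Reconciliation failed: legacy={lf}, migrated={mf}"
```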
Make It Real – operationalize and observe your migration
Implement sound operational best practices, such as data quality frameworks, exception handling, reprocessing, and data pipeline observability controls, to capture and report process metrics. This will help identify and report any deviations or delays, allowing for immediate corrective action. Databricks features like Lakehouse Monitoring and our system billing tables assist with observability and FinOps monitoring.
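For example, a basic FinOps query over the system billing tables might look like this sketch; system.billing.usage and the columns used here are documented system-table fields, but verify the schema available in your workspace.

```python
# `spark` is predefined in Databricks notebooks.
usage_by_sku = spark.sql("""
    SELECT sku_name,
           usage_date,
           SUM(usage_quantity) AS dbus  -- DBUs consumed per SKU per day
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY sku_name, usage_date
    ORDER BY usage_date, dbus DESC
""")
usage_by_sku.show()
```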
Trust the experts
Migrations can be challenging. There will always be tradeoffs to balance and unexpected issues and delays to manage. You need proven partners and solutions for the people, process, and technology aspects of the migration. We recommend trusting the experts at Databricks Professional Services and our certified migration partners, who have extensive experience in delivering high-quality migration solutions in a timely manner. Reach out to get your migration assessment started.