In this special guest feature, Tony Velcich, Senior Director of Product Marketing, WANdisco, discusses what he calls the “data migration gap” and how this gap has grown even more prominent and acute given recent events. Tony is an accomplished product management and marketing leader with over 25 years of experience in the software industry. Tony is currently responsible for product marketing at WANdisco, helping to drive go-to-market strategy, content and activities. Tony has a strong background in data management having worked at leading database companies including Oracle, Informix and TimesTen where he led strategy for areas such as big data analytics for the telecommunications industry, sales force automation, as well as sales and customer experience analytics.
Even though a recent survey found that cloud migration remains a top priority for enterprises in 2020 and beyond, big data stakeholders still face a serious gap between what they want to do and what they can do.
Merriam-Webster defines “gap” as “an incomplete or deficient area” or “a problem caused by some disparity.” In the case of enterprise data lakes, this disparity is the difference between what big data professionals want to migrate to the cloud and what they can migrate without hurting business continuity.
I call this the data migration gap, and recent events have made it even more prominent and acute. Cloud migration has never commanded more mindshare, both in the executive suite and in the work-from-home trenches. The COVID-19 pandemic has made it clear to everyone that cloud migration is crucial for remote productivity. But even as enterprises push business-critical applications and data to the cloud, much of the data accrued in recent years is left behind in legacy on-prem data lakes.
Data Lakes: Left Behind
The on-prem data lake was conceived and adopted as a cost-effective way to store petabytes of data at a fraction of the cost of traditional data warehousing. Yet enterprises quickly realized that storing data and using it were two entirely different challenges. Their data lakes could not match the performance, security or business tool integration of their data warehouses, which were more expensive but more manageable.
Today, data lakes live on in their original form in industries where time-sensitive, insight-rich analytics matter less and where cost trumps efficiency. More dynamic enterprises, however, are moving from on-prem storage and billions of batch-based queries to real-time analytics over massive cloud-based datasets. For these enterprises, the question is no longer whether to move petabytes of business-critical, actively changing customer data, but how to do so without disrupting the business while minimizing the time, cost, and risk associated with legacy data migration approaches.
Current Methods: Pluses and Minuses
What strategies are being used to bridge the data migration gap? How are enterprises currently migrating their active data? There are three common approaches, each with its own benefits (and pitfalls):
- Lift and Shift – A lift and shift approach migrates applications and data from one environment to another with zero or minimal changes. However, it is dangerous to assume that what worked on-prem will work as-is in the cloud. Lift-and-shift migrations don’t always take full advantage of the cloud’s efficiencies and capabilities. Often, the shortcomings of existing implementations move with the data and applications to the new cloud environment, which makes this approach acceptable only for simple or static data sets.
- Incremental Copy – An incremental copy approach periodically copies new and modified data from the source to the target environment over multiple passes. The original source data must first be migrated to the target, and then each subsequent pass processes the changes made since the previous one. The key challenge comes with a large volume of changing data: the passes may never catch up with the rate of change, so the migration cannot be completed without downtime.
- Dual Pipeline / Ingest – A dual pipeline or dual ingest approach ingests new data simultaneously into both the source and target environments. This requires significant effort to develop, test, operate and maintain multiple pipelines. It also requires modifying applications so that every data change updates both the source and target environments, which adds further development work.
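The catch-up problem with incremental copy can be made concrete with a deliberately simplified model. It assumes a constant copy throughput, a constant rate of new or modified data, and that every change lands on data that must be re-copied (no coalescing of repeated updates). The function name, parameters and all figures are hypothetical. Each pass copies the backlog left by the previous one, and while it runs, new changes accumulate at `change_rate / copy_rate` times the size of that backlog:

```python
def simulate_incremental_copy(dataset_tb, copy_rate_tb_per_hr, change_rate_tb_per_hr,
                              cutover_threshold_tb=0.1, max_passes=50):
    """Hypothetical model of multi-pass incremental copy.

    Returns (passes, remaining_backlog_tb, converged). "Converged" means the
    backlog shrank below a size small enough to copy during a brief cutover.
    """
    backlog = dataset_tb  # the first pass must copy the full data set
    for passes in range(1, max_passes + 1):
        hours = backlog / copy_rate_tb_per_hr     # time this pass takes
        backlog = change_rate_tb_per_hr * hours   # changes made while it ran
        if backlog <= cutover_threshold_tb:
            return passes, backlog, True
    return max_passes, backlog, False


# Changes arrive at one fifth of copy speed: the backlog shrinks 5x per pass.
print(simulate_incremental_copy(1000, copy_rate_tb_per_hr=10, change_rate_tb_per_hr=2))
# Changes arrive as fast as data can be copied: the backlog never shrinks.
print(simulate_incremental_copy(1000, copy_rate_tb_per_hr=10, change_rate_tb_per_hr=10))
```

In this model the backlog behaves like a geometric series with ratio `change_rate / copy_rate`. When that ratio is well below 1, a handful of passes suffices. When it approaches or exceeds 1, the passes never catch up, which is the situation where the migration cannot finish without stopping writes.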
A Fourth Way: Bridging the Data Migration Gap
A different strategy, and one perhaps better suited to the dynamic data environments of most data-intensive enterprises, would be to enable migrations with no application changes or business disruption – even while data sets are under active change. This paradigm enables migrations of any scale with a single pass of the source data, while supporting continuous replication of ongoing changes from source to target.
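The article does not describe how such a tool works internally. One widely used pattern with the two properties named here (a single pass over the source data, plus continuous replication of ongoing changes) is a snapshot scan combined with replay of an ordered change log, often called change data capture. The toy sketch below assumes a hypothetical source that records every write in that log. `ReplicatedStore`, `migrate` and `replicate_from` are illustrative names, not WANdisco's product or any real API:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change event: (sequence number, key, new value or None for a delete)
ChangeEvent = Tuple[int, str, Optional[str]]


class ReplicatedStore:
    """Hypothetical source store that appends every write to an ordered change log."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[ChangeEvent] = []

    def write(self, key: str, value: Optional[str]) -> None:
        self.log.append((len(self.log), key, value))
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value


def replicate_from(source: ReplicatedStore, target: Dict[str, str], seq: int) -> int:
    """Apply logged changes from position `seq` onward, in order; return the new position."""
    for _, key, value in source.log[seq:]:
        if value is None:
            target.pop(key, None)
        else:
            target[key] = value
    return len(source.log)


def migrate(source: ReplicatedStore, target: Dict[str, str],
            concurrent_writes: List[Tuple[str, Optional[str]]]) -> int:
    """Single pass over the source, then replay of every change made since the pass began."""
    pending = list(concurrent_writes)
    start_seq = len(source.log)  # changes from this point on will be replayed
    for key in list(source.data):  # the one and only pass over the source data
        value = source.data.get(key)
        if value is not None:  # key may have been deleted mid-scan
            target[key] = value
        if pending:  # simulate an application writing while the scan runs
            source.write(*pending.pop(0))
    for key, value in pending:  # writes that keep arriving after the scan
        source.write(key, value)
    return replicate_from(source, target, start_seq)
```

Because the log is replayed in order, the last write to each key wins on the target, so it ends up matching the source even though the scan saw some keys before and some after they changed. Calling `replicate_from` again with the returned position keeps the target current as the source continues to change.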
While existing methodologies have their validity and use cases, new technology is empowering big data stakeholders to bridge the data migration gap more cost-effectively and efficiently. Choosing the right option can make migration to the cloud faster and more attainable for any enterprise.