Data lake technology has drawn the attention of organizations that need a place to hold massive amounts of raw data until it is used in analytics applications.
The data lake storage market is set to grow rapidly. Data lake providers offer such benefits as storage scalability and cost savings.
“While it remains an emerging solution, data lake storage is an increasingly popular approach to data architecture,” said Gene Locklear, AI research scientist at Sentient Digital, a technology solutions provider that serves government and commercial clients.
Unlike a data warehouse, which requires data to be structured to fit a predefined schema before it is stored, a data lake stores data in its native format. This capability eliminates the need to restructure data before organizations use it for various types of analytics.
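To make the idea of storing data in its native format more concrete, here is a minimal Python sketch of what is often called schema-on-read: raw objects are kept exactly as they arrived, and structure is applied only when someone reads them. The object keys, sample contents and the read_object helper are hypothetical and invented for illustration; they do not represent any provider's API.

```python
import csv
import io
import json

# Hypothetical raw objects, kept exactly as they arrived (no upfront schema).
RAW_OBJECTS = {
    "sales/2021-06-01.csv": "order_id,amount\n1001,25.50\n1002,14.00\n",
    "clicks/2021-06-01.jsonl": (
        '{"user": "a", "page": "/home"}\n'
        '{"user": "b", "page": "/pricing"}\n'
    ),
    "support/ticket-17.txt": "Customer reports a login failure after password reset.",
}


def read_object(key):
    """Apply structure at read time, based on the object's native format."""
    body = RAW_OBJECTS[key]
    if key.endswith(".csv"):
        # Structured data: parse rows into dictionaries keyed by column name.
        return list(csv.DictReader(io.StringIO(body)))
    if key.endswith(".jsonl"):
        # Semistructured data: parse one JSON document per line.
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    # Unstructured data: return the text untouched.
    return body


# Each analytics job interprets the raw data only when it needs it.
total = sum(float(row["amount"]) for row in read_object("sales/2021-06-01.csv"))
pages = [event["page"] for event in read_object("clicks/2021-06-01.jsonl")]
```

In this sketch nothing is converted at write time, so a new analytics use case only requires a new reader, not a reload of the stored data.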
Organizations need to understand their needs and their options to choose the right provider. Learn where data lakes prove most advantageous and the key buying points of this technology. Then skim the main features of major data lake providers.
Benefits of data lake storage
The technology’s primary beneficiaries include sales, marketing and customer support organizations, said Radhakrishnan Rajagopalan, global head of technology services at IT consulting and services company Mindtree.
“A data lake brings together diverse data onto a single unified data platform, enabling agile decision-making,” Rajagopalan said.
Data lakes are scalable, which enables adopters to store data in a relatively inexpensive manner. The technology also helps to decommission legacy analytics applications, which frees up capital and resources.
“It also allows companies that perhaps had legacy applications and databases to move the data to a more cost-effective storage mechanism,” said Craig Kelly, vice president of analytics at IT consulting firm Syntax.
Businesses increasingly make decisions based on insights derived from data.
“For many companies, data lakes are more economical than data warehousing, especially where speed of data retrieval is key,” Rajagopalan said.
Picking a data lake provider
Selecting a provider hinges on the type of storage platform — on premises or cloud — as well as the organization’s data governance and data types.
Data lake hosting. On-premises data lakes are most effective when the adopter invests in long-term infrastructure — including storage space, power, hardware and software — as well as the talent necessary for running the systems, Rajagopalan said. Data lakes in the cloud are best for organizations that want to outsource and need a nimble infrastructure.
Security. A match with the organization’s security and accessibility profile is the most important attribute to look for in a data lake cloud storage provider.
“There’s an inevitable tradeoff between security and ease of access and processing,” Locklear said. “If you’re working with a provider that emphasizes safety versus ease of use, or vice versa, in contrast with your priorities, you’re going to struggle from day one.”
Data handling. The data lake should easily handle all data types, whether structured, semistructured or unstructured.
“Organizations will generate all forms exponentially,” Kelly said.
Examples of data lake providers
Many major storage technology vendors, including IBM and HPE, can help enterprises build an on-premises data lake. Microsoft Azure and AWS are the largest cloud-based data lake providers.
Data Lake on AWS combines the core AWS cloud services needed to tag, search, share, analyze and govern subsets of data, according to the vendor. Features include a managed storage layer, encryption at rest through AWS Key Management Service and data access flexibility.
HPE touts its Apollo 4200 Gen10 Plus storage server as a building block for a modern data lake. It suits data-centric workloads and features NVMe flash, persistent memory, and the high throughput and low latency required for in-place analytics, according to HPE.
IBM offers data lake deployment through its Power and Spectrum Scale products. Organizations can choose from on-premises, cloud and hybrid options. Through a partnership with Cloudera, IBM provides data governance, security and analytics.
Microsoft Azure Data Lake is a cloud service that stores and analyzes petabyte-size files and trillions of objects. Microsoft manages the Data Lake product. It includes data encryption at rest and in motion, multifactor authentication and auditing.
Many smaller players — including Dremio and Databricks with Delta Lake — are also entering the market, potentially leading to a wider supply of options at lower prices.
“As current trends continue into 2022, a wholesale migration to data lake storage may well be in the cards,” Locklear said.
Adopter challenges
Data lake adopters face problems, particularly cost issues, with storing, updating and retrieving massive amounts of data. In fact, much of that data is useless.
“Companies become data hoarders,” Locklear said. “Nobody wants to be the one who says, ‘delete it.'”
Meanwhile, without constant, vigilant management, data lakes can gradually become less effective.
“The risk is the so-called ‘data graveyard,’ where potentially relevant data becomes lost among unnecessary files, skewing metrics and analytics,” Locklear said.
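One way to picture the vigilant management described above is a periodic audit that flags data nobody has touched in a long time. The following Python sketch assumes a hypothetical catalog of object keys, sizes and last-access dates, plus an arbitrary one-year idle threshold; real data lake services expose this kind of metadata in their own ways, and the names here are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical catalog entries: (object key, size in GB, last access date).
CATALOG = [
    ("sales/2021-06.parquet", 120, date(2021, 6, 28)),
    ("logs/2018-raw.gz", 900, date(2018, 12, 31)),
    ("tmp/export-old.csv", 40, date(2019, 3, 15)),
]


def find_stale(catalog, today, max_idle_days=365):
    """Return entries that were last accessed before the idle cutoff."""
    cutoff = today - timedelta(days=max_idle_days)
    return [entry for entry in catalog if entry[2] < cutoff]


# Flag candidates for review; the threshold of 365 days is an assumption.
stale = find_stale(CATALOG, today=date(2021, 7, 1))
stale_gb = sum(size for _, size, _ in stale)
```

A report like this does not decide what to delete, but it gives someone a concrete list to review instead of leaving unused data to pile up.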