Resource management is increasingly important. Lakes are drying up [1], and the climate emergency demands a change in our consumption habits. Datalake management is crucial because it helps minimise both the financial and the environmental cost of data storage. We regularly clear our inboxes, yet we often neglect our servers, which accumulate valuable and unnecessary data alike. Effective datalake management addresses this: it optimises storage, reduces energy consumption, and minimises waste.
The growing challenge of data storage
The world’s lakes are shrinking [1]. They play a vital role in carbon storage. Several factors contribute to this decline. These include rising temperatures, human activity, and reduced rainfall [1]. At the same time, the volume of digital data is surging. This trend stems from the proliferation of connected devices and the growth of cloud computing [2]. “Dark data” refers to unused data. It represents a significant portion of all stored data [3].
Organisations rely on datalakes to manage their data. However, these datalakes consume substantial energy. Storing unnecessary or obsolete data is a major challenge. Businesses generate vast amounts of data [4]. A considerable portion of this data is never used [5]. This contributes to the growing volume of cold data [6].
The urgency of responsible datalake management
We must reduce our environmental footprint. Net Zero commitments alone are no longer enough: we need to act urgently. However, many organisations struggle to do so. They understand the need for action, but they lack a clear path forward. The “keep it just in case” mentality is detrimental: it harms the environment and weighs on a company’s finances. Research suggests that “dark data” costs businesses billions [5].
Key considerations
How can we identify and address dormant data? What data unnecessarily consumes valuable resources? We must re-evaluate data management within IT. Several key factors warrant attention:
- Governance: Clearly define responsibility for data. Specify who has the authority to add or remove data. Consider how to engage users in responsible data management practices. The role of Chief Data Officer (CDO) is becoming increasingly important in this regard [7].
- Skills: Optimise resource utilisation by considering the context of data usage. Key roles include data architects, data engineers, and data scientists. Raise awareness among all stakeholders.
- Corporate culture: Foster a culture of frugality and environmental responsibility. Translate company-level commitments into concrete actions. Evaluate the motivations for data cleanup: is it driven by environmental concerns or by financial considerations?
- Storage methods: Select storage methods that align with business requirements and regulatory constraints. Consider GDPR compliance and cost limitations. Leverage cold data storage to reduce costs.
- Technologies: Assess the efficiency of various technologies. Ensure you have a plan for decommissioning systems. Understand the environmental impact of your data platform.
- Continuous improvement: Monitor storage growth and analyse trends. Examine the relationship between data growth and business value. Track the ratio of stored data to used data.
- Technical debt: Technical debt can significantly hinder efficient datalake management. Data accumulates over time. Develop strategies to address legacy systems. Integrate these strategies into your datalake management practices [8].
- Costs: The cost of storing cold data is escalating. This is largely due to rising energy prices. Consider the full spectrum of costs. This includes financial costs, environmental costs, and the increasing cost of labour.
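As a minimal sketch of the continuous-improvement metric listed above, the ratio of stored data to used data can be computed from a dataset inventory. The `Dataset` record, its fields, the sample inventory, and the 180-day threshold are all hypothetical assumptions for illustration; a real platform would supply its own catalogue and access logs.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    """Hypothetical inventory record for one dataset in the datalake."""
    name: str
    size_gb: float
    last_accessed: date


def storage_report(datasets, today, cold_after_days=180):
    """Summarise stored vs. used volume.

    A dataset counts as "used" if it was accessed within the last
    `cold_after_days` days; the 180-day default is an assumed value.
    """
    cutoff = today - timedelta(days=cold_after_days)
    stored = sum(d.size_gb for d in datasets)
    used = sum(d.size_gb for d in datasets if d.last_accessed >= cutoff)
    cold = [d.name for d in datasets if d.last_accessed < cutoff]
    return {
        "stored_gb": stored,
        "used_gb": used,
        # Ratio of stored to used data; None when nothing has been used.
        "stored_to_used": stored / used if used else None,
        "cold_datasets": cold,
    }


# Illustrative inventory (made-up names and sizes).
inventory = [
    Dataset("sales_2023", 100.0, date(2024, 5, 1)),
    Dataset("tmp_export", 300.0, date(2022, 1, 15)),
]
report = storage_report(inventory, today=date(2024, 6, 1))
```

Tracking this ratio over time, rather than as a one-off figure, is what would show whether storage growth is matched by actual usage.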
The future of datalake management
Once we understand these factors, we can envision the future of datalake management. Two key aspects are paramount:
- Decline in absolute terms: The environmental impact of digital technology cannot continue to grow unchecked. We must find ways to curb this growth. This may involve technological advancements. It may also involve prioritising use cases based on their potential impact.
- Stable operating model: Define clear processes for data ingestion. Qualify and format data appropriately. Establish a framework for data suppliers and consumers. Stay informed about storage technology trends.
Data lifecycle phases
Consider these distinct phases in the data lifecycle:
- Ingestion: Once data is stored, it tends to persist. Therefore, managing data ingestion is crucial for limiting growth.
- Data cleaning: Implement processes for cleaning temporary tables. Define data lifespans and automate deletion.
- Data exposure (datamesh): Enable secure data access for other teams. Encourage reuse of existing data rather than duplication, and optimise storage.
- Automated cleanup: Run data cleaning as an automated, recurring process. Utilise appropriate monitoring and alert systems.
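The idea of defining data lifespans and automating deletion can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the `RETENTION_DAYS` rules, the category names, and the catalogue entries are hypothetical, and the function only selects candidates for deletion rather than deleting anything.

```python
from datetime import date, timedelta

# Hypothetical retention rules: maximum age in days per data category.
RETENTION_DAYS = {"temporary": 7, "raw": 365}


def expired_tables(tables, today, retention=RETENTION_DAYS):
    """Return names of tables older than the lifespan of their category.

    `tables` is a list of (name, category, created_on) tuples. Categories
    without a rule are never selected, so nothing is deleted by default.
    """
    expired = []
    for name, category, created_on in tables:
        max_age = retention.get(category)
        if max_age is not None and today - created_on > timedelta(days=max_age):
            expired.append(name)
    return expired


# Illustrative catalogue (made-up table names and dates).
catalog = [
    ("tmp_join_01", "temporary", date(2024, 5, 1)),
    ("tmp_join_02", "temporary", date(2024, 5, 30)),
    ("events_raw", "raw", date(2024, 1, 1)),
    ("finance_ledger", "regulated", date(2015, 1, 1)),
]
to_delete = expired_tables(catalog, today=date(2024, 6, 1))
```

Leaving categories without a rule untouched is a deliberately conservative choice: data subject to regulatory retention should only be removed under an explicit policy.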
The CIO’s role
CIOs play a pivotal role in achieving an organisation’s environmental objectives. They must actively respond to growing environmental concerns. They must also take responsibility for reducing the impact of their own IT assets. Fortunately, many principles and best practices in tech exist to help CIOs reduce their impact and empower CSOs.
Sources:
[1] According to the latest study published in Science
[2] Statista
[3] According to a study by “Le GreenIT”
[4] Study conducted by the Enterprise Strategy Group (ESG) for MEGA International on “The Strategic Role of Data Governance and Its Evolution,” October 2022
[5] Forrester, 2016
[7] Statista, March 2023 + Statista, February 2023
[8] McKinsey, July 2020
Source: https://www.linode.com/content/cloud-block-storage-benchmarks/