3 posts tagged with "Super Data Swamp"

Mis-managed Data Lakes and Lakehouses will result in a Super Swamp

Data - Asset vs Liability?

· 4 min read
Dan Peacock
Chief Hustler

At its core, an asset is something that creates value and drives growth—directly or indirectly. It should strengthen resilience, fuel innovation, and deliver competitive advantage.

Now consider your data: Is your Enterprise Data Platform positioned as an asset, or is it on track to become a liability?

As your business expands, the foundations you set for your data become critical. A true data asset reduces reliance on ever-changing applications and removes the need for specialist technical skills just to interpret the numbers.

If your data is fragmented, locked inside specific applications, or difficult to access and trust, it isn’t working for you—it’s working against you. That’s a liability. An asset, by contrast, is structured, accessible, and independent of the tools that produce or consume it. It supports better decisions, scales with growth, and enables innovation without compromise.

The challenge is clear: applications are changing faster than ever. Modern users adopt and discard tools in rapid cycles, and every swap or upgrade risks breaking your data foundations if they’re tightly coupled to the source systems.

A future-ready data platform solves this. By decoupling your data from the tools, it preserves continuity and consistency, no matter which applications are in play. This agility ensures innovation can move quickly—without undermining governance or data integrity.

So the question is: which path is your organisation on—the liability route, or the asset path?

Liability

Whenever an application changes—whether through a full replacement or a version upgrade—it sends ripples through your entire data architecture. Pipelines must be rebuilt, and data engineers are pulled in to reconstruct ingestion processes, transformation layers, and business logic from the ground up.

The result is often a proliferation of new data marts and products, detached from historical context. Instead of strengthening your data asset, you duplicate effort and fragment insight.

Take the shift from SAP ECC to SAP HANA. What appears to be a straightforward technical migration usually involves redesigning the data model, remapping source tables, rewriting logic, and redefining KPIs. The outcome? Disrupted continuity, redundant datasets, and an ever-growing maintenance burden.

Without decoupling your data from the systems that produce it, every upgrade becomes a costly rebuild—leaving you with a patchwork of short-term fixes instead of a durable, scalable data platform.

Asset

Your data isn’t bound to the applications that generate it. Instead, you capture what really matters—business-relevant data—through modular, application-specific plug-ins or APIs. These plug-ins extract and map data into a stable, reusable enterprise model that remains consistent no matter which tools your teams adopt.

When a new application is introduced, it’s simply a matter of mapping. You configure the plug-in to align with your established business definitions and structures, and once connected, data continues to flow—intact, governed, and ready for use—without disrupting downstream analytics or products.

This makes system changes a non-event. Because your data already lives in a durable, enterprise-wide model, moving from ECC to HANA, for example, is just a matter of mapping business objects into the model. Your KPIs, definitions, and logic remain unchanged.
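To make the idea concrete, here is a minimal sketch of the plug-in pattern described above. The record shape and SAP field names are illustrative assumptions, not CryspIQ's actual model; the point is that only the mapping changes when the application does:

```python
from dataclasses import dataclass

# Hypothetical stable enterprise record; the fields are illustrative only.
@dataclass
class InvoiceRecord:
    invoice_id: str
    vendor: str
    amount: float

def ecc_plugin(row: dict) -> InvoiceRecord:
    # SAP ECC exposes technical column names (e.g. BELNR for the document
    # number); the plug-in's only job is to map them into the stable model.
    return InvoiceRecord(row["BELNR"], row["LIFNR"], float(row["WRBTR"]))

def hana_plugin(row: dict) -> InvoiceRecord:
    # After an ECC-to-HANA migration, only this mapping is rewritten;
    # every downstream KPI still consumes InvoiceRecord unchanged.
    return InvoiceRecord(row["InvoiceNumber"], row["Supplier"], float(row["Amount"]))
```

Because both plug-ins emit the same `InvoiceRecord`, anything built on that model (reports, KPIs, products) never notices the swap underneath.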

No rebuilds. No duplication. No loss of continuity. Your data asset stays consistent, reliable, and immediately usable—regardless of the underlying application.

That’s the power of a true Asset: system changes don’t break your business logic—they simply plug in and keep going.

Outcome

We believe the trajectory of every data platform is determined by the data warehousing methodology it follows. Traditional, application-centric approaches inevitably result in fragmentation, duplication, and escalating maintenance costs. In contrast, a business-centric, model-driven methodology builds a true data asset—delivering continuity, scalability, and lasting value, no matter how often applications change.

That’s why we built CryspIQ®—to empower you to turn data into a long-term asset, avoid the liability pitfalls, and lead your business with confidence. Sign-up

Unpacking Buzzwords.

· 7 min read
Dan Peacock
Chief Hustler

"Lakehouses", "Lakebases", "Meshes", and "Medallion Architectures": whichever buzzword is in play, it's essential to understand the underlying methodology, because all of these follow the same foundational Data Lake approach, yet the core business question often remains unanswered or simply assumed. Before committing to a data journey that typically spans 3 to 5 years and costs in excess of $25 million (an approach frequently promoted by industry quadrants that shape strategic architecture decisions), it's worth considering the following points:

Replication

Most data solutions today are fundamentally flawed because they rely on rigid, complex data pipelines that replicate or copy data from source systems into a central data lake or warehouse. While this may have made sense a decade ago, in today’s AI-driven landscape, it introduces major inefficiencies. These pipelines create an additional layer of infrastructure that is not only expensive to build and maintain, but also scales poorly as data volumes — and AI workloads — continue to grow.

The world is generating unprecedented volumes of data, across every function and system, yet the traditional approach treats all data equally: copy it, store it, and hope it becomes useful later. The result? Huge costs, significant latency, and teams stuck managing infrastructure instead of driving insights. In the age of AI, we need a fundamentally different approach.

Bottom line: Are we confident this is still the right approach? Should we be focusing more on business-relevant data, rather than potentially including unnecessary noise?

Catalogues

Cataloguing data that has simply been replicated from source systems requires specialist knowledge. Most business users interact with the system through the user interface and aren’t familiar with the underlying database fields or their purposes. Proper data governance, however, demands classification down to the field level. Yet, this is often seen as impractical, leading teams to default to table-level classification instead. True field-level classification requires a clear understanding of:

  • The type of data (master, reference, or transactional)
  • Whether the data is subject to privacy regulations
  • The sensitivity of the data within the organisation
  • Any applicable legal or regulatory requirements (e.g. SOCI compliance)
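As a sketch of what a field-level catalogue entry might capture, the four points above could be recorded per field roughly as follows. The schema, enum values, and the example entry are assumptions for illustration, not a prescribed standard:

```python
from dataclasses import dataclass, field
from enum import Enum

class DataType(Enum):
    MASTER = "master"
    REFERENCE = "reference"
    TRANSACTIONAL = "transactional"

@dataclass
class FieldClassification:
    field_name: str
    data_type: DataType            # master, reference, or transactional
    privacy_regulated: bool        # subject to privacy regulations?
    sensitivity: str               # e.g. "public", "internal", "restricted"
    regulations: list = field(default_factory=list)  # e.g. ["SOCI"]

# A hypothetical entry for a vendor master-data field:
vendor_field = FieldClassification(
    "LIFNR", DataType.MASTER, True, "internal", ["SOCI"]
)
```

Multiplying even this small record by thousands of replicated source fields illustrates why teams retreat to table-level classification.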

Despite this, the industry focus has shifted toward identifying so-called “critical” data. In reality, around 80% of the data is irrelevant, kept only for “just in case” scenarios. This often leads business users to question the value of the effort—especially when more immediate concerns are competing for attention. Without clearly defined data ownership, securing business engagement and buy-in becomes a significant challenge.

Bottom line: Perhaps we should consider whether it’s reasonable to expect business domains to catalogue data they may have never encountered before. This might be a knock-on effect of the broader Data Lake approach, and could be worth re-evaluating.

Products

Building data products involves identifying the data that’s truly relevant to the business and transforming it into curated datasets — essentially modelled, tabular views — that can be shared and reused across business units. This process requires domain-specific data engineering and modelling expertise, which is increasingly scarce and expensive. Examples of such modelled data products include:

  • Invoice data product from Finance – SAP
  • Work Order data product from Finance – SAP
  • Purchase Order data product from Finance – SAP
  • Vendor Contract data product from Procurement – SAP
  • Work Task data product from IT – JIRA
  • Customer Survey data product from Customer team – Website
  • Customer Activity data product from Customer team – CRM
  • Asset Maintenance Plan data product from Asset team – SAP
  • Asset Inspection data product from Operations – Inspection App
  • Asset IoT data product from Asset team – Sensor Apps

Bottom line: Data products—such as business objects from applications—are ultimately an interpretation of what we believe the customer needs. Often, they’re shaped by assumptions, and while well-informed, they may still represent our best guess.

Layers

Controlling the duplication of master data across business domains becomes nearly impossible with the democratisation of data. As each domain takes ownership of its own data, multiple systems often end up holding versions of the same master data—raising questions about which source is truly authoritative. This challenge is further complicated by the common practice of replicating source data into a data lake.

Typically, the Medallion Architecture consists of three layers:

  • Raw Layer – a direct copy of the source data
  • Transform Layer – where basic cleansing and minor transformations occur
  • Curated Layer – where modelled, business-ready data (final data products) is made available to users
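The three layers above can be sketched as a toy pipeline. The cleansing and curation logic here is deliberately trivial and purely illustrative, not a real implementation:

```python
def raw_layer(source_rows):
    # Raw: a direct, untouched copy of the source data.
    return [dict(r) for r in source_rows]

def transform_layer(rows):
    # Transform: basic cleansing, e.g. trimming whitespace from strings.
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]

def curated_layer(rows):
    # Curated: a modelled, business-ready view, e.g. spend per vendor.
    totals = {}
    for r in rows:
        totals[r["vendor"]] = totals.get(r["vendor"], 0.0) + r["amount"]
    return totals

source = [{"vendor": " ACME ", "amount": 100.0},
          {"vendor": "ACME", "amount": 50.0}]
curated = curated_layer(transform_layer(raw_layer(source)))
# curated == {"ACME": 150.0}
```

Note that the raw and transform layers exist only to feed the curated one; the question in the bottom line below is whether that intermediate storage earns its keep.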

Bottom line: Is it worth reconsidering whether we truly need three layers, especially if a single, well-designed layer could meet our needs? Traditional approaches may not always align with the agility required in today’s AI-driven landscape.

Usage

Using these data products assumes the business user will consult the data catalogue to understand each product’s contents and decide which are relevant. This requires a degree of technical proficiency—to connect to datasets, extract and combine the data, and integrate it with other products they deem useful. The user must also be able to prepare and link the data to carry out meaningful analysis. In the end, this still results in descriptive, “what happened” analytics, limited by the predefined structure and scope of the available data products.

Bottom line: It might be worth reflecting on how the responsibility for building queries, reports, and dashboards has shifted toward the business. Is this the most effective setup, or is there an opportunity to rebalance the support between IT and business teams?

Outcome

When you take a step back and look at this linear, phased approach, it quickly becomes clear: it offers no real guarantee of business value at the end. There’s a significant upfront investment — in time, money, and effort — with little to show for it along the way. It’s a high-risk model built on deferred outcomes. And let’s be honest, that’s the opposite of Agile.

So, we challenge you to think differently—by focusing on what the business truly needs:

  • Fewer data processing layers
  • To solve real use cases or business problems
  • Reduced reliance on specialist technical skillsets
  • Unrestricted access to data across the organisation
  • High-quality data delivered with the right business context
  • The ability to answer all analytical questions, not just a limited set of predefined ones
  • To avoid the super swamp!

Bottom line: Considering natural human behaviour—and the growing role of AI—there’s a real risk of drifting into a complex and overwhelming data landscape. But if that’s the direction things are heading, there’s still time to course-correct. We’d be happy to help you explore a clearer, more sustainable path forward. Join us

What's a Super Swamp?

· 2 min read
Dan Peacock
Chief Hustler

The prospect of Data Lakehouses becoming Super Data Swamps highlights the strategic risk organisations face when modern data platforms scale without proper governance or value discipline.

The application of data governance often lags well behind the enthusiasm for filling data lakes and lakehouses. With the rapid adoption of AI, that gap is no longer theoretical—it’s becoming a costly and unavoidable reality for many organisations.

As data volumes explode and AI models depend on that data for high-stakes decisions, the absence of governance introduces serious risks—from degraded insight quality to regulatory exposure. Without a clear framework for trust, control, and accountability, the promise of AI quickly turns into a minefield of uncertainty.

After speaking with a few people about this, one comment stuck with me: “We’ve gone from Data Lakes… to Data Hubs… to Data Lakehouses… and now to Super Swamps.” It made my day—not because it’s funny (though it is), but because it’s true.

More data doesn’t equal better AI. In fact, the opposite is often true: quality over quantity is what drives real intelligence.

In short: it's time to change the methodology.