Data ingestion into security information and event management (SIEM) platforms is too expensive. In fact, it’s so expensive that “How do we reduce our SIEM ingest costs?” is one of the top inquiry questions I get from Forrester clients. And the problem is not new: security leaders have struggled to manage their SIEM budgets for over a decade.

Visibility without actionability is an expensive waste of time.

Rising SIEM spend is driven by a few factors. First, the shift to the cloud produced more data to ingest and store. To scale with the rate of ingest, SIEM vendors moved their offerings to the cloud, a shift that necessitated ingest-based pricing to offset their costs. But most importantly, the crux of SIEM cost challenges is the belief that more data in the SIEM is better. Security is a big data problem, right? More data, more visibility, more insights … right?

Not quite. Data, and the visibility into it, is meaningless without actionability. Data is brought into the SIEM to meet compliance requirements and to alert on potential attacker activity. To alert on attacker activity, a human being needs to build a rule. Visibility into the data is only half the battle. You could have all the visibility in the world, but without those rules, you will not find attackers consistently or in an automated way.
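
To make the point about rules concrete, here is a minimal Python sketch of a threshold rule that flags a source IP after repeated failed logins. The `AuthEvent` shape, the `brute_force_rule` name, and the five-failures-in-five-minutes threshold are illustrative assumptions, not any SIEM’s rule language; real platforms express this logic in their own query or detection syntax.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class AuthEvent:
    # Hypothetical normalized authentication event; field names are assumptions.
    timestamp: float  # seconds since epoch
    src_ip: str
    user: str
    success: bool


def brute_force_rule(events, threshold=5, window_seconds=300):
    """Return source IPs with >= threshold failed logins inside a sliding window.

    Threshold and window size are illustrative defaults, not recommendations.
    """
    failures = defaultdict(deque)  # src_ip -> timestamps of recent failures
    alerts = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.success:
            continue
        recent = failures[event.src_ip]
        recent.append(event.timestamp)
        # Discard failures that have fallen out of the sliding window.
        while recent and event.timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.add(event.src_ip)
    return alerts
```

The rule only fires if the fields it reads (source IP, outcome, timestamp) were ingested; data that no rule or compliance requirement reads adds cost without adding detection.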

Instead, we recommend focusing your ingest on what matters most for compliance and alerting. But that isn’t always easy, because:

  • Logs have extra fields that you don’t always need.
  • Log structure changes over time and differs between vendors.
  • You want some logs to go to one datastore and others elsewhere.
  • You may want to redact data for privacy reasons.
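
Several of these challenges (extra fields, inconsistent vendor structures, privacy) can be illustrated with a minimal sketch of a single pipeline step. The vendor field mappings, the allowlist, the `reduce_and_redact` name, and the HMAC-based pseudonymization are all hypothetical choices for illustration; real DPM tools provide their own configuration for these operations.

```python
import hashlib
import hmac

# Hypothetical per-vendor field mappings; real vendor field names will differ.
FIELD_MAPS = {
    "vendor_a": {"srcip": "src_ip", "usr": "user", "act": "action"},
    "vendor_b": {"source_address": "src_ip", "username": "user", "event_action": "action"},
}

# Fields kept for detection and compliance (an illustrative allowlist).
KEEP_FIELDS = {"timestamp", "src_ip", "user", "action"}

# Hypothetical key; in practice this would come from a secrets manager.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str) -> str:
    """Replace a value with a stable keyed digest so it can still be correlated."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def reduce_and_redact(raw: dict, vendor: str) -> dict:
    mapping = FIELD_MAPS.get(vendor, {})
    # Normalize: rename vendor-specific fields to one common set of names.
    normalized = {mapping.get(key, key): value for key, value in raw.items()}
    # Reduce: drop every field that is not on the allowlist.
    reduced = {key: value for key, value in normalized.items() if key in KEEP_FIELDS}
    # Redact: pseudonymize the user identifier for privacy.
    if "user" in reduced:
        reduced["user"] = pseudonymize(str(reduced["user"]))
    return reduced
```

Because the digest is keyed and deterministic, the same user maps to the same pseudonym regardless of which vendor’s log it came from, so correlation across sources still works after redaction.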

Further, once indexed, data can often grow to 3–5x its original size. SIEM vendors can address some of these challenges, but their capabilities tend to be limited and cumbersome to use. In particular, vendors haven’t built effective tools for log size reduction or routing, since doing so directly opposes their own interest: getting you to ingest more data into their platforms and, therefore, spend more money with them.

Data pipeline management tools reduce data preparation.

This is where data pipeline management (DPM) tools for security come in. DPM tools can route, reduce, redact, enrich, or transform data. A purpose-built data pipeline tool reduces the data preparation needed to interpret the streams of data and events that matter for security insights. As systems become increasingly distributed and disparate, such a tool is designed to address the complexity of classifying, integrating, and modeling data for analysis.
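
One rough way to picture how a DPM tool chains these operations is as a sequence of stages, each of which can transform, enrich, or drop an event. This Python sketch assumes hypothetical stage functions (`drop_noise`, `enrich`) and an in-memory asset inventory; it illustrates the concept rather than how any particular product is implemented.

```python
import ipaddress
from typing import Callable, Iterable, List, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]

# Hypothetical asset inventory used for enrichment.
ASSET_INVENTORY = {"10.0.0.5": {"owner": "finance", "criticality": "high"}}


def drop_noise(event: Event) -> Optional[Event]:
    # Filter stage: discard health-check events entirely (an illustrative rule).
    return None if event.get("action") == "healthcheck" else event


def enrich(event: Event) -> Event:
    # Enrichment stage: add network and asset context to a copy of the event.
    enriched = dict(event)
    ip = event.get("src_ip")
    if ip:
        enriched["src_is_private"] = ipaddress.ip_address(ip).is_private
        enriched.update(ASSET_INVENTORY.get(ip, {}))
    return enriched


def run_pipeline(events: Iterable[Event], stages: List[Stage]) -> List[Event]:
    out = []
    for event in events:
        current: Optional[Event] = event
        for stage in stages:
            current = stage(current)
            if current is None:  # a stage dropped the event
                break
        if current is not None:
            out.append(current)
    return out
```

Keeping each stage a small function makes it easy to reorder stages, for example dropping noise before enrichment so that no lookup work is spent on events that will be discarded.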

Security teams get immediate value from these tools’ ability to reduce log sizes and, thus, ingest costs. In the longer term, however, much of the value comes from storage tiering or data routing: the ability to redirect data to the storage location of your choice. For example, short-term data valuable for incident response can be routed directly to extended detection and response (XDR), while data kept for compliance requirements can be directed to cheaper, longer-term storage. This can be useful across the business, especially for organizations that have data storage requirements for different use cases such as compliance, detection and response, or IT.
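
Storage tiering by use case can be sketched as a routing table that sends each event category to one or more destinations. The category names, the `xdr` and `archive` destination labels, and the default-to-archive policy below are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical routing table: event category -> destinations.
ROUTES = {
    "authentication": ["xdr", "archive"],  # detection now, retention for compliance
    "process": ["xdr"],                    # short-term detection and response only
    "audit": ["archive"],                  # compliance only: cheaper long-term storage
}
# Unknown categories go to cheap storage rather than the expensive tier.
DEFAULT_DESTINATIONS = ["archive"]


def route(events):
    """Group events into per-destination batches according to ROUTES."""
    batches = defaultdict(list)
    for event in events:
        for destination in ROUTES.get(event.get("category"), DEFAULT_DESTINATIONS):
            batches[destination].append(event)
    return dict(batches)
```

Defaulting unmapped data to the cheap tier is one possible policy; it keeps new or unexpected sources from silently inflating the expensive tier until someone decides they are worth detecting on.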

When it comes to DPM tools for security, Cribl is one of the earliest to market and the most widely used, but others such as Tenzir, Tarsal, DataBahn, Calyptia, observIQ, and Observo AI are also built to manage data pipelines for security use cases. Some SIEM and XDR vendors are also building more robust data pipeline management capabilities, such as Splunk’s Data Management Pipeline Builders and CrowdStrike’s CrowdStream (which leverages Cribl).

Generic DPM tools lack security-specific context.

Data pipeline management tools are not new. Your enterprise likely uses them already, especially on the data team, but those tools are probably not built for security, which makes them cumbersome for the security team to retrofit. For example, transforming data to align with a standard like the Open Cybersecurity Schema Framework (OCSF) becomes more difficult when a generic tool doesn’t support the framework. Generic tools may also lack integrations with the security tools you need.
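
To show what aligning to OCSF involves, here is a sketch that maps an already-normalized login event onto an OCSF Authentication event. The class, category, activity, status, and severity values reflect a reading of the OCSF schema and should be checked against the schema version you target; the input field names, the `to_ocsf_authentication` name, and the schema version string are assumptions.

```python
from datetime import datetime

# Assumed OCSF values; verify against the schema version you target.
OCSF_SCHEMA_VERSION = "1.1.0"  # assumption
AUTHENTICATION_CLASS_UID = 3002
IAM_CATEGORY_UID = 3  # Identity & Access Management
ACTIVITY_LOGON = 1
STATUS_SUCCESS, STATUS_FAILURE = 1, 2
SEVERITY_INFORMATIONAL = 1


def to_ocsf_authentication(event: dict, product: str, vendor: str) -> dict:
    """Map a hypothetical normalized login event to an OCSF-style dict."""
    # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    return {
        "class_uid": AUTHENTICATION_CLASS_UID,
        "category_uid": IAM_CATEGORY_UID,
        "activity_id": ACTIVITY_LOGON,
        # type_uid combines class and activity: class_uid * 100 + activity_id.
        "type_uid": AUTHENTICATION_CLASS_UID * 100 + ACTIVITY_LOGON,
        "time": int(ts.timestamp() * 1000),  # epoch milliseconds
        "severity_id": SEVERITY_INFORMATIONAL,
        "status_id": STATUS_SUCCESS if event["success"] else STATUS_FAILURE,
        "user": {"name": event["user"]},
        "src_endpoint": {"ip": event["src_ip"]},
        "metadata": {
            "version": OCSF_SCHEMA_VERSION,
            "product": {"name": product, "vendor_name": vendor},
        },
    }
```

A security-specific pipeline tool would typically ship mappings like this for common sources; with a generic tool, the security team would have to write and maintain each mapping itself.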

That said, Forrester will release research on data use case crossover and consolidation in upcoming reports.

In December, I’ll be speaking on security data management strategies at Forrester’s Security & Risk Summit in Baltimore, Maryland. Come join us and get your questions answered!

In the meantime, if you have any questions about data pipeline management for security and IT, request an inquiry or guidance session with me or one of my colleagues.