In our last post, we explored how complex vendor pricing models make managing costs nearly impossible, and how to negotiate with your observability vendor to reduce your cloud bill.
Today, we’re shifting our focus away from observability vendors and toward the organizational problems that lead to runaway observability spending. Then, we’ll examine the steps you can take to improve the cost efficiency and sustainability of your observability investments.
The Problem of Ownership
As we’ve discussed, observability vendors use complex pricing models and fear-based messaging to encourage over-consumption and over-provisioning. But vendors aren’t the only ones to blame for high observability costs. Uncontrolled observability spending is often a symptom of broader systemic issues within an engineering organization.
For many organizations, observability is not a high-priority concern. It often takes a missed SLA or a massive vendor bill to bring observability to the forefront. By that point, most orgs have accrued a considerable amount of technical debt, making the path forward complicated at best, and an absolute mess at worst.
Problems tend to arise most frequently when there is no clear ownership over observability. Without centralized ownership, teams operating in silos will often adopt specialized observability products and tools with overlapping functionality. This leads to tool sprawl, redundant vendor costs and duplicate data storage and processing.
Additionally, without guidelines, individual contributors (ICs) tend to make poorly optimized decisions that can lead to wasteful resource consumption. Without standardization across an organization, data quality becomes a significant issue as well. Logging data, in particular, can get out of hand quickly. This makes it harder to respond to outages and increases your o11y bill without a proportional decrease in MTTR.
So, how can you build a healthy observability culture in your organization? Here are some practical steps from an old hand in observability that will help you do just that.
Step 0: Get buy-in and assign ownership
To sustainably lower observability costs, you need buy-in from leadership. Without buy-in, the status quo just won’t change. If and when your organization is aligned on the need to improve the state of your observability, it's time to allocate engineering resources to getting it done. Ideally, there are clearly defined, cross-cutting observability owners who are given the authority and agency to make any necessary changes.
Step 1: Establish objectives
Set some high-level objectives for your organization’s observability. Examples include a lower MTTR, reduced spending, or a consolidated toolchain.
Step 2: Implement standardization
A great place to start your organizational transformation is to introduce and enforce standards. For example, enforce a standard data format and shape for your logging data. You can do this by adopting shared logging libraries that enforce a data format (JSON is a good choice). You should also embrace open standards for log metadata and include details in each log, like:
- Service name
- Message
- Log level
- Team name
That last one is important: accountability is the only way to make changes stick. After enforcing these standards, you will have a better understanding of who is generating what volume of log data. Map each team’s log data volume to the value that team adds for a simple heuristic that shows who needs to rein in their logging.
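As a rough illustration, the four fields above could be enforced by a small shared helper built on Python's standard `logging` module. This is a minimal sketch, not a prescribed library: the names `JsonFormatter` and `build_logger`, the JSON key names, and the `checkout`/`payments` values are all assumptions made for the example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object with the standard fields."""

    def __init__(self, service: str, team: str) -> None:
        super().__init__()
        self.service = service
        self.team = team

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "service": self.service,         # service name
                "team": self.team,               # owning team, for accountability
                "level": record.levelname,       # log level
                "message": record.getMessage(),  # message with arguments applied
            }
        )


def build_logger(service: str, team: str, stream=None) -> logging.Logger:
    """Return a logger whose output always carries the required metadata.

    `stream` defaults to stderr (the StreamHandler default).
    """
    logger = logging.getLogger(f"{team}.{service}")
    logger.setLevel(logging.INFO)
    logger.propagate = False   # avoid duplicate, unformatted output via the root logger
    logger.handlers.clear()    # keep repeated calls from stacking handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, team))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Hypothetical service and team names.
    log = build_logger(service="checkout", team="payments")
    log.info("order %s submitted", 1234)
```

Because the service and team are fixed when the logger is built, individual call sites cannot forget them, which is the point of pushing the standard into a library rather than a style guide.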
Standardizing your logging data comes with the added benefit of more performant and optimized queries and indexes.
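To make the point concrete: once every line is JSON and carries a team field, a per-team volume tally takes only a few lines. The sketch below assumes newline-delimited JSON logs with a `team` key; the function name `bytes_by_team` and the `unknown`/`unparseable` buckets are hypothetical choices for illustration.

```python
import json
from collections import Counter


def bytes_by_team(lines):
    """Sum the UTF-8 size of each log line under its "team" field."""
    volume = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if isinstance(record, dict):
            team = record.get("team", "unknown")
        else:
            # Lines that are not JSON objects are exactly what the
            # standard is meant to eliminate, so track them separately.
            team = "unparseable"
        volume[team] += len(line.encode("utf-8"))
    return volume
```

Calling `bytes_by_team(open("app.log"))` on a hypothetical log file and then `.most_common()` on the result would list teams from largest to smallest volume, which is a reasonable starting point for the volume-versus-value comparison.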
Step 3: Understand the status quo
Now that you have a handle on your logging data, start cataloging the lifecycle of your existing telemetry data. Aggregate and analyze your query patterns to understand how different teams consume the data. This can be a lengthy process. Observability pipeline tools like Datable.io or Cribl provide insights into the nature and shape of your telemetry data before it hits your vendors, streamlining the process considerably.
Once you know what data is collected and how it’s used, you can categorize and prioritize the telemetry data by use case. For example, you might group the data collected and its subsequent usage by importance (mission-critical metrics and logs), sensitivity, or retention requirements. Assigning an estimated cost to each use case will provide a good sense of whether your organization’s spending is misaligned with its priorities.
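One way to sanity-check that alignment is to compute each category's share of the estimated bill. The following is a sketch under stated assumptions: the `UseCase` fields, the category labels, and the flat per-GB rate are illustrative only, and real vendor pricing is usually more complicated than volume times rate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    importance: str    # e.g. "mission-critical" or "debugging" (assumed labels)
    monthly_gb: float  # estimated monthly ingested volume
    usd_per_gb: float  # assumed blended vendor rate


def cost_share_by_importance(use_cases):
    """Return {importance: fraction of the total estimated monthly cost}."""
    totals = defaultdict(float)
    for uc in use_cases:
        totals[uc.importance] += uc.monthly_gb * uc.usd_per_gb
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: cost / grand_total for label, cost in totals.items()}
```

With made-up inputs such as `UseCase("checkout alerts", "mission-critical", 100, 0.5)` and `UseCase("verbose debug logs", "debugging", 300, 0.5)`, the debugging category accounts for three quarters of the estimated spend, the kind of mismatch this step is meant to surface.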
Step 4: Evaluate and consolidate
During the cataloging step, you may have uncovered overlapping tools with duplicate functionality. For many orgs, reducing the number of cloud services is a priority, and this is a great opportunity to start limiting tool sprawl. Users of each tool will likely champion their product as superior. When evaluating different observability tools, we think it’s important to focus on ergonomics. Which tools best fit into existing workflows? It’s difficult to get people to use a tool well if it’s unpleasant to use. Build standards around the logging frameworks, observability pipelines, and dashboarding tools that have the most buy-in, the most enthusiastic users, and the most approachable interfaces for the teams who will be using them day to day.
Step 5: Refine and maintain
Laying the groundwork with standardized practices and streamlined tools is not the end of the journey toward efficient observability. Continuous refinement and maintenance are key to ensuring sustainable observability costs and high operational efficiency. Keep your observability practices lean and effective through:
- Continuous Monitoring and Optimization: Regularly review your telemetry data usage and costs. Look for trends in data generation and query patterns that might indicate inefficiencies or areas for optimization. Tools that offer insights into data flow and usage can be invaluable here.
- Feedback Loops: Establish feedback mechanisms from the users of your observability tools to the owners. This ensures that the tools and practices remain aligned with the needs of the teams and can adapt to changing requirements.
- Training and Awareness: Keep your engineering teams informed and educated about best practices in observability. Regular training sessions, documentation, and guidelines can help maintain high data quality and efficient use of observability resources.
- Cost-Benefit Analysis: Continuously perform cost-benefit analyses to ensure that the value derived from your observability investments justifies the expense. This involves not just looking at the direct costs, but also considering the impact on MTTR, system reliability, and customer satisfaction.
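The regular review described in the first bullet can start very simply. As a minimal sketch, the function below flags months where a team's telemetry volume grew faster than a chosen threshold; the name `flag_volume_spikes` and the 25% default are assumptions, and a real review would also weigh query patterns and cost.

```python
def flag_volume_spikes(monthly_gb, max_growth=0.25):
    """Return (month_index, growth) pairs where volume grew by more than
    max_growth relative to the previous month."""
    spikes = []
    for i in range(1, len(monthly_gb)):
        previous, current = monthly_gb[i - 1], monthly_gb[i]
        if previous <= 0:
            continue  # growth is undefined without a positive baseline
        growth = (current - previous) / previous
        if growth > max_growth:
            spikes.append((i, growth))
    return spikes
```

A flagged month is a prompt for a conversation with the owning team, not proof of waste: a launch or an incident can legitimately increase volume.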
Conclusion
Overspending on observability is often a sign of deeper issues within an organization's approach to managing telemetry data and tooling. By addressing these challenges head-on through strategic ownership, standardization, tool consolidation, and continuous refinement, companies can turn observability into a cost-effective asset rather than a financial burden.
Remember, the goal of observability isn't just to have comprehensive monitoring in place, but to do so in a way that is economically sustainable and adds real value to the organization. With a thoughtful approach, you can ensure that your observability infrastructure is not only robust and scalable but also cost-effective and aligned with your business objectives.
By adopting these strategies, your organization can enhance its operational efficiency, reduce downtime, and make more informed decisions. Observability pipelines like Datable.io can jumpstart organizational efforts to improve observability by giving you a global look at your telemetry data, combined with powerful real-time data processing to clean and optimize your data before it lands with vendors. If your organization is ready to take a more proactive approach to observability, consider signing up for our waitlist here.
Until next time!