By John Giere, CEO • July 5, 2021
The most important asset of a technology business is data. When it comes to moving your business to the cloud, how you manage your data is a critical success factor. Broadly, a cloud-native solution significantly enhances your competitive position.
The benefits of this architecture are well understood: agile innovation, greater operational flexibility, faster product releases, high resiliency, and resource elasticity. What is less straightforward is how to manage your data to achieve optimal total cost of ownership (TCO) and performance in a cloud-native scenario.
When Optiva embarked on this journey, we quickly discovered that the traditional database approach was not fit for a cloud-native architecture. It was clear that no single database solution addressed both how we manage our data and how we persist it effectively. We realized that to be successful, we had to revolutionize how we addressed the management of our varying data types.
We started our quest to find the right answer by taking a close look at the existing database technology landscape. Most database solutions managed several specific classes of data and related management scenarios very well. However, none of them had the capability to manage all the required database performance scenarios found in the demanding BSS domain, such as:
- Traditional SQL database – ACID compliant, scales vertically, offers end-to-end resiliency, and depends heavily on hardware resources
- Distributed database – ACID compliant but with latency impacts; scaling may affect latency, and synchronization and data durability need to be monitored
- NoSQL database – horizontally scalable but generally limited in full ACID compliance; typically optimized for a specific data access pattern or data structure
- Managed database – different products for different use cases; portability is only partial because they are tied to dedicated infrastructures, tuning options are limited, performance and latency needs can directly drive a high TCO, and management is not in your control
Looking at all of the challenges and limitations on capability, we reset our thinking as to how we approach solving the task of persisting our myriad of data in the cloud. Instead of looking for a specific database to solve our issues in the traditional sense, we focused on solving how we manage our data in a cloud-native architecture to achieve our target business goals.
Using this methodology, we began by identifying the baseline telecom industry requirements to operate in a cloud-native mode:
- Portability – telecom operators want to have the option to deploy in their infrastructure of choice, either private, public, or hybrid cloud, and retain the flexibility to migrate between them with minimal business impact on the current mode of operation (avoid major migration projects).
- Cloud-native fundamentals of ephemerality – at first glance, the cloud fundamental of never knowing when a resource or component is going to disappear — ephemerality — seems to conflict with the telco mantra of greater-than-five-nines reliability and availability. In reality, it is a trade-off: architecting around ephemerality leads to significant gains in what matters most — serviceability.
- Scalability and high performance – 5G will demand ultra-low latency for several different use cases. The solution must account for different deployment models, such as centralized versus edge sites, enable end-to-end tuning, and abstract away the differences between dedicated infrastructures. Additionally, the solution must comply with telco-grade SLA requirements for applications and databases.
- TCO – as traffic is expected to grow exponentially due to new use cases, it is important to optimize the required data management and its associated TCO.
Where do databases correctly “fit” in a cloud-native architecture? — our experience
The cloud-native implementation of the core products at Optiva allowed us to re-think and re-architect our products to use what we refer to as our cloud-native data persistence (CDP) approach. We looked at different data storage technologies and compared them to cloud-native databases as well as managed products serving cloud infrastructure. We baselined against different data management performance metrics depending on the use case and aligned them with relevant operational scenarios, e.g., charging, billing, analytics, customer management, self-management, etc.
We developed our CDP management from these findings, which we captured into a framework based on the following principles:
#1: Tune in to portability and high performance
- Support various deployment options with the same software version — “implement once, deploy often.” The chosen data technologies need the flexibility to run on multiple infrastructures, from private or public cloud to hosted or managed infrastructure, and the ability to react to local conditions such as data center setups, intersite latency, regional distribution, etc., especially when you are architecting an active/active solution.
- Autotuning – technologies, different infrastructures, and new hardware provide new options for tuning. The key element is that the tuning is focused on the data layer to have it localized. We designed our solution to autotune to key variables, including the deployment resources, resiliency settings, and application profile. This enables us to maintain our “zero-touch” principle within our CNA.
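To make the autotuning idea concrete, the sketch below derives a few data-layer parameters from a deployment profile. The function, parameter names, and ratios are hypothetical placeholders chosen for illustration, not Optiva's actual tuning rules:

```python
def autotune(cpu_cores: int, memory_gb: int, replicas: int) -> dict:
    """Derive illustrative data-layer settings from the deployment profile.

    The ratios below are placeholder heuristics, not real product defaults.
    """
    return {
        # size worker pools to the available cores
        "worker_threads": max(2, cpu_cores * 2),
        # reserve a fraction of memory for caches
        "cache_mb": int(memory_gb * 1024 * 0.25),
        # write quorum follows the replica count chosen for resiliency
        "write_quorum": replicas // 2 + 1,
    }

print(autotune(cpu_cores=8, memory_gb=32, replicas=3))
```

Because every setting is computed from the deployment itself, the same software version can land on a small edge site or a large central site without manual retuning, which is what keeps the "zero-touch" principle intact.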
#2: Pay attention to cloud resource availability and reliability
- Don’t forget the unexpected resource removal – database technology needs to operate in a container orchestration environment, acknowledging that any resource can unexpectedly disappear. Designs must therefore build in resiliency from the resource level up.
- Verify the algorithms on various non-functional behaviors – the resiliency algorithms for node recovery, scaling in and out, and upgrades are often the same. Give special focus to ensuring zero business impact during any maintenance activity or failover scenario. All of these unexpected events will happen, and a true cloud database will continue with no interruption in serviceability.
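As a toy sketch of the principle (not production code, and not how any particular database implements it), the store below writes each key to several replicas on a fixed placement list, so a node that the orchestrator unexpectedly reclaims causes no loss of serviceability:

```python
class ReplicatedStore:
    """Toy key-value store: each key is written to `rf` replicas chosen
    from a fixed placement list, so losing any single node still leaves
    live copies to serve reads (illustrative only)."""

    def __init__(self, nodes, rf=3):
        self.placement = sorted(nodes)      # fixed placement order
        self.live = {n: {} for n in nodes}  # nodes still running
        self.rf = rf

    def _replicas(self, key):
        start = hash(key) % len(self.placement)
        return [self.placement[(start + i) % len(self.placement)]
                for i in range(self.rf)]

    def put(self, key, value):
        for n in self._replicas(key):
            if n in self.live:              # skip nodes already gone
                self.live[n][key] = value

    def get(self, key):
        for n in self._replicas(key):
            if n in self.live and key in self.live[n]:
                return self.live[n][key]
        return None                         # all replicas lost

    def kill(self, node):
        # simulate the orchestrator unexpectedly reclaiming a node
        self.live.pop(node, None)

store = ReplicatedStore(["n1", "n2", "n3", "n4"], rf=3)
store.put("subscriber-42", {"balance": 100})
store.kill("n2")                            # a node vanishes...
print(store.get("subscriber-42"))           # ...reads still succeed
```

The same replication machinery that survives a node failure is what makes rolling upgrades and scale-in look identical from the application's point of view, which is why the non-functional behaviors above can share one algorithm.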
#3: Remember to address elasticity and latency SLA
- Elasticity of distributed setup – high-availability systems need to have clear architecture on scalability, resilience, and data durability. To ensure you meet the SLA, make sure to:
– Choose your “sync” model carefully to achieve performance and resiliency.
– Change the database access pattern to tune the queries to achieve low latency.
– Remove data that is not needed for your core, decompose it, orchestrate it via a message bus, and move to near-real-time, cloud-native app processes.
– Design resiliency so that the probability of failure is balanced against the utilization of database resources.
- Look out for latency when scaling – distributed data layers have partitioning mechanisms that do the corresponding split to resources with different algorithms available. Verify which ones fit you best and how you can tune scaling to avoid impact on latency.
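One widely used partitioning scheme is consistent hashing, where adding a node moves only the share of keys the new node claims rather than reshuffling everything. The sketch below is a simplified illustration of the idea, not any vendor's actual algorithm:

```python
import hashlib
from bisect import bisect_right

def _h(s: str) -> int:
    # stable hash (md5 is deterministic across runs, unlike Python's hash)
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.ring = sorted((_h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))

    def owner(self, key: str) -> str:
        hashes = [h for h, _ in self.ring]
        idx = bisect_right(hashes, _h(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"subscriber-{i}" for i in range(10_000)]
before = ConsistentRing(["n1", "n2", "n3"])
after = ConsistentRing(["n1", "n2", "n3", "n4"])
moved = sum(before.owner(k) != after.owner(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # only n4's share moves
```

Keeping the moved fraction small is exactly what limits the latency impact of a scale-out: most requests keep hitting warm nodes while only the migrated partitions pay the cost.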
#4: Cost balancing act — ensure the best TCO
- Decide on the impact of lost resources – distribute resources to ensure the best resiliency and define the maximum tolerable impact (e.g., lose 30% of resources or a full zone and continue handling 100% of traffic). Define the RTO/RPO for restoring resources and align them with the capabilities your infrastructure provides.
- Maintain attention on managed resources – you may need additional resources, such as extra storage, to meet your SLA. If IOPS becomes the limiting factor, further scaling may only be possible by provisioning dedicated capacity, which drives up cost.
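The arithmetic behind such a resiliency budget is simple. As a hedged sketch (the 30% figure echoes the example above, and `required_nodes` is a hypothetical helper, not a product API):

```python
import math

def required_nodes(base_nodes: int, tolerated_loss: float) -> int:
    """Nodes to provision so that after losing `tolerated_loss` of them,
    the survivors still cover the baseline capacity."""
    return math.ceil(base_nodes / (1 - tolerated_loss))

# To keep handling 100% of traffic through a 30% resource loss,
# a 10-node baseline must be provisioned with 15 nodes:
print(required_nodes(10, 0.30))  # → 15
```

That headroom is a direct TCO line item, which is why the tolerated loss, RTO/RPO, and infrastructure capabilities need to be decided together rather than in isolation.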
Do databases become part but not all of your data persistence solution?
The concept of “databases” will, of course, continue to exist; however, they do not form the complete solution to your data management requirements. Our view is that architecting your data management around the principles that underpin a cloud-native solution must be the first phase of the journey, while still leveraging varying data management technologies, including the myriad of database solutions. After all, data is your most valuable asset, and proper management of it is vital to fully leveraging the benefits of the cloud.
Have feedback or questions for the author? Contact John Giere, CEO, Optiva
Discover more! Read Cloud: Lessons Learned From the Front Lines.