Technology

In his seminal article “Why Software Is Eating The World” 1 Marc Andreessen observes that software is eating much of the value chain of industries that primarily exist in the physical world. For example, in today’s cars, software runs the engine, controls safety features, entertains passengers and can even drive the car autonomously. The direct consequence of this trend is that we face an ever-growing flow of changes to support new business ambitions, along with an increasing rate of change in technologies.

But we also have another problem to deal with. Unless you work for a fairly young company, your Information System is likely the result of decades of changes, and it has probably evolved in a rather disorderly way because of time and cost constraints, for instance. Not to mention that, per Conway’s Law, your Information System is more than likely structured like your organisation, possibly in silos with complex integrations. Every day we witness the complex structure generated by the interweaving data flows between applications. Such complex information systems are strongly coupled, difficult to evolve, costly to maintain and subject to failures. Here the resistance of the system to change is at its maximum: it becomes harder and harder to deploy new features at the expected rate of change, maintaining the system is costly, technical debt grows and the risk of failure increases, with negative impacts on quality of service.

These two phenomena combined are quite explosive. Our Continuous Architecture Framework therefore tries to help you deal with these profound changes by focusing on four areas:

  • Architecting for evolvability and modularity
  • Architecting for scalability and resilience
  • Architecting for continuous delivery and operability
  • Symbiotic coupling of machines and humans 2

You can see them as principles forming the basis we believe architects should keep in mind when making their decisions. All in all, they are designed to help architects design systems able to adapt, in a sustainable way, to future needs that we don’t know yet.

Architecting for evolvability and modularity

Modularity is key, as monoliths impede agility. By increasing isolation between software components, we can deliver parts of the system both rapidly and independently. But improving agility is not the only benefit of modularity. Jim Gray, in his study “Why Do Computers Stop and What Can Be Done About It?”, one of the first fundamental papers on fault tolerance in distributed systems, discusses the origins and implications of failures and what should be done to design resilient, fault-tolerant computer systems. His key ideas for achieving high availability are modularity and redundancy. With modularity, a failure in one module affects only that module (isolation). With redundancy (also called process pairs in Jim Gray’s paper), if a module fails you give the illusion of instantaneous repair, meaning that the MTTR (Mean Time To Repair) is zero. And if the MTTR is zero, availability is 100%, always on (since availability = MTBF / (MTBF + MTTR)).
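
To make that availability formula concrete, here is a minimal worked sketch; the MTBF and MTTR figures are made up purely for the sake of the example.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A module that fails on average every 1,000 hours and takes 2 hours to repair:
print(f"{availability(1000, 2):.4%}")   # 99.8004%

# Redundancy gives the illusion of instantaneous repair (MTTR -> 0):
print(f"{availability(1000, 0):.4%}")   # 100.0000%
```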

Event-Driven Architecture has become quite popular these days and is indeed an efficient approach to designing our products with minimal coupling. It also promotes flexible composition of software components, which is quite valuable when dealing with unexpected changes. The microservices pattern is also used more and more to implement products, as it makes it easier to deploy each service redundantly and, for instance, to leverage the self-healing features of Kubernetes platforms to give the illusion of a zero MTTR. Composition and integration between the different building blocks are increasingly driven by APIs, both asynchronous (Kafka topics, for example) and synchronous (mostly REST APIs), with event-driven APIs as the first choice.
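
To make the event-driven style concrete, here is a minimal sketch of a service publishing a domain event to a Kafka topic using the confluent-kafka Python client; the broker address, topic name and event shape are illustrative assumptions, not prescriptions.

```python
import json
from confluent_kafka import Producer  # assumes the confluent-kafka package is installed

# Broker address and topic name are illustrative assumptions.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish_order_placed(order_id: str, amount: float) -> None:
    """Publish an OrderPlaced domain event; consumers subscribe independently."""
    event = {"type": "OrderPlaced", "order_id": order_id, "amount": amount}
    producer.produce("orders.events", key=order_id, value=json.dumps(event))
    producer.flush()  # block until the broker has acknowledged the event

publish_order_placed("42", 99.90)
```

Note that the producer knows nothing about its consumers: they subscribe to the topic independently, which is precisely what keeps the coupling minimal.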

Microservices architecture, event-driven architecture, modularity: all are software architecture principles or patterns that have been around for a long time, yet they are not commonly mastered. To help architects design systems using these patterns, we recommend leveraging the Domain-Driven Design (DDD) methodology. Among the many practices and tools DDD offers, here are the ones we have used the most so far:

  • Bounded contexts help move away from the monolith and design a cohesive domain model for each business microservice. They force you to think in terms of vertical decomposition instead of horizontal decomposition.
  • Strategic patterns help describe the flow of data models and the relationships between business domains and bounded contexts. They make explicit where we need flexibility and where we need rigidity in our information system.
  • Event-storming workshops help identify domain events and model the behavior of each business domain (see the sketch just after this list).
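
As a very small illustration of the output of such a workshop, the sketch below models a hypothetical OrderPlaced domain event (the same example used above) and the aggregate that records it inside an ordering bounded context; every name is an assumption made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical "ordering" bounded context: all names are illustrative, not prescriptive.

@dataclass(frozen=True)
class OrderPlaced:
    """Domain event identified during an event-storming workshop."""
    order_id: str
    amount: float
    occurred_at: datetime

@dataclass
class Order:
    """Aggregate root of the ordering bounded context."""
    order_id: str
    amount: float
    events: list = field(default_factory=list)

    def place(self) -> None:
        # Record the business fact as an event instead of calling other contexts directly.
        self.events.append(OrderPlaced(self.order_id, self.amount, datetime.now(timezone.utc)))

order = Order(order_id="42", amount=99.90)
order.place()
print(order.events[0])  # the recorded event can then be published to other contexts
```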

Architecting for scalability and resilience

While redundancy and modularity can help improve resilience, by themselves they are not sufficient. We need to consider resilience for what it is: a technique that aims to design systems that can recover from failures, including the big ones we should expect. Scalability and resilience come together because in today’s world (“think web scale”) we must be able to sustain unexpected workloads of an order of magnitude we are not used to. This encompasses the capacity to elastically adapt the running infrastructure to the workload the end-to-end system is actually dealing with. It means we need to scale up and down on demand so the system stays responsive regardless of usage conditions, while optimizing the running cost. That is only possible if the solution is highly observable, so we can detect the deterioration of key performance metrics.
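
As a toy illustration of that feedback loop, the sketch below derives a desired replica count from an observed latency metric; the target, limits and scaling rule are assumptions made for the example, and a real platform (a Kubernetes horizontal autoscaler, for instance) implements this far more robustly.

```python
# Toy scaling decision: the target, thresholds and limits are illustrative assumptions.
def desired_replicas(current_replicas: int, observed_p95_ms: float,
                     target_p95_ms: float = 200.0,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Scale the replica count proportionally to how far latency drifts from its target."""
    ratio = observed_p95_ms / target_p95_ms
    return max(min_replicas, min(max_replicas, round(current_replicas * ratio)))

print(desired_replicas(4, observed_p95_ms=450.0))  # load spike -> scale out to 9
print(desired_replicas(4, observed_p95_ms=90.0))   # quiet period -> scale back in to 2
```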

The complexity induced by the distributed nature of the systems we’re building re-emphasizes how critical it is to design solutions that are resilient to failures. More than ever, our solutions rely on multiple pieces of software running on and off premises, and use a ton of middleware and other technical capabilities to get things done. Here we have to change the mindset of our teams: they need to accept failures as facts (not probabilities) so they can design solutions able to survive them and return to normal behavior in an automated fashion.
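
A small example of that mindset: rather than assuming a downstream call will succeed, treat its failure as a normal case and recover automatically. The sketch below retries with exponential backoff; the retry budget and delays are illustrative assumptions, and production systems typically combine this with timeouts, circuit breakers and idempotent operations.

```python
import random
import time

def call_with_retry(operation, attempts: int = 4, base_delay_s: float = 0.2):
    """Call `operation`; on failure, wait with exponential backoff and jitter, then retry."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # retry budget exhausted: surface the failure to the caller
            # exponential backoff plus jitter to avoid synchronized retry storms
            time.sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.1))

# Usage: wrap any unreliable downstream call, for example:
# call_with_retry(lambda: orders_client.get("/orders/42"))
```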

The Reactive Manifesto puts it all together, and we recognize ourselves in the way it defines the changes we need to adapt to:

These changes are happening because application requirements have changed dramatically in recent years. Only a few years ago a large application had tens of servers, seconds of response time, hours of offline maintenance and gigabytes of data. Today applications are deployed on everything from mobile devices to cloud-based clusters running thousands of multi-core processors. Users expect millisecond response times and 100% uptime. Data is measured in Petabytes. Today’s demands are simply not met by yesterday’s software architectures. We believe that a coherent approach to systems architecture is needed, and we believe that all necessary aspects are already recognized individually: we want systems that are Responsive, Resilient, Elastic and Message Driven. We call these Reactive Systems.

Architecting for continuous delivery and operability

The 2017 State of DevOps Report observes that the quality of architecture correlates with fast CI/CD pipelines and, more broadly, with IT performance. This is in line with the 2015 report’s finding that high-performing teams are more likely to have loosely coupled architectures than medium- and low-performing teams.

We all know our solutions will be around for a couple of years, if not more. This means we have a responsibility to design them so they can be deployed frequently (so we can bring value to our customers quickly), easily (so we reduce the risk of delivering new features) and without any downtime. We call this architecting for continuous delivery.

Continuous delivery is a practice that aims to reduce as much as possible the delay between a commit in the software configuration management tool and the deployment of that code in production. Why do this? Source code that is not deployed in production is like an item sitting in inventory: it has unknown bugs, it may break scaling or cause downtime, it might be a great implementation of a feature nobody wants. Until you push it to production, you can’t be sure.

As your inventory gets larger, a bigger deployment (three times a year, for instance) with more changes becomes definitely risky. When those risks materialize, the most natural reaction is to add review steps as a way to mitigate future risks. This slows down the release cycle and increases the stock of features, and therefore the risk of deploying them. There’s only one way to break out of this vicious cycle: if it hurts, do it more often. In other words, deploy small increments frequently to limit the risk of breaking existing things and to get feedback on these new features.

Frequent deployment also enables us to tend towards a near-zero-downtime paradigm. In the old world, delivery teams would develop new features, zip their solution along with a release note and throw it over the wall to operations teams. Operations would then schedule a release with a “planned” downtime. And here comes the fallacy of planned downtime: from an end user’s perspective, downtime is downtime, period. There is no such thing as “planned downtime”. All the user sees is that the product is not available.

Architecting for continuous delivery means we do everything we can to enable frequent deployment while keeping the risk and impact of new features under control. It encompasses the previously discussed modularity and decoupling, so we can limit the impact, but also all the technical capabilities needed to automate deployment and to roll back in case of unexpected impacts. If an impact occurs (and it will), we can quickly get back to the previous stable state. We can also implement the feature toggles practice: a way to push new features to production but hide them from end users until someone decides to make them visible to some or all users, as sketched below. Instead of keeping these features in inventory, you push them even if you have not yet decided to expose them to your users.
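
A feature toggle can be as simple as a conditional read from configuration. The sketch below is a minimal in-memory illustration with hypothetical flag names; real setups usually rely on a dedicated toggle service or configuration system.

```python
# Minimal feature-toggle sketch: the flag store and flag names are illustrative assumptions.
FLAGS = {
    "new-checkout-flow": {"enabled": False, "allowed_users": {"beta-tester-1"}},
}

def is_enabled(flag_name: str, user_id: str) -> bool:
    """A feature is visible when globally enabled or when the user is explicitly allowed."""
    flag = FLAGS.get(flag_name, {})
    return flag.get("enabled", False) or user_id in flag.get("allowed_users", set())

def checkout(user_id: str) -> str:
    if is_enabled("new-checkout-flow", user_id):
        return "new checkout flow"      # already deployed, exposed only to a few users
    return "current checkout flow"      # everyone else keeps the existing behavior

print(checkout("beta-tester-1"))  # new checkout flow
print(checkout("anyone-else"))    # current checkout flow
```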

Getting teams to deliver fast CI/CD pipelines, and therefore value for their end users, is not only a matter of mindset. It requires technologies:

  • Cloud computing that delivers Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) capabilities that are easier to use because they are built with autonomy and automation in mind.
  • Container platforms (Kubernetes, for instance) that provide redundant deployment and self-healing out of the box.
  • API managers that not only act as efficient proxies in front of your APIs but also help expose, secure and monitor them.
  • Event brokers and streaming platforms
  • and much more

It’s clear that not every product team can afford to put such foundations in place and keep them up to date. Creating a product-centric delivery approach to provide these capabilities as platforms can clearly help here. Those platforms are products delivered by long-lived agile/DevOps teams with the responsibility to plan, build and run technology capabilities (internally developed and/or externally procured), as explained on the link:product.html[product page]. By technology capabilities we mean databases, integration middleware, infrastructure as code, containers, monitoring toolsets and so on. On a regular basis, these teams provide new releases to the whole organisation. There are clear benefits you can expect from such an approach:

  • Up-to-date capabilities
  • Self-service & automation
  • A regular flow of new features

All of this helps product teams do their job: delivering a continuous flow of features to their end users.


  1. see: https://www.wsj.com/articles/SB10001424053111903480904576512250915629460

  2. see: The Design of Future Things by Don Norman. ISBN: 978-0-465-00228-3