Beyond Frankencloud: living with your once and future platform architecture

David Fishman

July 7, 2021

I suspect that when the legendary Ward Cunningham coined the term “technical debt”, it did not take him long to realize he’d created a monster: “I am in favor of writing code to reflect your current understanding of a problem, even if that understanding is partial. If we fail to make our program align with what we then understood, we were going to continually stumble.”

This definition, to reiterate my prior post on the subject, is most useful when it applies not merely to “time-is-money”, but to the speed of learning. The inputs to learning are speeding up fast.

Loosely-typed, scripted programming languages accelerate the requirements feedback loop between customers and developers. Yet they live on borrowed time. Interest payments accelerate faster than you realize. Frankencloud happens when the speed of customer-facing feature development is paid for by charging interest to functions outside of the direct control of the application developers.

Look out for Frankencloud

How can you tell when you live on a Frankencloud? Here are some signs to look for:

When change is hard to make as a result of cascading dependencies up and down the stack.
When you treat your cloud provider as a fractional colo.
When business-critical applications move out of the data center with “lift and shift”.
When you optimize application logic with DBMS stored procedures (thanks, Oracle) or object-relational mapping.
When you expand LAMP stack apps by mixing in Java, PHP, Python, Ruby, Haskell (I could go on…). What starts as an easy hack for a new feature that accesses the database directly sprawls quickly into reads and writes that conflict with all kinds of existing functions.

None of these examples are theoretical; we’ve seen them in isolation and in combination.

The B2B SaaS arms race will be won by those who can consistently translate technical debt into development versatility: adding new features, integrating new data sources and workflow integrations, trying new technologies, retiring locked-in dependencies, and onboarding new customers.

Bottom-up cost transparency is a key feature of any public cloud platform. It works best if the learning expressed in technical debt closes the loop. Leverage in SaaS architecture, like financing leverage from the capital markets, only works when you have a plan to keep the learned improvements delivering value. (Cloud compute spend is another form of technical debt; more on that in a future post.)

Make no mistake, the expressive power of loosely-typed languages has been a game-changer (see Java vs JavaScript). The user stories at the core of the Agile/Scrum development style are essential for translating customer intent into feature development. The reason that user stories use prose and not code is to allow for change; that potential is built in. Stories evolve as customers use the software, as new technologies emerge, as market behaviors come and go.

But stories change because both users and developers learn from each story. Done right, it’s an exercise in customer discovery: technical debt allows for faster exposure of opportunities. Assuming that you’ve learned everything there is to know the very first time the user story is translated into a feature? You’ve locked in the costs of your mistakes.

“[B]orrowing money, thinking that you never had to pay it back, with your credit card? Eventually, all your income goes to interest, and your purchasing power goes to zero.” - Ward Cunningham

Making truth better than fiction

Customers part with their hard-earned cash when your SaaS platform continuously helps them make or keep more of that cash. (Here’s where I have a small quibble with Cunningham, who says the objective of modifying the code is “to look as if we had known what we were doing all along.” Sure, it’s awesome to get it right the first time, but it’s also pretty darned rare.)

Cunningham’s real point is in the difference between a Frankencloud and a sustainable SaaS platform. It’s “where one accumulates the learnings about the application over time by modifying the program.”

This view aligns directly with Bertrand Russell’s dictum about the inevitability of approximation. The long game is in the continuous translation of value derived by approximation into features and the infrastructures they run on. Understanding technical debt as a way to improve that translation is a full-stack problem. Here are three approaches:

Agile and Sprints: Time is a first-order principle. Learning as a method to make the most of technical debt is a core value in the world of time-boxed development. It matters less whether time is allocated to manage technical debt within a single sprint, or a dedicated sprint in a release cycle. The key insight is that one attends to managing technical leverage continuously over time.
More quality costs less (formerly known as “The Boy Scout rule”): Martin Fowler, an intellectual hero of mine, prefers to refer to technical debt as cruft – the difference between the current code and how it would ideally be. Fowler applies the following test: if it “adds to the time it takes for me to understand how to make a change, and also increases the chance that I’ll make a mistake,” it’s cruft.

This provides an excellent corollary to Cunningham’s rule about the value of technical debt. It’s useful only if you can translate any gains you made with shortcuts in the code today to better code tomorrow. Hence, the “Boy Scout” rule: when you arrive at a campsite, always leave it in better shape than you found it. Smart software development is driven by an actionable bias for paying it forward. Fix technical debt when you find it, and the code will be ready for change when the next boy or girl scout arrives.
Well-Architected Framework scorecard: One of the great advantages AWS has over other cloud providers is what it learns from the Amazon consumer eCommerce business. It’s probably the single biggest technical-debt leverage exercise in modern software history.

That scale of mega-learning (and failure, both public and hidden) has translated into the Well-Architected Framework, a system of best practices around the key attributes of cloud and SaaS architecture. It poses a set of 300+ questions divided across 55+ best practice areas and hands you a scorecard. (It’s pretty much applicable across any cloud workload; it’s beside the point that the answers can be used to flog the AWS tech stack.) An alternative: the Container Solutions Cloud Native Maturity Matrix. (Their CTO, Pini Reznik, has a useful background talk.)

Either way, focus your attention on the most important part of your architecture: how well does it serve the goals of your business? That’s the yardstick for identifying the most valuable technical debt to invest in remediation.

Looking ahead: friction and velocity of technical debt

The best way for any SaaS offering to rise above the noise of technology tooling (Cloud Native Frameworks! DevOps transformation! Containers! Software-defined anything!) is to look at how technical professionals spend their time. What proportion goes to adding value by working on software that benefits customers and users? What proportion is trapped in non-value-added manual effort: unplanned work or rework, security remediation, or customer support that compensates for a deficient user experience?

You may be surprised to learn that among the best in class, the ratio is 50:50. A recent leading study characterizes this as “software delivery and operational performance”. Best-in-class organizations are those where technical professionals spend half their time on “proactive or new work to design, create, and work on features, tests, and infrastructure in a structured and productive way to create value for our organizations.”

The next tier down manages to spend only 40% of its time on new work, and it experiences significantly worse outcomes. The measures:

(a) deployment frequency;
(b) lead time from code commit to success in production;
(c) time to restore outages; and
(d) the fraction of changes that failed.

That is not a big difference in the ratio of reinvestment, 50:50 versus 40:60, yet it is what separates SaaS winners from losers.
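To make measures (a) through (d) concrete, here is a minimal Python sketch of how they might be computed from a log of deployments. The `Deployment` record, its field names, and the `delivery_measures` function are hypothetical, invented for illustration; they are not taken from the study. The sketch also assumes that an outage begins at the moment the failing change is deployed, which a real incident log would refine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change fail in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_measures(deployments: list[Deployment], period_days: int) -> dict:
    """Compute measures (a) through (d) over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    # (b) lead time from code commit to production
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    # (c) time to restore; assumes the outage starts at deployment time
    restore_times = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    failures = sum(1 for d in deployments if d.failed)
    return {
        "deployments_per_day": len(deployments) / period_days,       # (a)
        "median_lead_time": median(lead_times),                      # (b)
        "median_time_to_restore": median(restore_times) if restore_times else None,  # (c)
        "change_failure_rate": failures / len(deployments),          # (d)
    }


# Made-up sample data: four deployments in one week, one of which failed.
t0 = datetime(2021, 7, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6),
               failed=True, restored_at=t0 + timedelta(days=2, hours=7)),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8)),
]
measures = delivery_measures(sample, period_days=7)
for name, value in measures.items():
    print(f"{name}: {value}")
```

Medians are used rather than means so that a single long outage or one slow release does not dominate the picture; that is a design choice of this sketch, not a claim about how the study aggregates its data.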

Technical debt is a given because it provides leverage to close the gap between what customers need and what you can give them. The best SaaS software offerings use that leverage to make software better faster.

If you’re part of the SaaS revolution, ask not what your developers are doing about technical debt. Ask instead what your technical debt can do to serve both your developers and your customers.

David Fishman

VP Products & Services

David is a longtime Silicon Valley executive and a skilled & experienced tech leader, with decades of experience in customer facing roles practicing product and service management grounded in process analytics. His work spans cloud infrastructure, analytics, mobile/embedded and open source. He’s a startup veteran (10+ venture-funded companies, both successful outcomes and the other kind), and has also served 12+ years in product & business leadership roles at publicly-traded enterprise tech corporations.
