2020 left no doubt: the growth of cloud computing is firmly grounded in the SaaS business model. Investors like Bessemer have bet and made billions on the SaaS trajectory. Behind the curtain, selling essentially the same software to different users and companies, again and again, relies on a distinct product architecture: secure multi-tenancy.
On the technical side of the coin, this means one-to-many, loosely-coupled deployable code with higher flexibility, portability, security, and operational transparency. Done right, multi-tenancy through tenant isolation delivers benefits to both you as a SaaS vendor and to your customers.
Tenant isolation is the keystone of the SaaS architecture, holding it all together and keeping it up and running. It is the secret to effective tiering and scaling.
Let’s take a closer look at what that means and how it works.
- The Fundamental Goal of SaaS Tenant Isolation – Know what tenancy is, its types, and key services from a user/customer context.
- Isolation strategy drivers – Key factors that drive isolation and essential considerations before you choose your isolation strategy for current and future needs.
- Types of isolation – Find out the different approaches to isolation with their pros/cons.
- Additional Isolation Options – Supplementary isolation approaches focused on compute and data storage considerations.
- Weighing Your Options – Discover the advantages and disadvantages of isolation approaches; find out when to opt for a hybrid model and problems it solves for an organization transitioning to a SaaS-ready architecture.
Given the speed and intensity of competition in this market, getting isolation right is essential to SaaS success at any scale and at any point in the lifecycle of your SaaS product offering. If you don't get it right, your customers are just a point-and-click away from dumping you.
The Fundamental Goal of SaaS Tenant Isolation
Selling the same software to different users relies on using cloud-based resources that can be leveraged across different customers. This is not a one-sided convenience that just makes life easier for the technology side: it’s a delivery model that structures how resources within the SaaS platform serve your customers.
SaaS multi-tenancy means achieving a reliable level of efficiency and security, delivering an application that is feature-rich and cost-effective. A tenant is the set of application services dedicated to a single specific set of users and customers. Multi-tenancy is the set of architectural choices that makes that possible. At the most basic level, you can think of this as a choice between Single or Multi-Tenant Architecture.
- The Single-Tenant architecture that dedicates specific software and infrastructure services to a single customer (in many ways similar to the world before cloud). This essentially means a single stack is purposefully built and maintained only for a single customer.
- A Multi-Tenant SaaS model that leverages a single software and infrastructure instance to serve multiple customers, thus bringing in efficient usability and greater cost benefits.
There's no one-size-fits-all SaaS architecture, so practical strategies of building such frameworks will vary. The AWS Well-Architected Framework is one such approach that helps adopt architectural best practices (whether or not you run on AWS) and adapt continuously.
As we'll see below, not every component must be strictly single-tenant or multi-tenant. More specifically, when application services are running in support of multiple tenants, there are additional considerations for factoring tenant context into these services. Let’s take a closer look.
Isolation vs. Authentication & Authorization
Isolation is a fundamental choice in a SaaS architecture because security and reliability are not a single construct. Here's how this looks through the lens of SaaS architecture.
- Isolation involves the creation of mechanisms and policies that apply and enforce tenant context. This allows an organization to enjoy the benefits of several tenants pooling resources while enforcing security and restricted access. As an organization considers a full-lifecycle approach to the SaaS model, isolation must become a top-down cloud development goal. In effect, this approach of achieving high-level isolation guides the subsequent design choices in an application’s environment. (More details on patterns and strategies to enforce isolation in an organization are outlined below.)
- Authentication & Authorization enforce access control, which comprises only a small part of the picture when compared to isolation strategies. Authentication & Authorization only get a user through an application’s login process; they do not guide access to resources within a SaaS environment. This is like having a key to the front door, but no control of what goes on once that door is opened. In a shared environment, complete isolation involves more mechanisms, such as Identity & Access Management (aka IAM), to ensure tenants can access only those resources they are entitled to see. That means striking a balance: making it easy for the right user to get the right access to the right resources, without adding so much complexity that every access path has to be managed independently.
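To make the distinction concrete, here is a minimal Python sketch. Everything in it (the `User` and `Resource` types, the in-memory stores, and the function names) is hypothetical and only illustrates the idea: `authenticate` gets a caller through the front door, while `fetch_for_tenant` applies tenant context on every resource access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    tenant_id: str


@dataclass(frozen=True)
class Resource:
    resource_id: str
    tenant_id: str
    payload: str


# Hypothetical in-memory stores standing in for an identity provider and a database.
USERS = {
    "token-alice": User("alice", "tenant-a"),
    "token-bob": User("bob", "tenant-b"),
}
RESOURCES = {
    "r1": Resource("r1", "tenant-a", "invoice for tenant A"),
    "r2": Resource("r2", "tenant-b", "invoice for tenant B"),
}


class TenantIsolationError(PermissionError):
    """Raised when a request tries to cross a tenant boundary."""


def authenticate(token: str) -> User:
    """Authentication: proves who the caller is (the key to the front door)."""
    try:
        return USERS[token]
    except KeyError:
        raise PermissionError("unknown token") from None


def fetch_for_tenant(user: User, resource_id: str) -> Resource:
    """Isolation: every resource access is checked against the caller's tenant."""
    resource = RESOURCES[resource_id]
    if resource.tenant_id != user.tenant_id:
        raise TenantIsolationError(f"{user.name} may not read {resource_id}")
    return resource
```

In this sketch, a logged-in user from `tenant-a` can read `r1`, but a request for `r2` is rejected even though the login itself succeeded.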
Another essential benefit of identity in a tenant context is that it aids in capturing and analyzing events from logs & metrics. This helps to gain visibility into the consumption and utilization behaviors of a SaaS application. By cross-referencing such events to specific tenants and users, SaaS businesses can achieve the visibility that is critical to a workload’s operation.
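As a rough illustration of that cross-referencing, the sketch below sums consumption per tenant from a list of log events. The event fields (`tenant_id`, `user`, `action`, `units`) are assumptions made for this example, not a standard log schema.

```python
from collections import Counter

# Hypothetical log events; in practice these would come from a logging pipeline.
EVENTS = [
    {"tenant_id": "tenant-a", "user": "alice", "action": "api_call", "units": 3},
    {"tenant_id": "tenant-b", "user": "bob", "action": "api_call", "units": 1},
    {"tenant_id": "tenant-a", "user": "carol", "action": "storage_write", "units": 5},
]


def usage_by_tenant(events):
    """Sum consumed units per tenant so usage can be traced back to its source."""
    totals = Counter()
    for event in events:
        totals[event["tenant_id"]] += event["units"]
    return dict(totals)
```

Because every event carries a tenant identifier, the same data can feed billing, capacity planning, or noisy-neighbor detection.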
Bearing these contexts in mind, your SaaS product roadmap needs to account for an isolation strategy that suits your use-cases. As use-cases differ, so do isolation strategies.
How to choose an isolation strategy
Five key aspects drive the choice of isolation strategy for a SaaS application. These include:
- Product Tiering Strategy: Tenants can be categorized into different profiles, and then packaged into different isolation approaches for each profile. This means different levels and strategies of isolation across the range of clients using an application’s services. When you charge a premium for better service, you can choose to isolate some of the resources to deliver a better product experience to those who have paid for it.
- User and Customer (Tenant) Experience - some users could place an excessive load on an application’s service or resource, adversely affecting other tenants. Such “noisy neighbors” can be assigned their own isolation protocol to minimize potential impacts on resource consumption, an important benefit of an isolation strategy. The goal is a customer-centric model in which each tenant is allowed to use only a rightful share of the pooled resources without impacting other tenants’ services.
- Compliance Requirements - perhaps the most significant drivers of isolation are Data Compliance and Consumer Safety/Privacy. These are critical factors driving an organization to opt for an isolation strategy. Governing authorities lay down regulations on private data protection. That, of course, dictates the isolation models used for information and access management. Often, Independent Software Vendors (ISVs) and public cloud providers that hold recognized regulatory certifications are perceived to be of greater value, and can command a premium.
- Legacy Architecture - the constructs of the legacy architecture that supports an application also directly affect the choice of isolation model. It's not unusual to have to build with software that you inherit as part of your product and that is not as flexible as you might want it to be, particularly given the time constraints of competition. Building with the software you have instead of the software you want limits your options in making trade-offs in your isolation strategy.
- Customization Opportunity - sometimes, the customer has specific requirements on the degree of isolation, in software and/or hardware. Again, given time constraints, you may need to do what works for them before you can get to what works for you.
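Two of these drivers, tiering and noisy neighbors, can be sketched together. The example below is one possible approach, not a prescribed one: a token bucket per tenant, sized by a hypothetical `TIER_LIMITS` table, so that a tenant who floods the service exhausts only its own quota. The tier names and numbers are invented for illustration.

```python
import time
from dataclasses import dataclass, field

# Hypothetical tiers: (requests per second, burst size).
TIER_LIMITS = {"basic": (5.0, 10), "premium": (50.0, 100)}


@dataclass
class TokenBucket:
    rate: float       # tokens added per second
    capacity: int     # maximum burst size
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self):
        self.tokens = float(self.capacity)
        self.updated = time.monotonic()

    def allow(self, now=None) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantThrottle:
    """Keeps one bucket per tenant so a noisy tenant exhausts only its own quota."""

    def __init__(self):
        self._buckets = {}

    def allow(self, tenant_id: str, tier: str, now=None) -> bool:
        if tenant_id not in self._buckets:
            rate, capacity = TIER_LIMITS[tier]
            self._buckets[tenant_id] = TokenBucket(rate, capacity)
        return self._buckets[tenant_id].allow(now)
```

The optional `now` argument exists only to make the behavior easy to test with a fixed clock; production callers would omit it.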
Choices of isolation model
Once your SaaS business analyzes the factors that influence isolation within its SaaS model, the next task is to choose the right isolation model. The approaches to protecting and isolating resources in multi-tenant environments include:
Considered the classic approach, the Silo model gives every tenant a separate stack of an application’s services. Each tenant gets access to dedicated resources, even if that means some redundancy and duplication (and perhaps the inevitable consequences: complexity and inefficiency). For instance, every tenant using an application might have access to its own database. Services within the tenant’s cloud instance are then used to enforce resource isolation. Queries coming from one company and its users only reach that one database.
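The database-per-tenant idea can be sketched in a few lines of Python. This is only an illustration: the `SiloedDatabases` class is hypothetical, and in-memory SQLite databases stand in for the dedicated database instances a real deployment would provision.

```python
import sqlite3


class SiloedDatabases:
    """One dedicated database per tenant; tenants never share a connection."""

    def __init__(self):
        self._connections = {}

    def provision(self, tenant_id: str) -> None:
        # Each call to connect(":memory:") creates a separate, private database.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
        self._connections[tenant_id] = conn

    def connection_for(self, tenant_id: str) -> sqlite3.Connection:
        try:
            return self._connections[tenant_id]
        except KeyError:
            raise LookupError(f"no database provisioned for {tenant_id}") from None
```

A query issued through `connection_for("acme")` can only ever see the `acme` database, which mirrors the routing described above.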
Now, there are certain advantages of using a Silo model, such as the reduced scope of impact, easier cost tracking, etc. To an ISV organization looking for faster movement to a new cloud-hosted infrastructure, the Silo model is a common short-term choice that can be adopted at a lower cost than other models. However, because each tenant is using its own copy of resources, pooling and sharing are strictly limited. One customer might be using 100% of their tenant resources, while another customer's tenant resources sit idle; hence, the model offers limited scalability and reduced agility. Changes to one tenant environment need to be faithfully reproduced in all the others.
One temptation in the Silo model is to customize some part of the stack for one tenant without propagating those changes to any others. Development and deployment require more operational discipline; otherwise each tenant becomes its own “snowflake” and technical debt snowballs. On the other hand, disciplined automation can cut down these deployment propagation delays, so that all tenants can be kept in sync.
In the Pool/Policy-Based model, tenants share some or all of the infrastructure and elements in an application’s SaaS environment. This allows shared services such as logging, object storage, user onboarding, etc., to be leveraged across multiple tenants. All tenants are hosted in a unified environment, and runtime policies are used to control access to resources.
Mixed isolation Model
A particularly powerful variant of the preceding Pool/Policy-Based Model adds services from a single-tenant environment into a multi-tenant, shared-services architecture. It is also very popular to containerize existing applications and use shared services to reduce overhead. When this happens often, you end up with a mixed isolation model.
This works for use-cases requiring a strategy that is not exclusively Silo or Pool-based, referred to as a Mixed model. Some SaaS environment elements are implemented in a Pool model, and some in a Silo model. For SaaS businesses moving to a more versatile architecture, a Mixed model is often considered an important transition model.
For instance, one microservice’s regulatory profile and noisy-neighbor exposure may be best addressed through a Silo model. On the other hand, the cost profile, access patterns, and agility of another microservice may necessitate using a Pool model.
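One way to picture a mixed model is as a per-service lookup that decides whether a request goes to a tenant-dedicated deployment or a shared one. The service names, the `SERVICE_ISOLATION` table, and the naming scheme below are all hypothetical.

```python
# Hypothetical per-service isolation choices for a mixed model.
SERVICE_ISOLATION = {
    "billing": "silo",        # regulated data: dedicated per tenant
    "search": "silo",         # prone to noisy neighbors
    "notifications": "pool",  # inexpensive and safe to share
}


def deployment_for(service: str, tenant_id: str) -> str:
    """Return the name of the deployment a tenant's request should be routed to."""
    mode = SERVICE_ISOLATION[service]
    if mode == "silo":
        return f"{service}-{tenant_id}"
    if mode == "pool":
        return f"{service}-shared"
    raise ValueError(f"unknown isolation mode {mode!r} for {service}")
```

Keeping the decision in one table makes it easier to move a service from Silo to Pool (or back) as its regulatory or cost profile changes.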
Fundamentally, the two models (Silo and Pool/Policy Based) are widely known to fit industry-wide use-cases. In exceptional cases, an organization may opt for a Hybrid model that picks suitable profiles from each of the models to meet its requirement. However, adopting a Hybrid model requires extensive expertise and is rarely a short term effort.
With the above in mind, let us take a look at different isolation strategies suitable for different architectural frameworks.
Silo Based isolation Strategies
In Silo based full-stack isolations, tenants get dedicated resources but are managed and operated through a single, unified experience. All tenants’ services run simultaneously, get updated at the same time, and run the same version of the software. Particularly in automating software provisioning, Silo models offer the benefit of maintaining consistent versions across all tenants.
When done right, siloed isolation can help achieve the SaaS-centric goals of operational efficiency, innovation, and agility. Each tenant environment is straightforward, because each one lives in its own VPC or account that holds the full stack of services it needs. Adding a new tenant is as simple as provisioning a new account or VPC with the same infrastructure footprint as the other tenants. This strategy also includes a collection of microservices that orchestrate operations, onboarding, and management. In effect, this introduces a number of services within a Control Plane that are used to manage and operate the tenants.
Architecture solutions that can be used to enforce Silo-based isolation include:
Account Per Tenant
Multi-account offerings help simplify architecture while maintaining tenant isolation in separated networks. We can see this through AWS Control Tower, a standardized provisioning solution that lets you automate how accounts are set up and configured. AWS Control Tower embeds multi-account best practices, making it efficient for larger workloads and teams looking to transition quickly to the cloud.
VPC per tenant
This strategy lets all tenants share an account while keeping them separated at the VPC virtual network layer. That means each tenant gets a secure virtual network, and the level of separation is maintained at the VPC layer. Typically, a CloudFormation or HashiCorp Terraform template can be used to create new tenant environments, and a hub-and-spoke model can then be used to scale the application. Such templates also make it easier to perform complex builds for customers while keeping the application accurate, compliant, and secure.
Subnet per Tenant
This is a rare flavor in which the VPC is broken down into subnets, and each subnet is assigned to a tenant. In this case, VPC peering need not be set up for communication between tenants.
These are important advantages, but they also introduce risks. A silo isolation strategy can hinder resource sharing, both at runtime and upstream in development. A strategy that enforces functional separation can also have unintended organizational side effects. Engineering specializations can become silos. As a result, team-level isolation and local optimizations can undermine feature innovation, shared learning from successes and failures, and even distract developers from what customers really value.
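The address planning behind a subnet-per-tenant layout can be sketched with Python's standard `ipaddress` module. The function name, the example CIDR block, and the default /24 size are assumptions for illustration; actually creating the subnets would be done by your provisioning templates.

```python
import ipaddress


def allocate_tenant_subnets(vpc_cidr: str, tenants: list[str], new_prefix: int = 24) -> dict[str, str]:
    """Carve a VPC address range into equal subnets and assign one per tenant."""
    network = ipaddress.ip_network(vpc_cidr)
    subnets = network.subnets(new_prefix=new_prefix)
    allocation = {}
    for tenant in tenants:
        try:
            allocation[tenant] = str(next(subnets))
        except StopIteration:
            raise ValueError(
                f"{vpc_cidr} has no /{new_prefix} subnet left for {tenant}"
            ) from None
    return allocation
```

For example, `allocate_tenant_subnets("10.0.0.0/16", ["acme", "globex"])` assigns `10.0.0.0/24` to `acme` and `10.0.1.0/24` to `globex`. The explicit error when the range runs out is a reminder that this flavor has a hard ceiling on tenant count.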
Pool-based isolation lets tenants share SaaS resources such as compute, storage, and messaging to achieve agility, manageability, and scalability. This allows some elements to run siloed, then uses fine-grained boundaries to define context across shared elements. Some key features of pool-based isolation include:
SaaS platforms can use Identity and Access Management (IAM) to define roles that ensure tenants do not cross context boundaries. At their core, IAM roles allow users to temporarily gain permissions through the Security Token Service (STS). This makes it possible to use a Token Vending Tool that hides the details of tenant isolation from day-to-day development while creating a single path to resource access.
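A token vending helper along these lines might look like the following sketch. It assumes an STS client that behaves like `boto3.client("sts")` is passed in, and it uses the `assume_role` call with an inline session policy to narrow a shared role down to one tenant. The function name, the session-name format, and the 15-minute duration are illustrative choices, not part of any official tool.

```python
import json


def vend_tenant_credentials(sts_client, role_arn: str, tenant_id: str, session_policy: dict) -> dict:
    """Exchange a shared role for short-lived credentials scoped to one tenant.

    `sts_client` is assumed to expose assume_role() like boto3's STS client.
    The session policy can only narrow what the role already allows.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"tenant-{tenant_id}",  # hypothetical naming scheme
        Policy=json.dumps(session_policy),
        DurationSeconds=900,                     # shortest duration STS accepts
    )
    return response["Credentials"]
```

Application code would call this helper once per request and use the returned credentials for all downstream access, so developers never assemble tenant-scoping rules by hand.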
Runtime Scoped Access
This approach makes use of permission templates to create and load runtime policies. Developers can then add properly formed, valid security permissions to these templates, providing a layer of isolation.
Overcoming Limits with Dynamically Generated Policies
IAM policies could hit limits based on the number of roles and tenants in a SaaS environment. With Dynamically Generated Policies, a token manager and token generators are used to automatically populate policy templates that scale up based on the number of tenants.
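The template-population step can be sketched as follows. Instead of storing one policy per tenant, a single template is filled in at request time. The template contents, the bucket layout (one prefix per tenant), and the tenant ID format are assumptions made for this example.

```python
import json
import re
from string import Template

# Hypothetical permission template; placeholders are filled in per request.
POLICY_TEMPLATES = {
    "object-storage": Template(json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::${bucket}/${tenant_id}/*",
        }],
    })),
}

# Assumed tenant ID format; rejecting anything else keeps wildcards and
# path separators out of the generated policy.
TENANT_ID_PATTERN = re.compile(r"[a-z0-9-]{1,32}")


def render_policy(template_name: str, tenant_id: str, bucket: str) -> dict:
    """Populate a policy template with one tenant's context."""
    if not TENANT_ID_PATTERN.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    rendered = POLICY_TEMPLATES[template_name].substitute(
        tenant_id=tenant_id, bucket=bucket
    )
    return json.loads(rendered)
```

Since the policy is generated on demand, the number of stored policies stays constant no matter how many tenants are onboarded.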
Other isolation strategies
In general, it's a good idea to work towards a comprehensive SaaS architecture that supports rapid innovation and delivers robust security and reliability. Given the versatility of the SaaS product architecture and the many business problems it can solve, it should not be surprising that there are certain use-cases that require domain-specific isolation considerations. Still, there are ways to solve tactical infrastructure challenges that deliver the benefits of isolation without the pain of deep refactoring of the application.
Compute isolation Patterns
Several approaches can help create a Silo-based isolated compute construct in a Cloud platform. Some of these include:
- Create a cluster of nodes per tenant
- Use IAM and other platform constructs to prevent tenant boundary-crossing.
- Control the scope of downstream interactions
SaaS product platforms can also take advantage of container clusters or serverless platforms such as AWS Lambda, Google Cloud Functions, and Azure Functions, attaching IAM roles to achieve Silo compute isolation.
For Pool-based compute isolation, every node in a cluster is shared by various tenants. Each node gets a non-tenant-specific policy that doesn’t constrain tenant experience. IAM policies then provide context for every interaction, while a runtime-acquired tenant scope (e.g., from Amazon Cognito) can be used for access authorization. The tenants can then access compute resources (Lambda, Azure Functions, etc.) using the context outlined in these policies. Containers pose a particular challenge because they can allow a level of cross-tenant access; this can be remedied using namespaces, Amazon EKS, or third-party tools.
Data Storage isolation Patterns
When it comes to data storage, Silo-based isolation calls for the creation of a separate database for each tenant. Policies can then be set up to restrict cross-tenant access. In Pool isolation, however, all tenants share the same data store, with each tenant's data separated using keys. Keys and policies define the rows and columns that a tenant can access.
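The pooled case can be sketched with a single shared table keyed by tenant. This is an illustrative pattern rather than a specific product feature: the `TenantScopedOrders` wrapper and table layout are hypothetical, and an in-memory SQLite database stands in for the shared store.

```python
import sqlite3


def open_pooled_store() -> sqlite3.Connection:
    """Create one shared table in which every row carries its tenant key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " tenant_id TEXT NOT NULL,"
        " id INTEGER NOT NULL,"
        " item TEXT,"
        " PRIMARY KEY (tenant_id, id))"
    )
    return conn


class TenantScopedOrders:
    """Data-access wrapper that injects the tenant key into every statement."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant_id = tenant_id

    def add(self, order_id: int, item: str) -> None:
        self._conn.execute(
            "INSERT INTO orders (tenant_id, id, item) VALUES (?, ?, ?)",
            (self._tenant_id, order_id, item),
        )

    def list_items(self) -> list[str]:
        rows = self._conn.execute(
            "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [row[0] for row in rows]
```

Because callers never write the `tenant_id` filter themselves, a forgotten `WHERE` clause cannot leak another tenant's rows. Some databases can enforce the same rule natively through row-level policies.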
Making Pragmatic Choices
The greatest challenge with isolation is that there is no universal strategy to address all isolation needs. Every storage technology may support its own isolation model. Plus, IAM may not support all Silo/Pool based isolation constructs. A good alternative in such instances is often to use a top-down Application-Enforced isolation. To enforce application-wide isolation, it is recommended to consider the merits and demerits of each isolation flavor.
By creating a Hybrid model, you can take advantage of each model’s benefits to create an isolation strategy tailored to each service type. More importantly, for SaaS businesses that are transitioning to a more mature architecture, Hybrid models allow practical benefits to be adopted with each iteration. When done right, a Hybrid model is highly adaptive, as it picks the approaches best suited to a specific use-case. However, adopting and managing such a model is often tricky and requires skilled architects who understand a framework’s nuances precisely.
Conclusion and Takeaways
Isolation is one of the foundational elements of SaaS, and every multi-tenant solution should ensure resources are isolated from the get-go. It is imperative to always design with isolation in mind, noting that isolation and authentication/authorization are distinct parts of a Well-Architected framework. As you create your isolation approach, you should always consider scale and account limits.
It is also important to note that there is no single rule of thumb for building out an isolation strategy. Some use-cases may require building a custom solution to address isolation. Others, such as multi-service applications, may require a Hybrid isolation model to meet the aspirations of a Well-Architected SaaS offering.
However, the end goal always remains the same: to design a framework that aids the development and deployment of reliable, optimized, and secure SaaS applications. It’s the key to the value of delivering SaaS applications for application developers and customers alike.