Typical web applications spend most of their time reading data from a data store (e.g. a database), which is then processed and converted to HTML for the frontend.
Furthermore, when relying on programming languages such as PHP that are essentially based on the “shared nothing” principle, the performance overhead of repeatedly processing what may be the very same request flow becomes obvious. Even with modern PHP frameworks, the basic flow of events often remains much the same and is rather simple at its core:
- determine a controller/action for handling a URL specified in a request
- read required data from database based on the data model at hand
- send response to client and wait for more to come
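As a rough illustration of this flow, here is a minimal Python sketch. All names in it (`ROUTES`, `fetch_product`, `handle_request`, the `PRODUCTS` dictionary standing in for a database) are made up for this example and do not come from any particular framework. The point is only that every request walks through routing, a database read and rendering again, even if nothing has changed.

```python
# Hypothetical sketch of a shared-nothing request flow; all names are illustrative.

# Stand-in for a database table.
PRODUCTS = {1: {"name": "Coffee Mug", "price": "9.90 EUR"}}

# Counts simulated database reads so the repeated work becomes visible.
STATS = {"queries": 0}


def fetch_product(product_id):
    """Simulate a database read."""
    STATS["queries"] += 1
    return PRODUCTS.get(product_id)


def product_action(arg):
    """Controller action: read the data, then render it to HTML."""
    if not arg.isdigit():
        return 404, "<h1>Not found</h1>"
    product = fetch_product(int(arg))
    if product is None:
        return 404, "<h1>Not found</h1>"
    return 200, f"<h1>{product['name']}</h1><p>{product['price']}</p>"


ROUTES = {"/product": product_action}


def handle_request(url):
    """Determine the action for a URL, run it, and return (status, body)."""
    path, _, arg = url.rpartition("/")
    action = ROUTES.get(path)
    if action is None:
        return 404, "<h1>Not found</h1>"
    return action(arg)
```

Calling `handle_request("/product/1")` twice hits the simulated database twice and renders the same HTML twice, which is exactly the overhead discussed here.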
Since performance is a key factor in keeping visitors from bouncing, we need to minimize any overhead in the process of delivering responses to them.
Horizontal Scaling and Caching
In contrast to PHP, platforms such as Java or Node.js do not follow the shared nothing principle and can keep data that has already been read in memory, where it can be shared across requests. So why not just use an application server such as Apache Tomcat, keep all database query results in memory, and decide programmatically when to persist data? Wouldn’t this eliminate the wait for reading data from a database before it gets converted and sent back to our clients? Well, this heavily depends on the software and system architecture at hand.
Imagine the simplest case with a single server hosting both the application and the database. As website traffic increases, this server has to be upgraded (vertical scaling) to meet the growing memory requirements of keeping already-read data accessible without querying the database again.
Horizontal scaling to the rescue?
At some point you will realize that you are forced to scale horizontally and add more machines to cope with this situation (which is actually part of designing the software and system architecture). Unfortunately, horizontal scaling adds a layer of complexity, since you now need to synchronize your application and data across multiple nodes. Furthermore, you need to define which processes are allowed to read from and which are allowed to write to your database in order to prevent data inconsistencies and possible race conditions.
Caching to the rescue?
In order to improve the performance of web applications we often deploy additional caching mechanisms to reduce database queries and unnecessary frontend rendering steps. Technologies such as Varnish Cache are deployed as a reverse proxy and full-page cache (FPC). These setups need to handle user-specific frontend data too. For instance, imagine an online shop. Once a user logs in (and in fact even before that), personalized data is rendered and displayed in the frontend. The FPC needs to cope with this situation as well, which Varnish is capable of by using Edge Side Includes (ESI).
However, these personalized, dynamic frontend fragments, which are rather costly to generate, are not part of the actual cache. Furthermore, deploying caching technologies adds another layer in front of the application that needs to be operated and maintained. Also, in practice, purging only parts of a web application’s cache is often not easy to achieve. Thus, caching is not the definitive answer to our performance problem.
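To make the ESI idea a bit more tangible, here is a small Python sketch of what the application behind the proxy might emit. The function names and the `/fragments/cart` URL are assumptions made up for this example. Only the `<esi:include src="..."/>` tag itself is standard ESI markup, and ESI processing has to be enabled in the proxy configuration for it to be resolved.

```python
# Hypothetical sketch: a cacheable page with an ESI placeholder for personalized content.
from html import escape


def render_product_page(name, price):
    """Return the shared page; the cart box is left as an ESI include."""
    return (
        "<html><body>"
        f"<h1>{escape(name)}</h1>"
        f"<p>{escape(price)}</p>"
        # The proxy replaces this tag with the response of a separate request.
        '<esi:include src="/fragments/cart"/>'
        "</body></html>"
    )


def render_cart_fragment(item_count):
    """Return the user-specific fragment that is fetched for the include."""
    return f'<div id="cart">{item_count} item(s) in your cart</div>'
```

The page itself is identical for all visitors and can be cached as a whole, while the cart fragment still has to be produced per user.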
Command Query Responsibility Segregation to the rescue
Going back one step to our initial problem, we need to realize that reading data from the underlying database is costly because each read also entails rendering the frontend response. So the actual goal is to determine when we are in fact required to (re-)generate frontend responses by going through all the steps of our process workflow.
Let’s look at this situation by using an online shop as an example. In general, product data is not likely to change with every request. In fact, we only want to (re-)generate frontend data if product data has changed. Thus, we need to differentiate between read and write operations in our data models in order to be able to decide when to trigger a potential (re-)generation of frontend data. This is where Command Query Responsibility Segregation (CQRS) comes into play.
CQRS is an architectural pattern that strictly separates read and write operations. For data models this basically means that we now have two classes instead of one:
- class for read operations (“R”)
- class for write operations (“W”)
Based on this separation of concerns we can take a closer look at our performance issue, since now only write operations are eligible to trigger (re-)generation of frontend data, whereas read operations merely serve already existing data.
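The split into a read class and a write class could look roughly like the following Python sketch. The class names, the `rename` command and the `on_change` callback are assumptions chosen for illustration, not part of any specific CQRS library. A plain dictionary stands in for the database.

```python
# Hypothetical sketch of separate read ("R") and write ("W") models.


class ProductReadModel:
    """Serves existing data; never triggers any (re-)generation."""

    def __init__(self, storage):
        self._storage = storage

    def get(self, product_id):
        # Return a copy so that callers cannot modify data through the read side.
        return dict(self._storage[product_id])


class ProductWriteModel:
    """Changes data and reports every change to a callback."""

    def __init__(self, storage, on_change):
        self._storage = storage
        self._on_change = on_change

    def rename(self, product_id, new_name):
        self._storage[product_id]["name"] = new_name
        # Only the write side is allowed to trigger frontend (re-)generation.
        self._on_change(product_id)
```

With `regenerated = []` passed in as `on_change=regenerated.append`, any number of `get` calls leaves the list empty, while a single `rename` adds the affected product id to it.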
Going back to our online shop example, we now only trigger the (re-)generation of a product page for the frontend once product data has changed (for instance in the ERP). Furthermore, we can generate frontend snippets based on this approach too, for instance product item previews for category pages. These snippets need to be generated in such a way that they can be persisted to a key-value store. This way, we are able to quickly load pre-generated frontend snippets from high-performance key-value stores.
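A minimal Python sketch of this snippet generation might look as follows. The `KeyValueStore` class is just an in-memory stand-in for a real key-value store, and the key scheme (`product_page:<id>`, `product_preview:<id>`) as well as the function name `project_product` are assumptions made up for this example.

```python
# Hypothetical sketch: pre-generating frontend snippets after a product change.
from html import escape


class KeyValueStore:
    """In-memory stand-in for a real key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def project_product(store, product):
    """Render all snippets for one product and persist them under fixed keys."""
    product_id = product["id"]
    name = escape(product["name"])
    price = escape(product["price"])
    # Full product page.
    store.put(f"product_page:{product_id}", f"<h1>{name}</h1><p>{price}</p>")
    # Preview snippet for use on category pages.
    store.put(
        f"product_preview:{product_id}",
        f'<a href="/product/{product_id}">{name}</a>',
    )
```

The frontend then only needs a single `store.get("product_page:7")` to serve the page, without touching the database or rendering anything.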
Key-Value-Store as simple cache?
You might think that the key-value store just described is nothing more than a simple cache, right? Wrong! There are fundamental differences between the two. From the frontend perspective the key-value store has become the primary data source instead of the underlying database. Thus, if the key-value store is missing an entry, the application will behave as if this entry were missing in the database. Hence, entries in the key-value store do not have a TTL, since they are by definition always up to date. The frontend does not know that these entries are updated by external components (through write operations as previously discussed), nor does it care. Furthermore, the eviction of entries in favor of newer ones, as it happens in regular caching solutions such as Varnish, does not take place in the key-value store either.
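The difference is perhaps easiest to see side by side. The following Python sketch contrasts a classic cache lookup that falls back to the database with a lookup against the key-value store as primary data source. Both function names are made up for this example, and plain dictionaries stand in for the cache, the database and the key-value store.

```python
# Hypothetical sketch: cache with database fallback vs. key-value store as primary source.


def cache_aside_lookup(cache, database, key):
    """Classic cache: a miss falls back to the database and fills the cache."""
    value = cache.get(key)
    if value is None:
        value = database.get(key)
        if value is not None:
            cache[key] = value
    return value


def primary_store_lookup(store, key):
    """Primary source: a missing entry simply means 'does not exist'."""
    snippet = store.get(key)
    if snippet is None:
        # No fallback to a database: the frontend does not even know one exists.
        return 404, "Not found"
    return 200, snippet
```

In the second variant there is no code path that regenerates anything on a read, which is why the entries need neither a TTL nor an eviction strategy.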
Microservices for the frontend
In order for such an architecture to work efficiently, microservices are deployed in the frontend to handle write requests separately, such as adding items to a cart in an online shop. The aforementioned frontend snippets are generated in the background and added to the key-value store, where they remain until they are updated by write operations. This way, frontend and backend are completely separated and can be scaled independently. So in case you hit a traffic peak, you can add frontend instances to tackle potential performance issues beforehand. Pretty neat, right?