Joining Hands and Singing Merrily Part 4

The last post I wrote in this series focused on the problem of authentication and the two places where we need a solid solution for it in XFIL. We saw that user authentication is handled in the traditional way, with an email and password, supplemented by more recent technologies such as a second authentication factor. Our discussion of user authentication also gave us an opportunity to emphasize the importance of securing user credentials at rest, and how we can do that using password hashing libraries such as Stratum's scryptauth. We also presented our solution to authenticating agent software, which uses a challenge-response protocol built on asymmetric cryptography. In this post, we're going to look at the problem of authorization, and some of the challenges of addressing it in a microservice-based architecture.

Authorization vs Authentication

Often, authorization and authentication are thought of as being one and the same. However, the two have important differences, and thinking about them separately can be critical to the design of a system. In the context of software, I would define authentication as something along the lines of

A process of determining the authenticity of a claim, e.g. that a user is who they claim to be

A user submitting a standard login form could thus be thought of as making a claim like "I claim to own the account identified by the username U and prove this claim by including the secret password P tied to the account." As usual, a server would then verify this claim, thereby determining its authenticity: it queries the database for the password hash tied to the provided username and, if one is found, hashes the provided password and compares the result to the stored hash.
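That verification flow can be sketched in a few lines. The names below are illustrative rather than XFIL's actual code, and the sketch uses Python's standard-library `hashlib.scrypt` in the same spirit as the scryptauth library mentioned earlier:

```python
import hashlib
import hmac
import os

# Toy stand-in for the users table: username -> (salt, scrypt hash).
_DB = {}

def _hash(password: str, salt: bytes) -> bytes:
    # scrypt parameters here are illustrative; tune n/r/p for your hardware.
    return hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)

def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    _DB[username] = (salt, _hash(password, salt))

def authenticate(username: str, password: str) -> bool:
    """Verify the claim "I own account U, as proven by password P"."""
    record = _DB.get(username)
    if record is None:
        return False  # unknown username: the claim is not authentic
    salt, stored = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, _hash(password, salt))
```

After `register("alice", "s3cret")`, the claim `authenticate("alice", "s3cret")` is authentic while `authenticate("alice", "wrong")` is not. Note that the database never stores the password itself, only a salted hash.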

On the other hand, authorization is a substantially more nuanced concept, which I might define as

A process of determining whether a particular action is allowed to be carried out by a particular actor.

Consider the case of a web discussion forum. On such a site, there may be a policy that any user may create a new discussion thread, whereas only users who are friends may send private messages to one another. These two rules determine which actions users are authorized to carry out. It is often the case, as in this example, that a policy will require an actor's identity claim to have already been authenticated before certain actions are authorized. In some cases, applications may even require further authentication before authorizing particularly sensitive actions, such as a password change.
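Those two forum rules could be encoded as a tiny policy function. This is a hypothetical sketch with invented names, not code from any real forum:

```python
from typing import Optional

def can_create_thread(actor: Optional[str]) -> bool:
    # Any authenticated user may open a new discussion thread;
    # an unauthenticated actor (None) may not.
    return actor is not None

def can_send_private_message(actor: Optional[str], recipient: str,
                             friends: dict) -> bool:
    # Private messages are authorized only between users who are friends.
    if actor is None:
        return False  # the identity claim must already be authenticated
    return recipient in friends.get(actor, set())
```

Notice that both checks presuppose authentication has already happened: the policy consumes an authenticated identity (or its absence) and only decides what that identity may do.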

Monolithic Applications vs Microservices

In a monolithic software architecture, all of the modules or components of a system are baked into a single application with a single codebase. This often means, in the case of web applications, having a single server that will handle all of the authentication and authorization, data processing, presentation, communication with a database, and so on. In such a scenario, modules tend to have a high degree of reliance on the specifics of one another's implementations. I believe this to be a large part of the reason why authentication and authorization are so often seen as indistinct from one another. In the case of the monolithic application, when a user triggers an action, it is often the responsibility of the same module to determine whether the user's identity claim is authentic and whether that user is to be authorized to perform the requested action.

By contrast, a microservice architecture consists of multiple services, with the responsibilities of the entire system appropriately distributed amongst them. In XFIL's case, there are several services contributing to the completion of a given action. The critical distinction between a microservice architecture and a monolithic architecture here is that there is no overlap between services, except for data that may be stored in databases used by multiple services. The consequence of this is that the only context a service has with which to handle an incoming request is derived from whatever is provided to it by either the user directly or another service. Under such design constraints, one might be tempted to design very "noisy" solutions that require a lot of back and forth communication.

Let's use a generic example to see what such an approach might look like.

Here we can see that, with an auth service handling all of the authentication and authorization tasks in an approach that attempts to mimic the monolithic architecture's design, we can end up with every service invoking the auth service to help complete the user's action. One can easily imagine this image expanding to include more services partaking in the completion of more complicated actions. In a bid to reduce the noisiness of the system, one may be tempted to design the most backend services, i.e. those which should never have a user interacting with them directly, to trust that they will only ever be invoked by another service that has already taken care of any requisite authentication and authorization checks. Such an approach should never be adopted! It leads to incredibly brittle systems that could be completely open to exploitation if they ever ended up Internet-facing, and designs with such implicit trust requirements only compound their problems as complexity increases. What we need is a mechanism that will make our systems less noisy while also allowing us to encode our trust requirements explicitly.

Distributed Authorization With JWTs

JWTs to the rescue! In order to make our services more self-sufficient and their communication less noisy, we can leverage the power of a neat little piece of technology known as JSON Web Tokens, or JWTs for short. Taking the definition from jwt.io,

JSON Web Tokens are an open, industry standard method for representing claims securely between two parties.

A JWT consists of three sections, each base64url-encoded and separated by the period (.) character:

  1. A header section
  2. A claims section
  3. A signature over the above two sections

JWTs are used in a lot of modern applications, including Google's Identity Platform.
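To make that three-section structure concrete, here is a minimal HMAC-SHA256 ("HS256") signer using only the standard library. This is a sketch for illustration; real applications should use a vetted JWT library rather than hand-rolling the signing:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWTs use base64url encoding with the trailing '=' padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def encode_jwt(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join([
        _b64url(json.dumps(header).encode()),   # 1. the header section
        _b64url(json.dumps(claims).encode()),   # 2. the claims section
    ])
    # 3. the signature, computed over the two sections above
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)
```

Calling `encode_jwt({"sub": "agent-1"}, b"secret")` produces a string with exactly three dot-separated sections; any party holding the secret can recompute the signature over the first two sections and confirm the claims have not been tampered with.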

Using JWTs, we can reduce the responsibility of our auth service so that it is not expected to handle authorizing actions itself. Instead, it handles authentication, maintains relationships between different entities, and serves JWTs describing the relationships between a given entity and the other entities in the system that matter for permission decisions. If we had a generic system wherein some entity would be granted read/write access to different resources depending on its position in a hierarchy, we could encode enough information for a service to make an authorization decision into a claim set like the following.

  {
    "createdAt": <date>,
    "expiresAt": <date>,
    "parent": "<id of parent entity>",
    "children": ["<id>", "<id>"]
  }

As you can see, our set of claims can actually be incredibly small! We can leave the more specific details to other services to figure out using the relational knowledge present in the JWT claims. That is precisely the point of all of this: we have designed our services in such a way that each one is capable of authorizing (or rejecting) the actions it is responsible for completing. Instead of every service involved in an action invoking the auth service to check whether the action is allowed, one service can invoke the auth service to authenticate the actor in question and obtain a JWT which can be passed on to other services, effectively delegating the task of authorizing actions to those services.
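A downstream service that has already verified a token's signature can then make its authorization decision entirely locally. The sketch below uses the claim names from the example claim set above; the permission rule and function name are invented for illustration:

```python
import time

def authorize_write(claims: dict, target_entity: str) -> bool:
    """Decide locally whether the token's subject may write to target_entity.

    Rule (invented for this sketch): an entity may write only to its
    direct children, and the token must not have expired.
    """
    if claims["expiresAt"] < time.time():
        return False  # stale token: the actor must re-authenticate
    return target_entity in claims["children"]
```

No call back to the auth service is needed: the hierarchy encoded in the claims is enough for the service to accept or reject the action on its own.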

Proof-of-Work Protocols

Everything described above works really well for handling events wherein an actor invokes a single service to perform an action, which may be delegated to several other services to be completed. However, it doesn't give us everything we need to handle actions that depend on a previous stage having been completed. One solution that I have adopted to enforce adequate authorization checks in such scenarios is what I have come to call, somewhat grandiosely I confess, proof-of-work protocols. The idea is fairly simple: an action A2 that depends on the completion of another action A1, where A1 leads to some state change S, should be implemented in such a way that it is only possible to complete A2 after the existence of S has been confirmed. But that's pretty vague, so let's look at a trimmed-down example.

To summarize the diagram above:

  1. A user initiates a first action A, providing a token T as proof of their identity.
  2. The frontend application verifies that T is a valid token.
  3. The auth service checks that T exists and then produces a JWT J describing the owner of T, which can be included in requests to other services as in the previous section.
  4. Using Redis in this example, the frontend application creates a mapping from T to some state-change identifier S, which could be anything. A second mapping is created, from the concatenation of T, ":", and S, to an empty collection of data that will be manipulated in the next phase of the action.
  5. The user is informed of the successful completion of the first action.
  6. The user initiates the second action, attempting to upload some data D, and again provides their identifying token T.
  7. The service responsible for handling this action can now effectively verify T by checking (in our example) Redis to see if T maps to an S value, which it retrieves.
  8. The service completes the second action now that it knows the necessary key to access the data to manipulate.
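The steps above can be sketched end-to-end with a plain dict standing in for Redis. All of the names here are illustrative, not XFIL's real API:

```python
import os

store = {}  # stand-in for Redis

def first_action(token: str) -> None:
    # Steps 1-5: after the auth service has validated the token T, record
    # a secret state-change identifier S, which is never revealed to the
    # user, plus an empty collection keyed on "T:S" for the next phase.
    s = os.urandom(8).hex()
    store[token] = s                 # T -> S
    store[token + ":" + s] = []      # "T:S" -> empty collection

def second_action(token: str, data: str) -> bool:
    # Steps 6-8: the upload succeeds only if a prior state change exists
    # for this token; otherwise the request is rejected outright.
    s = store.get(token)
    if s is None:
        return False  # no proof of a completed first action
    store[token + ":" + s].append(data)
    return True
```

Because S is generated server-side and never shared, a client cannot jump straight to the second action: presenting a T that never completed the first action simply fails the lookup.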

It's worth noting for the sake of completeness that this example has its flaws. For example, if the user initiated the first action several times before moving to the second action, S would be overwritten and old T:S states would be ignored. To make this technique practical, you will have to adapt it to your use case. What's interesting to me about this approach is that it accomplishes a few particularly useful things:

  1. It reduces the load on the auth service by not requiring it be invoked by other services for the second action
  2. The absolute minimum amount of data required is communicated between services
  3. The agent's reported T value is never fully "trusted." If it is not verified in step seven, anything the agent does is ultimately ignored
  4. We never need to inform the agent of what S is

Coming up...

So there we have our discussion of authorization and how we distribute the responsibility to authorize actions across different services. We've seen how we can use JSON Web Tokens, served by a trusted authority, to delegate the responsibility of authorizing actions to the services that are actually responsible for completing a given action, rather than building a single, huge service that would somehow have to handle all authorization requests. We also saw an example of how to design solutions to multi-stage actions by marking state changes from step to step with unique, secret identifiers, and by making it impossible to complete later steps without having obtained knowledge of prior state change identifiers. In the next post, we're going to conclude this series by looking at the details of access controls in a microservice-based architecture like XFIL's. This discussion will give us an opportunity to look at capability-based security and demonstrate an interesting means of implementing it in REST API services.