Joining Hands and Singing Merrily Part 5

Welcome to the last of Stratum Security's series of blog posts about the XFIL team's secure software development process. If you have made it here, hopefully you have already read the previous four posts. If you haven't yet, I hope you'll consider reading through them from the beginning, or even just from wherever you find things get the most interesting.

Last time, I wrote about the problem of authorization. We started out by disambiguating authorization and authentication, and looked at how authorization differs in the traditional monolithic application architecture compared to a microservices architecture. We also introduced JSON Web Tokens and showed how they can be used to delegate authorization responsibilities to different services. At the end of the post we tied things up by talking about multi-step user actions and how we can embed information into the protocols we design to complete such actions in order to prevent malicious actors from skipping steps.

In this last post we're going to dive into the subject of access controls. We'll discuss some of the shortcomings of traditional access control methods, present an introduction to the concept of capability-based security, and take a detailed look at how we can take advantage of JWTs to implement capability-based access controls in a REST API service. Let's get started!

Access Control

You're probably already familiar with the idea of access control, and if you're not, you can probably guess what it's about just from the name. Access controls are bits of functionality in applications that control users' access to data. As you might suspect, access control is closely related to authorization. However, in this case we assume it has already been established that a user is authorized to perform an action, and we now need to determine what data that action can affect.

Access Control Matrices

One popular way to implement access control is to create a binary matrix with one axis representing actors in the system, and the other representing other entities or pieces of data that each actor may or may not have access to. We can illustrate this kind of approach with a table like the following, which might be implemented as a database table.

Matrices like these are convenient in one respect: they are fairly simple to look at and understand. Provided that you have modeled your use cases accurately, your application should be able to use the data in your access control matrix to completely reconstruct the context in which a request to perform an action was issued. For example, if we were building a forum application around some scientific data gathered by users, we might look at this table and reason that

  • User2 is banned from viewing any data, but may be granted permission to read forum posts
  • User3 has been granted permission to take part in both studies

Already, you can imagine that a simple table like this one has some serious ambiguities. We would have to add a lot more columns to cover every possible action on every possible entity, and the table would explode if we had user-to-user relations. It is all too easy for applications to wind up with access control vulnerabilities as a result of failing to properly inspect and interpret access controls from a matrix. A further problem with this approach is that, even if you have managed to create a suitable matrix at one step of the development process, much of the designer's understanding of the table is lost unless it is communicated to the other developers who work with it.

Another problem with these matrices is the way that complexity explodes as columns are added. Every time a column is added, the complexity of checking permissions grows with the number of other columns that have to be taken into account when building context around the new one. For better or worse, production systems don't tend to become simpler over time, so the room for making mistakes and failing to properly encode appropriate tests into software grows with the software's complexity.
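To make the matrix idea concrete, here is a minimal JavaScript sketch of a binary access control matrix and a lookup against it. The user names and the column names (viewData, readPosts, studyA, studyB) are hypothetical and chosen only for illustration. The point is the second function: a realistic check rarely reads a single cell, and each interacting column adds a condition that someone has to remember to write.

```javascript
// Hypothetical matrix: one row per actor, one boolean column per permission.
const accessMatrix = {
  user1: { viewData: true, readPosts: true, studyA: true, studyB: false },
  user2: { viewData: false, readPosts: true, studyA: false, studyB: false },
  user3: { viewData: true, readPosts: true, studyA: true, studyB: true },
}

// Returns true only if the actor exists and the cell is explicitly true.
function can(actor, permission) {
  const row = accessMatrix[actor]
  return row !== undefined && row[permission] === true
}

// Viewing a study's data depends on two columns at once. Forgetting either
// condition is exactly the kind of mistake that grows with the matrix.
function canViewStudyData(actor, study) {
  return can(actor, 'viewData') && can(actor, study)
}
```

With this data, canViewStudyData('user3', 'studyB') is allowed, while any study lookup for user2 is refused because the viewData column is false.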

Capability-Based Security

Capability-based security is a model for designing systems using capabilities, which are essentially unforgeable tokens, to communicate information about what data an actor has access to. In other words, a capability is like a key that unlocks a box which should only contain data to which access is granted in a particular scenario. The problem with this box analogy is that it doesn't really help us imagine how we might implement this model. Instead let's break this concept down into its constituent parts and think about how we can implement it in the context of a microservice architecture such as the ones we established in the last post.

As soon as I read the words "unforgeable token" I immediately think of applying cryptography to satisfy the unforgeability constraint. Of course, as with any system, we want to be careful with how quickly we throw crypto into the mix, but this time we've really gotten lucky. As it happens, we already have some pretty useful tokens floating around our system in the form of JWTs! Better still, our JWTs already have a means of preventing forgery using cryptographic signatures. Indeed, our auth service is already perfectly poised to serve as an authority for distributing capabilities. It's almost as if we planned for this!
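As a rough illustration of what "unforgeable" means here, the following sketch signs and verifies a token in the JWT compact format using Node's built-in crypto module. It assumes an HMAC-SHA256 shared secret (the "HS256" algorithm); the auth service in this series might just as well use an asymmetric signature. The function names signToken and verifyToken are hypothetical, and the sketch skips checks a real service needs (expiry, the header's algorithm field), so a vetted JWT library is the better choice in practice.

```javascript
const crypto = require('crypto')

// Encode a value as base64url JSON, the encoding JWTs use for each segment.
const encode = (value) =>
  Buffer.from(JSON.stringify(value)).toString('base64url')

// Compute the HMAC-SHA256 of "<header>.<payload>" as raw bytes.
const mac = (header, payload, secret) =>
  crypto.createHmac('sha256', secret).update(`${header}.${payload}`).digest()

// Issue a token: only a holder of the secret can produce a valid signature.
function signToken(claims, secret) {
  const header = encode({ alg: 'HS256', typ: 'JWT' })
  const payload = encode(claims)
  const signature = mac(header, payload, secret).toString('base64url')
  return `${header}.${payload}.${signature}`
}

// Return the claims if the signature is valid, otherwise null.
function verifyToken(token, secret) {
  const parts = token.split('.')
  if (parts.length !== 3) return null
  const [header, payload, signature] = parts
  const expected = mac(header, payload, secret)
  const actual = Buffer.from(signature, 'base64url')
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (actual.length !== expected.length) return null
  if (!crypto.timingSafeEqual(actual, expected)) return null
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'))
}
```

If someone swaps the payload for one that grants themselves an extra role but keeps the old signature, verifyToken returns null, which is what lets downstream services trust the claims without calling back to the auth service.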

The second half of this picture pertains to communicating access rights. We already have a partial solution in the form of the JWT's claims section, but we need a strategy for reliably interpreting the claims to restrict access. Let's start with a hypothetical application, which will include roles to distinguish different types of users with differing permissions.

Here we're returning to our forum application example, which we could imagine having at least an auth service and a service for the forum itself. Each of the relational diagrams above contains the bare minimum necessary to implement the kind of capability-based system I will describe here, but you could easily imagine each entity containing more data.

First, let's illustrate what the claims in the JWT would contain.

    {
        "createdAt": <date>,
        "expiresAt": <date>,
        "id": "<user id>",
        "permissions": [
            {
                "group": "<group (forum) id>",
                "role": "<user role in group>"
            },
            {
                "group": "<group id>",
                "role": "<user role>"
            }
        ]
    }

Next, let's describe (part of) an access policy.

  1. Any user can create a new thread in a group they belong to.
  2. Any user can post in a thread in a group they belong to.
  3. Moderators can do the above as well as delete threads in groups they are moderators in.
  4. Admins can do any of the above in any group as well as delete any thread or post.
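Before enforcing anything, it can help to write the four rules above down once as a plain predicate, for example to use as a reference in tests. The sketch below is only that: the action names (createThread, createPost, deleteThread) are hypothetical, and it reads the permissions list in the shape of the JWT claims shown earlier. It is not how the policy is enforced in the rest of this post.

```javascript
// claims.permissions is a list of { group, role } entries.
// The action names are hypothetical labels for the rules in the policy.
function isAllowed(claims, action, groupId) {
  const permissions = claims.permissions

  // Rule 4: admins may perform any action in any group.
  if (permissions.some((p) => p.role === 'admin')) return true

  // Everyone else needs to belong to the group in question.
  const membership = permissions.find((p) => p.group === groupId)
  if (membership === undefined) return false

  switch (action) {
    case 'createThread': // Rule 1
    case 'createPost': // Rule 2
      return true
    case 'deleteThread': // Rule 3
      return membership.role === 'moderator'
    default:
      // Anything not listed (for example deleting posts) is admin-only.
      return false
  }
}
```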

This much policy gives us enough room to illustrate the flexibility and precision offered by a capability-based system. Let's now discuss how we actually enforce this policy. For our example, let us suppose we are using the PostgreSQL database management system. PostgreSQL is a personal favorite of mine, and we use it extensively behind the scenes for XFIL. PostgreSQL, as the name implies, is a relational database using SQL to read and modify data, and happens to support some fancy behavior that allows us to embed access controls right into even our most basic SQL statements.

Aside: Database management systems like PostgreSQL, MySQL, etc. are all incredibly mature, optimized, and battle-tested software. Any opportunity to push work to the DBMS instead of doing it in the application, especially when it reduces the amount of data that needs to be read into memory or the number of transactions that need to be issued, should be seized.

We are actually going to implement all of the logic for restricting access to data right in the SQL queries we use. To do this, we're going to change the shape of our queries to take advantage of a feature of PostgreSQL. Where we would usually write something like

insert into discussion_threads (
  title, created_by, group_id
) values (
  $1, $2, $3
);

to create a new discussion thread, we can in fact also write

insert into discussion_threads (
  title, created_by, group_id
)
select $1, $2, $3;

The only difference here is that instead of using values() to specify the data to insert, we have used a select statement, which can also have a where clause! Normally, we would see the select <THING> where <CONDITION> pattern in queries that read data wherein the where clause filters the set returned by the query, but in this case, the pattern is actually used to effectively disable/enable the insertion. That is, if we instead write

insert into discussion_threads (
  title, created_by, group_id
)
select $1, $2, $3
where <CONDITION>;

then an insertion will only occur as long as the <CONDITION> evaluates to true. We can then use this behavior to enforce our policies.

-- Any user can create a new thread in a group they belong to.
insert into discussion_threads (
  title, created_by, group_id
)
select $1, $2, $3
where $3 = ANY( $4 ) or $5;

This where condition looks a little strange, but what we're aiming to do here is add two extra parameters to the usual query. The first, $4, is the list of groups the user has permission to create threads in, and the condition checks that the group_id being inserted is in that list. More on how we actually get that list in a bit. The second, $5, is a boolean that is true if the actor is an admin and false otherwise.
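From the application side, issuing this guarded insert might look like the sketch below. It assumes a client in the style of node-postgres (pg), whose query method accepts an object with text and values; the post does not say which client XFIL uses, and the function names createThreadQuery and wasAuthorized are hypothetical. Depending on the column types, PostgreSQL may need explicit casts on some parameters (the array in particular) to infer their types; none are shown here because the schema's types are not given.

```javascript
// Build a query config ({ text, values }) for the guarded insert.
// accessibleGroups: array of group IDs taken from the verified JWT claims.
// isAdmin: boolean, also derived from the claims.
function createThreadQuery(title, userId, groupId, accessibleGroups, isAdmin) {
  return {
    text: `
      insert into discussion_threads (
        title, created_by, group_id
      )
      select $1, $2, $3
      where $3 = ANY( $4 ) or $5`,
    values: [title, userId, groupId, accessibleGroups, isAdmin],
  }
}

// The statement inserts either one row or none, so the number of affected
// rows tells the caller whether the capability check passed.
function wasAuthorized(result) {
  return result.rowCount === 1
}
```

A handler would pass the object returned by createThreadQuery to the client's query method and respond with an error such as 403 when wasAuthorized is false. Note that the application never decides whether the insert is allowed; it only reports what the database did.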

-- Any user can post in a thread in a group they belong to.
insert into posts (
  content, author, thread_id
)
select $1, $2, $3
where exists (
  select 1
  from discussion_threads D
  where D.id = $3 and (D.group_id = ANY( $4 ) or $5)
);

Here we're basically doing the same thing as in the last query, but because a post has no group of its own, we look up the thread that the post is being added to and check that thread's group against the user's group permissions.

-- Moderators can do the above as well as delete threads in groups they are moderators in.
-- Admins can do any of the above in any group as well as delete any thread or post.
delete from discussion_threads
where id = $1 and (group_id = ANY( $2 ) or $3);

The above example demonstrates how simple it is to implement delete operations. I will leave out the delete from posts implementation since it will be similar to insert into posts.

Now there's one last detail to fill in to complete the picture, and that is to explain where the two extra parameters for each query come from. They are:

  1. A list (array) of IDs of groups that the user belongs to
  2. A boolean, true if the user is an admin, else false

Quite simply, both of these values will come from the JWT's claims. After verifying the JWT, we can get each parameter as follows.

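// `claims` is the payload of the already verified JWT (the variable name is
// assumed here; use whatever your JWT handling code provides).

// For a delete action, restrict to groups that the user is a moderator in.
// Admins are handled separately by the isAdmin flag below.
const allowedUsers = ['moderator']
const accessibleGroups = claims.permissions
  .filter((permission) => allowedUsers.includes(permission.role))
  .map((permission) => permission.group)

// Let us suppose there is a group 0 which contains only admins, and so if a
// user is in that group, they have admin permission, and are an admin universally.
const isAdmin = claims.permissions
  .filter((permission) => permission.role === 'admin')
  .length > 0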

Copying and adjusting this code would certainly be a pain to do for every endpoint handler your application implements, but fortunately this problem can be almost entirely alleviated by writing a function that generates wrappers for your handlers or by writing middleware. So that's the whole process! Let's summarize everything we just went through.

  1. We designed our database schema in such a way that it would contain a piece of information that acts as a "keyhole" of sorts, where the "key" is supplied by the auth service (the group_id field)
  2. We defined our access policies and then enforced them at the level of our SQL statements by using PostgreSQL's insert ... select support
  3. We extracted the capability (or "key") from a JSON Web Token provided by the auth service

It's true that this approach takes a bit of effort to apply consistently. You need to be a little clever with your schema design, manage a tiny bit of duplication of identifiers across schemas, understand how to use SQL to add extra conditions to all kinds of queries, and write some functions to allow you to concisely define your role-based restrictions (in our case) and automatically process a JWT. Despite this work, I feel that the value we get far outweighs the cost. We've pushed a lot of work to the database, implemented our policy enforcement in SQL instead of application code prone to change, and at the end of the day we can do all of this fairly elegantly once we have it figured out.
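As one possible shape for those helper functions, here is a sketch of a wrapper generator that derives the two query parameters from the claims once and hands them to the handler. The names capabilitiesFor and withCapabilities are hypothetical, and the sketch assumes that request.claims has already been set by whatever verified the JWT earlier in the request pipeline.

```javascript
// Derive the two extra query parameters from verified JWT claims.
function capabilitiesFor(claims, allowedRoles) {
  const accessibleGroups = claims.permissions
    .filter((permission) => allowedRoles.includes(permission.role))
    .map((permission) => permission.group)
  const isAdmin = claims.permissions.some(
    (permission) => permission.role === 'admin'
  )
  return { accessibleGroups, isAdmin }
}

// Generate a wrapper that computes the capabilities and passes them to the
// wrapped handler as a third argument.
function withCapabilities(allowedRoles, handler) {
  return (request, response) => {
    const capabilities = capabilitiesFor(request.claims, allowedRoles)
    return handler(request, response, capabilities)
  }
}

// Example: deleting threads is limited to moderators (admins pass via isAdmin).
const deleteThread = withCapabilities(
  ['moderator'],
  (request, response, capabilities) => {
    // A real handler would run the guarded delete here, passing the thread ID
    // followed by capabilities.accessibleGroups and capabilities.isAdmin.
    return capabilities
  }
)
```

Each endpoint then states its role restriction in one place, at the point where the handler is defined, and the handler body only has to forward the two values into its query.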

Final Thoughts About Capability-Based Systems

Hopefully it is clear how the JWTs served by an auth service act as capabilities here, and that the idea doesn't seem arbitrary. Our application isn't going to know anything about data in the database without explicitly querying for it, and each query we have implemented can only be executed to any meaningful effect when some useful data (group IDs) has been supplied by the auth service. We could always go out of our way and write code to circumvent these restrictions, but doing so would amount to malicious or negligent behavior on the part of the developer.

I realize that, having just read a bunch of code and SQL, it might be hard to imagine how this approach could be better than what people have been doing. Let us not forget that learning new things isn't always easy; it took me plenty of time to study and figure this all out myself. We should also remember that current approaches aren't exactly working terribly well for a lot of applications.

I want to advocate for capability-based security because I genuinely think it is simpler and more elegant. Even though we, in our examples, had to write a bunch of SQL statements that are arguably a bit more complicated than one might usually write, we isolated all of the complexity of handling access control to each query as necessary. That allowed us to say exactly what we mean, and no more, and not have to worry about other parts of the software. This approach takes the centralized, tangled web of access control logic common in most applications and breaks it apart into small pieces in a way that can be trivially reproduced in other services, keeping with our theme of building things that play nice with a distributed architecture. It's also built on very simple principles that, once understood, can be fairly easily abstracted. I hope to see more developers try adopting this kind of approach. It's an area that could greatly benefit from some research and ORM support.

Parting Words for Developers

If you're a developer and you're reading this, I'd like to offer a few points of advice summed up from this series of posts which I hope will help you make good decisions about building software securely.

  1. Take the time to design and specify things up front. Documentation and testing are your friends, so make sure to use them well.
  2. Understand that security is not a feature of a system. In all things you build, design with not only the worst, but even the malicious in mind.
  3. Be extremely wary of complexity and resist the urge to bolt features on top of each other.
  4. Cryptography is your friend, but not a panacea. Know why you need it, how to use it properly, and the implications of using it.
  5. Practice breaking things. Intentionally vulnerable apps exist for you to learn how software fails, and that knowledge is incredibly valuable.
  6. Keep learning and experiment often, but not in production.


We've finally reached the end of this series of blog posts. When I set out to start writing these, I expected I might need to write three posts. Then the number became "three if I squish some unrelated topics together" and after some deliberation, I finally settled on five. If you've read all of these posts (or honestly, any of them), I'm very grateful to you. It's been a lot of fun writing about all of the fascinating problems I've been looking into at work, and I sincerely hope that some of this has been valuable to you. We've covered an incredibly broad range of things here and there's still so much more I'd like to write about. In the future I may also write some more about Rust, my current programming language darling, or dive into more of the specifics of one of the topics we've already covered. If at any point during these posts you've found anything unclear, wanted to know more about something, or even just want to chat with me, I can be reached on Twitter.

Until next time!