Building a Declarative Permissioning Service

Motivation

For the first several years at Remind, our authorization model was straightforward. We had three distinct roles that dictated what you could do on the site: teacher, parent and student. For example, teachers could create classes while students couldn’t. While this role-based authorization system served us fine for a while, it eventually broke down as the complexity of our product increased.

Authorization Problems We Ran Into:

  • We wanted to start allowing students to create groups so they can manage clubs and student council, but not all students can create groups (for instance students under 13 years old).
  • As our graph of classrooms and schools increased in complexity, the rules around who can see and chat with whom became similarly complex.
  • We’ve grown to have paid school plans which gates certain advanced features to schools that have paid.

Without good guards in place, these rules devolve into spaghetti code, with conditionals and special cases all over the place. It’s hard to read, it’s hard to modify without side effects and it’s hard to reason about correctness. Especially when you have code that was written ages ago by engineers who’ve moved on from the company. We needed to simplify the way we were expressing permissions.

We wanted a way of describing arbitrarily complex authorization rules that any employee could grok from looking at the code. We wanted to be able to reuse common building blocks and separate rules descriptions from the underlying implementation and data fetching. We wanted to have authorization code that was easy to read and more importantly, easy to modify. So we built the permissions service—a declarative style of writing user permissions.

Permissions Service

The easiest way to understand the permissions service is to see a couple examples of permissions:

func CanCreateGroup(ctx context.Context) Predicate {
    return AllOf(
        Not(IsUnderThirteen(ctx)),
        Not(IsMuted(ctx)),
    )
}

// Admin console is a product for administrators at a school to oversee their school
func CanSeeAdminConsole(ctx context.Context, organizationUuid *string) Predicate {
    return AllOf(
        AnyOf(
            IsOrganizationPaid(ctx, organizationUuid),
            IsOrganizationInDemo(ctx, organizationUuid),
        )
        AnyOf(
            IsOrganizationAdmin(ctx, organizationUuid),
            IsRemindAccountManager(ctx, organizationUuid),
        )
    )
}

// Admin stats is one feature within the admin console
func CanSeeAdminStats(ctx context.Context, organizationUuid *string) Predicate {
    return AllOf(
        CanSeeAdminConsole(ctx, organizationUuid), // Defined above
        AnyOf(
            DoesPaidPlanIncludeStats(ctx, organizationUuid),
            IsRemindAccountManager(ctx, organizationUuid), // Add an override so account managers can demo the product
        )
    )
} 

The goal is that you should be able to understand who can create groups and who can see our admin console from the code above. You should feel comfortable modifying the above code even if you have never written a line of Go, which is what the service is written in. You might even be able to write new rules, repurposing the existing building blocks that you can see in the example.

There’s a lot going on in the code example above so let’s break it down.

Predicates

You’ll notice every permission returns a Predicate type. A predicate is what you’d expect:

type Predicate func() (bool, error)

What each permission is actually returning is a function that will evaluate to true or false, instead of having the permission directly return a boolean. Also you’ll notice that the Predicate above takes no arguments. By returning a Predicate instead of the boolean, we can separate the steps of binding arguments from actually evaluating the function. This makes it possible for our rules to have arbitrary function signatures, as well as make it possible to evaluate these functions however we want. For instance, we can evaluate these functions in separate goroutines and parallelize work. More on that later.

Leaf Rules

In the above example, IsUnderThirteen, IsMuted, IsRemindAccountManager and DoesPaidPlanIncludeStats are all Leaf Rules. That means they are rules that aren’t compositions of other rules (like CanSeeAdminConsole). In almost all cases, Leaf Rules are doing a fetch to another service or data store. The goal of the design is to be able to isolate these external fetches. They could all just be queries to a single PostgreSQL database or they could be doing HTTP or gRPC calls to a whole bunch of different microservices like a users service, a mute service and a paid accounts service. Even better, we can easily switch from an HTTP call to a gRPC call without touching any rules because the data fetching code is totally isolated from the rules definitions.

An example of one of our Leaf Rules:

func IsOrganizationPaid(ctx context.Context, organizationUuid *string) Predicate {
    return func() (bool, error) {
        organization, err := store.GetOrganizationByUuid(ctx, organizationUuid)
        if err != nil {
            return false, err
        }
        return organization.Paid, nil
    }
}

Where the actual data fetch is still abstracted away into a store package. I’ll dig into what that data fetch looks like later.

AnyOf, AllOf, Not

Leaf Rules are good as reusable building blocks, but the real value comes when there are clean ways of combining Leaf Rules. We currently have three combiners: AllOf, AnyOf and Not. A quick look at the code for Not:

func Not(predicate Predicate) Predicate {
    return func() (bool, error) {
        result, err := predicate()
        return !result, err
    }
}

That one’s really simple. A really basic implementation of AllOf might look like

func AllOf(predicates ...Predicate) Predicate {
    return func() (bool, error) {
        for _, predicate := range predicates {
            result, err := predicate()
            if err {
                return false, err
            }

            if !result {
                return false, nil
            }
        }
        return true, nil
    }
}

This isn’t how we implement AllOf though in the permissions service. This evaluates each predicate sequentially meaning the time it takes an AllOf to evaluate is the sum of the time if takes to evaluate all the predicates. But these Leaf Rules have no dependencies between each other so we can run them concurrently. This is the main reason we wrote the permissions service in Go.

Parallelization

As mentioned before, one of the benefits of using the Predicate type is to be able to evaluate functions in goroutines. Utilizing that, here’s what our AllOf really looks like:

func evalPredicate(pred Predicate, resultChan chan bool, errChan chan error) {
    result, err := pred()
    if err != nil {
        errChan <- err
    } else {
        resultChan <- result
    }
}

func AllOf(predicates ...Predicate) Predicate {
    return func() (bool, error) {
        childChan := make(chan bool)
        errChan := make(chan error)
        for _, predicate := range predicates {
            go evalPredicate(predicate, childChan, errChan)
        }

        for i := 0; i < len(predicates); i++ {
            select {
            case res := <-childChan:
                // Short-circuit
                if !res {
                    return false, nil
                }
            case err := <-errChan:
                return false, err
            }
        }
        return true, nil
    }
}

We heavily use goroutines to be able to parallelize evaluation of predicates. With this change, we now cut the amount of time it takes to evaluate an AllOf to be the maximum of the time it takes to evaluate the predicates. This is a huge improvement on the previous implementation. What’s really cool about this is that if we revisit our rules from above:

func CanSeeAdminConsole(ctx context.Context, organizationUuid *string) Predicate {
    return AllOf(
        AnyOf(
            IsOrganizationPaid(ctx, organizationUuid),
            IsOrganizationInDemo(ctx, organizationUuid),
        )
        AnyOf(
            IsOrganizationAdmin(ctx, organizationUuid),
            IsRemindAccountManager(ctx, organizationUuid),
        )
    )
}

func CanSeeAdminStats(ctx context.Context, organizationUuid *string) Predicate {
    return AllOf(
        CanSeeAdminConsole(ctx, organizationUuid),
        AnyOf(
            DoesPaidPlanIncludeStats(ctx, organizationUuid),
            IsRemindAccountManager(ctx, organizationUuid), // Add an override so account managers can demo the product
        )
    )
} 

Because every expensive fetch is kicked off in its own goroutine, evaluating CanSeeAdminStats will immediately start fetching IsOrganizationPaid, IsOrganizationInDemo, IsOrganizationAdmin and IsRemindAccountManager all in parallel, while also utilizing any early exit shortcuts we can. For example, if IsOrganizationAdmin is taking 100ms but isRemindAccountManager is a much faster 5ms and evaluates to true, the whole query won’t be blocked by the expensive organization admin call.

Avoiding Duplicating Work

While this is all great for evaluating permissions quickly, this results in cases where we’ll fetch the same data multiple times while evaluating a single permission. In the above example, IsRemindAccountManager will be evaluated multiple times. There are cases that are even more difficult to detect. If you look at how we check paid and demo, you can imagine something like:

func IsOrganizationPaid(ctx context.Context, organizationUuid *string) Predicate {
    return func() (bool, error) {
        organization, err := store.GetOrganizationByUuid(ctx, organizationUuid)
        if err != nil {
            return false, err
        }
        return organization.Paid, nil
    }
}

func IsOrganizationInDemo(ctx context.Context, organizationUuid *string) Predicate {
    return func() (bool, error) {
        organization, err := store.GetOrganizationByUuid(ctx, organizationUuid)
        if err != nil {
            return false, err
        }
        return organization.Demo, nil
    }
}

In which case we’re going to do a cold fetch for the organization twice in parallel, once in IsOrganizationPaid and then again in IsOrganizationInDemo. To avoid doing that, we need to keep some sort of “global” state that is shared between all rules. I put “global” in quotes because we don’t need this state to persist as long as the service is running, but instead just for the duration that a single top-level permission is being evaluated.

We use the context package for this purpose

Package context defines the Context type, which carries deadlines, cancelation signals, and other request-scoped values across API boundaries and between processes

Which is exactly what we want. In the context object, we keep a Store which facilitates deduplicating data fetches across goroutines.

type Store struct {
    ResponseCache    map[string][]byte
    RequestsInFlight map[string][]APIResponseChannels
    Mutex            *sync.Mutex
    Client           *http.Client
}

The Store

I alluded to the store package earlier that The store has:

  • A response cache so if we already got a response from a particular request, we can just serve that up instantly when it’s rerequested. It’s a map from a requestKey to a byte array response.
  • A map of requests inflight so that if we have a getOrganization call that process A calls, and then process B calls before the fetch finishes for process A (which is the common case since we heavily parallelize), we can just have process B append to the requests inflight. RequestsInFlight is a map from a requestKey to a list of subscribers. When process A’s call to getOrganization returns, it alerts everyone in the RequestsInFlight queue.
  • A mutex since we’re going to have multiple processes writing to RequestsInFlight, and need to handle concurrent writes
  • A shared http client since right now all the data fetches we do at Remind are happening over an http connection.

Finally, the last piece is to wrap the logic for all this deduplication into a clean and straightforward interface. For that, we just have a single function:

func Fetch(ctx context.Context, req *http.Request) ([]byte, error) 

Under the hood:

  • Fetch creates a requestKey which is just hash of req.
  • It then checks the ResponseCache and the RequestsInFlight map for that key.
    • If it’s in the cache, it’s easy to just return the cached value.
    • If it’s in the inflight map, it creates result and error channels and appends them to the inflight map.
    • If it’s in neither, creates result and error channels and initializes the inflight map. Then kicks on a goroutine to do the actual http fetch
      • This goroutine does an http fetch and then alerts all channels in RequestsInFlight with the results
  • If it didn’t early return from the response cache, we’ve got some channels that we can wait on that will give us the result of the http response regardless of whether this particular call kicked off the http fetch or not.

Putting It All Together

All of these pieces sum up to give us a cleanly composable set of permissions that are easy to read and modify, evaluate faster than traditional methods (through parallelization) and avoids some natural pitfalls, like duplicating work.

There are still ways the evaluation steps could be improved. For instance, short circuiting does result in lower latency, but it doesn’t prevent every leaf rule from actually doing the work it needs to produce a result, even if that result isn’t used in the end. That’s because we kick off all leaf rules in parallel so by the time we know that evaluating a leaf rule is unnecessary, it’s already probably waiting on I/O, and the costs are already incurred. This could be prevented by having a more intelligent system, where some Predicate combiners run in parallel, while some run in series, based on weights and empirical pass rates. But in trying to get an MVP out and prove out the concept, these are optimizations that add a great deal of complexity to a service that already has a lot going on.

One of our values is creating simplicity for others. While there is a lot of tech happening under the hood, the primary goal of this project has always been to create an interface for describing permissions that all developers can build on, even if they have no understanding of how the guts of the permissions service work. With this declarative style of describing permissions, engineers can confidently read, understand and modify what people can do on Remind.