Automating an un-automatable access issue

This is a “face-palm – why didn’t we think of this earlier?” situation that I thought I’d share. Maybe it will help someone else.

So here’s the problem…

Normally we manage access to various systems using "groups". If you're in the group, you have access, and if you're not in the group, you don't. Organizations standardize on names like access_APPNAME_readonly, access_APPNAME_admin, and so on. You get the point.
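To make the convention concrete, here is a minimal sketch of a group-based access check. The helper names `group_name` and `has_access` are hypothetical, and it assumes the user's group memberships have already been fetched from whatever directory you use.

```python
def group_name(app, role):
    """Build a group name following the access_APPNAME_ROLE convention."""
    return f"access_{app}_{role}"


def has_access(user_groups, app, role):
    """Grant access if and only if the user is in the matching group."""
    return group_name(app, role) in set(user_groups)
```

The point is that the application never keeps its own list of people: membership in the directory group is the whole decision.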

Typically these are LDAP/Active Directory groups, or the analogous feature in a cloud provider, SAML, or Okta. Whatever the mechanism, any sane application will plug into one or more of those systems to regulate access.

This works just fine until you have one application that doesn’t work that way. Then you have to invent some kind of non-standard, exception process for handling that one app.

In my career I've seen a number of systems that didn't support any of those standard mechanisms. I've seen specific user names compiled into the code, requiring a PR to make any change. I've seen applications that implemented their own role and group system with a homegrown API that isn't supported by Active Directory, LDAP, Okta, or anything else on the planet.

This blog post is about a system whose access control wasn't as bad as "compiled into the code" but was worse than a non-standard API, and about the half-step solution we eventually came up with.

Why didn’t the problem get fixed?

For years we did updates to that system manually. Why?

All the standard reasons!

Such updates crossed many security boundaries, making automation a pain to implement.

A mistake would cause a cascade of other problems and outages, so the change required an expert. The expert didn't have time to automate the process, and didn't trust anyone else to write the automation.

The work was invisible to management, so improvements were never resourced.

Updates were rare at first, but their frequency grew slowly over time. The ROI assumptions of yesterday were no longer true, but because the work was invisible, those assumptions were never revisited.

Who was feeling the pain?

The manual process wasn't just bad for my team; it affected multiple teams. The central IT team managed access control. When they received a request for access to this application, they would manually create a ticket with my team. This was a non-standard process for them. Then my team had to badger the expert until they had time to do the task… though eventually the expert left the company and my team started doing the task themselves. Then the central IT team had to manually monitor the status of the ticket. Then the requester had to read the email saying the ticket was complete, verify the work was done right, yadda yadda yadda.

While this was creating manual work for many teams, it didn’t seem to be creating any pain for the team that maintained the application. That’s probably why it wasn’t getting fixed. But I digress…

What was our solution?

We still haven't fixed the application, but we did make a big improvement.

We replaced the ticket-centric process with something more standard: We told the central IT team that the application was now controlled by a particular LDAP group. Now they could manage access control just like everything else. At least we eliminated one non-standard process in the company.

However, the application really couldn't read the LDAP group. That's how we got into this situation in the first place. I'm not saying we lied to corporate IT… they agreed to the plan. But in the Hollywood version of this story… it would totally have been presented that way.

The application code hasn’t changed. It still doesn’t pay attention to LDAP groups. However, now we could write a script that would sync the LDAP group to the application. We could run this script hourly, daily, or whatever.
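Here is a minimal sketch of what such a sync script might look like. It is not the script we actually wrote. It assumes the application exposes some way (an API, a CLI, a database table) to list, add, and remove users; `plan_sync`, `apply_sync`, the `add_user`/`remove_user` callbacks, and the `max_removals` guard are all hypothetical names. Fetching the LDAP group membership is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Users to add to and remove from the application's own access list."""
    to_add: set = field(default_factory=set)
    to_remove: set = field(default_factory=set)


def plan_sync(ldap_members, app_users):
    """Diff the LDAP group (the source of truth) against the app's users."""
    desired = set(ldap_members)
    current = set(app_users)
    return SyncPlan(to_add=desired - current, to_remove=current - desired)


def apply_sync(plan, add_user, remove_user, max_removals=5, dry_run=False):
    """Push the plan to the app through caller-supplied callbacks.

    add_user/remove_user are hypothetical stand-ins for the app's own
    non-standard mechanism. max_removals is an assumed safety guard so
    that an empty or truncated LDAP result can't silently revoke
    everyone's access. With dry_run=True nothing is changed.
    """
    if len(plan.to_remove) > max_removals:
        raise RuntimeError(
            f"refusing to remove {len(plan.to_remove)} users "
            f"(limit is {max_removals})"
        )
    if not dry_run:
        for user in sorted(plan.to_add):
            add_user(user)
        for user in sorted(plan.to_remove):
            remove_user(user)
    return plan


if __name__ == "__main__":
    # In-memory stand-in for the application's user list.
    app_users = {"alice", "bob"}
    plan = plan_sync(["alice", "carol"], app_users)
    apply_sync(plan, app_users.add, app_users.discard)
    print(sorted(app_users))  # prints ['alice', 'carol']
```

Because the script only diffs two lists and applies the difference, it is safe to run repeatedly from cron or any scheduler: a run with no membership changes does nothing. The removal guard is one way to respect the "a mistake causes a cascade of outages" concern without needing an expert on every change.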

This solved the security boundary problem because our LDAP system took care of getting the information closer to the application. Our script simply did the last few inches.

And that’s it!

It works. The application team didn't have to change their code. The central IT team has one less weird edge case to consider.

Of course it works. We really just replaced the communication mechanism (tickets) with the replication that LDAP does so well.

Why didn’t we think of this sooner?

Why didn't we think of this ages ago? I think it was a special kind of perfectionism. We had this idea in our heads: applications should handle access control via standard mechanisms, and the only way to fix the problem was to get the application developers to add the code we had in mind.

TL;DR:

If you’re using tickets as a work-around for a process that can’t be automated, consider reframing the problem. Use the existing system not to control access, but to communicate about the access. Then a script can bridge the gap.

In other words:

Applications should manage access control via a group mechanism like LDAP.
If your process for exceptional (in the negative sense) applications involves tickets, consider having the process update a group anyway. Then sync the group information to the application.
If you hadn't thought of this earlier, don't worry… other smart people didn't think of it either.

The end.