I wish I had not slept late

There are good and bad ways to wake up in the morning. 

Good morning: Includes a pot of fresh coffee and/or a bacon sandwich. 

Bad morning (as we found out on June 3, 2022): Waking up to find a 10.0-severity vulnerability affecting over a hundred clients in one of our main platforms.

At 2:20 in the morning (or night, depending on your personality type), Atlassian disclosed the Confluence zero-day vulnerability labeled CVE-2022-26134. At the time of disclosure, Atlassian had no patch available for Confluence and no mitigation in place. 

This meant that, during the first few hours, we only had two options:  

Option 1

Shut down Confluence 

Option 2

Remove it from the internet.

Either of these options would have a massive impact on our clients’ systems.

While we explored our options for responding to the attack and assessed customer exposure (over 100 affected systems), Atlassian managed to release a third option: a “possible” mitigation. It was in no way bulletproof, but it would buy us enough time to communicate with our customers and help get them through the business day. 

Option 3 (and this is a better one)

Implement the following on your frontend proxy, if you have one:

  # CVE-2022-26134 first response 20220603
  deny 156.146.0.0/16;
  deny 198.147.22.148;
  deny 45.43.19.91;
  deny 66.11.182.0/24;
  deny 67.149.61.16;
  deny 154.16.105.147;
  deny 64.64.228.239;
  deny 221.178.126.244;
  deny 59.163.248.170;
  deny 98.32.230.38;
  location ~* ".*\${.*" { deny all; }
  if ($request_uri ~ ".*\${.*") { return 403; }
  # End CVE-2022-26134 first response

The first part of this mitigation involved restricting known hostile IP addresses. These were IPs that the original discoverer of the vulnerability, Volexity, was able to identify during its investigation. 

But IP addresses are dynamic, and as news of the exploit spread, we fully expected to see attack attempts from other addresses, too.

The second part of this mitigation involved rejecting potential attack requests based on their use of known injection strings. This could look effective at first glance, but there was no guarantee that this malicious-request filtering would provide full protection against the vulnerability.

But since Atlassian thought it could potentially provide protection, it was worth a try while we explored the impact of the vulnerability, especially since, at the time, we had no other controls that would not immediately affect end users.
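
To sanity-check that a rule like this is actually active, a quick probe helps. The sketch below is not part of our incident tooling: it is a minimal Python example that assumes the proxy filter above is in place, uses a placeholder hostname, and sends a harmless request whose URI contains the “${” marker, expecting the proxy to answer with 403.

  # Minimal sketch (not our production tooling): send a harmless request whose
  # URI contains the "${" marker and check that the frontend proxy returns 403.
  # The hostname below is a placeholder; adjust it for your own environment.
  import urllib.error
  import urllib.request

  BASE_URL = "https://confluence.example.com"  # placeholder host

  def probe(path: str) -> int:
      """Return the HTTP status code for a GET request to BASE_URL + path."""
      try:
          with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
              return resp.status
      except urllib.error.HTTPError as err:
          return err.code

  if __name__ == "__main__":
      status = probe("/${probe}")  # benign URI that should trip the filter
      print("filter active" if status == 403 else f"unexpected HTTP {status}")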

Having the mitigation in place allowed us to prepare for customer communication. According to our log analytics, it prevented all attacks on the systems that remained running. 
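
As an illustration of the kind of log check we mean, here is a rough sketch. It is not the exact analytics we ran: it assumes a default nginx “combined” access log at a placeholder path and simply counts requests whose URI contains “${”, grouped by response status, so you can see whether the suspicious requests were actually rejected.

  # Rough sketch, not our actual analytics: count requests whose URI contains
  # "${" in an nginx access log, grouped by HTTP status code. Assumes the
  # default "combined" log format; the log path is a placeholder.
  import re
  from collections import Counter

  LOG_PATH = "/var/log/nginx/access.log"  # placeholder path

  # combined format: addr - user [time] "request" status bytes "referer" "agent"
  LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) ')

  def scan(path: str) -> Counter:
      """Count suspicious requests (request line contains "${") per status code."""
      hits = Counter()
      with open(path, errors="replace") as log:
          for line in log:
              match = LINE_RE.match(line)
              if match and "${" in match.group(2):
                  hits[match.group(3)] += 1
      return hits

  if __name__ == "__main__":
      for status, count in sorted(scan(LOG_PATH).items()):
          print(f"HTTP {status}: {count} suspicious request(s)")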

We had the mitigation implemented for all customers protected under the Eficode ROOT umbrella by 10 am.  

Security vs. function - a fine balancing act

While Atlassian’s suggested mitigations were in place, no information was available to verify how effective they would be. In times like these, we have to strike a very fine balance between our customers’ requirements and their safety. 

We collected what little information was available to determine the best path forward. By 13:32, we had determined, based on customer feedback, that we should offer to shut down critical instances at the end of business hours (17:00). This meant we had to communicate with all critically affected customers so they could communicate with their end-users. 

A better mitigation was released at 18:00. It involved manually changing certain Java files on the affected systems. But by this point, our systems had either been shut down for safety or were covered by the perimeter mitigation. Therefore, at 19:31, we decided to skip this step in favor of patching once a fixed version became available.

Atlassian released the patch at 20:00. The fix: deploy Confluence 7.17.4 to all affected instances.

Patching began about an hour after the patch was released: at 21:00, the patch was applied to rootdemo, our internal testing platform, so we could ensure there would be no issues when rolling out the patches to client systems.

All good.

We started upgrading client instances at 21:08, after verifying that the patch was safe. At 21:15, the first customer environment patch was confirmed, just 15 minutes after the test patch. 

After this, like a row of dominos, the affected Confluence systems were fixed one by one. In the midst of all this, the very hands-on CTO of Eficode Managed Services - Kalle Sirkesalo (one of the authors of these very words) - spent 30 minutes wrestling with network issues after arriving back from his recent trip to Germany. Luckily the rest of our ‘strike team’ had no such network issues.

Patching concluded at 23:20 (EEST), with all core Eficode ROOT instances patched except for a few special cases and those not affected by the vulnerability.

The next morning (Saturday, June 4), patching resumed, and by 13:00 all our known affected Managed Services instances had been patched.

On Monday, we rolled back our mitigations to restore public access to the instances and re-evaluated the attack vector against more recent information that had come to light, such as example exploits.

What we learned on this adventure

There are two approaches to security notification:

  1. Release the information initially to a subset of trusted partners, distributors, suppliers, etc. slightly before the public release, allowing reaction and mitigation to take place before the general public is informed. 
  2. Release the information into the wild all at once. This second method is what we saw here.

Both methods have flaws. In the first, information is siloed, and smaller partners or interested parties are easily overlooked.

In the second, however, something different happens. During a security notification, there are three interested parties.

  • Service users, those who purchase the service and are interested in its function. 
  • Service providers (like Eficode) who make it their business to offer these platforms.
  • Those with a technical security interest, who want to develop exploits for these problems, for legitimate purposes or otherwise.

When you “release to all at once”, it’s like you’re firing a starting gun. The service provider finds itself in a race to outpace any malicious developments while remaining on top of their clients’ legitimate concerns and inquiries.

In this scramble scenario, the best tools to lean on are a strong security response protocol and a team of capable experts who are able to keep ahead of the curve.

What we could improve

  • Our customers who have frontend proxy authentication enabled (SAML 2.0/OAuth 2.0 with Nginx+ or comparable) were not susceptible to the attack at all, so we could make Nginx+ more available to our customers.
  • We could have shut down instances more aggressively, but that would have meant larger impact on customers.
  • Consider having premade query examples ready so our experts can evaluate whether instances are affected. This time, they had to be built on the fly.
  • Even better internal and external customer communication.
  • Better communication with Atlassian to have someone from them to help us assess the situation. 
  • Provide a better situation overview for all our customers, not just Managed Services.
  • Consider sharing these retrospectives with our customers.
  • Have network configurations ready for all customers to turn off the public internet easily.
  • Because Confluence 7.17.x introduced sandboxing, the additional processes it spawns initially led some people to think we had been compromised. Better internal communication of changes to the services could prevent this.
  • These situations always seem to happen on a Thursday or Friday. Early alarms should be set on those days. Just in case.

What went well

  • Fast reaction from our side; our experts knew how to respond.
  • Very quick deployment of mitigations.
  • End customers were most likely not severely affected by all this.
  • None of our customers were affected. 
  • We got a chance to put our centralized monitoring to a proper test.
  • All our core Eficode ROOT customers received the patch within 24 hours of the vulnerability being disclosed.
  • Normal work was not affected too badly, as we managed to build a solid strike team for these situations.
  • Communication with customers was not done only by our security manager, but also by our service managers.

Final words

We are happy with our response to this issue, but no security process is perfect. We always want to improve ourselves and our processes in situations like these. Customer feedback is important. If you are a customer and have any feedback on our reaction, or if you had expectations that were not met, please let us know. 

If you have any interest in the Nginx+ solution, feel free to contact your service manager, who can explain it further. And if you are interested in our Managed Services offering, contact us today for a quote. 

Published: Jun 17, 2022

Updated: Nov 30, 2023

Eficode ROOT · Atlassian · Security