Part of the Secure Continuous Delivery series.

Secure Architecture and Design

Sunday, 11 December 2016

After that diversion on the Google bug bounty hijack, we’re back to our series on Secure Continuous Delivery.

So far we’ve introduced a reference delivery pipeline and talked about the need for education and awareness. In this article we’re going to dive into the important topic of producing a secure systems architecture and design.

For the purposes of this article, the distinction between architecture and design is not critical. We’re primarily interested in what happens before the system is implemented. That doesn’t imply a particular approach to building software - you could be using Waterfall, Agile, or something else entirely. It also doesn’t imply formal design documents - you might choose to use formal diagrams and documents, or perhaps you prefer sketches on a whiteboard or paper. However, it does imply that you know what you’re going to build before you build it.

There are a number of ways to classify software vulnerabilities, but the most common categories you will come across are design and implementation vulnerabilities. This article is primarily concerned with design vulnerabilities (we’ll discuss implementation vulnerabilities in an upcoming article). The authors of The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities define design vulnerabilities as:

… high-level issues with program architecture, requirements, base interfaces, and key algorithms.

Keep this in mind as you read the rest of this article, as it will focus the scope of the discussion. To help make this more concrete, a recent example of a design vulnerability is the AtomBombing code injection vulnerability in Windows (all versions). The mitigation section sums it up pretty well:

AtomBombing is performed just by using the underlying Windows mechanisms. There is no need to exploit operating system bugs or vulnerabilities. Since the issue cannot be fixed, there is no notion of a patch for this.

The devil is in the details

I’m conscious of the need to tread carefully when talking about ‘detailed’ designs, particularly when working on Agile projects. I’ve seen and experienced a lot of resistance to any form of design or architecture within Agile projects, and it has quite consistently proven detrimental to those projects. It’s unfortunate that the term ‘big up-front design’ is bandied about as if it were the only kind of design there is. We should use our experience and best judgement to determine how much design is suited to the task at hand.

To avoid confusion, I’m not suggesting that every piece of software needs to be fully specified down to the finer details. However, what I’ve seen all too often is a tendency, particularly in Agile delivery, to rush to the opposite extreme and throw the proverbial baby out with the bathwater. It is up to each team or delivery group to determine where best to draw the line between ‘too much’ and ‘not enough’ architecture and design. An excellent reference on this topic is the aptly named Just Enough Software Architecture: A Risk-Driven Approach.

I’ve seen many examples where teams have believed their design to be secure because they’re using some open standard, only for an architecture/design review to quickly expose major weaknesses. One example of this was a system that was encrypting security tokens using AES, which the team believed to be the correct approach. While they had made a good decision on the use of AES, they didn’t have any experience with cryptography (no knowledge of what a block mode is) and had simply used the defaults provided by the JVM. What the team didn’t realise was that a seemingly sensible use of Cipher.getInstance("AES") actually provides an AES cipher in ECB block mode, which offered practically zero security to the system. It only took two minutes to demonstrate how one token could be used to amend another, successfully and without detection. In this case, the design required the use of AES but didn’t specify how to use AES; or even better, how to use suitable tools that abstract away the details of cryptography for a team without crypto expertise.
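To make the point concrete, here’s a minimal Java sketch (the key handling and plaintext are purely illustrative) contrasting the implicit default with an explicitly chosen, authenticated mode:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class CipherDefaults {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();

        // The seemingly sensible call: on the standard JDK provider this
        // resolves to AES/ECB/PKCS5Padding. Identical plaintext blocks
        // produce identical ciphertext blocks and there is no integrity
        // protection, so tokens can be recognised and tampered with.
        Cipher implicitEcb = Cipher.getInstance("AES");

        // Being explicit: an authenticated mode (GCM) with a fresh random IV
        // per message, so tampering is detected on decryption.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
        gcm.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = gcm.doFinal("security token".getBytes(StandardCharsets.UTF_8));
    }
}
```

Better still, the design can mandate a higher-level library that makes the safe choice the default, rather than leaving the team to assemble cryptographic primitives themselves.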

I could tell similar stories about projects requiring TLS but not validating certificate chains, or many other situations where security is either misunderstood or applied without due diligence. This is a dangerous path, as it gives the illusion of security with few (or possibly none) of the benefits. It pays dividends to have a suitably skilled security architect involved in your projects from the beginning.
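The TLS story often looks something like the following hypothetical snippet, where a ‘temporary’ trust-all manager quietly disables certificate chain validation; the irony is that the secure option is usually to write nothing at all and rely on the JVM defaults:

```java
import javax.net.ssl.*;
import java.security.cert.X509Certificate;

public class TrustAllExample {
    // The anti-pattern: a TrustManager that accepts any certificate chain.
    // TLS still encrypts the connection, but anyone who can intercept the
    // traffic can present their own certificate and read everything.
    static final TrustManager[] TRUST_ALL = new TrustManager[] {
        new X509TrustManager() {
            public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        }
    };

    public static void main(String[] args) throws Exception {
        SSLContext insecure = SSLContext.getInstance("TLS");
        insecure.init(null, TRUST_ALL, new java.security.SecureRandom());

        // The fix is usually to do nothing special at all: the default
        // SSLContext validates the chain against the platform trust store.
        SSLContext secure = SSLContext.getDefault();
    }
}
```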

Complexity breeds insecurity

In November 1999, Bruce Schneier wrote an excellent essay entitled A Plea for Simplicity: You can’t secure what you don’t understand. He ends with a number of predictions, but the one that particularly resonates with me is that ‘as systems get more complex, security will get worse’.

It’s surprisingly easy to end up with an overly complex design. You may not set out to create something unnecessarily complex, but software evolves over time - and in my experience, it generally doesn’t evolve neatly. Again, that’s not because anyone thought it wasn’t important. It’s just that there are a lot of factors driving software delivery, and sometimes constraints or choices are made that limit your ability to deliver in the way you’d like to. Sure, we’d all like to deliver performant, secure, accessible, user-friendly, visually appealing software all the time, every time. But that’s not the norm, and that’s also a topic for another day!

The key point to remember is that complexity is not just a problem for software engineering in general, it’s a specific problem for security. Be careful not to conflate ‘simple’ with ‘easy’. In some cases, producing a simple design can be anything but easy. So if you have the opportunity to simplify the design, take it. And if you’re starting out from scratch, don’t underestimate the value that a simple design can bring.

Design for operations

I’m fortunate to be working with clients that more or less follow the Amazon model described by Werner Vogels as ‘you build it, you run it’. This is essentially what DevOps is about. The value this brings to software operations cannot be overstated. The quality of software tends to be much higher when the team is responsible for both building and operating the system. One of the reasons for this is the operational awareness the team gains when they see the other side. Development teams that throw software over the proverbial wall to an operations team seldom (if ever) get to see what goes into running software in production.

Regardless of whether you work in this way, a key point to remember is the need to design for operations. This doesn’t come by accident and requires a close involvement with the people that perform this role. Much like security, performance, and many other quality attributes of a system, operability is most effective when designed in from the beginning. It is difficult to retrofit.

From a security perspective, operational vulnerabilities (also defined in The Art of Software Security Assessment) are:

… issues that deal with unsafe deployment and configuration of software, unsound management and administration practices surrounding software, issues with supporting components such as application and Web servers, and direct attacks on the software’s users.

Our industry unfortunately has a long legacy of building or buying inherently insecure software products that were expected to be deployed into ‘secure’ networks. What this fails to acknowledge is that security isn’t purely the domain of the infrastructure teams, nor is every infrastructure team suitably skilled in building secure environments. When designing software, we should strive to make the product secure independent of the environment in which it is deployed (to the extent possible, of course). Providing guidance and documentation to operations teams on how to securely operate the system is also critical.

Security requires visibility

If you’re building software, you’re presumably already instrumenting it for monitoring, logging and auditing. If you aren’t, this is definitely something you should be investing in. Effective software operations requires runtime visibility. Not only is this a prerequisite for adopting an experimentation culture, it also delivers significant opportunities for improving security.

Intrusion detection systems rely on the ability to monitor the system for anomalous behaviour, such as unexpected audit events, file integrity violations, running processes, etc. Without visibility into these aspects of the system, it would be significantly harder (and in many cases practically impossible) to detect an intrusion. In a similar manner, emitting well-defined audit, log and metric data from your application allows you to have greater awareness of how the app is being used (and abused). This also requires a good system for collecting and aggregating the data for analysis, visualisation and further processing.
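As a small illustration (the event name, fields and helper are hypothetical), emitting a well-defined, machine-parseable audit event from the application might look like this:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class SecurityAudit {
    private static final Logger AUDIT = LoggerFactory.getLogger("security.audit");

    // Hypothetical helper: emit a structured audit event whenever an
    // authentication attempt fails. A log aggregator can then count these
    // per user or source IP and feed alerting or intrusion detection rules.
    public static void authenticationFailed(String username, String sourceIp, String reason) {
        MDC.put("event", "authentication_failed");
        MDC.put("user", username);
        MDC.put("source_ip", sourceIp);
        try {
            AUDIT.warn("Authentication failed for user={} from ip={} reason={}",
                       username, sourceIp, reason);
        } finally {
            MDC.clear();
        }
    }
}
```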

When designing for visibility, keep in mind that the metrics you gather may be useful to both humans and machines, which leads us nicely on to our next topic of adaptive response.

Design for adaptive response

We’re in a privileged position today with the ubiquity of APIs and opportunities for automation. Our ability to programmatically control things that were previously manual, physical tasks is incredible. However, many organisations find it difficult to adjust to the new tools they have at their disposal. Instead of innovating on top of these enablers, they simply automate what they already have. That is not innovation; that’s just doing the same thing faster. True innovation means building new capabilities, features, and products that didn’t exist before; solving problems in a way that differentiates you from your competitors. If you don’t run on highly automated infrastructure, there is still a lot you can do. Some of these ideas really do require a more modern, programmable infrastructure - but not all.

In this section we consider some of the more innovative uses of modern software features, including APIs, cloud-based infrastructure, automation, data analysis, etc. It’s worth pointing out that the features themselves are not new… we just want to leverage them in new ways. One of these ideas is to design applications that can adapt in real-time and respond to attacks by defending themselves without human intervention. The OWASP AppSensor project is a great example of how this can be done, and I’d strongly encourage you to investigate this further. If you’re using a decent cloud provider (or have good APIs to control your infrastructure), you can also consider integrating different parts of the system - for example, automatically updating a WAF in response to malicious traffic.
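To sketch the idea (this is not the AppSensor API, just an illustration of the pattern), a detection point might count suspicious events per source and trigger an automated response once a threshold is crossed:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AdaptiveResponse {
    // Count suspicious events (e.g. repeated authentication failures,
    // requests with tampered parameters) per source address.
    private final Map<String, AtomicInteger> suspicionBySource = new ConcurrentHashMap<>();
    private static final int BLOCK_THRESHOLD = 10;

    // Called from a detection point within the application.
    public void recordSuspiciousEvent(String sourceIp) {
        int count = suspicionBySource
                .computeIfAbsent(sourceIp, ip -> new AtomicInteger())
                .incrementAndGet();
        if (count >= BLOCK_THRESHOLD) {
            blockSource(sourceIp);
        }
    }

    // Hypothetical response: in a real system this might call your cloud
    // provider's or WAF's API to add a temporary deny rule for the source.
    private void blockSource(String sourceIp) {
        System.out.println("Blocking traffic from " + sourceIp);
    }
}
```

In practice you’d likely want the response to be proportionate and reversible - additional logging or step-up verification first, with temporary blocking reserved for the clearest cases.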

Adaptive response does not just apply to the technical, low-level security issues; it can be extremely effective within the business domain. Some of the projects I’ve been involved in adopt techniques such as behavioural analysis, machine learning and real-time risk scoring to dynamically alter the behaviour of the system in a way that is tailored to the individual user. The opportunities are endless; but they require creative thinking around the use of the tools at your disposal.
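As a toy illustration (the signals, weights and thresholds are entirely made up), a real-time risk score might be combined into a per-request decision like this:

```java
public class RiskScorer {
    public enum Action { ALLOW, STEP_UP_AUTHENTICATION, BLOCK }

    // Toy signals for a single request; a real system would derive these
    // from behavioural analysis, device fingerprinting, ML models, etc.
    public static class Signals {
        boolean newDevice;
        boolean unusualLocation;
        boolean velocityAnomaly;   // e.g. impossible travel, burst of requests
        double  modelScore;        // 0.0 (benign) .. 1.0 (malicious)
    }

    // Combine the signals into a score and map it to a decision, so the
    // system's behaviour adapts to the individual user in real time.
    public static Action decide(Signals s) {
        double score = 0.0;
        if (s.newDevice)       score += 0.2;
        if (s.unusualLocation) score += 0.3;
        if (s.velocityAnomaly) score += 0.3;
        score += 0.4 * s.modelScore;

        if (score >= 0.8) return Action.BLOCK;
        if (score >= 0.4) return Action.STEP_UP_AUTHENTICATION;
        return Action.ALLOW;
    }
}
```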

Summary

We’ve talked about a few different ideas here, but the key takeaway is that security doesn’t happen by accident. It is carefully designed. There is a lot that can be done if you want to push the limits and drive innovation in the security space, but there’s also a solid foundation in strong security design and architecture that should be adopted as a baseline for all projects.