It’s The Fastest System That Gets Used, Not The Most Appropriate.

A security expert once said to me, “It’s the fastest teams and the fastest systems that get used for new features, not the most appropriate, and this causes security breaches”. I never understood what he meant, until I saw it with my own eyes.

Years ago, I was working for a large company with many systems and teams. Our team ran a massively scalable data-aggregation and interrogation platform, sitting behind a separate, dedicated Identification and Authentication service.

As you would expect, this ID service had gone through all the security processes, was hosted on-site, etc., and met industry standards for handling PII (Personally Identifiable Information). It was rock solid, a bastion of security for the wider project, and pretty much a done deal in terms of data protection.

It was also extremely slow.

Slow in performance due to its architecture, and slow at adding new features due to the tricky nature of the project. Our system, on the other hand, was blisteringly fast and scalable, and we had a track record of delivering new features quickly.

One day, it was decided that the company needed to store users’ postcodes/zipcodes. This was PII (in some sparsely populated parts of the world, a single person’s address has its own code), was part of a user’s core identity, and so would naturally sit within the obligations of the ID system. Except the ID system wasn’t going to be able to scale to meet the demands of this particular use case, and besides, the team behind it was already behind on implementing other features.

So they decided that our system would store these postcodes. The problem was, our system (and team) was security-cleared for ‘Amber’ level data, that is, data that was anonymised, non-identifiable and non-personal. This wasn’t just a case of a bit of extra vetting; the whole architecture of the system was based around this ‘secure, but not PII-level secure’ concept, as were our internal processes and ways of working.

So we had to implement a whole new raft of security procedures, adopt new ways of working, construct highly detailed threat models, and go through a rigorous infosec grilling to ‘sign off’ on a level of security the system was never designed to achieve, all while under intense pressure to deliver the new postcode feature. Retroactively upgrading a system’s security level is never as effective as building that security in from the ground up, and while I think we secured it well, it was far from ideal.

So the expert’s words were borne out: the most security-appropriate system (the highly secure but slow ID system) was passed over in favour of our less secure but faster data-aggregation system. No breach that we know of resulted from this, but it was still a risky move, and one that placed a lot of strain on our security apparatus.

So look out for these situations when planning a multi-system platform; just because one system seems like the most natural fit for a piece of functionality, it doesn’t mean it’s the one that will eventually be used. This can apply to other sorts of concerns too, not just security.

Runtime Vs. Checked Exceptions

Java is strange in that it makes a distinction between runtime exceptions (which don’t need to be declared, and can bubble all the way up to the top of the stack) and checked exceptions, which need to be either wrapped in a try-catch block or declared in the method signature so that the calling method has to deal with them.

If you know Java, then you know this (or should; it’s one of the standard phone-interview questions used to weed out fakes). But do you know why this distinction exists? Or when one type should be used and not the other? A lot of developers don’t; if you do, then you are at an advantage, especially in jobs where architecting core Java is required.

Let me start out by saying that this distinction is controversial, and a lot of developers say it is unnecessary; I, on the other hand, like it, as I think it is an elegant way of enforcing the contract between your code and the code that might call it.

But first, the canonical, theoretical answer:

  • A checked exception is an error condition that a program should be able to recover from. Examples include the connection to a downstream server timing out or failing, or a resource suddenly becoming unavailable. In theory, all of these can be handled either by your code or by the calling code (I’ll get onto this in a bit).
  • A runtime exception is an error condition so bad that the application itself should be considered broken, and further execution is unviable. Examples include NullPointerException (which should never happen in a well-constructed program) or ArrayIndexOutOfBoundsException (again, this should never happen). When one of these is thrown, the JVM should terminate as soon as possible to stop further damage being done.
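To make the distinction concrete, here is a minimal sketch; the `ConfigMissingException` class and `readSetting` method are invented for illustration. The checked exception forces the caller to decide on a recovery strategy, while an unchecked exception like NullPointerException is left to signal a programming error.

```java
import java.util.Map;

// A checked exception models a recoverable condition; because it extends
// Exception (not RuntimeException), the compiler forces callers to handle
// it or declare it. (ConfigMissingException is a hypothetical name.)
class ConfigMissingException extends Exception {
    ConfigMissingException(String msg) { super(msg); }
}

public class ExceptionKinds {

    // Checked: callers must try-catch this or declare "throws" themselves
    static String readSetting(Map<String, String> config, String key)
            throws ConfigMissingException {
        String value = config.get(key);
        if (value == null) {
            throw new ConfigMissingException("No value for key: " + key);
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> config = Map.of("host", "localhost");

        // Recoverable: catch the checked exception and fall back to a default
        String host;
        try {
            host = readSetting(config, "host");
        } catch (ConfigMissingException e) {
            host = "default-host";
        }
        System.out.println(host); // prints "localhost"

        // Unchecked, by contrast: NullPointerException extends
        // RuntimeException, so the compiler never forces a catch;
        // it signals a bug, not a condition to recover from.
    }
}
```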

You might not agree with the above definitions, but you should at least know them. Modern frameworks have their own exception-handling paradigms, and modern coding involves a lot of gluing different libraries together rather than creating complex object structures with clear exception-handling strategies, so the theoretical underpinnings of Java might not seem that important in your role. However, if you want to create a well-architected, easily usable library or system with more than a few object levels, you might want to bear the above in mind.

So, assuming you’ve drunk the Kool-Aid and want to up your exception game, what should you do? Here are some tips:

  • Work mainly with checked exceptions. Never write new RuntimeException("my problem here") unless a condition occurs where the program itself is broken (this is rare; most such conditions are covered by Java’s own runtime exceptions and the JVM).
  • Whenever you call a method that throws a checked exception, ask yourself whether your code should handle the problem, or the code calling your method should. Generally it boils down to: who decides what to do about this, this method or the method calling it?
    • For example, your class might be a database utility class that calls a JDBC driver; if the query times out and throws an exception, you might want your code to retry the query a few times, in which case you would handle it there with a try-catch block.
    • If, following on from the above example, the query still kept timing out, you’d want to throw a new checked exception back up the stack to the calling code (or rethrow the original), so that it can decide what to do with a query that can’t be run.
    • Another example: if your class were a file-system utility class for an application that lets users upload files, and it called File methods that throw IOException when the disk is full, you might want to pass those exceptions up the stack so the calling code can decide what to tell the end user, or whether to retry with a different form of storage, such as cloud.
  • Remember that runtime exceptions extend the RuntimeException class; every other Exception subclass is checked.
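The database tips above can be sketched as follows. DatabaseUtil, QueryFailedException and the simulated runQueryOnce call are all hypothetical stand-ins for a real JDBC driver, used only to show the retry-then-rethrow shape:

```java
// Checked exception thrown up the stack once our own recovery (retrying)
// is exhausted; the caller decides what to do next.
class QueryFailedException extends Exception {
    QueryFailedException(String msg, Throwable cause) { super(msg, cause); }
}

public class DatabaseUtil {

    private static final int MAX_ATTEMPTS = 3;

    // Our code decides to retry; if retries run out, the decision about what
    // to do next belongs to the caller, so we rethrow as a checked exception
    // rather than swallowing the problem.
    public static String runQuery(String sql) throws QueryFailedException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return runQueryOnce(sql, attempt);
            } catch (java.sql.SQLTimeoutException e) {
                if (attempt == MAX_ATTEMPTS) {
                    throw new QueryFailedException(
                        "Query failed after " + MAX_ATTEMPTS + " attempts", e);
                }
                // otherwise loop and retry
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Simulated driver call: times out on the first two attempts.
    private static String runQueryOnce(String sql, int attempt)
            throws java.sql.SQLTimeoutException {
        if (attempt < 3) {
            throw new java.sql.SQLTimeoutException("timed out");
        }
        return "result-of:" + sql;
    }
}
```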

So those are my thoughts on Java exception handling. Let me know yours in the comments section below!

‘Synchronize’ Gotcha: The Thread Locks The Object, Not The Method!


This is one of the fundamental aspects of Java synchronisation, but I have seen so many people get it wrong when I’m interviewing them. It’s very simple, but slightly contrary to how you might think synchronisation works:

So let’s take this code:

package com.example.shane.myapplication;

/**
 * Created by shane on 27/12/16.
 */
public class Foo {

    // Both methods are synchronized on the SAME monitor: the Foo instance
    public synchronized void methodA() {
        /* Do potentially thread-unsafe stuff here */
    }

    public synchronized void methodB() {
        /* Do more potentially thread-unsafe stuff here */
    }
}
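A minimal runnable sketch of the gotcha (the SharedResource class and the timings are invented for this demo): while one thread sleeps inside a synchronized method, a second thread calling a different synchronized method on the same object must wait, because both methods lock the object’s monitor, not the individual method.

```java
// Two synchronized methods share one lock: the object's monitor.
class SharedResource {

    synchronized void methodA() {
        try {
            Thread.sleep(200); // holds this object's monitor for 200 ms
        } catch (InterruptedException ignored) { }
    }

    synchronized void methodB() {
        // runs only once no other thread holds this object's monitor
    }
}

public class LockDemo {

    // Returns how long methodB had to wait for the monitor, in milliseconds
    static long timeMethodB() throws InterruptedException {
        SharedResource shared = new SharedResource();

        Thread t1 = new Thread(shared::methodA);
        t1.start();
        Thread.sleep(50); // give t1 time to enter methodA and take the monitor

        long start = System.nanoTime();
        shared.methodB(); // blocks until methodA releases the monitor
        long waitedMs = (System.nanoTime() - start) / 1_000_000;

        t1.join();
        return waitedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("methodB waited ~" + timeMethodB() + " ms");
    }
}
```

Calling methodB on a *different* SharedResource instance would not block at all, because each instance has its own monitor.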


Six Pillars Of Security, #6: Appropriate Escalation and Containment

 

  • In the event of a breach or an infringement of your company’s responsibilities, timely and appropriate escalation is required.
  • During one breach I witnessed at a company I used to work for, inappropriate and untimely escalation made the situation a lot worse: the dev team and their managers failed to escalate a serious issue (users’ credentials being logged to a log file) quickly and appropriately, and as a result the situation worsened.
    • Access to files is often logged. In the case of a breach, the fewer the people who accessed the compromised resources, the smaller the aftermath (e.g. in the case of sensitive data being logged to a file, it’s easier to deal with five people who accessed the compromised file than thirty). Reducing initial propagation helps here.


Six Pillars Of Security, #5: Controlling External Risk

Once data leaves your system, your ability to control it rapidly diminishes. However there are steps you can take to mitigate risks:

  • Only giving clients the data they require
    • For example, with a centralized service application this would involve analyzing what each client needs, and applying logic so that they only receive that information.
  • Actively engaging with client teams, asking them about security, guiding them. Even though the data has left your system, it is still your data and you need to ensure others are being careful with it.
  • You can’t allow other teams to rely on you for validation, especially clients providing data; doing so effectively cripples their ability to validate new changes on their front end, and can lead to them not validating at all.
  • Sometimes, getting a client to understand what data is ‘toxic’ and what data isn’t, is more effective than trying to validate everything.
  • Basic Action Points For A Team:
    • Identify which parts of your data are actually sensitive. This might be more than you initially thought.
    • Identify which parts of your data are ‘toxic’ (e.g. can’t be considered trustworthy), and make sure that clients understand that.
    • Investigate what data your clients actually need, especially with regards to sensitive data.
    • Talk to other teams and see how they are validating.
    • Apply filtering if appropriate.
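As a sketch of the filtering points above (the client names, fields and ResponseFilter class are all hypothetical), per-client filtering can be as simple as an allow-list applied just before data leaves your system:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ResponseFilter {

    // Per-client allow-lists; in a real system this would live in
    // configuration rather than code
    private static final Map<String, Set<String>> ALLOWED_FIELDS = Map.of(
        "billing-service",   Set.of("userId", "postcode"),
        "analytics-service", Set.of("userId")
    );

    // Strip every field the named client has not been granted; unknown
    // clients receive nothing (deny by default)
    public static Map<String, String> filterForClient(
            String clientId, Map<String, String> row) {
        Set<String> allowed = ALLOWED_FIELDS.getOrDefault(clientId, Set.of());
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (allowed.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```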

Six Pillars Of Security, #4: Not Helping The Bad Guys

  • Systems often accidentally reveal their workings and vulnerabilities:
    • Revealing the server/component type and version in an error response allows an attacker to search for known vulnerabilities against that version number.
    • Logging full stack traces, or worse, including them in an error response, tells the attacker which libraries the system uses; if any of those libraries have known vulnerabilities, the attacker can exploit them.
  • Just because your client is internal and not ‘user facing’, don’t assume your error responses won’t filter through to the outside; in a highly distributed system, you never know where your response will end up.
  • The same goes for logging; you don’t know who will end up reading your error logs.
  • Basic Action Points For A Team:
    • Never log full stack traces unless absolutely necessary.
    • Confirm that none of your responses contain information about the server products you use or their versions.
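A hedged sketch of those action points (the SafeErrorResponse class and JSON shape are invented for illustration): the client sees only a generic message plus a correlation ID, while the detail stays in internal logs keyed by that ID, so nothing about your stack leaks outward.

```java
import java.util.UUID;

public class SafeErrorResponse {

    // Convert any internal failure into a response that reveals nothing
    // about libraries, versions, or stack frames
    public static String toClientError(Exception e) {
        String correlationId = UUID.randomUUID().toString();

        // Internal log only: full detail, keyed by the correlation id so
        // support staff can find it later
        System.err.println(correlationId + " " + e);

        // External response: generic message and the id, nothing else
        return "{\"error\":\"Internal error\",\"id\":\"" + correlationId + "\"}";
    }
}
```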

Six Pillars Of Security, #3: Detection Of Breaches


  • Good visualization and graphing can expose suspicious activity.
  • Tools like Datadog are invaluable when coupled with appropriate queries/visualization.
  • For example, if a user or client is generating a lot of back-end usage from a comparatively small number of requests, this can be a sign of a breach. If the front-end to back-end activity ratio per user is plotted, you can see this happening in real time.
    • In this example, monitoring and graphing outbound database read volume would be a quick and easy measure; simple data breaches normally show up as a ‘swell’ in outbound data. More sophisticated attackers will extract data slowly over a period of time to avoid such detection, so more specialised graphing and alarms would be needed, e.g. database usage aggregated over time, compared against the average usage for that user.
  • Likewise, response size is a good indication of a breach/extraction; if someone is trying to steal information, then the amount of data per response is likely to be higher.
  • Looking for large numbers of bad requests, 404s, ‘bad search parameters’ etc. is a good metric as well; it’s a sign that someone is trying out different things with your API.
  • Basic Action Points For A Team:
    • Set up basic monitoring for unusual activity volumes/frequencies/sizes, especially in relation to other metrics, e.g. the amount of database activity generated by a particular request or user (in the case of a database export). Tools like Datadog would be good here.
    • Set up monitoring for bad requests/404s/bad search parameters; these can be a sign of someone trying to guess a resource ID or probing for ways into your system.
    • Investigate other options/possibilities for monitoring/visualization, specific to your project.
    • Understanding the threat model of your project, and what kinds of attacks you are likely to encounter, is key here; the current belief is that commercial attackers are our main threat, but is this accurate?
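As a toy illustration of the ratio idea above (the RatioMonitor class, counters and threshold are all assumptions; a real deployment would use a tool like Datadog rather than hand-rolled code), flagging users whose database reads per request far exceed the average might look like:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RatioMonitor {

    // dbReads and requests are per-user counters gathered elsewhere;
    // flag any user whose reads-per-request ratio exceeds `multiplier`
    // times the average ratio across all users
    public static List<String> suspiciousUsers(
            Map<String, Long> dbReads, Map<String, Long> requests,
            double multiplier) {
        Map<String, Double> ratios = new HashMap<>();
        double totalRatio = 0;
        for (String user : requests.keySet()) {
            double r = (double) dbReads.getOrDefault(user, 0L)
                     / Math.max(1, requests.get(user));
            ratios.put(user, r);
            totalRatio += r;
        }
        double average = totalRatio / Math.max(1, ratios.size());

        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Double> e : ratios.entrySet()) {
            if (e.getValue() > multiplier * average) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }
}
```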

Six Pillars Of Security, #2: Configuration, Internal Processes And Human Error

 

  • Take care with configuration. Misconfiguration is one of the top five reasons behind companies getting hacked.
  • For sensitive data and functionality, consider incorporating a per-role based permissions system to reduce risk, and help track what happened in the event of an attack.
  • The principle of least privilege again helps secure a system; a leaked password is no use if there is no way to invoke important processes with it.
  • The team must follow the internal processes for any key handling. Security-focused developers should be completely familiar with these processes, and other team members should at least have read them.
  • Review any security vulnerabilities or concerns in third-party libraries. Some well-known libraries have massive flaws, e.g. XMLDecoder (which is core Java, but the point still stands) allows external XML to trigger system processes and execute Java code: http://blog.diniscruz.com/2013/08/using-xmldecoder-to-execute-server-side.html
  • Basic Action Points For A Team:
    • Look at existing configuration; can it be made more secure? Talk to those responsible for it.
    • Implement a policy of fine-grained roles/permissions, and least privilege if possible.

Six Pillars Of Security, #1: Secure Application Development


Infosec requirements should be determined during requirements gathering and, where relevant, integrated into JIRA tickets as infosec non-functional requirements.

  • Each system should have a comprehensive threat model documented, with all relevant attack vectors called out (key compromise, network compromise, DDoS).
    • This would include understanding the business model of those commercial attackers who would target us.
  • Security-focused developers should be familiar with the OWASP Top 10 vulnerabilities and the Common Weakness Enumeration (CWE) Top 25, and validate all tickets with a security element against these.
  • Unvalidated/unstructured inputs into your system should be identified, and the risks presented to product stakeholders. These are vectors for injection attacks.
  • Attacks may still exploit product weaknesses without directly targeting the product itself.
    • e.g. inserting an injection attack into one of your incoming requests, which then ends up being read by other systems, compromises a component in those systems, and opens the door to further attacks.
    • This sounds far-fetched, but commercial attackers (who are more proficient than your usual Anonymous hacktivist or ‘script kiddie’ DDoSer) know how to perform complex, multi-system attacks like this.
  • Action Points:
    • Infosec review process integrated into workflow.
    • Comprehensive threat model for project/product, including up and downstream components.
    • Product stakeholders need to do one of the following: a) accept the risks arising from the threat model, b) escalate, or c) add tickets to remove the risk.
    • Review of incoming data schema, seeing which parts are sensitive, which parts are toxic, and which parts are open to exploitation.
    • Security-focused developers should be completely familiar with, and other team members trained in, the OWASP Top 10 and CWE Top 25 security flaws.

Thoughts On The IBM-Watson-Conversation Hackathon, London 2016

This October I had the privilege of participating in the IBM-Watson-Conversation Hackathon in London, as part of a five-person BBC team. We eventually won, out of the fifteen teams that participated, with our combination of technology, ‘human interest’ and humour being noted.

These guys, right here….

The brief was simple: use IBM’s Watson-powered Conversation engine to create a chatbot, integrating with Watson’s other Artificial Intelligence APIs (e.g. tone analysis, image recognition, context-based news, etc.).

Conversation is a Natural Language Processing (NLP) engine that allows the construction of non-linear, non-brittle dialogs. It’s integrated into a wider ecosystem of IBM and Watson products, using the IBM BlueMix cloud platform as its bedrock, so getting off the ground is as easy as pie. It also integrates with select external services, such as Foursquare and Twilio.
