Safety Governance Systems

Once upon a time, I went around the countryside auditing aerodrome safety management systems and dutifully asking SMS-related questions of all and sundry. It didn't matter who they were, I asked them what they knew about the aerodrome's SMS, how they managed risks, and what they did to make sure everything was being well managed. I didn't ask everyone the exact same questions, like asking the guy mowing the grass how he ensured enough resources were available to manage safety, but I did bang the SMS gong at anyone who was around or would listen. I'm not so sure that was the right approach.

Noun-based Regulation

The modern world is definitely in love with its noun-based activities. Each week, a paradigm-shifting approach to some human endeavour is announced with a title like value-based health care or outcome-based education. When I delve into the details, I am generally left either confused as to what they are selling or unsure how they are different at all. Regulation is no different. Just plugging "based regulation" into Google yields, on the first page alone, principle-based, results-based, performance-based, outcomes-based and output-based regulatory approaches.

A World without Reason

Recently, I have felt like I'm in danger of becoming complacent with the bedrock of my chosen field. I'll admit that in the past, I've been fairly vocal about this bedrock's limitations and its mantra-like recitation by aviation safety professionals the world over. But the recent apparent abandonment of this concept by one of the first Australian organisations to go "all-in" on it gave me cause for reflection. I am, if you haven't guessed it, talking about the "Reason Model" or "Swiss Cheese Model".

Unnecessary Segregation or Pragmatic Isolation?

I've been out in the "real" world for the past six months or so and in that time, my thinking on risk management has changed a little bit. So here it comes, a confession... I have been using a PIG recently and I feel its use has probably helped with effective management of overall risk.

No Man is an Island

I've been a bit out of the loop over the past couple of months as I try to get a handle on my new job and the (almost overwhelming) responsibility that goes along with it. But I can't ignore the action over at the Federal Senate's Rural and Regional Affairs and Transport References Committee's inquiry into Aviation Accident Investigations.

BTIII: Assessing Uncertainty

I can't lie to you. I have been turning myself inside out trying to get a handle on risk evaluation in the aviation safety sphere for close to five years now and I still don't feel any closer to an answer. And I say "an" answer and not "the" answer. Since you are always assessing risk in terms of your objectives, there can and will be multiple approaches to assessing the risk of the same scenario depending on whether you are considering your safety, financial or legal objectives.

Systems Modelling

When I joined the aviation safety regulator I was introduced to the concept of systems-based auditing (SBA). Before this I had been carrying out aerodrome inspections and I thought becoming an Aerodrome Inspector for the government was going to be more of the same. How wrong I was! Even after four years, my concept of systems-based auditing is still evolving. I am coming to discover, as everything I read seems to attest, that most things in life tend to be more complex than we initially think - SBA is no different.

Regulation, The Final Frontier?

The week before last, I finished a 4-year stint with the aviation safety regulator. Even though I'm heading back to industry, I'm not going to stop writing this blog. I believe that the role of the national regulator is the next safety frontier (not the last ;)) and I like the idea of exploring new territory. As the industry continues to explore concepts like safety management, systems-based this, risk-based that and outcome-based whatchamacallit as well as safety culture, we are all going to come to the realisation that safety can be greatly affected (more than we ever imagined) by the approach and actions taken by a national regulator.

BTII: Control-freak

As a follow-on to my first post on the Bow-Tie risk assessment method, I thought I'd concentrate on controls (or barriers or whatever else you would like to call them). This is, after all, where all the action happens. Risk controls are where we spend most of our time - they are the practical aspect of managing risk.

Quick Review

Our typical bow-tie model consists of one or more threats leading to a single top event which results in one or more consequences. The idea is to insert your controls into these connections in such a way as to reduce the level of risk associated with the scenario. Controls may also be subject to defeating factors which affect their ability to reduce risk. Here's my overview picture from a couple of weeks ago:

The Components of a Bow-Tie Risk Assessment

Skinning Cats

You can categorise controls in a multitude of ways. Risk professionals would be familiar with the standard hierarchy of controls and other ways of breaking them up. Now, I'm not sure if you're getting to know me yet but, as you may have guessed, I've got a slightly different approach.

The first concept I'd like to introduce is that bow-ties are made up of primary lines and secondary lines. The primary lines are those that link threats to the top event and on to consequences, while the secondary lines are those connecting defeating factors to controls - see my new diagram below. The reason for the distinction is that I believe there are fundamental differences between the controls required on the primary line and those used on secondary lines.

I only noticed this phenomenon the other day when I was putting together a bow-tie on mid-air collision within a very specific context. I had a good piece of technical analysis in front of me but I wanted to create a picture of the risk to assist in evaluation. This analysis contained a list, in no particular order, of existing and potential controls and as I slotted them into the diagram, I noticed that certain types of controls went on the primary lines and other types ended up on the secondary lines.

Now, I've been racking my brain on how to describe these differences and I'm still not fully there but here goes.

Within my approach to creating a bow-tie, the primary line consists of events closely related in time. Maybe not a short time but at least a progression from threat through top event to consequence. Therefore, controls along the primary line must also exist along that same temporal line - not necessarily within it though, as we shall see in a moment. This means that controls here must be things that interact with the events that occur along the line. I noticed that front-line operator actions, equipment and facilities tended to fall along this line.

Secondary lines, on the other hand, may not relate to events which occur at the time - they may be situations or conditions which lie dormant until the right set of circumstances arises. I noted last time the similarity between defeating factors and latent conditions. As such, controls on these lines must address these latent conditions and should have been implemented prior to the events of the primary line taking place. In the bow-tie I was working on, controls on these lines tended to be things like education and promotion related to the primary line controls.

Not all defeating factors, however, are latent conditions. I can think of a few that are events or situations related in time to the primary line. As an example, I tend to think that low visibility is a common defeating factor for many aerodrome-related controls - visual markers etc. - and this is definitely something which needs to exist at the time of the top event to have an impact.

However, I have begun to distinguish these two types of controls as action controls on the primary lines and capability controls on the secondary lines1. That is not to say that capability controls don't involve action. Of course they do, but their objective is to ensure the capability of the action control to achieve what it aims to achieve. I'm not exactly sure how to operationalise this concept - I would like to turn it into some form of advice or guidance on what type of controls go where or how to word controls on each line. That level of understanding still eludes me.
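Still, to make the distinction a little more concrete, here's a minimal sketch in Python of how controls might be tagged by line type. All the names and example controls are my own hypothetical inventions for illustration, not something from any bow-tie software:

```python
from dataclasses import dataclass

# Hypothetical labels for illustration only - not from any bow-tie tool.
PRIMARY = "primary"      # threat -> top event -> consequence
SECONDARY = "secondary"  # defeating factor -> control

@dataclass
class Control:
    name: str
    line: str  # PRIMARY or SECONDARY

    @property
    def kind(self) -> str:
        # Action controls interact with the events along the primary
        # (temporal) line; capability controls exist to keep an action
        # control able to do its job.
        return "action" if self.line == PRIMARY else "capability"

controls = [
    Control("Runway edge markers", PRIMARY),
    Control("Marker inspection and maintenance program", SECONDARY),
    Control("Pilot education on aerodrome markings", SECONDARY),
]

for control in controls:
    print(f"{control.name}: {control.kind} control")
```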

Squares or Triangles?2

Regardless of this shortfall, I have also been categorising controls according to where they act on their line. I think it is very important to consider this as part of categorising your controls because a bow-tie diagram has the potential to overly constrain your thinking.

As I mentioned previously, a bow-tie is or should be limited in its focus. I may have created that constraint but I stand by it, for now. When it comes to controls, however, you may need to identify things that impact on the situation to the left of the threats, to the right of the consequences and outside of the defeating factors. Before we get into all that, let's have another picture.

In this diagram, I've identified four types of controls categorised according to where they impact on the links between the bow-tie's components. Let's go through them, one by one.

Prevention - These controls act outside the bow-tie diagram as they attempt to prevent the existence of the threat or defeating factor. An example of such a control might be maintenance on an aircraft braking system designed to prevent the system from failing.

Intervention - These controls intervene after the threat or defeating factor has occurred or manifested and seek to stop that situation from becoming a top event or impacting the capability of a control. A sufficiently wide runway would be a good example in the case of runway excursions - this control can't prevent threats from occurring but it may stop a runway excursion from occurring if it is wide enough to contain the aircraft's lateral deviation during landing or take-off.

Mitigation - These controls don't stop the top event from occurring but they seek to mitigate the consequence. Continuing on from the last example, a sufficient runway strip would be such a control as it only comes into play once the runway excursion has occurred.

Recovery - These controls also act outside the bow-tie diagram. This time they impact the scenario after the consequence has occurred. Any form of response - emergency response, for example - is a good example of a recovery control.

You can cut the control-pie other ways. In fact, you have to if you want to conduct analysis of the risk picture or turn it into a consolidated action plan. The more complicated the picture, the more important the structure, as this helps to break it up into manageable chunks. For example, you might want to think about what type of activities are involved in your controls - which ones involve training and which ones involve inspections of facilities?
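As a rough illustration of this slicing, here's a small Python sketch that groups a handful of controls both by stage and by activity. The tag assignments are assumptions of mine, purely for demonstration:

```python
from collections import defaultdict

# Each tuple is (control, stage, activity) - all assignments are my own
# hypothetical examples, not a definitive categorisation.
controls = [
    ("Braking system maintenance", "prevention", "inspection"),
    ("Sufficiently wide runway", "intervention", "facility"),
    ("Runway strip kept clear and graded", "mitigation", "facility"),
    ("Aerodrome emergency response plan", "recovery", "training"),
    ("Recurrent crew training on rejected take-offs", "prevention", "training"),
]

# Slice the control-pie two ways: by stage to analyse the risk picture,
# by activity to build a consolidated action plan.
by_stage = defaultdict(list)
by_activity = defaultdict(list)
for name, stage, activity in controls:
    by_stage[stage].append(name)
    by_activity[activity].append(name)

for stage in ("prevention", "intervention", "mitigation", "recovery"):
    print(f"{stage.title()}: {by_stage[stage]}")
print(f"Training-related actions: {by_activity['training']}")
```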

Where to from Here?

I think I'll be doing a lot more bow-ties in the very near future. So, this group of posts is going to develop, grow, change, morph, what have you. Next on my list might be evaluation methods, maybe. I'll be back with more as soon as I figure it out. Actually, I might be back before then.

1. I really have trouble naming things. As proof I offer my high-school band name - "Hot Pink Combi" - I know, right? :oops:

2. I ask this question of my kids nearly every weekend morning when I go to cut their toast. For me, it has come to mean the same thing as slicing up a pie.

Lessons from Taleb's Black Swan

Having just finished reading Nassim Taleb's The Black Swan, I initially thought about writing a not-so-in-depth assessment of the book's positive and negative points - but I'm not much of a book reviewer and a comprehensive critique is probably beyond my capabilities (at this stage). So, instead I thought I would focus on just a couple of the book's significant concepts and explore how they may apply in the aviation context.

Background

The crux of the book, if it can be boiled down to a single paragraph, is that in this modern, complex world we are unable to predict the future when that future involves Black Swan events. Black Swans are those events previously thought extremely rare, if not impossible. The term comes from the assertion, standard prior to the discovery of black swans in Australia, that all swans are white.

Taleb's specific definition for a Black Swan has three attributes: it lies outside of regular expectations, it carries an extreme impact and it is subject to post-hoc explanation making it appear predictable.

This third attribute is the first significant talking point that I'd like to address.

Retrodiction

When humans look back at a past event, the tendency to create a narrative is strong. It helps us make sense of the world and assists with recall. But in doing so, especially in a complex world, we are likely to introduce a few errors and fall into a few bear-traps.

The big one is over-simplification. The complexity of the typical human's operating environment is growing. Even aviation, which was pretty complex to begin with, has become a close-coupled, global transport system practically unfathomable to the individual. In dealing with this complexity, people tend to identify a limited number of factors and over-attribute their causal influence. Often, this over-emphasis comes at the cost of environmental influences which are outside the control of the event's main players.

Taleb, coming from the world of finance, cites examples from that sphere but I couldn't help thinking of accident investigation while reading this. Generally, I felt rather positive about the aviation industry's approach to post-hoc analysis of aircraft accidents - a type of black swan event.

While the development of a narrative is typical, most accident investigation bodies do go beyond the basic "what happened in the cockpit" and look at the latent conditions which contributed to the operational environment. We have the widespread use of the Reason model to thank for this. Some accident investigation bodies, like the ATSB, shy away from the use of the word cause and instead opt for contributory factor or something similar. This is in recognition of the fact that direct causal relationships between identified precursors and the accident cannot always, if ever, be proven in a post-hoc investigation.

Prediction, Shmidiction

Taleb has a real problem with prediction and he puts up quite a few arguments against it. One of my favourites is the "nth billiard ball" - so let me butcher it for you.

The level of accuracy required to make predictions increases significantly with only small increases in system complexity.

For example, let's say you want to calculate the movement of billiard balls. The first couple of collisions aren't too much of a problem but it gets really complicated, very quickly. I won't profess to understand the maths behind these calculations but Michael Berry has apparently shown that:

  • in order to calculate the ninth collision, you need to include the gravitational pull of the man standing at the next table, and
  • in order to calculate the fifty-sixth collision, you need to consider every single particle in the universe in your calculation.

And this is a simple problem! Now consider the dynamic and socio-technical aspects of aviation to really make your head hurt.

Scalability

The third significant concept I wanted to touch on was scalability. I'll probably also murder this nuanced concept like those above but here goes.

A scalable is something in which the scale of the outcome is not limited by the nature of the act.

The concept was introduced to Taleb in terms of employment so let's start there. A non-scalable job is one where you are paid by the hour or according to some other unit of work. For example, a barber gets paid per haircut. There is no way for him or her to be paid a total amount that exceeds the physical limitation of performing the required service. A scalable job is one where pay is not directly linked to the unit of work performed. In this case, consider an author: he or she writes a book and may receive $1 in return, or may make $1,000,000.

It took me a while but I started to see aviation accident contributory factors in the same light. Some acts, errors, mistakes, etc. will only impact on the single activity being undertaken at the time - a pilot forgetting to put the landing gear down will only contribute to his or her own accident. But others may have a scalable impact and could contribute to many - a poor policy decision relating to training may result in all crew carrying the same deficient knowledge, which in the right circumstances, could contribute to many accidents.

Pulling it Together

Taleb brings together these and numerous other concepts and outlines his approach to financial investment - he calls it the Barbell Strategy. Recognising the problems with predicting outcomes in complex, dynamic socio-technical systems, he takes both a hyper-conservative and a hyper-aggressive approach. He invests significantly in low-risk investments and then places numerous small bets on extremely speculative opportunities that carry a significant pay-off - he tries to catch as many positive black swan events as possible while minimising his exposure to negative ones.

So what's our Barbell Strategy for aviation safety?

We need to invest in things that we know are closely related to bad stuff happening - say, runway safety, CFIT, etc. - and we need to invest in things that can have a scalable impact on safety - e.g. poor training standards, inappropriate regulations, etc.

How much we should invest in each is an open question but the basic concept sounded pretty good to me. Actually, it almost sounded familiar...

Confirmation Bias? You Betcha!

The more I thought about Taleb's strategy in the aviation safety context, the more I thought it sounded a lot like scoring risk according to proximity and pathways. My still-incomplete concept of risk evaluation sought to identify more critical risk conditions according to either their proximity to the ultimate outcome of death and destruction or the number of pathways by which the risk condition could result in catastrophe.

Proximity is important to those non-scalable conditions that contribute to accident occurrence and ranks them higher the closer they are to that ultimate condition. This avoids those nasty prediction problems Taleb keeps talking about. Pathways considers the scalable conditions that may contribute to accident occurrence but where prediction of direct causal relationships is impossible. Instead, you simply consider the scale of potential contributory pathways as a measure of criticality.
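To illustrate the idea (and nothing more), here's a toy Python sketch of how such two-track scoring might look. The scoring rules, numbers and example conditions are arbitrary assumptions of mine, not a validated method:

```python
# Toy illustration only - the scoring rules and numbers below are
# arbitrary assumptions, not a validated evaluation method.

def criticality(condition):
    if not condition["scalable"]:
        # Non-scalable conditions: rank higher the closer (fewer steps)
        # they sit to the ultimate outcome - no prediction required.
        return 10 - condition["steps_to_outcome"]
    # Scalable conditions: rank by the number of contributory pathways
    # they touch, since direct causal prediction is impossible.
    return condition["pathways"]

conditions = [
    {"name": "Unstable approach", "scalable": False, "steps_to_outcome": 2, "pathways": 1},
    {"name": "Deficient training syllabus", "scalable": True, "steps_to_outcome": 6, "pathways": 8},
    {"name": "Gear-up landing", "scalable": False, "steps_to_outcome": 1, "pathways": 1},
]

for c in sorted(conditions, key=criticality, reverse=True):
    print(f"{c['name']}: criticality {criticality(c)}")
```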

I have a few threads of thought coming together at the moment in this area. I'm excited to find out how they all tie together and whether I can get them out of my head and on to this blog.

BTI: Dressing up for Risk Assessments

I've been doing a lot of pondering on the Bow-Tie method of risk assessment for a project at work. Bow-Tie is a tool used by many, especially in the oil & gas industry, to create a picture of risk surrounding a central event. It's got a few positives and a few negatives but these can be overcome if you understand the limitations of the model being used.

The Basics

Let's stick to the high-level stuff here because the deeper you go, the murkier it gets. The names change depending on the source or software one uses but I think the overarching concepts remain the same. In short, don't shoot me if I use different names, hold your fire if I describe the concept differently, and fire away if you think I am way off track.

Bow-Tie is a graphical view of risk.

As stated above, it centres on a central event - sometimes called the top event or hazardous event. On the left of the diagram are precursor items - sometimes called safety events, contributory factors or threats. On the right are consequential items - often simply called consequences but also labelled outcomes, effects or losses. For this post, I'll use threats, top event and consequences as my standard terms.

On the lines connecting the threats, top event and consequences, you insert controls (or treatments, barriers, etc.) to address the risk. Those controls may be subject to defeating factors (or escalation factors or barrier decay factors, you get the picture) which seek to reduce the effectiveness of the control. You can insert more controls to address the defeating factor which may introduce more defeating factors and so on and so on.
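If you like to think in data structures, the model above might be sketched in Python something like this. It's a minimal sketch under my own naming assumptions, not the schema of any bow-tie software:

```python
from dataclasses import dataclass, field

@dataclass
class Control:
    name: str
    # Defeating factors attach to a control; each could carry further
    # controls of its own, and so on.
    defeating_factors: list = field(default_factory=list)

@dataclass
class Line:
    # One left-to-right path: threat, through the top event, to a
    # consequence, with controls inserted on the connections.
    threat: str
    consequence: str
    controls: list = field(default_factory=list)

@dataclass
class BowTie:
    top_event: str
    lines: list = field(default_factory=list)

excursion = BowTie(
    top_event="Runway excursion",
    lines=[
        Line(
            threat="Unstable approach",
            consequence="Aircraft departs the runway strip",
            controls=[
                Control(
                    "Stabilised approach criteria and go-around policy",
                    defeating_factors=["Crew fatigue"],
                )
            ],
        )
    ],
)
print(f"{excursion.top_event}: {len(excursion.lines)} threat-consequence line(s)")
```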

Here's a pretty picture:

The Components of a Bow-Tie Risk Assessment

The Model

The Bow-Tie is a selective picture of a risk scenario. It focusses one's attention on a top event which, I think, frees up the thought processes to identify the other aspects of the issue at hand and allows for better analysis of the scenario.

It contains an implicit consideration of time as it progresses generally from left to right but not in an overly restrictive way. Defeating factors fall outside this and can often be thought of as latent conditions of the Reason-esque variety, existing for an indeterminate amount of time before manifesting during the risk scenario - even after the top event.

Starting on the left, you have the precursors to the top event. Some might call these causes but that can be a dirty word in accident modelling. Usually, you can't really make a causal link between precursors and consequences especially in post-hoc accident investigations. That's why the term "contributory factors" tends to be favoured.

I favour the term threat. To me, it's a little more than a hazard. It's a specific manifestation of a hazard or more like a situation - it could be an event in its own right but often they are hard to track or discretely identify. I haven't really come up with a succinct definition yet - these posts tend to be my incomplete thoughts "out loud".

In the middle we have the top event. The selection of a top event can be tricky but it is typically a significant event which is easily identifiable with at least some occurrence data available. Historically, the top event has been characterised as the release of energy. I'm not convinced that this works in the aviation context. While we do have contained energy in various systems - fuel, hydraulics, the cabin environment, etc. - the essential operation of aircraft, i.e. flying, doesn't so much contain energy as it rides an enormous tsunami of potential and kinetic energy. I, sort of, touched on this when I posted about the inherently perilous nature of aviation.

To the right, you have the consequences - the things that happen after the top event. This area can be tricky as well but I'll talk more about that in a moment.

Once you've built your picture then it is time to start inserting your control measures on the connections between your threats, top event and consequences. There are plenty of ways of categorising these controls and I'll go through some of my thoughts on these in a subsequent post.

I think one of the best parts of the bow-tie model is the defeating factor. The inclusion of this level in the structural model really gets you to think about active and latent failures which occur further upstream from the top event. In most of the fairly simple bow-ties I've developed to date, I've had lots more defeating factors than direct threats. I'll also elaborate on this in a subsequent post - in the meantime, don't be afraid to identify lots of defeating factors.

That's the basic model. It sounds pretty good but there are limits. Of course, a model is a simplified version of reality. It's vital that users of this model understand those limitations - otherwise unsupported decisions could be made.

The Edge of the Bow-Tie World

I said above that a Bow-Tie is a selective picture. Once you identify the top event, you've essentially set your focus with things in the periphery falling out of frame.

Aviation is a complex, dynamic system. It can't be boiled down to one or a few diagrams. We tend to think about accidents as long chains or networks of events. While the defeating factor dimension helps model this, sometimes we need to consider contributory factors of contributory factors or consequences of consequences - but this can severely complicate the analysis of a Bow-Tie. Some software packages allow the linking of Bow-Ties but the implementations I have seen are fairly superficial.

I'm still searching for an approach or implementation that supports the higher level of complexity of the aviation system. One that would allow analysis to be carried out by mere mortals. The Bow-Tie model is a trade-off. Therefore, you still need a parent system to manage the transition from real-world to Bow-Tie model, if your purpose is risk management within the aviation sphere.

So, don't push the model beyond its limits. As I'll get into in a moment, you've selected a top event for a reason; your analysis should remain within the frame of direct relationships. The longer and more tenuous the connections between your threats, top event and consequences, the weaker the picture becomes.

Keeping Focus with Direct Links

The Bow-Tie model is rather simplistic. It can't handle complex, multi-stage causal pathways which include a mixture of conditional and independent factors. If you need to do this sort of analysis you would be better off with a fault-tree or event-tree type of tool. As I said above, the focus of the Bow-Tie is set on the top event and it's best to keep the links direct.

I have seen on some bow-ties a long list of threats, some of which are actually precursors to other threats. For example, most runway excursion bow-ties include an unstable approach threat but some also include common contributors to unstable approaches such as inappropriate vectoring or pilot distraction. If you want to analyse these things it would be better to create a second bow-tie centred on the unstable approach threat.

The same thing happens on the consequence side. An example on this side would be a bow-tie with collision, crash and death in the same list. These three things don't occur at the same distance (read: time) from the top event and they are not necessarily independent - the picture starts to get muddy.

I also have a general feeling that consequences like death and crash are not specific enough to actually assist in the analysis. That sometimes-cited cliché, "there is only one cause of death - a lack of oxygen to the brain", tends to dissuade me from using it as a consequence. Usually, lots of things need to happen before someone dies.

I'll admit that it is hard to put together a succinct bow-tie diagram. We tend to have lots of things we want to get into the risk scenario but it is important to get the structure right. Otherwise the diagram can become confusing quickly.

In order to keep things on track, I've been using a little narrative trick to check the appropriateness of my threats and consequences. Essentially, I create a little story running along each line from threat to consequence. If the story makes sense with just the threat, top event and consequence then I think I'm on the money. If I need to elaborate or clarify something then I am probably off the mark.
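If I were to automate the trick, it would be nothing more than a sentence template. This little helper is a toy of my own making, purely for illustration:

```python
# A toy helper of my own invention - just a sentence template for the
# narrative check described above.
def story(threat, consequence, top_event="runway excursion"):
    return f"The {threat} led to a {top_event} with the aircraft {consequence}."

print(story("unstable approach", "departing the runway strip"))
print(story("braking system malfunction", "re-entering the runway"))
# If the sentence reads as complete on its own, the threat and consequence
# are probably direct enough; if it begs elaboration, they're off the mark.
```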

Let's look at another pretty picture, this time a simple runway excursion bow-tie:

Simple Runway Excursion Bow-Tie Diagram

So after drawing this, I would go along each line and make up a little story. Something like:

The unstable approach led to a runway excursion with the aircraft departing the runway strip.

That seems to work. The story is complete, succinct and centres on the top event. Yes, I could elaborate on the precursors to the unstable approach and what happened after the aircraft left the strip but if I did that, where would I have to stop? The power of the Bow-Tie is its simplicity but, remember, that comes at a price.

Let's try the next one:

Low visibility led to a runway excursion with the aircraft crashing.

Well, it's succinct in this form but I don't think it's complete. I'd want to know how low visibility ended up as a runway excursion. Did the pilot line up wrong? Did he fail to see an object on the runway? I'd also want to know how and where the aircraft crashed. Did it depart the runway strip? Did it fail structurally? Did it collide with something? And so on.

Therefore, I don't think that low visibility is a suitable threat and I don't think that aircraft crashes is a suitable consequence in this risk scenario.

The last one would be:

A braking system malfunction led to a runway excursion with the aircraft re-entering the runway.

I think that is back on track.

This simple diagram is not perfect. I think the wording of the good threats and consequences could do with some more work. I might need to separate some of them into more specific direct threats and consequences. But that will be the topic of another post...

I've said a couple of times above that these are some incomplete thoughts on this subject. I would greatly appreciate your feedback, comments, ego-boosters, rants - anything. Thanks.

Edit: I just added a BTI to the title to tie it in with a follow-on post.

Crowd-sourced Certifications

I've just been mucking around with a new Internet service called Smarterer. That's not a typo, it really is Smarter-er. I guess, in a nutshell, it's an online quiz creator which is meant to help you quantify and showcase your skills. The twist in this implementation is that the quizzes are crowd-sourced. Anyone can write questions for the quiz and thus over time, the group interested in the topic defines the content and the grading of the quiz.

There's a whole pile of things going on under the hood that I haven't gotten into but it does look interesting.

The fruit of my tinkering was that I kicked off an Aviation Safety Management System quiz. It has 20 questions to begin with and is based on ICAO's Safety Management Manual. There's nothing too obscure in the questions but I would love to see the test grow - the only downside is that I can't take the test!

Anyway, check it out at http://smarterer.com/test/aviation-safety-management-systems and let me know what you think.

Under Thinking Just Culture and Accountability

I am definitely capable of over thinking, of tying myself up in knots and being lost in the detail. And other times, I probably haven't thought enough. Recently, I identified just culture as a concept I hadn't really thought about in-depth.

In my mind, I thought I knew what a just culture was. I knew it was more than a simple no-blame policy. I knew it involved establishing what is acceptable and not acceptable behaviour. But that had been the limit of my thinking.

That big void of knowledge started to weigh heavily on my mind. So, I set out to read Sidney Dekker's oft-cited Just Culture. Now that I have finished reading it, that void hasn't been filled - it's a swimming mess of questions, thoughts and more questions.

Although, I think I've got a grasp on that little hard nut called accountability.

I used to talk about how accountability was different to responsibility - "you can delegate responsibility, but you can't delegate accountability". I used to make the distinction that responsibility involved doing things but accountability didn't - "the responsible person performs the action, the accountable person just, ah, is accountable for it". However, I don't think I really ever defined accountability in a meaningful way (save for one occasion, by accident1).

I guess I understood that accountability meant knowing about what was going on in the area for which you were accountable but I never fully digested why that was important and what one would do with that knowledge.

While reading Just Culture definitely helped me to understand the bigger picture, I don't think my knowledge had synthesised until I began analysing an uncomfortable incident in which I was involved this weekend.

The Incident

I coach my son's U6 football (soccer) team. We're eight games into the season and I've been slowly gaining confidence in this role with a fair amount of trial and error. There hasn't been a lot of support for newbie coaches but I've been forging my way forward.

During matches, I've been on the field guiding the kids around and encouraging them along as they too fumble through their first year. It has been fun - especially the high-fives I get from four sets of hands for any manner of achievement, everything ranging from scoring a goal right down to not touching the ball with their hands.

But yesterday, things did not go so well.

A couple of minutes into the match, I was setting the ball on the goal line for a kick-in when a man (with no official identification) approached me and advised that I was not permitted on the field during the match. That didn't mesh with my understanding and since I was already concentrating on the match, I brushed the guy off and told him I was staying on the field. He responded by telling me that he was going to get the ground official.

Not long after, two men wearing high-vis official vests entered the field and instructed the referee to stop the match. One of them was the man from before, meaning the other must have been the ground official. I approached the two men to find out what was going on.

I'll spare you the he said, I said stuff - the final ultimatum was I had to get off the field or he would cancel the match. Not much of a choice really, so I quickly explained the situation to the kids and coached from the sidelines for the rest of the game.

The Post-Incident Analysis

Now, I'm not going to get into all the grubby details of this incident - this post is not about the incident, it's about accountability and just culture.

Since I'm a life-long-learning type of guy, I ran through the incident in my head about a million times yesterday afternoon. I explored issues like:

  • what do the rules actually say? - for the record, I was wrong - I am not permitted on the field during the match;
  • why did I think the way I thought? - primarily a case of confirmation bias;
  • in what ways did these other men act inappropriately or in contravention of policy, etc.?; and
  • chiefly, what can I do better next time?

I thought about all these things as I prepared my incident report for my club president. My incident report, my account by another name. I was accountable for whatever happened on that field, especially incidents directly involving me, and here I was, providing my account.

I imagined the other men were doing the same with both of our club presidents taking these reports and providing their own accounts up the chain, as appropriate.

Okay, now what?

That's a really good question and this is where my past thought process tended to stop.

Dekker makes the point a number of times that sometimes, providing the account is enough. He says that families of patients lost on the operating table often just want to know how it happened from those accountable for the event.

He also mentions the importance of data to learning but I didn't find the connection between learning and accountability that strong in the book. It was only yesterday that that neural pathway was opened.

The push to make people accountable is to increase learning.

Accountability doesn't mean identifying people for punishment, sanction or retribution. It simply means setting an expectation that they will be able to provide an account of what occurs within their sphere of accountability.

And it doesn't relate to just the accountable executive. It relates to everyone. In the above incident, I'm accountable, the other two men are accountable, our club presidents are accountable, the administrators of our local football association are accountable and so on.

This doesn't mean that the president of Football Brisbane should be able to describe the events which took place yesterday off the top of his head. It means that each of us involved must analyse the incident and identify contributory factors coming from other parties. Those other parties then provide accounts of those factors.

For example, why did I think I was allowed on the field? There are a range of contributory factors from inconsistent use of terms surrounding the coaching role both by my club and Football Brisbane right through to never having been corrected during the past eight matches2.

And just as each of us provides an account, each of us must take the accounts of others and learn from the incident. I have obviously learnt that I am not permitted on the field as well as not to trust confirmatory data. I hope that the other men involved learn better techniques for approaching newbie coaches who are concentrating on their teams' enjoyment and I hope that clubs and associations learn a few more ways of providing support for newbie coaches and ground officials.

Justice Served

As fired up as I got yesterday, I don't really think anyone should be punished for what happened. While I would like an apology for the manner of the approach, I am also happy to provide an apology for my frosty reception of the other men's intervention.

Overall, we just need to learn from the incident, move on and see to it that a similar incident doesn't occur again.

How effective that learning is will depend on how far the accounts go. In this instance, it appears that the power of a party to effect wide-spread learning is inversely proportional to the proximity of that party to the original incident3.

What I mean here is that the party furthest away from the incident, say Football Brisbane, has the greatest ability to prevent a repeat of this incident. I'm not going to be involved in this incident again because I now know that I'm not allowed on the field. I hope this incident won't occur at that ground again because the officials involved will adjust their behaviour, and I hope that my club will let other coaches know about the incident to minimise it reoccurring within our club. Football Brisbane, on the other hand, can see this and similar incidents prevented through its reach to all coaches and officials within the Brisbane area, and so on.

It's quite amazing what is possible just by providing an account without fear of reprisal. Here's hoping for some communication, some learning and some justice in the very near future.

Now I just wish aviation was as simple as U6 football.

1. I was presenting an SMS course in Indonesia and I had used Google Translate, taking English into Bahasa Indonesia, to try to make the accountability/responsibility discussion more relevant. What I discovered was that in Bahasa Indonesia responsible means to "bear answer", which is pretty close to what I took from Dekker's book as the definition of accountable.

2. Mr Taleb will track me down and spank me for that one. I just read about confirmation bias and the asymmetry in data when it comes to confirmation versus contradiction.

3. I'm not sure if that's original. I can't recall reading it anywhere and it just came to me as I was writing this but that's not to say that I haven't read it before and I'm channelling some great thinker. If that is original, can we please call that the Parsons Rule?

SMS Considered

While in Bali talking Runway Safety with a wide range of industry personalities, I found myself at the hotel bar talking SMS with Bill Voss from the Flight Safety Foundation. The topic was obviously on Bill's mind because upon my return, I found his latest president's piece in FSF's AeroSafety World to be a good overview of his main SMS points. Some of these points have been on my mind too. Since I'm not one to reinvent the wheel (providing it works and is fit for purpose), I'll use some of Bill's well-formed words to kick this off.

Guidance Material

Back when the international standards for SMS were signed out at ICAO, we all knew we were going to launch a new industry full of consultants. We also knew that all these consultants couldn’t possibly know much about the subject and would be forced to regurgitate the ICAO guidance material that was being put out.

The title of the piece is SMS Reconsidered but I'm a little bit more critical of how SMS has been implemented in some places and would argue it was never really considered in the first place. The "regurgitation" of guidance material has been a big problem.

ICAO guidance material touting the "four pillars" was, as I saw it anyway, what the title suggested - guidance material. The industry was meant to consider the material and apply it within their operational context, corporate structure and organisational culture. The level of complexity within the operator, the existing systems in place, the attitudes of everyone involved were/are meant to be considered and a tailored SMS developed.

The reasons behind the current state of SMS are many, varied and probably not worth going over. It is more important to get the concept back on track. That's a big task and bigger than this little blog post. Instead, I wanted to discuss Bill's "four audit questions".

Levels Revisited

Bill's piece outlines four seemingly simple questions designed to test the operation of an SMS:

1. What is most likely to be the cause of your next accident or serious incident?
2. How do you know that?
3. What are you doing about it?
4. Is it working?

When posted on the FSF discussion forum on LinkedIn1, a fifth question (taken from the text) was added:

5. Can you show in the budget process where resources have been re-allocated to manage risk?

Interestingly, it was initially assumed that these were questions posed to the safety manager or some other safety professional as part of a discussion between like-minded professionals. However, later comments did swing around to my initial understanding that they could be asked of anyone within the organisation.

In fact, they should be asked of multiple people at different levels of the organisation.

A couple of weeks ago, I discussed the need to find the right solution at the right level and that the same tools may not be appropriate at different levels.

When thinking about SMS as a whole, there are countless ways of implementing one, but every implementation must permeate all levels of the organisation, with systems, processes and tools suited to the needs of each level and communication channels between the various levels.

Bill's five questions, being agnostic to any specific SMS approach, can be applied to every level of the organisation. They should be asked of the safety manager, the operations manager, the training manager, the maintenance manager, the line supervisor and, probably most importantly, the CEO.

They aren't the only questions which need to be asked, but they are a good starting and ending point. Having all the "bits" of an SMS is required from a regulatory point of view but system effectiveness is vital to maintaining an ongoing level of assurance in an operator's ability to manage safety.

Pearls

I've audited or reviewed quite a few SMSs - only a few have shown any real consideration of the SMS concept and were tailored to suit the operator's needs. These were often the better performing systems and they bore little resemblance to the "four pillars".

At the Bali conference, I spied the completely different approach taken by Bombardier. It was mentioned a number of times that it is copyrighted, so I haven't included a picture here, but you can find a presentation outlining their approach on the Transport Canada website. I can't comment on the effectiveness of the system but it is definitely food for thought and a ray of hope that the SMS concept is being considered, digested, pondered, manipulated, tailored, and so on.

1. It's a closed group, so I'm not sure who is able to see the discussion.

Logical Fallacies in the Safety Sphere

Sometimes I feel like I really missed out by not receiving a "classical" education. While I can probably live without the Latin and Greek philosophy, one area I've been keen to pick up is formal logic. The forming of a coherent and valid argument is a key skill which is, in my opinion, overlooked in safety management. That's disappointing, since making such an argument is at the heart of making a safety case.

I'm not going to tackle the subject of logic today. To be honest, I don't know enough about the overall concept. Instead, I'm going to focus on the typical failings present in a logical argument - the logical fallacies.

A logical fallacy is essentially an error in reasoning leading to an invalid argument.

Firstly, it is funny that most definitions I saw on the web described them as "errors" - a term which, in aviation safety circles, carries a certain meaning regarding intent. I just want to be clear that fallacies are not restricted to unintentional errors - they can be made deliberately.

More importantly, I should define a valid argument.

A valid argument is one in which the truth of the conclusion flows from the truths of the premises.

Now, there are a lot of specific types of fallacies. So many, in fact, that people have even developed taxonomies of them. Recently, I found a good primer in this area thanks to a team from Virginia.

But I've got a bit of a problem with one aspect of this paper. The authors seem to have a higher opinion of safety professionals than I do. These are some of the offending sentences:

We assumed that safety arguments do not contain emotional appeals for their acceptance or willful attempts at deception.

For example, wishful thinking was excluded because it concerns arguments in which a claim is asserted to be true on the basis of a personal desire or vested interest in it being true. Such an argument is unlikely to appear explicitly in a safety argument.

That second one really grates on my nerves. Safety tends to cost money and money is the most basic "vested interest".

I have sat through quite a few presentations on aviation safety that have deliberately pulled on the heart-strings to promote their agenda. This is a type of fallacy known as an emotional appeal.

Under the emotional appeal category, there are a few different types. Each is based on a different emotion - fear, envy, hatred, etc. But it is probably the appeal to pity (or the argumentum ad misericordiam) that I've seen the most. Here is a run-through of the most vivid of my encounters - de-identified, of course.

This presentation was on a certain type of approach to operational safety. I'll at least say that it wasn't SMS but let's leave it at that. The majority of the presentation was, I assume, a fairly accurate outline of this approach and how it was to be applied in the operational environment of the presenter.

What I had a problem with was the introduction and the regular references back to the, what I considered, grossly inappropriate emotional appeal made at the start. The commentary came on top of a series of personal photos, backed by a lamenting ballad, and outlined the heart-wrenching plight of "Jane".

Jane was happily married for a few short years...was the centre of her husband's world...had recently welcomed her first child into the world...until one day her world was torn apart by an aviation tragedy which claimed the life of her husband...

I'm a generally emotional guy and this story got to me. I'm passionate about safety and on some level, I want to minimise the number of "Janes" out there.

But her story and the thousands like it, had absolutely no bearing on the case put forward in the rest of the presentation. In fact, I felt like it detracted from the substance of the information presented. After overcoming my tears and quivering chin, I probably bounced back into a super-critical stance as a reaction to the manipulation which had just occurred.

It is very tempting to employ cheap tricks such as these in an effort to increase the impact of one's safety case. But in the long run, it will only hurt it. Either by casting doubt on the truth of your conclusion or turning people against the argument regardless of its overall validity.

I might be getting a little bit more philosophical in the coming months as Mr Dekker and Mr Taleb continue to blow my mind with just culture, complexity, randomness and the black swan - more to come.

Integrating Runway Safety Teams with your Safety Management System

I've just spent an amazing week in Bali1 workshopping with operators and regulators from the Asia-Pacific region (and some from further afield) on the issue of runway safety. We got a lot of good information from the Flight Safety Foundation, ICAO and COSCAP as well as airlines, airports and regional regulators. The primary objective of the week was to provide information on and practice in the establishment and conduct of Local Runway Safety Teams (LRSTs). To this end, the seminars and workshop were great but I left feeling like one connection had been missed. The final question on my mind - and on many others' minds, I am sure - was:

How do these runway safety initiatives integrate into my SMS?

I discussed this with a few of the other attendees and felt compelled to flesh out a few of my initial thoughts.

LRSTs are airport-based teams of representatives from runway safety stakeholders - the airport operator, the air traffic services provider, the airlines, the ARFFS provider and so on. The objective of this team is to collaborate on runway safety matters and coordinate responses to identified hazards or concerns. Much emphasis was placed on the inter-organisational and inter-disciplinary approach required when dealing with runway safety.

So how does this fit in with an operator's SMS?

The obvious relationship is through the committee arrangements found in most SMSs. In the ICAO approach to SMS, it is easy for me to imagine the LRST as a Safety Action Group (SAG).

According to the Safety Management Manual (SMM), a SAG is a "high-level committee, composed of line managers and representatives of front-line personnel" that "deals with 'grass roots' implementation issues pertaining to specific activities to ensure control of the safety risks of the consequences of hazards during line operations".

The language paints the SAG as an internal body but I see no reason why such a SAG of inter-organisational representatives cannot be convened as required when a safety issue requires it. The diagram on page 8-7 of the SMM suggests that multiple SAGs can be established and at Australian aerodromes, a safety committee of stakeholder representatives has been common thanks to some early advisory material.

A SAG sits under the Safety Review Board for that particular organisation, be they airport, airline, etc. The SRB is a higher-level committee tasked with strategic-level safety policy direction and safety assurance.

Graphically, the relationship could look something like this:

For complex environments, separate SAGs would be required and for smaller, less-complex environments, perhaps one committee is all that is needed, with the various safety issues becoming working groups or even standing agenda items. It would be up to the operators involved to find the sweet spot - somewhere between being so specific that there isn't enough work to do and being so general that there is too much.

For airlines, and in some states the air traffic service provider, there will be multiple LRSTs and other committees to attend. For these, and for large, complex airports, there may be additional "mediator" committees required to coordinate and filter the numerous SAG-level committees' outputs for input into that organisation's SRB.

So what are these inputs and outputs in terms of SMS functions?

If we look at the good ol' four pillars of SMS, then these inputs/outputs are the various elements of safety risk management, safety assurance and safety promotion.

Safety Risk Management

While each stakeholder's SMS will consider the risk associated with runway safety from their individual viewpoint and tend to identify treatment strategies within their sphere of influence, the real power in the LRST is the bringing together of these viewpoints to get a much more comprehensive picture of risk.

With this picture, the team is able to identify a range of treatment options designed to address the various aspects of the risk picture in ways that work together and cover the many causal and consequential pathways which exist within such a complex safety issue.

Safety Assurance

Again, each SMS in isolation would tend to measure only those aspects of safety performance within that stakeholder's activities. At a bare minimum, the sharing of assurance information - and at best, co-assurance activities - would greatly enhance the level of confidence each SRB would have that runway safety risk is being addressed.

Safety Promotion

Sharing a room, a team, an objective promotes safety much more than a safety poster. The safety training and communication systems within each stakeholder will be strengthened with the additional perspective provided by the other stakeholders. The possibilities here are endless.

Since I like drawing pretty little diagrams, here is another one describing the above:

Now, I don't want to diminish the progress one would make by establishing an LRST and getting some of the above going. These are very important steps and well worth the effort.

(here it comes)

But...

for those looking to the future, here are some challenges.

Amalgamating risk assessment methods - each stakeholder may have different approaches to risk analysis and they most certainly will have different risk criteria - pulling these together will be a challenge.

Sharing assurance information - each organisation is going to need a strong just culture to achieve this one, as airing your dirty laundry in public is never easy.

The answers to these challenges are...well, if I had definitive solutions, I probably wouldn't be sitting here blogging about them for free!

What I can suggest, however, is that each stakeholder remains open with respect to risk assessment techniques and considers solving the problem on a common level - separate from the higher corporate level that a lot of SMSs operate on. With respect to sharing information, the suggestion at the RRSS Workshop was that if you want someone to share potentially embarrassing information with you, share some of yours first. I'd add that it would be a good idea to establish agreed protections on the safety information to be shared.

Runway safety is a big, complex issue and there is a lot of work to be done on many levels. The LRST is one level, state runway safety groups are another. I am looking forward to some of the technological, operational and regulatory advances that will be made in the future and, with advances in safety performance monitoring being made, we might very well be able to monitor the effectiveness of progress in this area like never before.

1. I know. I have a tough life, right?

Levels. Levels? Yeah...

Seinfeld fans may remember this short exchange. Kramer might have been on to something and it had nothing to do with interior design. In my research and work, I've been butting up against a few theoretical roadblocks. But I am starting to think that these roadblocks are actually different levels. Internet guru1 Merlin Mann often observes that people need to solve the right problem at the right level. And now, I'm starting to think that is exactly what I need to do.

Identifying the different levels has been my task of late, and it is a task in need of completion.

This is where I'm at so far...

I was initially running with a military-style strategic/operational/tactical taxonomy - strategic being the highest level, involving long-term, executive-level decisions, running down to frontline, troop-level decisions at the tactical level.

But these terms come loaded, so I've been looking elsewhere. Although, I don't think there are any terms left which don't carry some form of baggage.

So I've started down this road:

  • Executive - the highest level; involving the executive oversight or governance of the organisation; typically strategic although may be concerned with lower level issues from time to time.
  • Management - obviously, somewhere between the executive and the shopfront; probably characterised best as the level where enabling work gets done - things like personnel management, information management or hardware management.2
  • Operations - the real do-ers; practical actions taken in the extremely dynamic, real world.

I've been visualising this arrangement as something like this:

Different Levels

So what does this mean?

I believe the point of recognising the existence of the different levels is to accept that within each level, different objectives exist. As such, different tools and techniques may be required.

In thinking about this problem, I realised I'd posted something related to it before. In that post, I used different risk evaluation techniques at the different levels. While the overall risk management process should be consistent across all levels, the details differ because the objectives, contexts and decisions differ.

At the highest, executive level, the context related more to assurance, with the decision being whether to accept the determined level of risk or to do more. As the risk picture changed, the executive decided to do more and directed the management level to produce a plan. At that level, the risk evaluation methodology was quite different - tailored to the wildlife management context and to the set of decisions required there: what to do about the various bird species.

Different Levels of Risk Assessments

I hinted at a third level of risk management but, to be honest, I haven't really seen that level employed in the real world in this context. OHS practitioners would be familiar with Job Safety Analyses (JSAs), a very operations-level activity and close to what I was thinking of here.

I guess the moral of this rather rambling post is that I am becoming more and more convinced that an all-encompassing "enterprise risk management system" is not a simple case of having the same small set of tools for all levels. Instead, you need a framework that recognises the different levels (the different contexts, objectives and decisions) and creates linkages between these levels. My immature thoughts at this stage centre around the decisions and their resulting actions being those connections.

For example, the risk management being carried out at the lowest level may itself be a risk control measure for the next level up and so on. This becomes a bit circular but we might as well accept that it's turtles all the way down, people!
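For the code-minded, here is a minimal sketch of that idea - each level's whole risk management activity registered as a control measure at the level above. The level names, objectives and techniques are all hypothetical placeholders.

    from dataclasses import dataclass, field

    @dataclass
    class Level:
        name: str                 # e.g. "executive", "management", "operations"
        objectives: list[str]     # what this level is trying to achieve
        technique: str            # the risk evaluation technique suited to this level
        controls: list[str] = field(default_factory=list)

        def link(self, lower: "Level") -> None:
            """Register the lower level's RM activity as one of this level's controls."""
            self.controls.append(f"{lower.name} risk management ({lower.technique})")

    operations = Level("operations", ["do today's job safely"], "job safety analysis")
    management = Level("management", ["plan and resource the work"], "tailored evaluation")
    executive = Level("executive", ["assure overall risk is acceptable"], "risk-picture review")

    management.link(operations)  # JSAs become a control the management level relies on
    executive.link(management)   # the management plan becomes a control for the executive

    print(executive.controls)    # ['management risk management (tailored evaluation)']

The linkage runs on decisions and actions, as above: what one level does about risk is, from the next level up, just another control whose effectiveness needs monitoring.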

There may be more to come on this one but, right now, it's bedtime!

1. He would so hate that title ;)

2. Safety management? I'm not too sure. I've been pondering this lately as well and when that thought is half-finished, I'll post it here too.

Work-Me & Blog-Me

AUGUST 2012 UPDATE: I've changed jobs since I posted this. However, I think it still works as a fair assessment of the relationship between this blog and my current job, which is not with the regulator. In this, the Web 2.0 world, connections can be made easily. There is no practical way to completely disconnect my blogging from my work.

And while my little disclaimer on the right is designed to create a barrier between the two, it probably doesn't address what has the potential to be a complex relationship.

This blog is my thinking brain in text. Primarily, it's an academic endeavour and therefore, by definition, it is about learning and discovery. I think it is very important to note that I am not blogging any final answers here. These are my thoughts and while I hope they are well considered and rooted in rationality, they are, more than likely, incomplete.

This blog is a personal endeavour, with inspiration taken not only from my working environment but also from personal activities and encounters. Nothing on here should be taken as necessarily relating to my employer or to any specific aviation organisation into which I get a privileged view. In an effort to ensure that this is the case, I tend to use news items, academic papers and training materials as my main sources of inspiration.

Overall, my blog is about me.

My work on the other hand is not about me. I work as part of a team which is part of an office which is part of a larger office which is part of an organisation. I'm a small piece in a very complex puzzle. I try to fill out my piece to the best of my ability and I hope that I do.

I have applied some of the thinking posted on this blog to projects I'm working on. In doing so, I have realised that some of these posts do indeed need more thought and I will probably bring that thought back here for posting.

I have put forward the arguments posted here in discussions with colleagues. They have listened (I hope) and then synthesised this information with their own understanding and objectives in mind. Like I said, I am part of a team, and the viewpoints of many tend to provide a better answer than the opinion of one, especially when the full nature of the environment may not be apparent to each individual.

In short, you can't read this blog and ask why my employer hasn't implemented my ideas, nor should you consider my posts here to be the policy of my employer. In the first instance, as much as you may agree with me (if so, I am flattered), we may share a limited view of the overall situation. In the second instance, I am not omniscient (don't tell my wife), omnipresent (don't tell my boss) or omnipotent (don't tell my kids).

There is Work-Me and there is Blog-Me.

This here, is Blog-Me.

Just to be clear, questions have been raised regarding my blogging activity. No specific issue, just a heightened level of concern over what is still a relatively new form of personal activity. A social media policy is probably forthcoming from my employer and when it arrives, I will abide by its provisions 100%.

As Low As Reasonably Practicable

It's another staple of the risk management diet, but while I believe this one to be a completely valid concept, I can't help but feel that it's being served up underdone. This time I'm talking about ALARP - As Low As Reasonably Practicable. To define ALARP, at least how I do, would probably negate the need to write the rest of this post. So let's just say that ALARP is the point at which any further reduction in risk would require resources significantly greater than the magnitude of the benefit gained1.

It is often described graphically. Here are a few examples of the types of diagrams you may see helping to explain the concept:

The left diagram is the one I see the most although I am seeing, more and more, other representations including the other two. Rather than link any specific instances on the web, feel free to find such diagrams using Google Images.

So what are the problems that I see with most of these graphs? Thanks for asking...

The ALARP Region

In the left diagram it is shown as an orange trapezoid, and in the centre diagram it is a line, but in both cases the point of this area is to identify the level of risk that is acceptable if ALARP is achieved. Sometimes the diagram is missing this commentary, so it looks like the region is simply "the ALARP region" - whatever that means.

Going hand in hand with the former definition, though, is the idea that risks falling in the green area need not be treated at all - we'll come back to this.

Axes (as in plural of axis)

Often the nature of the axes is confusing. Take exhibit A (the one on the left): it has a y-axis but no x-axis. Sometimes you see risk magnitude shown on an x-axis, but aren't risk level and risk magnitude the same thing?

Anyway, the diagram on the right has a bigger problem than that. It has no label on the x-axis but it does have two y-axes. The two plotted lines intersect at a point identified as the ALARP point.

But what is the significance of the intersection when two different scales are used? I would argue that unless you identify the exact relationship between those scales, there is none - not for ALARP, nor for the acceptability of the risk.

Two Questions

I see ALARP not as a question of acceptability - i.e. risk evaluation - but as a question of risk treatment. Two different questions, but do both have to be answered?

If we follow the standard ISO 31000 RM process, the question of acceptability appears first and allows for the decision not to treat the risk, relying instead on existing controls. The standard does start to talk about cost-benefit considerations but stops short of requiring the achievement of ALARP at either the evaluation or treatment stages.

It appears to me that ALARP tends to be enshrined in regulations or case law. CASA aeronautical studies often include the following quote from an Australian High Court decision.

Where it is possible to guard against a foreseeable risk which, though perhaps not great, nevertheless cannot be called remote or fanciful, by adopting a means which involves little difficulty or expense, the failure to adopt such means will in general be negligent.

So it seems that, regardless of the inherent acceptability of a risk, it must still be treated to ALARP2 - meaning you need to answer both questions separately (see the sketch after this list).

  • Have I treated this risk to a level ALARP?
  • Is the residual level of risk acceptable?
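Here is a tiny sketch of that two-step check. The disproportion factor and the numbers are made up; only the decision logic - two separate questions, both needing a "yes" - matters.

    def is_alarp(untreated_options, disproportion_factor=3.0):
        """Question 1: ALARP is reached when every remaining control would cost
        grossly more than the risk reduction it buys."""
        return all(cost > disproportion_factor * reduction
                   for cost, reduction in untreated_options)

    def is_acceptable(residual_risk, criterion):
        """Question 2: evaluate the residual risk against your risk criteria."""
        return residual_risk <= criterion

    # (cost, risk reduction) pairs for the controls we have chosen not to implement
    remaining = [(50.0, 2.0), (20.0, 10.0)]
    print(is_alarp(remaining))                              # False - the second control is still worthwhile
    print(is_acceptable(residual_risk=4.0, criterion=5.0))  # True - acceptable as it stands

The example comes out acceptable but not ALARP - which, per the High Court quote above, still means more treatment is required.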

My ALARP Diagram

In conceptualising my take on ALARP, I'm going to steal from the UK HSE department:

“‘Reasonably practicable’ is a narrower term than ‘physically possible’ … a computation must be made by the owner in which the quantum of risk is placed on one scale and the sacrifice involved in the measures necessary for averting the risk (whether in money, time or trouble) is placed in the other, and that, if it be shown that there is a gross disproportion between them – the risk being insignificant in relation to the sacrifice – the defendants discharge the onus on them.”

Those seem like some pretty clear directions: risk on one axis and cost on the other. To make the slope of that line mean something, the cost scale needs to be calibrated to the risk scale, though I have no idea how one would actually do this - maybe we'll tackle that one later. See below for a very rough, hand-drawn diagram. The ALARP point is rather hard to identify, but it is the point at which the slope of the line exceeds the cost-benefit limit.
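For what it's worth, here is a rough Python sketch of finding that point. It dodges the calibration problem entirely by assuming risk and cost are already on comparable scales, and the control names, costs and risk reductions are all invented.

    def alarp_point(initial_risk, controls, slope_limit=3.0):
        """Apply controls in order of cost-effectiveness until the marginal cost
        per unit of risk reduced exceeds the agreed cost-benefit limit."""
        applied, spent, risk = [], 0.0, initial_risk
        for name, cost, reduction in sorted(controls, key=lambda c: c[1] / c[2]):
            if cost / reduction > slope_limit:  # grossly disproportionate - stop here
                break
            applied.append(name)
            spent += cost
            risk -= reduction
        return applied, spent, risk

    # (name, cost, risk reduction) - hypothetical treatment options
    controls = [("fencing", 10.0, 5.0), ("radar", 60.0, 10.0), ("gold plating", 90.0, 2.0)]
    print(alarp_point(initial_risk=20.0, controls=controls))
    # (['fencing'], 10.0, 15.0) - the radar already fails the disproportion test

The slope_limit parameter is my stand-in for the "gross disproportion" test in the HSE quote: once the next control's cost per unit of risk reduced exceeds that limit, you stop, and the remaining risk is ALARP.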

Too often, I think we incorrectly lump related concepts into the same bucket and this leads to a blurring of the objectives of the process. In this case, ALARP fell in with risk evaluation when, I think, it should have remained separate and contained in the risk treatment part of the RM process.

Those risk professionals out there who possess ninja-like RM skills can certainly short-cut the process to achieve the desired outcome, but us grasshoppers3 should probably keep these concepts separate to ensure we cover off all requirements.

1. Adapted from ALARP's Wikipedia page.
2. What this means for the standard, I'm not sure. I honestly hadn't thought about the implications of this thought process until I typed it just now.
3. I think I just mixed up kung-fu and whatever martial art ninjas do - no emails or dark-clad assassins please.