18 rules of complex system failure
The list below appeared on ZDNet and is a cut-and-paste job from the brief paper "How Complex Systems Fail" by Richard I. Cook. I suggest reading Charles Perrow's Normal Accidents if this field of extreme risk interests you.
In finance, complex failure at the largest level is called systemic risk, and it is what we are currently facing.
How many items below relate to a recent failure or proposed solution? (Real estate, CDS and AIG, the US debt bubble, stimulus...) The comments in parentheses are my opinions. My own solution is a lo-fi finance approach.
1. Complex systems are intrinsically hazardous systems. The frequency of hazard exposure can sometimes be changed but the processes involved in the system are themselves intrinsically and irreducibly hazardous. It is the presence of these hazards that drives the creation of defenses against hazard that characterize these systems.
(Finance is nothing more than risk allocation).
Especially in the context of inflation-adjusted capital preservation.
2. Complex systems are heavily and successfully defended against failure. The high consequences of failure lead over time to the construction of multiple layers of defense against failure. The effect of these measures is to provide a series of shields that normally divert operations away from accidents.
(Robust checks and balances need to be in place at all levels. Innovation is frequently a form of subverting these for the sake of efficiencies (capital, tax or otherwise). Risk eventually shows up.)
We have recently witnessed the largest mobilisation and deployment of government resources in history in dealing with the GFC (global financial crisis) system failure. Were the prevailing defences enough? Clearly not. Are the defences put in place over the last year enough to prevent another GFC, or to stop the current one turning into something worse? The Austrian School argues not - summed up nicely last week in The Man Who Predicted the Depression in the WSJ.
3. Catastrophe requires multiple failures - single point failures are not enough. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure.
(The media, reflecting audience desire, seeks to identify point failures. There is rarely just a bad guy or a broken part.)
Indeed, and risk usually doesn't take the form of a linear process. Nor, as is now generally accepted, does probability (which supposedly measures risk, as opposed to uncertainty) adhere to a bell-shaped distribution - even one with fat tails. Reality has proved far less formulaically prosaic than that. Nassim Taleb's Black Swan theory, which illustrates this, has (rightly) received much publicity in the last year. But he's not the only one who has proved prescient in identifying our restricted view of risk - not forgetting that risk is not only of the semi-variance kind: multi-sigma events go both ways. Four other theories that take an unconventional view of risk and reinforce point 3 are Munger's Lollapalooza Effect, Soros' Reflexivity, Frederic Bastiat's What is Seen and What is Not Seen, and of course Mandelbrot's fractal view of risk.
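As a rough illustration of the multi-sigma point, here is a minimal sketch comparing tail probabilities under a normal model and an arbitrary fat-tailed Student-t model (the 4-sigma threshold and the 3 degrees of freedom are illustrative assumptions, not figures from the post):

```python
# Illustrative only: how likely is a 4-standard-deviation move under a normal
# model versus an (assumed) fat-tailed Student-t model with 3 degrees of freedom?
import math
from scipy.stats import norm, t

df = 3                              # assumed degrees of freedom (fat tails)
k = 4                               # size of the move, in standard deviations
t_std = math.sqrt(df / (df - 2))    # std dev of a Student-t with df > 2

p_normal = 2 * norm.sf(k)           # two-sided tail probability, normal model
p_fat = 2 * t.sf(k * t_std, df)     # same threshold, fat-tailed model

print(f"Normal model:     P(|move| > {k} sigma) ~ {p_normal:.1e}")
print(f"Fat-tailed model: P(|move| > {k} sigma) ~ {p_fat:.1e}")
# The fat-tailed model makes the extreme move roughly two orders of magnitude
# more likely - and, as noted above, multi-sigma events cut both ways.
```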
4. Complex systems contain changing mixtures of failures latent within them. The complexity of these systems makes it impossible for them to run without multiple flaws being present. Because these are individually insufficient to cause failure they are regarded as minor factors during operations.
(People and organizations fail all the time. All things fail eventually. Systems need to allow for system failure. To paraphrase someone else: capitalism without failure is like religion without hell - it doesn't really function as a social tool.)
This is the beef Jim Rogers has with how the GFC has been 'remedied'. He believes the system should have been allowed to fail - letting the over-leveraged and men-of-straw counterparties vapourise - thereby letting the system start afresh, unencumbered by zombies and without the after-effects of the bailouts.
5. Complex systems run in degraded mode. A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws.
(Optimization comes at the expense of safety: 100:1 leverage or some other "innovation" may seem optimal, but may be unstable.)
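A back-of-the-envelope sketch of why that kind of leverage is unstable, using only the illustrative 100:1 figure from the comment above:

```python
# Toy arithmetic: at 100:1 leverage, how small an adverse move in asset values
# wipes out the equity cushion entirely?
leverage = 100                   # assets / equity
equity = 1.0                     # one unit of capital
assets = equity * leverage       # 100 units of exposure

wipeout_move = equity / assets   # fractional fall in assets that erases equity
print(f"A {wipeout_move:.1%} fall in asset value erases 100% of equity.")
# => A 1.0% fall in asset value erases 100% of equity - "optimal" in calm
#    markets, catastrophic under even modest stress.
```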
6. Catastrophe is always just around the corner. The potential for catastrophic outcome is a hallmark of complex systems. It is impossible to eliminate the potential for such catastrophic failure; the potential for such failure is always present by the system’s own nature.
(Only the paranoid survive: find a culture or organization that is fat and happy and you will find unacknowledged risks.)
Sounds an awful lot like Apparent and Actual risk, as explained by Seth Godin. Risk doesn't have to look like you think it should.
7. Post-accident attribution of an accident to a ‘root cause’ is fundamentally wrong. Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident. There are multiple contributors to accidents. Each of these is necessary but insufficient in itself to create an accident. Only jointly are these causes sufficient to create an accident.
(See the earlier comment regarding the media; this also applies to congressional committees, which are typically witch hunts.)
Likewise, see the earlier comments under point 3 on unconventional views of risk. Oversimplification, and the remedies deduced from it, can render the cure worse than the ailment - a la Ludwig von Mises via the WSJ link under point 2.
8. Hindsight biases post-accident assessments of human performance. Knowledge of the outcome makes it seem that events leading to the outcome should have appeared more salient to practitioners at the time than was actually the case. Hindsight bias remains the primary obstacle to accident investigation, especially when expert human performance is involved.
(This is the forehead-slap effect that follows a bubble event.)
9. Human operators have dual roles: as producers & as defenders against failure. The system practitioners operate the system in order to produce its desired product and also work to forestall accidents. This dynamic quality of system operation, the balancing of demands for production against the possibility of incipient failure, is unavoidable.
(Unfortunately, positive feedback in the form of earnings, bonuses and industry recognition amplifies this bias. A fair assessment of skill rarely gets in the way of ego amplified by culture. Cultures - be they national, organizational or group - that assume instant monetary reward directly equates to insight or skill almost always fail due to this bias.)
In this case it is a matter of balancing the never-ending search for return, via financial innovation and investment decision-making optimisation, against market regulation.
10. All practitioner actions are gambles. After accidents, the overt failure often appears to have been inevitable and the practitioner’s actions as blunders or deliberate willful disregard of certain impending failure. But all practitioner actions are actually gambles, that is, acts that take place in the face of uncertain outcomes. That practitioner actions are gambles appears clear after accidents; in general, post hoc analysis regards these gambles as poor ones. But the converse - that successful outcomes are also the result of gambles - is not widely appreciated.
(Few do post-mortems on successful outcomes that deviate from the norm. These are better than post-mortems on failures, as they have paid for themselves.)
And when investors do analyse successful outcomes ex post, it often results in self-affirmation of skill, rather than the recognition that the outcome was driven by luck or systematic market movements.
This point advocates sensible diversification across asset classes, investment horizons/maturities, currencies and geographic locations.
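A minimal sketch of why that diversification helps, using an assumed two-asset portfolio with equal weights, equal volatility and a correlation below one (all figures are illustrative assumptions, not from the post):

```python
# Toy two-asset portfolio: assumed 50/50 weights, 20% volatility for each asset
# and a 0.3 correlation. None of these figures come from the post.
import math

w = 0.5        # weight in each asset
vol = 0.20     # assumed annualised volatility of each asset
corr = 0.3     # assumed correlation between the two assets

# Standard two-asset portfolio volatility formula.
port_var = 2 * (w * vol) ** 2 + 2 * (w ** 2) * (vol ** 2) * corr
port_vol = math.sqrt(port_var)

print(f"Single-asset volatility:          {vol:.1%}")
print(f"Diversified portfolio volatility: {port_vol:.1%}")
# => roughly 16% vs 20%: imperfectly correlated holdings damp the swings,
#    which is the point of spreading across assets, maturities and geographies.
```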
11. Actions at the sharp end resolve all ambiguity. Organizations are ambiguous, often intentionally, about the relationship between production targets, efficient use of resources, economy and costs of operations, and acceptable risks of low and high consequence accidents. All ambiguity is resolved by actions of practitioners at the sharp end of the system. After an accident, practitioner actions may be regarded as ‘errors’ or ‘violations’ but these evaluations are heavily biased by hindsight and ignore the other driving forces, especially production pressure.
(See CDOs, black boxes and any obfuscation. Wall Street is excellent at packaging and selling things, fairly mediocre at purchasing them. See mutual fund industry performance, among others, in this regard.)
And yet investors keep putting cash into developed-market managed funds - when endless evidence points to under-performance of the benchmark after fees.
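A rough sketch of how the fee drag compounds (the 7% benchmark return, 2% annual fee and 20-year horizon are assumed purely for illustration):

```python
# Toy compounding example with assumed numbers: a fund that merely matches the
# benchmark before fees still lags it badly once fees compound.
years = 20
benchmark_return = 0.07   # assumed annual benchmark return
fee = 0.02                # assumed annual management fee

benchmark = (1 + benchmark_return) ** years
after_fees = (1 + benchmark_return - fee) ** years

print(f"$1 in the benchmark after {years} years: ${benchmark:.2f}")
print(f"$1 in the fund after fees:               ${after_fees:.2f}")
# => roughly $3.87 vs $2.65 - the fee wedge alone explains a large shortfall,
#    before any active under-performance is counted.
```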
12. Human practitioners are the adaptable element of complex systems. Practitioners and first line management actively adapt the system to maximize production and minimize accidents. These adaptations often occur on a moment by moment basis.
(Wrong incentives and the confusion of short-term motivations with long-term risks are part of the system. We need to design organizations and regulations with this in mind.)
Some human practitioners adapt more quickly than others - even at a larger group level. New Zealanders' ongoing insistence on investing in domestic housing and shunning other asset classes is a great example of this. Learn, adapt, survive.
13. Human expertise in complex systems is constantly changing. Complex systems require substantial human expertise in their operation and management. Critical issues related to expertise arise from (1) the need to use scarce expertise as a resource for the most difficult or demanding production needs and (2) the need to develop expertise for future use.
(Expertise is a false notion in complex systems due to their changing nature. For this reason anything seeking to "optimize" a system can make it unstable.)
A good reason to keep one's knowledge current.
Regarding Nick's comment - I wonder if the optimisation in this sense would include new taxes and market regulation?
14. Change introduces new forms of failure. The low rate of overt accidents in reliable systems may encourage changes, especially the use of new technology, to decrease the number of low-consequence but high-frequency failures. These changes may actually create opportunities for new, low-frequency but high-consequence failures. Because these new, high-consequence accidents occur at a low rate, multiple system changes may occur before an accident, making it hard to see the contribution of technology to the failure.
(See above, and regulatory reform, etc. The law of unintended consequences runs deep in complex systems. The ratings agencies offering stamps of approval to structured products is a classic case of this.)
15. Views of ‘cause’ limit the effectiveness of defenses against future events. Post-accident remedies for “human error” are usually predicated on obstructing activities that can “cause” accidents. These end-of-the-chain measures do little to reduce the likelihood of further accidents.
(Boundary conditions etc. that assume point failure and human error (greed, stupidity and crowd blindness) need to be built in.)
The re-regulation of some markets, and of the activities of some market participants, by current governments has to be a good example of this. The overly restrictive nature of SarbOx, implemented post-Enron et al., is also a good example.
16. Safety is a characteristic of systems and not of their components. Safety is an emergent property of systems; it does not reside in a person, device or department of an organization or system. Safety cannot be purchased or manufactured; it is not a feature that is separate from the other components of the system. The state of safety in any system is always dynamic; continuous systemic change ensures that hazard and its management are constantly changing.
(See above: large stable systems are the result of small stable systems. Consider the role of audit integrity and other sub-functions of the financial system.)
17. People continuously create safety. Failure free operations are the result of activities of people who work to keep the system within the boundaries of tolerable performance. These activities are, for the most part, part of normal operations and superficially straightforward. But because system operations are never trouble free, human practitioner adaptations to changing conditions actually create safety from moment to moment.
(System participants need to look out for changes, be they over- or under-performance relative to a system's normal behaviour. Keeping an eye out for "innovation" in finance is highly recommended.)
18. Failure free operations require experience with failure. Recognizing hazard and successfully manipulating system operations to remain inside the tolerable performance boundaries requires intimate contact with failure. More robust system performance is likely to arise in systems where operators can discern the “edge of the envelope”. It also depends on providing calibration about how their actions move system performance towards or away from the edge of the envelope.
(More work needs to be done discussing multiple points of failure in a system, including the hubris or collective myopia that leads to the failure. My own belief is that a culture or group which reflects hubris, is obsessed with over-optimizing or believes that today's profit means it is "right" is the thing to watch for. For the quants out there: beware of geeks bearing gifts.)