It’s often said that surgery is more art than science. Rubbish. Too much emphasis is placed on surgeons’ technical skills and not enough on the decisions behind them.
Any good surgeon can operate; better surgeons know when to operate; and the best surgeons know when not to. Knowing when to operate and when to hold off relies on weighing up the relative probabilities of success and failure between the alternatives.
Good decision makers (and therefore good surgeons) base such decisions on quality evidence, and this is where science comes in. The evidence we seek is evidence of the true effectiveness of an intervention, and it is the scientific method that provides us with the most accurate and reliable estimate of the truth. Faced with alternatives, surgeons can sometimes make the wrong choice by being unscientific.
Surgeons often decide to do certain procedures because it’s what’s usually done, because it’s what they were taught, because it sounds logical, or because it fits with their own observations. If the surgeon’s perception of effectiveness and the evidence from scientific studies align, there is little problem. It’s when the two conflict that there’s a problem: either the surgeon’s opinion or the evidence is wrong. Worse, sometimes there is no good quality evidence and we are left with the surgeon’s opinion.
There is abundant evidence that surgeons overestimate the effectiveness of surgery, and considerable evidence of seemingly effective operations (based on observational evidence) turning out to be ineffective on proper scientific testing.
So what evidence should we rely on? Put simply, when you are trying to determine true effectiveness, the best method is the one that is least wrong, i.e., the method that has the least error. The scientific method is constructed to reduce error – we rarely know the truth, but we can increase the likelihood of our estimates containing the truth and we can make those estimates more precise by reducing error. In other words, we can never be certain but we can reduce uncertainty.
There are two types of error: random error and systematic error. Random error is easy to understand. If you toss a coin ten times, you may get seven heads, but that doesn’t mean the coin is unbalanced. Toss it 100 times and get 100 heads, however, and you have reduced random error (the play of chance in producing such a result); it is now very likely, and we can be far more certain, that the coin is unbalanced.
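To put rough numbers on that intuition, here is a minimal sketch (not part of the original article) of the underlying arithmetic; the `prob_at_least` helper is purely illustrative.

```python
from math import comb

def prob_at_least(heads, tosses):
    """Chance of seeing at least `heads` heads in `tosses` tosses of a fair coin."""
    return sum(comb(tosses, k) for k in range(heads, tosses + 1)) / 2 ** tosses

print(prob_at_least(7, 10))     # ~0.17: seven heads in ten is unremarkable
print(prob_at_least(100, 100))  # ~8e-31: 100 heads in 100 all but rules out chance
```

With only ten tosses, a run of seven heads happens by chance about one time in six, so it tells us little; with 100 tosses the same kind of streak is so improbable under a fair coin that chance is effectively excluded.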
Systematic error (bias) is when we consistently get the wrong answer because we are doing the experiment wrong. There are many causes of bias in science and many go unrecognised, like confirmation bias, selective outcome reporting bias, selective analysis bias, measurement bias, and confounding. Systematic error is poorly understood and a major reason for the difference between the true and the apparent effectiveness of many surgical procedures.
The best way to test the effectiveness of surgery and overcome bias (particularly when the outcome is subjective, such as with pain) is to compare it with a sham or placebo procedure and to keep the patients and those who measure the effectiveness ‘blinded’ to which treatment was given. Yet such studies, common in the drug world, are rare in surgery.
A study that summarised the research comparing surgery with sham or placebo procedures found that, in most such studies, surgery was no better than pretending to do the procedure [1]. And in the studies where surgery was better than placebo, the difference was generally small.
It’s not always necessary to compare surgery to a sham – sometimes comparing it to non-surgical treatment is sufficient. This is particularly the case for objective outcomes (survival, recurrence of disease, anatomic corrections) where blinding is less important. But you still have to compare it to something – to merely report the results of an operation with no comparator provides no reference for effectiveness beyond some historical control (of different patients, with possibly different conditions, from another place and another time). Journals are littered with case reports showing that most people got better after receiving treatment X but such reports tell us nothing about what would have happened to the patients if they did not receive treatment X, or received some other treatment. These types of non-comparative studies continue to sustain many quack therapies as well as common medical and surgical therapies, just as they sustained the apparent effectiveness of bloodletting for thousands of years.
However, even when comparative studies are done, they are not always acted upon. In a study looking at the evidence base for orthopaedic surgical procedures, it was found that only about half of all orthopaedic procedures had been subjected to tests comparing them to not operating [2]. And for those procedures that had been compared to not operating, about half were shown to be no better than not operating, yet the operations were still being done. The other surgical specialties are unlikely to be much better.
So there are two problems in surgery: an evidence gap in which there’s a lack of high quality evidence to support current practice, and an evidence-practice gap where there’s high quality evidence that a procedure doesn’t work, yet it’s still performed.
Part of the problem is that operations are often introduced before there’s good quality evidence of their effectiveness in the real world. The studies comparing them to non-operative treatment or placebo often come much later – if at all.
Surgical procedures should not be introduced or funded until there’s high quality evidence showing their effectiveness, and it should be unethical to introduce a new technique without studying its effectiveness. Instead, the opposite is argued: that high quality comparative studies (placebo controlled trials) are unethical.
Often, procedures that surgeons consider to be obviously effective are later shown to be ineffective. In the US in the 1980s, a new procedure that removed some lung tissue was touted for emphysema. Animal studies and (non-comparative) results on humans were encouraging. So the procedure became commonplace. A comparative trial was called for but proponents argued that this would deprive many people of the benefits of the procedure, the effectiveness of which was obvious.
Medicare in the US decided only to fund the surgery if patients participated in a trial comparing it to non-surgical treatment. The trial was done and the surgery was found wanting. This cost Medicare some money, but much less than paying for the procedure for decades until someone else studied it. This type of solution should be considered in Australia – only introduce new procedures if they are being evaluated as part of a trial.
The current practice of surgery is not based on quality science. If you got a physicist from NASA to look at the quality of the science supporting current surgical practice, they would faint. But it is getting better. It is getting better because of advances in our understanding, because of the spread of evidence-based medicine (in teaching and in journal requirements, for example), and because surgeons are getting better at understanding science. The trials are improving, but the incorporation of their results into practice is slow and often meets resistance, driven by suspicion that stems from a poor understanding of science and by the biases that shape current practice.
Billions are spent worldwide on surgical procedures that may not be effective, because in many areas of surgery we still rely on surgical opinions based on biased observations and tradition. It is time for surgery to be a real science and to rely on the kind of evidence on which other scientific endeavours rely: the kind of evidence that we demand of other medical specialties and of non-medical practitioners. It’s not too hard. It’s not unethical. It’s right, and it’s time.
References
[1] Wartolowska K, Judge A, Hopewell S, Collins GS, Dean BJF, Rombach I, et al. Use of placebo controls in the evaluation of surgery: systematic review. BMJ. 2014;348:g3253.
[2] Lim HC, Adie S, Naylor JM, Harris IA. Randomised trial support for orthopaedic surgical procedures. PLoS One. 2014;9(6):e96745.