Podcast: The UK’s Air Traffic Meltdown

Tune in as Aviation Week Network editors discuss the air traffic control failure in the UK that led to the cancelation of thousands of flights, which the UK’s ATC services provider, NATS, says was caused by a “one-in-15 million” event.

Routes Editor-In-Chief David Casey and ATW Europe and Africa Bureau Chief Victoria Moores discuss the initial findings of a report into the incident and what measures have been taken to ensure a repeat of the chaos does not happen again.

Don't miss a single episode. Subscribe to Aviation Week's Window Seat Podcast in Apple Podcasts and Spotify.


Rush Transcript

David Casey:

Hello everyone, and thank you for joining us for Window Seat, our Aviation Week Air Transport podcast. I'm David Casey, Editor-In-Chief of Routes, and I'm delighted to be joined by my colleague, Victoria Moores, ATW's Europe and Africa Bureau Chief. Thanks for being with us and welcome on board. On this week's episode, we'll be talking about a major incident of passenger disruption in the UK that saw thousands of flights cancelled and many more delayed. But unlike the summer of 2022, when the faster-than-expected rebound in air travel, coupled with labour shortages, caused chaos at several airports, this incident was due to air traffic control issues.

In the UK, about a quarter of a million passengers were affected following a problem with the UK's air traffic control on August 28th, a public holiday in the country and one of the busiest travel days of the year. NATS, the UK's air traffic service, was out of action for several hours during the day, and although airspace was not closed, the number of aircraft in the sky was severely restricted while the automated system was down. The outage was said to be the worst in more than a decade and led to the cancellation of over 1,500 flights.

Now, Victoria, we're both here in the UK at the moment, and it really was a chaotic situation. From speaking to friends and family, I know someone who started their journey in Barcelona, they had to reroute to Rome, then they flew to Paris, and then they had to take the Eurostar back to London. I had another friend who had their flight cancelled as they got to their gate, and then they had to queue for three hours just to get out of the airport again. Now, obviously passenger safety is of paramount importance when it comes to ATC, but it is understandable that there are questions as to why a meltdown of this scale was able to occur. Since the incident we have heard from NATS who released a report on the 6th of September to the UK government. What is NATS saying actually happened?

Victoria Moores:

Thanks, David. The very quick answer to that, which I'll go into in a bit more detail is it's a little bit like when you try to save a file on your computer and that file already exists. So that's a quick and easy summary. The more detailed version is that when an airline wants to operate a flight within European airspace, they have to file their flight plan. So the details of what their planned routing is, the information that's required to organize that flight in terms of air traffic control. They have to file that to EUROCONTROL, which is the centralized body for European air traffic navigation. EUROCONTROL takes that information and it forwards it on to any national air traffic providers that are involved in that routing. So they'll take the information, and in this case, they will have sent it on to NATS, the UK air traffic provider.

Now, when this lands into NATS's system, what happens is it's processed by a very catchily named flight plan processing system, which is called the Flight Plan Reception Suite Automated - Replacement, the FPRSA-R. And what that's supposed to do is basically extract the elements that are important for the UK part of the air traffic routing. Now, along the way of these flight plans, you have what's called waypoints, and they're basically markers along the journey that signal the progress through the airspace. What happened in this specific incident is basically there were two waypoint markers that had the same name. And it's interesting looking into the NATS report that says that basically there's an awareness of the fact that there are duplicate waypoint names, but in most cases nowadays they're located a very long distance apart. And I think in this case, the two waypoints were about 4,000 miles apart, so there's significant distance.

What happened was one of the entry points at the beginning of the journey had the same name as one of the points at the very end of the journey. And the system basically said, "I can't compute because how can the journey begin and end at the same point?" And because it couldn't compute, it couldn't reject the flight plan because the system didn't understand the problem. But also it couldn't forward the flight plan onto the air traffic controllers to navigate the flight because the system wasn't sure that it was going to be safe to pass on to the controllers. So the system collapsed and dropped out as it's meant to do. It's a safety precaution that it switches to manual override if there's a problem like this. A backup system kicked in. That backup system experienced exactly the same problem and also reverted to manual override.
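Editor's note: the failure sequence described above can be sketched in a few lines of Python. This is a toy illustration only, not NATS's or Frequentis's actual code. The waypoint names, the `CriticalException` class and both functions are hypothetical, and the sketch assumes the simplest possible reading of the explanation: a route whose first and last waypoints share a name cannot be processed, and the backup runs the same logic as the primary.

```python
class CriticalException(Exception):
    """Hypothetical error: the plan can be neither safely rejected nor forwarded."""


def extract_segment(route):
    """Return (entry, exit) waypoint names for a route given as a list of names."""
    entry, exit_point = route[0], route[-1]
    if entry == exit_point:
        # Assumed behaviour: matching by name alone makes the route look
        # as if it begins and ends at the same point.
        raise CriticalException(f"entry and exit share the name {entry!r}")
    return entry, exit_point


def run_with_failover(route, systems):
    """Try each system in order; fall back to manual handling if all drop out."""
    for label, process in systems:
        try:
            return label, process(route)
        except CriticalException:
            continue  # this system drops out, so try the next one
    return "manual", None


# Primary and backup are separate entries but share the same logic.
systems = [("primary", extract_segment), ("backup", extract_segment)]
normal = run_with_failover(["AAA", "BBB", "CCC"], systems)
duplicate = run_with_failover(["DUP", "BBB", "CCC", "DUP"], systems)
print(normal)
print(duplicate)
```

Because the backup applies the same rule to the same input, separating it from the primary does not help here: the ordinary route is handled by the primary, while the route with the duplicated name falls through both systems to manual handling.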

So the whole of this sequence of events happened within 20 seconds at 8:32 in the morning, basically pushing NATS over to manual flight planning. So what happened in terms of the airline perspective is that NATS had enough information to keep on going for about four hours from the initial failure. So their automated systems have this four hour buffer and that four hour buffer meant that things weren't going to get really bad until about 12:30 because that gives some time to fix the problem. So from 8:30 to 12:30. By about 11 o'clock, they told the airlines, we've got a problem. Actually, they didn't tell the airlines. That's something we'll come back to later, but the airlines realised that there was a problem and then there was disruption from 11 o'clock through until three minutes past six UK local time when all of the restrictions were eventually removed. But the removal of the air traffic control restrictions doesn't mean that there wasn't disruption because obviously there was a ripple effect from there onwards. So in a nutshell, it was trying to save a file with the same name, kind of.
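Editor's note: the timings quoted above can be checked with a short calculation. The sketch below uses only the times given in the discussion (failure at 8:32, a four-hour buffer, restrictions lifted at 18:03 UK local time). The year 2023 is inferred from the references elsewhere in the episode, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Times as quoted in the discussion; the year is an inference.
failure = datetime(2023, 8, 28, 8, 32)
buffer = timedelta(hours=4)
restrictions_lifted = datetime(2023, 8, 28, 18, 3)

buffer_ends = failure + buffer                       # when stored data runs out
total_restricted = restrictions_lifted - failure     # failure to all-clear
after_buffer = restrictions_lifted - buffer_ends     # time beyond the buffer

print(buffer_ends.strftime("%H:%M"))
print(total_restricted)
print(after_buffer)
```

On these figures the buffer ran out at 12:32, which matches the "about 12:30" mentioned above, and the restrictions stayed in place for roughly five and a half hours beyond that point.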

David Casey:

Thanks for that explanation, Victoria. I think you've put it far more succinctly than I could do after reading that 17 page report. So NATS had said that the system has processed more than 15 million flight plans over the five years that it's been in service, and this is the first time that something like this has happened. So in one sense, it's positive that the primary and the backup systems performed in the way they did, meaning that no incorrect information was passed to air traffic controllers. But it's clearly understandable that questions have been raised about NATS's operational resilience. So what steps do they say have been taken to prevent a recurrence like this in the future?

Victoria Moores:

The main thing that they've done is they've noticed the fact that this kind of situation, which like you say, is incredibly rare to have this specific chain of events. They now know that that would cause the system to switch to manual control. So they've identified that and basically the air traffic control systems provider, which is Frequentis, is coming up with a solution, which means that this specific situation won't trigger a critical incident, which is where it switches over to manual control because basically it should identify and reject that one flight plan and not take the entire system down. So that's the main piece of learning from this is that they've had to change the systems to mean that this cannot happen again.
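Editor's note: the change described above, rejecting the one problematic flight plan instead of halting everything, is a common pattern in batch processing. The following is a minimal sketch of that pattern under stated assumptions, not the supplier's actual fix. The flight identifiers, waypoint names, `FlightPlanError` and both functions are hypothetical.

```python
class FlightPlanError(Exception):
    """Hypothetical error for a single plan that cannot be processed."""


def extract_segment(route):
    """Return (entry, exit) waypoint names, or raise if they share a name."""
    entry, exit_point = route[0], route[-1]
    if entry == exit_point:
        raise FlightPlanError(f"duplicate waypoint name {entry!r} at entry and exit")
    return entry, exit_point


def process_all(plans):
    """Process every plan; set aside the bad ones and keep going."""
    accepted, rejected = {}, {}
    for flight_id, route in plans.items():
        try:
            accepted[flight_id] = extract_segment(route)
        except FlightPlanError as err:
            # Only this plan is rejected; the loop continues with the rest.
            rejected[flight_id] = str(err)
    return accepted, rejected


plans = {
    "FLT1": ["AAA", "BBB"],
    "FLT2": ["DUP", "XXX", "DUP"],
    "FLT3": ["CCC", "DDD"],
}
accepted, rejected = process_all(plans)
print(sorted(accepted))
print(sorted(rejected))
```

The design choice is to treat an unprocessable plan as a problem with that plan alone, so one bad input is flagged for attention while the remaining plans are still handled automatically.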

David Casey:

Okay. And what has been the response then from airlines to this report? There has been some discussion, I think particularly from Ryanair, that some of the points raised might not be accurate.

Victoria Moores:

So in true Ryanair style, they're saying that they reject the “whitewash” report, which has been released by NATS. They reckon that there are numerous inaccuracies, and part of that is about the estimation of the impact on flight operations in terms of cancellations and also delays. But Ryanair does raise one very good point, and they say that NATS knew about the systems collapse at 8:32 in the morning, so that was that 20 second window when both systems went down, but the airlines themselves weren't told until three hours later. And then Ryanair says even then it was EUROCONTROL, not NATS that let them know at around 11 o'clock. So Ryanair's question there is ‘how come NATS didn't tell us and how come they didn't tell us sooner?’

Obviously, there's other questions in there about how come one flight plan was able to bring the whole system down. Again, in true Ryanair style, they say that clearly the NATS backup system is useless, but obviously NATS has taken precautions to make sure that the backup system is independent from the original system. So it runs on different servers, it runs on different hardware. It's a completely siloed system, but if that system hits the same problem twice, then it brings it down as we've experienced in this case. Ryanair's also criticizing NATS about the fact that they're not talking about costs. Who's going to cover the bill for this?

David Casey:

There's potentially more questions then to answer for NATS over the coming weeks. And we are expecting an independent review to report back by the end of September, which will set out lessons learned about the future and outline any action that the UK CAA may take against NATS if it's found to have breached its licencing obligations.

Victoria Moores:

Yeah, I think that's an interesting point, David, because it did seem like quite a strongly worded response from the UK CAA, and they have said that they will take action against NATS if they find that they didn't meet their licence requirements. So that's going to be an interesting one to watch.

David Casey:

Okay, thanks. One thing I found interesting in that NATS report is in the last line, which actually says that it's not within NATS's remit to “address any wider questions arising from the incident such as cost reimbursement and compensation for associated disruption.” So of course, airlines have had to bear the brunt of the cost of care and assistance charges because of these cancellations, and that's on top of the cost of disruption to crew and aircraft schedules. So what's happened there, Victoria?

Victoria Moores:

Yeah, there's several threads to the response to that, David. So I've seen conflicting reports as to whether or not NATS is going to have to pay up at all for this. So on the one hand, I think it's IATA who's saying that the buck is going to stop with the airlines. They're liable for unlimited care and assistance under passenger compensation rules. We might talk about that separately. But NATS is also, they have a contract with the airlines about minimum service provision, and there was some suggestion in one article that I read that if they haven't met their minimum service targets as a result of this outage, they might need to pay some compensation or reduce their costs back to the airlines. But also at the same time, if NATS says that this is a safety issue, obviously that's number one concern. You've already mentioned that. So whether or not they'll be required to contribute to the cost is unknown.

The passenger compensation rules do contain a clause that means that if airlines are having to pay out for care and assistance, they can seek to reclaim against other players in the value chain. The problem is it never works. As far as I know, I'm not aware of any cases where any provider has actually had to pay up to compensate the airlines, even though it exists in the regulation, which is another reason why there's been calls for the reform of the rules.

David Casey:

It's understandable then that there are such calls for reform of passenger rights, and it's not a new issue, is it?

Victoria Moores:

This has been a bit of a contentious issue for a while, David. So basically the EU 261 regulations, there's been calls for reform for a long time, which really got amplified back in 2010, so 13 years ago, around about the time of the eruption of the Icelandic volcano, Eyjafjallajökull. So what happened there was that airlines weren't able to fly. In that situation airspace was closed, but airlines were still liable for unlimited care and assistance. You have to break down the two. There's care and assistance, which is hotel and meals, and then there's compensation. If airlines can demonstrate that it was extraordinary circumstances, and the definition of that has been a bit unclear, then they don't have to pay compensation, but they do have to pay for care and assistance still. So airlines have been saying, "We cannot be the insurer of last resort. If it's illegal for us to fly, you can't expect us to then pay out unlimited amounts of money."

So basically there was a call for reform going back a long time. The European Commission proposed in March 2013 that these rules would be changed, but that got stalled. I won't go into the detail of that, but it was to do with a sovereignty dispute between the UK and Spain over Gibraltar. The UK left the European Union, which means that technically that bit of dispute should have been unlocked. The European Commission conducted some research which really said that the need for reform has become even more urgent, and that the cost of passenger compensation has been going up by about 13.4% per year, the cost for airlines of compliance, that was up until 2018. And so even though it's a known issue that these rules need to be looked into, it's just not really moving anywhere.

And in the meantime, because of Brexit, the UK now has its own passenger rights rules, which are separate from the European rules, but they do mirror them for now. And at the moment, I think the UK government is looking into reforming its own passenger rights rules. So there's a lot going on there, but also not a lot of immediate change coming out of that. And in the meantime, we're seeing a lot of reports in the mainstream media urging passengers to claim the compensation that they're entitled to. While airlines are saying, "Hang on a second, we're not responsible for situations that are beyond our control, how can we have to pay for that?"

David Casey:

Thanks, Victoria. So essentially, the need for reform is pressing, but it seems like it isn't moving fast enough. Well, we're just about out of time for this week's episode, but Victoria and I will be back next week when we'll be joined by ATW senior editor, Aaron Karp, who's based in Washington, DC to talk more about ATC disruptions in Europe and the U.S. So, thanks for joining me, Victoria. Thanks to you, our listener, and thanks to our producer Cory Hitt. If you enjoyed this podcast, make sure you don't miss us each week by subscribing to Window Seat on Apple Podcasts or wherever you listen. This is David Casey, signing off from Window Seat.

David Casey

David Casey is Editor in Chief of Routes, the global route development community's trusted source for news and information.

Victoria Moores

Victoria Moores joined Air Transport World as our London-based European Editor/Bureau Chief on 18 June 2012. Victoria has nearly 20 years’ aviation industry experience, spanning airline ground operations, analytical, journalism and communications roles.