DARPA: AI’s Dogfighting Win Shows Potential For Teaming With Humans
Falco is an artificial intelligence agent, a machine-learning algorithm that is barely a year old but has the equivalent of 30 years’ experience flying the Lockheed Martin F-16.
- AI’s machine-precision control of F-16 wins the day
- Reinforcement learning involved billions of dogfights
Banger, identified only by his call sign for security reasons, is a graduate of the instructor course at the U.S. Air Force Weapons School with more than 2,000 hr. flight time in the F-16.
On Aug. 20, in the final event of DARPA’s AlphaDogfight Trials (ADT), Falco beat Banger 5-0 in air combat between two simulated F-16s. Falco had already beaten artificial-intelligence (AI) agents from Lockheed and six other finalists to win the right to fight the human.
With a winning tactic of maneuvering hard from the outset to take high-angle gunshots against its opponent, the “hyperaggressive” Falco showed “superhuman aiming ability,” according to DARPA’s competition co-commentators, Chris DeMay and Justin Mock—fine motor control that was honed by deep reinforcement learning over at least 4 billion training examples, says Heron Systems, Falco’s developer.
That was not true at the first trial, in November 2019, where the AI agents struggled to simply fly the aircraft. But progress was rapid, and by the second trial in January, “they were doing things that our pilots really thought looked a lot like basic fighter maneuvers,” says Col. Daniel Javorsek, manager of DARPA’s Air Combat Evolution (ACE) program.
“Even a week before Trial 1, we had agents that were not very good at flying. We were able to turn that around, and since then we’ve been really in first place,” says Benjamin Bell, lead developer of Falco at Heron, a small business based in California and the Washington, D.C., area.
The AlphaDogfight Trials were a precursor to ACE, which will culminate in live flight tests of AI-enabled automated dogfighting between full-size aircraft. The rapid progress in AI agent capability over the trials has given DARPA more confidence the algorithms will scale from simple 1v1 dogfighting to more complex, campaign-level air combat.
“The AlphaDogfight part of the program will increase the performance and trust of local combat autonomy, these individual, 1v1, tactical behaviors,” Javorsek says. “Then we’re going to expand that to team tactical behaviors, 2v1 and 2v2. Our hope is we’ll be able to scale these trusted algorithms to more complex campaign levels with multiaircraft operational behaviors.”
DARPA chose dogfighting because it is “a closed-world problem that an AI algorithm can learn really well,” says Tim Grayson, director of DARPA’s Strategic Technology Office. “At the same time, there are higher-level cognitive problems, the battle management, more strategic things, the intuitive decision-making that for machines are still a long way off.”
Grayson quotes a former Air Force Warfare Center commander as saying, “I’ve got to stop spending so much time training fingers and more time training brains.” Grayson adds: “Imagine a skilled fighter pilot who can move from aircraft to aircraft without having to go through laborious training and recertification every time because the AI is doing the hard part: how to control the aircraft and do the tactical maneuvers. That intuitive battle management skill that the pilot has can then transfer from system to system.”
The outcome of the ADT comes with several caveats. The simulated dogfights took place between two unclassified JSBSim open-source models of the F-16. The AI agent had perfect-state information on its own aircraft and its opponent’s, which enabled it to exploit its fine-precision control. The pilot had a chair, replica controls and a virtual-reality headset but did not have to endure the physiological effects of the sustained high-g maneuvers that ensued.
Engagements were limited to simple 1v1 basic fighter maneuvers (BFM) and gun attacks. Rather than modeling a gun, the simulation inflicted damage when an aircraft maneuvered a 3,000-ft.-long, 1-deg. cone onto its target. This avoided the need to train the AI agent when to pull the trigger, data for which is “really sparse,” admits Bell.
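The cone rule can be pictured geometrically. The Python below is a minimal sketch, not the trials' actual scoring code: the function name, the Cartesian coordinates in feet and the reading of “1-deg.” as the cone's full apex angle (so a 0.5-deg. half-angle off the nose) are all assumptions made for illustration.

```python
import math

def in_gun_cone(attacker_pos, attacker_dir, target_pos,
                max_range_ft=3000.0, cone_deg=1.0):
    """Return True if the target lies inside the attacker's damage cone.

    Hypothetical helper: positions are (x, y, z) in feet, attacker_dir is
    the nose direction, and cone_deg is assumed to be the full apex angle.
    """
    rel = [t - a for t, a in zip(target_pos, attacker_pos)]
    rng = math.sqrt(sum(c * c for c in rel))
    if rng == 0.0 or rng > max_range_ft:
        return False  # coincident or beyond the 3,000-ft. cone length
    nose = math.sqrt(sum(c * c for c in attacker_dir))
    cos_off = sum(r * d for r, d in zip(rel, attacker_dir)) / (rng * nose)
    cos_off = max(-1.0, min(1.0, cos_off))  # guard against rounding error
    angle_off_deg = math.degrees(math.acos(cos_off))
    return angle_off_deg <= cone_deg / 2.0
```

At 2,000 ft., a cone this narrow is only about 35 ft. across, which gives a sense of the pointing precision the rule rewards.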
Crucially, the simulations did not include the “bubble” around each aircraft required by training safety standards to avoid collisions. These rules do not allow pilots to pass within 500 ft. of each other and restrict gunshot angles to no more than 135 deg.—limits both aircraft “were violating routinely,” says Javorsek.
While such limits would not apply in real air combat, adhering to the training rules builds habits into human pilots, Banger contends. “I may not be comfortable pulling my aircraft into position where I might run into something else or take that high-aspect gunshot, and the AI would exploit that.”
Also, the AI agents were not allowed to learn during the trial events, but the pilot was. This was clear in the fifth and final engagement, when Banger tried a different tactic: taking combat down to the minimum altitude or “hard deck.” In earlier trials, “our agents were hard-decking almost 50% of the time in defensive situations,” says Bell. Heron’s focus for Trial 3 was on zero hard decks. “You see the pilot trying to take advantage of that in the final example, and thankfully we didn’t hard-deck,” he says.
Heron credits Falco’s fine-pointing of the F-16 to a control strategy that emphasized smoothness. “We’re controlling it around 10 Hz. It looked like a lot of our competitors were controlling at 50 Hz,” Bell says. That limited update rate required the AI agent to know its trajectory for the next 3 sec. to keep its opponent within the 1-deg. cone of the “gun.”
“We saw that a lot with Lockheed, where we’re both nose-on, we’re both doing damage, but for whatever fractions of a second that they don’t have us in their 1-deg. cone, that’s when we’re racking up damage and they’re not,” he says. “That’s how we won a lot of those engagements.”
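To put the update rates Bell mentions in perspective, the arithmetic can be written out. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes each control command is simply held constant until the next update.

```python
def decisions_in_horizon(rate_hz, horizon_s=3.0):
    """Number of control updates issued over the look-ahead horizon."""
    return round(rate_hz * horizon_s)

def hold_time_ms(rate_hz):
    """How long each command stays in effect before the next update."""
    return 1000.0 / rate_hz

for rate in (10, 50):
    print(rate, "Hz:", decisions_in_horizon(rate), "updates in 3 sec.,",
          hold_time_ms(rate), "ms per command")
```

Under these assumptions, a 10-Hz controller issues 30 commands over a 3-sec. horizon and holds each for 100 ms, while a 50-Hz controller issues 150 and holds each for 20 ms.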
Bell also credits Falco’s success to Heron’s approach to reinforcement learning, a technique in which an AI agent is trained by being rewarded for certain actions. Falco was trained over a total of about five weeks through billions of dogfights against a league of 102 unique AI agents.
“We started off early with a league of agents,” he says. “We wanted to create multiple different agents that are all flying in certain patterns. They have different reward structures, different ways of controlling the plane and different neural network architectures. The league gave us the robustness so that, across the board, we were able to beat any opponent, including the human, that we went against.”
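The league idea can be illustrated with a small bookkeeping sketch. It is not Heron's training code: the reward-scheme and architecture labels, the class and function names, and the uniform opponent sampling are all assumptions chosen to keep the example short. Only the league size of 102 comes from the article.

```python
import random
from dataclasses import dataclass

# Hypothetical labels standing in for "different reward structures"
# and "different neural network architectures".
REWARD_SCHEMES = ("aggressive", "defensive", "energy_conserving")
ARCHITECTURES = ("mlp_small", "mlp_large", "recurrent")

@dataclass
class LeagueAgent:
    name: str
    reward_scheme: str
    architecture: str
    fights: int = 0
    wins_for_learner: int = 0

def build_league(size=102, seed=0):
    """Create a league of distinctly named opponents with varied settings."""
    rng = random.Random(seed)
    return [LeagueAgent(f"agent_{i:03d}",
                        rng.choice(REWARD_SCHEMES),
                        rng.choice(ARCHITECTURES))
            for i in range(size)]

def sample_opponent(league, rng):
    """Pick the next sparring partner (uniformly, as an assumption)."""
    return rng.choice(league)

def record_result(opponent, learner_won):
    """Track how the learner fares against each league member."""
    opponent.fights += 1
    opponent.wins_for_learner += int(learner_won)
```

Training against many dissimilar opponents, rather than a single copy of itself, is what the quote credits for robustness. A fuller version might weight sampling toward opponents the learner still loses to.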
Heron used model-free reinforcement learning. “There’s no model of how the environment’s going to run. We’re not predicting the future state of the other plane or our own. It’s much easier,” says Bell. By avoiding the complex problem of modeling, Heron was able to start training Falco on Day 1. Building a model is “hard to do and if your model’s bad, then your agent’s going to end up worse,” he says.
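What “model-free” means can be shown on a toy problem. The sketch below uses tabular Q-learning on a five-state corridor, which is a stand-in technique and environment chosen for brevity, not the deep reinforcement learning Heron applied to the F-16 simulation. All names and parameter values are assumptions. The point is that the learner never inspects the environment's rules; it only samples outcomes and is rewarded for reaching the goal.

```python
import random

N_STATES = 5       # toy corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)   # 0 = step left, 1 = step right

def step(state, action):
    """Environment the learner can sample from but never inspects."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per action
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore at random sometimes, and whenever estimates are tied.
            explore = rng.random() < epsilon or q[state][0] == q[state][1]
            if explore:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Update from the sampled outcome only; no transition model.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best-known action in each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy(train())` prefers stepping right in every state, learned purely from trial and reward.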
Over in just 10 min., the human pilot’s one-sided loss to a machine unleashed a spate of comment and speculation online, ranging from “end of an era” to “just one more overhyped AI demonstration.” But DARPA has not set out to replace the human pilot. Instead, its ACE program aims to build trust in AI so that the pilot can focus on battle management while the machine flies the aircraft.
“If we convinced even a couple of pilots that what they were seeing out of this Heron autonomous agent looked like something that was intelligent and creative and making smart decisions in this dynamic BFM engagement, then I’m considering it a success because those are the first steps I need to create trust in these sorts of agents,” Javorsek says.
“If I were to walk away from today and say, ‘I don’t trust the AI’s ability to perform fine motor movements and achieve kills and [damage] that I’m uncomfortable with,’ I’d have a lack of integrity,” Banger says.
The mystery fighter pilot also joined in the speculation on how AI-controlled autonomous aircraft could change the face of air combat. “If I have an autonomous system out there, and we’re in combat against a singular adversary, I would love to have it take that high-aspect gunshot on the enemy,” he says.
There is also potential for “developing a wingman that has learned my assumptions so well that it’s able to predict with 98-99% probability what I’m going to do, and so we as a combat pair or four-ship become even more lethal,” he says. “For that reason, I don’t think you’re seeing the end of a human fighter pilot, I think you’re seeing the refinement into a human weapon system.”