Unrecognized by most who have written on Stuxnet, the malware contains two strikingly different attack routines. While literature on the subject has focused almost exclusively on the smaller and simpler attack routine that changes the speeds of centrifuge rotors, the “forgotten” routine is about an order of magnitude more complex and qualifies as a plain nightmare for those who understand industrial control system security. Viewing both attacks in context is a prerequisite for understanding the operation and the likely reasoning behind the scenes.
Both attacks aim at damaging centrifuge rotors, but use different tactics. The first (and more complex) attack attempts to over-pressurize centrifuges, the second attack tries to over-speed centrifuge rotors and to take them through their critical (resonance) speeds. […]
But how does one use thousands of fragile centrifuges in in a sensitive industrial process that doesn’t tolerate even minor equipment hiccups? In order to achieve that, Iran uses a Cascade Protection System which is quite unique as it is designed to cope with ongoing centrifuge trouble by implementing a crude version of fault tolerance. […]
The cyber attack against the Cascade Protection System infects Siemens S7-417 controllers with a matching configuration. The S7-417 is a top-of-theline industrial controller for big automation tasks. In Natanz, it is used to control the valves and pressure sensors of up to six cascades (or 984 centrifuges) that share common feed, product, and tails stations. Immediately after infection the payload of this early Stuxnet variant takes over control completely. Legitimate control logic is executed only as long as malicious code permits it to do so; it gets completely de-coupled from electrical input and output signals. The attack code makes sure that when the attack is not activated, legitimate code has access to the signals; in fact it is replicating a function of the controller’s operating system that would normally do this automatically but was disabled during infection.
In what is known as a man-in-the-middle scenario in cyber security, the input and output signals are passed from the electrical peripherals to the legitimate program logic and vice versa by attack code that has positioned itself “in the middle”.
Things change after activation of the attack sequence, which is triggered by a combination of highly specific process conditions that are constantly monitored by the malicious code. Then, the much-publicized manipulation of process values inside the controller occur. Process input signals (sensor values) are recorded for a period of 21 seconds. Those 21 seconds are then replayed in a constant loop during the execution of the attack, and will ultimately show on SCADA screens in the control room, suggesting normal operation to human operators and any software-implemented alarm routines. During the attack sequence, legitimate code continues to execute but receives fake input values, and any output (actuator) manipulations of legitimate control logic no longer have any effect. […]
The detailed pin-point manipulations of these sub-controllers indicate a deep physical and functional knowledge of the target environment; whoever provided the required intelligence may as well know the favorite pizza toppings of the local head of engineering. […]
The attack continues until the attackers decide that enough is enough, based on monitoring centrifuge status, most likely vibration sensors, which suggests a mission abort before the matter hits the fan. If the idea was catastrophic destruction, one would simply have to sit and wait. But causing a solidification of process gas would have resulted in simultaneous destruction of hundreds of centrifuges per infected controller. While at first glance this may sound like a goal worthwhile achieving, it would also have blown cover since its cause would have been detected fairly easily by Iranian engineers in post mortem analysis. The implementation of the attack with its extremely close monitoring of pressures and centrifuge status suggests that the attackers instead took great care to avoid catastrophic damage. The intent of the overpressure attack was more likely to increase rotor stress, thereby causing rotors to break early – but not necessarily during the attack run.
Nevertheless, the attackers faced the risk that the attack might not work at all because it is so overengineered that even the slightest oversight – or any configuration change – would have resulted in zero impact or, worst case, in a program crash that would have been detected by Iranian engineers quickly. It is obvious and documented later in this paper that over time Iran did change several important configuration details such as the number of centrifuges and enrichment stages per cascade, all of which would have rendered the overpressure attack useless; a fact that the attackers must have anticipated.
Whatever the effect of the overpressure attack was, the attackers decided to try something different in 2009. That may have been motivated by the fact that the overpressure attack was lethal just by accident, that it didn’t achieve anything, or – that somebody simply decided to check out something new and fresh.
The new variant that was not discovered until 2010 was much simpler and much less stealthy than its predecessor. It also attacked a completely different component: the Centrifuge Drive System (CDS) that controls rotor speeds. The attack routines for the overpressure attack were still contained in the payload, but no longer executed – a fact that must be viewed as deficient OPSEC. It provided us by far the best forensic evidence for identifying Stuxnet’s target, and without the new, easy-to-spot variant the earlier predecessor may never have been discovered. That also means that the most aggressive cyber-physical attack tactics would still be unknown to the public – unavailable for use in copycat attacks, and unusable as a deterrent display of cyber power.
Stuxnet’s early version had to be physically installed on a victim machine, most likely a portable engineering system, or it could have been passed on a USB stick carrying an infected configuration file for Siemens controllers. Once that the configuration file was opened by the vendor’s engineering software, the respective computer was infected. But no engineering software to open the malicious file, equals no propagation. That must have seemed to be insufficient or impractical for the new version, as it introduced a method of self-replication that allowed it to spread within trusted networks and via USB sticks even on computers that did not host the engineering software application. The extended dropper suggests that the attackers had lost the capability to transport the malware to its destination by directly infecting the systems of authorized personnel, or that the Centrifuge Drive System was installed and configured by other parties to which direct access was not possible. The self-replication would ultimately even make it possible to infiltrate and identify potential clandestine nuclear sites that the attackers didn’t know about. All of a sudden, Stuxnet became equipped with the latest and greatest MS Windows exploits and stolen digital certificates as the icing on the cake, allowing the malicious software to pose as legitimate driver software and thus not be rejected by newer versions of the Windows operating system. Obviously, organizations had joined the club that have a stash of zero-days to choose from and could pop up stolen certificates just like that. Whereas the development of the overpressure attack can be viewed as a process that could be limited to an in-group of top notch industrial control system security experts and coders who live in an exotic ecosystem quite remote from IT security, the circle seems to have gotten much wider, with a new center of gravity in Maryland. It may have involved a situation where the original crew is taken out of command by a casual “we’ll take it from here” by people with higher pay grades. Stuxnet had arrived in big infosec. But the use of the multiple zero-days came with a price. The new Stuxnet variant was much easier to identify as malicious software than its predecessor as it suddenly displayed very strange and very sophisticated behavior at the IT layer. In comparison, the dropper of the initial version looked pretty much like a legitimate or, worst case, pirated Step7 software project for Siemens controllers; the only strange thing was that a copyright notice and license terms were missing. Back in 2007, one would have to use extreme forensic efforts to realize what Stuxnet was all about – and one would have to specifically look for it, which was out of everybody’s imagination at the time. The newer version, equipped with a wealth of exploits that hackers can only dream about, signaled even the least vigilant anti-virus researcher that this was something big, warranting a closer look. […]
The new attack works by changing rotor speeds. With rotor wall pressure being a function of process pressure and rotor speed, the easy road to trouble is to over-speed the rotors, thereby increasing rotor wall pressure. Which is what Stuxnet did. Normal operating speed of the IR-1 centrifuge is 63,000 rpm, as disclosed by A. Q. Khan himself in his 2004 confession. Stuxnet increases that speed by a good onethird to 84,600 rpm for fifteen minutes, including the acceleration phase which will likely take several minutes. It is not clear if that is hard enough on the rotors to crash them in the first run, but it seems unlikely – even if just because a month later, a different attack tactic is executed, indicating that the first sequence may have left a lot of centrifuges alive, or at least more alive than dead. The next consecutive run brings all centrifuges in the cascade basically to a stop (120 rpm), only to speed them up again, taking a total of fifty minutes. A sudden stop like “hitting the brake” would predictably result in catastrophic damage, but it is unlikely that the frequency converters would permit such radical maneuver. It is more likely that when told to slow down, the frequency converter smoothly decelerates just like in an isolation / run-down event, only to resume normal speed thereafter. The effect of this procedure is not deterministic but offers a good chance of creating damage. The IR-1 is a supercritical design, meaning that operating speed is above certain critical speeds which cause the rotor to vibrate (if only briefly). Every time a rotor passes through these critical speeds, also called harmonics, it can break. […]
The most common technical misconception about Stuxnet that appears in almost every publication on the malware is that the rotor speed attack would record and play back process values by means of the recording and playback of signal inputs that we uncovered back in 2010 […]. Slipping the attention of most people writing about Stuxnet, this particular and certainly most intriguing attack component is only used in the overpressure attack. The S7-315 attack against the Centrifuge Drive System simply doesn’t do this, and as implemented in the CPS attack it wouldn’t even work on the smaller controller for technical reasons. The rotor speed attack is much simpler. During the attack, legitimate control code is simply suspended. The attack sequence is executed, thereafter a conditional BLOCK END directive is called which tells the runtime environment to jump back to the top of the main executive that is constantly looped on the single-tasking controller, thereby re-iterating the attack and suspending all subsequent code.
The attackers did not care to have the legitimate code continue execution with fake input data most likely because it wasn’t needed. Centrifuge rotor speed is constant during normal operation; if shown on a display, one would expect to see static values all the time. It is also a less dramatic variable to watch than operating pressure because rotor speed is not a controlled variable; there is no need to fine-tune speeds manually, and there is no risk that for whatever reason (short of a cyber attack) speeds would change just like stage process pressure. Rotor speed is simply set and then held constant by the frequency converter. If a SCADA application did monitor rotor speeds by communicating with the infected S7-315 controllers, it would simply have seen the exact speed values from the time before the attack sequence executes. The SCADA software gets its information from memory in the controller, not by directly talking to the frequency converter. Such memory must be updated actively by the control logic, reading values from the converter. However if legitimate control logic is suspended, such updates no longer take place, resulting in static values that perfectly match normal operation. Nevertheless, the implementation of the attack is quite rude; blocking control code from execution for up to an hour is something that experienced control system engineers would sooner or later detect, for example by using the engineering software’s diagnostic features, or by inserting code for debugging purposes. Certainly they would have needed a clue that something was at odds with rotor speed. It is unclear if post mortem analysis provided enough hints; the fact that both overspeed and transition through critical speeds were used certainly caused disguise. However, at some point in time the attack should have been recognizable by plant floor staff just by the old ear drum. Bringing 164 centrifuges or multiples thereof from 63,000 rpm to 120 rpm and getting them up to speed again would have been noticeable – if experienced staff had been cautious enough to remove protective headsets in the cascade hall. […]
Summing up, the differences between the two Stuxnet variants discussed here are striking. In the newer version, the attackers became less concerned about being detected. It seems a stretch to say that they wanted to be discovered, but they were certainly pushing the envelope and accepting the risk. […]
Much has been written about the failure of Stuxnet to destroy a substantial number of centrifuges, or to significantly reduce Iran’s LEU production. While that is undisputable, it doesn’t appear that this was the attackers’ intention. If catastrophic damage was caused by Stuxnet, that would have been by accident rather than by purpose. The attackers were in a position where they could have broken the victim’s neck, but they chose continuous periodical choking instead. […]
Reasons for such tactics are not difficult to identify. When Stuxnet was first deployed, Iran did already master the production of IR-1 centrifuges at industrial scale. It can be projected that simultaneous catastrophic destruction of all operating centrifuges would not have set back the Iranian nuclear program for longer than the two years setback that I have estimated for Stuxnet. During the summer of 2010 when the Stuxnet attack was in full swing, Iran operated about four thousand centrifuges, but kept another five thousand in stock, ready to be commissioned. Apparently, Iran is not in a rush to build up a sufficient stockpile of LEU that can then be turned into weapon-grade HEU but favoring a long-term strategy. A one-time destruction of their operational equipment would not have jeopardized that strategy, just like the catastrophic destruction of 4,000 centrifuges by an earthquake back in 1981 did not stop Pakistan on its way to get the bomb. […]
Somebody among the attackers may also have recognized that blowing cover would come with benefits. Uncovering Stuxnet was the end to the operation, but not necessarily the end of its utility. It would show the world what cyber weapons can do in the hands of a superpower. Unlike military hardware, one cannot display USB sticks at a military parade. […]
Stuxnet will not be remembered as a significant blow against the Iranian nuclear program. It will be remembered as the opening act of cyber warfare […].
Ralph Langner, “To Kill a Centrifuge: A Technical Analysis of What Stuxnet’s Creators Tried to Achieve”, November 2013, The Langner Group
Added to diary 21 May 2018