MITRE ATT&CK for ICS is a standard framework for understanding the diverse tactics that adversaries use to compromise and pivot through ICS/OT networks. Unlike ATT&CK for Enterprise, ATT&CK for ICS focuses on adversaries whose primary goal is disrupting industrial control processes, stealing intellectual property, or causing safety incidents by attacking industrial control systems.
Listen to our webinar with SANS to learn about:
- The 11 classes of tactics described in the MITRE ATT&CK for ICS Framework.
- How to use the framework to improve your ICS security posture.
- How a real-world ICS attack would be detected by CyberX's purpose-built IoT/OT security platform and how to map the attacker's tactics to the MITRE framework.
Phil is VP of IoT & Industrial Cybersecurity at CyberX, a Boston-based security firm founded by blue-team experts with a track record of defending critical national infrastructure. Prior to CyberX, Phil held executive roles at IBM Security/Q1 Labs, Symantec, Veracode, and Guardium. Phil began his career as a Schlumberger engineer on oil rigs in South America and as an engineer with Hydro-Quebec. He has a BSEE from McGill University, is certified in cloud security (CCSK), and has a 1st Degree Black Belt in American Jiu Jitsu.
Joe has over 20 years of both leadership and hands-on experience with enterprise security leaders including Algosec, IBM, Guardium, and Checkpoint Software. At Algosec, he established and led the company’s technical sales engineering function for the Americas and was later promoted to lead the function worldwide. At IBM, he was director of sales engineering and IBM InfoSphere Data Governance Center of Excellence Leader. He previously led worldwide sales engineering for Guardium, which was acquired by IBM for $220 million. Prior to IBM, he was Checkpoint’s first sales engineer and later rose to the position of Director of Systems Engineering. Joe holds a Master’s Degree in Computer Science, a Master of Arts degree, and a Bachelor’s Degree in Mechanical Engineering.
Today’s speakers are Phil Neray, VP of IoT & Industrial Cybersecurity, and Joe DiPietro, VP of Customer Success, both at CyberX.
Thank you, Carol, and welcome everyone. Good afternoon, good evening, and good morning depending on where you are. Thank you for joining us before the weekend. We have a lot of content. I’m going to start with a quick introduction to set the stage for Joe, who’s going to do the deep dive. I just want to start with a description of the problem that we are addressing here, which is that devices that you find in ICS environments, whether they’re OT devices or what some people might call IoT or IIoT devices, are typically unmanaged. They’re unseen. The IT department doesn’t know that they exist, doesn’t know what types are there, would not know if they were compromised. You can’t put agents on them. They were either designed many years ago as legacy systems, without security in mind, or, in the case of the newer IoT devices, often built without security: they have weak default credentials, are typically assembled from open source components that are riddled with vulnerabilities, and are often connected directly to the internet, which makes them ideal for attackers. I’m thinking here now more about cameras or the newer IIoT devices. As a result, the CIO or CISO who previously was just considering protecting desktops and servers now has an attack surface that’s three times larger than it was just a few years ago. That translates into business risk, and I’m going to give a couple of examples. We’ve seen ransomware shutting down factories, NotPetya being a very notable example, but more recently LockerGoga shutting down Norsk Hydro. Even earlier this year, we saw ransomware shut down a major US port as announced by the DHS. So that leads to a business risk of downtime and lost revenue.
We’re going to go into a little more detail on the Triton attack on a safety controller in a petrochemical plant that was unique in that they were going after the safety controllers, the very things that are supposed to shut down the plant when unsafe conditions are reached, and there was ICS-specific malware used in that attack. We’ve seen the VPNFilter malware that was launched to attack VPN routers – ideal targets, again, because they’re internet-facing on one side and connected to corporate networks on the other side – and that malware specifically could be used for man-in-the-middle attacks, packet sniffing, and compromising endpoints on the corporate network as well. Finally, in August of 2019, Microsoft announced a campaign that exploited weaknesses in voice over IP (VoIP) phones to install backdoors into the systems that could then be used to access the corporate network.
The Verizon data breach investigations report came out just a few days ago, and one of the interesting things was the sector-specific stuff on manufacturing, and intellectual property is yet another risk that shows up with these devices. According to the report, more than one out of four breaches in manufacturing are motivated by cyber espionage as compared to financial motivations, which are the bulk of the other ones. 38% of the breaches in manufacturing come from nation-states. Here we’re talking about theft of formulas in the case of pharmaceutical or chemical firms; innovative designs, if you’re an automotive parts manufacturer or designing anything else; and proprietary manufacturing processes.
What Gartner is saying is that most organizations aren’t even aware of the cyber-physical systems they have, so asset discovery becomes very important, and increasingly we’re going to see laws that hold C-suite leaders personally liable if they aren’t focusing on safety and security in their enterprise, much as we saw with Sarbanes-Oxley a few years ago. Combining responsibility and accountability for IT, OT, and CPS into a single function is really the only way to go, not only from a governance point of view, but also because many of these attacks may start on IT and move to OT, or start on OT and move to IT. So, you really need to have a unified view of what’s going on. And that, typically, from what we’ve seen in our client base, is the CISO’s organization.
Just a quick update on how things have changed in the last few months – of course, many employees and third-party contractors, such as the folks maintaining your OT devices, are working remotely. That’s led to a 5x increase in the traffic to corporate networks over Remote Desktop Protocol (RDP), and also over other remote access methods. What’s interesting about that is that RDP is the number one attack vector for ransomware, because often those RDP servers have weak or easily stolen credentials, and also because there are a number of vulnerabilities, like DejaBlue, that attackers can use to go over RDP. These are just another way for adversaries to get in, and they’re hoping to blend into the sea of legitimate traffic that’s going in over RDP during these times.
Now I’d like to hand it over to Joe, who’s going to do the deep dive. Joe, it’s all yours.
Thanks, Phil. For the technical portion of the presentation today, we’ll go through a short background on industrial control systems (ICS). We’ll look at the MITRE ATT&CK framework specifically for ICS, and then we’ll look at how you can improve your security posture. We’ll provide a little demo, we’ll talk about NotPetya and Triton specifically, and then we’ll provide a summary and some recommendations.
As part of the background, networks have been around for a long time. I’m showing you an enterprise network, and in it are many different components. A lot of times, you’re going to have a core switch, distribution switches, edge switches, and you have people with printers and so forth. This is really part of the IT infrastructure, and when we look at operational technology (OT), they use a thing called the Purdue model. The Purdue model takes that whole IT network and moves it up into what’s called level 4 and level 5. From there, there’s a DMZ that’s built between the enterprise network and the operational technology network. So that’s kind of a demark. When you hear people in the OT world say something is on level 3 or level 3.5, they’re talking about the industrial DMZ between the OT network and the enterprise network. From there, when you build OT networks, you have a supervisory control portion, and that’s called level 2. Where the process PLCs and RTUs live, that’s in level 1. Then you have the field control units, and that’s in level 0.
Let’s take a little bit more of a drill into the OT portion of this, and you may hear terms about operational technology, industrial internet of things, industrial control systems, or SCADA. Those are all of those shaded sections in the OT portion. One big thing we’ve learned is that alignment between IT and OT is really critical for success. If you’re going to do any kind of security, the IT security team and the OT team need to work together across their different groups, and need to provide different insights, visibility, and information sharing in order to be successful. Understanding the political boundaries within the organization is really key to success, and we’d highly recommend doing an organizational alignment workshop to make sure that we understand what the goals are for any type of OT security project.
With that, let’s take a look at a brief introduction to some of the ICS components. We have programmable logic controllers (PLCs), and these things are responsible for the connected sensors and to open valves, relays, things like that. The other term is a remote terminal unit (RTU), and this is used mainly in wider geography, whereas a PLC would be for a local plant, for example. Some of the vendors are ABB, GE Grid, Honeywell, Schneider, Siemens – all excellent vendors for helping with our industrial control processes. From there, in order to run the system and make sure that everything is working correctly, you have a human machine interface (HMI). If we look at what happened with Stuxnet, this is where the information was spoofed to say that everything looks green for this particular tank, while the centrifuges were actually spinning out of control. The human machine interface is sometimes very targeted to be compromised with the industrial control malware. Next, the engineering workstations (EWS). This is where you program so that this ladder logic is pushed down into the PLC, so it knows when to operate the unit successfully and listen to all of the input devices and things like that. Lastly, you have the historian, which is used to pull all the data from the OT network and provide insights into how the systems and processes are run and how it can be improved. There’s a wealth of information inside the historian.
This is a brief overview of the components, and now let’s map those components into the Purdue model. We talked about the field control level. This is where the actuators and sensors are located, and these are the PLCs and RTUs that are responsible for opening and closing those particular components. Then in the supervisory layer, you may have your HMI and engineering workstations and historians. Some people will put a historian in a DMZ and so forth.
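As a rough illustration of the placement just described, the component-to-level mapping can be written down as a small lookup table. This is only a sketch: the asset names and the `assets_at_level` helper are made up for this example, PLCs and RTUs are placed at level 1 as stated earlier in the talk, and real sites vary (for instance, a historian may sit in the DMZ).

```python
# Illustrative only: placement follows the talk; real sites differ.
PURDUE_LEVEL = {
    "sensor": 0,                   # field control
    "actuator": 0,
    "plc": 1,                      # process control
    "rtu": 1,
    "hmi": 2,                      # supervisory control
    "engineering_workstation": 2,
    "historian": 2,                # some sites place this in the DMZ instead
}


def assets_at_level(level: int) -> list[str]:
    """Return the asset types this sketch places at a given Purdue level."""
    return sorted(name for name, lvl in PURDUE_LEVEL.items() if lvl == level)


if __name__ == "__main__":
    for level in (2, 1, 0):
        print(level, assets_at_level(level))
```

A table like this is mostly useful as a sanity check when tagging an asset inventory, so that alerts can later be read in terms of how deep in the model the affected device sits.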
When you talk about MITRE, they have two different types of frameworks. One is the enterprise matrix, which we will not cover today; that covers the IT network. What we’re going to focus in on is ATT&CK for industrial control systems. This framework sits specifically in the Purdue model, looking at levels 2, 1, 0, and a little bit of level 3.
So, what is the MITRE ATT&CK framework for ICS? It’s a common language that allows people to understand where the threats are happening and what assets are involved in the threats. They give a higher-level tactic – and everybody’s probably familiar with TTPs, the tactics, techniques, and procedures that describe how attackers go about their business to try to infiltrate the networks. This gives us a common language, and you have an implied timeline of how close the adversary is to their objectives.
So that’s the 11 tactics that we’ll cover today. The next piece of this is the techniques that are associated with them. MITRE has done an excellent job and has a technique matrix, where they take these 11 tactics and break them down into the columns, and now all of those 81 techniques drill down under there, so you can have the common framework. We look at software used by ICS threats, so in the MITRE ATT&CK framework for ICS, they’ve identified some software, like Conficker, LockerGoga, Triton, NotPetya, and so forth, and also the adversary groups that are associated with these.
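To make the tactic-and-technique matrix concrete, here is a minimal sketch of it as a data structure. The 11 tactic names are the ones covered in this session; the `MATRIX` excerpt lists only a handful of techniques that are named elsewhere in the talk, not the full set of 81, and the `tactics_for` helper is hypothetical.

```python
# The 11 ATT&CK for ICS tactics discussed in this session.
TACTICS = [
    "Initial Access", "Execution", "Persistence", "Evasion", "Discovery",
    "Lateral Movement", "Collection", "Command and Control",
    "Inhibit Response Function", "Impair Process Control", "Impact",
]

# Small excerpt only: each tactic (column) lists some of its techniques.
MATRIX = {
    "Initial Access": ["External Remote Services", "Internet Accessible Device"],
    "Lateral Movement": [
        "External Remote Services",
        "Exploitation of Remote Services",
        "Remote File Copy",
        "Valid Accounts",
    ],
    "Command and Control": ["Commonly Used Port"],
}


def tactics_for(technique: str) -> list[str]:
    """Return every tactic whose column contains the given technique."""
    return [t for t in TACTICS if technique in MATRIX.get(t, [])]


if __name__ == "__main__":
    print(tactics_for("External Remote Services"))
```

Note that a technique can appear under more than one tactic, which is why the lookup returns a list rather than a single value.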
Let’s take another look at a timeline, and then we’ll get into more details. If we look at a timeline of growth from the threats to critical infrastructure, you can see Stuxnet was the first industrial control malware that was in 2009. Black Energy hit the Ukrainian grid, and that was specifically targeted at SCADA for the backup drives, batteries, and access to the controllers. From there, we went to Industroyer, which took down the Ukrainian electrical grid and shut down the Siemens relay. In 2017, we’re talking about EternalBlue, which was stolen from NSA, and that was part of the infrastructure that was used for WannaCry and NotPetya. DHS in 2017 confirmed Russian threat actors were inside of the US critical infrastructure, because we saw screenshots of the HMIs. In 2018, the world learned of Triton and the attack on the petrochemical facility, and then VPNFilter was in 2018, as well as LockerGoga. So these give you just a rough timeline for all of the different attacks.
What I’ll do now is go into a threat matrix. What you’re looking at here is CyberX advanced reporting dashboards. We can map many different components, whether they’re internal to the data that CyberX collects, or even pull in external information.
One of the elements that the MITRE ATT&CK framework maps out is the software by the threat actors. As you see over here, like Dragonfly, they did Backdoor and Havex. Lazarus Group did WannaCry. If we look at Sandworm, they did Industroyer and NotPetya. XENOTIME did Triton. The interesting thing is now we start to map part of these into the MITRE tactics. So what groups, for example, are involved in collection? You can see, at a very easy glance, those particular actors. Next, if you look at command and control or execution, this gives you a way to start looking at who may be attacking your environment so that you know what their behaviors are. And here is initial access. If you look at this, everybody is doing initial access into the environment. So be aware that you already may be compromised, and the goal is to identify that very early in the lifecycle. Once you get into the environment with initial access, then you’re starting to move laterally within the environment. This is a component of what we’re talking about with threat groups.
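The group-to-software pairs mentioned here can be sketched as a simple mapping with a reverse lookup, which is roughly the question a threat matrix answers ("who uses this?"). The pairs are the ones named in the talk; the `groups_using` helper is a hypothetical name for this example.

```python
# Threat groups and the software attributed to them, as named in the talk.
GROUP_SOFTWARE = {
    "Dragonfly": ["Backdoor", "Havex"],
    "Lazarus Group": ["WannaCry"],
    "Sandworm": ["Industroyer", "NotPetya"],
    "XENOTIME": ["Triton"],
}


def groups_using(software: str) -> list[str]:
    """Return the groups associated with a given piece of software."""
    return sorted(g for g, items in GROUP_SOFTWARE.items() if software in items)


if __name__ == "__main__":
    print(groups_using("NotPetya"))
    print(groups_using("Triton"))
```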
Now let’s look and map the 11 tactics over a timeline. I’ve grouped these into four components, and right now we’re talking about the initial access. When you look at the initial access, what you’re looking at is the adversary trying to get into your ICS environment. From here, once they get in, they’re going to try to discover information within your environment. So, by discovering this information, they’re trying to figure out what components specifically – is it ABB, GE, different controllers within your environment?
Then there’s collection and mapping it. So, with the first initial access, they get a spotlight on a small view of your network, and then what they try to do is correlate and categorize all of the components within your environment. From there, they’re going to try to do some lateral movement. Here we have lateral movement, evasion, and command and control. Lateral movement is trying to identify how your network is connected: if I’ve compromised one component, what’s my next hop into the environment in order for me to get to my ultimate objective, your industrial control system? When you look at evasion, the adversary is trying to avoid detection, so this is where you want to put as many landmines in your environment as you can, different controls so that alerts will be tripped as they start to move laterally and they’re caught once they hit one of those triggers.
From here, we move on to execution and persistence. With execution, the adversary is trying to run their malicious code within the environment. Persistence is when they’re trying to maintain their foothold so that they don’t get kicked out from some of the systems that they’ve already penetrated. It’s very hard. Typically, this is a long phase between these couple of cycles. Lastly, you can see the impact process, which is the adversary trying to manipulate or interrupt or destroy part of your ICS components, the data to surrounding environments. The impair process is when they’re trying to manipulate or disable part of your physical controls and the processes associated with that. Then in inhibit response, they’re trying to prevent you from seeing what’s actually going on or being able to access your safety systems, quality assurance, and things like that.
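The four groupings walked through above can be sketched as an ordered list, so that a set of observed tactics can be reduced to "how far along is the adversary". This is an assumption-laden illustration: the grouping follows the talk, but the numbering and the `phase_of` and `deepest_phase` helpers are invented for this example.

```python
# The four groups of tactics, in the order presented in the talk.
PHASES = [
    ["Initial Access", "Discovery", "Collection"],
    ["Lateral Movement", "Evasion", "Command and Control"],
    ["Execution", "Persistence"],
    ["Impact", "Impair Process Control", "Inhibit Response Function"],
]


def phase_of(tactic: str) -> int:
    """Return the 1-based phase number that contains the tactic."""
    for number, tactics in enumerate(PHASES, start=1):
        if tactic in tactics:
            return number
    raise ValueError(f"unknown tactic: {tactic}")


def deepest_phase(observed: list[str]) -> int:
    """Return the furthest phase reached, or 0 if nothing was observed."""
    return max((phase_of(t) for t in observed), default=0)


if __name__ == "__main__":
    print(deepest_phase(["Discovery", "Evasion"]))
```

A higher number means the activity is closer to the control process itself, which matches the point that the later groups call for the fastest response.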
So, these things have a far-ranging ripple effect once they get into this particular component of the tactics and the timelines. I know there are a lot of IT security folks on the call, and so if we take the ICS MITRE tactics and map them into a kill chain, that might be a little bit helpful, and they’re very similar. What you can see from an intrusion is first you do reconnaissance, and that maps out very well with initial access and discovery, and then you’re going to want to weaponize your information. This is pairing your remote access malware with the exploits that you intend to deliver, and when you deliver them, that’s when they’re actually introduced into the environment, and from there you’re going to look to exploit them. Once it’s weaponized and delivered, then you can start to run these exploits on the vulnerable applications or systems.
From there, you’re going to install more of the components, the backdoors that allow you to have persistent access, and that sets you up for your command and control functions. From there, you can have actions on your objectives. This is the actual goal that the attacker is trying to achieve. You can see the color scheme here. When you get down into the red area, you really need to act very quickly and hopefully you catch things early in the timeframe before they have a chance to move within your environment. Let’s take a look at NotPetya as an example. This caused wide-ranging malware infections and plant shutdowns that dramatically affected a lot of the industry. So the first technique used was external remote services, and the tactics that are involved in this are initial access and lateral movement.
From there, they had exploitation of remote services. Nowadays with everybody working from home, you really need to monitor your remote desktop connections, your SSH, TeamViewer, and things like that, because these are really used to do your lateral movement. If you look at our CyberX alerting system, this is where you’ll get different alerts. Here’s an alert specifically for NotPetya. The interesting thing here is if you go into the details of it, this involves SMB version 1, EternalBlue and EternalRomance. You can see it right here – Windows server – and these are the underlying tactics that are used for that particular event. From there comes the lateral movement, because this worm spreads very quickly within the environment. It used remote file copy as part of the technique, and so you can look for different alerts and events on file transfer.
Here’s a file transfer with an SMB protocol, which could be used in this type of attack. Lastly, you see the loss of productivity and revenue when plants are shut down. Especially in the pharmaceutical market that got hit by WannaCry and NotPetya, it was devastating for part of their bottom line. The impact is really dramatic. If we look at that particular attack – and now we’re going to map it into that Purdue model so we can see the assets that are affected – it initially affected the IT networks. In some cases, you’re going to come in with the initial access and lateral movement, and this is your technique, T822, for external remote services. So it’s really key to monitor your remote services, and have jump servers so that they are consolidated on a single point and you don’t have access to everybody on your VPN. From there, the lateral movement happened where they did exploitation of remote services. So, this went from the enterprise network, crossed some firewalls into your DMZ, and through it into the OT network. These are the systems, from the engineering workstations to historians and HMIs, where this was remote file copied into these systems to prevent them from doing their job. From a timeline perspective, this thing happened dramatically fast. We were talking with one client that’s in the manufacturing world, and they said that within four minutes of getting compromised, their plant was shut down. Then it was starting to work on the remote connections, because they had remote facilities in remote plants, and it was starting to affect their wider network. They reported slowdowns on their wide area network as well, within about 12 minutes.
The timeline happened very quickly, but also notice here that we only used a couple of the tactics from the MITRE framework – the initial access, the lateral movement that was used in order to compromise the environment, and then persistence and things like that were because all the systems were infected. You don’t have to go through all 11 tactics in order to be compromised, so bear that in mind. This slide goes over some of the details as I mentioned before. Please look at your remote access connections, remote desktops, anything like that with your remote services. This shows you specifically the lateral movement in the assets that were affected with this, for the HMIs, data historian, and engineering workstations. Here’s the procedural example of all of the other malware that uses this type of exploit.
Next we’ll look at the ICS matrix, and this goes over all of the techniques that are associated with it. From here, let’s go into how we’re going to map all of these things and put them into a common framework. I highly suggest you look at the MITRE framework, and it needs to be integrated and unified with your SIEM platform. We’re going to look at a couple of different examples of that. We’ll start off with Splunk, and then we’ll talk about QRadar, Sentinel, and so forth.
If we look at Splunk, what you see here is CyberX sensors sending the information and the additional information which maps to the MITRE ATT&CK tactics and techniques. Over here you see the tactics that are associated. Here’s the technique and here’s a port scan, and we add on additional information. In our case, we can give you what protocol it is. Sometimes it’s generic, sometimes it’s like Modbus or Siemens or so forth. Then the type of alert – in our case, we have five separate analytic engines that help you understand if this was an anomaly, a protocol violation, a policy violation, or specifically malware associated with that. You can see a lot of good details, and we’re able to categorize this in your Splunk systems and then pull off different information to make it more visual. You can really get your insights from impairment of process, if you have a rogue master device within DNP3, for example, or commonly used ports, which was a command and control technique that was used. These are things we really want to see and avoid, especially if you have device restarts and shutdowns. So, this is how we can integrate with Splunk.
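The enrichment idea described here, attaching a tactic and technique to each alert before it reaches the SIEM, can be sketched in a few lines. Everything about the shape of the event is an assumption: the field names, the alert title, and the `enrich` and `to_siem_line` helpers are hypothetical and are not the CyberX or Splunk formats. The one mapping shown (commonly used port as a command and control technique) comes from the talk.

```python
import json

# Hypothetical alert title mapped to a (tactic, technique) pair from the talk.
MITRE_MAP = {
    "Unusual port activity": ("Command and Control", "Commonly Used Port"),
}


def enrich(alert: dict) -> dict:
    """Return a copy of the alert with MITRE tactic and technique fields added."""
    tactic, technique = MITRE_MAP.get(alert["title"], ("Unmapped", "Unmapped"))
    return {**alert, "mitre_tactic": tactic, "mitre_technique": technique}


def to_siem_line(alert: dict) -> str:
    """Serialize the enriched alert as one JSON line for a SIEM to ingest."""
    return json.dumps(enrich(alert), sort_keys=True)


if __name__ == "__main__":
    sample = {"title": "Unusual port activity", "protocol": "Modbus",
              "engine": "anomaly"}
    print(to_siem_line(sample))
```

Keeping the tactic and technique as separate fields is what makes the later filtering and charting by tactic straightforward on the SIEM side.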
Now let me go look at QRadar – I’m going to pull up a quick video for it. As I play this video, this is showing you the 11 attack techniques or tactics within the MITRE framework right from the MITRE webpage. You have a lot of information there, so I’d highly encourage you to take a look at that after the session. From there, we look at the matrix that is involved. Initial access and lateral movement are probably the most common, but here I’ll show you how we’re going to actually configure the CyberX system to forward this information to QRadar. It’s a very simple process where we configure the system. Now what I’m going to do is I’m going to play a couple of packet captures (PCAPs) to simulate an attack on the network. You see I have zero alerts, and as I play these PCAPs, it’ll happen very quickly. And you can see all the PCAPs being played, and they’re finished at this point, and now alerts start to jump up – so you can see five alerts and there’ll be a few more coming in over time. Let me click that button and get access into the system. So here, those PCAPs identified that there was an outstation configuration change, parameters were sent, there was an illegal DNP3 operation, and I have a master slave authentication issue.
So now let me jump over to the QRadar system and run a quick query so that we can filter on those alerts and also see what we’ve added onto it from the MITRE ATT&CK framework. Now we see the sensor’s name. So, this is from a Power Plant1 sensor. The tactic and the techniques are all paired right there – inhibit response function, because there was a modified control logic. And this is the CyberX engine. That’s a protocol violation. We also have device restarts and shutdowns, and when this happens, it means your outstation was restarted, as shown in the specific CyberX alert title. So, the next one that we’ll look at is a rogue master, for example, or commonly used ports. All of this information can be analyzed and integrated in with your SIEM. So, this is one layer in order to unify part of your integration between IT and OT.
The next piece of it that I’d like to cover is we can also look at this information within the CyberX advanced reporting dashboards. We have a MITRE ATT&CK dashboard where you can see all of the different categories. What I’ve done is played a whole bunch of PCAPs, as you see here. The number below the categories is showing you how many events of that type there are. So, for initial access, I have 16 events. If I’m interested in what techniques are associated with that, then I click a button and I can see I have 16 alerts with internet accessible devices. As we move through the food chain and I look at what discovery techniques were used, we have a lot of them – service scanning, connection enumeration, where they’re trying to identify what services are available on a particular host or network. Sniffing is another technique, as well as control device identification. This tells me they’re trying to figure out what type of systems – is it ABB, Schneider Triconex, things like that, right? Then they’re also going to collect information. So, when you start to collect information, what is the monitor process state? Is it in run mode or program mode? We have automated collections that we can use as well, and all of these techniques are grouped under this layer.
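The per-category counts and the click-through to techniques can be approximated with two small aggregations. This is a sketch with made-up sample alerts; the technique names are the ones mentioned in the talk, while `count_by_tactic` and `techniques_for` are hypothetical helpers and not part of any dashboard product.

```python
from collections import Counter

# Made-up (tactic, technique) pairs standing in for alerts from replayed PCAPs.
ALERTS = [
    ("Initial Access", "Internet Accessible Device"),
    ("Initial Access", "Internet Accessible Device"),
    ("Discovery", "Service Scanning"),
    ("Discovery", "Connection Enumeration"),
    ("Discovery", "Sniffing"),
    ("Discovery", "Control Device Identification"),
    ("Collection", "Monitor Process State"),
]


def count_by_tactic(alerts) -> Counter:
    """Count alerts per tactic: the number shown under each category."""
    return Counter(tactic for tactic, _ in alerts)


def techniques_for(alerts, tactic: str) -> Counter:
    """Drill down: count the techniques seen under one tactic."""
    return Counter(tech for t, tech in alerts if t == tactic)


if __name__ == "__main__":
    print(count_by_tactic(ALERTS).most_common())
    print(techniques_for(ALERTS, "Initial Access"))
```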
Notice in our dashboard, the further down you go, the further they are in your environment. So now we’re looking at lateral movement, where somebody is trying to use valid accounts or living off the land to try to access other systems. In the CyberX world, we’ll trip this if they have valid accounts but are coming from a different type of system. So, if they’ve compromised one system, an engineering workstation, and now they’re trying to penetrate the HMI, and the engineering workstation never talks to the HMI, this would trip an alert. You may have other devices, like printers, and other things in your environment. Next you look at evasion, and with evasion you can have the operating mode. So, if you change the operating mode of a PLC, right? They want to change it so they could reprogram the device, or you’re looking at rogue masters and so forth.
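The lateral movement check described here, a valid account used over a path that was never seen before, boils down to comparing each login against a learned baseline of device pairs. The following is a minimal sketch under that assumption; the host names, the record fields, and the `unexpected_logins` helper are invented for illustration and do not describe how any product implements this.

```python
def unexpected_logins(baseline: set, logins: list) -> list:
    """Return logins whose (source, destination) pair is not in the baseline."""
    return [l for l in logins if (l["src"], l["dst"]) not in baseline]


if __name__ == "__main__":
    # Pairs observed during a hypothetical learning period.
    baseline = {("ews-01", "plc-01"), ("hmi-01", "plc-01")}

    logins = [
        {"src": "ews-01", "dst": "plc-01", "account": "engineer"},  # normal
        {"src": "ews-01", "dst": "hmi-01", "account": "engineer"},  # new path
    ]

    for login in unexpected_logins(baseline, logins):
        print("ALERT: valid account on an unusual path:", login)
```

The account being valid is exactly why the credential check alone is not enough here; the signal is the path, not the password.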
This gives you a whole breadth of information on how the dashboard can help you out, but a lot of times you want to look at a little bit more detail. Let’s go to another screen that we have, and this shows all of the alert techniques by the tactics that they’re associated with. You can see here that discovery accounts for by far the most alerts that are being tripped, because the attacker is in your environment. They don’t know what’s there. They’re trying to discover all that information. So, if we click on discovery, you can filter on that information and then you can scroll down below and you can see abnormal uses of MAC addresses. Maybe they’re doing some IP spoofing over there. If I click on initial access, my dashboard automatically filters for all of the initial access, and you can see I have unauthorized internet connectivity detected. From there, maybe I look at the lateral movement. So here what you can see is exploitation of remote services with PsExec. These are PowerShell commands that you should really be concerned about if you see these alerts within your network. And then there’s unauthorized SMB logins and so forth. If you look at persistence as well, this is another way where somebody has valid accounts and they’re logging in multiple times. So, these are things that you should be concerned about. From that side of it, the next thing that we can do is a little bit of threat hunting.
What do we mean by threat hunting? Let’s look at command and control, and let me drill down into the tactic or technique that’s associated with that. For this technique, T869, what are the threat actors that use this particular technique? This maps out the perpetrator groups that are using this. You can do the same threat hunting for lateral movement, with valid accounts and things like that, so you can really drill down into the environment, or for persistence, with firmware and things like that. If you want to get more detail, you can double click and look at SMB logins. So how many occurrences did I have? When did it happen? What are the specific IP addresses that are associated with it? And so forth. This information is really easy to use, and the goal is to give you the information at the click of a button so that you know not only what the alerts are, but who is trying to use them and where they are in the attack cycle of the environment.
So, we covered QRadar, and I didn’t cover Sentinel yet, so that’s my cue. Let me go over to Azure, and again, we can map all of our events into Azure. Here we add that extra information. I pulled off the tactics, we created a pie chart – very easy to do. Again, notice that discovery makes up the majority of the different alerts that were generated in this simulated attack. From here, you can see a small amount of command and control or impairing the process control network. When you get into inhibit response functions, what they’re trying to do is mask your ability to control the safety of your system. So, this is one screen within Azure. I love this, because you can also look at the different tactics and what elements of the techniques they use. I can highlight remote system discovery, inhibit response functions, persistence. Did they do any firmware changes? Look at program downloads or system firmware. Initial access was from external remote components. Or if we look at this – this is an engineering workstation compromise, and an engineering workstation compromise is what happened with Triton. We’ll talk about that in a minute as well. So, integration with all of your systems is really important. Again, we’ve got full granularity of detail on what you want to see, independent of the SIEM system that you select. So now we’ve covered Splunk, QRadar, Sentinel, and I gave you an overview of the CyberX advanced reporting dashboards. Again, the depth that you go indicates how critical it is and how close they are to getting into your industrial control systems to really impact your environment.
From here, let’s take a look at Triton. In the original breach, there’s a number of different components to it. Some people theorize they stole some OT credentials, they compromised the corporate IT network, they got into a workstation, and from there they went through a firewall to compromise the engineering workstation where they wanted to deploy PC malware. Then once they got access to the engineering workstation, they were able to program the actual PLC. They did this with the TriStation Protocol, so in this particular case, they installed a RAT and their goal was to disable the safety system. Let’s take a little bit more of a deep dive into that level within the attack. Our Section 52 team did an awesome job at reverse engineering Triton a while ago, and there’s a web link that you guys can see: https://cyberx-labs.com/en/blog/triton-post-mortem-analysis-latest-ot-attack-framework/. I’d highly encourage the technical people on the call to go look at this. Essentially what they found was that the attackers added some information. So, when you install a remote access Trojan (RAT), you’re doing it because you want to stay involved and be able to control the components in the system. In this particular case, if there was no special identifier, then the system would run as normal, so it would execute the original code. However, if you look at the packet headers, what they did was they added a special opcode. Then if I see a special identifier as opposed to no identifier, I get to control it even further. I can read from memory, write to memory if I want to change something, or execute code. So, these are the different function codes that were associated with that special identifier. From here, they were able to disable the safety system. It’s interesting because when they were going through their own testing, they shut down the plant a couple of times before they actually got this working.
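The dispatch logic described above can be modeled from a defender’s point of view with a toy classifier. This is only a sketch under stated assumptions: the two-byte frame layout, the `SPECIAL_IDENTIFIER` value, and the function-code numbers are hypothetical and are not the real Triton or TriStation constants. The Section 52 analysis linked above has the actual details.

```python
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "run original code"
    READ_MEMORY = "read memory"
    WRITE_MEMORY = "write memory"
    EXECUTE = "execute"


# Hypothetical values chosen only for illustration.
SPECIAL_IDENTIFIER = 0xAA
FUNCTION_CODES = {
    0x01: Action.READ_MEMORY,
    0x02: Action.WRITE_MEMORY,
    0x03: Action.EXECUTE,
}


def classify(frame: bytes) -> Action:
    """Classify a frame the way the talk describes the implant's branching.

    Assumed toy layout: byte 0 is the identifier, byte 1 the function code.
    A frame without the special identifier falls through to normal handling.
    """
    if len(frame) < 2 or frame[0] != SPECIAL_IDENTIFIER:
        return Action.PASS_THROUGH
    # Unknown function codes are also treated as normal traffic here.
    return FUNCTION_CODES.get(frame[1], Action.PASS_THROUGH)


print(classify(bytes([0x10, 0x01])).value)  # no special identifier
print(classify(bytes([0xAA, 0x02])).value)  # identifier plus write code
```

The point of the model is the branch itself: traffic without the identifier behaves exactly as before, which is why the implant is hard to notice from normal operation alone.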
But the goal is to disable the safety system and then launch a second attack. Now, if we look at this, this was a tremendously complex and long cycle, and as we map it to the MITRE framework, we can see all four different areas of what happened. So, if we look at the initial access, the discovery, the collection, these are the different tactics. There’s detecting the program state: can I reprogram the PLC or not? Is it in operating mode or run mode? Give me the control device identification. I need to compromise the engineering workstation right over here. That’s the key in order to do anything bad to the controllers.
Then you get remote system discovery. So, all of these components were used in the initial access, discovery, or collection aspects. Then, for lateral movement, you see exploitation for evasion. You have indicator removal on the host, on the engineering workstation itself, commonly used ports, and changes to the operating mode. In order to stay in the environment for a long time, they masqueraded their files, so they made them look like normal operating system files on the actual engineering workstation, and so forth. So, if you go into execution and persistence, they had to reload the firmware. They did this through scripting. There was a Python script that was used in the environment in order to push down the code. They did execution through API and program downloads, and they changed the program state.
The program state can be in run mode, where you can’t make changes. They changed that so they could reprogram. Lastly, you have the impact: system firmware, unauthorized commands, and so forth. What I’d like to do is just look at these specifically. So if we look at remote system discovery and engineering workstation compromise, these are in the initial phases. And then come masquerading and commonly used ports, change program state and program download, and modify control logic and system firmware.
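The grouping walked through above can be captured as a small lookup table. This is a sketch of the speaker’s own grouping as spoken in the talk, not the official ATT&CK for ICS matrix: the phase labels and the `phases_for` helper are made up for the example, and technique names follow the talk’s wording.

```python
# Techniques grouped the way the talk walks through the Triton attack.
# Phase labels are informal; this is not the official ATT&CK for ICS matrix.
TRITON_PHASES = {
    "Initial access, discovery, collection": [
        "Detect Program State",
        "Control Device Identification",
        "Engineering Workstation Compromise",
        "Remote System Discovery",
    ],
    "Lateral movement and evasion": [
        "Exploitation for Evasion",
        "Indicator Removal on Host",
        "Commonly Used Port",
        "Change Operating Mode",
        "Masquerading",
    ],
    "Execution and persistence": [
        "System Firmware",
        "Scripting",
        "Execution through API",
        "Program Download",
        "Change Program State",
    ],
    "Impact": [
        "System Firmware",
        "Unauthorized Commands",
    ],
}


def phases_for(technique):
    """Return every phase in which the talk mentions the given technique."""
    return [
        phase
        for phase, techniques in TRITON_PHASES.items()
        if technique in techniques
    ]


print(phases_for("Masquerading"))
print(phases_for("System Firmware"))
```

A technique can appear in more than one phase, as system firmware does here, which is why the helper returns a list rather than a single phase.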
So, let me jump back over here. For the attack that we simulated, this shows you those specific events within Sentinel. Here’s your engineering workstation compromise. You can see your modify control logic, program downloads, remote system discovery, masquerading information, unauthorized commands, reprogramming with system firmware, the change in the operating mode, what the program state was, and commonly used ports, so there’s a lot of good information there.
If you’re interested in more details, we’ve got a couple of great blogs on our website. I highly encourage you to go look at them for the technical details. One covers our Section 52 team and how they reverse engineered Triton, and then Carissa did an awesome job of mapping the attack to the MITRE framework in this blog, where you can see the timelines and things that are associated with that. There’s also a lot of good information out there from other sources. There was some great information from S4, and this Dark Reading link provided awesome details as well.
So, if we look at Triton and take a quick step back, there were actually a couple of incidents. The first incident was a couple of months earlier, in June, when the plant shut down for a week after the safety controller tripped. Everybody who tests code knows that eventually you hit some bugs, and this was one of those bugs, but the automation vendor concluded that it was a mechanical failure. It’s really hard to understand the intricacies of what’s going on. The second incident affected six safety controllers, not just two. It cost another week of shutdown and hundreds of millions of dollars in cleanup and downtime, not to mention the safety and environmental hazard. The incident response uncovered multiple red flags: misconfigured firewalls that let the attackers move from the IT network to the DMZ to the OT network, and antivirus alerts that identified Mimikatz, which is a tool for stealing credentials. And when you steal credentials, you’re living off the land. These are authorized credentials that allow you to move laterally, so keep that in mind. There were ongoing alerts about the RUN/PROGRAM key being in an unsafe position, and those were ignored. The attacker also uploaded a malicious backdoor into the safety controller, and we showed you how they did that. Then there were suspicious RDP sessions to the plant’s engineering workstation from IT.
We live in the world of COVID. Everything is done remotely, so really be concerned about that part of this. The true lesson learned from this is that there was a lack of clear roles around who’s responsible for the security controls, how they are being implemented, and whether they are effective. This is a tough challenge. This is an organizational and political challenge within every customer. Is it IT, is it OT, is it the automation vendor? So, this is something that I’d highly encourage you to start to think about and debate internally.
When we talk about being proactive, what I mean is – MITRE is a post-event component, and all of the alerts and events have to be triggered in order for you to see that. This is an example of an attack vector within the CyberX system, and what we do is we collect all the information, all the assets that you see here, and we organize it and build for you an attack chain to tell you if somebody were to compromise your most valuable crown jewels, how they would go about doing it. In this particular case, you can see that an attacker came in from the internet. This is a simulation, so this is being proactive and saying somebody could come in from the internet to compromise your technician. From there, I know that there’s a known CVE that’s associated with my OSI PI system. That’s my historian, so I can compromise the historian. From there, there’s another CVE that can get me to my HMI. And then from there, I can finally get to my PLC. So, this is one way that you can use CyberX to simulate what attacks and what people could do just by the data that we’ve already gathered. You can be a little bit more proactive. This is what we’re trying to get to so that you can identify those issues before the attackers, and you can identify the specific sources or the specific targets. You can modify things. This gives you all of the different vectors that you’re concerned about and allows you to prioritize. Maybe I can break this one, because I can fix and remediate this CVE. Then this path would not be a valid path to get down into that PLC. This allows you to prioritize your mitigation.
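The attack-chain idea described above can be sketched as a path search over a graph of reachable assets. This is a minimal illustration and not how the CyberX product computes attack vectors: the topology, the asset names, and the placeholder labels `CVE-A` and `CVE-B` are hypothetical.

```python
from collections import deque

# Hypothetical topology: (a, b) means an attacker on a can reach b.
# The label says what enables the hop; the CVE names are placeholders.
EDGES = {
    ("internet", "technician_laptop"): "remote access",
    ("technician_laptop", "historian"): "CVE-A (hypothetical)",
    ("historian", "hmi"): "CVE-B (hypothetical)",
    ("hmi", "plc"): "native protocol access",
}


def find_path(edges, source, target):
    """Breadth-first search; returns the shortest hop list or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def remediate(edges, label):
    """Return a copy of the edges without hops enabled by the given label."""
    return {hop: why for hop, why in edges.items() if why != label}


print(find_path(EDGES, "internet", "plc"))
patched = remediate(EDGES, "CVE-A (hypothetical)")
print(find_path(patched, "internet", "plc"))
```

Removing a single hop is enough to break the only path in this toy topology, which is the prioritization argument the speaker makes: fix the vulnerability that cuts the chain to the crown jewels first.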
So, to summarize some of these: never plug your laptop or USB drive into an OT network. Never share credentials with a third-party vendor. Never create unauthorized connections between IT and OT. Never install unauthorized wireless access points or dual-home a PLC for convenience. Sometimes maybe you have the engineering workstation dual-homed, with one connection to the OT network and one out to the internet, because maybe you have to get a patch or something. So, take these to heart and try to create a company-wide culture of OT security.
Then early detection is critical. You have to assume that people are already in your environment. They’ve already got that initial access. Start laying your landmines for lateral movement or command and control so you can get them before they get further down the cycle. RDP, SSH, Telnet: these are really critical. With CyberX, these groups are automatically created for you, so when you highlight a group, you can easily see remote desktop or SSH or TeamViewer, for example. So just click a button and it automatically filters down to the devices that are using them. From there, if you want to, you can also create specific rules for different components. Here is another rule that’s used for RDP access, and then as people start to move laterally, CyberX has a great visualization dashboard where you click a link and you can see specifically, right here, how many devices this one can communicate with. You can see here that this is way too many to be normal in an OT environment. So, the other piece is to prioritize your ICS alerts. If you see ICS alerts that belong to later stages in your kill chain, really focus in on those and make those a priority. And what do I mean by that? In the event timeline, you can see things like PLC reads. If this is coming from an unauthorized device, take a look at it. We give you PCAPs and other information so you can get down into all the granular detail that you need for an investigation. If somebody does an unauthorized PLC upload or PLC mode change, these are critical, as you saw with Triton in all the different events on the timeline. Phil, what I’d like to do is turn the session back over to you.
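The "too many peers to be normal" check described above can be approximated by counting how many distinct hosts each device reaches over remote-access protocols. This is a hedged sketch, not CyberX’s detection logic: the flow tuples, the protocol names, and the `max_peers` threshold of 3 are assumptions picked for the example.

```python
from collections import defaultdict

REMOTE_ACCESS = frozenset({"rdp", "ssh", "telnet"})

# Hypothetical flow records: (source, destination, protocol).
FLOWS = [
    ("10.0.0.5", "10.0.1.10", "rdp"),
    ("10.0.0.5", "10.0.1.11", "rdp"),
    ("10.0.0.5", "10.0.1.12", "ssh"),
    ("10.0.0.5", "10.0.1.13", "rdp"),
    ("10.0.0.5", "10.0.1.14", "telnet"),
    ("10.0.0.9", "10.0.1.10", "ssh"),
    ("10.0.0.7", "10.0.1.20", "modbus"),  # not a remote-access protocol
]


def peer_counts(flows, protocols):
    """Count distinct destinations per source for the given protocols."""
    peers = defaultdict(set)
    for src, dst, proto in flows:
        if proto in protocols:
            peers[src].add(dst)
    return {src: len(dsts) for src, dsts in peers.items()}


def flag_fan_out(flows, protocols=REMOTE_ACCESS, max_peers=3):
    """Return sources talking to more than max_peers distinct hosts."""
    counts = peer_counts(flows, protocols)
    return sorted(src for src, n in counts.items() if n > max_peers)


print(peer_counts(FLOWS, REMOTE_ACCESS))
print(flag_fan_out(FLOWS))
```

A fixed threshold is the simplest possible choice. In practice the baseline would depend on the device’s role, since a jump host legitimately reaches more peers than an HMI.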
Thank you, Joe. That was awesome. We covered a lot of content in a short amount of time. I just want to wrap things up with a brief overview of how CyberX can help in terms of addressing the following challenges that we often hear from our clients.
Number one, what devices do I have? How are they communicating? And a big reason for asking that question is that it helps you very easily implement better segmentation and zero trust policies in your environment that you would otherwise have to build manually by looking at logs, and you can’t use scanners in these environments. So how do you figure out which devices you need to isolate and how to build microsegmentation? The asset discovery and profiling that we provide is the best way to do that.
Secondly, what devices do I have that are missing patches, and what are the most critical patches that I need to apply? You saw that in the attack vector simulation that Joe was showing before, you can’t fix everything. So how do you prioritize addressing the risks and the missing patches in your environment? That’s risk and vulnerability management. Next, continuously monitoring the network. We have patented behavioral analytics, so we’re not looking for static signatures or IOCs. We’re looking at the behavior, as Joe showed in the vast majority of those tactics that he showed from the MITRE ATT&CK framework. And that’ll help you answer the question, do we have any threats in our network right now? Most organizations are typically not able to address this without a product like ours.
There’s a side benefit of the continuous monitoring, which is it helps your OT team quickly identify misconfigured or malfunctioning equipment and do a root cause analysis on it. That’s shown up in a number of situations with our clients where they had the plant network going up and down intermittently. They thought it was a security issue. The OT engineers took a look and they realized it was a misconfiguration of a device that was scanning the network when it shouldn’t have been. So that’s a side benefit. Finally, I think what Joe has shown very clearly is it’s all about unifying IT and OT security monitoring and governance, because you already have a SOC, you already have a number of people who’ve been trained, and you already have tools, workflows, and runbooks. There’s some adaptation that’s required for OT. For example, if you detect a new PLC logic download, your typical SOC person may not understand what that is. So, you need to do a little training and explain to them who they should call to find out if it was a legitimate download or a malicious download.
So, that’s the key way of demonstrating that you have a unified approach to security and governance across both environments. Now for the key elements of the architecture, since we had a number of questions about this in the chat. It’s a passive-monitoring, agentless approach that connects to a SPAN port or to a TAP. There’s no need to configure rules or signatures. The vast majority of our ICS environments are on-premises, but since we also monitor IoT devices that you might find on the IT side of the network, we also have a cloud-based version available. It uses what we call triple-layer threat protection: comparing devices to a device profile database, using the behavioral analytics we have to identify unusual or unauthorized behavior, and then continuously feeding the system from our in-house intelligence team, called Section 52.
Finally, we’ve shown you that a key aspect is the out-of-the-box integration with your existing IT security stack. Joe didn’t show it, but we also integrate with SOAR systems, ticketing systems like ServiceNow, firewalls, and NACs, so you can implement automated response and protection, not just detection. We believe that automation and integration are important, now more than ever. With fewer people available to be on site, you need to be able to identify and remediate threats much more quickly, and to reduce the time and effort required to deploy to new sites as your organization becomes more aware of the risk. We can do that through our automated deployment. We’re also glad to say that we recently announced a partnership with Azure IoT Security to integrate our information with the Azure IoT Security platform.
I’d say our number one differentiator is the fast and easy deployment – the fact that you can out-of-the-box plug it into your SPAN port or TAPs and within minutes, you’re seeing information about assets, you’re seeing information about the network topology, and you’re seeing information about any risks you might have in the environment. As I mentioned before, we have a unified platform that addresses both IoT and OT security. I know people throw around these terms a lot. IoT is typically not considered part of OT, although you might ask the question, what about my security cameras? What about my building management systems? Those all have PLCs. So, it’s a mixed world, and what we believe is that people want to manage all of these with a single unified framework.
I want to direct you to our resources page on the website, where you’ll find a very detailed guide to MITRE ATT&CK for ICS, as we talked about today. We’ve also been holding a series of roundtables with CISOs and senior security folks, with companies including Baker Hughes, Mundipharma, ONE Gas, First Quality, Vector (which is a leading electric energy company in New Zealand), and Ports of Auckland (which is a major transportation port). We’ve recorded all of these webinars, and they’re much more about the organizational challenges of bringing IT and OT together and building a unified approach, but we do go through the technology as well. Those are all on our website in recorded form, with transcripts if you don’t feel like listening to the whole thing. You’ll also find other information there, such as how to accelerate network segmentation, a full eBook on NIST recommendations, etc.
I also want to direct you to some upcoming webinars. On June 11th, we’re doing a webinar with InfraGard featuring the Metropolitan Water District of Southern California. They are the largest water utility in the country, serving over 19 million people, and that environment includes not just the water treatment plants, of which they have four of the top 10 in the US, but also a number of hydroelectric facilities that they operate as well. On June 12th, we’re doing a SANS webinar with Jacobs Engineering, very well known in the OT world. We’re going to talk about OT security for pharmaceutical, water, and smart buildings. Then on June 19th, we have some senior security folks from Novartis that are going to talk about practical approaches to zero trust for IoT and OT.
I want to thank you for your time, and I’m going to take a quick look now at the Q&As. There’s a question about how is it deployed? So, the system is deployed as virtual or physical appliances, which are connected directly to a SPAN port or a TAP in your ICS network. Then you have typically many of these sensors distributed across different facilities, all managed through a central management console. The console is used to manage the sensors themselves, for example, by deploying updates or configuration changes or authorizing different roles on the sensors. But the central manager is also used to provide a unified view of risk across all your facilities, and you can view this risk and these alerts coming from the sensors in various ways. You can monitor them locally. Of course, you can monitor them at a business unit level or at a geographical level, and you can monitor them centrally across all of your facilities, whether they’re global or across the country.
So, an important part of the solution is that it’s highly scalable. Some of our customers include three of the top 10 US energy utilities, three of the top 10 global pharma companies. In addition to those, we’re installed in chemical companies, oil and gas, and healthcare.
One last point to add is that we preconfigure the appliances. We’ve been shipping and installing our sensors remotely for years now, and that matters especially now that COVID has hit and we can’t go on site. So easy deployment is definitely one of our strengths.
Thanks for bringing that up, Joe. Actually, as a result of COVID, we’ve had a number of clients get in touch with us because they want to accelerate the timeline for deploying to the plants that they had not yet deployed to. And as Joe said, we can ship an appliance and don’t actually need to send anybody into your building. The appliance is preconfigured, we hand it to someone in your building, they bring it in, they plug it into the SPAN port, and it all starts working.
I want to thank everyone for their time today. If you’re in the United States, I want to wish you a great long weekend. Feel free to get in touch with any of us for further information. We’ve done our best to pack a lot of information into a small amount of time, but I’m sure there’s a lot more we can tell you and we can arrange presentations like this individually for your teams. Thanks everyone and have a great day. Back to you, Carol.
Thank you so much Phil and Joe for your great presentation and to CyberX for sponsoring this webcast which helps bring this content to the SANS community. To our audience, we greatly appreciate you listening in. For a schedule of all upcoming and archived SANS webcasts, including this one, please visit sans.org/webcasts. Until next time, take care and we hope to have you back again for the next SANS webcast.