WannaCry, NotPetya, and TRITON demonstrate that ICS and IIoT networks continue to be soft targets for cyberattacks, increasing the risk of costly downtime, safety failures, environmental incidents, and theft of sensitive intellectual property.

NIST and the NCCoE recently published a NIST Interagency Report (NISTIR) demonstrating how off-the-shelf, ICS-aware behavioral anomaly detection (BAD) effectively reduces both cyber risk and the risk of equipment malfunctions for manufacturing organizations, without impacting OT networks.

The report was the product of a close collaboration between NCCoE, CyberX, and other technology providers such as OSIsoft.

In this joint webinar with NIST and CyberX, you’ll learn about:

  • Mapping the security characteristics of BAD to the NIST CSF
  • Using NIST’s reference architecture for your own ICS & IIoT environment
  • How CyberX detected 15 examples of high-risk anomalies in NIST’s testbed environment, including unauthorized devices; unauthorized remote access; plain-text credentials; network scans using ICS protocols; and unauthorized PLC logic downloads

We’ll also discuss how CyberX’s agentless platform helps you:

  • Auto-discover your ICS & IIoT assets, protocols, and network topology
  • Identify critical OT vulnerabilities and risks
  • Prioritize risk mitigation for your most valuable processes (crown jewels)
  • Enable rapid ICS threat detection, response, threat hunting, and prevention
  • Implement converged IT/OT security in your corporate SOC via certified apps for IBM QRadar, Splunk, ServiceNow, Palo Alto Networks, and other integrations with your security stack

Download the NIST Presentation

Download the CyberX Presentation

 

Webinar Transcript

Phil Neray:

Thank you very much, Michael. Kudos to the NIST NCCoE team for putting together this project. I think raising visibility for the need for stronger security controls is an important mission, and you guys are doing an awesome job. Thank you.

My name is Phil Neray. I am the Vice President of Industrial Cybersecurity at CyberX. What I’m going to do today is talk a bit about our company, our technology, and how specifically our platform was used to identify anomalies in the NIST testbed environment. I’m going to give you some examples of that. But first, I’m going to start with a little context just about who we are.

We were founded in 2013. We’re based in Boston. We’re the only firm in our space, the ICS security space, that has a patent for its threat analytics. Our customers tell us that we have a simple and mature solution that interoperates with what they already have in their environments, both on the OT side, with respect to the diversity of automation vendors, and on the IT side with their security operations center.

The part about the patent is important because that’s really what we’re talking about today. We’re talking about behavioral anomaly detection, which requires a new and more sophisticated form of analytics than we’ve used in the past with, for example, signature-based solutions or solutions that require configuration of rules. We’re going to talk about that in a few minutes.

The part about industry partnerships is very important because we believe, and other experts in the field like Gartner agree, that building a unified approach to IT and OT security monitoring and governance is important for several reasons. Number one, scarce resources: being able to use them across both IT and OT. Number two, most of the attacks we’ve seen cross from IT to OT, and sometimes the other way around. Having a unified approach helps you more easily spot and investigate those attacks.

Finally, the companies that are implementing this type of technology, the large manufacturers and companies in other industries like oil and gas, pharmaceuticals, and energy, are looking for a single point of accountability for all of their digital risk. That includes both IT and OT cyber risk.

Here are some examples of the partners that we have been working with. We have the first and only native certified applications for Splunk, the first and only certified native applications for ServiceNow, for the Palo Alto Networks application framework, and we had the first certified native app for IBM QRadar. Again, you can see that our commitment to helping you unify IT and OT security monitoring in the SOC has been going on for a while. We’ve been dedicating resources to it. It’s not just a marketing campaign.

In terms of the challenges we address, there are three core challenges combined with the unified monitoring that I was just talking about. Number one, what devices do I have? How are they connected? How are they communicating with each other?

Most of you know that, over the years, not all organizations have been doing a great job of tracking what devices they have, what manufacturers, what protocols they’re using, what vulnerabilities they have. One of the first things that the security operations center needs to know is what devices are out there? How are they connected so that we can do a better job of investigating when we get an alert?

The risk and vulnerability management part is important because everyone knows there are vulnerabilities in this environment. Some of the devices are 20 years old, some of them are difficult to patch. Many are still running Windows XP. How do you prioritize the most important vulnerabilities to protect your crown jewel assets, because you can’t fix everything at the same time?

Then the continuous monitoring part is relevant to the behavioral anomaly detection covered in the NIST report. How do we continuously monitor all network activity so we can quickly identify a breach when it first occurs and quickly respond to it and mitigate its effect before it can get destructive or disruptive?

In addition to providing our platform for asset management and risk management, we also were the first company to create an in-house ICS threat intelligence team. We’ve reported nearly a dozen zero-day vulnerabilities across all of the industrial automation vendors you see there at the bottom.

Our team is very familiar with these devices across multiple different vendors, across both process control and discrete manufacturing, and across different geographies worldwide. This intelligence is delivered to our platform through software updates, but it also provides very deep information that we use to identify what might look like anomalous activity in our platform. The threat intelligence enriches the built-in analytics that we were talking about a few minutes ago.

In terms of deployment, the system is non-invasive and agentless, so very easy to deploy. It works by monitoring SPAN port traffic. It’s passive monitoring. We use deep packet inspection and network traffic analysis to identify the assets and the threats. Once you plug it in, that’s all you need to do. You don’t need to sit down and write rules or download signatures. It’s based on a behavioral analytics approach that does not require doing any of that. It’s based on the embedded knowledge that’s in the system.

In terms of the platform architecture, we talked about the key use cases up at the top, and there’s a REST API and, as I mentioned, a number of native apps for the most commonly used tools you would have in your security stack. The key, though, is this middle section here, where we have five distinct analytics engines to detect anomalies, not just simply relying on baseline deviations, although part of our patent is the way we look for baseline deviations in a manner that minimizes false positives and false negatives.

But there are several other mechanisms that’ll show up in the examples we’re going to see in a few minutes, one of them being protocol violations. If an attacker is misusing a protocol in a way it was not designed to be used by the vendor that built the industrial protocol, that could indicate an attempt to compromise a device, or the protocol itself, using a vulnerability. There might be function codes or values that were left undefined that the attacker is now using.

We’re going to see that in an example in a few minutes. That would be an example of an analytics engine that detects anomalies without relying solely on deviations from the baseline.
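To make the baseline idea concrete, here is a minimal sketch of how a learned-baseline detector could work. It is purely illustrative and is not CyberX’s patented analytics; the event fields, learning window, and whitelist approach are assumptions made for the example.

```python
from collections import namedtuple

# Illustrative only: learn "normal" traffic tuples, then flag anything unseen.
Event = namedtuple("Event", "src dst protocol function_code")

class BaselineDetector:
    def __init__(self):
        self.baseline = set()   # learned set of normal traffic patterns
        self.learning = True

    def observe(self, event):
        key = (event.src, event.dst, event.protocol, event.function_code)
        if self.learning:
            self.baseline.add(key)          # build the whitelist during the learning window
            return None
        if key not in self.baseline:
            return f"ANOMALY: previously unseen traffic pattern {key}"
        return None

detector = BaselineDetector()
detector.observe(Event("10.0.0.5", "10.0.0.20", "modbus", 3))          # learning-phase traffic
detector.learning = False                                              # switch to detection mode
print(detector.observe(Event("10.0.0.5", "10.0.0.20", "modbus", 49)))  # flagged as anomalous
```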

We also have an engine that looks for operational issues. This is more relevant to the folks in the plant than to the folks in the security operations center. This would be indications of unusual activity that could indicate malfunctioning equipment or misconfigured equipment, and we’ve found in our customer base that the data we collect makes it very easy for the folks in the plant to troubleshoot those issues and identify the root cause.

At the lowest level, of course, is that deep embedded knowledge of all the devices and protocols. Our threat intelligence, which I mentioned before, and the unique-in-the-industry capability called the malware analysis sandbox, which allows you as a customer to upload suspicious files that you think might be ICS malware, have it analyzed in our ICS-specific environment, and then the system returns with IOCs.

This is to address the fact that there are lots of sandboxes out there, but they were designed for IT, not OT. If a piece of software is communicating on a given port that’s unique to OT or using a DLL that’s unique to OT, the IT sandboxes won’t really know what to do with that. That’s why we’ve built our own malware analysis sandbox.

Of course, central management console for getting a unified view of risk across all of your facilities worldwide and all of your business units with role-based access control to give different groups different access to different parts of the organization. As I said before, the analytics engines are the key part that we’re going to talk about here relative to the NIST study.

Why is behavioral anomaly detection important nowadays? This is a report that came out recently from CrowdStrike. What they’re reporting is that the number of attacks that do not rely on malware is growing. It’s up to 40%. In fact, if you look at certain geographies like North America, almost 50% of the attacks do not rely on malware. It’s also a pretty high number for Europe and the Indo-Pacific region.

That means that attackers are getting smarter. They don’t need to worry about signature-based solutions in the environment because they’re not using malware that can be identified through signatures or common IOCs. You can think of it as zero-day malware, but you can also think of it as people using stolen credentials, fileless attacks like using PowerShell, or, as we’ve heard from Cisco and again just in the last couple of days, compromises of routers that don’t use malware that would easily be detected by signature-based solutions.

It’s important to look at behavioral-based anomalies because that’s what you need to identify malware that’s never been seen before or targeted attacks. There was a great quote in this piece from CrowdStrike saying the important question isn’t how quickly you can prevent the initial compromise. It may be impossible to prevent a determined attacker from getting into your environment. What’s really required is to be able to very quickly detect that an attacker is in your environment, and then investigate and remediate or contain the threat. That’s how we help our customers in the ICS space specifically.

The other reason that we need to look at behavioral anomaly detection is that these environments, as you probably know, are insecure by design. There are many vulnerabilities. This is a report you can download from the URL at the bottom left that looks at network traffic data that we collected from over 850 production ICS networks across all of the industrial sectors and across multiple continents worldwide.

You can see some of these things may not be surprising to you, the fact that nearly half of the sites we analyzed are still running unsupported versions of Windows like XP, making them very susceptible to attacks. Nearly two-thirds are still using plain text passwords on their network, which means when an attacker gets in, they can very quickly sniff the traffic and monitor for plain text passwords that they can then use to get into other systems.

There’s also this idea of the air gap, which some, though fewer now than a few years ago, believe is the only protection they need. The air gap has long been understood to be largely a myth. What we found is that nearly half of the sites we analyzed had internet connections, either because they didn’t know about them or because there were multiple connections between IT and OT subnets that were not being monitored or segmented, certainly very flat networks. We found a lot of those across both IT and OT.

Then in some cases, it might just be that a control engineer needs access to the internet when they’re in the OT network to be able to look up an industrial automation vendor’s documentation, and maybe they put a dual-homed Ethernet card in their laptop to be able to do that. These are other reasons why it’s easier and easier for attackers to break into these environments.

I’m going to talk about TRITON. This is an attack that many folks have talked about, and you may be thinking, “Why are you talking about it again?” I’m going to talk about it for two reasons. One, there was some new information released in January at the S4 Conference that showed some interesting details about this attack that we didn’t know about before. Number two, many of the tactics that the attackers used showed up in the list of anomaly scenarios that NIST put together. NIST put this list together before we knew anything about TRITON, so it’s interesting.

For those of you who are not familiar with TRITON, it was an attack launched on a petrochemical facility. What was unusual about it is that the attackers went after the safety controllers in the plant. They compromised those safety controllers. The idea was that they were going to disable the safety controllers and then cause something else to happen that would blow up the plant, causing loss of life and potentially environmental damage, because the safety controllers would no longer be able to shut down the plant, which is their job if, for example, the temperature or the pressure goes above certain thresholds. That’s what was unusual and caught the attention of many companies, not just in the petrochemical industry but in all industries.

I’m just going to go through and show you a quick description of the kill chain. We were guessing about some of the elements of this kill chain when we put it together shortly after the attack occurred. But it turns out we were right about some of this.

One of the things we’re not sure about exactly is how they got into the network. We’re going to guess that they stole OT credentials. This is probably the most common approach. It was the one mentioned in the FBI-DHS report last March about Russian threat actors being in our critical infrastructure.

But by using those stolen credentials, they were able to go through the firewall between OT and IT and through the DMZ and install malware on one of the Windows-based devices in the OT environment. We don’t know which one. We’re going to guess it was the engineering workstation. It could have been the HMI. This was Python-based code that had been compiled to run on Windows.

Here’s where the interesting part comes in. This code was purpose-built to communicate directly with the safety PLC using its native ICS protocol, which is the TriStation protocol. These attackers knew enough about this environment to know those were the devices in it. They were Schneider Electric Triconex safety controllers. They were probably doing reconnaissance in that environment for months before … The way this attack was discovered was that they made a mistake in the code, and the safety controller shut down the plant for a week. We’re going to see some information about that in a second.

The idea was to take over these controllers. The attackers actually installed a remote access Trojan into the firmware memory region of the PLC. They knew enough about the memory layout of that device to be able to insert their backdoor without disturbing its normal operation. Very sophisticated attack. The idea was that they would then disable the safety controller and launch a second cyberattack causing loss of life, certainly asset damage, and potentially environmental damage.

A behavioral anomaly detection system like ours would have detected strange things going on multiple times during this kill chain. Number one, the remote access connection from IT to OT. Number two, during the reconnaissance phase, the attackers were likely doing address scans and port scans. Number three, they uploaded new ladder logic code into the PLC as a precursor to putting their backdoor into the firmware. That would also be detected.

Then, finally, the fact that they were using the Triconex protocol in a way it wasn’t intended to be used would also have been flagged as a protocol violation. That’s one of the analytics engines I mentioned before. I’m going to show you exactly what they did. It was very sophisticated.

We reverse-engineered this malware early last year. You can see a full report on our blog at the URL below. We also posted some Snort signatures for the attack. But, of course, Snort signatures would only help you if the attackers used the exact same code. I think a behavioral approach is much better.

But if you look at what they did to get into the PLC, they used a standard command that’s part of the Triconex protocol called GetMPStatus. It’s not used that often. They inserted a special identifier into this command, such that if the identifier was absent, the command proceeded into the PLC as it normally would, and the original code that was intended to be executed in the PLC got executed.

However, if that field in the command had an ff in it, the code that they put into the PLC would recognize it. That was the way for them to communicate with their backdoor. If the op code associated with that special identifier was a 17, it meant read the memory in the backdoor. If it was a 41, it was a write memory. If it was an f9, it was execute code.

As you can see, this is very sophisticated. They built their own modification of the native Triconex protocol so that they could communicate with their backdoor. All of these things, as we’re going to see shortly, would have triggered alerts from a behavioral anomaly detection system.
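As an illustration only, here is a rough sketch of how a protocol-violation check for this kind of misuse might look. The byte layout is hypothetical; only the ff marker and the 17/41/f9 op codes come from the description above, and this is not how any particular product implements the check.

```python
# Illustrative detector: a GetMPStatus message whose normally unused field
# carries the attacker's 0xff marker is treated as a protocol violation.
BACKDOOR_OPCODES = {0x17: "read memory", 0x41: "write memory", 0xf9: "execute code"}

def check_get_mp_status(payload):
    # Byte positions are hypothetical assumptions for this sketch.
    identifier, opcode = payload[0], payload[1]
    if identifier == 0xff:
        action = BACKDOOR_OPCODES.get(opcode, "unknown")
        return f"PROTOCOL VIOLATION: GetMPStatus field misused for backdoor '{action}' (op code 0x{opcode:02x})"
    return None

print(check_get_mp_status(bytes([0xff, 0x41, 0x00])))  # flags a write to the backdoor
```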

Some new information came out in January at the S4 ’19 Conference that was interesting. Number one, there were actually two incidents, not one. There was another incident a couple of months before where the plant was shut down when the safety controller was tripped. The plant folks called in their automation vendor, which concluded that it was a mechanical failure. They missed a really key opportunity here to identify a cyberattack.

This is another reason why security teams from the SOC need to have visibility into what’s going on in the plant because they can provide additional insights when these incidents occur. It may be a mechanical issue, but it may not. You want to catch that early on.

The other fact that came out of this presentation at S4 is that the second incident actually affected six safety controllers, not just two. This is when they shut down the plant the second time. You can think of hundreds of millions of dollars of downtime and cleanup just from these accidental shutdowns, not to mention the danger from toxic gases.

Then, finally, in the incident response, there were multiple red flags that were ignored. The firewalls were misconfigured. That allowed the attackers to very easily move from IT to the DMZ to the OT network. There were AV alerts on the workstations that were being ignored, alerts that would have told security teams that credential-stealing malware harvesting cached credentials was possibly being used.

There is something called the RUN/PROGRAM key on the safety controllers. You’re supposed to leave it in the RUN position most of the time and put it into the PROGRAM position only when you’re updating ladder logic code, which is what the attackers did: they uploaded ladder logic code.

Anecdotal evidence suggests that most organizations just leave it in the PROGRAM position all the time, because it’s a pain, and may actually be dangerous, to go out to the plant floor, or in this case out into a petrochemical plant, to switch the position of the key.

Then, finally, suspicious RDP sessions from the IT network to the plant’s engineering workstations. Again, an example of remote access from the IT network to the OT network that could indicate something suspicious and unauthorized going on. I’ll point you also to the three articles at the bottom there: Dark Reading, CyberScoop, and E&E News. Excellent articles reporting on this talk that was at S4 ’19 with more details.

The true lesson, though, was not so much technical as organizational: there were clearly missing definitions about whose role it was to ensure that controls had been properly implemented and that they were actually effective. Was it IT? Was it OT? Was it the integrator that built the plant? Was it the automation vendor that supplied the equipment? There was clearly a miss here from an organizational point of view; getting that right might have stopped this attack early on.

Now let me move on to the anomaly scenarios in the NIST report. There are 15 of them. I’m not going to go through every single one. I just wanted to show you the full list, and I’m going to pick five or six as examples, starting with really easy ones like unauthorized devices connecting to the network and the use of plain-text credentials, then moving to the reconnaissance phase with the scans, a remote access session (in this case through SSH), data exfiltration, a logic download, and the protocol violation example I was talking about before.

Then all the ones below that we’re going to leave to you; you can go read them in the report. Denial-of-service attacks and brute-force password attacks are also in the report.

Alerts show up in our system in two ways: one is in this event timeline, and the other is in an alert dashboard. You can bounce back and forth between them. The idea with the event timeline is that it gives you a good way to go back, investigate the incident, and look at what else was going on at the time.

These aren’t all alerts. Some of them might be a notice. A programming update to a PLC might be a legitimate update, but you want to know about it and you want to have a workflow for knowing who to call in the plant, which automation engineer to call to find out if this was legitimate or not, or if it happens on a regular basis at a certain time.

You can see the variety of alerts that are identified by our system. It’s really hundreds of different scenarios, behavioral scenarios that our system identifies, including some of the ones we’ve been talking about today like remote access connections and things like that.

The first one was unauthorized devices connected to the network. This is a classic anomaly that all of our customers want to know about. Did a contractor plug their laptop into the network? Did an employee plug their laptop into the network? Did somebody at the plant level buy a new device and put it on the network without telling anybody? You want to know about that right away, and you want to have a workflow for responding to it.
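A minimal, illustrative sketch of this check, assuming a simple asset inventory of known MAC addresses, might look like this (it is not an actual product implementation):

```python
# Illustrative asset inventory of known device MAC addresses.
KNOWN_DEVICES = {"00:1d:9c:aa:01:02", "00:80:f4:0b:33:44"}

def check_device(mac):
    # Alert on any device not present in the inventory.
    if mac.lower() not in KNOWN_DEVICES:
        return f"ALERT: unauthorized device {mac} connected to the network"
    return None

print(check_device("3c:97:0e:12:34:56"))  # a new laptop MAC triggers an alert
```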

For all of these, at the bottom, I’ll be showing some of the text that I pulled right out of the NIST report explaining either the severity or the importance of the anomaly or how it was simulated in the testbed environment.

The second one is the use of unencrypted credentials. In this case, it was using HTTP instead of HTTPS to get access to a directory. It was immediately identified as something that we should know about. Polaris, in the testbed environment, was the engineering workstation that was part of the standard simulated environment.

The next one is an address scan. Both address scans and port scans are important, as NIST describes at the bottom. This is a way for attackers to locate vulnerable services and vulnerable devices in the reconnaissance phase. The reconnaissance phase can last for months; in the case of the Ukraine grid attacks, we know the attackers were in there for a long time as well. Here you can see that the system immediately detected that address scan, and there would be a similar alert for a port scan.
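For illustration, a simple scan-detection heuristic can count how many distinct destinations one source touches within a short window; the thresholds and data structures below are assumptions for the sketch, not the platform’s actual logic. A port scan check would count distinct ports the same way.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
THRESHOLD = 20                 # distinct destinations within the window before alerting

recent = defaultdict(dict)     # source IP -> {destination IP: last time seen}

def observe(src, dst, now=None):
    now = now if now is not None else time.time()
    recent[src][dst] = now
    # keep only destinations seen inside the sliding window
    recent[src] = {d: t for d, t in recent[src].items() if now - t <= WINDOW_SECONDS}
    if len(recent[src]) >= THRESHOLD:
        return f"ALERT: possible address scan from {src} ({len(recent[src])} hosts in {WINDOW_SECONDS}s)"
    return None

for i in range(1, 25):
    alert = observe("10.0.0.99", f"10.0.1.{i}")
print(alert)                   # fires once the source has touched 20 distinct hosts
```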

The next one is a remote access session. In the TRITON case, they used RDP. In this case, it was SSH. This was an example of someone using an SSH connection from the engineering workstation to a server.

Next one is data exfiltration to the internet. The attackers need to have some way of communicating with their software that’s installed in your environment. They’re going to find some way to do it. In the case of Industroyer, which was the Ukraine grid attack, they used Tor as a way to do it.

This scenario was testing the ability to detect someone trying to exfiltrate data using the DNS protocol. It’s a good way to do it using DNS because no one really monitors DNS for unusual activity. This would be a way to exfiltrate data, for example, about what devices are in the environment, what firmware levels they have, and things like that.
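As a rough illustration of why DNS exfiltration is detectable, a simple heuristic can flag queries with long or high-entropy subdomain labels, which are typical of encoded payloads; the thresholds here are illustrative assumptions, not the detection logic used in the testbed.

```python
import math
from collections import Counter

def entropy(s):
    # Shannon entropy in bits per character.
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def check_dns_query(qname):
    labels = qname.rstrip(".").split(".")
    subdomain = ".".join(labels[:-2]) if len(labels) > 2 else ""
    # Long or high-entropy subdomains often indicate encoded exfiltration payloads.
    if len(subdomain) > 50 or (subdomain and entropy(subdomain) > 4.0):
        return f"ALERT: possible DNS exfiltration in query {qname!r}"
    return None

# A long encoded label gets flagged; an ordinary lookup does not.
print(check_dns_query("mzxw6ytboi2dkmrqgq3dmnzzha4tqojxgiydambrgiztinjwg44d.example.com"))
print(check_dns_query("www.example.com"))
```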

Then this one, again, if you think about our TRITON example: they downloaded, or uploaded, however you want to say it, new ladder logic code into that safety controller. Then they used that code to insert their backdoor into the firmware memory region. In this case, you can see we were using the EtherNet/IP protocol. At the bottom there, it explains how it was simulated using the Allen-Bradley software.

There’s a nice note there that physical access was required in order to change the operation mode from RUN to REMOTE RUN. That’s the key I was talking about before. As I said earlier, often that key is just simply left in REMOTE RUN.

Then, finally, the protocol violation. Similar to what the TRITON attackers did in usurping the Triconex protocol, here we’re using a comparable example with Modbus, where function code 49 was used, which is not allowed. You can also see at the bottom there’s a notification that a PCAP file exists. For many of these alerts, we’ll capture the full PCAP file so that you can do a deeper dive if required.
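To make the Modbus example concrete, here is a minimal sketch that flags requests using a function code outside an expected set; the allowed list is illustrative and deliberately incomplete, not the full Modbus specification.

```python
# Function codes commonly seen in ICS environments (read/write coils and
# registers, diagnostics, etc.); illustrative only.
ALLOWED_MODBUS_FUNCTIONS = {1, 2, 3, 4, 5, 6, 15, 16, 23, 43}

def check_modbus_request(pdu):
    function_code = pdu[0]          # the first byte of a Modbus PDU is the function code
    if function_code not in ALLOWED_MODBUS_FUNCTIONS:
        return f"PROTOCOL VIOLATION: unexpected Modbus function code {function_code}"
    return None

print(check_modbus_request(bytes([49, 0x00, 0x01])))  # function code 49 -> flagged
print(check_modbus_request(bytes([3, 0x00, 0x01])))   # read holding registers -> allowed
```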

Just to give you an idea of what the alert flow would be in a typical scenario. You get some alerts. Again, they can come from the alert dashboard. In the case of a firmware update, you can click on that alert and get some more information about which devices were involved. You can then click on those devices and see where they are in the map.

This is a full map arranged by the Purdue model of the devices that we discovered, auto-discovered, in your environment. This is part of the asset discovery phase. But in terms of the incident response phase, once you click on that alert, it’ll tell you, yeah, this was the device in question.

You can see how it’s connected to the other devices. Then you click on that device, and it actually shows the devices it regularly talks to, in this case the engineering workstation and the HMI, which makes sense.

You can then click further and get details on each of these devices: who makes the device, what operating system it’s running in the case of a Windows device, what protocols it’s using, and what IP addresses it has. In this case, you can see there are two MAC addresses for this device. Then if you want to find out about the PLC itself, it turns out it’s an Emerson device speaking DeltaV.

You can get even further information to find out the serial number, the version it’s running, and whether it’s an authorized device—in other words, is it a known device to the organization? Does it regularly scan? In this case, the controller would not, but if it were more of a SCADA-type device, it might scan as a standard part of what it does. Then, again, you can see down there some more information about the IP address and MAC address.

Very detailed information, again, to give visibility to the SOC team so they can better understand what’s going on. It’s another reason why IT security tools aren’t really suited for this environment. They don’t know what these protocols are. They can’t do network traffic analysis on these protocols to understand their payloads. They certainly don’t know anything about these types of devices, especially the embedded devices.

I’m going to wrap up with a description of part of the methodology that Idaho National Labs has defined. There’s actually a full SANS webinar on this that you can go find on our website. All of the SANS webinars we’ve done are on our website with a transcript. If you don’t feel like watching the recording, you can just scan the transcript and look at the slides.

This was a webinar we did with Andy Bochman at INL a few months ago. A great quote, “If you’re in critical infrastructure, you should plan on: you are going to be targeted.” You will be targeted, basically. If you’re targeted, you will be compromised. It’s that simple.

It’s very similar to what we heard at the beginning of the presentation from the folks at Cylance, which is: it’s not about how you avoid being compromised; it’s about how quickly you can identify that you have been.

I’m going to focus on one of the four steps here. The method starts with identifying your crown jewel processes. In other words, the ones whose compromise would cause a major issue for your firm, whether a major environmental issue, a safety issue, or a revenue issue because a major production line goes down.

Map the digital terrain, which is essentially building a topology map, similar to what I was showing you before, of all your devices and how they’re connected. Illuminate the likely attack paths; that’s what I’m going to show you. Then generate options for mitigation and protection.

In our system, when you go to the map, you can pick a device and you can say, “You know what? This is my important device.” You can either mark it as important or say, “Simulate the attack vectors to this device,” which is basically telling the system, knowing what it knows about the topology of the network and the vulnerabilities of the network, “Can you tell me all the different ways, ranked by risk, that an attacker could use to compromise this PLC that I’m showing you here?”

The system then goes through and says, “I found three different ways to get to your device, this PLC number 11. Here they are, ranked by risk.” It then draws a picture of exactly what that attack vector or kill chain looks like.

This is also a very helpful way of explaining to your OT folks or to your line of business folks who may not be so technical to show them the risk to this PLC. In this case, there was a subnet exposed to the internet. There was no segmentation between that subnet and the other subnets. Then there were a bunch of vulnerabilities.

You can then go through and you can simulate how you would mitigate this threat, either by patching those devices or by implementing better segmentation across those different levels. We call this automated threat modeling. It’s unique in the industry. Again, it’s a way to prioritize how you address the vulnerabilities in your OT environment.
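Conceptually, the attack-vector simulation can be thought of as path enumeration over a device graph weighted by vulnerability scores. The sketch below is a simplified illustration of that idea, with an invented topology and scores; it is not the product’s actual algorithm.

```python
TOPOLOGY = {                      # adjacency list: which devices can reach which (illustrative)
    "internet-exposed-host": ["eng-workstation"],
    "eng-workstation": ["hmi", "plc-11"],
    "hmi": ["plc-11"],
    "plc-11": [],
}
VULN_SCORE = {                    # per-device vulnerability scores (illustrative)
    "internet-exposed-host": 7.5, "eng-workstation": 9.8, "hmi": 6.1, "plc-11": 10.0,
}

def attack_paths(src, target, path=None):
    # Enumerate all simple paths from src to the crown-jewel target.
    path = (path or []) + [src]
    if src == target:
        yield path
        return
    for nxt in TOPOLOGY.get(src, []):
        if nxt not in path:                  # avoid cycles
            yield from attack_paths(nxt, target, path)

ranked = sorted(attack_paths("internet-exposed-host", "plc-11"),
                key=lambda p: sum(VULN_SCORE[d] for d in p), reverse=True)
for p in ranked:
    print(" -> ".join(p))                    # attack vectors ranked by cumulative risk
```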

One of the ways to mitigate these types of threats is by integrating with the firewalls or NAC systems in your environment. In this case, we’ve done this with Palo Alto Networks, but we’re also doing it with other vendors such as Cisco and Check Point.

The idea here is to pick a few key use cases, shown in the middle there, such that when they are detected by our platform, you can very quickly create firewall rules to block the source of the malicious traffic. Working with our customers, we picked five use cases, many of which we’ve talked about in this webinar, including PLC stop commands, which can obviously break production.

What this allows is that our platform will very quickly generate new firewall policies. The admin is still in the loop: the admin still needs to approve the policies and deploy them to the relevant firewalls in the environment. But it’s a very effective way of shortening the time between detecting a threat and preventing it.
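A simplified sketch of this alert-to-policy flow, with the administrator approval step kept in the loop, might look like the following; the alert types, rule format, and deployment step are illustrative stand-ins, and a real integration would go through the firewall vendor’s own API.

```python
# Illustrative set of high-confidence alert types that warrant a blocking rule.
BLOCKABLE_ALERTS = {"plc_stop_command", "unauthorized_plc_programming", "known_malware_c2"}

def propose_rule(alert):
    if alert["type"] not in BLOCKABLE_ALERTS:
        return None
    return {"action": "deny",
            "source": alert["src_ip"],
            "destination": alert["dst_ip"],
            "comment": f"Auto-proposed from alert {alert['id']}"}

def deploy_if_approved(rule, approved_by=None):
    if not approved_by:                          # the admin stays in the loop
        print("Rule pending approval:", rule)
        return False
    print(f"Deploying rule approved by {approved_by}:", rule)
    return True

rule = propose_rule({"id": 1234, "type": "plc_stop_command",
                     "src_ip": "10.2.1.17", "dst_ip": "10.3.0.11"})
deploy_if_approved(rule)                         # stays pending until someone approves it
```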

The other thing we’ve done in integrating with Palo Alto Networks is something we call granular network segmentation. The idea here is that by monitoring the traffic, we can identify all the devices, what protocols they are using, and which ones are authorized and unauthorized. You can use this as a way to create segmentation policies in your Palo Alto Networks environment.

The idea is that many organizations are now looking at how they can better segment their OT networks. But in order to segment them, you need to know which devices you have and how they communicate with each other, so you can make sure that the right devices are communicating with each other and the ones that shouldn’t be communicating aren’t.

That’s the second aspect of the integration we’ve done here, something that Palo Alto calls dynamic address groups, or DAGs, which is a way to create policies based on groups that can change dynamically. For example, all the Modbus devices should be able to talk to each other, but a Modbus device shouldn’t be able to talk to an Emerson device.
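Conceptually, the grouping works like the sketch below: devices discovered speaking the same protocol are collected into a group, and policies allow traffic within a group while denying it across groups. The device data and policy format are illustrative assumptions; mapping the resulting groups onto Palo Alto Networks dynamic address groups would be done through their own tooling.

```python
from collections import defaultdict

DISCOVERED = [                                    # illustrative output of passive asset discovery
    {"ip": "10.3.0.11", "protocols": ["modbus"]},
    {"ip": "10.3.0.12", "protocols": ["modbus"]},
    {"ip": "10.3.0.20", "protocols": ["deltav"]},
]

groups = defaultdict(list)                        # protocol -> member device IPs
for device in DISCOVERED:
    for proto in device["protocols"]:
        groups[proto].append(device["ip"])

# Allow traffic within a group; deny it between different groups.
policies = [{"from_group": g, "to_group": g, "action": "allow"} for g in groups]
policies += [{"from_group": a, "to_group": b, "action": "deny"}
             for a in groups for b in groups if a != b]

for policy in policies:
    print(policy)
```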

The third level of integration is something that Palo Alto has developed called their application framework, which was recently renamed Cortex. This is a way of analyzing data that’s already being collected by the Palo Alto appliances you already have. If you already have Palo Alto appliances in the right spots in your network, what this allows us to do is take the traffic from those devices and do much of the analysis I’ve talked about here today.

You can download this app from the Palo Alto Networks marketplace. Again, we’re the first and so far the only vendor to have done this with the app framework.

A quick note on our customers. They include two of the top five US energy utilities, names you would recognize, but also some of the largest companies in pharmaceuticals, chemicals, pipelines, and gas distribution, not just in the US but also across Europe and the Asia-Pacific region. I’m going to give you an example of one of them.

This is First Quality. They are a specialty paper manufacturer based in the US, with plants across the country. You can see here that the benefit for them is the visibility that we give the SOC into their OT environment, so they can apply unified security monitoring and governance.

Specifically, they’re now monitoring over 8,000 devices. They’re using our centralized management system to give them global visibility at the SOC level. They’ve integrated our platform with their existing SIEM, in this case QRadar; their orchestration system, in their case Siemplify; and their Palo Alto Networks (PAN) infrastructure.

In summary, what our clients tell us is that we have the most mature and most interoperable solution to help you at multiple levels. At the strategic level, it reduces risk. At the tactical level, it gives you visibility, lets you prioritize your mitigations, and enables you to detect and respond to threats quickly. At the operational level, it integrates very quickly with your existing environments, both at the OT level in the plant and in your SOC, in a way that’s non-intrusive and agentless.

We have some time remaining, so I’m going to look at the questions in a second. But for more information, I’ll direct you to our resources page, where you’ll find a complete knowledge base consisting of our threat and vulnerability research, presentations we’ve done at Black Hat, the Global ICS and IIoT Risk Report I referred to earlier, an eBook on how to present OT risk to the board, and an executive guide to NISD, the EU directive that is the first compliance directive aimed at critical infrastructure environments. There are also a bunch of events listed there where you can see us over the next few months.

You can also download two chapters from Hacking Exposed Industrial Control Systems, which is the Bible in the industry for understanding the differences between IT and OT security and a great way to get started if you’re new to the field. That’s it for now. Let me take a quick look at the questions.

Question: “Can CyberX behavioral anomaly detection be used to monitor and detect unusual activity in other aspects, not just in manufacturing plants, but also in building management systems?” The answer is yes. We are being used for that. It is part of our capability. The protocols are a little different in those environments, but we’ve reverse-engineered them as well. We understand those devices.

It’s particularly relevant to, for example, data centers. If you’re an organization that relies on your data center, like in financial services or if you’re a cloud provider. Also if you’re in the transportation industry, running airports, for example, they have building management systems, they have power management systems, and they have all the other control systems that we’ve talked about before for their automated baggage handling and various other things. So yes, definitely relevant to those other areas.

A question about NIST 800-53 controls: we do have some information about how our platform will help you address those. We also have some information about how the various aspects of our platform address multiple dimensions of the NIST CSF. The slide that the folks from NIST presented only looked at the detect portion, but we also handle the other portions, either directly or through integration with other products.

There’s the identify part, which is relevant to asset discovery. The protect part, which relates to vulnerability management, the threat modeling I talked about, and integration with firewalls. The detect part, which we talked about. The respond part, relative to incident response, forensics, and integration with your SIEM. Then the recover part, relevant to how you automate recovery activities and how you automate reporting to all of your stakeholders.

Question about the pricing for these systems. Typically, they’re priced per site. Usually our customers are using one appliance per site, so it’s priced per appliance.

The appliances are standard Dell servers, so there’s nothing special there. We’re not a hardware company; we’re a software company. But it’s a way to license the technology I’ve been talking about on a per-site basis, based on the number of devices you have, at certain breakpoints. Obviously, as is the case for many of our customers, if you’ve got tens or hundreds of facilities worldwide, there’s a sliding scale of pricing based on that.

Somebody asked, “Can you monitor multiple sites?” The central manager that I talked about at the beginning is the way we do that. Again, there’s role-based access control, so that you can get a unified view across all of your sites.

One nice thing about our central management is that you can also set up logical zones to monitor. You might have devices being monitored in one part of the plant by one appliance and in another part of the plant by another appliance, but you want to think of them as a single logical zone, for example, a given production line. The system has a mechanism for defining those logical zones.

This is relevant to the question one of the gentlemen asked about whether you can manage multiple sites from a single console. That’s what our central management system does.

Here’s a question: “Do you work with MSSPs?” Absolutely. Many organizations don’t have the manpower or resources to do this on their own. We work with different MSSPs, IBM Security being one, DXC Technology being another, Wipro being another. We really work with whoever you prefer as an MSSP.

We also offer our own ICS experts as a backup to your own team or to your MSSP’s team. If level-three incident response is required, we have folks who have handled incident response for their country, protecting its critical national infrastructure. These are people who dealt with nation-state threats day in and day out, and who can come on site and help with incident response, either alongside your own team or alongside the MSSP’s.

I think I’m going to wrap it up here. Thank you for your time today. Thank you, Mike and Jim and Tim from the NIST NCCoE, first for creating this project and second for helping us out with the webinar today. If you have any other questions, I encourage you to send them to [email protected]

Michael Powell:

Good afternoon. My name is Michael Powell. I’m the manufacturing sector lead here at the NIST NCCoE. Today I’m going to discuss NIST recommendations for securing manufacturing control systems using behavioral anomaly detection.

A quick overview of our agenda. I will give an overview of the NCCoE, then discuss cyber risks to manufacturing organizations, why stronger ICS cybersecurity is needed, the benefits of behavioral anomaly detection, and the NIST testbeds that were used for the build: the process control system and the robotics system. I will also touch on the cybersecurity framework mapping for this build.

The NCCoE assembles experts from businesses and industry, such as CyberX, which did a great job of helping us with this build, as well as academia and other government agencies, to work on critical national problems in cybersecurity. This collaboration is essential to exploring the widest range of concepts.

Our engagement and business model: First, define the problem. We define the scope of work with industry to solve a pressing cybersecurity need. Next, assemble. We assemble a team of industry, government agencies, and academic institutions to address all aspects of the cybersecurity challenge.

Next, we build. We come together in a lab to build a solution and document our findings in our NIST documents, which could be a practice guide or a NISTIR. The build is a practical, usable, repeatable implementation that addresses the cybersecurity challenge. Finally, we advocate: we advocate the adoption of the example implementation by going to lectures and speaking with fellow colleagues to encourage them to implement the guide.

We currently have two manufacturing projects. I will briefly discuss them later. I want to point out the email address for engaging with us on our projects and collaborating with us. I will briefly discuss, at a high level, this NISTIR, Securing Manufacturing Industrial Control Systems: Behavioral Anomaly Detection.

Right now, as I said, we are in the advocate stage. The goal is to provide example cybersecurity solutions that businesses can implement or use to strengthen the cybersecurity of their manufacturing processes using behavioral anomaly detection tools. This document is in its final stages and should be released in March of this year. You can download a draft right now from the website shown on the slide.

For this use case, we used two distinct but related manufacturing demo environments: the collaborative robotics system and a simulated process control system, which Phil from CyberX will discuss a little bit more when he talks about the use cases. The behavioral anomaly detection capability is achieved by three different detection methods: network-based, which was used by CyberX; agent-based; and historian-based.

Our goal was to make sure we are addressing cybersecurity risk to manufacturing organizations. Behavioral anomaly detection mechanisms support a multifaceted approach to detecting cybersecurity attacks against the ICS devices on which manufacturing processes depend, in order to permit the mitigation of those attacks. Introducing anomalous data into a manufacturing process can disrupt operations, whether deliberately or inadvertently. More sophisticated hacking tools and techniques are readily available for download from the internet, and the growing cyber-dependency of manufacturing makes these attacks harder to stop.

The NISTIR is intended to help organizations accomplish these goals by using anomaly detection tools. As you can see, there are many such goals, but I will only discuss two: detecting cyber incidents in time to permit effective response and recovery, and reducing opportunities for disruptive cyber incidents by providing real-time monitoring and anomaly detection alerts using these tools.

The first environment used, the process control system, emulates an industrial continuous manufacturing process, one that produces or processes materials continuously, with materials continuously moving, undergoing chemical reactions, or undergoing mechanical or thermal treatment.

Next, the collaborative robotics system, which consists of four distinct machining stations, two machine-tending robots, and a supervisory PLC communicating over Modbus TCP. It also has an HMI and several supporting servers. I like the little video here that shows them actually working in action.

We mapped the behavioral anomaly detection capabilities to the NIST Cybersecurity Framework. This mapping allows manufacturers to align cybersecurity activities with business requirements, risk tolerances, and resources. The profile provides a manufacturing-sector-specific approach to cybersecurity drawn from standards, guidelines, and industry best practices.

Finally, I wanted to talk about our new project description that we have coming up, which hopefully should be out next month for public comment. The name of the project is Protecting Information System Integrity in Manufacturing Environments.

This project explores methods one could deploy to help prevent and mitigate the threats identified, as they pertain to deploying cybersecurity capabilities in an ICS manufacturing environment. CyberX, hopefully, will be taking part in this build with us as well. That is all I have. Phil, would you like to take over?

Speaker Bios

Phil Neray

Phil is the VP of Industrial Cybersecurity for CyberX, whose notable customers include 2 of the top 5 US energy providers; a top 5 US chemical company; a top 5 global pharmaceutical company; and national electric and gas utilities across Europe and Asia-Pacific. Prior to CyberX, Phil held executive roles at IBM Security/Q1 Labs, Symantec, Veracode, and Guardium. Phil began his career as a Schlumberger engineer on oil rigs in South America and as an engineer with Hydro-Quebec. He has a BSEE from McGill University, is certified in cloud security (CCSK), and has a 1st Degree Black Belt in American Jiu Jitsu.

Michael Powell

Michael Powell is a Cybersecurity Engineer at the National Cyber-Security Center of Excellence (NCCoE) at the National Institute of Standards and Technology (NIST) in Rockville, Maryland. His research focuses on cybersecurity for the manufacturing sector, particularly how it impacts industrial control systems.

Mr. Powell joined the NCCoE in 2017. In his previous positions, he was responsible for the management and oversight of building and commissioning US Navy DDG-51 class ships. He also served in the United States Navy for over 20 years, retiring as a Chief Petty Officer. He holds a bachelor’s degree in Information Technology from University of Maryland University College, a master’s degree in Public Administration from Bowie State University, and a master’s degree in Information Technology from University of Maryland University College. Mr. Powell is currently in the final stages of completing his doctorate in Computer Science at Pace University in Westchester, New York.

Jim McCarthy

Jim McCarthy is a senior security engineer at the National Cybersecurity Center of Excellence (NCCoE) at the National Institute of Standards and Technology (NIST). He currently serves as the Federal lead for Energy Sector projects. The NCCoE collaborates with members of industry, government, and academia to build open, standards-based, modular, and practical example reference designs that address cybersecurity challenges in key economic sectors. The center benefits from formal partnerships with market leaders, including several Fortune 50 companies.

Mr. McCarthy joined the NCCoE in 2014, after serving at the U.S. Nuclear Regulatory Commission. He also worked in various cybersecurity roles at the U.S. Department of Transportation. In his previous positions, he was responsible for the management and operation of the cybersecurity incident response teams for these agencies. He also performed security assessments on components of the nation’s critical infrastructure systems. He holds a bachelor’s degree from Providence College and master’s degree from the Johns Hopkins Carey Business School.

Tim Zimmerman

Timothy Zimmerman is a Computer Engineer in the Intelligent Systems Division at the National Institute of Standards and Technology (NIST), Gaithersburg, Maryland. His research focuses on cybersecurity for the manufacturing sector, especially its impact on industrial control systems and robotics.