# Projects:2016s1-160a Cyber Security - IoT and CAN Bus Security

## Prescient Kannampuzha - Security Investigation of CAN Bus IoT network implementation and its interface to the Internet

Group Members: Adrian Daniele, Michael Bassi

Project supervisor: Matthew Sorell

## Abstract

The project investigates the potential of using a CAN (Controller Area Network) bus to create a communication network between low-cost Internet of Things (IoT) devices. It first examines the potential advantages of using CAN bus for an IoT network, then critically analyses the security concerns that exist within the CAN bus architecture and proposes solutions to the issues discovered. It then analyses CANcrypt, an automotive CAN bus security solution, and its suitability for securing a home IoT network. The project also evaluates the requirements for creating such a network. The results indicate that CANcrypt is a viable solution for securing CAN bus networks used as home IoT networks, provided that slight modifications are made to the CAN protocol and its higher layer protocols and that minimum hardware requirements are met. The results of this project can be used as a foundation for creating a secure communication framework between IoT devices in the future.

## Introduction

Problem Statement

More and more devices are becoming interconnected. One method of networking IoT (Internet of Things) devices is to use a CAN bus (Controller Area Network bus). CAN bus is a system of communication developed by the automotive industry to reduce the number of connections between different components in a car. The security of using CAN bus and its protocols as an IoT network has not been rigorously analysed. This project aims to close this gap by investigating the security aspects of such an implementation. The project will try to find security flaws and propose solutions to the issues found. The project outcomes are expected to influence future designs for IoT devices and to result in a more secure framework for IoT networking.

Author Role

The author's role in this project is to design and conduct the majority of the research required for the project. The author may be assisted by students and supervisors from a collaboration between the University of Adelaide (Australia) and Tallinn University of Technology (Estonia).

Motivation

A report by Cisco (Cisco, 2011)\cite{cisco2011} forecasts that over 50 billion Internet of Things (IoT) devices will be interconnected by the year 2020. The main motivation for this project is the significance of the security of these devices. In an increasingly interconnected world, the security of these devices will become paramount.

The CAN bus is already used in industries such as automotive, medical and production machinery, and aviation. The motivation for using CAN bus for networking IoT devices instead of other methods is its advantages in cost, robustness, bus topology and bandwidth (further elaborated in the Background section). In addition, the author is personally motivated to utilise this technology due to working at a firm with a strong focus on manufacturing products using CAN bus.

Significance

A security breach in an IoT network could compromise the security, privacy and safety of users:

* A malicious user could potentially take over your home automation or security system (Security).
* A malicious user could connect to your baby camera and monitor your habits, such as the times when you are out of the home or sleeping (Privacy and security).
* A malicious user could open your door (Safety and security).


The project's expected outcomes may provide solutions to potential security issues (such as the ones above) and could lay the foundations for using CAN bus to securely network IoT devices. The project can have applications beyond home automation, as its security findings could be relevant to other industries that use CAN bus, such as the automotive industry. The results of this project can further increase the security of communication protocols used for IoT networking and can provide valuable insight into potential implementations of CAN bus in consumer home automation. The software model developed for simulation of security can be used by other organisations for evaluation of their own security.

Objectives

• Use existing systems to develop a home automation IoT network model using CAN bus.
• Investigate the security of the \emph{Internet side interface} of control nodes.
• Evaluate any issues found.
• Propose solutions.
• Identify requirements and barriers to implementation.

Scope - Project Constraints

The project is limited to analysing CAN bus, and any protocol-specific analysis is further limited to the CANopen protocol. In terms of security solutions, only CANcrypt is considered.

## Background

Internet of Things (IoT)

The Internet of Things (IoT) has been defined as “a global infrastructure for the information society, enabling advanced services by interconnecting (physical and virtual) things based on existing and evolving interoperable information and communication technologies.” (ITU, 2016)\cite{ITU2016}

In this project IoT devices are defined/limited to low cost embedded devices such as sensors that are interconnected through existing communication technologies.

Controller Area Network (CAN)

Controller Area Network (CAN) is a serial communication protocol used for communication between interconnected devices. There are many variations of the protocol in use in different industries. The latest version is known as CAN FD, which allows higher data rates and uses a longer data frame. The CAN protocol was first designed by Bosch and was released as open source (CAN-CIA, 2016)\cite{cancia2016}. CAN was initially developed as a means of communicating between different parts of a car. The protocol has since been extended to meet the requirements of various other industries such as aviation and production, and different industries have created their own proprietary protocols on top of it. Currently there are proposals to use this protocol as a means of interconnecting IoT devices at low cost.

CAN Bus

A bus is a physical communication system that allows the transfer of data between interconnected devices. A CAN bus is a networking system that allows the transmission and reception of data using the CAN protocol. A CAN bus is composed of two lines for the transmission of data (CAN High and CAN Low), together with power and ground lines. A new device is connected to the bus simply by connecting its CAN transceiver to the respective High (CAN H) and Low (CAN L) lines.

The CAN bus is relatively tolerant to noise due to its differential signalling and certain protocols even allow the CAN bus to operate with just one line working at a lower bit speed (fault tolerant CAN ISO 11898-3)\cite{cancia2016}.

OSI Model

The Open Systems Interconnection (OSI) model is a conceptual model that characterises and standardises the functions of communication systems as a stack of abstraction layers. This project will mainly investigate the Physical Layer, Data Link Layer and Presentation Layer.

Sniffing CAN Bus

Data sniffing refers to making a copy of data that is transmitted or received on a network without modifying what is being sent or received. The CAN protocol is by design a broadcast system, which means that all nodes connected to the bus receive all messages. Sniffing CAN bus data is therefore extremely simple, and currently the most common way to prevent a message from being recovered is to encrypt the data at the presentation layer.

Using CAN bus for communication between IoT devices results in a robust and relatively simple network structure. Firstly, CAN bus is already being used in a variety of places such as the automotive industry, medical and production machinery, and the aviation industry. CAN bus has relatively high resistance to electrical noise due to its differential signalling. Moreover, adding an IoT device to the CAN bus is as simple as connecting two wires to the bus. It is a relatively simple and well understood protocol, and many micro-controllers have CAN capable communications built in. Together, these properties lead to a relatively cheap system of connection.

Control Nodes

A control node provides control and monitoring of systems, and a cluster of control nodes is known as a control system. A control node also provides a translational interface from the IoT network to the Internet.

## Methodology

Project Structure

• \textbf{Extensive literature review on preliminary research surrounding CAN Bus protocol security}

- Research the CAN protocol (frames, error correction, etc.)
- Research other protocols that can run over the CAN bus physical layer
- Research the CAN bus OSI model layers and their implementation
- Research existing IoT communication using CAN bus
- Research how other similar protocols or physical networks have secured their physical networks

• \textbf{Identify potential security flaws pre-existing or Discover new potential security flaws}

- Research and investigate whether security flaws exist
- Using the preliminary research, propose whether new potential vulnerabilities could be exploited
- For each security flaw, investigate how the attack or vulnerability works

• \textbf{Propose possible solutions to flaws (existing) or Propose new solutions to flaws}

- Investigate potential solutions that already exist
- Investigate methods of detecting or preventing each flaw
- Investigate the actions to be performed once a vulnerability or attack is detected (find a solution)

• \textbf{Evaluate the limitations, requirements and barriers to implementing the described system architecture}

- Evaluate requirements imposed by security solutions
- Evaluate general requirements imposed by IoT devices and their limitations
- Evaluate cryptographic limitations

Methods

• \textbf{Literature Review:}

The main method of information acquisition will be online research and consultation with experts. High quality research articles will be selected through the use of academic resources such as Compendex and IEEE Xplore.

• \textbf{Security Analysis:}

Security flaws will be evaluated using either a software or a theoretical model. Previous knowledge from stage 1, information from the Estonia Cyber Security Study Tour and collaboration with Tallinn University of Technology students will be used.

• \textbf{Solution Evaluation:}

Evaluation of potential solutions to flaws will also be investigated using the same model to compare the benefits and evaluate whether the solution is a feasible implementation. Collaborate with group members and students from Tallinn University of Technology to create and evaluate solutions.

• \textbf{Requirements, Limitations and Barriers Evaluation:}

Requirements, limitations and barriers will be evaluated using information gained from the previous stages. Collaboration with other group members will further aid the evaluation.

## Literature Review

7 Layer OSI model

To analyse the security of the CAN bus network protocol we need to understand how the CAN bus specification maps onto the OSI layers. The 7 layer OSI model structures a network protocol into seven layers, which makes a complex network protocol manageable. The layers cover both the hardware and the software of the protocol.

Figure: 7 Layer OSI reference model (7osi.png)

Physical Layer

The original CAN bus specification in ISO-11898 only defines the \emph{physical and link} layers of the OSI model. Higher layers are left for the user to implement, for flexibility and optimisation to specific applications. In the OSI model, the physical layer deals with the encoding of bits onto a physical medium, the characteristics of the physical medium and its connectors. For the CAN bus, ISO-11898-1 and ISO-11898-2 define the physical coding sub-layer, the physical media attachment and the physical media dependent interface.

The physical coding sub-layer refers to the encoding of bit information onto the physical medium of transmission. The CAN bus system uses two lines as the physical medium for networking and two lines for power. The two data lines use a differential system to encode information, which provides an increased level of protection against noise. The information is encoded at bit level using dominant and recessive states.

The physical media attachment refers to the driver and receiver characteristics of CAN bus transceivers. The original specification did not specify exact requirements, but an industry-wide accepted standard was later specified in ISO-11898:2003. The electrical aspects of the physical layer, such as current and voltage, are also specified in ISO-11898-2.

The physical media dependent interface refers to the connectors used to connect to the physical medium. For the CAN bus, a connector was not explicitly specified, to provide flexibility. Hence different connectors exist on the market, ranging from the DE-9 connector (the de facto standard) to the RJ45 connector. There are also newer specifications for the physical layer, such as CAN FD, which enables a higher bit rate than the original specification.

In terms of security, the physical layer specifies protection against noise and short circuit damage, but there are no other security or safety measures.

In the OSI model, the Link layer (also known as the data link layer) deals with the transmission and reception of frames (a group of encoded bits representing the message). The link layer is responsible for moving frames between nodes and providing encapsulation, and it may also provide other services called link-layer services (Kurose \& Ross, 2015)\cite{kurose2015}. For the CAN bus, the data link layer is responsible for bit timing and synchronization, message framing, arbitration, acknowledgement, error detection and signalling, and fault confinement [Source CAN Specification 2.0 Part A]. The details are specified in CAN 2.0A and in CAN 2.0B, which describes an extended data frame version.

For CAN bus, routing is done by multicast, which implies that at layer 2 all devices (nodes) connected to the transmission medium receive the message. In reality, therefore, no specific routing occurs, as each message is sent to all nodes. Each frame carries an identifier that indicates which nodes can use the message. This identifier is also known as the Priority or Arbitration field. All nodes receive the message and filter it based on this field: if the arbitration field is irrelevant to a node, the node discards the message.

The CAN bus specification provides multiplexing through Carrier Sense Multiple Access with Non-destructive Bitwise Arbitration (CSMA/NBA). Carrier Sense means that a transmitting node also listens to the transmission medium. The specification allows multiple nodes to start sending a message at the same time and uses non-destructive bitwise arbitration to let only the highest priority message continue (hence implementing a form of collision avoidance). The priority level is based on the Arbitration field.
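The receive-side filtering described above can be sketched as follows. The mask-and-match scheme shown here is a common convention in CAN controllers and is an assumption of this sketch, not something the source specifies; the function name `accepts` and the example identifiers are hypothetical.

```python
def accepts(frame_id: int, filter_id: int, mask: int) -> bool:
    """Return True if a node configured with (filter_id, mask) keeps the frame.

    Only the identifier bits selected by `mask` are compared, so one filter
    can match a whole group of identifiers. This is an illustrative model.
    """
    return (frame_id & mask) == (filter_id & mask)


# Hypothetical node that listens to identifiers 0x120 to 0x12F (11-bit IDs).
FILTER_ID = 0x120
MASK = 0x7F0

for frame_id in (0x123, 0x133):
    verdict = "kept" if accepts(frame_id, FILTER_ID, MASK) else "discarded"
    print(hex(frame_id), verdict)
```

Note that this filtering is a convenience for the receiver only. Every node still sees every frame on the wire, which is why sniffing is trivial.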

The services provided by layer 2 are described in further detail:

\textbf{Bit timing and synchronization:} In the CAN bus specification, all nodes must operate at the same bit rate, and there is a defined sequence used to synchronise to the bus. If the nodes are not all synchronised, their messages will interfere, which could destabilise the bus and cause transmission to stop.

\textbf{Message framing:} The CAN bus specification defines the exact encapsulation and framing of data. The exact specifications are as described in the figure (TO BE ADDED) below. There are different structures for different types of message frame. The frame types defined are the data frame, overload frame, remote frame and error frame. The specification also covers other details such as the space between frames (inter-frame space). The main frames of concern in this thesis are the data frame, overload frame and error frame. An extended frame version is specified in CAN 2.0B.

\textbf{Arbitration:} For CAN bus, arbitration is the method by which conflicts between simultaneously transmitted messages are resolved bit by bit without destroying the winning message. This is achieved by having each transmitting node listen to the level that actually appears on the bus while it transmits. Every bit sent on the bus is either dominant or recessive, and a dominant bit always 'defeats' (is prioritised over) a recessive bit. Each node starts its message with a priority, which determines who gets access. For example, suppose node A's priority is 110 while node B's priority is 101 (assume 1 means dominant and 0 means recessive). When the first bit is transmitted by both node A and node B, neither detects a conflict because both send the same bit. For the second bit, node A sends a dominant bit at the same time as node B sends a recessive bit. The bus carries a dominant bit because the dominant state overrides the recessive state. Node B detects that a dominant state was sent, which differs from what it wanted to send, so it stops and waits until the next transmission opportunity. Node A does not detect that any other node wanted to send a message and keeps transmitting as normal (hence non-destructive). Arbitration is mainly performed on the identifier/priority field, but depending on the higher layer implementation, parts of this field can be used for data transmission, increasing the effective data rate.

\textbf{Acknowledgement:} For CAN bus, acknowledgements are sent by all receiving nodes, even if they are not interested in the actual message. This provides a strong ACK and helps prevent the propagation of errors.

\textbf{Unique Identifier:} Two different nodes cannot use the same identifier. If this occurs the bus will carry erroneous data (although this will be detected by the transmitter). The only exception is when one of the nodes sends no data.
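The bitwise arbitration described above can be sketched as a small simulation. It follows the text's convention that 1 is dominant and 0 is recessive (on a real CAN bus the dominant level is logical 0, so the numerically lowest identifier wins). The function name `arbitrate` and the priority values are illustrative assumptions.

```python
def arbitrate(priorities):
    """Simulate bitwise arbitration between nodes that start together.

    `priorities` maps a node name to its priority as a bit string of equal
    length, where '1' is dominant and '0' is recessive (text convention).
    Returns (winner, bus_bits); winner is None if several nodes remain.
    """
    contenders = set(priorities)
    length = len(next(iter(priorities.values())))
    bus_bits = []
    for i in range(length):
        # The bus shows dominant if any remaining contender drives dominant.
        bus = "1" if any(priorities[n][i] == "1" for n in contenders) else "0"
        bus_bits.append(bus)
        # A node that sent recessive but reads dominant has lost and backs off.
        contenders = {n for n in contenders if priorities[n][i] == bus}
    winner = next(iter(contenders)) if len(contenders) == 1 else None
    return winner, "".join(bus_bits)


winner, bus = arbitrate({"A": "110", "B": "101"})
print("winner:", winner, "bus:", bus)
```

The winner's bits appear on the bus unchanged, which is what makes the scheme non-destructive. The `None` result for identical priorities corresponds to the case where two nodes share an identifier and neither backs off during arbitration.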

Error Detection, Signalling and Fault confinement

The same specifications also provide Error handling services:

[CAN 2.0 A Specifications]\cite{kvaser}

These are the main security features for providing reliable and secure means of transmission on the bus. In the specifications, 5 different methods are employed for detection, handling and signalling of errors.

\textbf{Bit Monitoring:} The transmitter on the CAN bus monitors the bit level on the bus. If it detects a level different from the one it intended to send, it signals a Bit Error. This is effectively a self-monitoring check. During arbitration, where lower priority messages lose access to the bus, bit monitoring does not raise a Bit Error (otherwise errors would be signalled constantly whenever another node wanted to transmit at the same time).

\textbf{Bit Stuffing:} If 5 consecutive bits of the same value are sent on the bus, an extra stuffing bit of the opposite value is inserted. Receiving nodes can detect a missing stuffing bit and will signal a Stuff Error if more than 5 consecutive bits of the same value are received. Bit stuffing also reduces the DC component on the bus.
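A minimal sketch of the stuffing rule, assuming bits are represented as Python lists of 0 and 1. The function names `stuff` and `destuff` are hypothetical, and the sketch ignores which parts of a real frame are subject to stuffing.

```python
def stuff(bits):
    """Insert a bit of opposite value after every run of 5 equal bits."""
    out = []
    run_value, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_value:
            run_len += 1
        else:
            run_value, run_len = b, 1
        if run_len == 5:
            stuffed = 1 - b
            out.append(stuffed)
            # The stuff bit itself starts the next run.
            run_value, run_len = stuffed, 1
    return out


def destuff(bits):
    """Remove stuff bits; raise ValueError (a 'Stuff Error') on a violation."""
    out = []
    run_value, run_len = None, 0
    i = 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        if b == run_value:
            run_len += 1
        else:
            run_value, run_len = b, 1
        if run_len == 5:
            i += 1  # the next bit must be a stuff bit of opposite value
            if i >= len(bits) or bits[i] == b:
                raise ValueError("stuff error: more than 5 equal bits")
            run_value, run_len = bits[i], 1
        i += 1
    return out


payload = [0, 0, 0, 0, 0, 1, 1, 1, 1]
on_wire = stuff(payload)
print(on_wire)
print(destuff(on_wire) == payload)
```

Because a stuff bit counts towards the following run, a single payload can trigger several stuff bits in quick succession, which is why the transmitted length of a frame depends on its content.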

\textbf{Frame Structure Check:} Receiving nodes check whether the structure of the message frame is correct. As mentioned in the previous section 'message framing', each message frame type has a specific format/structure for the message. If this structural format for fixed bits is not met, then a Form Error is signalled.

\textbf{Acknowledgement check:} As mentioned in the Acknowledgement description above, all receiving nodes provide an ACK on receiving a message. If the transmitting node does not detect an ACK, an Acknowledgement Error is signalled.

\textbf{Cyclic Redundancy Checksum (CRC):} Each transmitter sends a 15-bit CRC code after the data. If any receiving node calculates a CRC that differs from the one in the message, it signals a CRC Error.
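The 15-bit CRC can be sketched as a bitwise shift register. The generator polynomial 0x4599 is the one given in the CAN 2.0 specification; the helper `to_bits` and the example identifier and data values are hypothetical, and the sketch runs over an arbitrary bit list rather than a complete, correctly framed CAN message.

```python
CAN_CRC15_POLY = 0x4599  # x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1


def crc15(bits):
    """Compute the 15-bit CAN CRC over an iterable of 0/1 values."""
    crc = 0
    for bit in bits:
        crc_next = bit ^ ((crc >> 14) & 1)
        crc = (crc << 1) & 0x7FFF  # shift left, keep 15 bits
        if crc_next:
            crc ^= CAN_CRC15_POLY
    return crc


def to_bits(value, width):
    """Most-significant-bit-first list of `width` bits (illustrative helper)."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


# Hypothetical message bits: an 11-bit identifier followed by one data byte.
message = to_bits(0x123, 11) + to_bits(0xA5, 8)
checksum = crc15(message)

# Receiver side: running the CRC over message plus checksum leaves zero.
print(crc15(message + to_bits(checksum, 15)) == 0)

# A single flipped bit is detected because the recomputed CRC differs.
corrupted = message.copy()
corrupted[3] ^= 1
print(crc15(corrupted) != checksum)
```

The CRC is not keyed, so it protects against accidental corruption only. Anyone who alters a message can recompute a matching CRC, which is relevant to the integrity discussion later in this review.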

\textbf{Error Confinement:} The CAN specifications describe a method of confining the errors of faulty nodes through the use of error counters and escalating consequences. Each individual node keeps track of two error counters: the Transmit Error Counter and the Receive Error Counter. A success reduces the counter and a failure increases it. The transmit error counter increases at a faster rate because the specifications assume that errors are most likely caused by the transmitter. If an error is detected in transmission, the node sends an 'Active Error Flag', increases the counter and tries to resend the message. This process repeats until the error counter increases above 127, at which point the node enters the 'Error Passive' state. In this state the transmitter still tries to send the message but sends a 'Passive Error Flag' instead, and it also suspends transmission, meaning that it waits until the bus is idle before sending its message. Eventually, if the Transmit Error Counter increases to more than 255, the node enters the 'Bus Off' state and no longer transmits.

NOTE: This behaviour is governed by the transmitting node itself, not by a master or by other receiving nodes; that is, error confinement is self-regulated. In fact, all of the protocol etiquette is self-regulated. This could be a significant point of attack, since a malicious transmitting node that ignores these rules could destroy bus traffic.
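The escalation through the three states can be sketched as a small state model. The thresholds (above 127 and above 255) come from the text; the step sizes of +8 per transmit error, +1 per receive error and -1 per success follow the basic rules of the CAN 2.0 specification but omit its special cases, so this is a simplified illustration. The class name `ErrorCounters` is hypothetical.

```python
class ErrorCounters:
    """Simplified per-node fault confinement model (illustrative only)."""

    def __init__(self):
        self.tec = 0  # Transmit Error Counter
        self.rec = 0  # Receive Error Counter

    @property
    def state(self):
        if self.tec > 255:
            return "bus-off"
        if self.tec > 127 or self.rec > 127:
            return "error-passive"
        return "error-active"

    def transmit_error(self):
        if self.state != "bus-off":
            self.tec += 8  # transmit errors are penalised more heavily

    def transmit_success(self):
        if self.state != "bus-off" and self.tec > 0:
            self.tec -= 1

    def receive_error(self):
        if self.state != "bus-off":
            self.rec += 1

    def receive_success(self):
        if self.state != "bus-off" and self.rec > 0:
            self.rec -= 1


node = ErrorCounters()
for attempt in range(1, 33):
    node.transmit_error()
    if attempt in (15, 16, 32):
        print(attempt, node.tec, node.state)
```

Nothing in this model is enforced by the bus: a node that simply never increments its own counters never leaves the error-active state, which is the self-regulation weakness noted in this section.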

Higher Level Protocol Layers

For the CAN bus, the higher level protocol (HLP) layers, that is the Network, Transport, Session, Presentation and Application layers of the OSI model, are not officially defined by a single ISO standard. There are many different HLPs that could be used. Each industry (automation, automotive, control) uses its own specific HLP implementation. Each manufacturer in the car industry uses its own proprietary protocol. Buses and trucks use a different protocol known as SAE J1939. Industrial applications use CANopen, which is maintained by the CAN in Automation group. Another HLP is DeviceNet, a protocol standard maintained by the Open DeviceNet Vendors Association (ODVA) and controlled by Rockwell Automation. Other HLPs that can be considered are CAN Kingdom and Smart Distributed System (SDS). Together these specifications make up the majority of the HLPs on the market.

• CANopen: an open specification used in industrial automation. It supports different communication models such as master/slave, client/server and broadcast.
• CAN Kingdom: a master/slave system. It is interesting because the master holds all the identifiers and distributes them, which could offer a potentially radical answer to security in this investigation. Free simulation software (CAN King) is available.
• DeviceNet: also a master/slave system. Relevant simulation software exists, so it should be considered. It is also the closest match to an IoT network.

\cite{kvaser}

Previous Studies

Alternate Protocols that can run on CAN Bus: Very Simple Control Protocol (VSCP) is an open source protocol that can run over a physical CAN bus and can be used for implementing home/building automation and the Internet of Things (Galloway, 2015)\cite{galloway2015}.

Several research papers and simulations have been published, such as one on using CAN bus and IoT to collect pressure information for buildings (WS,XC,FL\&LZ, 2012)\cite{various2012pressure}. This work contains little information pertaining to security and focuses instead on the implementation of such a system.

Gap in CAN Bus \& IoT: Currently there is no widespread adoption of CAN bus as the main medium of communication between IoT devices in real life applications. There is also very little academic research on the security aspects of using CAN bus for IoT. There are some security conferences conducted by the CAN in Automation (CiA) group which could be relevant for IoT (iCC, 2016)\cite{icc2016}.

Distributed System Safety Report

The following section is an excerpt from the report Safety of Distributed Machine Control Systems, Validation Methods. Source: \cite{kvaser2}

 Bus Errors: The bus is vital in a distributed system. Two obvious types of errors are that messages are destroyed and that messages cannot be sent on the bus.

Timing Errors: The nodes may require fully synchronised and correct clocks in every node for correct operation of the system. Both hardware and software faults may result in incorrect timing.

Data Consistency Error: Nodes cooperating on the same task should have data of the same age. Inconsistent data may lead to different decisions taken at the nodes, even if they are programmed with the same algorithms.

Initialisation and Restart Error: It will be hard to know in which order the computers of the network will start after a power up sequence. Proper routines for synchronisation must be implemented.

Babbling Idiot Errors: \emph{'Babbling idiot'} errors occur when one or several nodes in the system overloads the communication bus by erroneously sending a lot of high priority messages on the bus so that other nodes cannot send their messages.

Configuration Errors: Usually a system will only have the correct function if exactly the right types of nodes are used at the correct physical positions. An incorrect mix of modules, or an incorrect parametrisation of programmable modules, may cause a configuration error.

The report shows that a variety of errors can occur in control systems. What is interesting about it is that it specifies that systems must prepare for each of these errors and respond in defined ways to certain events or errors. It also provides a good list of potential levels of failure of a control system, ranging from catastrophic failure of the system to no failure. This report will be very useful in evaluating the extent of a security issue for a system.

This should be considered part of a risk assessment when designing IoT networks, so that the risk can be mitigated. Detailed parts of this report are missing (the specifics of detailed testing and validation have been removed from the report), and the author is trying to obtain them by email.

ODVA Industrial Cybersecurity report

The following section is an analysis of the Industrial Cybersecurity report by ODVA (Open DeviceNet Vendors Association), which maintains the standards for DeviceNet.\cite{odva}

The report raises many security issues that are relevant to the thesis. It provides a list of issues and concerns that need to be addressed for a protocol to be considered secure. It also highlights the major difficulty with smart nodes at present.

Issues raised by the report:

Authorization of devices: The ability of devices to know whether the sender or receiver of a message is a trusted entity. This is highlighted as the most significant issue at present. There are complications in sharing keys and in establishing authenticity without pre-shared keys. There are also issues in managing trust between devices, especially for CAN bus, where there are only a few fields of data that can be used to identify devices (and these fields can be easily spoofed or replayed). The report suggests that a secure protocol must allow management of the authorization and authentication of devices.

Integrity of Messages: A secure protocol must ensure that a message has not been tampered with. The report suggests that a cryptographic proof is necessary. An interesting point is that if both the message and the cryptographic proof are tampered with, there will be almost no way to tell that the message has been altered. In the CAN bus, if both the CRC and the message are modified, there is almost no way for the receiver to detect that the message has been tampered with [although the sender will detect that the message they sent has been changed] [consider man-in-the-middle attacks too].

Spoofing Identity: A secure protocol must ensure that devices cannot impersonate other nodes. The main concern is duplicate device identifiers. In terms of CAN bus, two devices cannot have the same Arbitration ID.

Information Disclosure (information seen by people who are not meant to see it): Only the intended receiver of a message should be able to read it. Currently in CAN bus this can only be ensured using encryption.

Denial of Service: Flooding the bus with messages could prevent legitimate messages from getting through.

Elevation of privilege: Increasing the level of authorisation.
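The difference between the unkeyed CAN CRC and a cryptographic proof of integrity can be illustrated with a keyed message authentication code. The sketch below is an assumption for illustration only: it is not the CANcrypt mechanism and not part of CIP Security, and the 4-byte truncated tag, the pre-shared key, the message counter and the function names `make_tag` and `verify` are all hypothetical design choices. It uses only the Python standard library.

```python
import hashlib
import hmac

TAG_LEN = 4  # bytes of the HMAC kept; hypothetical choice to fit an 8-byte payload


def make_tag(key: bytes, can_id: int, counter: int, data: bytes) -> bytes:
    """Truncated HMAC-SHA256 over identifier, message counter and data."""
    message = can_id.to_bytes(4, "big") + counter.to_bytes(4, "big") + data
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_LEN]


def verify(key, can_id, counter, data, tag, last_counter) -> bool:
    """Accept a frame only if the counter is fresh and the tag matches."""
    if counter <= last_counter:
        return False  # replayed or stale frame
    return hmac.compare_digest(make_tag(key, can_id, counter, data), tag)


key = b"pre-shared demo key"  # assumed to be provisioned out of band
tag = make_tag(key, 0x123, 1, b"\x01\x00")

print(verify(key, 0x123, 1, b"\x01\x00", tag, last_counter=0))  # genuine frame
print(verify(key, 0x123, 1, b"\x00\x00", tag, last_counter=0))  # tampered data
print(verify(key, 0x124, 1, b"\x01\x00", tag, last_counter=0))  # spoofed identifier
print(verify(key, 0x123, 1, b"\x01\x00", tag, last_counter=1))  # replayed frame
```

An attacker without the key cannot produce a matching tag after changing the data or identifier, unlike the CRC, which anyone can recompute. The trade-off is that the tag consumes part of the 8-byte payload, and a short tag gives a correspondingly weaker guarantee.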

CIP Security Summary

A further report covers the security of CIP (Common Industrial Protocol), the highest layer protocol used by DeviceNet. This summary shows that DeviceNet borrows some ideas and assumptions from existing standards used in Internet networking.

The report shows that the previous assumption of secure inner layers can no longer be made, and that the HLP has been changed as a result. It stipulates that a secure system will enable:

The rejection of altered data (thus maintaining integrity of communication).

The rejection of messages from untrustworthy nodes (managing authenticity, that is, how to authenticate devices).

The rejection of messages that are not allowed (authorisation).

CIP Security suggests that not all devices need the same level of security, hence different devices can have differing levels of security, also known as 'security profiles'. Each device should provide the user with an appropriate level of security.

The security used for CIP devices is based on the IETF standards RFC 5246 (TLS) and RFC 6347 (DTLS). This method tries to reuse the TLS used in HTTPS for communication.

It achieves authentication of endpoints using pre-shared keys or X.509 certificates.

Message integrity is achieved using a TLS message authentication code (HMAC).

Message encryption is done by an algorithm negotiated via a TLS handshake.

NOTE: the CIP system is mainly used in EtherNet/IP based systems so will need to check how easily these ideas can be applied to a CAN network. \cite{odva}

## System Architecture

Figure: System Architecture Comparison table (table.png) \cite{table} \cite{1} \cite{2} \cite{3}\cite{4}\cite{5}\cite{6}

\textbf{Analysis of Table of Results}

This base analysis of what the different systems offer shows that CAN bus occupies a unique mid-cost position. CAN bus can be utilised well because it supports a medium number of devices (255) at acceptable speeds of up to 1 Mbit/s on a single line. CAN bus is also flexible in that it allows the length of a line to be increased by reducing the speed.

Ethernet based bus systems are more expensive per node than CAN bus, although they support a higher bandwidth. These are the two key differences between CAN bus and Ethernet based solutions. There is a host of existing technologies that ensure security in Ethernet based traffic, and many of the technologies used for the Internet Protocol can be applied to it.

LIN is an extremely cheap bus based system, very similar to CAN bus but even cheaper. This specification was created by the automotive field when using CAN bus for everything was deemed too expensive. LIN underperforms in almost every category, especially speed and maximum number of nodes (only 7 devices maximum). It should be noted, however, that LIN is extremely cheap, lightweight and easy to implement. LIN is also not very secure.

WiFi enables wireless connections between nodes. WiFi has extremely large bandwidth, making it the equivalent of Ethernet among the wireless systems compared, although WiFi speeds are still lower than the maximum speeds achievable by Ethernet. WiFi grants mobility to devices, but overall it is the most expensive type of system to implement per node. The main advantages of WiFi over CAN bus are the increased mobility and bandwidth.

Bluetooth is an alternative wireless connection system. Bluetooth devices are cheaper than WiFi devices, but are limited to only 7 nodes communicating simultaneously. One possible way around this limitation is to use a master polling system, where the main master polls devices according to a schedule and high priority devices can be scheduled more often. Bluetooth also has low penetrating power through walls, hence multiple transmitters will be needed if there are major obstacles. The main advantage over CAN bus is the wireless nature of these devices, but Bluetooth devices are in general more expensive than CAN bus per node.

ZigBee is a cheaper alternative to Bluetooth. It makes the trade-off of limiting speed and range, but reduces the cost and also allows more nodes to be connected simultaneously. ZigBee can also route traffic by creating ad-hoc networks between ZigBee capable devices (which could eliminate some of the range problems). This system has another advantage over Bluetooth and WiFi in terms of power consumption.

An important conclusion that can be drawn from the analysis is that each specification has its own strengths. This means that CAN bus alone, or indeed any single system, will not be suitable for all situations. Hence, this thesis recommends a system architecture that uses each of these technologies for its particular use case, with CAN bus as the main backbone augmented by the other systems.
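To put the 1 Mbit/s figure in context, the following sketch estimates how much of the raw bit rate is left for payload in a classic CAN data frame with an 11-bit identifier. The bit counts are assumptions taken from the standard frame layout (44 framing bits plus 3 bits of inter-frame space, and worst-case stuffing over the 34 + 8n stuffable bits) and do not come from the comparison table; the function names are hypothetical.

```python
def frame_bits(data_bytes: int, worst_case_stuffing: bool = False) -> int:
    """Bits on the wire for one standard (11-bit ID) CAN data frame.

    Assumes 44 framing bits plus a 3-bit inter-frame space around the data,
    and optionally the worst-case number of stuff bits.
    """
    if not 0 <= data_bytes <= 8:
        raise ValueError("classic CAN carries 0 to 8 data bytes")
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        # Stuffing applies from start-of-frame to the end of the CRC field.
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


def payload_throughput(bit_rate: float, data_bytes: int,
                       worst_case_stuffing: bool = False) -> float:
    """Payload bits per second on a fully loaded bus at `bit_rate`."""
    frames_per_second = bit_rate / frame_bits(data_bytes, worst_case_stuffing)
    return frames_per_second * 8 * data_bytes


for stuffing in (False, True):
    kbit = payload_throughput(1_000_000, 8, stuffing) / 1000
    print(f"8-byte frames, worst-case stuffing={stuffing}: {kbit:.0f} kbit/s")
```

Under these assumptions roughly half of the nominal bit rate is available for application data, and any authentication tag or counter carried in the payload reduces it further.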

In the proposed system architecture, the major back bone for the devices will be run on CAN bus, and will use LIN connections for extremely low level IoT devices that can logically be grouped together (For example switches that are very close to each other). In addition, the system will utilise Ethernet connections for high bandwidth intensive purposes such video information transfer and processing. In places or smart devices that require mobility, a wireless based solution should be used. For high bandwidth performance., WiFi should be used. For majority of wireless devices Bluetooth or ZigBee can be chosen, depending on the requirements for speed or power.

## Security Analysis

The following analysis covers only the CAN bus part of the architecture, and assumes CANopen as the Higher Layer Protocol (HLP). These limitations are used to limit the scope of the project. After each issue is described, a solution is proposed and the mechanism by which it alleviates the issue is explained.

### Denial of Service (DoS)

Denial of Service style attacks predominantly work by overloading a system with too many messages or requests until the communication system eventually shuts down. With CAN bus this is extremely simple to achieve due to the very structure of the bus system. Every node can theoretically start transmitting at the same time, and the message with the highest priority value in the arbitration field wins arbitration and is sent. Because a dominant bit (logical 0) overwrites a recessive bit (logical 1) on the bus, the highest priority belongs to the lowest identifier. A malicious node could therefore mount a DoS attack by continuously sending the highest priority message, whose arbitration field is a series of 0s 29 bits long: 00000…0000.
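The arbitration rule can be illustrated with a short simulation. This is a minimal sketch rather than a model of a real CAN controller: it assumes the standard CAN convention that a dominant 0 overwrites a recessive 1 on the bus, and the function name `arbitrate` and the example identifiers are hypothetical.

```python
def arbitrate(identifiers, width=29):
    """Return the identifier that wins bitwise CAN arbitration.

    Assumption: the bus acts as a wired-AND, so a dominant 0 sent by any
    node overrides a recessive 1 sent by the others.
    """
    contenders = list(identifiers)
    for bit in range(width - 1, -1, -1):  # the most significant bit is sent first
        # Level actually seen on the bus for this bit position.
        bus_level = min((ident >> bit) & 1 for ident in contenders)
        # Nodes that sent a recessive 1 but read back a dominant 0 stop sending.
        contenders = [i for i in contenders if (i >> bit) & 1 == bus_level]
    return contenders[0]


# A malicious node that always sends identifier 0 beats every other node.
winner = arbitrate([0x1ABCDE, 0x000100, 0x0])
```

Under these assumptions a node that continuously transmits the all-zero identifier wins every arbitration round, which is what starves the legitimate nodes.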

An easier method to achieve bus shutdown through DoS is to clamp the bus to a constant dominant bit (0). Each node (IoT device) on the CAN bus will try to send a message, but it will not go through, and the node will keep retrying. Each failed transmission raises the node's transmit error counter by 8. Once the counter exceeds 127 the node enters the error passive state, in which it assumes that it is itself the cause of the failure. Once the counter exceeds 255 the node moves into the radio silence (bus-off) state. In this state no nodes will be able to communicate, and the master will not be able to communicate either.

This behaviour occurs because the default CAN protocol assumes that a message that cannot be sent is failing because of a fault in the sending node itself. Hence, to avoid interrupting normal CAN bus traffic, the node shuts itself down (radio silence).
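The self-silencing behaviour can be sketched with the fault confinement counters of the CAN specification. The sketch assumes the standard rules (the transmit error counter rises by 8 per failed transmission, a node is error passive above 127 and bus-off, i.e. radio silence, above 255) and ignores the decrements that successful transmissions would cause. The function names are hypothetical.

```python
TEC_STEP = 8               # assumed increment per failed transmission
ERROR_PASSIVE_ABOVE = 127  # assumed error passive threshold
BUS_OFF_ABOVE = 255        # assumed bus-off (radio silence) threshold


def state_after_failures(failed_transmissions):
    """Return the fault confinement state after consecutive failed sends."""
    tec = failed_transmissions * TEC_STEP
    if tec > BUS_OFF_ABOVE:
        return "bus-off"
    if tec > ERROR_PASSIVE_ABOVE:
        return "error-passive"
    return "error-active"


def failures_until_bus_off():
    """Count the consecutive failed sends needed to silence a node."""
    failures = 0
    while state_after_failures(failures) != "bus-off":
        failures += 1
    return failures
```

With these values a node silences itself after a few dozen consecutive failures, which an attacker clamping the bus can provoke in a very short time.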

Detecting this state is quite simple, as a bus clamped to a constant value is easy to observe. Unfortunately, the master node cannot send any messages to safely shut down important nodes. The CANopen protocol does not specify any actions to be taken in DoS situations.

This is a major flaw of using a CAN bus system, but it is inherent to the very design of a bus system. The only way to truly solve this problem is to redesign the system, but then it would no longer be CAN bus, which is out of scope for this thesis.

Three solutions are recommended to mitigate the impact of the DoS issue. The first is to segment the CAN bus network into multiple buses that are isolated from each other. The second, a variation of the first, is to create a secondary bus consisting only of critical nodes. The third is to create a safe shutdown procedure for nodes when no communication is detectable on the CAN bus.

The first solution is very practical. CAN bus lines can be logically and physically separated into segments, for example one line per room in a household. Each room's line is connected to the line in the next room by a two-way repeater, which simply repeats messages received on one side to the other. This repeater is effectively a conventional network switch that is compliant with the CAN protocol. Using such a switch allows a CAN system to isolate DoS faults to a single room, and helps to identify the culprit device, as it limits the area of the fault. As a side effect, bus line lengths will be shorter, thereby increasing the maximum theoretical speed. [A further extension to this idea would be some form of message switching based on identifier fields. This could also reduce traffic, as only messages that need to be routed would be forwarded. Further discussion of this idea is out of scope of this thesis.]

The second solution is used in the automotive industry to provide some level of safety against faulty components that may cause bus communication to halt. Safety critical nodes are placed on a separate bus line, which isolates them from lines that have been attacked. Creating a separate, redundant line only for safety critical nodes limits the number of attack vectors, and means that critical information and control messaging can still be exchanged. The only situation in which this approach fails is when a safety critical node itself is compromised. The safety critical line may also be physically hidden so that it is harder for malicious users to access, although this should NOT be relied upon to provide ‘real’ security.

The last solution, creating safe shutdown procedures, should be applied to extremely important devices or functions, for example a security lock system for a household. This technique is also used in the automotive industry for safety critical components such as an Engine Control Unit (ECU). If a node detects that a DoS or other critical attack has debilitated its ability to communicate, it should shut down safely (also known as graceful shutdown). In the case of a car, it might turn the engine off. In the case of a security lock, it might revert to manual mode, disabling automatic opening and allowing access only through a physical key. This method usually requires a high level of logic and planning on the node, because the designers of the device need to decide how such a shutdown should be executed. Detecting the state should be simple enough: whenever the node reaches radio silence mode, it switches to its graceful shutdown mode.

### Spear DoS

Spear DoS (node targeted DoS) is a variant of the DoS attack. Instead of bringing down the entire bus line, it targets a specific type of message or a particular type of device. For example, only the master node, or only smart light devices, are targeted in order to stop them from transmitting messages. The mechanism requires the malicious node to listen on the bus until the targeted device identifier field is broadcast. When it detects the target ID, it clamps the bus to dominant bits for the rest of the transmission. The transmitting node detects that the message was not sent correctly and attempts to re-send it, and the malicious node attacks each subsequent attempt in the same way. This continues until the targeted node’s CAN controller goes into an error state and shuts down transmission.

This variation of DoS could be less detectable, as it displays the symptoms of a faulty transmission. The only node that can truly know that the failure is not an ordinary fault is the transmitting node, and that node is completely prevented from transmitting any message to report that it is being attacked.

The impact of this attack is severe, especially if it is carried out on a safety critical device. The mitigations described in the previous section on DoS also apply here. In addition, there is a further solution that applies to any vulnerability relying on the identifier field. This attack depends on being able to identify a device solely by its identifier field. If the identifier field can no longer be tied to a single device, or if it is encrypted by some means, then this type of targeted attack no longer works (as the malicious node cannot detect the targeted device).

The limitation of this proposed solution is that the encryption must change often enough that an encrypted value cannot be heuristically linked to a particular node (using the same encryption key on the same identifier produces the same output every time, hence the key must be changed). Alternatively, the identifier could be sent in the data field, again with varying encryption, but this has the downside of taking up valuable data space.

### Bus Sniffing

Bus sniffing (spying) refers to a device that silently listens to all traffic on a CAN bus. Such devices are not necessarily malicious; they are often used in the automotive industry to provide easier access to the on-board diagnostics of CAN buses. In a home IoT situation, however, they can be used maliciously to spy on unsuspecting users.

The book “Information Security Theory and Practice” \cite{book} analyses this scenario, using a CAN bus analyser device to capture CAN bus traffic and demonstrate a physical example of this attack vector in action. The mechanism of this type of attack is inherent to the CAN bus system: CAN is a broadcast system, hence there is no way to prevent any node from “listening in” to a message. To make matters worse, in the basic CAN protocol devices are tied to their device identifier, so messages can be traced to each device. Additionally, it is almost impossible to detect whether a sniffing device exists on the bus. Sniffing can be used to gain information about user behaviour. For example, a large number of light-off messages could be used to identify when the users go to sleep, and this information could be used to a malicious user’s advantage. Similarly, if pictures or other sensitive information are transferred over a CAN bus, they could be stored.

As it is very expensive to physically create separate lines, which also defeats the purpose of a bus system, most solutions address this issue by “speaking in a language that no external device can understand”, i.e. by using some form of encryption that keeps the messages private. If nodes are able to encrypt both the device identifier field and the data being sent, accepted devices can communicate fully without their content being exposed to bus sniffing. However, encrypting the identifier field will essentially break all higher level protocols (HLPs), because their specifications only provide for encrypting the data payload; all HLPs will therefore need additions or modifications to account for this.

Even with encryption, a message is still broadcast over the bus. Hence, if the number of messages being sent is used as a metric, it may reveal the peak and off-peak usage times of a household, which a malicious user could use to target the household. Encryption does not solve this issue: even if the spy cannot understand what is being sent, it can still “hear” that something is being sent.

A possible solution to this problem is to send fake or empty messages at random intervals to keep the bus busy during off-peak times. This should be done by the master node, to reduce the complexity of the IoT devices. The master should keep track of the average message rate at peak time and try to maintain that rate during off-peak time, in order to mislead any potential sniffing devices.
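The padding strategy can be sketched as a small planning function on the master node. This is only an illustration under stated assumptions: the function name `dummy_send_times`, the fixed planning interval and the uniform random spacing are all hypothetical choices, not part of CANopen or of any existing product.

```python
import random


def dummy_send_times(real_count, peak_rate, interval_s, rng=None):
    """Plan dummy frames so that one interval looks like peak-time traffic.

    real_count: genuine frames observed in the interval
    peak_rate:  average frames per second measured at peak time
    interval_s: length of the planning interval in seconds
    Returns the sorted send times (seconds from interval start) for dummies.
    """
    if rng is None:
        rng = random.Random()
    target = round(peak_rate * interval_s)
    missing = max(0, target - real_count)
    # Random offsets avoid a regular pattern that a sniffer could filter out.
    return sorted(rng.uniform(0.0, interval_s) for _ in range(missing))
```

For example, with a peak rate of 2 frames per second and only 3 genuine frames seen in a 10 second interval, the master would schedule 17 dummy frames for that interval. A real implementation would also need dummy frames that are indistinguishable from encrypted genuine frames.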

### Transmission of false messages

This attack vector describes the case where a malicious node transmits a false message while impersonating another node (spoofing), modifies a message sent from another node (man in the middle attack or message injection), or replays messages (replay attack). Each type of attack is described in more detail below.

#### Spoofing

Spoofing describes a malicious node that impersonates a trustworthy node and sends messages as that device. For example, a malicious node could pretend to be the master node and send a shutdown message to the safety critical IoT devices. The basic CAN protocol does not use any form of authentication or encryption, so a malicious node that can access a CAN bus network and fully understands the protocols in use could gain the highest level of permissions and execute any possible command. This issue is worsened if the malicious node is paired with a remote sniffer, allowing a malicious user to relay malicious CAN bus messages remotely. Such devices exist in the automotive industry, where they are primarily used for remote on-board diagnostics and testing in which test commands are sent remotely to cars, but they could easily be modified for malicious use in a home IoT CAN bus system.

The spoofing mechanism relies on two CAN bus properties: the CAN protocol describes no authentication of devices, and there is no encryption or obfuscation of the device identifier field. A real life example is the Jeep Cherokee hack in 2015, where researchers were able to take over a normal node and use it to “spoof” messages to other parts of the car, such as the brakes, by pretending to be the ECU. The solution is to implement some form of device authentication together with encryption of the device identifier field; the encryption approach described in previous sections can be reused. Authentication would solve this issue if it could be implemented perfectly, but there are limitations and difficulties in implementing authentication properly in a CAN bus system, due to the difficulty of safely sharing keys or pairing nodes.

#### Replay Attack

A replay attack is an attack in which a malicious node ‘records’ a legitimate message and then ‘replays’ it at a later time. If the recorded message is relevant, it could have devastating effects. This type of attack can even persist through simple encryption. For example, if only symmetric key encryption is used, the malicious node can still replay the message, because it does not need to know the encryption key; it simply replays the already encrypted message. An example of this type of attack on CAN bus in an automotive scenario is given on page 177 of “Information Security Theory and Practice” \cite{book}.

A solution to replay attacks is to use a message sequence number (a counter) and combine it with the data payload before encryption. The counter's base value is initialised with a random number (also known as a nonce, a number that can be used only once). Each subsequent message uses a number higher than the previous one. Hence, if a message is replayed, its number will be equal to or lower than expected, and the message will be rejected.
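The receiver side of the sequence number check can be sketched as follows. Encryption is deliberately left out: the sketch assumes the counter has already been decrypted together with the payload, and the class name `ReplayGuard` and its methods are hypothetical.

```python
import secrets


class ReplayGuard:
    """Reject any message whose counter is not higher than the last accepted one."""

    def __init__(self, start):
        # 'start' is the random base value (nonce) agreed with the sender.
        self.last_seen = start

    def accept(self, counter):
        if counter <= self.last_seen:
            return False  # equal or lower than expected: treat as a replay
        self.last_seen = counter
        return True


# The sender would pick the base value at random, for example 16 bits wide.
base = secrets.randbits(16)
guard = ReplayGuard(base)
first_ok = guard.accept(base + 1)   # fresh message is accepted
replay_ok = guard.accept(base + 1)  # the same message replayed is rejected
```

Accepting any higher counter, rather than exactly the next one, is a design choice in this sketch: it tolerates frames lost on the bus at the cost of letting a sender skip ahead.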

A caveat is that a replay may succeed if it is delayed long enough for the message sequence number to have wrapped around and restarted. The size of the sequence number therefore needs careful consideration: the larger the number, the more transactions can happen before the sequence has to be changed, but the more overhead it takes in the data frame. Choosing the next starting number also needs care; it has to be sufficiently random that a malicious node cannot guess it.

An alternative solution to replay attacks is to encrypt a timestamp together with the data payload. If the timestamp does not cover a long enough period, for example only up to 1 year, then a message could be replayed after 1 year. Ideally the range should be large in comparison to the expected lifespan of the system. Assuming a lifespan of around 30 years (9.5e8 seconds) at one-second resolution, the timestamp can be stored in 30 bits. The timestamp also increases the overhead of the payload: 30 bits is nearly half of the maximum data payload of a CAN frame (64 bits). This suggests that timestamps are impractical in a small-payload situation such as CAN bus, and that a sequence number based approach is the only viable option.
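The timestamp arithmetic can be checked with a few lines. The sketch assumes one-second resolution and a year of 365.25 days; the function name `timestamp_bits` is hypothetical.

```python
SECONDS_PER_YEAR = 31_557_600  # 365.25 days of 86,400 seconds (assumed)
CAN_PAYLOAD_BITS = 64          # maximum data payload of a classic CAN frame


def timestamp_bits(lifespan_years):
    """Bits needed to count every second of the given lifespan."""
    return (lifespan_years * SECONDS_PER_YEAR).bit_length()


bits_30_years = timestamp_bits(30)                # 946,728,000 seconds
payload_share = bits_30_years / CAN_PAYLOAD_BITS  # fraction of the payload used
```

For a 30 year lifespan this gives 30 bits, or about 47% of the 64-bit payload, which matches the estimate in the text.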

As an example of the impact, a malicious node could store an open door command and replay it later, leading to unwanted behaviour. With encryption in place it would be hard for the attacker to know whether a recorded message is an open door or a close door command, but it could keep guessing by replaying recorded messages and still cause havoc.

### Encryption concerns and limitations

#### Encryption of device identifier field (arbitration field)

There are a few issues with encryption that need to be addressed. Firstly, as discussed in previous sections, the identifier field will need to be encrypted. Unfortunately, the device identifier field is also the arbitration field (priority field), so a higher priority device may not retain its priority once its identifier is encrypted. This creates a conflict between the two requirements. One solution is partial encryption, where the arbitration field is split into a priority field and an identifier field and only the identifier part is encrypted. In this case a sniffer can determine the priority of a message, but not which device sent it. An alternative is to use the same identifier in the identifier field and use the data field to identify the real device, but this has the disadvantage of using data space to send this information.
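Partial encryption of the arbitration field can be sketched as below. Everything specific here is an assumption made for illustration: the 5-bit/24-bit split of the 29-bit field, the use of HMAC-SHA256 over a shared message counter to derive a one-time mask, and the function names. It is not a scheme defined by CAN, CANopen or CANcrypt.

```python
import hashlib
import hmac

PRIORITY_BITS = 5   # assumed: sent in the clear so arbitration still works
DEVICE_BITS = 24    # assumed: masked so a sniffer cannot track devices
DEVICE_MASK = (1 << DEVICE_BITS) - 1


def _pad(key, counter):
    """Derive a 24-bit one-time mask from the shared key and message counter."""
    digest = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")


def build_arbitration_field(priority, device_id, key, counter):
    """Pack a clear priority and a masked device ID into 29 bits."""
    if not 0 <= priority < (1 << PRIORITY_BITS):
        raise ValueError("priority out of range")
    if not 0 <= device_id <= DEVICE_MASK:
        raise ValueError("device_id out of range")
    return (priority << DEVICE_BITS) | (device_id ^ _pad(key, counter))


def parse_arbitration_field(field, key, counter):
    """Recover (priority, device_id) on a node that shares key and counter."""
    priority = field >> DEVICE_BITS
    device_id = (field & DEVICE_MASK) ^ _pad(key, counter)
    return priority, device_id
```

Because the priority occupies the most significant bits, arbitration between different priorities is unaffected by the mask. Changing the counter for every message changes the mask, so the same device does not produce the same field twice.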

#### Encryption Key Change

Encryption keys need to be changed after being used for a certain time. If the same key is used indefinitely, it could eventually be recovered given sufficient time, especially considering that sniffing could leak messages to a much more powerful system. The scheme cannot be considered cryptographically safe unless the keys change faster than they can plausibly be guessed. Ultimately this rate will depend on the type of algorithm chosen and how easily the correct key may be guessed.

A key limitation of the CAN bus is the maximum data payload of 64 bits per frame. Generally, algorithms that use 64-bit keys are considered unsafe, because a CAN sniffing device can export encrypted messages to a more powerful computer that is able to crack them. A block cipher is a type of encryption algorithm that combines a fixed-size key with a fixed-size block of data payload to create the encrypted result. Certain block ciphers, such as AES with its 128-bit block, require a block of data larger than 64 bits. Since CAN bus frames are limited to 64 bits of data, any block cipher that requires blocks larger than 64 bits cannot be applied directly to a single frame. Similarly, the arbitration field is at most 29 bits, which does not line up with the power of 2 block sizes that block ciphers usually use (32 bits or 64 bits).

An alternative type of algorithm is the stream cipher, which generates a stream of pseudo random numbers (the cipher stream) and usually just applies an XOR (exclusive or) function between it and each data bit. This pseudo random stream can be used to encrypt a payload of any size. Because the stream is generated internally and never transmitted, the generator can use key sizes larger than 64 bits.
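The XOR principle can be sketched for a single CAN payload. The keystream generator here, SHA-256 over a 128-bit key and a frame counter, is only an illustrative stand-in and not a vetted stream cipher; the function names and the frame counter are assumptions.

```python
import hashlib

MAX_PAYLOAD_BYTES = 8  # 64-bit data payload of a classic CAN frame


def keystream(key, frame_counter, length):
    """Derive 'length' pseudo random bytes for one frame (illustrative only)."""
    block = hashlib.sha256(key + frame_counter.to_bytes(8, "big")).digest()
    return block[:length]


def xor_frame(key, frame_counter, payload):
    """Encrypt or decrypt a payload; XOR makes both operations identical."""
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload does not fit in one CAN frame")
    stream = keystream(key, frame_counter, len(payload))
    return bytes(p ^ s for p, s in zip(payload, stream))


key = bytes(range(16))                     # 128-bit key, larger than the payload
ciphertext = xor_frame(key, 1, b"DOOROPEN")
plaintext = xor_frame(key, 1, ciphertext)  # same call decrypts
```

The ciphertext has exactly the length of the payload, so no data space is lost. Sender and receiver must use the same frame counter, and a counter value must never be reused with the same key.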

Using stream ciphers requires the sender and the recipient to be synchronised. The CAN bus protocol already synchronises nodes using its synchronisation bits, so this issue could be addressed by the protocol specification itself. Stream ciphers also need a way to share the starting point in the key stream between two nodes. This key sharing issue is discussed in the next section.

#### Encryption Key Sharing, Generation and Management

The key sharing issue refers to how keys can be exchanged safely between two devices before any cryptographic communication has occurred. The issue exists because key information, or the synchronisation state of the nodes, must be transferred without revealing it to any node that can spy on the bus.

One solution to this problem is to use public key cryptography to exchange keys. This requires the nodes that wish to be authenticated to be able to store public key certificates, and to have the processing capability to support this infrastructure. Research states that public key cryptography is computationally intensive on 8-bit low-cost embedded systems and would likely need a cryptographic coprocessor. [A Cost-Efficient Implementation of Public-Key Cryptography on Embedded Systems, Conference Paper, July 2007, DOI: 10.1109/EDST.2007.4289808, Source: IEEE Xplore, Conference: Electron Devices and Semiconductor Technology, 2007. EDST 2007. Proceeding of 2007 International Workshop on Electron Devices and Semiconductor Technology].

Another issue related to keys is their generation. If keys are created in a way that can be guessed, the encryption could be compromised. This is a significant issue because it is very hard to create truly random numbers through software alone; software can usually only create pseudo-random numbers. The IoT devices must therefore have some method of creating sufficiently random numbers so that the pattern cannot be guessed. One solution is to use a hardware source of randomness.
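The difference between guessable and unpredictable keys can be illustrated on a desktop Python interpreter. This is a sketch only: the function names are hypothetical, and it assumes the platform's `secrets` module is backed by an operating system entropy source, which a low-cost embedded node may not have.

```python
import random
import secrets


def weak_key(seed):
    """Key from a seeded software PRNG: anyone who guesses the seed gets the key."""
    return random.Random(seed).getrandbits(128)


def stronger_key():
    """Key from the operating system's entropy source (assumed available)."""
    return secrets.randbits(128)


# An attacker who guesses the seed (for example a boot time or serial number)
# reproduces exactly the same key.
device_key = weak_key(1234)
attacker_key = weak_key(1234)
```

In this example `device_key` and `attacker_key` are identical, which is the failure mode described above.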

#### Replay attacks: encryption concerns and limitations

The nonce and sequence/message counter solutions to replay attacks impose a few important limitations. For both items, the larger the value, the more security and utility it provides: a larger nonce is harder to guess because there are more numbers to try, and a larger counter can run for longer before it has to be reset. Unfortunately, increasing the size of these values also increases transmission overhead, as they take up more of the limited data payload and reduce transmission efficiency. This trade-off needs to be analysed for the particular implementation in order to arrive at ideal values for that system architecture.

Another issue is generating sufficiently random numbers for the nonces. If the number pattern can be guessed, an attacker could use the predicted values, or could wait to replay a message until a recorded counter value becomes valid again. This randomness issue was discussed in the previous sections, and a hardware solution might be necessary to create truly random numbers.

The hardware in the IoT devices will also need sufficient processing power and storage space to store the counters and generate the nonces, which might limit the choice of these values.

#### Encryption key storage

If a valid device is compromised, removed from the CAN bus and analysed externally, then either the encryption keys must not be extractable (or extraction must take long enough that the keys will have changed), or the keys must not be saved in non-volatile memory. If the keys are stored in non-volatile memory, the device firmware could be modified, and a malicious user who knows where the keys are stored could reuse keys that have already been created or accepted.

A solution to this issue is to change keys faster than the device firmware could plausibly be modified or, more realistically, to store the keys in volatile memory. However, storing keys in volatile memory imposes further hardware limitations on these low-cost embedded devices.

### Master Node - Attack Vectors

For completeness, the main master node should be regarded as a computer and protected in a similar way with respect to its connection to the internet. For example, it should use a firewall, authenticate its communication, and encrypt communication with the outside world. Existing network security analysis can be applied, as this is a well-studied topic: the interface to the internet in a home IoT network is effectively the same as that of a computer connecting to the internet.

The main master node should also have sufficient protection against attacks from the inside network. If a person broke into the house, they should not be able to easily access the main master node’s management interface and run malicious code. This can be addressed by requiring a strong password for access to the management interface.

### Updating firmware to malicious ones - Authentication

This case is included because it was one of the motivations for this thesis. Nodes should implement some form of protection against accepting updates that do not come from the manufacturer. This can be achieved with public key cryptography: the manufacturer’s public key is stored in each IoT node, and the manufacturer’s private key is used to sign (encrypt the hash of) the firmware update, producing signed firmware. This should prevent malicious parties from updating the device. Unfortunately, it will not stop a manufacturer from going rogue, and it also imposes hardware and software requirements on the IoT devices.
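The signed firmware idea can be sketched with textbook RSA from the Python standard library (Python 3.8 or later). This is a toy under loud assumptions: the key is built from two small, publicly known Mersenne primes, there is no padding, and the hash is reduced modulo the key, so it must never be used for real signatures. All names are hypothetical.

```python
import hashlib

# Toy key material. These primes are public knowledge and far too small.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus, stored in every node
E = 65537                           # public exponent, stored in every node
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept by the manufacturer


def firmware_hash(firmware):
    """Hash the image and reduce it so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(firmware).digest(), "big") % N


def sign_firmware(firmware):
    """Manufacturer side: 'encrypt' the hash with the private key."""
    return pow(firmware_hash(firmware), D, N)


def node_accepts(firmware, signature):
    """Node side: recover the hash with the public key and compare."""
    return pow(signature, E, N) == firmware_hash(firmware)


signature = sign_firmware(b"firmware v2")
genuine_ok = node_accepts(b"firmware v2", signature)
tampered_ok = node_accepts(b"firmware v2 + backdoor", signature)
```

The node only needs the public values `N` and `E`, so extracting them from a device does not let an attacker sign new firmware. A real device would use a standard signature scheme with a full-size key, which is where the hardware cost mentioned above comes from.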

### Authentication

Authentication of devices is a solution that can ensure the safe exchange of encryption keys and prevent malicious devices from spoofing valid devices. Authentication is effectively validating whether a device is accepted or not.

One implementation of authentication could be based on a unique identifier for a device, such as a Media Access Control (MAC) address or serial number. This unique ID can be used to authenticate devices through a whitelist. The whitelisting process requires user intervention, which could be done through the main master node’s management interface, where the user confirms that the unique device ID matches the intended device. This should be further augmented with a password for accepting new devices from the main master node’s management interface.

#### Authentication of new devices

If a simple plug-n-play system is used to admit new devices, malicious devices could connect without any user notification. CANopen does not have any strategy to mitigate this issue. It assumes that safety comes from the fact that someone has to physically connect to the network, which makes authentication unnecessary.

There is an issue with the whitelisting approach, as malicious devices could spoof valid device IDs. This can occur if the device ID is broadcast in unencrypted form to the main node while the device is joining the network. The issue can be mitigated by encrypting or hiding the authentication procedure of the unique ID. [Note: the unique ID referenced here does not have to be the same value as the ID used in the identifier field/arbitration field of the CAN bus.] This could work by the main node automatically accepting new nodes and starting an encrypted conversation with each one in order to authenticate it. Once the device is authenticated, it is whitelisted. Since the authentication procedure occurs over encrypted communication, malicious nodes cannot capture a device's unique ID, such as a MAC ID, and use it to gain authentication for a malicious node.

A different form of authentication is pairing two devices, similar to pairing Bluetooth devices. A device is first accepted, and then both devices share a key so that authentication does not need to be repeated unless one of the devices loses the key or the key expires. This is also a valid alternative, and the pairing procedure should similarly be encrypted.

Authentication can also be achieved using public key cryptography, where each device has its own unique public and private key pair. However, this method is relatively expensive to implement and more resource intensive, hence it is not discussed in detail.

Integrity Integrity of messages is not an issue as CAN protocol implements a form of Cyclic Redundancy Check and a series of Acknowledgement Bits to ensure the message integrity is not compromised. This section is included for completeness sake.

ID Allocation issue Each device on the CAN bus requires a unique Device Identifier Field value. If two devices have the same ID, the message will conflict, and hence the message will be lost, and if these two devices always overrider each other, then both those nodes will eventually be in radio silence mode (effectively DoS attacking itself). Thus an important consideration for home IoT network is the allocation unique identifiers for devices. For CAN open there is a predefined list of all the major types of devices that could exist and the various states and messages that it can communicate. In the context of a home IoT network, many new devices could be added and creating a pre-defined list might hinder in obfuscating the device identifier field.

Furthermore, the CAN bus has a 29bit field space for identifiers, this theoretically enables device counts of up 2\^29 approximately equal to 500million devices. To reduce this even further the identifier field space is also the same value used for arbitration (‘priority of commands’) hence to implement some level of priority this field is reduced even further. Currently, the CANopen specification utilises a system known as object dictionary. Where the master can be pre-configured with device types and uses serial number matching to identify the device. This enables plug-n-play style of devices that are pre-configured, but does not allow new types of devices to be added.

Ultimately, the CANopen protocol limits a single CANopen network to 127 nodes. This is not enough for a mass-produced system. For example, if a million households around the world each used around 500 smart IoT devices (500 million devices in total), the available identifiers would already be exhausted.

A solution to this problem is to create a form of lookup table which assigns a unique identifier field value to each device and matches it to that device's MAC address. This process should occur during the authentication phase. This requires a modification of the CANopen protocol, but this slight modification will enable new devices to be added and will also remove the limit of 127 nodes, as a larger number of values can fit in the object dictionary.
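A minimal sketch of such a lookup table is shown below. It assumes the master node hands out identifiers sequentially when a device authenticates and remembers the mapping by MAC address; the class name, method names and the sequential allocation policy are all hypothetical.

```python
class IdAllocator:
    """Hypothetical master-side table mapping MAC addresses to CAN identifiers."""

    def __init__(self, first_id: int = 1, last_id: int = 2 ** 29 - 1):
        self._next_id = first_id
        self._last_id = last_id
        self._by_mac = {}

    def assign(self, mac: str) -> int:
        """Return the identifier for this MAC, allocating one if it is new."""
        mac = mac.lower()
        if mac in self._by_mac:
            return self._by_mac[mac]
        if self._next_id > self._last_id:
            raise RuntimeError("identifier space exhausted")
        self._by_mac[mac] = self._next_id
        self._next_id += 1
        return self._by_mac[mac]

    def lookup(self, can_id: int):
        """Return the MAC address that owns an identifier, or None."""
        for mac, assigned in self._by_mac.items():
            if assigned == can_id:
                return mac
        return None
```

A device that re-authenticates with the same MAC address receives the same identifier, so existing configuration on other nodes stays valid.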

Another solution is to use the arbitration field solely as a priority field and to send the device identifier in the data field. This is a major change to the CANopen protocol, but it remains an alternative. The resulting issue is a large overhead, as every message gives up data space equal to the maximum accepted size of the identifier. If a value of 29 bits is chosen, nearly half of the data frame (64 bits) is used just for identifying the device. Hence this recommendation only applies if an extension to the CAN bus protocol is required, in the case of an extremely large number of IoT devices existing in the home IoT network (more than can be represented by a 16-bit number, approximately 65,000 devices).
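The overhead of this second option can be sketched as follows. The layout is an assumption made for illustration: the device identifier occupies the top 29 bits of the 64-bit data field and the remaining 35 bits carry the payload. The function names are hypothetical.

```python
ID_BITS = 29
DATA_FIELD_BITS = 64
PAYLOAD_BITS = DATA_FIELD_BITS - ID_BITS  # 35 bits left for application data


def pack_data_field(device_id: int, payload: int) -> bytes:
    """Place the device identifier in the top bits of the 8-byte data field."""
    if not 0 <= device_id < 2 ** ID_BITS:
        raise ValueError("device_id does not fit in 29 bits")
    if not 0 <= payload < 2 ** PAYLOAD_BITS:
        raise ValueError("payload does not fit in the remaining 35 bits")
    return ((device_id << PAYLOAD_BITS) | payload).to_bytes(8, "big")


def unpack_data_field(field: bytes):
    """Recover (device_id, payload) from an 8-byte data field."""
    value = int.from_bytes(field, "big")
    return value >> PAYLOAD_BITS, value & (2 ** PAYLOAD_BITS - 1)


# Fraction of every data field spent on identification
overhead = ID_BITS / DATA_FIELD_BITS
```

With a 29-bit identifier the overhead is roughly 45% of each frame's data, which is why this option only makes sense for very large networks.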

## CANcrypt security analysis

CANcrypt is a system created by Esacademy (Germany) \cite{cancrypt} to secure CAN communication between nodes. It adds encryption and authentication methods that suit the real-time processing requirements and low hardware capabilities of CAN systems. The system was introduced in early 2016 and is mainly intended to work between the CAN controller (data link layer) and the CAN protocol layer (CANopen higher-level protocol). The CANcrypt system utilises a separate hardware configuration and management device to enable pairing of devices and the generation and exchange of keys. Because the system is relatively new, figures on uptake are not readily available for further analysis. It should be noted that the system was recognised in the newsletter of the organisation that maintains the CAN bus standard.

[Figure: Layer \cite{cancrypt}]

Authentication

Authentication is achieved through a pair of techniques: an external device that exchanges generated keys, and a hierarchical system of keys built into the device (where a key can only be overridden by a key at a higher level).

CANcrypt performs authentication through a system pairing process. This pairing process starts on the external CANcrypt configuration device. Applying this scenario to a home IoT network, the main master node could replace this functionality and perform configuration of the network. The CANcrypt system relies on the initial configuration of the system being performed in a safe and secure environment. This implies that no malicious nodes should exist while the network is initially set up. This is one limitation of the authentication scheme utilised by CANcrypt. \cite{cancrypt}

[Figure: Key Hierarchy \cite{cancrypt}]

The hierarchical system of keys is combined to create a permanent unique key for the device. CANcrypt supports key sizes of 128, 256 or 512 bits. It uses a combination of three variable keys (manufacturer key, system integrator key and owner key) and a 32-bit serial number to create a unique permanent key for the device. This hierarchy of keys could potentially backfire, as the higher-level keys can be used to override the lower ones. In particular, a system integrator that decides to go rogue can easily override the encryption, as the system's specification grants them that power. This issue can be detected, as changing keys or adding new devices requires a temporary shutdown of the network. Furthermore, if a new device is added (or one is exchanged), all keys need to be erased and newly generated for the pairing of devices. The generated keys are stored locally on each paired device. \cite{cancrypt}

Encryption Algorithm

The encryption algorithm used to secure general operational communication between CAN bus nodes is based on one-time pad encryption. A one-time pad is a key sequence that is used only once to encrypt the data. This works in a similar way to a stream cipher, except that the one-time pad could be considered a single block of the stream.

The CANcrypt system generates a 64-bit one-time pad based on a 64-bit dynamic key, an 8-bit message counter and the 128 to 512-bit permanent key stored on the paired devices. The one-time pad is then applied to the data values with a single-bit XOR to encrypt the data. XOR is a good choice for a low-cost embedded system, as single-bit XOR can be implemented very quickly in hardware. The dynamic key used in these communications is exchanged using the external configuration device during pairing of devices. In the context of home IoT systems, this will most likely be between the main master node and the respective IoT device.
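The XOR step described above can be sketched in a few lines. The pad derivation here is a stand-in: hashing the permanent key, dynamic key and counter with SHA-256 and keeping 8 bytes is an assumption made purely so the example runs, and is not CANcrypt's actual pad generation function. All names are hypothetical.

```python
import hashlib


def make_pad(permanent_key: bytes, dynamic_key: bytes, counter: int) -> bytes:
    """Stand-in pad derivation (NOT the CANcrypt function): 64 bits from a hash."""
    material = permanent_key + dynamic_key + bytes([counter & 0xFF])
    return hashlib.sha256(material).digest()[:8]


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """XOR each data byte with the pad; the same call encrypts and decrypts."""
    if len(data) > len(pad):
        raise ValueError("data is longer than the 64-bit pad")
    return bytes(d ^ p for d, p in zip(data, pad))


if __name__ == "__main__":
    permanent = bytes(16)            # placeholder 128-bit permanent key
    dynamic = bytes(range(8))        # placeholder 64-bit dynamic key
    pad = make_pad(permanent, dynamic, counter=1)
    ciphertext = xor_with_pad(b"\x01\x02\x03\x04", pad)
    print(xor_with_pad(ciphertext, pad) == b"\x01\x02\x03\x04")
```

Because the counter feeds into the pad, the same plaintext sent in two consecutive frames produces different ciphertexts.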

[Figure: Preamble \cite{cancrypt}]

The CANcrypt system does add overhead to basic CAN communication due to the preamble that it sends prior to any secure communication. The preamble is used to ensure messages are not stale (the system specifies that a message received more than 10 ms after the preamble should be rejected) and to synchronise message sequence counters. The additional overhead arises because the preamble occupies an entire CAN frame at the start of communication. After this first frame, there is no further overhead.
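The staleness and counter checks can be sketched as a small receiver state machine. This is a simplification under stated assumptions: timing is measured only against the most recent preamble, the counter is 8 bits wide and increments once per accepted message, and the class and method names are hypothetical rather than taken from CANcrypt.

```python
MAX_DELAY_S = 0.010  # messages later than 10 ms after the preamble are stale


class SecureReceiver:
    """Toy receiver that applies the preamble timing and counter checks."""

    def __init__(self):
        self.preamble_time = None
        self.expected_counter = None

    def on_preamble(self, now: float, counter: int) -> None:
        """Record when the preamble arrived and synchronise the counter."""
        self.preamble_time = now
        self.expected_counter = counter & 0xFF

    def accept(self, now: float, counter: int) -> bool:
        """Accept a message only if it is fresh and carries the expected counter."""
        if self.preamble_time is None:
            return False
        if now - self.preamble_time > MAX_DELAY_S:
            return False
        if counter != self.expected_counter:
            return False
        self.expected_counter = (self.expected_counter + 1) & 0xFF
        return True
```

A replayed frame fails either the timing check or the counter check, since the counter has already moved on.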

Scope of applications

The preamble together with the message counter provides a solution for replay-style attacks. Since the dynamic keys used to create the one-time pad change, even if a replay attack is attempted after a long time, when the message counter could have repeated, the encryption will no longer match. This is a very elegant solution for a low-cost situation.

Authentication of devices: if the devices have been paired correctly in a secure environment, the CANcrypt system ensures that their authenticity can be validated. Furthermore, tampering with device firmware will cause the device to lose its pairing. To reconnect to the network, the device will need to be re-paired, which ensures the integrity of IoT devices. Protection against spoofing is also achieved by authenticating devices, as only authenticated devices will have the keys required to be accepted by other nodes.

Protection against sniffing is achieved through the one-time pad system and single-bit XOR encryption over the data frame. The CANcrypt system also protects the identifier field through ambiguity of CAN-ID pairs. Each device pair has a pair of identifiers, and both devices randomly use one of the IDs during communication. This means that nodes that sniff these connections will not be able to distinguish between the two nodes, although they will be able to tell which two nodes are connected. The CAN-ID pairs effectively halve the total number of usable identifiers, but enable obfuscation of devices, which is more important for privacy. \cite{cancrypt}

Note: the IDs are chosen at random. If the two nodes choose the same ID by accident, they will resend the message. One node can force the other node to use a specific ID of the pair through a specific type of message, although using this reduces the benefits provided by the scheme.
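The random choice and resend behaviour described above can be modelled very simply. The sketch below is a simplified simulation, not CANcrypt's implementation: both nodes pick one of the two shared identifiers independently and retry whenever they pick the same one. The function names, the retry limit and the example identifiers are assumptions.

```python
import random


def choose_ids(id_pair, rng):
    """Each node independently picks one of the two shared identifiers."""
    return rng.choice(id_pair), rng.choice(id_pair)


def negotiate(id_pair, rng, max_rounds=100):
    """Repeat the random choice until the two nodes use different identifiers."""
    for attempt in range(1, max_rounds + 1):
        id_a, id_b = choose_ids(id_pair, rng)
        if id_a != id_b:
            return id_a, id_b, attempt
    raise RuntimeError("nodes kept choosing the same identifier")


if __name__ == "__main__":
    rng = random.Random()
    print(negotiate((0x100, 0x101), rng))
```

Each round has an even chance of a clash, so on average two rounds are needed, which is the price of the obfuscation.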

Limitations

Denial of Service is still as effective as before. Potential solutions to mitigate this are explained in the sections above and can be implemented to account for it.

Choice of sufficiently random numbers: if the numbers used to generate the pseudo-random one-time pad or the dynamic key are not random enough, a device could predict the pattern and use that information to potentially compromise the encryption scheme.

Another possible attack is a node that waits silently until the initial pairing phase is detected. Since pairing is assumed to occur only in a safe and secure environment, a malicious node could simply wait until it is accepted during pairing. This is a serious flaw that has not been considered, because CANcrypt was created as a solution for the automotive industry, where the components are originally installed in a safe and secure environment. This is the biggest limitation of using this system in a home automation IoT network setting.

Hardware Requirement

Memory

• About 2 KB of code space
• Non-volatile storage for key(s) (32 to 512 bytes per key)
• About 100 bytes of RAM

Processor cycles

• Some 100-150 cycles each for encryption/decryption/authentication
• Housekeeping: background task called about once per millisecond, a few hundred cycles \cite{cancrypt}
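To put the cycle counts in context, they can be converted to time for a given processor clock. The 16 MHz clock and the figure of 300 cycles for the housekeeping task are assumptions chosen for illustration (the source only says "a few hundred"); the function names are hypothetical.

```python
ASSUMED_CLOCK_HZ = 16_000_000  # assumed low-cost microcontroller clock


def cycles_to_microseconds(cycles: int, clock_hz: float) -> float:
    """Time taken by a number of processor cycles at a given clock rate."""
    return cycles / clock_hz * 1e6


def housekeeping_load(cycles_per_call: int, calls_per_second: int, clock_hz: float) -> float:
    """Fraction of processor time spent in the periodic background task."""
    return cycles_per_call * calls_per_second / clock_hz


# Upper figure of 150 cycles for one encryption/decryption/authentication
worst_case_us = cycles_to_microseconds(150, ASSUMED_CLOCK_HZ)

# Assumed 300-cycle housekeeping task called once per millisecond
background_fraction = housekeeping_load(300, 1000, ASSUMED_CLOCK_HZ)
```

Under these assumptions one operation takes under 10 microseconds and housekeeping uses about 2% of the processor.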

Conclusion

The CANcrypt system is promising and accounts for many of the basic issues present in using CAN bus as a communication network for IoT devices. It provides a system of encryption, authentication and key management. However, more work will need to be done on the safe addition of devices to the network. A home IoT network is a type of system where new devices are quite likely to be added regularly for maintenance and other purposes. A potential solution is to create a whitelist to which a user can manually add a device.
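A minimal sketch of the whitelist idea is given below, assuming the master node keeps a set of user-approved device serial numbers and refuses to pair with anything else. The class and method names are hypothetical, and how the user enters a serial number is left open.

```python
class PairingWhitelist:
    """Hypothetical user-managed list of devices that are allowed to pair."""

    def __init__(self):
        self._approved = set()

    def approve(self, serial: int) -> None:
        """Called when the user manually adds a device."""
        self._approved.add(serial)

    def revoke(self, serial: int) -> None:
        """Remove a device; it can no longer (re-)pair."""
        self._approved.discard(serial)

    def may_pair(self, serial: int) -> bool:
        """Checked by the master node before starting the pairing process."""
        return serial in self._approved
```

A node that waits silently for a pairing phase would then still be rejected unless the user had approved its serial number.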

## Barriers to implementation and deployment

List of Requirements

All the information from the previous sections can be summarised as a list of requirements necessary to create a CAN bus IoT network system. Home IoT networks will need the following capabilities:

Design choice

• If possible, distribute the majority of the complexity to the master node rather than placing it on the nodes themselves. This design choice is intended to reduce the cost of IoT devices and reduce their hardware requirements. It will also enable greater functionality in interaction between IoT devices, as the master node is the smart device that controls the main logic.
• The CAN bus master node should be capable of using different Higher Layer Protocols (HLPs), as different manufacturers may opt for different HLPs. This issue exists because there is no industry standard protocol.
• The master node must secure and isolate the CAN bus's connection to the internet.
• All IoT devices will need to meet the minimum requirements for CAN bus communication.
• IoT devices plus the main node must be cheaper than other alternatives for the system to truly succeed in the market.
• CAN protocol modifications to better suit home IoT networks (also solving the ID allocation issue).

Enable encryption of device communication on CAN bus

• Enable encryption of the CAN identifier, or effectively obfuscate the identity of the device transmitting messages.
• Enable encryption of the data frame.
• Encryption should be sufficiently quick to be processed within 10 ms.
• Encryption should be sufficiently simple to be implemented on low-cost embedded devices.
• Encryption should be able to work on small data frames of 64 bits.
• Encryption should aim to minimise overheads on transmission of encrypted data.
• Encryption should be able to account for replay-style attacks.
• Encryption algorithms should be based on truly random numbers for generation of keys.
• Encryption keys should change with sufficient frequency, with respect to the ease of compromising the key and the level of security required.
• Hardware must be capable of storing encryption keys. Additionally, encryption keys should not be stored in a location on the hardware where they can be compromised.
• Exchange of encryption keys should be possible through a secure method.

Enable authentication of devices communication on CAN bus

• Enable authentication of devices through some mechanism, for example pairing.
• Authentication of new devices will need to be considered.
• The special case of authentication of new devices in a malicious environment will also need to be considered.
• Authentication should also be able to authenticate firmware updates.
• The hardware and software must be capable of performing the required algorithms or processes to authenticate within the capabilities of a low-cost embedded device.
• The hardware should be able to store unique device numbers such as a serial number or a MAC (Media Access Control) address.
• Authentication should support the secure exchange of encryption keys.

## Conclusions

The system architecture of using a main master node, with CAN bus as the main backbone for wired IoT communication, is one that provides a good balance between cost and features. This architecture, when augmented by other systems, increases versatility and flexibility. CAN bus by itself has many fundamental flaws due to its lack of security. The various solutions proposed in this thesis will help close this gap and enable CAN bus to be fully utilised. Furthermore, CANcrypt, an automotive crypto solution, addresses many of the issues raised and finds valid solutions to them. Future versions of the system architecture should use a variation of CANcrypt together with a small variation of CANopen to create a secure communication environment. The technical security review shows that it is possible to secure a CAN bus for a home IoT network, and the results of this analysis can be used to create the framework of this new system.

Future Work

Potential future work includes the creation of a full CAN bus system example. This project could also be extended by performing a security analysis of all the different types of potential systems interacting together. Threshold studies could also be carried out to determine the market-acceptable cost of IoT devices.

## Plagiarism Declaration

I declare I have not colluded or plagiarised and where other references have been used have been cited appropriately.

## Project Management

Risk Management

* \textbf{Ethical release of information:}
There is a potential to release sensitive information that could be used maliciously. Thus the ethical and safe release of data must be considered. This involves notifying relevant organisations well in advance if a potential security issue arises.
* \textbf{Missed deadlines:}
There is a potential risk of not meeting deadlines. This risk is mitigated by setting internal deadlines and setting a clear goal.
* \textbf{Loss of files:}
There is a potential risk of losing report files, source code and research. This risk is mitigated by using an automated backup solution, Dropbox. This also enables versioning of software.


Project Specific Skills \& Training

Cyber Security training will provide fundamental and important training in security aspects that could be applied to this project. The Cyber Security training will be conducted in Estonia during the Semester 1 holidays. This professional development and skills training has already been approved by the relevant discipline and the global learning office.

No other professional development or skills training is currently allotted for this project.

Technical Challenges

Adaptation of software might not be possible if software is not open source or if it is really expensive. The creation of an entire modelling suite is not within the scope of this project.


## Estonia Trip Report

This is a post-trip report for the Estonian Cyber Security Summer School 2016 course, a collaboration with Tallinn University of Technology. This report is split into three sections: Trip Summary (a summary of events, special locations visited and general experiences), C3S Summary (a summary of material learnt) and Tips \& Improvements (a selection of tips for future applicants and suggestions for improving future versions of this course).

Trip Summary

Places of Interest

Old Tallinn (Viru Gates): Scenic views of an ancient castle and the gate that leads to the kingdom of Old Town Tallinn. The gates serve as the gateway to the main town centre. Old Town Tallinn is a maze of stone roads and contains a plethora of restaurants and souvenir shops. This is a short trip by trolley bus from the hostel.

Churches: There are many churches whose interiors you can visit, and some of them you can also climb.

Town Centre: Try the Ox soup, view the Town Hall and also look for the Zero Point of Tallinn and see if you can spot 5 spires of different buildings.

KGB Museum: Located in Hotel Viru, it provides insight into how life under a communist controlled country used to be. It provides a good historical overview and exposes you to Estonia’s culture.

Tallinn TV Tower: Experience exhilarating views from one of the highest points in Tallinn. This was personally my favourite experience. The fresh air and view is incredible …

Farmers Market: A moderate walk from the hostel, gives you access to the freshest berries and fruits that you have seen but be aware that these berries go bad very quickly, so eat them quick.

Forests: Take a stroll through one of the many forests located literally next door. Great for a morning walk and also for spotting some wild strawberries.

Mektory: A place completely different from the Adelaide Uni Hub. The best way to explain it is as a library that, instead of books, contains rooms with different resources for different industries that can be used by start-ups. For example, they have rooms with filming and audio equipment, 3D printing equipment, machining and metal rooms, virtual reality rooms and much more. It is a haven that promotes entrepreneurship and supports tech start-ups in Estonia.

Food to try

Manna La Roosa (4/5): Great decoration, good food for a reasonable price, go for the atmosphere.

Old Tallinn Town Centre (3.5/5): Ox soup and pastries from the town centre in Old Tallinn. Go spear fishing for some pickles if you get the chance. Self-service, mediocre for cleanliness \& toilet, good serving size, cheap.

Umami Resto (5/5): Personally my favourite place I’ve eaten in my life so far. Great outdoor atmosphere, great home brewed beer (definitely try the mango or dark beer). Probably the best entrée, main, drinks and dessert I’ve had in Estonia. Everything was cooked to perfection.

Events and Meetings

• Mektory tour and entrepreneurship presentation.
• e-Estonia Showroom presentation about e-Government and Estonia’s influence on
• Presidential meeting with the Estonian technological minister.
• Skype tour and presentation.
• ICR 2016 presentations.
• Cyber Security Summer School tour of Tallinn (guided).
• Cyber Security Summer School, week long, with various talks, practical experience and moot court.
• NATO visit and presentation.
• First lady of Estonia meeting.

C3S Summary \& Learning Summary

Mektory: The entrepreneurship speech really provided insight into the financial and economic side of starting up tech companies. It also highlighted both the risks and benefits of investing in tech companies.

Skype: The most famous tech startup from Estonia, it is a role model for other start-ups. Their somewhat unique work culture and work buildings that are designed to promote team communication and personal health demonstrate a different type of work culture than portrayed by regular engineering companies.

ICR 2016: This provides an opportunity to have your thesis peer reviewed by an international cohort and to present it to an international audience. This event gave me a lot of insight into my own capabilities and also forced me to think about the presentation of my thesis and not only its content.

This conference also provided opportunities to network with many international people and also to listen to other people’s presentations.

e-Estonia – eGovernment capabilities: This presentation provided a deeper understanding of the X-Road system used by the Estonian government. They also gave a real-life demonstration of e-Voting and digital signatures using two-factor authentication. It explained the motivation for the X-Road system and provided insight into its design choices.

NATO: The NATO visit provided insight into what NATO does and how it promotes increased cyber defence capabilities of countries.

Cyber Security Summer School (C3S): A week long summer school course and one of the main purposes of this trip. This provided an introduction into digital forensic analysis, the tools and procedures used by industry leaders, criminal cyber security law and the necessity for having a good link between tech people and lawyers in a cyber security related case.

This course formally introduced the concepts of triaging through data, preservation of data and data modification. The course had a very thoroughly designed simulated scenario involving a murder case. This scenario gave us a chance to use forensic tools to provide evidence to support a legal case. It also provided an opportunity for tech people to liaise with lawyers and come to a mutual understanding of what is required and how the evidence needs to be presented. There were also numerous presentations and networking sessions about different technology-related and cyber-law-related topics.

Tips \& Improvements Summary

Tips

Accommodation: You can borrow equipment such as an ironing board and iron or cleaning equipment from the reception. This means you don’t have to lug around an iron to remove wrinkles from your clothes.

Daylight persists for a long time in Estonia, during summer. Hence, for optimal sleep, ensure that you take a sleep mask.

Don’t forget a Travel Adapter for EU plus a power board for all your devices.

Food: Food is really cheap to buy from supermarkets. Your main source of food is from your Local Konsum or if you wish to walk even less there is a 9-11 mini shop across the road from the hostel.

If you want the closest hypermarket, just take a trolley bus and stop just before the really big sign saying Majistral. There you will find most of the things you need, plus a few fast food places if you don’t want to spend time cooking. As a last option, you can go to one of the good restaurants in Old Tallinn. I recommend trying the Ox ribs/Ox soup place.

Travel: Travel light and drink moderate amounts of water. Travelling in Estonia is very easy by public transport, but after 12am you will need a taxi. Using the Taxify app is probably the easiest way to get one. A special tip: prepare one set of clothes for wet weather, e.g. bring an umbrella and shoes that you don’t mind getting wet.

Must see places: TV tower view from the top – Hands down one of the best experiences with the caveat being you are not afraid of heights.

More views are available from roof top bars.

Just wandering around Old Tallinn is quite enjoyable, you can see many historical buildings, churches and roads. Also there are supposed to be underground catacombs.

A ferry trip to either Helsinki or Stockholm is not too long, so it is quite possible to plan a day trip or an overnight trip.

Clubs \& Pubs: Club Hollywood, Club Mint, Must Puudel

Money: I suggest a 50-50 split of Cash Money and Credit Cards/Travel Cards. Try and convert money before travelling as usually you will get a better rate. The densest area for currency exchange in Adelaide is near Rundle Mall \& King William St Intersection. Just walk around there to find the best rates. Most places in Estonia accept credit cards but sometimes it is just good to have some cash. Especially in small market places, although even they accept cards nowadays.

Learning: A Laptop is almost a necessity for this trip, make sure you have one that works semi decently (don’t need a high end one). I also suggest keeping a journal or jotting down questions or interesting points raised during the study tour. It is also a good way for you to reflect on your overall experience.

Improvements

A significant portion of your expenses on bare necessities such as basic food and basic travel can be reimbursed. The current system requires you to use your own funds and be reimbursed at a later date. This system could be improved by first creating a list of acceptable items that are refundable. Secondly, a pre-paid credit card or some equivalent card system could be used, with one card shared between four people or per room. This would let people spend that money easily on essentials without having to deal with the hassle of refunds, although a few procedures would need to be implemented, such as keeping all receipts and clearly communicating which expenditures are acceptable.

## References

\bibitem{cisco2011} CISCO, Dave, A. (2011), \emph{The Internet of Things How the Next Evolution of the Internet Is Changing Everything}, \url{http://www.cisco.com/c/dam/en\_us/about/ac79/docs/innov/IoT\_IBSG\_0411FINAL.pdf}, last accessed: 08 MAR 2016.

\bibitem{ITU2016} ITU, \emph{Internet of Things Global Standards Initiative}, \url{http://www.itu.int/en/ITU-T/gsi/iot/Pages/default.aspx}; last accessed: 08 MAR 2016.

\bibitem{cancia2016} CAN-CIA, \emph{History of CAN technology}, \url{http://www.can-cia.org/can-knowledge/can/can-history/}; last accessed: 08 MAR 2016.

\bibitem{icc2016} iCC, \emph{The International CAN Conference}, \url{http://www.can-cia.org/services/conferences/icc/}; last accessed: 08 MAR 2016.

\bibitem{cancrypt} CANcrypt, \emph{CANcrypt}, \url{http://www.cancrypt.eu/}; last accessed: 08 OCT 2016.

\bibitem{galloway2015} Galloway~A, \emph{Easy IoT with VSCP}, \url{https://community.freescale.com/projects/easy-iot-with-vscp/blog}; last accessed: 08 MAR 2016.

\bibitem{kurose2015} Kurose, Ross, \emph{Computer Networks \& Applications}, 6th Edition; section 6, pg 112.

\bibitem{various2012pressure} Weiyuan Sun, Xing Chen, Fangyi Liu, \& Liang Zhao. (2012). \emph{Collection System for Pressure Information of Buildings Based on the IoT and CAN Bus}. Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on, p274-p277.

\bibitem{PHDproforma} University~of~Adelaide. RV\_1. \emph{Report Template \LaTeX}, 18 MAR 2016.

\bibitem{table} Texas Instruments, \url{http://www.ti.com/lit/an/slyt560/slyt560.pdf}; last accessed: 08 OCT 2016.

\bibitem{1} CAN bus specifications - ISO 11898-6:2013

\bibitem{2} \url{http://www.computer-solutions.co.uk/info/Embedded_tutorials/can_tutorial.htm}

\bibitem{3} \url{http://rickardnobel.se/actual-throughput-on-gigabit-ethernet/}

\bibitem{4} IEEE 802.3-2008

\bibitem{5} LIN Specification Package Rev. 2.2a

\bibitem{6} \url{https://learn.sparkfun.com/tutorials/bluetooth-basics}

\bibitem{7} Babak Falsafi, TN Vijaykumar. \emph{Power-Aware Computer Systems: Third International Workshop}, PACS 2003, San Diego, CA, USA, December 2003 Revised Papers, Page 90.

\bibitem{kvaser} KVASER, \emph{CAN protocol tutorial}, \url{https://www.kvaser.com/can-protocol-tutorial/#/tab-1398107038616-8-4}; last accessed: 09 JUL 2016.

\bibitem{kvaser2} KVASER, \emph{Safety of Distributed Machine Control Systems, Validation Methods report}, \url{http://www.kvaser.com/wp-content/uploads/2014/08/sodmcs.pdf}; last accessed: 09 JUL 2016.

\bibitem{odva} KVASER, \emph{ODVA report}, \url{https://www.odva.org/Portals/0/Library/Publications_Numbered/PUB00278R2_Optimization-of-Industrial-Cybersecurity.pdf}; last accessed: 09 JUL 2016.

\bibitem{book} Claudio Ardagna, Jianyin Zhou, \emph{“Information Security Theory and Practice”}, pg 177


## Abstract

The aim of this project is to find methods of determining whether a device on a network is the one that is expected and hasn’t been tampered with. The project will primarily be looking at simple devices such as Ethernet connected Programmable Logic Controllers. The use of digital fingerprints will be explored to build up a known characteristic profile of each device’s Ethernet traffic. This will include looking at characteristics such as time to live, round trip times and Internet Protocol Identification numbers of the received packets. Once reliable methods have been established, a process will be developed that can create the fingerprint for each device and monitor it for malicious activity. In a real-life application, processes can be built into a software package that would run on a central computer and monitor devices on its local network, alerting an administrator if an anomaly is detected.

## 1 Introduction

The Internet of Things (IoT) involves connecting sensing and output devices to the Internet via a gateway. IoT devices are a relatively new concept and the security and authentication of these devices is rapidly developing. These devices usually aren’t in secure places and can be compromised. Hackers could trick the system into thinking these ‘things’ are still active, or feed it false data. Spoofing is when a device imitates the characteristics of another device, which can be used to gain control or change how a system operates. The easiest way to do this is internet protocol (IP) address spoofing, where the false device takes on the IP address of the original device. There is therefore a need for a means of device identification which can’t be easily replicated or falsified.

Current security methods involve using security certificates and challenge-response methods that are used in standard computer networks. In industrial networks, security is usually an afterthought. Allowing access to critical equipment from the internet opens up a major vulnerability in these systems. The same applies for personal Internet of Things (IoT) devices, although the consequences of a hack may not be as severe.

This project aims to find a way to identify these devices by creating a digital fingerprint that is unique to each one. This would allow devices already deployed to be monitored, whereas most research is directed to new devices and assumes they can be updated. Cost is an important factor when building IoT devices. Reducing the processing power each device needs to identify itself results in it being built more cost effectively and consuming less power. By analysing patterns in the data transmitted over Ethernet channels, models can be built to define each device. There will be statistical models or models to simply observe how close a reading is to the device’s ‘average’ which will be used as its fingerprint. These fingerprints will then be used to monitor live devices and continually check whether they are the same device, or if they have been tampered with.
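As a minimal sketch of the "how close is a reading to the device's average" idea, the example below builds a fingerprint from baseline samples of one traffic characteristic (round trip time is used here) and flags readings that deviate too far. The z-score test, the threshold of 3 standard deviations and the function names are assumptions made for illustration, not the project's chosen model.

```python
from statistics import mean, stdev


def build_fingerprint(samples):
    """Summarise baseline readings (e.g. round trip times in ms) for one device."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_anomalous(fingerprint, reading, threshold=3.0):
    """Flag a reading that lies more than `threshold` deviations from the mean."""
    if fingerprint["stdev"] == 0:
        return reading != fingerprint["mean"]
    z_score = abs(reading - fingerprint["mean"]) / fingerprint["stdev"]
    return z_score > threshold


if __name__ == "__main__":
    baseline_rtt_ms = [1.0, 1.1, 0.9, 1.05, 0.95]
    fp = build_fingerprint(baseline_rtt_ms)
    print(is_anomalous(fp, 1.02), is_anomalous(fp, 5.0))
```

The same two functions could be applied separately to each characteristic (time to live, IP identification increments and so on), with an alert raised when any of them is flagged.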

The outcome will be a process that could be implemented into IoT software packages or run in parallel, monitoring devices in real time. Devices connected in industry now link to the outside world, usually through a computer (Industrial Internet of Things). Usually these devices are simple sensor devices that are connected via Ethernet. Home PCs have much more variable traffic and it becomes more difficult to create an accurate fingerprint for them based on network characteristics. The process will be developed by first creating a basic reference network with two devices and a router. The device’s Ethernet traffic will be monitored to create a fingerprint based on its traffic characteristics. Test cases will then be developed and executed to test for different attacks. The data captured during each attack will be analysed to see if the system can detect the anomaly. The project will explore a range of methods to identify devices that don’t rely on certificate/key based systems. The concepts found may also apply to other digital transfer mediums such as wireless, which is increasingly being used in IoT applications. Once a device is identified, it will be monitored to determine if it has been tampered with. Where tampering is detected, a system administrator will be alerted to conduct further testing to determine the cause of the alert. This method would be effective on small-scale systems, but not as effective on a large-scale system, as monitoring alarms would add a large administrative burden.

## 2 Background

2.1 Technical Background

The most common form of IoT security is public-key infrastructure (PKI), which is a system that uses certificates and allows the data traffic to be encrypted. The concept works by sharing a secret key between the two parties that want to communicate. This key is bound to a public key and a third party who can validate the connection. The issue with this method is that the key may not be stored securely on the devices. If one of the devices is accessed while the system is offline, the key can be compromised. This leads to a hacker being able to ‘impersonate’ the original device by using its key. Once keys are compromised, new keys must be issued for the devices, which is another cost to businesses and a nuisance for consumers [1]. Other systems involve using a password to authenticate, but this again has many issues. Passwords can be extracted from the device, or they can be stolen by listening to the Ethernet communication channel. This can be done by software on a PC or by connecting a device in the middle of the communication channel to monitor it (man-in-the-middle attack). Passwords can also be guessed by brute force, going through all combinations, unless other measures are in place. There are many other alternatives such as using a code book, longer codes and time-based access codes; however, these can still be compromised [2].

Automation is another industry that is moving to IIoT (Industrial IoT), where security is not given as much consideration. In the past, most of these systems were closed and had no access to the outside world. Making them Internet connected brings many benefits; however, it also opens up a large security risk. Communication channels can be monitored by outsiders and devices can be remotely accessed or modified. Many of these devices use old technology with small computing resources and limited bandwidth. It is common for industry to use Ethernet as the communication channel, while consumer IoT devices are moving towards wireless. The concepts found in this project may also be extended to wireless communications; however, authentication over Ethernet will be the major topic of investigation [3].

Machine-to-machine (M2M) communication is another local form of communication that IoT devices will engage in. In this situation, a third party cannot be used to verify the transaction, making authentication harder. One way of authenticating these devices is using a challenge-response system. This works by one party asking a ‘question’ to the other party and the response is then verified against the expected reply. The method can also be compromised by monitoring these initial handshakes. Many of these authentication protocols add overhead to the data being transmitted, decreasing the network’s efficiency [4].

One example of how the proposed fingerprinting techniques have already been used is called “Passive OS Fingerprinting,” a form of passive network traffic monitoring. This system works by monitoring TCP packets for the Time to Live (TTL) and TCP window size values. It then compares these to known values for different operating systems (fingerprints) to identify which operating system the packets came from. This is an example of fingerprinting a device, however, when spoofing a system using a device with the same operating system, it will not be useful [5] [6]. Methods have also been found to identify spoofed IP packets using active and passive methods. An active method would involve sending packets across the network and analysing responses. Passive methods work by observing existing network traffic. Using the TTL field in the IP packet, it can be determined if the Ethernet route has changed. More testing on this can be done in a local network, as most examples are over an internet connection with many more routers in between. This means that changes in routes may occur at a higher frequency compared to a small local area network which would be static in the case with only one router to the outside world [7].

Looking at the IP Identification Number is proposed to provide another way to authenticate devices. It is associated with the device’s IP and changes as packets are broken into smaller fragments. The information is then used to link the fragments and recreate the original packets. Checking the window size in the TCP header is another method, but it is not as useful when a device is swapped with an identical device running the same operating system with similar software [8].

2.2 Project Aims

The aim of this project is to find methods of determining whether a device on a network is the one that is expected and hasn’t been tampered with. One possible attack is where a device in a network has malicious code loaded onto it, changing how it functions. The second is via a remote attacker gaining access and polling the device periodically to gain information from sensors. This could expose a system or even allow a remote attacker to control outputs of a system. The third type of attack to be tested is moving a sensor device to a different location in the network, resulting in the device providing incorrect information. Another attack would be a man-in-the-middle attack by inserting another switch which could listen in or modify data flowing through it. Methods to build up a digital fingerprint of the device’s Ethernet traffic characteristics in a fingerprint creation phase will be explored. Once the fingerprint has been created, a network’s traffic will then be monitored and analysed for any inconsistencies. The outcome will be a process in which a fingerprint can be created and used to monitor Ethernet traffic from a particular device. The system will have applications in the home environment, with IoT devices, or industrial setups with Ethernet controllers and sensors.

2.3 Literature Review

Li and Trappe provide some methods of detecting spoofing from network traffic similar to what will be explored in this project [9] [10]. Their work also investigates alternative methods to cryptographic keys for authentication, although it is directed towards wireless networks. This is done by using “forge” resistant relationships, such as sequence numbers and traffic statistics. The paper states they are forge resistant; however, this will be further researched in the current project. In a normal scenario, with one device transmitting, the sequence numbers would show a monotonic pattern. If another device was added to the network to spoof the IP of the initial device, the sequence numbers would show a rapidly fluctuating pattern, as the two devices are unlikely to be synchronised. In the case of custom firmware being used to modify the sequence numbers to produce a monotonic pattern, duplicate sequence numbers could still be detected. Gaps between the sequence numbers were also analysed, as a varying gap size is another method of detecting a spoofed device. A similar process will be used and tested on the IP identification numbers further in this report. Packet loss is another metric used to determine if a device has been spoofed. Due to wireless transmission characteristics, devices at different locations will have different packet loss probabilities associated with them. This may not be very useful for the current project as LAN connections have much smaller packet loss probabilities, which are harder to detect. The next method that is explored is interarrival times, which is the difference in time between packets that are received from a source. The sources are sending packets out at a constant bitrate and the difference in bitrates can be observed and analysed. From this, an extra or modified source device can be detected. This would be similar to the transmission time method explored in this project, where the round trip time (RTT) to each device is checked.
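The sequence-number checks described above can be sketched in a few lines. This is a minimal Python illustration rather than the method from the cited papers: the function name `sequence_anomalies` and the `max_gap` threshold are hypothetical, and counter wraparound is ignored for simplicity.

```python
def sequence_anomalies(seq, max_gap=3):
    """Count simple anomalies in sequence numbers claimed by one source.

    max_gap is a hypothetical tuning value, not one taken from the cited work.
    Wraparound of the counter is deliberately not handled in this sketch.
    """
    seen = set()
    duplicates = backwards = large_gaps = 0
    previous = None
    for number in seq:
        if number in seen:
            duplicates += 1          # same number seen twice: possible second sender
        seen.add(number)
        if previous is not None:
            step = number - previous
            if step < 0:
                backwards += 1       # sequence went backwards: not monotonic
            elif step > max_gap:
                large_gaps += 1      # unusually large jump forwards
        previous = number
    return {"duplicates": duplicates, "backwards": backwards, "large_gaps": large_gaps}
```

A single well-behaved source should give zero counts, whereas two unsynchronised senders sharing one address would be expected to produce the fluctuating pattern described above, which shows up here as backwards steps and large gaps.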

Another way of defending against spoofed IP traffic is examined using hop count filtering in [11]. A technique is devised to create an IP-to-hop-count mapping table. It can be used to check whether a device with a certain IP has a consistent hop count. A similar table would be devised in this project with a hop count field along with others. Factors such as stability of hop-counts, and its effectiveness are explored and could be built upon in this project. It also implements a learning state and a filtering state which would be similar to the fingerprint creation state and monitoring state of the final system in this project. Methods of how an attacker could fool the system are explained such as finding out the hop-count of a client to server and modifying their hop count so it will match once it reaches the server. The paper is focussed on Internet servers whereas this project is directed to LANs which may have different characteristics.
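An IP-to-hop-count table of the kind described above could look like the following Python sketch. It assumes the sender started from one of a few common initial TTL values (32, 64, 128 or 255); that list, the class name `HopCountTable` and its methods are illustrative assumptions, not details taken from [11].

```python
# Assumption: typical initial TTL values set by common operating systems.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def infer_hop_count(observed_ttl):
    """Estimate hops travelled by assuming the smallest common initial TTL
    that is not below the observed value."""
    if not 0 <= observed_ttl <= 255:
        raise ValueError("TTL is an 8-bit field")
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


class HopCountTable:
    """Hypothetical IP-to-hop-count mapping with a learning and a checking step."""

    def __init__(self):
        self._table = {}

    def learn(self, ip, observed_ttl):
        # Learning state: record the hop count for a trusted packet.
        self._table[ip] = infer_hop_count(observed_ttl)

    def is_consistent(self, ip, observed_ttl):
        # Filtering state: compare against the stored hop count.
        expected = self._table.get(ip)
        return expected is not None and infer_hop_count(observed_ttl) == expected
```

For example, a packet arriving with a TTL of 126 would be recorded as two hops away; a later packet from the same address arriving with a TTL of 125 would be reported as inconsistent.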

Source [22] looks more specifically into hop-count filtering to detect spoofed traffic. The main purpose of this is to prevent Distributed Denial of Service (DDoS) attacks. An interesting situation arises when one device changes operating system. There is the possibility that the initial TTL, set by the operating system, is different and may raise a false alarm. The possibility of this occurring in this project is eliminated by only monitoring simple Ethernet devices which are usually only capable of running a single operating system, unlike general computers. It is determined that for the purpose of defending against DDoS attacks, the hop count filtering method is effective and hop count spoofing would be hard for an attacker. This is because outside attackers can’t easily determine the end TTL value at the server. In a smaller LAN, this may be easier to determine and the effects of TTL spoofing will be explored. Two running states are explained: the alert state and the action state. The alert state is when the system is monitoring for spoofed packets and the action state is where spoofed packets are detected and discarded. This project will create similar states; however, instead of discarding packets, the system would be required to create an alert to notify of the attack. The TTL value for each client is determined by the value observed the first time it connects to the server. This process would be similar to the fingerprint creation phase of the proposed system. Both systems assume the user is legitimate in this phase, otherwise incorrect TTL values may be recorded.

An investigation on how RTT can be used to improve hop count filtering (which is a method of determining where packets originated) can be found in [12]. Attackers are able to spoof the hop count number, so it alone is not reliable enough to identify a device. The investigation was able to verify that RTT could be used in conjunction with hop counts to further narrow down where packets came from. The study focussed more on large inter-country networks whereas this project will be directed at smaller LANs. It was stated that “further work is required to derive and test an algorithm that utilizes both RTT and HC to detect IP spoofing”. The aim of this project is to conduct some work in this area and test the viability of this method.

In [13] a method to check TTL values at each router, instead of at the end hosts is investigated. Although this is a viable method and has been proven to be more effective than using standard hop-count filtering, it requires modified router software and may not be practical for a small LAN setup. This would also reduce the routing speed, which may be critical in some router applications. The use of hop-count and an identification number (PID) to detect spoofed packets and prevent DDoS attacks is shown in [7]. The PID contains information about the router path and the hop count in an encrypted form. The PID is stored in the header in the place of the normal IPID and to decrypt it, each party needs a shared secret key. The use of a key and modified headers makes this method impractical for this project, due to the inability to modify the target devices to achieve this. It is also stated that this method also works for IPv6 protocol, which will be useful in future applications.

Source [14] further extends the hop count detection methods by checking IPID fields to detect spoofed packets. It has more of a focus on the Darknet, a smaller harder to access subset of the public Internet. Some graphical ways of showing the two fields of information were shown and patterns could be seen, allowing anomalies to be easily detected. The source acknowledges that the two fields can be forged (changed by the sender), however, the network may not operate as expected, defeating the purpose of forging the data. This project aims to go further by combining these two data fields with transmission time to see if forging can be detected.

Fingerprinting a remote, physical, Internet-connected device using clock skew is also possible [15]. Clock skew is dependent on how a device’s components are constructed and is unique to each device. Similar to the techniques being explored in this project, clock skew makes use of information already made known by the device and requires no modification to its software. Although it may not detect changes in software, this technique has been shown to accurately determine whether a device is the same physical device.

## 3 Finding characteristics and creating fingerprints

The first steps in this project involve capturing and analysing network traffic. Some possible methods to build up characteristics for devices have been found to be useful in the literature review [9] [14] [12]. These methods will be explored to see if they can be used to characterise each device and whether the characteristics are unique to each device.

3.1 Background Theory

This section covers the software tools that will be used during the project and how they will help to create the end result. Packet information theory will be explained to give some understanding of the source of characteristics.

3.1.1 Simulation Method

A range of device arrangements will be utilised to conduct tests for the project. The least complex setup will use two computers and a router. One computer (server) will monitor the traffic to the other computer (client) that is connected via the router. The captured Ethernet traffic will then be analysed to look for patterns that are unique to that particular client. To test the case where a device is moved in the network, two routers will be connected and the client computer’s connection will be moved from the original router to the new one. This would simulate someone, possibly maliciously, moving the device around an industrial network, which could lead it to provide false information (e.g. a temperature sensor). PLCs will replace the computers to conduct final tests on transmission time. It is expected that the transmission time for computers will vary due to the rapidly changing requests a user initiates, making the monitoring system unreliable. The PLCs will be swapped and have their connection points changed, and the logic they execute will be modified, to see whether each change raises an alarm in the monitoring software.

3.1.2 Wireshark

Wireshark is a packet capturing tool that was used to save the Ethernet traffic to a file which could later be analysed. It is a free and open source tool, with a graphical interface, making it a suitable option for this project. It also gives detail into what is contained within the packets, providing an initial way to look for patterns and characteristics within subsequent packets. As it captures all traffic that the computer receives, filters had to be implemented to only view packets that are relevant to the testing, otherwise the amount of data displayed can be overwhelming (observed in initial tests).

3.1.3 Matlab

Matlab is a computing environment with a graphical interface and a programming language that can be used to perform calculations. It works well with large matrix manipulations and allows data to be plotted. The data from Wireshark can be fed into Matlab to generate matrices that hold the contents of the captured packets. Data can be extracted from the matrices and plotted to look for common characteristics. Algorithms can also be written and tested to check captured data to see if an attack has occurred.

3.1.4 Internet Protocol Packet Information

Each Ethernet packet transmitted over the local network contains information that can be exploited to provide characteristics about the sending device which can be used to create a fingerprint. Within each Ethernet packet is an IP packet which contains information that will be analysed in this project. There are cases where there is no IP packet inside the Ethernet packet, although this is rare in the traffic generated from the devices that will be tested. Figure 1 shows the breakdown of an IP packet and its contents. From Figure 1, it can be seen that the TTL value is within the IP packet. The TTL value is created by the sending device and is a counter that is decreased by one each time the packet crosses a network router. The IP identification number is also contained within the IP packet and its function will be explained in section 3.4.1 [16].

Figure 1: IP packet contents with bit offsets shown at the top [17]

3.2 Requirements

The aim of this project leads to the following requirements for a useful device identification and monitoring system:

- Detect when a device has been moved to a different part of the network
- Detect when the programming of a device has been modified
- Must not add excessive load to the device or significantly add to network traffic
- Detect when multiple computers are accessing a device
- Provide a simple method to create the fingerprint for the device
- Be able to operate under changing network conditions such as high loads on routers

3.3 Method 1: Time to Live Fingerprinting

TTL is a value assigned to each packet specifying the maximum number of routers a packet can traverse before being discarded. By checking the TTL number of the packets transmitted by a device, some insight into the path that its packets take can be gained. A change in this number usually suggests the device has changed position on the network, which could be due to malicious activity. Another reason for a change is that the packet is forced to take a different route when a connection becomes congested or a device is busy. The effect of this would also have to be explored to see how it limits the TTL fingerprinting approach.

3.3.1 Time to Live Characteristics

Every module that processes the packet, such as a router, must decrease the value by one, even if it processes the packet in less than a second. Once this value reaches zero, the network discards the packet, resulting in it not reaching its destination.

It is proposed that the TTL could be used to monitor devices on a network (with two or more routers) to determine if they have been moved and alert the user. This is a relatively simple test, but may provide a second check for further testing methods to see if a device has been correctly identified [16].

3.3.2 Design

The network will be set up as follows for a normal operating condition:

Figure 2: Initial server-client setup

The network will be set up as follows to simulate the device being moved to a different location on the network:

Figure 3: Client's network location changed

3.3.3 Method

The initial test involved one PC connected via a router to another PC, with no other devices on the network and no internet connection. Using Windows Command Prompt, a ping command was executed at the host, targeting the client PC. Wireshark showed it was using Internet Control Message Protocol (ICMP). A filter was created to only show packets from the required IP addresses and with the ICMP types for a ping request and response, then the selected packets were exported. This made the analysis simpler by only showing packets that were relevant to the tests. An external library called TracesPlay was used to import the packet capture data into Matlab, where it could be plotted and analysed [18]. Once the library was imported into Matlab, a single command could be used to bring the results into a matrix. Each row represents a single packet with the columns showing the sending IP, receiving IP, capture time, IP ID and TTL respectively.

This worked well; however, each IP address was imported as a single decimal integer and another function was required to interpret it. Integer format (e.g. 3409667082) may be useful for sorting the data, although IP addresses are commonly displayed in dotted decimal (e.g. 192.168.0.1). Refer to Appendix 7.2.1 to see how conversion to and from these formats was done.

The following steps were undertaken to create the fingerprint characteristic:

1. A simple loop was created in Matlab to ping the remote computer four times in a row, repeated five times with a three second delay each time.
2. The Wireshark packet capture was enabled and the script was allowed to run.
3. Once the pings had stopped, the packet capture was stopped.
4. A filter was applied in Wireshark to show only relevant packets and the packets were exported as a .pcap file.
5. The TracesPlay script was executed to import the packet capture to Matlab.
6. The mean of the column containing the TTL value was taken. This is the fingerprint value of TTL that would be associated with the client device.
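The conversion between the integer and dotted-decimal address formats mentioned above can be illustrated with Python's standard `ipaddress` module. This is only a sketch of the idea; the project's own Matlab conversion code is in Appendix 7.2.1, and the helper names here are hypothetical.

```python
import ipaddress


def dotted_to_int(address):
    """Convert a dotted-decimal IPv4 address such as '192.168.0.1' to an integer."""
    return int(ipaddress.IPv4Address(address))


def int_to_dotted(value):
    """Convert an integer back to dotted-decimal IPv4 notation."""
    return str(ipaddress.IPv4Address(value))
```

Comparing addresses as integers avoids string parsing inside the analysis loop, while the dotted form is easier to read when reporting results.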

3.3.4 Results

The device was moved to the same router as the monitoring PC and the same test was repeated. It was found that the TTL was incremented by one, validating that the network is functioning as expected. This change could be detected in Matlab and raised an alert as the value was different to the characteristic recorded for that device.
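The fingerprint-and-check step described above can be sketched as follows. This is a Python illustration, not the Matlab script used in the project; the function names and the `tolerance` value are assumptions, and the TTL values in the usage note are made up for illustration.

```python
from statistics import mean


def ttl_fingerprint(ttls):
    """Fingerprint creation: the mean TTL seen from the device during a trusted capture."""
    return mean(ttls)


def ttl_alert(fingerprint, observed_ttls, tolerance=0.5):
    """Monitoring: return True when the mean observed TTL deviates from the fingerprint.

    tolerance is a hypothetical threshold chosen so that a change of one hop triggers an alert.
    """
    return abs(mean(observed_ttls) - fingerprint) > tolerance
```

With a fingerprint built from packets that all carry a TTL of 126, a later capture in which every packet carries 127 would raise an alert, while an unchanged capture would not.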

3.3.5 Discussion

Finding a mean value of the TTL for a device can be useful to help build a fingerprint. Using a mean would reduce the effect of packets occasionally taking a different route through the network, due to congestion at times. However, if the device was maliciously moved to another part of the network, the mean TTL is likely to change. This method could be circumvented by using a router with custom firmware installed on it [19]. Custom firmware can be used to force the router to increase or decrease the TTL of each packet by a certain amount. For example, if a device had a TTL of 126 and was moved to a position behind another router the TTL may be reduced to 125. With the help of an extra custom router after the device, the TTL of the packets could be increased to 126. One way of detecting this would be observing the transmission time, which will be discussed later. The effect of adding an extra router would increase the transmission time, as it introduces more processing delay and queuing delay if it is close to capacity. It is also important to note that in a home system with one router, the TTL would be the same for all devices. Small industrial networks usually operate on the same sub network, running through switches instead of routers. The switches do not decrease the TTL, making this method ineffective. Analysing the TTL would be more useful in wide area networks where there is more variance in the TTL.

3.4 Method 2: Internet Protocol Identification Number Fingerprinting

The IP identification number changes with each packet sent and the frequency of its change can be observed. Any deviation from the predicted value could suggest the device isn’t operating as it was originally, or was reset or modified in some way.

3.4.1 Internet Protocol Identification Numbers

The Internet Protocol Identification Number (IPID) field provides a way to distinguish fragments that belong to one datagram from those of another. The value changes over time, and how quickly it changes could be used as a characteristic of each device (e.g. a device that sends more data would have a faster changing identification number). This method examines the IPID to extract patterns that will be used to build a fingerprint for each device [16]. One factor to take into account when using the change in IPID is that it will reset to zero once it reaches its maximum (65535, as the field is 16 bits wide).

3.4.2 Method

System setup: two devices that send different amounts of information to the network are used, and the aim is to distinguish between them using the IP identification number.

Creating the analyser script (code in 7.2.3): The analyser script loops through each group of four ping requests. It finds the difference between the IPID of the first ping response in a group and the IPID of the first ping response in the next group. It then graphs these differences so the change in IPID number can be observed. For example, the table below shows two groups of ping requests where the difference in IPID number between Ping 0 and Ping 4 is 19 (120-101). The jump in IPID number between Ping 3 and Ping 4 happens because the device transmitted other data during the delay before the next ping group started.

| Ping | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IPID number | 101 | 102 | 103 | 104 | 120 | 121 | 122 | 123 |

Table 1: Ping response IPID for two groups of four pings
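The group-to-group comparison described above can be sketched in Python as follows. This is not the Matlab analyser script from Appendix 7.2.3; the function name is hypothetical, and the optional `wrap` handling of the 16-bit counter is an added assumption rather than something the original script is known to do.

```python
IPID_MODULUS = 1 << 16  # the IPv4 Identification field is 16 bits wide


def ipid_group_differences(ipids, group_size=4, wrap=False):
    """Difference between the first IPID of each ping group and the first of the next group.

    ipids: IPID values of the ping responses, in capture order.
    wrap:  if True, take the difference modulo 2**16 so a counter reset
           does not appear as a large negative value (an assumption of this sketch).
    """
    firsts = ipids[::group_size]  # first response of every group
    differences = []
    for current, following in zip(firsts, firsts[1:]):
        difference = following - current
        if wrap:
            difference %= IPID_MODULUS
        differences.append(difference)
    return differences
```

Applied to the values in Table 1, `ipid_group_differences([101, 102, 103, 104, 120, 121, 122, 123])` returns `[19]`, matching the worked example.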

Test 1: Initial IPID test

The purpose of this test is to see how the IPID number varies under normal conditions. The setup is two PCs connected together via a switch. A simple loop was created in Matlab to ping the remote computer four times in a row and repeat five times after waiting a three second delay each time. The Wireshark packet capture was enabled and the script was allowed to run. A Wireshark filter limited the displayed packets to those to and from the switch and client computer (ip.addr). Additionally limiting icmp.type to 0 or 8 shows only the ping request and response packets.

Once the pings had stopped, the packet capture was stopped, the filter was applied and the packets exported as a .pcap file. The TracesPlay script was executed to import the packet capture to Matlab. The analyser script was run producing the following graph. (Refer to Appendix 7.3.1 for packet information)

Figure 4: Difference in IPID under normal conditions

Test 2: IPID change under higher data transfer rate

The purpose of this test is to see how the IPID number varies if the device is sending more data over the network compared to its normal rate. The same setup and packet capture process as Test 1 was used. A large (1GB) file copy was initialised from the client computer to the host computer. The ping script was then initialised within 5 seconds, producing the following graph. (Refer to Appendix 7.3.2 for packet information)

Figure 5: Difference in IPID when a file is being transferred over network

Test 3: IPID values with two client devices

The purpose of this test is to see how the IPID number varies if two devices are connected via the same switch. The same setup was used as Test 1, with an extra PC connected at the switch. The same packet capturing process was completed and allowed to capture for five hours. Figure 7 shows the difference between subsequent ping groups over this period.

Figure 6: IPID numbers received from two clients

Test 4: Long term IPID characteristics for fingerprinting

The purpose of this test is to see how a fingerprint could be established from a device operating under normal conditions. The same setup was used as Test 1.

Figure 7: Difference in IPID numbers over a five-hour time period

3.4.3 Results

The three main attacks that could be detected using this technique are: an identical device being added to the network, the device being accessed via the network more often, or the device sending out extra data due to changed programming.

Test 1 shows under normal conditions, the difference in IPID number should remain around 5 for the particular device tested. Test 2 shows that once a device is sending more data over the network, the difference rapidly jumps to a number above 1000 for the extreme case of a large file being transferred. It can be seen that the difference falls back to zero in Figure 5 which corresponds with the file transfer completing.

Test 3 shows the effect of connecting two devices to the network with similar properties. Figure 6 clearly shows the IPID numbers increasing to their maximum values, before resetting back to zero. The peaks occurring at different times shows that two devices are transmitting.

Test 4 shows how the difference in IPID numbers varies over a longer period of time. The peaks are associated with the device reaching its maximum IPID and falling back to zero. A fingerprint for the device could be created by taking the average of the IPID difference. To increase the accuracy of the fingerprint creation, IPID difference values above 50 could be removed in this case, reducing the effect of IPID number resets on the mean. From Test 2, it would be expected that this mean would change if the device is accessed more frequently or is sending more data than usual. This still needs to be tested on a real PLC, which has more stable traffic compared to a PC.
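The averaging step proposed above, with large values removed to suppress counter resets, might look like the following Python sketch. The cut-off of 50 comes from the text; the function names and the `factor` used for alerting are hypothetical choices for illustration only.

```python
from statistics import mean


def ipid_fingerprint(differences, reset_threshold=50):
    """Mean IPID difference after discarding values attributed to counter resets."""
    kept = [d for d in differences if abs(d) <= reset_threshold]
    if not kept:
        raise ValueError("no IPID differences left after filtering")
    return mean(kept)


def ipid_alert(fingerprint, recent_differences, factor=3.0):
    """Return True when recent traffic changes the IPID much faster than the fingerprint.

    factor is a hypothetical sensitivity setting, not a value from the tests above.
    """
    return mean(recent_differences) > factor * fingerprint
```

With a fingerprint of about 5, as in Test 1, differences in the region of 1000, as seen during the file transfer in Test 2, would trigger the alert.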

3.4.4 Discussion and future work

The benefit of this method is that it does not heavily depend on network congestion, how busy the router is, or how long either computer takes to process requests. It is purely dependent on how much data is being sent by the client device. Malicious activity could involve someone outside of the local network accessing a device, causing it to send more data. Another situation could be that the device is replaced with one that executes processes in a different order or sends out extra data; however, more testing is required for these scenarios. Either of these attacks would be reflected in a faster changing IPID and is likely to be detected. An IP address spoofing attack could also be detected by looking at the IPID numbers. This attack is unlikely, as switches have trouble managing two devices with the same IP, resulting in them being disconnected at random times.

3.5 Method 3: Transmission Time Fingerprinting

The RTT for each device can be measured by ‘pinging’ the device and calculating how long it takes to receive the device’s response. RTT can be affected by many factors, such as how busy the device is, how busy the network is and at what time the measurement is taken. Looking for correlations between these factors may provide a higher degree of accuracy in monitoring for anomalies in devices. It is proposed that looking at the RTT of the device and of its nearest switch may provide information that can be used to identify devices. The RTT to the nearest switch is also measured so that the effect of variable network traffic on a device’s RTT can be minimised.

3.5.1 Factors Affecting Transmission Time

RTT will be monitored to create a fingerprint and monitor devices. There are four main delays that affect the transmission time [20]. The first is the propagation delay of the signal across the network medium, usually a wire. This value is very small as the signal propagates close to the speed of light and over a short distance, 1km for example. Propagation delay would have the smallest effect on the RTT in a local network and changes in location would also have a negligible effect. Queuing and processing delays are also added as the packets pass through each router or switch in a network. These delays usually have a minimum value and will increase as the load on the network increases. The final delay is the processing time of the device that is being pinged. This delay would be dependent on the processing load it is under, which is related to how many tasks it is performing. For example, a PLC that is executing malicious code as well as the code it usually executes changes the load on the PLC, hence its response time to ping requests may change.

The following formula summarises these delays:

$$d_{RTT} = d_{prop} + d_{queue} + d_{proc} + d_{resp}$$

$d_{RTT}$ – RTT

$d_{prop}$ – Propagation delay over medium

$d_{queue}$ – Queuing delay at switch depending on how busy it is

$d_{proc}$ – Total processing delay of interconnecting routers and switches

$d_{resp}$ – Response time of device being pinged
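To give a sense of scale for these terms, the following Python sketch sums the four delays for a single hypothetical ping. The signal speed (roughly two thirds of the speed of light, a common figure for copper cable) and every delay value other than the 1 km distance are illustrative assumptions, not measurements from this project.

```python
# Illustrative only: every number here is an assumption except the 1 km
# distance, which is the example distance used in the text.

SIGNAL_SPEED_M_PER_S = 2.0e8  # assumed: roughly 2/3 of the speed of light


def propagation_delay_us(distance_m: float) -> float:
    """One-way propagation delay in microseconds."""
    return distance_m / SIGNAL_SPEED_M_PER_S * 1e6


def round_trip_time_us(d_prop: float, d_queue: float,
                       d_proc: float, d_resp: float) -> float:
    """d_RTT = d_prop + d_queue + d_proc + d_resp (all in microseconds)."""
    return d_prop + d_queue + d_proc + d_resp


d_prop = 2 * propagation_delay_us(1000.0)  # out and back over 1 km
rtt = round_trip_time_us(d_prop, d_queue=50.0, d_proc=100.0, d_resp=1500.0)
print(f"propagation: {d_prop:.1f} us of {rtt:.1f} us total")
```

With these assumed values the propagation term is about 10 μs, which is small next to the device response time and illustrates why changes in cable length alone are hard to observe in a local network.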

3.5.2 Method

The initial setup involved connecting two PCs via a switch.

Wireshark packet capture was initiated using a filter so that only ping requests and responses were shown, and only for packets to and from the switch and client computer.

Four ping requests were sent to the switch, quickly followed by four ping requests to the client PC. This process was repeated at twenty second intervals and was allowed to continue for five hours.

Wireshark packet capture was stopped and the packet data was exported. The Matlab TracesPlay library was used to import the packet data. The host, client and switch IP addresses were entered into a script. The dotted decimal IP addresses were converted to integers for easy comparison to the packet data. The RTT for each ping request was calculated. The average switch RTT, the average client RTT and the difference between these RTTs (DRTT) were calculated and output. (Refer to Appendix 7.2.4 for code.)
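The post-processing steps above can be sketched in Python as follows. This is not the Matlab script from Appendix 7.2.4; the packet record layout, the IP addresses and the timestamps are all hypothetical, and echo requests are matched to replies by an assumed sequence number field.

```python
import ipaddress
from statistics import mean


def ip_to_int(dotted: str) -> int:
    """Convert a dotted-decimal IPv4 address to an integer for comparison."""
    return int(ipaddress.IPv4Address(dotted))


def rtts_for_target(packets, host_ip: int, target_ip: int):
    """Pair each echo request with its reply and return the RTTs in microseconds.

    `packets` is a hypothetical list of dicts with the keys
    time_us, src, dst, kind ("request" or "reply") and seq.
    """
    sent = {}
    rtts = []
    for p in packets:
        if p["kind"] == "request" and p["src"] == host_ip and p["dst"] == target_ip:
            sent[p["seq"]] = p["time_us"]
        elif p["kind"] == "reply" and p["src"] == target_ip and p["dst"] == host_ip:
            if p["seq"] in sent:
                rtts.append(p["time_us"] - sent.pop(p["seq"]))
    return rtts


HOST = ip_to_int("192.168.0.10")    # assumed addresses
SWITCH = ip_to_int("192.168.0.2")
CLIENT = ip_to_int("192.168.0.20")

packets = [  # made-up capture: one ping to the switch, one to the client
    {"time_us": 0, "src": HOST, "dst": SWITCH, "kind": "request", "seq": 1},
    {"time_us": 900, "src": SWITCH, "dst": HOST, "kind": "reply", "seq": 1},
    {"time_us": 5000, "src": HOST, "dst": CLIENT, "kind": "request", "seq": 2},
    {"time_us": 6400, "src": CLIENT, "dst": HOST, "kind": "reply", "seq": 2},
]

switch_rtt = mean(rtts_for_target(packets, HOST, SWITCH))
client_rtt = mean(rtts_for_target(packets, HOST, CLIENT))
drtt = client_rtt - switch_rtt  # difference in RTTs (DRTT)
```

Converting addresses to integers once, up front, keeps the per-packet comparison to a simple equality test, which is the same motivation given for the conversion in the method.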

3.5.3 Results




Figure 8: Client RTT under normal conditions

Figure 9: Nearest switch to client RTT under normal conditions

The output above shows the RTT for both the client and the switch over the testing period. A small amount of correlation is visible between the two. This would be due to the switch sometimes being slower to handle packets, which lengthens the ping replies from both the switch itself and the client PC. This effect could also extend to larger networks which have variable loads. Using the difference between the client and switch RTTs (DRTT) at a point in time aims to reduce this effect, which is discussed in section 3.7. Looking at just the RTT to the end device also gives some insight into whether an attack has occurred. The histogram below shows a plot of the fingerprint characteristic Acl vs an attack RTT distribution, Bcl.

Figure 10: Histogram of RTTs under normal (Acl) and attack (Bcl) cases

It can be seen in Figure 10 that most RTTs are under 3500μs, however there is an outlier at 5900μs. This suggests another method of detecting an attack is by setting a limit on the maximum allowable RTT.

3.5.4 Discussion

It is also important to note that these methods increase network traffic, which may be undesirable in some systems. The effect of this could be minimised by increasing the checking interval. Another option would be to analyse data that is already coming from devices, as it will contain the same information. This would mean that the software would have to be tailored for each system, as devices will send packets at different rates depending on the application. Setting a limit on RTT may also be a way to initially detect an anomaly, after which the system could increase the sampling frequency to conduct further testing before raising an alarm. Due to the high variability in RTTs seen in Figure 8, the mean and standard deviation will not provide much information as to whether the device is under attack. This is also because the histogram without an attack overlaps the attack histogram, making it hard to find differences. Other methods of analysing the data will be discussed in the following sections.

3.6 Kullback-Leibler divergence and DRTT

Differences outside of tolerance ranges could be used to deduce certain changes in the network or device. For example, if the DRTT increased by a large amount or reduced below zero, it could be determined that the device is busier, or its position in the network has changed. Another case is a man in the middle attack, where an extra router is added between the device and the closest switch. This could increase the DRTT, creating an alert in the system. In all scenarios, an alarm could be raised to allow an administrator to determine the cause.

As the DRTT and sequence rate of change values are not normally distributed, using mean and standard deviation is not accurate enough to determine if two sets of observed data are sufficiently different to infer an attack has occurred. This is because the data sets are truncated at 0 and are affected by a constantly changing network environment. The Kullback-Leibler (KL) divergence is a measure of how far the measured distribution is from the true (fingerprinted) distribution. If the KL value is zero, the measured distribution can be taken to be the same as the fingerprinted distribution. As the KL value increases, the measured distribution is moving further away from the fingerprinted distribution. It is proposed that a limit on the KL value is set, and once that limit is exceeded, an attack alert could be raised.
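A minimal Python sketch of this comparison is given below. It is not the Matlab script from [21]; the bin width, the sample values and the small smoothing constant `eps` (used to avoid dividing by an empty bin) are assumptions made for illustration, and the direction D_KL(fingerprint || measured) is one reasonable reading of the description above.

```python
import math


def to_probabilities(samples, bin_edges):
    """Histogram `samples` into the given bins and normalise to sum to 1.

    Samples outside [bin_edges[0], bin_edges[-1]) are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i).

    eps is added to every bin (an assumption of this sketch) so that an
    empty bin in Q does not cause a division by zero.
    """
    p = [pi + eps for pi in p]
    q = [qi + eps for qi in q]
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq))
               for pi, qi in zip(p, q))


edges = list(range(0, 2001, 250))                        # 250 us bins, assumed
fingerprint = [300, 320, 410, 500, 520, 610, 700, 760]   # made-up DRTTs (us)
shifted = [x + 200 for x in fingerprint]                 # same data, 200 us later

p = to_probabilities(fingerprint, edges)
kl_same = kl_divergence(p, p)                            # identical distributions
kl_shifted = kl_divergence(p, to_probabilities(shifted, edges))
```

The absolute KL value depends strongly on the bin width and on `eps`, so any alarm limit would have to be chosen with the same binning that was used to build the fingerprint.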

A simulation was conducted in Matlab using the normal device DRTT and then adding an extra delay to it. The extra delay was to simulate an extra ‘malicious’ switch being inserted between the device and its closest switch. The delay function added a fixed delay and a normally distributed random delay to each time sample. Simulation delay formula:

$$\text{delay} = \alpha + N(\theta, \sigma)$$

$\alpha > 0$ – Fixed extra delay

$N(\theta, \sigma)$ – Random extra delay

The first test is the device against itself at a different time without an attack. It is expected that a small amount of variance occurs, and this is simulated by adding the delay function with $\alpha = 20$, $\theta = 1$, $\sigma = 5$. The second test is the device against itself at a different time with a man-in-the-middle attack. This is simulated by increasing $\alpha$, as a longer delay is expected, and by increasing the normal distribution parameters, as more variance is expected. The parameters used were $\alpha = 200$, $\theta = 2$, $\sigma = 20$. The Matlab KL divergence calculator script used was from source [21].
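The delay function can be sketched in Python as below, using the two parameter sets quoted above. The sketch assumes that θ is the mean and σ the standard deviation of the normal term, and it uses placeholder uniformly distributed values in place of the measured DRTT data, which is not reproduced here.

```python
import random


def add_simulated_delay(drtt_samples, alpha, theta, sigma, rng):
    """Return samples with delay = alpha + N(theta, sigma) added to each one."""
    return [x + alpha + rng.gauss(theta, sigma) for x in drtt_samples]


def average(values):
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so that runs are repeatable
baseline = [rng.uniform(200.0, 1200.0) for _ in range(1000)]  # placeholder DRTTs

no_attack = add_simulated_delay(baseline, alpha=20, theta=1, sigma=5, rng=rng)
attack = add_simulated_delay(baseline, alpha=200, theta=2, sigma=20, rng=rng)

shift_no_attack = average(no_attack) - average(baseline)  # expected near 21
shift_attack = average(attack) - average(baseline)        # expected near 202
```

The expected mean shift is α + θ in each case, so the attack parameters move the distribution roughly ten times further than the no-attack parameters, as well as widening it.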

Figure 11: DRTT in fingerprint vs same device over different period

Figure 12: DRTT in fingerprint vs DRTT with device under attack

It can be seen in the first non-attack case (Figure 11), the distributions largely overlap which indicates there should be no alarm raised. The KL value for this case is 0.0050. The second case shows the attack has caused the DRTT to increase compared to the fingerprint (Figure 12), resulting in the blue distribution moving to the right. The KL value increases to 0.1595. This is a large difference in KL, so a possible way to detect attacks would be to set a limit on the KL value for each device.

The method of checking both the switch and device RTT does help to some degree, however, it is impossible to ping the two devices at the exact same time. The delay between pinging each device means that network traffic may change, affecting the results. To avoid this, it would be recommended to use these techniques on networks with simple Ethernet devices connected where the network load is relatively stable.

In actual tests, the two distributions overlapped almost completely, even under attack cases. Therefore, there was only a very little difference in KL between the fingerprint and a no alarm case, compared to the fingerprint with an alarm case. This method would be more useful if the switches added a significant delay or a long transmission medium was introduced. These cases would likely shift the distribution to be similar to the simulation.

3.7 CDF of Client RTT

In testing, the DRTT did not change as much as expected, due to the high speed of the switches. However, the switches increased the frequency at which certain RTTs occurred. Figure 13 shows a histogram of a normal scenario (Scenario A) against itself at a different time, which should not create an alarm. The two distributions are a similar shape, with some variance over the range of RTTs. Figure 14 shows a histogram of Scenario A against Scenario B (an attack). B has larger peaks around 1500μs and 2400μs which can be used to create an alarm.

Figure 13: Histogram of device RTTs over 2 different time periods

Figure 14: Histogram of device RTTs under normal (Acl) and attack (Bcl) conditions

A cumulative distribution function (CDF) was chosen to gain a better view of the difference. The CDF gives the probability that some variable X takes values less than or equal to x: $F(x) = \Pr[X \le x]$

Translating this to the current scenario, the CDF gives the probability that a new RTT measurement will take a value less than or equal to a value in the range of possible RTTs.
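As a small illustration, an empirical CDF can be built directly from a list of RTT samples. The sketch below is in Python and the RTT values are made up.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted samples are <= x
        return bisect_right(ordered, x) / n

    return F


rtts_us = [1200, 1350, 1500, 1500, 2400, 3100]  # made-up RTTs in microseconds
F = empirical_cdf(rtts_us)
print(F(1500))  # fraction of pings that returned within 1500 us
```

Here four of the six samples are at or below 1500μs, so F(1500) is 4/6. A cluster of RTTs at one value appears as a steep step in F, which is why the extra histogram peaks under attack show up as a shift in the CDF.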

Figure 15: CDF of normal device RTTs (Acl) vs attack RTTs (Bcl)

The two green arrows show where the peaks in Scenario B have shifted the CDF to the right compared to the normal scenario where there were peaks in the histograms. Overall the two CDFs are quite similar, as both distributions have a similar mean. Therefore, the mean value does not give an accurate indication of whether an attack has occurred.

The method used to detect this variance is to integrate each CDF for RTTs from 0μs to 6000μs and take the difference (Equation 1). This gives a measurement of the yellow shaded area as a percentage of the area under the fingerprint CDF (Matlab code in Appendix 7.2.5). For this example, the area below Scenario B’s CDF is less than the area below Scenario A’s CDF. A percentage limit can then be set on how large the difference can be before raising an alarm. The difference in this example is 2.80%. This compares to 1.05% for the normal case, Acl(1:200) against itself, Acl(201:400). The results suggest that a limit of +/-1.5% on this value would allow man-in-the-middle attacks to be detected with a small rate of false alarm. Further testing of this will be conducted in the next section to verify the results.

Equation 1: Finding the difference between two CDFs
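The equation itself is not reproduced here, but from the description it can be read as the area under the fingerprint CDF minus the area under the measured CDF, divided by the area under the fingerprint CDF. The Python sketch below implements that reading. The sign convention (positive when the measured area is smaller) is inferred from the 2.80% example, the sample values are made up, and this is not the Matlab code from Appendix 7.2.5.

```python
def cdf_area(samples, upper=6000.0):
    """Exact integral of the empirical CDF from 0 to `upper`.

    Each sample x adds a step of height 1/n at x, whose area over
    [0, upper] is (upper - x) for 0 <= x <= upper and 0 for x > upper.
    When every sample lies in [0, upper] this equals upper - mean(samples).
    """
    return sum(max(0.0, upper - max(x, 0.0)) for x in samples) / len(samples)


def cdf_difference_percent(fingerprint, measured, upper=6000.0):
    """Area difference as a percentage of the fingerprint CDF area."""
    area_fp = cdf_area(fingerprint, upper)
    return 100.0 * (area_fp - cdf_area(measured, upper)) / area_fp


fingerprint = [1000.0, 2000.0, 3000.0]   # made-up RTTs in microseconds
measured = [1100.0, 2100.0, 3300.0]      # made-up RTTs, slightly slower

diff = cdf_difference_percent(fingerprint, measured)
alarm = abs(diff) > 1.5                  # the +/-1.5% limit suggested above
```

Using the closed-form step area avoids choosing an integration step size. A numerical integration over a sampled CDF, as a Matlab implementation might use, should give a very similar value.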

3.8 Sample window size and costs of making a decision

The optimal window size has been found to be 15 minutes of data with four consecutive pings every 20 seconds, which equates to about 200 ping-response groups. This gives enough information to build up a CDF that can be used to distinguish attacks from normal operation, and outlier longer RTTs will usually occur in this interval under attack conditions. In practice, pinging a device every 20 seconds would add too much unnecessary load to the network, which may slow down other information transfers. Using the default MSDOS ping function from Matlab also gives four consecutive pings; however, this could be changed to single pings by adding [-n 1] to the end of the command (where 1 is the number of pings). This would also mean that the sampling time would have to be increased four times, to an hour, to yield similar results.

A possible option in a real-time system would be to ping the device periodically and look for outlier longer RTTs. At this point the sampling rate could be increased, so an accurate CDF could be constructed. A sliding window of 200 samples could be used to compare against the fingerprint characteristic. If the CDF difference remains above 1%, it could continue taking samples and sliding the window, otherwise the outlier can be ignored and it would go back to sampling at the slower rate. If an attack occurs, it would be likely that the CDF difference rises above 1.5% in which case an administrator could be alerted.
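The two-rate scheme described above can be sketched as a small state machine in Python. The state names, the function name and the 3500μs outlier limit used as a default are assumptions of this sketch; the 1% and 1.5% levels and the 200-sample window come from the text.

```python
from collections import deque

SLOW, FAST = "slow", "fast"
WINDOW_SIZE = 200  # samples per comparison window, from the text


def next_state(state, rtt_us, cdf_diff_pct, rtt_limit_us=3500.0,
               keep_fast_pct=1.0, alarm_pct=1.5):
    """Return (new_state, alarm) after one new measurement.

    cdf_diff_pct is the CDF difference of the current window against the
    fingerprint; it is only consulted while sampling at the fast rate.
    """
    if state == SLOW:
        # An outlier RTT switches the monitor to the faster sampling rate.
        return (FAST, False) if rtt_us > rtt_limit_us else (SLOW, False)
    if abs(cdf_diff_pct) > alarm_pct:
        return FAST, True        # alert an administrator
    if abs(cdf_diff_pct) > keep_fast_pct:
        return FAST, False       # keep sampling and sliding the window
    return SLOW, False           # outlier ignored, return to the slow rate


window = deque(maxlen=WINDOW_SIZE)  # the oldest sample drops out automatically
for sample in range(250):
    window.append(sample)
```

A bounded `deque` gives the sliding window without manual trimming: once 200 samples are held, each new sample displaces the oldest one.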

It is also important to look at the costs involved in making a decision and detecting attacks. If the system says there is no attack when there is, the result may be catastrophic. This could involve another remote computer reading the inputs and changing outputs of a critical PLC in a manufacturing plant or power production facility. The cost of waiting for more samples from a device would be quite minimal as a man in the middle attack would take some time to set up and modify transmitted data. If an extra computer was connected to the PLC, the increase in IPID difference could be detected within about 10 samples at the slower rate (From testing the IPID difference almost doubles when another device is connected). There is a cost associated with the system saying that an attack has occurred when there hasn’t (false-positive). This cost would be much lower than the cost of missing an attack, as it would just involve an administrator doing some further checks to see what has caused the anomaly and the device would continue to function as normal.

## 4 Implementing Fingerprinting Techniques

The following tests involve applying the concepts and methods found in Section 3 to create a fingerprint and monitor devices under different scenarios. Various attacks will be set up and the device’s response will be checked against the fingerprint characteristics to see whether an alarm is produced. In the earlier stages of this project, IPID numbers and transmission time were used to develop a fingerprint for a device. Using a combination of both techniques, a wider range of attacks can be detected from analysing captured data.

4.1 Method

In this section, three attack types under varying network conditions will be tested and the results will be analysed. Attack 1 (Case 1) will be observing the system once a switch has been inserted between the device and its originally connected switch. This simulates a man in the middle attack where the inserted switch observes all traffic that passes through it and may have the capability to modify packet data. The attacker could then provide false data to controller devices on the network, which would appear to come from the original device. They could also modify or block information from passing to and from the device. It is expected that the DRTT will increase in this scenario and the cut-off for when the system should raise an alarm will be explored.

Attack 2 (Case 2) involves setting up another computer on the same LAN to access the device and read its measured values on a regular basis. This simulates an attacker connecting to a network point and reading values from any of the PLCs on the network. From here, they could easily change the outputs of the PLCs which could lead to catastrophic consequences, such as overheating a chemical process. It is expected that this attack will be detected by seeing the IPID increase at a greater rate as the device is sending out more packets.

Attack 3 (Case 3) will look at whether changing the program that the PLC executes changes its RTT characteristics. It is hypothesized that if a device’s programming is changed, the time taken to respond to ping requests may vary. This may not be the case if the device handles ping requests at the network interface, without any input from the main processor.

These attacks will be simulated initially using nothing other than the required switches, PCs and a PLC. However, in real-life scenarios there are many other devices on the network transmitting data, which affects how fast the switches respond. This can affect the RTTs, making it harder to detect attacks. The effect of this will be explored using a test on Attack 1 with an extra load on the switch. The robustness test will involve transmitting a large amount of data between two other PCs connected to the same switch as the monitoring PC. This simulates a large file being copied over the network and gives the switch a much greater load to deal with.

The algorithm for detection is shown below:

If (IPIDave > IPIDfp*1.3)

Trigger multiple device access alarm

Else If (RTT > RTTfpMax)

If (absolute(CDFdiff) > 1.5%)

Trigger topography changed alarm

Else If (absolute(CDFdiff) > 1.5%)

Trigger programming changed alarm

Here IPIDave is the measured average IPID number change, IPIDfp is the fingerprinted average IPID number change, RTT is the measured single RTT, RTTfpMax is the maximum RTT observed in the fingerprint, and CDFdiff is the percentage difference between the fingerprint CDF and the measured CDF.

The algorithm shows three different alarms the monitoring system would be able to trigger. The set value for the maximum IPID change is 30% higher than the average of the fingerprint. If this is exceeded, a multiple device access alarm would be triggered. This indicates another computer is accessing the device through the network. The topography change alarm indicates the device has moved position in the network or a man-in-the-middle attack has occurred. This is triggered when a high RTT is observed in the sample time and the CDF difference is greater than 1.5%. The third alarm is a programming change which is triggered by just the CDF difference going higher than 1.5%. A very high RTT is not expected in this case as the network topography would remain unchanged, but the device would take a different amount of time to respond to ping requests.
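The decision logic can be written as a short Python function, as sketched below. The function and alarm names are hypothetical, and the example calls use values reported in Section 4.2, with the 3500μs maximum RTT limit adopted there standing in for the fingerprint maximum.

```python
def classify(ipid_avg, ipid_fp, rtt_max, rtt_fp_max, cdf_diff_pct,
             ipid_factor=1.3, cdf_limit_pct=1.5):
    """Return the name of the alarm to raise, or None for no alarm."""
    if ipid_avg > ipid_fp * ipid_factor:
        return "MultipleDeviceAccessAlarm"
    if rtt_max > rtt_fp_max:
        # A high RTT alone is not enough; the CDF must also have shifted.
        if abs(cdf_diff_pct) > cdf_limit_pct:
            return "TopographyChangedAlarm"
        return None
    if abs(cdf_diff_pct) > cdf_limit_pct:
        return "ProgrammingChangedAlarm"
    return None


# Values taken from the results in Section 4.2
normal = classify(90.41, 90.41, 3102, 3500, 1.05)        # Case 0, test 1
mitm = classify(90.41, 90.41, 4348, 3500, 3.43)          # Case 1, test 1
second_pc = classify(147.23, 90.41, 3102, 3500, 1.05)    # Case 2 (IPID only)
reprogrammed = classify(90.41, 90.41, 3181, 3500, 2.351) # Case 3
```

Note that the IPID check takes priority: if another computer is polling the device, that alarm is raised regardless of the RTT measurements.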

Picture of set up

Figure 16: Equipment set up

4.2 Results

Case 0: Normal conditions (No-alarm)

The goal of the initial tests is to check that the characteristics of the device do not vary widely at different times. If the IPID or RTT changed too much under normal conditions, false alarms would be triggered. The blue distributions in the histogram represent the fingerprinted characteristic of the PLC under each particular network set up.

Test 1

The first test involved connecting the PLC to the PC through switch SA (Figure 17). The Matlab pinging function was run for 1 hour, pinging the device every 10 seconds while capturing all packets sent and received. The first fifteen minutes of data was used as the fingerprint, which was tested against the results from the next fifteen minutes (200 samples). This should not cause an alarm, as nothing had been changed.

Figure 17: Initial layout (No-Alarm)

Figure 18: Histogram of device RTTs over two time periods

| Measure | Value |
|---|---|
| Distribution 1 maximum RTT | 3347μs |
| Distribution 2 maximum RTT | 3102μs |
| Difference in RTT CDF | 1.05% |

Table 2: Case 0, test 1 results

The difference between the CDF of the fingerprint interval against the next interval showed a difference of 1.05%. The average IPID change was 90.41 for the fingerprint and 90.41 for the next interval. The maximum RTT in both intervals was below 3500μs for all ping requests. This information is used to set limits on when an alarm is raised in the case of an attack.

Test 2

The test above was repeated with SA swapped for SB. A new fingerprint was created in the first half hour and tested against the next half hour interval. This was done to verify the results found and make sure the limits to be used would be applicable under different network set ups.

Figure 19: Histogram of device RTTs at two different times

| Measure | Value |
|---|---|
| Distribution 1 maximum RTT | 3253μs |
| Distribution 2 maximum RTT | 2572μs |
| Difference in RTT CDF | -0.09% |

Table 3: Case 0, test 2 results

The difference in the fingerprint CDF against the next interval was -0.09%. This is relatively low, which was to be expected as nothing was changed in the network set up. All RTTs were again under 3500μs. This is similar to the first test as the packets are only traversing one switch, so the delay should not be too different with other switches. Therefore, a maximum RTT limit of 3500μs and a CDF limit of +/-1.5% would be set to detect attacks if measured values fall out of this range. Under the proposed algorithm, neither of the above situations would cause an alarm.

Case 1: Malicious switch inserted (Alarm)

Test 1

The attack to be tested is a man in the middle attack using the fingerprint with just SA to compare against. This was simulated by inserting another switch (SB) between the PLC and monitoring PC (Figure 20). A hostile switch may be able to directly modify data within the packets. This could involve changing the values of inputs and outputs at the PLC, which could result in significant damage in industrial systems.

Figure 20: Layout with extra malicious switch inserted (SB)

Figure 21: Histogram of device RTTs under normal and attack cases

| Measure | Value |
|---|---|
| Distribution 1 maximum RTT | 3253μs |
| Distribution 2 maximum RTT | 4348μs |
| Difference in RTT CDF | 3.43% |

Table 4: Case 1, test 1 results

In this attack case, the histogram shows two distinct peaks around 1400μs and 2300μs which have been introduced with the addition of the extra switch. This has translated to a higher CDF difference of 3.43%. An RTT outlier also appears at 4348μs, which is higher than any of the values in the non-attack case, which were all under 3500μs. The outlier would be detected as there is an RTT outside the 3500μs limit, and the CDF limit of +/-1.5% would be exceeded, resulting in a TopographyChangedAlarm.

Test 2

A similar attack was simulated by swapping the switches around and using the fingerprint that only used SB to compare against.

Figure 22: Layout with extra malicious switch inserted (SA)

Figure 23: Histogram of device RTTs under normal and attack cases

| Measure | Value |
|---|---|
| Distribution 1 maximum RTT | 3347μs |
| Distribution 2 maximum RTT | 5807μs |
| Difference in RTT CDF | 2.80% |

Table 5: Case 1, test 2 results

Two peaks on the histogram are also more pronounced, similar to the first man-in-the-middle case. This again results in a larger CDF difference of 2.80%. An RTT outlier also appears at 5807μs, which would be due to having the extra switch. Again, the maximum RTT and CDF difference limits would be exceeded, causing the TopographyChangedAlarm.

Case 2: 2 PCs accessing PLC simultaneously (Alarm)

The next scenario is that an intruder is able to connect to the network and access the PLC at the same time as the monitoring PC. Once connected, the intruder could change outputs or monitor PLC data, compromising the system which could result in serious damages. Early testing has shown that if a device is sending more data, its IPID will change at a greater rate which is what will be tested. The characteristics from the test using just SA were used as the fingerprint.

Figure 24: Layout with extra PC polling PLC

The average IPID change of the fingerprint characteristic was 90.41 compared to 147.23 in this attack case. This is a large increase which is caused by the PLC sending extra data to the hostile PC. Under all other tests the average IPID values remained in the range of 85-95. As 147.23 is more than 30% greater than 90.41, this anomaly would trigger the MultipleDeviceAccessAlarm.

Case 3: Code changed on PLC (Alarm)

This attack was done to see whether changing the code on the PLC had any effect on its RTT characteristics. The fingerprint created using SA was used (Case 0, Test 1). The initial code executed 10 ladders (blocks of code); 8 of these were removed to simulate the attack.

Figure 25: Histogram of device RTTs under normal conditions and when the programming has been changed

| Measure | Value |
|---|---|
| Distribution 1 maximum RTT | 3253μs |
| Distribution 2 maximum RTT | 3181μs |
| Difference in RTT CDF | 2.351% |

Table 6: Case 3 results

It appears that this attack changes the device’s response time to ping requests, as the difference in RTT CDF is relatively large compared to the no-alarm cases. All RTTs remain under 3500μs, which means that the TopographyChangedAlarm would not be raised. In this case, the programming changed alarm would be triggered as the CDF difference is greater than 1.5%. Further testing would be required to determine the extent to which the code needs to change before an alarm case could be detected.

Case 4: Two switches with high load on one switch (No-alarm)

This case tests how robust checking the CDF distributions is with loads on the switches in the network. Generally, loads on a switch would slow down the speed at which it can switch packets, however its effect on the alarming system will be investigated. Figure 26 shows the normal case with two interconnecting switches that was used to create the fingerprint. From here, two PCs were connected to SB and a large file was copied from PC 1 to PC 2 at 10MB/s (Figure 27).

Figure 26: Normal layout for new fingerprint case

Figure 27: Normal layout with extra devices transferring data – No alarm

Figure 28: Histogram of device RTTs under normal conditions and when extra PCs are transferring data on network - no alarm

| Measure | Value |
|---|---|
| Distribution 1 maximum RTT | 3183μs |
| Distribution 2 maximum RTT | 2794μs |
| Difference in RTT CDF | 0.360% |

Table 7: Case 4 results

The difference in the CDF distributions was 0.360% which is in line with other no-alarm cases. This suggests that varying network loads do not greatly affect the speed at which ping packets travel through the network. All RTTs are below 3500μs which is also consistent with other no-alarm cases and the CDF difference is below the limit, hence no alarm would be raised.

4.3 Discussion

From the above results, it can be seen that looking at a device’s network characteristics can be useful to detect attacks. Setting a limit of +/-1.5% on the CDF difference would have detected all of the man-in-the-middle attacks tested, and would not have triggered any false alarms under the normal operating conditions tested. However, sending a ping request to multiple devices on the network every 10 seconds in larger systems introduces undesirable loads on switches. It was found that with man-in-the-middle attacks, much larger RTTs started appearing. This suggests it may be sufficient to poll the devices at a lower rate and look for RTTs above a threshold. Once this is detected, the device could be polled at a faster rate for half an hour to build up enough data to check its CDF against the fingerprint. If the CDF difference was over the specified threshold, an alarm would be raised.

Changing the code that the PLC was executing also changed its RTT characteristics and could be detected by the detection algorithm. The fact that no abnormally large RTTs were observed in this case suggests that a separate alarm could be raised to notify an administrator that a device had been modified, instead of the man-in-the-middle attack alarm.

Observing the average IPID change proves to be effective in detecting if another device is accessing a PLC. A limit of 30% above the average IPID difference of the fingerprint would give an alert of attack. This limit also allows some flexibility in normal operation as the device may send out more data for small periods of time. A separate alarm could be raised in this case, notifying an administrator that a device was being accessed without authorisation, either by interference on the local network or remotely.
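The IPID check can be sketched in Python as below. The IPID is a 16-bit field in the IPv4 header, so the sketch takes differences modulo 65536 to cope with the counter wrapping around; whether the project's own scripts handled wraparound is not stated, and the IPID sequence shown is made up. The averages in the final lines are the values reported in Case 2.

```python
IPID_MODULUS = 65536  # the IPv4 Identification field is 16 bits wide


def average_ipid_change(ipids):
    """Mean increase between consecutive IPID values, allowing for wraparound."""
    diffs = [(b - a) % IPID_MODULUS for a, b in zip(ipids, ipids[1:])]
    return sum(diffs) / len(diffs)


def multiple_access_suspected(measured_avg, fingerprint_avg, factor=1.3):
    """True when the measured average exceeds the fingerprint by over 30%."""
    return measured_avg > fingerprint_avg * factor


ipids = [65400, 65490, 44, 134]       # made-up sequence that wraps past 65535
avg = average_ipid_change(ipids)

case2 = multiple_access_suspected(147.23, 90.41)   # values from Case 2
normal = multiple_access_suspected(95.0, 90.41)    # top of the 85-95 range
```

With a fingerprint average of 90.41 the alarm level is about 117.5, which sits clear of the 85-95 range seen in the other tests and well below the 147.23 measured with a second PC polling the PLC.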

4.4 Future Work

For a commercial solution, the methods found would have to be implemented in a real-time detection system. All tests were done by pinging the device periodically using Matlab and MSDOS to execute the ping, capturing the data in Wireshark, then analysing in Matlab. To perform this in real time, another programming language would have to be used such as C# that could perform the ping, capture and analysis. A visual user interface would be useful to manage the fingerprints, alarms and set limits for each device.

Further testing would have to be done with different network loads, in larger networks and real-life environments to ensure the methods are still effective. The limits on each fingerprinted characteristic would have to be analysed to measure how accurately the system detects anomalies. More research into the sample size is needed to improve accuracy and decrease the network load of implementing a detection mechanism.

These concepts could be built into existing industrial IoT packages or a system could be built to monitor home IoT devices. The polling rate is likely to vary depending on the application, however, further research would be required.

It would be relatively difficult to ‘trick’ this system, although this is a possibility that attackers may explore. To fool the IPID detection, a man-in-the-middle switch would have to be inserted that synchronizes to the IPID change rate under normal conditions and maintains the IPID change rate for packets destined for the monitoring PC. However, this attack would be detected by the RTT monitoring. More research and investigation into methods that attackers could use to fool this system would be required to implement techniques making it more robust against attacks.

## 5 Conclusion

Throughout this project, methods were explored that could be used to detect attacks on network connected devices. Monitoring TTL values has been effective with Internet servers in other studies, however, they do not provide much information in a local network. It was found that IPID numbers and RTTs could be used to detect three main types of attacks. The attacks were man-in-the-middle or a change in network topography, change in programming and a device being accessed by another computer. These could be detected by setting limits on the IPID change between pings, maximum RTTs and looking at the difference in RTT CDF distributions from a fingerprinted characteristic. Each device on a network would need to be fingerprinted under normal operating conditions and monitored, so alarms could be raised. IP address spoofing could also be detected by looking at the IPID numbers, however this attack is unlikely in modern networks which don’t function properly when a device with the same IP is connected. Future investigations would involve building a real-time monitoring system and testing it on larger scale networks. The limits on CDF differences and IPID differences may need to be varied depending on the application. Some environments may naturally have higher variability in these values, or may require a quicker response to attacks. All of the requirements from section 3.3 were met, except the requirement regarding excessive amounts of load on the network. Further research is required in this area to minimise the effect of the increased load from the monitoring system. The process found was to create a fingerprint based on a device’s response time and IPID numbers from ping requests. The device’s Ethernet traffic would be captured over a period of time and these two characteristics would be compared against the fingerprint to see if they exceeded the set limits before raising alarm. 
These limits were an IPID difference more than 30% greater, a RTT greater than any that were observed in the fingerprint, and a CDF difference greater than 1.5%.

These techniques could also be applied in home IoT networks, which would be more useful in the future where more devices such as home door locks, lights and other electronics become Internet connected and open to attacks from the outside world. In automation networks, PLCs are being connected via the Internet, allowing remote programming which has cost benefits for companies, but opens up a security risk that anyone could remotely access the device if they gain the correct credentials. By simply looking at the IPID difference, a remote user accessing a device could be detected and the device can have external access closed off while the threat is dealt with.

## 6 References

[1] M. Schukat and P. Cortijo, "Public key infrastructures and digital certificates for the Internet of things," in Signals and Systems Conference (ISSC), 2015 26th Irish, Carlow, 2015.

[2] Microsoft, "Password Authentication Protocol (PAP)," 2005. [Online]. Available: https://technet.microsoft.com/en-au/library/cc737807(v=ws.10).aspx. [Accessed 29 Mar 2016].

[3] A. L. T. F. Dirk Reinelt, "Securing communication in automation networks," 5th IEEE International Conference on Industrial Informatics, pp. 149-154, 2007.

[4] P. Flood and M. Schukat, "Peer to Peer Authentication for Small Embedded," Zilina, 2014.

[5] E. Hjelmvik, "Passive OS Fingerprinting," 2011. [Online]. Available: http://www.netresec.com/?page=Blog&month=2011-11&post=Passive-OS-Fingerprinting. [Accessed 29 Mar 2016].

[6] T. Gibb, "OS Fingerprinting With TTL and TCP Window Sizes," 2012. [Online]. Available: http://www.howtogeek.com/104337/hacker-geek-os-fingerprinting-with-ttl-and-tcp-window-sizes/. [Accessed 29 Mar 2016].

[7] K. Kumar, "Hop Count Based Packet Processing Approach to Counter DDoS Attacks," in Recent Trends in Information, Telecommunication and Computing (ITC), Kochi, 2010.

[8] S. Templeton and K. Levitt, "Detecting Spoofed Packets," 2003.

[9] Q. Li and W. Trappe, "Detecting Spoofing and Anomalous Traffic in Wireless Networks via Forge-Resistant Relationships," IEEE Transactions on Information Forensics and Security, vol. 2, no. 4, 2007.

[10] Q. Li and W. Trappe, "Relationship-based Detection of Spoofing-related Anomalous Traffic in Ad Hoc Networks," in 2006 3rd Annual IEEE Communications Society on Sensor and Ad Hoc Communications and Networks, Reston, 2006.

[11] H. Wang, C. Jin and K. Shin, "Defense Against Spoofed IP Traffic Using Hop-Count Filtering," IEEE/ACM Transactions on Networking, vol. 15, no. 1, 2007.

[12] A. Mukaddam and I. Elhajj, "Round trip time to improve hop count filtering," in 2012 Symposium on Broadband Networks and Fast Internet (RELABIRA), Baabda, 2012.

[13] X. Wang, M. Li and M. Li, "A scheme of distributed hop-count," in Wireless Mobile and Computing (CCWMC 2009), IET International Communication Conference, Shanghai, 2009.

[14] M. Ohta, Y. Kanda, K. Fukuda and T. Sugawara, "Analysis of Spoofed IP Traffic Using Time-to-Live and Identification Fields in IP," in Biopolis, Workshops of International Conference on Advanced Information Networking and Applications, 2011.

[15] T. Kohno, A. Broido and K. Claffy, "Remote physical device fingerprinting," in 2005 IEEE Symposium on Security and Privacy (S&P'05), Oakland, 2005.

[16] IETF, "Internet Protocol DARPA Internet Program Protocol Specification," 1981. [Online]. Available: https://tools.ietf.org/html/rfc791. [Accessed 12 Apr 2016].

[17] "Manual Transmission," Computer Science E-1, [Online]. Available: http://cse1.net/recaps/11-tcpip.html. [Accessed 03 Jun 2016].

[18] "TracesPlay," Sourceforge, [Online]. Available: http://tracesplay.sourceforge.net/. [Accessed 02 June 2016].

[19] "IP Tables Command," DD-WRT, 15 April 2015. [Online]. Available: http://www.dd-wrt.com/wiki/index.php/Iptables#Modifying_the_TTL. [Accessed 03 Jun 2016].

[20] "Speed, Rates, Times, Delays: Data Link Parameters for CSE 461," [Online]. Available: http://courses.cs.washington.edu/courses/cse461/98sp/issues/definitions.html. [Accessed 12 04 2016].

[21] N. Razavi, "Kullback-Leibler Divergence," Matlab, 15 Jul 2008. [Online]. Available: http://au.mathworks.com/matlabcentral/fileexchange/20688-kullback-leibler-divergence. [Accessed 16 Jul 2016].

[22] C. Jin, H. Wang and K. Shin, "Hop-Count Filtering: An Effective Defense Against Spoofed Traffic," in 10th ACM conference on Computer and communications security, Washington, 2003.

## 7 Appendices

7.1 Project Management

7.1.1 Work Breakdown

The project will be conducted in the following order. The initial background research will show ways in which device characteristics can be used. The different methods will be explored in order of expected complexity with each one building on findings of the previous. Finally, the last stage will be to develop a software tool based on all previous findings.

1. Introduction and literature review

1.1 Understand IP characteristics

1.2 Plan how software will be used to aid analysis

2. Explore different methods that could be used for fingerprint creation

2.1 TTL fingerprinting

2.2 IP Identification numbers

2.3 Transmission times

3. Developing final software tool

3.1 Combine methods into one fingerprint creation tool

3.2 Analyse traffic to check fingerprint

3.3 Test on larger scale systems

3.4 Conclusion of findings

7.1.2 Timeline

The Thesis will be developed in three stages. It will start with the first draft which is due on 22/04/2016. This will contain a basic literature review, introduction and subheadings to show the structure of the rest of the document. After this, further literature reviews will be done with some basic network tests to gain an insight into patterns that may help identify devices. From this, basic algorithms will be developed to create the fingerprint and analyse network traffic. These findings will be included in the next submission, the second draft, due on 04/06/2016. The final stage involves bringing the different methods together to create a reliable device monitoring prototype and testing its operation with multiple devices. This will be presented along with all other work in the final thesis, due on 21/10/2016.

Progress update 30/05/16: Patterns in IP packet characteristics identified and basic algorithms to test traffic created. Project is on schedule to start combining techniques to provide the monitoring facility and test its effectiveness.

Table 1 gives a breakdown on how the work will be carried out with critical dates and timeframes for tasks outlined.

Table 1: Project Timeline (dates)

| Task | Dates |
|---|---|
| Research existing authentication methods | 29/02/2016-11/04/2016 |
| Complete literature reviews and plan structure of thesis | 12/04/2016-22/04/2016 |
| MILESTONE: Draft 1 of Thesis due | 22/04/2016 |
| Use packet capture software and Matlab to identify patterns in Ethernet traffic | 23/04/2016-04/05/2016 |
| Time to Live characteristics | |
| IP identification number characteristics | |
| Transmission time characteristics | |
| Analyse effectiveness of techniques and if any complement each other | 05/05/2016-27/05/2016 |
| Build and test fingerprint creation tool | |
| Build and test traffic monitoring tool | |
| Develop prototype software tool to provide creation and checking from packet capture files | 30/05/2016-08/07/2016 |
| Present and discuss recommendations and prototypes | 28/05/2016-14/10/2016 |
| Add any extra literature reviews and sources required to further develop system and test on larger scale networks | 28/05/2016-14/10/2016 |
| MILESTONE: Draft 2 of Thesis due | 04/06/2016 |
| Update Thesis as required from feedback | 20/06/2016-20/10/2016 |
| MILESTONE: Final Thesis due | 21/10/2016 |
| Prepare presentation items for exhibition and final seminar day | 01/10/2016-04/11/2016 |
| MILESTONE: Exhibition | 24/10/2016-28/10/2016 |
| MILESTONE: Final seminar | 31/10/2016-04/11/2016 |

7.1.3 Budget

Most components required such as PCs, software and a network to test are readily available at Adelaide University. A budget of \$250 per semester is provided by the university and will be reserved for any extra equipment that tests may require.

7.1.4 Risk Analysis

The following risks may affect the project:

Loss of work

This could happen if a hard drive failure causes the loss of documents and programming completed for the project. It is unlikely to occur, and the risk will be mitigated by making regular online and offline backups (e.g. USB backups).

Not completing work in time

This risk may cause deliverables to not be submitted on time, impacting on project results. This risk is reduced by making sure all work is done consistently through the semester and not left to the last minute.

Research must be conducted in an ethical, responsible and legal way.

7.2 Code Snippets

Conversion from dotted decimal to integer:

Conversion from integer to dotted decimal:

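```matlab
% Shift each octet of the 32-bit integer IPVector into the low byte, mask it with 255, and join with dots.
strcat(num2str(bitand(bitshift(IPVector, -24), 255)), '.', ...
       num2str(bitand(bitshift(IPVector, -16), 255)), '.', ...
       num2str(bitand(bitshift(IPVector, -8), 255)), '.', ...
       num2str(bitand(bitshift(IPVector, 0), 255)))
```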
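Both conversion directions can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB used in the thesis, and the function names `ip_to_int` and `int_to_ip` are assumptions introduced here, not part of the original tool. It uses the same shift-and-mask idea as the expression above.

```python
def ip_to_int(dotted):
    """Pack a dotted-decimal IPv4 address into a 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        # Make room for the next octet, then place it in the low byte.
        value = (value << 8) | octet
    return value


def int_to_ip(value):
    """Unpack a 32-bit integer into a dotted-decimal IPv4 address."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit value: {value!r}")
    # Shift each octet down to the low byte and mask it with 255.
    return ".".join(str((value >> shift) & 255) for shift in (24, 16, 8, 0))


print(ip_to_int("192.168.1.10"))  # 3232235786
print(int_to_ip(3232235786))      # 192.168.1.10
```

Storing addresses as integers makes it straightforward to sort and compare them when grouping captured packets by source.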

MATLAB ping

7.2.3 IP ID analyser

7.2.4 Round Trip Time analyser




7.2.5 CDF difference calculator

7.3 Output

7.3.1 Section 3.4.2 Test 1 output

First row is source IP, second is destination IP, third is received time, fourth is TTL, fifth is IPID

7.3.2 Section 3.4.2 Test 2 output

First row is source IP, second is destination IP, third is received time, fourth is TTL, fifth is IPID

7.4 Estonia Tour Report

During the winter break, our honours project group went on a study tour to Estonia to learn about cyber security. We visited government officials to learn about their electronic government system and attended a 1-week summer school on cyber security.

7.4.1 Introduction

The Estonia study tour was a great experience where we learnt a lot about cyber security and worked on our individual honours projects. The environment we were in allowed us to rapidly learn new concepts and work collaboratively with peers and lecturers. Being immersed in the Estonian culture was an interesting experience as we saw sights around the city and learnt about their digital e-Government system. The summer school taught us digital forensic analysis techniques and how to work with lawyers to present a case in a moot court.

7.4.2 Positives

A week was also spent working on our individual honours projects. During this time, we worked together discussing different ideas in preparation for the ICR conference. Having lecturers working with us was valuable, as we could get quick answers to questions and immediate feedback. Presenting at the ICR conference gave me a stronger sense of where my project is going, as well as feedback from the experts who attended.

7.4.3 Personal Highlights

My personal highlights include the Mektory visit, the KGB museum, the summer school and exploring the Old Town. The Skype tour was also interesting and gave a different perspective of a working environment. Workers were given flexible working hours and dedicated rooms to relax and play games with each other. We also experienced riding in Tesla self-driving cars on some of our taxi rides. Not only was it fun to ride in the car, but we also discussed the security implications of Internet connected and automated cars.

We were also given the opportunity to have some amazing meals and experience the local cuisine. Some of the more unique foods we ate included elk soup and ox rib, which tasted excellent. Eating at Umami, an outdoor restaurant, was a pleasant experience complemented with great food. Walking to the markets allowed us to purchase fresh fruit and pastries and enjoy the countryside scenery.

A few of us decided to go to the Seaplane Harbour maritime museum. It had many interesting exhibits and allowed us to explore a submarine and handle historic weapons that were used on ships. I would recommend visiting the museum to anyone interested in maritime and weapons history.

On the final weekend, we took a day trip to Helsinki to do some sightseeing. It was a busy day that involved a lot of walking, but we squeezed in most of the major sights in Helsinki. The ferry ride was extremely comfortable and got us there early, giving us plenty of time to explore. I would definitely recommend that future students visit, as it is so close, and even stay the night if they have time.

We visited the TV tower which gave a fantastic view of the city and showed us some of Tallinn’s history. We were also allowed to walk around the outside of the tower in harnesses and sit on the edge which was a great experience, although a bit frightening at first.

7.4.4 Recommendations

I have a few recommendations to improve the study tour in future years. The summer school was conducted relatively well, with a good balance of group work and lectures. I think there could have been more lectures about what to look for in digital forensics, with examples, and less focus on how to use the software, which was shown on the first day. Learning more about what was expected in a technical/legal argument would also help, as we were unsure at first how we should present our findings to the lawyers and whether they were important to the case.

Having more people with a law background would also help the groups work better. We only had one person with a legal background, and it was hard for them to manage what they needed from the team, as they had no one to bounce ideas off or share the load with. Bringing law students from Adelaide would have been beneficial, as it would be easier for the technical people from Adelaide to work with them and it would increase the law presence at the summer school. The study tour group size worked well, although a few fewer students would give supervisors more time to focus on individual projects. If half of the students were law students, for example, the load could be balanced with the law supervisor.

The bus passes and phone SIM worked perfectly and allowed us to communicate and travel easily. The food and taxi payments were made by individuals, which sometimes made it hard to manage and keep track of expenses. I would recommend some sort of prepaid credit card, with a few cards distributed among the group. The card could be linked to taxi apps for group travel, and personal cards could also be linked for personal travel expenses.

Overall, the study tour was very well organized and was an enjoyable and insightful experience. It was the perfect combination of sight-seeing, group socializing, learning and teamwork, which helped achieve its outcome. The tour went for the right length of time, as we were able to explore much of Tallinn and also complete the required work.

## Abstract

Industrial Control Systems (ICSs) were originally designed as stand-alone networks and were not intended to be connected with outside networks and the Internet. However, as industry has evolved, these interconnections have become a necessity. In an age where cyber-crime is prevalent, legacy devices and communication protocols provide inadequate security, leaving systems extremely vulnerable to outside attacks. The insecurities of a connected network threaten business operations and human safety. Despite these risks, ICS security is an ongoing issue within many industries. Investigation shows that there is an overall lack of cyber security awareness, education and interest throughout industry. System security is not valued as an investment and ICS security standards are not utilised because they are not mandated. Studies also show that many ICS security publications are lacking in critical content relating to Change Management (CM) processes, risk management, assessment and evaluations, and the development of security metrics. Where positive security changes have been implemented, behavioural factors can also come into play: practical drift and polarity within an organisation can conflict with security measures and must be considered throughout CM processes. A study of CM processes within pharmaceutical and chemical industries identified 10 ‘best-in-class’ CM characteristics, which were used to develop and test a series of change document templates. These templates were designed as a starting point for industry and the development of effective CM processes. The development of suitable security metrics required extensive study. Studies showed that simpler security metrics were more effective at conveying security levels to non-technical personnel. Two straightforward metrics were identified, the attack potential index and the security breach consequence index, and these were used to create a risk matrix. This thesis aims to provide industry with insight and expanded awareness of ICS security risks. It aims to assist the development of effective and lasting ICS security CM processes.

## Acknowledgments

I would like to acknowledge my project supervisor, Matthew Sorell and my two unofficial project supervisors, Nickolas Falkner and Yuval Yarom for their invaluable advice, support and guidance throughout the year. I would also like to thank my fellow cyber security group members for their ongoing encouragement and advice.

## 1 Introduction

Industrial Control Systems are an integral part of many industries, including power, water, oil and natural gas, transportation, chemical, pharmaceutical and manufacturing. They are responsible for the control and monitoring of specialised processes critical to business operation. Over the past few decades, industry has gradually integrated control networks with IT networks and the Internet. While this offers more economic, user-friendly data management and plant control, it has also made inherently insecure control systems vulnerable to cyber threats. Fundamental differences in operational priorities between ICS and IT networks mean that traditional IT security measures are not suitable as the only line of defence for ICS security. Specialised devices and security software exist, and ICS security standards and guidelines are available that can be used to improve control system security, but evidence indicates that these solutions are not being utilised effectively. This research project investigates the behavioural, financial and managerial issues responsible for the current state of ICS security. It is critical that these problems are brought to the foreground so that industry is aware of ICS security risks and can treat ICS security as seriously as it would a physical structure or business critical process. This project also focuses on the development and application of effective CM document templates that can be used by industry as building blocks for the improvement of security plans and CM processes.

## 2 Background

The term Industrial Control System (ICS) is broadly used when describing computer-based industrial process control and monitoring [1]. ICSs are found in a variety of industries, including: power, water and waste-water, oil and natural gas, transportation, chemical, pharmaceutical, manufacturing (automotive, aerospace, and durable goods, etc.) and food and beverage [2]. These control systems are generally composed of one or more specific architectures, including Distributed Control System (DCS) architecture and Supervisory Control and Data Acquisition (SCADA) architecture [3].

Systems using DCS architecture were originally process oriented: Data is sent and retrieved directly from devices in the field using industrial communication protocols [4]. Systems using SCADA architecture were typically data-gathering orientated – while capable of complex process control, the main emphasis behind these systems was on quality data presentation [4]. Over time, the differences between DCS and SCADA system architectures have become subtle due to technological advancements and the two are slowly merging into the same entity [4]. For the purposes of simplicity, this research project will solely analyse ICS networks with SCADA architecture.

SCADA systems are used to provide real-time monitoring and control, report generation, data logging and archiving for industrial processes. They can span multiple sites over large distances and are implemented within a variety of industrial sectors. They consist of the following components [4]:

- Human Machine Interfaces (HMIs) – controller operating panels with a graphical user interface (GUI) used to monitor and control system parameters.
- Programmable Logic Controllers (PLCs) – programmable embedded computers that send and receive data to and from external field devices.
- Field devices – devices that measure physical variables, such as thermometers, encoders and motion sensors, or that send signals back to the controller, such as stop buttons and level sensors.
- Telemetry system – a communications network used to connect dispersed control systems together.
- Supervisory system – gathers all data from the control network for monitoring purposes and higher-level control.
- Historian – software used to graph time-stamped system data for analysis.

Over the years, SCADA systems have evolved through four distinct generations of architecture – monolithic, distributed, networked and Internet of Things (IoT) – each defined by certain technological advancements [5]:

1. First generation: monolithic SCADA networks functioned independently from all other networks, as shared network services were not common at the time. Computing was achieved using mainframe computers.

2. Second generation: distributed systems implemented standard proprietary communication protocols, allowing distributed control processing and real-time information sharing.

3. Third generation: networked systems adopted open Internet Protocol (IP) standards and communication protocols, such as TCP/IP and UDP, in addition to industrial protocols like Modbus. This allowed SCADA networks to function over Wide Area Networks (WANs).

4. Fourth generation: the emergence of Cloud Computing and the Internet of Things (IoT) has paved the way for a fourth generation of SCADA architecture. Remote network servers in the Cloud provide centralized data storage, allowing more efficient scaling of SCADA networks. It is believed that the IoT will allow greater interconnectivity of devices and reduce maintenance and infrastructure costs – intelligent field devices will be able to report directly to a centralised control point, alleviating the need for intermediate devices such as PLCs. Human Machine Interfaces (HMIs) may no longer be constrained to one centralized location, but accessible anywhere from mobile devices [4].

The progression of SCADA architectures reflects industry’s drive to achieve more economical data management systems. Industry has become so dependent on profits resulting from the interconnection of ICS networks and corporate IT networks that it simply cannot afford to revert to old ICS architectures to address emerging cyber security issues [6].

## 3 Literature Review

This literature review begins with an analysis of the current state of industry control system security, highlighting problems arising from the integration of legacy control system architectures with corporate IT networks. Section 3.2 provides examples of significant ICS cyber-attacks, including Stuxnet and the recent Ukrainian power station shutdown. It outlines general ICS attack vectors and provides insight into the IoT search engine Shodan and the prevalence of modern hacking tools. Section 3.3 identifies ideal ICS security characteristics and lists available system security improvements. Section 3.4 investigates why available security standards and guidelines are not improving ICS security, and reviews a study on the effectiveness of ICS security publications. Section 3.5 identifies issues with upper management as a contributing factor towards poor ICS security. Section 3.6 recognizes problems in justifying ICS security costs. Section 3.7 investigates the lack of security awareness, education and interest within industry. Section 3.8 outlines behavioural problems within an organisation that work against positive security changes and contribute towards ICS security problems. Section 3.9 summarises all findings throughout the literature review and expresses the need for effective change management (CM) in industry to improve the state of ICS security.

3.1 ICS vs IT Security

A majority of existing ICS networks utilize third generation architectures. Unfortunately, the integration of control networks with corporate networks and the Internet has exposed insecure industrial infrastructures to modern cyber threats. SCADA systems and components were never originally designed with security in mind, and are therefore intrinsically insecure [3]. This is still an issue with a majority of modern ICS components – vendors manufacture products based upon existing market needs, and the current mainstream market prioritises cost and functionality over security [7].

IT security is centred on Confidentiality, Integrity and Availability (CIA) of critical information assets [8]. It aggressively and promptly implements security controls in response to new cyber-threats. New viruses, for example, are detected and managed by anti-virus software: new anti-virus signatures are created and deployed during regular software updates, although this process does not occur immediately [6]. ICS security prioritises Safety, system Reliability and system Continuity (SRC) [6]. Delays in dealing with safety failures can potentially cost lives, while failures in system reliability can be damaging to an organisation’s reputation and cost money. Changes and updates to device software generally require down-time, which results in loss of continuity. ICS security must prevent cyber intrusion entirely, and no breach in security is acceptable. Many of the above differences result from the fact that logic executed within an ICS has an effect on the physical world [2]. For these reasons, industry cannot rely solely on existing IT security for control system defence, and ICS cannot apply IT type security solutions without compromising fundamental ICS priorities [6].

IT and ICS networks are also exposed to different types of cyber threats. IT attacks are focused on data and monetary theft, and Denial of Service (DOS). ICS attacks are often centred on espionage and sabotage [6]. An Advanced Persistent Threat (APT) is a network attack in which an unauthorised person gains access to a network and remains there undetected for an extended period of time. These types of attacks are a serious concern for both IT and ICS networks. APTs are generally well-funded and sophisticated in nature. An APT attack will typically begin with a reconnaissance-type attack aimed at determining network security vulnerabilities. It will then infiltrate the network using malware, locate critical data or sub-networks and establish communication channels back to the host [9].

3.2 Cyber Attacks

A recent example of an APT attack occurred in December 2015 where hackers were successful in infiltrating the Ukrainian power network and disconnecting electricity to roughly 80,000 customers for up to three hours by remotely turning off electrical isolators for multiple sub-stations. Malware named BlackEnergy was found on several systems, and is believed to be the tool used by hackers to cause the power outages. Hackers also launched a Telephone Denial of Service (TDOS) attack on the power company’s customer help centre to prevent legitimate customers from reporting the outage, further extending outage time [10].

An example of a more sophisticated politically-motivated APT, and a significant milestone in the history of cyber-attacks and cyber-warfare, is Stuxnet. Allegedly an American / Israeli government creation, Stuxnet is a virus that was released with the intention of sabotaging Iran’s nuclear program. It was smart enough to avoid standard IP protections – it utilized four Windows zero-day vulnerabilities (unknown ‘gaps’ in software), exploited holes in firewalls and avoided AV and human detection for up to a year. It infiltrated the facility’s air-gapped control network via USB, and propagated through the network scanning for Siemens Step7 software on computers controlling Siemens PLCs. In 2010 Stuxnet was activated, destroying hundreds of centrifuges [11].

Shodan is a publicly accessible search engine like Google, but for the IoT. Rather than retrieving web page content, Shodan locates banner information which can be used to identify ICS devices exposed to the Internet [13]. A quick search for ‘SCADA’ in Shodan reveals the following exposed devices (Figure 1). Roughly 269 devices were immediately located using the following communication protocols: Modbus – a legacy PLC protocol; File Transfer Protocol (FTP) – a protocol for communications within a client/server network; Network Basic Input/Output System (NetBIOS) – a protocol for devices communicating over a Local Area Network (LAN); Simple Network Management Protocol (SNMP) – a common SCADA communication protocol. These protocols offer little to no security and should not be exposed to the Internet: they are very open to DOS attacks and can be easily queried, allowing data to be read or altered with the help of available hacker tools. Given the current state of ICS security, it is no wonder that articles written about Shodan describe it as “The scariest search engine on the Internet” [14].

The gradual evolution of hacking techniques and software over the years has drastically reduced the level of sophistication and knowledge required for the execution of successful cyber-attacks [15]. The below figure provides a visual representation of these factors over the last few decades (Figure 2). An individual with motive, time and minimal resources has the potential to do significant damage to many insecure control systems.

Figure 2: Attack sophistication vs. intruder technical knowledge [15]

Cyber-attacks are on the rise. The below data recorded by Homeland Security reveals that a total of 295 cyber incidents were reported in 2015 (Figure 3) – a significant increase of 20% from 2014. Further to this, a study by the Ponemon Institute reveals that only 43% of organisations surveyed implemented thorough security incident detection and alerting technologies [16]. Of these 43%, how many actually report cyber incidents? The true number of cyber incidents per year is most likely much higher.

Figure 3: 2015 Cyber Security Incidents by Sector [17]
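As a quick check on the figures quoted above, the 2014 incident count implied by 295 incidents and a 20% increase can be worked out directly. The sketch below is in Python, and the helper names are illustrative. The 2014 value it produces is only what the two quoted numbers imply (the 20% figure is presumably rounded) and is not a count taken from the cited report.

```python
def implied_baseline(current, percent_increase):
    """Earlier value that, grown by percent_increase, gives the current value."""
    return current / (1 + percent_increase / 100)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


incidents_2015 = 295       # quoted in the text
reported_increase = 20.0   # quoted in the text, assumed to be rounded

baseline_2014 = implied_baseline(incidents_2015, reported_increase)
print(round(baseline_2014))  # 246
```

A quoted increase of 20% therefore corresponds to roughly 50 additional reported incidents over the year.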

3.3 ICS Security Measures

In general, an ICS should adhere to the following security objectives, as identified by National Institute of Standards and Technology (NIST) guidelines for ICS security - guideline SP800-82 [2]:

1. Restrict logical access to the ICS network and network activity.

2. Restrict physical access to the ICS network and devices.

3. Protect individual ICS components from exploitation.

4. Restrict unauthorized modification of data.

5. Detect security events and incidents.

6. Maintain functionality during adverse conditions.

7. Quick restoration of the system after an incident.

There are currently a variety of effective security measures that can be used to assist in implementing the above goals.

1. Whitelisting is the act of maintaining a list of trusted devices which have been granted permissions by the administrator. It is a good fit for ICS security and an alternative to anti-virus software, and can be implemented within the system at the field device level. Modern PLC devices commonly include their own operating systems (OS) and are able to implement whitelisting. Only devices that are listed are able to communicate with the PLC [6].

2. Device firewalls can be installed locally between field devices and the rest of the network. They prevent unauthorised use and access of the device, eliminating potential avenues of attack within the control network. They are much simpler in design than conventional IT firewalls and as a result, more secure [6].

3. Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) are available at the control network layer. IDS and IPS characterize traffic within the network and identify unexpected anomalies and intrusions. A positive detection could indicate a malfunction or malicious attack [18].

4. Security Information and Event Management (SIEM) gathers all network security information at one centralized location. SIEM often integrates IDS and correlates local conditions with cloud-based global threat intelligence [6].

5. Unidirectional security gateways offer highly secure data transfer, and are believed to be 100% secure against external network attacks. Data is transmitted through an optical fibre link connecting two gateways. The transmitting gateway has a laser and the receiving gateway has a photocell, meaning data can physically travel in one direction only [6].

6. Network zoning is the segregation of control networks from corporate networks. This results in a control WAN and a corporate WAN that are completely separate, making control systems more resistant to cyber-attacks by limiting paths of attack [6].

7. Corporate / Control Network Firewall and Router. The introduction of a firewall / router combination between corporate and control networks can offer significant improvements to control system security [2].
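The whitelisting measure in item 1 amounts to a default-deny check on the sender of each message. A minimal sketch in Python follows; the class, the function and the device identifiers (`hmi-01`, `historian-01`, `laptop-99`) are hypothetical and are not taken from any PLC vendor's API.

```python
class DeviceWhitelist:
    """Default-deny list of device identifiers trusted by the administrator."""

    def __init__(self, trusted=()):
        self._trusted = set(trusted)

    def grant(self, device_id):
        """Administrator adds a device to the trusted list."""
        self._trusted.add(device_id)

    def revoke(self, device_id):
        """Administrator removes a device; unknown identifiers are ignored."""
        self._trusted.discard(device_id)

    def is_trusted(self, device_id):
        return device_id in self._trusted


def handle_message(whitelist, device_id, payload):
    """Return the payload for trusted senders and None for everyone else."""
    if not whitelist.is_trusted(device_id):
        return None  # unlisted devices are dropped without being processed
    return payload


whitelist = DeviceWhitelist(["hmi-01", "historian-01"])
print(handle_message(whitelist, "hmi-01", "read level_sensor"))  # read level_sensor
print(handle_message(whitelist, "laptop-99", "write setpoint"))  # None
```

The point of the design is that anything not explicitly granted is rejected, which is the opposite of the blacklist approach taken by signature-based anti-virus software.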

Combinations of these solutions can improve ICS security robustness and help protect against malicious attacks. However, their implementation generally requires change to existing ICS networks and a break in operation continuity and system availability. There is also a range of associated costs: significant initial investment is generally required to upgrade legacy control equipment and infrastructure and handle related down-time costs. Recurring security expenses may include maintenance, monitoring and staff training. Industry budgets often only allow for once-off infrastructure costs, rather than ongoing maintenance / monitoring costs [7].

3.4 Standards and Guidelines

Guidelines for securing ICSs already exist; NIST provides extensive documentation that identifies typical cyber threats and common ICS security vulnerabilities, and outlines remedial actions that industry can take to minimise associated risks [2]. However, despite the availability of such tools and guidelines, ICSs remain insecure.

In 2000, a disgruntled former employee from Hunter Watertech – a company that installed SCADA systems for radio-controlled sewage equipment in Maroochy, Queensland Australia – caused 800,000 litres of raw sewage to overflow into local parks and rivers by driving to sites and remotely sending radio commands to sewage control gear. This event could have been avoided had the organisation followed NIST Guidelines [19]. NIST Special Publications (SP) 800-53 – Security and Privacy Controls for Federal Information Systems and Organizations – provides a catalogue of security and privacy controls for organisations and processes for selecting controls which protect assets from cyber-attacks. Hunter Watertech did not have any cyber security policies or procedures in place: Personnel were not trained to react to cyber security incidents. There were no contingency plans in place to deal with emergency situations. Furthermore, there was a lack of audit capability for determining system vulnerabilities. All of these factors would have been addressed following ICS security guidelines, and likely prevented the malicious event.

A study conducted by the European Network and Information Security Agency (ENISA) on protecting industrial control systems identified a general mistrust in industry towards security guidelines. This lack of confidence arises from a variety of factors: there is sometimes mistrust of the organisation producing the publication, and the publication may be addressed to employees with a technical background rather than to management, who may misinterpret its recommendations [20].

Knowles et al. [21] analysed 22 well known and used ICS-specific security publications (including 17 guidelines and 5 standards) to determine how genuinely useful they were to industry. Each publication was assessed based on the following categories deemed important for effective ICS security management (Figure 4).

- Risk Management and Assessment (RMA)
- Asset Identification and Classification (AI&C)
- Threat Assessment (TA)
- Vulnerability Assessment (VA)
- Risk Level Evaluation (RLE)
- Recommendation of Countermeasures (RoC)
- Change Management (CM)
- Qualitative Metrics (Ql)
- Quantitative Metrics (Qa)
- Link between Safety and Security (S)

Figure 4: Visualization of the publication analysis results [21]

The above results reveal that overall, numerous publications were lacking in many important categories. While most include recommendations for attack countermeasures, they are generally only provided at high levels of abstraction, and exclude important details on how, where or when measures should be implemented. Another issue stems from the fact that there are countless variations of control system architectures, functioning with distinct system requirements and security priorities. As a result, ICS security publications are designed to include generic content, to accommodate for control system heterogeneity [21].

Most publications do not include sufficient threat assessments, which are important for identifying the specific or likely threats an industry should be concerned about. Additionally, many exclude guidance on system risk management, risk assessment and risk evaluation, which is important for determining control system security posture.

There is an absence of effective CM recommendations throughout most publications: Providing recommendations for attack countermeasures certainly adds value to security engineers, but providing guidance on correctly implementing recommendations is more of an imperative [21]. It is clear from this report that while these publications are used, they do not fully address all ICS security requirements.

3.5 Issues with Upper Management

The study by ENISA on protecting industrial control systems identified the following Key Finding (KF) relating to managerial problems facing industry [20]: KF 1.4 – Lack of involvement of top management. Top management are generally not spending enough time focusing on ICS security and are operating under the incorrect assumption that they are currently doing enough.

In 2014, a research survey was conducted by the Ponemon Institute [16]. 597 specialists working in IT, IT security, compliance and risk management were surveyed. All participants were involved in security management activities within their organizations. The survey revealed that only 13% of participants rated their company’s security posture as strong, while 33% indicated that their respective high-level management believed it to be strong. The below graph displays the difference in security posture ratings between those directly involved with system security, and those involved with management (Figure 5).

Figure 5: Organization’s Security Posture [16]

Various factors cause this obvious divide between both parties. One is a break-down in communications: 71% of those surveyed indicated that communication with upper management occurred less frequently than required, 51% admitted to filtering negative security occurrences before relaying them to upper management, and 63% would only relay information after a security incident had transpired [16]. Further studies reveal that, in addition to low communication frequency, upper management often finds security reports and information too technical to be understood (58%), or misinterprets the reports (Figure 6).

Figure 6: What’s Wrong with Communications [16]

Another cause of this information divide, indicated by 43% of survey participants, is a lack of interest by senior executives; security is not a direct economic point of interest for industry, and is commonly viewed as a cost rather than an investment [20]. This points toward a general lack of awareness of potential cyber-threats, which is expanded upon in section 3.7 – Awareness.

3.6 Justifying Costs

According to ENISA KF 1.4, lack of involvement from upper management is also attributed to the perception that budgeting for ICS security is a financial loss rather than an investment. This is because the benefits of better security are generally not expressed in terms of immediate profits, but instead in terms of loss prevention [7]. This is reinforced by ENISA key finding KF 3.10 – Compliance is not a market driver in ICS security: as ICS security is not regulated or mandated in most industries, there is little market drive for making large investments. Why would management budget for additional ICS security, which many companies do not perceive as necessary, when they don’t have to? [20].

Statistics from another Ponemon Institute study indicate other reasons why participants feel there are barriers to managing IT security changes effectively (Figure 7) [16].

Figure 7: Significant barriers to managing IT security changes effectively [16]

The largest impediment to implementing effective security changes results from resource and budgetary limitations, as indicated by 43% of participants. 37% also indicated that their company lacked skilled or expert personnel, which can also be attributed to financial restrictions. Another ENISA key finding, KF 4.6 – Developing security programs, too costly for operators; indicates that in many instances security investments are too expensive, and rather than replace older equipment and update security systems, companies employ compensatory controls to avoid investment [20].

While ICS security investments may be costly, the use of effective CM can bring structure to decision-making processes. ENISA key finding KF 8.4 – Top management awareness to be fostered – indicates that security costs should be defended, and presented to upper management as a business driver with economic benefits [20]. CM tools can be used to present ICS security changes to management in such a way that the benefits are clearly understood.

With proper change procedures in place, security changes can be implemented more efficiently; ineffective changes can be prevented; security problems can be identified early, reducing the risk of costly security breaches; security audits can be more focused and effective. More effective changes will ultimately lead to more secure networks.

3.7 Awareness

The Organization for Economic Cooperation and Development (OECD) guidelines on building a culture of security state the following: “Awareness of the risks and available safeguards is the first line of defence for the security of information systems and networks” [22]. The consequences of not implementing proper control system security should be obvious. In 2015 PricewaterhouseCoopers conducted a survey of more than 10,000 employees in security-related management positions all around the world. The bar graph below represents industries of varying sizes, and lists estimated total financial losses resulting from all security incidents over the past year (Figure 8). These figures represent significant financial losses and provide a clear indication that cyber security should be respected and prioritised accordingly. Despite this, security for ICSs is under-appreciated, under-budgeted and poorly managed.

Figure 8: Estimated Total Financial Losses as a Result of all Security Incidents (US dollars) [23]

The data below from Homeland Security sorts security incidents reported in 2015 by attack vector. Of the 295 reports, a majority of attack paths were classified as unknown (Figure 9). Another significant portion of reported attacks resulted from spear-phishing. A majority of spear-phishing attacks can be prevented with basic IT security, such as traffic monitoring software or email sandboxing, which provides an isolated computing environment that can test email links and attachments for malicious code or unusual behaviour [24]. These attacks can also be reduced by improving employee awareness. While security solutions should not rely on ‘fixing’ user behaviour (a user should feel safe clicking links and opening attachments [25]), awareness of suspicious emails and links is also important. Highly targeted spear-phishing attacks are hard to identify and prevent, but more common attacks could be prevented using basic security tools and by improving employee behaviour. The large number of successful attacks due to spear-phishing and of unknown origin indicates a lack of threat detection resulting from under-developed system security. In many cases, poor security posture is a result of poor threat awareness and attitude towards cyber-security.

Figure 9: 2015 Incidents by Infection Vector (295 total) [17]

A publication by the Global Conference Cyber Space (GCCS) in 2015 indicated that there is a general gap in cyber security awareness, education and interests relating to ICS security. Many ICS engineers still apply legacy thinking when it comes to designing and maintaining control systems, unaware of new dangers presented by increasing interconnectivity. It is common to assume that IT security is enough to protect ICS networks or that hackers are not interested in disrupting certain industries, when in fact this is incorrect. IT engineers generally have greater knowledge and awareness of cyber security and potential dangers, but are historically not responsible for control systems and are not aware of ICS technology particularities and limitations [7]. Industry needs to have full awareness of modern cyber threats in order to implement effective managerial, behavioural and physical changes to ICS security.

3.8 Behavioural Problems

Due to the critical nature and complexity of many control systems, any disruption affecting system continuity and reliability can potentially affect a business’s profit and reputation, creating strong reluctance to make any changes. It is important for industry to overcome these fears and implement security controls where necessary [7]; a planned shut-down for maintenance or system upgrades is a much cheaper and less embarrassing option than dealing with an unexpected shutdown or incident resulting from a cyber-attack.

While the repercussions of neglecting security can be both damaging and costly, ICS security is often overlooked, underestimated and/or dismissed as a non-issue. The design, deployment and operation of legacy SCADA networks lack concern for device and network security [7]. This is evident in the above Ponemon Institute study (Figure 6: What’s Wrong with Communications), where 43% of surveyed participants indicated that management were simply not interested in hearing network security information [16].

One of the key characteristics of effective CM is proper implementation. In order for a change to remain effective, those responsible for implementing the change must be aware of relevant behavioural challenges and behavioural ‘resistances’. They must create ownership and support relating to the change, and they must be able to establish new behaviours [26].

A study by Rosa Antonia Carrillo investigates why change efforts in safety often fail, and one of the identified problems was practical drift [27]. Practical drift is a human tendency to stop following a procedure when negative consequences fail to occur. This results from a lack of understanding, by those affected by a change, of the true intent of the change. The reasoning given for not following a procedure may be one of perceived practicality or efficiency.

Polarity, another problem identified in the study, refers to a certain divide within an organisation between workers and upper management; the employee feels that the best way forward for the organisation is to simply get the work done rather than meeting paperwork and procedural demands from upper management. At the same time, corporate views these demands as important and critical for the company.

Practical drift and polarity behaviours within an organisation reduce the lasting effects of procedural changes. Effective CM should therefore be implemented hand-in-hand with behavioural changes [27].

3.9 A Need for Effective Change Management

A combination of several alarming factors highlights what is wrong with modern industrial control systems and existing Change Management (CM) techniques. These include: Incomplete and generic security standards and guidelines; Deficiencies in effective upper management resulting from intra-organisational communication break-downs, difficulties comprehending and interpreting technical reports, budgetary concerns, and cyber-threat ignorance, and; Behavioural hurdles working against positive security changes. As a result of these factors, we can see why a majority of control systems remain reliant on legacy architectures, equipment, and security, and why the rate of successful cyber-intrusions continues to rise. This is further evident in the following Ponemon survey where respondents rated the agility and effectiveness of managing change to IT security operations (Figure 10).

Figure 10: Agility and effectiveness in managing change to IT security operations [16]

Poor CM is also apparent in the following survey, where respondents identified what tools their organisations utilised to facilitate effective changes. Over 75% indicated that they didn’t implement proper change control, while over 56% of companies don’t utilise network traffic monitoring, perform vulnerability risk management, perform security configuration management or employ incident detection and alerting (Figure 11).

Figure 11: Technologies that facilitate the management of changes [16]

In ENISA’s report on protecting industrial control systems, recommendations for improving ICS security practice include the creation of ICS security plan templates, the creation of good practice guides for ICS security, and raising cyber security awareness and training [20].

Proper change documentation and security plan templates can assist in determining when official security audits should be carried out, and can be used to effectively implement change. This will help organisations that have a lack of skilled or expert personnel deal with system complexity issues. ICS Security improvements are a solid investment in an age of growing cyber-crime, where industrial control systems are at the frontier of cyber-attacks [28]. Effective CM is required to properly deploy security technologies, bridge the gap between security engineers and upper management and help convey the true state of system security to all involved in ICS security.

## 4 Evaluating Good Change Management

Effective CM will not only streamline change processes, but make them more robust. An example of good CM practice can be found by examining the pharmaceutical industry. It is critical that pharmaceutical companies adopt very strict change control procedures in order to comply with an array of laws, regulations, guidelines and standards, and reduce the risk of operational licence suspension. Change control procedures are put in place to prevent the serious consequences that could result from poor product quality, incorrect labelling, etc. when making changes. In Australia, the Therapeutic Goods Administration (TGA) regulates the pharmaceutical industry through pre-market assessments, post-market monitoring and the enforcement of standards [29].

The penalties for not meeting standards provide an incentive for pharmaceutical companies to produce safe products in a careful and controlled manner using refined change processes [29]. The incentive to comply with standards is something that ICS security does not currently have, as standards are not mandated. In most cases, system security is considered to be important; however, it is not prioritised highly enough, and the framework for changing system security is not standardised or refined.

The chemical industry in the US is another example of an industry following good security CM practices. The Department of Homeland Security (DHS) requires high risk chemical facilities to comply with the Chemical Facility Anti-Terrorism Standards (CFATS). This is a process heavily regulated by the DHS, which involves categorising chemical plants into risk tiers, and developing site specific security plans based on security vulnerability assessments and surveys [30] (Figure 12).

Figure 12: DHS Implementation Steps within the CFATS Process [30]

Following the implementation of the security plan, the DHS conducts periodic compliance inspections based on the risk tier assigned to the chemical site.

As control systems incorporate devices that have inherently long, asynchronous life-cycles, the process involved in upgrading a system is often very expensive and drawn-out. The migration from old to new is a gradual process, especially for larger, more dispersed SCADA networks. Components of the system require replacing or upgrading at different times, while the continuity and reliability requirements of the system must be maintained. Due to these factors, it is almost natural for an industry to lose sight of the overall security of a system and address each issue as it eventuates. It is therefore extremely important for the organisation to maintain and follow an ICS cyber security masterplan for security requirements when any part of the control system is modified [7]. It is important for CM to adhere to the greater security picture.

The management of change control can have a substantial impact on a control system’s safety, reliability and continuity. In order to evaluate effective CM, we can look at 10 characteristics of ‘best-in-class’ change management within industry, derived from an NSF (The Public Health and Safety Organization) webinar on best industry practices [26].

1. Business Critical Change Control: Change control is driven by strategically vital / business critical components, not simply to appease compliance requirements. Any changes made are of value.
2. Fast and Effective: CM should be fast and effective and able to prevent any change that is not actually required. Slow and complex change control systems can be dangerous as they result in user complacency and user-created work-arounds.
3. User Centred Design: CM should be designed to take inputs from all relevant user groups, including technical and non-technical staff.
4. Simple: CM should be simple, not complex. CM steps should be standardised, again to reduce complexity.
5. Standardised Processes: Decision making processes, such as assessing change risk and weighing cost vs benefit for all change considerations, should be standardised. For every change the same aspects should be assessed: safety, health and environmental impact, scope of change, down time required, etc.
6. In-depth process knowledge: CM should involve those with relevant knowledge relating to all aspects of the change.
7. Focus on implementation & Behavioural Changes: CM requires good implementation. This requires an understanding of behavioural resistances that may arise, and CM should take this resistance to change into account. Offer support to those who are experiencing the change and express the need for the change.
8. Follow up: Follow up is very important. The effectiveness of the actual change AND the CM processes should be analysed. What could be improved?
9. Performance measures: Choose the most important behaviours or outcomes of CM before defining performance measures. In general, measure the number of changes raised, the number of rejected changes, the number of successful changes and the number of failures.
10. Continuous improvement: Implement consistent CM performance and user-experience reviews, audits and self-inspections. The system should be treated as one that can always be improved.
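The four counts named under characteristic 9 can be tracked with very little tooling. The following is a minimal sketch only: the `ChangeLog` class, the two derived rates and the sample figures are assumptions introduced for illustration and are not taken from the NSF webinar [26].

```python
from dataclasses import dataclass


@dataclass
class ChangeLog:
    """Counts for the four measures named in characteristic 9 (hypothetical structure)."""
    raised: int      # number of changes raised
    rejected: int    # number of changes rejected
    successful: int  # number of changes implemented successfully
    failed: int      # number of changes that failed

    def rejection_rate(self) -> float:
        """Share of raised changes that were rejected."""
        return self.rejected / self.raised if self.raised else 0.0

    def success_rate(self) -> float:
        """Share of implemented changes (successful + failed) that succeeded."""
        implemented = self.successful + self.failed
        return self.successful / implemented if implemented else 0.0


# Illustrative figures only; not taken from any survey or organisation.
log = ChangeLog(raised=40, rejected=10, successful=27, failed=3)
print(f"rejection rate: {log.rejection_rate():.0%}")
print(f"success rate: {log.success_rate():.0%}")
```

Reviewing these rates over successive periods would be one way to support the follow-up and continuous improvement characteristics.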

Using these 10 good CM characteristics as a guide, an analysis can be made of industries that implement good CM practices. The pharmaceutical and United States (US) chemical industries were assessed. The table below indicates that, for the most part, both industries exhibit the 10 best-in-class CM properties; while both are heavily regulated, the careful implementation of change management is required to maintain the quality and safety of processes and products (Table 1).

Table 1: Evaluation of Good Change Management Processes in Industry

## 5 Documentation

This section outlines the process used to develop security CM tools and document templates.

5.1 Security Metrics

In order to create effective security change control documentation, it is first important to define a comprehensive set of security metrics that rank a system’s cyber security level in a meaningful way. The security metrics used should support risk assessments within the practical constraints of what is presently measurable and controllable by the company [31]. Research findings agree that metrics are crucial for achieving effective security CM processes [16].

When expressed correctly, security metrics can achieve a number of important functions. They can set an organisation’s benchmark for ICS security, and in doing so meet external security demands, such as contractual requirements. They can be used to satisfy compliance with standards (though security standards are currently not mandated). They can also be used internally, to assist with security risk assessments, analysis of control system changes, and evaluations of security investments [32].

Characteristics of an effective cyber security metric include: relevant – the variable used to create the metric must be relevant to cyber security; unambiguous – use of the metric should be straightforward and clear; direct – the metric is composed of elements that relate directly to cyber security; measurable – the metric is composed of measurable axes; comprehensive – all goals and consequences are accounted for [32].

W. Boyer and M. McQueen [31] devised ‘Seven Ideals of Security’ for control system cyber security and defined, for each ideal, one or more security metrics measuring the realisation of that ideal (see Table 2 and Table 3).

Table 2: ‘Seven Ideals of Security’ & Related Metrics - W. Boyer and M. McQueen [31]

Table 3: Security Metrics Detailed - W. Boyer and M. McQueen [31]

These ten metrics can provide a relatively thorough analysis of an organisation’s security level – if all information is available to security engineers.

A Ponemon Institute survey on “Reasons for not using metrics that convey the true state of security” revealed that 74% of participants regard security metrics as important when they measure the impact of disruptive technologies, while 62% feel that current metrics don’t actually achieve this. 69% also believe that metrics conflict with business goals [16]. Despite the fact that a majority of metrics are perceived as not sufficiently technical, they still conflict with business goals.

Figure 13 below displays statistics for the believed reasons for not using metrics that convey the true state of security.

Figure 13: “Reasons for not using metrics that convey the true state of security” – The Ponemon Institute [16]

Without compliance and mandated security standards for organisations to follow, other pressing issues take precedence, as indicated by 72% of those surveyed.

Developing a security metric that appeases both security specialists and non-technical management while conveying the true state of a system’s security is difficult. With the aim of keeping risk assessments from becoming overly complex or technical, the above security metrics defined by W. Boyer and M. McQueen [31] can be combined into two broad categories. The first category: Metrics 1a, 1b, 2a, 2b, 3a, 3b, 3c, 4a, 4b, 6a and 6b together provide an overall indication of a system’s security level and the likelihood of cyber-attacks being effective. They can also give an indication of the level of complexity, skill and resources required by an attacker in order to perform a successful attack.

Rather than using attack likelihood as a security metric, the latter option is preferred to avoid complacency when ranking the security of a system. Determining the likelihood of successful cyber-attack is a difficult task and final risk assessments may be underrated and based on guess work. This is more of an issue when considering the general disposition towards ICS security and lack of appreciation for cyber risks. Determining the complexity required for a successful cyber-attack offers a security metric that is easier to quantify, and can lead to more accurate risk assessments. To properly classify this metric, the ICS in question must be viewed from an attacker’s perspective, compelling security engineers to consider security from a more useful viewpoint.

The security metric table below, the Attack Potential Index (API), takes the above into account. API is a ranking system that determines the level of threat required to execute a successful cyber-attack. These range from Level A – Organised Crime, which is essentially large scale, low complexity cyber-attacks that are generally dealt with through basic IT security – through to Level F – Nation-state / Cyber-warfare Attacks such as Stuxnet (Table 4).

Table 4: Attack Potential Index

Example of API rating #1: Within the power generation industry in Australia, where zoning / segregation between IT WANs and ICS WANs is not widely implemented, it is possible for a disgruntled insider with IT network access to cause problems within control systems. This gives a minimum Attack Potential Index (API) of level B – Disgruntled Insider with access to the IT network.

Example of API rating #2: An organisation currently has a PLC communicating data via plain text TCP. This would give an API rating of level A – Organised Crime. Upgrading the device’s communication to TCP encrypted with Transport Layer Security (TLS) would give an improved API rating of level B or C.
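The two examples can be expressed as a small sketch in which each attack path into a control network carries its own API level and the network as a whole is rated by its weakest path. This weakest-path rule is an assumption drawn from the phrase "minimum Attack Potential Index" in example #1, and the path names and the choice of level B after the TLS upgrade are hypothetical. Only the letters A to F are used, since the level descriptions are defined in Table 4.

```python
from enum import IntEnum


class API(IntEnum):
    """Attack Potential Index levels, A (least capable attacker suffices) to F."""
    A = 1
    B = 2
    C = 3
    D = 4
    E = 5
    F = 6


def overall_api(path_ratings):
    """Return the lowest level among all attack paths (assumed weakest-link rule)."""
    return min(path_ratings)


# Hypothetical ratings for the attack paths into one control network.
before = {"plc_plaintext_tcp": API.A, "it_network_insider": API.B}
# After the TLS upgrade the PLC path is assumed to reach level B (the text says B or C).
after = {"plc_tls": API.B, "it_network_insider": API.B}

print(overall_api(before.values()).name)
print(overall_api(after.values()).name)
```

Under this assumption, upgrading a single path only improves the overall rating until another path becomes the weakest one.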

The second category: Security metrics 5a (Worst case loss) and 7a (Restoration time) from W. Boyer and M. McQueen [31] combine to indicate the severity of a potential security breach. This is an important metric, as it helps convey the gravity of cyber-attacks and relates to the impact of disruptive technologies, deemed important by the above survey [16] (Table 5).

Table 5: Security Breach Consequence Index

5.2 Security Risk Assessment

Combining both of the above security metrics (attack potential index and security breach consequence index) produces a 2D security risk matrix (Table 6).

Table 6: Security Risk Matrix

Each combination of Security Breach Consequence Index (SBCI) and Attack Potential Index (API) ratings is given a unique Security Risk Matrix (SRM) number that is used to provide an overall security rating. These ratings are split into three categories, colour coded above:

- Category 1: High Risk – SRM# (1 – 13). Elimination or mitigation actions must be taken to reduce the risk.
- Category 2: Moderate Risk – SRM# (14 – 20). Elimination or mitigation actions must be taken to reduce the risk.
- Category 3: Low Risk – SRM# (22 – 30). Risk is acceptable.
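The mapping from a matrix number to a risk category can be sketched as a simple lookup. This is an illustration only: the function name `risk_category` is hypothetical, and because the ranges given in the text do not assign number 21 to any category, the sketch rejects it rather than guessing.

```python
def risk_category(matrix_number: int) -> str:
    """Map a risk matrix number to the category ranges stated in the text."""
    if 1 <= matrix_number <= 13:
        return "High Risk"      # elimination or mitigation actions must be taken
    if 14 <= matrix_number <= 20:
        return "Moderate Risk"  # elimination or mitigation actions must be taken
    if 22 <= matrix_number <= 30:
        return "Low Risk"       # risk is acceptable
    # 21 and anything outside 1..30 is not covered by the stated ranges.
    raise ValueError(f"no category defined for matrix number {matrix_number}")


for number in (5, 17, 28):
    print(number, risk_category(number))
```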

The selection of a system’s SBCI rating should incorporate answers to the following questions, as outlined by NIST guidelines [2]: i) How could an incident affect the operation of sensors to impact the physical world? ii) What redundant control measures exist in the ICS to prevent an impact? iii) How could a physical incident emerge based on these conditions?

Lack of communication with upper management accounted for 63% of reasons given for not using meaningful metrics, and 43% agree that management was simply not interested in the information [16]. Risk assessments conducted by security specialists must properly summarise ICS security risk, while conveying this information to upper management in a manner that it is clearly understood. To achieve this, the following table is created to be utilized and included in every Change Request Form (Table 7).

For every area of the control network that will be affected by the proposed change, the security engineer in charge will fill out the table below – area security risk assessment.

Table 7: Area Security Risk Assessment

5.3 Standards

Protecting a company's infrastructure and services from disruption is an important priority with increasing connectivity prevalent in operational environments. Standards help distinguish what work types and expertise areas can be engaged to control system security posture. It is therefore very important to identify the most effective standards available. The table below displays the results of an ICS publication study: ‘A survey of cyber security management in industrial control systems’ [21] (Table 8). The data reveals that the standard IEC/ISA 62443 – Security for Industrial Automation and Control Systems – incorporates comprehensive information across the range of defined scopes and metrics listed below.

- Risk Management and Assessment (RMA)
- Asset Identification and Classification (AI&C)
- Threat Assessment (TA)
- Vulnerability Assessment (VA)
- Risk Level Evaluation (RLE)
- Recommendation of Countermeasures (RoC)
- Change Management (CM)
- Qualitative Metrics (Ql)
- Quantitative Metrics (Qa)
- Link between Safety and Security (S)

Table 8: Industrial Control System Security Publication Analysis [21]

The IEC/ISA 62443 standard is considered the de facto general purpose standard, and should be used by suppliers, integrators and end-users of ICS as a guide for securing ICS systems. It defines security requirements for organizations, control systems and SCADA devices, and outlines architectures for any ICS system that is connected to the internet. Standard 62443 is divided into four sets of sub-standards (Figure 14).

Figure 14: Components of IEC/ISA 62443 [33]

The first group of sub-standards with prefix 62443-1 are considered general and provide information on topics that are common throughout the 62443 series of standards. The second group with prefix 62443-2 includes information relevant to ICS security policies and procedures. The third group with prefix 62443-3 are more technical and focus on requirements at the system level. The last group with prefix 62443-4, details requirements associated with the development of ICS products [33].
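As a small illustration of this grouping, a document number in the series can be mapped to its group from its prefix. The dictionary wording, the function name `group_for` and the example document number are assumptions made for this sketch; the four group descriptions paraphrase the paragraph above.

```python
# Group descriptions paraphrased from the text; keys are the four prefixes.
GROUPS = {
    "62443-1": "General: topics common throughout the series",
    "62443-2": "ICS security policies and procedures",
    "62443-3": "Technical requirements at the system level",
    "62443-4": "Requirements for the development of ICS products",
}


def group_for(document_number: str) -> str:
    """Return the group description for a number such as '62443-3-3'."""
    # Keep only the series number and the first part number, e.g. '62443-3'.
    prefix = "-".join(document_number.split("-")[:2])
    return GROUPS[prefix]


print(group_for("62443-3-3"))
```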

Other more specific standards by the International Society of Automation (ISA) / International Electrotechnical Commission (IEC) include:

1. ISA/IEC -TR84.00.09-2013 Security Countermeasures Related to Safety Instrumented Systems (SIS). This includes guidance on the countermeasures used to reduce the chance of security breaches.

2. ISA/IEC -TR99.00.01-2007 Security Technologies for Industrial Automation and Control Systems. This standard provides an assessment of existing cyber security tools, mitigation counter-measures and other ICS technologies.

Many standards for securing IT systems exist. It is important to also consider IT system security, as it represents the first line of defence of ICS security. Some of the more widely used standards are listed below:

1. ISO/IEC 27001:2013 Information Technology – Information Security Management. This standard looks at establishing, implementing, maintaining and improving information security management systems, and the requirements for achieving each aspect.

2. ISO/IEC 27002:2013 Information Technology – Security Techniques – Code of Practice for Information Security Controls. This standard is heavily utilised by ICS operators and is used as a base for more specific control-system related applications.

3. ISO/IEC 27019:2013 – Information Technology – Security Techniques – Information security management guidelines based on ISO/IEC 27002 for process control systems specific to the energy utility industry.

4. ISO/IEC 15408 – Evaluation Criteria for IT security. As it is important to ensure that new products are properly configured and maintained, security evaluations as outlined in this standard can be useful for maintaining strong control system security.

5. ISO/IEC TR 19791 – Security assessment of operational systems. This extends ISO/IEC 15408.

6. ISO/IEC 21827 – Systems Security Engineering – Capability Maturity Model (SSE-CMM). This standard is focused on SSE-CMMs which can be used to evaluate security engineering processes and can be applied to ICS security, though it is only generic and does not account for control system diversity.

7. ISO/IEC 17799 – Code of Practice for Information Security Management. This standard provides guidelines for best practices relating to security management.

While these standards are not currently mandated in Australia, it is highly recommended to use them as guidance. However, standards and guidelines are designed to cater for large and diverse audiences and may not apply to specific network architectures or allow for the intricacies within individual organisations. Any change to ICS security warrants additional consideration [34].

One example of an effective change control process within the pharmaceutical industry [35] is the grading of change. This is recommended by the European Union Good Manufacturing Practice (EU-GMP) guidelines, which state that classification of change may assist in determining the level of testing, validation, and documentation required to justify changes to a validated process [36]. This is an important concept for efficient CM. The following table is used as an outline to grade a change as either major, minor or not requiring control.

Table 9: Pharmaceutical Grading of Change [35]

For each grade of change, a row is dedicated to identifying possible measures. For example, a major change needs an official licence, a new approval, or drug revalidation, while a minor change only requires amendment, review or additional documentation. Where a change does not require control and is not relevant to Good Manufacturing Practice (GMP), no measures need be taken. Adapting this table and applying grading of change to ICS security results in the following.

In this case, a change is graded based upon the level of disruption to control system safety, reliability, continuity or security. Significant disruption indicates a major change, while minor disruption would be classified as a minor change, and zero disruption indicates a change that does not require control. The grade of change is also determined by the worst-case level of vulnerability that is reached during the change, as based upon the SRM Category level.
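The grading rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Grade`, `DISRUPTION_TO_GRADE` and `grade_change` are hypothetical, the mapping from SRM category to grade is not reproduced in this section and so is left to the assessor, and taking the more severe of the two criteria is an assumption rather than a rule stated in the text.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Change grades, ordered from least to most severe."""
    NO_CONTROL = 0
    MINOR = 1
    MAJOR = 2


# Disruption levels named in the text, mapped to the grade each one indicates.
DISRUPTION_TO_GRADE = {
    "zero": Grade.NO_CONTROL,
    "minor": Grade.MINOR,
    "significant": Grade.MAJOR,
}


def grade_change(disruption, worst_case_grade=None):
    """Return the grade of a change.

    disruption: "zero", "minor" or "significant" disruption to control
        system safety, reliability, continuity or security.
    worst_case_grade: optional Grade that the assessor derives from the
        worst-case SRM category reached during the change. How a category
        maps to a grade is not defined here, so it is passed in.
    """
    grade = DISRUPTION_TO_GRADE[disruption]
    if worst_case_grade is not None:
        # Assumption: the more severe of the two criteria decides the grade.
        grade = max(grade, worst_case_grade)
    return grade
```

For instance, `grade_change("zero", Grade.MINOR)` yields `Grade.MINOR`, because the worst-case vulnerability criterion outweighs the absence of disruption under the assumption above.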

Example of Change Grading #1:


It is estimated that the system’s SRM rating will drop from 22, Category 3 to a minimum security level of 11, Category 2 at one point during the change. A Grading Change Form will be filled out including explanations for the given ratings. A corresponding Change Request Form will be filled out noting the level of change using the above security ratings. If approved, the relevant minor change documentation will be filled out. An audit is not required upon completion of the change.

5.5 Change Request Form

A Change Request Form (CRF) has been created that provides industry with a template and starting point for the initiation of a security change (See Appendix 10.1). When a potential security improvement has been identified, and a change to system security has been defined, the following form should be carefully completed as a proposition for upper management. By analysing and rating changes using the defined security metrics, the effectiveness of a change can properly be evaluated, and detailed to upper management. Whether it is a minor or major change, the CRF can be used to incorporate input from all affected areas within the organisation relating to the change. If the recommended change requires any behavioural training or sensemaking sessions, this is also specified within the form.

5.6 Security Near Miss Form

Identifying, reporting and handling cyber security incidents and near-misses is a critical step in improving ICS security. A Security Near Miss Form (SNMF) has been created that can be used as a starting point in the event of suspicious cyber activity or cyber-attacks (See Appendix 10.2). It prompts the user to record all relevant details of the incident needed for further investigation. This includes information such as the date and time of occurrence; the type of attack, such as a DoS attack or phishing email; a list of systems that are suspected of being, or confirmed as, compromised; a list of any lost data; and a description of any immediate actions performed during or after the incident.

The SNMF template also includes a section for a system security assessment, to be performed by the department’s relevant security engineer. This includes risk ratings for the system before and after the incident and clear explanations justifying the ratings. This is to assist upper management in understanding the system’s security level.

After sufficient investigation has been put into determining the cause of an incident, investigators can list recommended security improvements, including relevant change request form identifiers.
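If the SNMF were captured electronically, the details it prompts for could be held in a small record type. The sketch below is one possible shape, not part of the template itself: the class name `NearMissReport`, its field names, and the choice of which fields count as required are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical choice of fields that must be filled in before submission.
REQUIRED_FIELDS = ("incident_type", "systems_affected", "immediate_actions")


@dataclass
class NearMissReport:
    detected_at: datetime                 # date and time of occurrence
    incident_type: str                    # e.g. "DoS attack", "phishing email"
    systems_affected: List[str] = field(default_factory=list)
    data_lost: List[str] = field(default_factory=list)
    immediate_actions: str = ""
    recommendations: List[str] = field(default_factory=list)
    related_crf_ids: List[str] = field(default_factory=list)

    def missing_details(self):
        """Return the names of required fields that are still empty."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name)]
```

A report created with only a detection time and an incident type would list `systems_affected` and `immediate_actions` as missing, which mirrors the way the paper form prompts the user for each detail.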

5.7 Change Evaluation Form

Change review and evaluation is an important aspect of the change management process and represents three of the 10 good CM characteristics: Follow-up, Performance Measures and Continuous Improvement. As such, a Change Evaluation Form (CEF) has been created that links back to a specific change, using the CRF number as reference (See Appendix 10.3). The CEF incorporates the following table, which is intended to be completed for every area within a system that was originally affected by the change (Table 11 Area Risk Assessment and Change Evaluation). Each area supervisor conducts their change evaluation by performing a revised risk assessment and determining whether the security change has improved the risk rating as expected. Once all area risk assessments and change evaluations are completed, an overall change evaluation is conducted to answer the following questions: Have the change objectives been achieved? Has the change improved system security or process efficiency as expected? To what degree is the change still in effect? Are behavioural changes still in effect? Are additional changes required? If the change has not been implemented as planned, or does not achieve its objectives and additional changes are required, a CRF number is included to reference a new change request form.

Table 11 Area Risk Assessment and Change Evaluation
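The per-area check described above can be expressed as a short Python sketch. It rests on an assumption that should be verified against the SRM category definitions: a higher SRM rating is taken to mean a more secure system, inferred from the grading example in which a drop from 22 to 11 is described as reaching a minimum security level. The names `AreaEvaluation` and `areas_needing_new_crf` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AreaEvaluation:
    area: str
    previous_srm: int   # rating before the change
    expected_srm: int   # rating predicted on the change request form
    revised_srm: int    # rating from the revised risk assessment

    def improved(self):
        # Assumption: a higher SRM rating means a more secure system.
        return self.revised_srm > self.previous_srm

    def met_expectation(self):
        return self.revised_srm >= self.expected_srm


def areas_needing_new_crf(evaluations):
    """List the areas where the change fell short of its expected rating."""
    return [e.area for e in evaluations if not e.met_expectation()]
```

An area can improve without meeting its target, which is why the two checks are kept separate: only areas that miss the expected rating would prompt a new change request.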

5.8 Training & Behavioural Changes

Dealing with behavioural problems is necessary for the continued presence of many changes. In ENISA’s report on protecting industrial control systems, one of the key findings, KF 4.4 – Awareness topic to be included in the ICS security plan [20], identifies the need to include training within ICS CM processes and security plans. Training sessions should provide relevant personnel with: a) A proper understanding and awareness of current cyber security issues. b) Knowledge of the differences between IT and ICS technologies, including process safety and associated management processes and methods. c) The ability to develop security practices that incorporate different skill-sets within an organization to deal with cyber security collaboratively.

Rosa Antonia Carrillo in ‘Complexity and Safety’ [27] recommends the use of sensemaking sessions to handle practical drift, which is the tendency for an individual to stop following a procedure and start taking shortcuts because the real reason for the procedure is unclear to them. Sensemaking is the process by which people give meaning to experience. Sensemaking sessions are performed in a class-room format and are taught by professionals with relevant knowledge and experience of the subject. The steps involved in a sensemaking session for practical drift are as follows [27]:

1. Define practical drift.

2. Identify an area where alternative practices have emerged that vary from procedure.

3. Identify where there has been an improvement and why. Identify where practical drift might have created a potential danger relating to the improvement.

4. Record findings in two columns for positive and negative potential. Hold an open dialogue to look for patterns that can lead to a consensus on what is acceptable practical drift, and what is not.

5. Identify the process for recording or communicating changes to procedure.

6. Identify any “cardinal” safety rules that must never be violated.

The above format can be applied to other behavioural or awareness aspects of cyber security. Polarity within an organisation results from conflicting operational goals between workers and management; workers don’t fully appreciate paperwork requirements set by upper management, who see them as very important. Sensemaking sessions can be created to improve understanding of the reasons behind procedures, rules and paperwork.

Sensemaking sessions can be kept more general, focusing on educating people about potential cyber-attack vectors such as spear-phishing or corrupted USB flash-drives. Sensemaking sessions could also be more specific, and relate to a specific change, informing personnel what documents to use for security issues, and why they are important.

## 6 Evaluating Documents

The CM tools defined in sections 5.1, 5.2 and 5.4, and the document templates defined in sections 5.5, 5.6 and 5.7, have been derived to provide industry with a useful starting point for the development of ICS CM processes and cyber security plans. In order to analyse the effectiveness of these documents before they are trialled in the field, they are all considered against each of the 10 characteristics for ‘best-in-class’ CM within industry derived from the NSF for best industry practices [25] (Table 10).

Table 12: Documentation Evaluation

## 7 Conclusions

Evidence shows modern cyber-attacks are occurring frequently, and have become very sophisticated and targeted in nature. It is becoming more important and necessary for industry to improve control system security in order to ensure that safety standards are maintained and business critical operations are not compromised.

Increasing connectivity between ICS networks, corporate IT networks and the Internet has exposed traditionally isolated and insecure ICS networks. Due to differing operational priorities between the two networks, IT security cannot solely be relied upon for the safe operation of critical ICS processes. As successful cyber-attacks such as Stuxnet and the Ukrainian power hack show, ICS networks are currently at the frontier of cyber-attacks and must be able to deal with advanced persistent threats. With the existence of device search engines like Shodan and the availability of increasingly sophisticated hacking tools, the security risks facing control systems have never been higher.

In order to resolve insecurities created by legacy ICS, a set of ideal security requirements has been defined. The requirements identified are restricting logical access to the system, restricting physical access, protecting individual ICS components, detecting security incidents, maintaining functionality during adverse conditions, and ensuring quick system restoration. Industry has access to numerous devices and software that can achieve the above goals, yet statistics indicate that these preventative measures are not being implemented.

Numerous ICS security standards and guidelines exist, but evidence of successful cyber-attacks and poor ICS security posture suggests that they are not being utilised effectively. A study of such security publications reveals that many standards and guidelines are lacking in specific content relating to risk management, assessment and evaluations, and there is a noted absence of Change Management (CM) recommendations. Additional studies show that compliance with standards is not a market driver, and they are therefore under-used because they are not currently enforced by legislation. Despite these facts, it is still highly recommended that industry use the IEC/ISA 62443 ICS security standards as guidance, as they are well rounded and designed specifically for ICS security. It is important to note, however, that standards and guidelines are designed to cater for large and diverse audiences and may not apply to specific network architectures, or allow for the intricacies within individual organisations.

Poor ICS security posture is also a result of other contributing factors. Surveys determined that there is a general breakdown in communication between technical staff and policy-makers within industry. Upper management often has difficulties comprehending and interpreting technical security reports and does not fully appreciate or care for the risks presented by cyber threats. There is a perception within many organisations that budgeting for security presents a financial loss rather than an investment, despite the fact that cyber incidents result in significant financial and resource costs every year. This is also apparent in industry surveys, indicating that security changes are rarely made because of insufficient resources or budget. These issues stem from poor awareness of network security posture and of cyber risks.

Often, reluctance to make system security changes stems from the operational nature of ICS; upgrades and modifications to existing control systems require the shutdown of industrial processes, which may result in loss of profit or affect company reputation. There are other behavioural factors that work against positive system change, such as polarity within an organisation, and practical drift behaviour.

It is important for industry to improve awareness of modern security risks and common ICS vulnerabilities before improvements to ICSs can be justified and implemented. This is achieved by the production and distribution of awareness reports such as this thesis, and / or through legislation and the enforcement of thorough security standards. Given that an organisation is willing to implement security changes, it is crucial that effective CM strategies be employed. A report by ENISA highlights the need in industry for good ICS security practices, including the creation of security plan templates and the creation of good practice guides for ICS security.

In order to create effective change management document templates, 10 characteristics for ‘best-in-class’ change management were identified: changes are business critical and strategically vital; they are fast and effective; they follow a user centred design; they are simple; they are standardized; they utilise in-depth process knowledge; they focus on implementation and behavioural adjustments; they are followed-up; processes are reviewed; and processes are continuously improved.

The Pharmaceutical industry provides a good study of effective CM practices. In particular, the grading of change allows efficient allocation of time and resources. The chemical industry in the US implements in-depth security plan processes as required by the DHS. Both of these industries are heavily regulated and required to meet legislative requirements and implement controlled CM processes, and due to the nature of their products, are also obliged to follow strict CM procedures to ensure products are safe and behave as intended. Both industries were identified to have the above 10 good CM characteristics.

The development of suitable security metrics is crucial for achieving effective security CM processes. It is important for meeting internal demands, such as risk assessment processes, analysis of changes, and evaluation of security investments, and external demands, such as contractual requirements or future compliance with standards. It was determined that in order for security metrics to be effective, they must be relevant, unambiguous, direct, measurable and comprehensive. Studies revealed that simpler security metrics were more effective at conveying security levels to non-technical personnel, which prompted the development of a risk assessment table using two security metrics: attack potential index and security breach consequence index.

A local security risk assessment table was developed as a way of reducing complexity of assessments conducted by security engineers, and bridging the knowledge gap to upper management. This tool is used throughout the document templates and addresses communication issues within industry.

The final CM document templates were evaluated using the 10 confirmed ‘best-in-class’ CM characteristics, as follows: they are designed for the strategic benefit of the organisation; they are simple and fast to use; they are tailored for use by technical and non-technical employees; they incorporate standardised decision making processes, including risk assessments and grading of change; they require input by those with ICS process knowledge; they touch on behavioural resistance to change; and changes are followed-up as part of a recursive quality checking process. The remaining two ‘best-in-class’ characteristics, the assessment of CM processes and allowance for document improvements, fall outside of the scope of this research project and will be outlined below as future work.

The study of current ICS security has raised some alarming concerns. While awareness of security issues within industry needs to improve, the implementation of security changes also needs attention and guidance. The study of CM processes within the pharmaceutical and chemical industries presented an ideal starting point with which to analyse good change management characteristics. The next step was to use these fundamental principles for the creation of a series of document templates which provide industry with a starting point for the development of effective change control processes and behavioural change programs.

## 8 Future Work

Future work in this space includes the expansion and deployment of change control documentation. Document templates can be further expanded and detailed, and their use and effectiveness within the field can be analysed. Upon receiving feedback on document usage and efficacy, documents can be improved accordingly. This process can continue periodically to allow the continuous improvement of documentation. In order to achieve this, version control can be used to keep track of document revisions.

Research using online resources will yield examples of version control systems that may be applied in this case. The final documentation could be structured in such a way that industry can refer to and build upon previous document utilisation in similar environments. In turn, industries will be able to contribute their own experiences, which will assist in refining the documentation for future use.

Methods for the deployment of documentation can be investigated. One method may be the implementation of a Creative Commons licensing structure. This will allow the legal distribution and use of all documentation. The licensing could include an ‘attribution’ element, so that anyone using or changing the documents must acknowledge the creator; a ‘share-alike’ element, which allows others to remix, adapt and build on the work provided the result is distributed under the same licence; and a ‘non-commercial’ element, to prevent other parties from using the documents for monetary gain [28].

Further investigation can be made into existing behavioural issues working against positive security changes. This may require a study into works written by industrial psychologists and business analysts. Document templates can be adapted to account for new findings.

## 10 Appendices

10.1 Change Request Form

Change Request Form (CRF) Document #: CR_____ Version #: Valid from: Valid to:

Company Name: Date / Time: Pages x of y: Details Change Requested by: Department:

Title: Mobile / Telephone:

Change Details Object of Change:

Description of Change:

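Area Risk Assessment

Security Risk Matrix (SRM)

Columns: Security Breach Consequence Index (SBCI). Rows: Attack Potential Index (API).

| API | I | II | III | IV | V |
|---|---|---|---|---|---|
| A | 1 | 2 | 5 | 7 | 25 |
| B | 3 | 4 | 10 | 19 | 26 |
| C | 6 | 9 | 14 | 20 | 27 |
| D | 8 | 11 | 15 | 22 | 28 |
| E | 12 | 16 | 18 | 23 | 29 |
| F | 13 | 17 | 21 | 24 | 30 |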

Conduct a thorough security risk assessment using SBCI and API security metrics. Use the security risk matrix above to determine overall risk rating. Provide clear explanations for each given rating. If the change affected more than one area, add additional area risk assessment tables as required.
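For organisations that keep these forms electronically, the matrix lookup can be automated. The following Python sketch transcribes the matrix values from this form; the names `SBCI_LEVELS`, `SRM` and `srm_rating` are hypothetical and chosen only for illustration.

```python
# Security Risk Matrix values, transcribed from the matrix in this form.
# Rows: Attack Potential Index (A-F). Columns: SBCI levels (I-V).
SBCI_LEVELS = ("I", "II", "III", "IV", "V")

SRM = {
    "A": (1, 2, 5, 7, 25),
    "B": (3, 4, 10, 19, 26),
    "C": (6, 9, 14, 20, 27),
    "D": (8, 11, 15, 22, 28),
    "E": (12, 16, 18, 23, 29),
    "F": (13, 17, 21, 24, 30),
}


def srm_rating(api, sbci):
    """Return the overall SRM rating for an API letter and an SBCI numeral."""
    row = SRM[api.upper()]                      # KeyError for an unknown API
    return row[SBCI_LEVELS.index(sbci.upper())]  # ValueError for an unknown SBCI
```

With this table, `srm_rating("D", "IV")` gives 22 and `srm_rating("D", "II")` gives 11, the two ratings quoted in the change grading example in section 5.4.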

Area Risk Assessment Area 1

Area Name:

| Description | Rating Type | Rating | Explanation |
|---|---|---|---|
| Existing security rating | SBCI | | |
| | API | | |
| | SRM | | |
| Security rating after proposed change | SBCI | | |
| | API | | |
| | SRM | | |

Comments / Notes:

Name (Area Head): Signature: Date:

Area Risk Assessment Area X Add additional area risk assessments if necessary

Change Control Committee Behavioural Requirements

	Behavioural changes required?


	Major change
Minor change
No major or minor change


Decision

	The change is authorized. Time limit for the implementation:
The change is not authorized. Rationale:


Position Name Signature Date

10.2 Security Near Miss Form

Security Near Miss Form Document #: SNM_____ Version #: Valid from: Valid to:

Company Name: Date / Time: Pages x of y:

General Information Reported by: Department:

Title: Mobile / Telephone:

Incident Details Date / Time Detected: Date / Time Reported:

Incident Details:

Type of Incident:

Systems affected: Data Lost: Vulnerabilities Identified: Immediate Action Taken:

Revised Risk Assessment

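Security Risk Matrix (SRM)

Columns: Security Breach Consequence Index (SBCI). Rows: Attack Potential Index (API).

| API | I | II | III | IV | V |
|---|---|---|---|---|---|
| A | 1 | 2 | 5 | 7 | 25 |
| B | 3 | 4 | 10 | 19 | 26 |
| C | 6 | 9 | 14 | 20 | 27 |
| D | 8 | 11 | 15 | 22 | 28 |
| E | 12 | 16 | 18 | 23 | 29 |
| F | 13 | 17 | 21 | 24 | 30 |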

Conduct a thorough security risk assessment using SBCI and API security metrics. Use the security risk matrix above to determine overall risk rating. Provide clear explanations for each given rating. If the change affected more than one area, add additional area risk assessments as required.

Revised Area Risk Assessment: Area 1 In light of the recent security incident or near miss, complete relevant local area risk assessments.

Area Name:

| Description | Rating Type | Rating | Explanation |
|---|---|---|---|
| Previous security rating | SBCI | | |
| | API | | |
| | SRM | | |
| Revised security rating | SBCI | | |
| | API | | |
| | SRM | | |

Comments / Notes:

Name: Signature: Date:

Revised Area Risk Assessment: Area X Add additional area risk assessments if necessary

Investigate Remedial Actions Investigation Recommendations Recommendation Name Signature CRF

10.3 Change Evaluation Form

Change Evaluation Form Document #: CE_____ Version #: Valid from: Valid to:

Company Name: Date / Time: Pages x of y:

Original Change Requester Change Originally Requested by: Date of Request:

Title:

Change Details Change Request Number: CR_____ Date of Change:

Department:

Object of Change:

Description of Change: Details of Actions taken: Behavioural changes implemented:

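Risk Assessments

Security Risk Matrix (SRM)

Columns: Security Breach Consequence Index (SBCI). Rows: Attack Potential Index (API).

| API | I | II | III | IV | V |
|---|---|---|---|---|---|
| A | 1 | 2 | 5 | 7 | 25 |
| B | 3 | 4 | 10 | 19 | 26 |
| C | 6 | 9 | 14 | 20 | 27 |
| D | 8 | 11 | 15 | 22 | 28 |
| E | 12 | 16 | 18 | 23 | 29 |
| F | 13 | 17 | 21 | 24 | 30 |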

Conduct a thorough security risk assessment using SBCI and API security metrics. Use the security risk matrix above to determine overall risk rating. Provide clear explanations for each given rating. If the change affected more than one area, add additional area risk assessment and evaluation tables as required.

Risk Assessment and Evaluation Area 1

Area Name:

| Description | Rating Type | Rating | Explanation |
|---|---|---|---|
| Previous security rating | SBCI | | |
| | API | | |
| | SRM | | |
| Revised security rating | SBCI | | |
| | API | | |
| | SRM | | |

Change Evaluation:

Summary: Name: Signature: Date:

Risk Assessment and Evaluation Area X Add additional area risk assessments and evaluation tables as necessary

Change Evaluation Based upon the change evaluations provided by the above area assessments, conduct a final change evaluation.

Have the change objectives been achieved? Has the change improved system security or process efficiency as expected?

To what degree is the change still in effect?

Are behavioural changes still in effect?

Are additional changes required? CRF #: CR____

Change Control Committee Final Change Evaluation Conducted By Name Position Signature Date

10.4 Estonian Tour Report

The Estonian study tour was a very enjoyable experience, both personally and academically. The opportunity to share my research project with other academics and engineers within the cybersecurity field was extremely beneficial, as it provided me with valuable ideas and perspective which will drive my future research. In addition to this, we were given ample time to explore Tallinn and Estonian culture.

ICR Conference

The Interdisciplinary Cyber Research (ICR) conference was held on the 2nd of July in the SOC building at Tallinn University. The conference brought together an array of scholars from technical and legal backgrounds to focus on social, political, legal and technical aspects of modern cyber security. Keynote speakers included Jaan Tallinn, one of the founding engineers of Skype, and Stephen Mason, barrister and researcher at the Institute of Advanced Legal Studies, who presented on the current and future technical and legal challenges (respectively) arising from the development of artificial intelligence, which I found to be very interesting. Topics presented by other speakers varied significantly, and as such the conference was divided into six sessions: Use and Abuse of the Internet, Technology and Emerging Threats, Crime and Digital Technologies, Internet of Things, E-Governance, and Identity Theft and Verification. All electrical and electronic engineering students from the Adelaide delegation were given the opportunity to present their honors project at the conference, and the week leading up to the event was spent in preparation and rehearsal. This allowed time to review, summarise and condense our current research projects down to short 15 minute presentations. Though presentations were done individually, all preparations were conducted as a group, which allowed for efficient sharing of ideas and opinions and for brainstorming.

Cyber Security Summer School

The cyber security summer school was held during the following week, and included an intensive workload of presentations, lectures, group activities and discussions relating to cyber security. The presentations were very insightful and grounding, and helped put various cyber security issues into perspective. There were also some more specific lectures on data forensics, looking at digital forensic tools and the handling of electronic evidence, and determining what factors are required to make forensic evidence admissible in a court of law. This material was immediately beneficial during the main group exercise. The school also included a practical component, where all participants were sorted into six groups and were designated either team USA or team Estonia. Each group included members with legal or technical backgrounds of varying skill levels, and was presented with a fabricated legal scenario to analyse throughout the week: an Estonian minister, Mari-Liis, and her husband were found dead in the streets of Detroit, their cause of death unknown. The manner in which the evidence was handled by US police was called into dispute by the Estonian government. All groups were given access to one major piece of evidence - Mari-Liis’ laptop hard-drive - and were tasked to use this evidence to present on behalf of the country they represented. It was the responsibility of the technical team members within each group to search the hard-drive for supporting evidence that could be used by the group’s lawyers in a moot court held at the end of the week. This had to be achieved in such a way as to ensure that evidence was admissible and relevant. This was a very enjoyable and challenging experience, and provided the chance to utilize the forensic skills and techniques detailed in recent lectures. It was also a unique experience working alongside professionals with a legal background.
As an engineer, I found that it was very easy to get caught up in minor technical details relating to the murder itself, while losing sight of the overall legal goal - which for team USA, was to defend the handling of evidence relating to the murders. It was important to have ongoing communication with the legal team on what evidence would be useful. The development of a detailed timeline was required to build an accurate picture of what most likely happened.

Excursions

In addition to the ICR conference and cyber security school, we had time to go on a few day trips; we were given a tour of Skype headquarters, we visited the KGB Museum and went up the Tallinn TV tower viewing platform. We visited Mektory, an innovation and business centre that promotes and assists start-up companies and young entrepreneurs. We went on a brief excursion to the NATO Cooperative Cyber Defence Centre of Excellence (CCDCOE) where we were given a presentation on Locked Shields, which is an international live-fire cyber defence exercise involving a defensive team - the Blue team, and an offensive team - the Red team. We visited the e-Estonia showroom, which contains information on how Estonian society has adopted a digital information infrastructure called X-Road, which is a virtual medium allowing the interconnection of multiple services in such a way that they are secure against cyber-attacks. This means that an Estonian citizen doesn't need to provide the same information multiple times, as each service can access relevant information via the X-Road. It also provides a ‘one-stop shop’ for all services and streamlines otherwise laborious and tedious administrative processes. We also had enough free time to explore Tallinn’s historic centre - the Old Town, see the maritime museum, go souvenir shopping, and visit Helsinki for a day via the ferry lines.