First Response in a Cybersecurity Emergency

Disclaimer: This content is the result of my having survived several emergencies of varying severity. Most of these ideas are rules of thumb, not canned answers. As in any emergency, your best outcome will come from keeping calm, considering many options, and moving from plan to backup plan as required.

First Response to Intrusions



The defense strategy can be summed up by saying: You fix broken technologies with less technology. You do not fix broken technologies by introducing new, also broken, technologies.



Not all intrusions are equally damaging. However, in an ongoing cybersecurity emergency, it is safest to assume that any intrusion is ill-intended. Once an attacker gains root access to a machine and understands the purpose of the machine, they can access any of the electronic systems that the machine controls.

The first line of defense in a cybersecurity emergency is always to turn off electronic equipment that interferes with sustaining life and health. Any compromised machine that can interfere with the necessities of daily living should be replaced with a physical solution. This approach requires computer programmers and civil or mechanical engineers to cooperate in designing that physical solution.

For example, if computers are responsible for measuring water quality and injecting chemicals into the water supply for cleaning and health (such as fluoride), the security of these computers is critical for population health. Such a system should have physical lock-outs that protect the water supply from poisoning, even if the computer is compromised.

For example, if cell phone signals are conditioning the speech centers of users, it is natural to roll back the phone network to an earlier wireless standard. For this to happen, the doctors who observe the health impact need to communicate clearly with their patients and with the phone companies in order to prevent unnecessary harm. This clear communication only happens if doctors suspend judgment about mental illness and listen carefully to their patients' experiences as users. If the problems started with the roll-out of the 4G network, it behooves the medical establishment and the telecom companies to communicate clearly on this issue.

The second line of defense in a cybersecurity emergency is to find an electronic work-around. (Note: A work-around is an additional way of accomplishing a goal that uses the same software but avoids the most obvious use-case. In software engineering, a use-case is a design guideline: a sequence of steps that a user takes to accomplish a goal, given to the programmer so that the software can be designed to support it.)

For example, if an attacker disables an ATM, the most obvious work-around is for the user to go into the bank and see a teller. The teller likely accesses the same electronic system as the ATM. However, since the teller's use-case is different from the ATM use-case, the vulnerabilities are slightly different, and the work-around may succeed.

Most computer systems have a user override for most use-cases. This means that human users provide essential options for work-arounds. In the end, humans, not computers, are responsible for keeping the economy running. So being polite, remaining calm, and finding the right person to perform a work-around are essential skills in a cybersecurity emergency. Oddly enough, this layer of politeness can make it difficult to detect that an emergency is happening at all, since people who have adapted to the emergency are already quietly pursuing people-based work-arounds.

The third line of defense in an ongoing cybersecurity emergency is to leave non-essential computers running, but contain the damage that they can do. On the face of it, this guideline seems strange. Why not shut down all non-essential computers? The logic is that when many computers are running, the attacker must solve a big-data problem: mapping the internet to find the machines that are most critical to the infrastructure. To limit the damage, we want the attacker to work as hard as possible to accomplish their goals. Since there is no complete map of the internet, and since most computers sit on private networks with no public DNS names, attackers probably spend months or years searching for interesting machines to attack. Let's keep them busy with all the non-essential machines while we fix the machines that are critical to infrastructure and find work-arounds for the most critical systems.
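A back-of-the-envelope sketch of the "keep them busy" argument follows. It rests on a simplifying assumption of my own, not a model of any real attacker: the attacker cannot tell critical machines from non-essential ones without examining them, and examines machines in uniformly random order without repeats. If k critical machines hide among N running machines, the expected number examined up to and including the first critical one is (N + 1)/(k + 1), so every non-essential machine left running adds to the attacker's work. The function names and the numbers in the usage lines are hypothetical.

```python
import random


def expected_probes(total: int, critical: int) -> float:
    """Expected number of machines examined, in uniformly random order without
    repeats, up to and including the first of `critical` targets among `total`.
    Assumes the attacker cannot distinguish critical machines in advance."""
    if not 1 <= critical <= total:
        raise ValueError("need 1 <= critical <= total")
    return (total + 1) / (critical + 1)


def simulated_probes(total: int, critical: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check of expected_probes. In a random examination order the
    critical machines occupy `critical` distinct random positions in 1..total,
    and the attacker's first hit is the smallest of those positions."""
    rng = random.Random(seed)
    hits = (min(rng.sample(range(1, total + 1), critical)) for _ in range(trials))
    return sum(hits) / trials


if __name__ == "__main__":
    # Hypothetical network with 10 critical machines and a growing number of
    # non-essential machines left running alongside them.
    for total in (100, 1_000, 10_000):
        print(total, expected_probes(total, 10), round(simulated_probes(total, 10), 1))
```

Under these assumptions, going from 100 to 10,000 running machines, with the same 10 critical ones, raises the expected number examined from about 9.2 to about 909. A real attacker who can fingerprint machines remotely will do better than random order, so this is an optimistic estimate for the defender.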

Containing the damage that a compromised machine can do to other machines is a matter of using good security principles. More on this later. For now, we will say that strategic use of firewalls and routers, together with disabling USB ports and unneeded network services, will contain most of the damage. After the worst intrusions, a machine should be taken entirely offline or used very sparingly to contain potential damage. Again, in the spirit of a complex system, this decision should be made by each individual user or administrator, who is in the best position to evaluate the potential damage and estimate the extent of the intrusion.

The fourth line of defense in an ongoing cybersecurity emergency is to halt software roll-outs and stop building buggy systems. There may be a strong relationship between cybersecurity and tech bubbles, although this topic warrants further academic study. In any event, introducing new software will only make the emergency worse. Users adapt to existing software, find work-arounds, and generally cope with daily life. When new, buggy software is added to the complex system of a city, the users' burden of learning a new system goes up. When the new software also fails to meet its design specifications, the burden of finding work-arounds rises exponentially for software of reasonable complexity.

By the way, it can be shown that this burden rises exponentially. Recall that the logic flow of a loop-free computer program can be modeled as a directed acyclic graph (DAG), in which each path from start to finish is one way the program can run. Suppose that every path in the graph contains some bug. Then the user must learn enough of the DAG to find a work-around that uses another path. Because the number of paths grows exponentially with the number of branches, any software of reasonable complexity has an exponential number of paths. If each one has a bug, the user has a great deal of learning to do to accommodate the buggy software.
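To see how fast the number of paths grows, here is a minimal sketch that models a loop-free program as k if/else statements in sequence (an assumption chosen for simplicity; real control flow is messier) and counts start-to-finish paths with a memoized depth-first search. The node names and helper functions are hypothetical. Each independent branch doubles the count, so k branches give 2^k paths.

```python
from functools import lru_cache


def diamond_chain(k: int) -> dict[str, list[str]]:
    """Control-flow DAG for k sequential if/else statements. Statement i
    branches from join node j{i} to a 'then' node t{i} or an 'else' node e{i},
    and both branches rejoin at j{i+1}."""
    graph: dict[str, list[str]] = {f"j{k}": []}
    for i in range(k):
        graph[f"j{i}"] = [f"t{i}", f"e{i}"]
        graph[f"t{i}"] = [f"j{i + 1}"]
        graph[f"e{i}"] = [f"j{i + 1}"]
    return graph


def count_paths(graph: dict[str, list[str]], source: str, sink: str) -> int:
    """Count distinct source-to-sink paths in a DAG. Memoization means each
    node's count is computed once, even though the number of paths is huge."""

    @lru_cache(maxsize=None)
    def paths_from(node: str) -> int:
        if node == sink:
            return 1
        return sum(paths_from(nxt) for nxt in graph[node])

    return paths_from(source)


if __name__ == "__main__":
    for k in (1, 10, 20, 30):
        print(k, count_paths(diamond_chain(k), "j0", f"j{k}"))
```

Ten sequential if/else statements already give 1,024 distinct paths, and thirty give more than a billion; nested branches and loops only add more.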

The fifth line of defense in an ongoing cybersecurity emergency is to roll back to older software or older systems. Any system that the user knows really well will contribute more to user efficiency than a new system. Presumably the user also already knows the bugs of the old system and the work-arounds that are effective against them. When using a system that they know very well, users are also better able to distinguish intrusions from routine bugs. In a cybersecurity emergency, knowing which machines are compromised is critical. Only users can tell you whether a machine is behaving predictably (i.e., deterministically, which is the way that a clean machine behaves) as opposed to non-deterministically (which is the way that a compromised machine behaves). Since these changes can be incredibly subtle, users who can detect them are critical to the effort to maintain security.

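This principle of finding work-arounds is a user-implemented version of the 'power of two choices', a probabilistic principle from randomized load balancing: when each task goes to the less loaded of two randomly chosen servers instead of a single random one, the maximum load drops exponentially, from roughly log n / log log n to roughly log log n / log 2 for n tasks and n servers [Mitzenmacher, 2001]. Giving users a second way to reach each goal applies the same idea to fault tolerance.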
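A small simulation of the balls-into-bins setting behind the power of two choices: n tasks are assigned to n servers, each task going either to one server chosen uniformly at random or to the less loaded of two random choices. This is my own sketch of the standard model, not code from the paper, and the function name and parameters are hypothetical. The thing to look for is how much the busiest server's load drops when a second choice is available.

```python
import random


def max_load(n_balls: int, n_bins: int, choices: int, seed: int = 0) -> int:
    """Throw n_balls into n_bins. Each ball samples `choices` bins uniformly at
    random (with replacement) and joins the least loaded of them; ties go to
    the first sampled bin. Returns the load of the fullest bin."""
    rng = random.Random(seed)
    loads = [0] * n_bins
    for _ in range(n_balls):
        candidates = [rng.randrange(n_bins) for _ in range(choices)]
        best = min(candidates, key=lambda b: loads[b])
        loads[best] += 1
    return max(loads)


if __name__ == "__main__":
    n = 100_000
    one = max_load(n, n, choices=1)
    two = max_load(n, n, choices=2)
    print(f"one choice: max load {one}; two choices: max load {two}")
```

With one choice the busiest server ends up noticeably more loaded than with two, and the gap widens as n grows, in line with the asymptotic bounds quoted above.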

Indeed, at the moment I am typing this text on a 2010 netbook that runs Linux. I am very familiar with the computer, with its driver problems under Linux, and with the Linux operating system. When I am on a network whose security I do not know well, I do not update my machine (i.e., I continue to use the 'older' version of the system with which I am well acquainted). 'No updates when I travel' is a rule of thumb that I adopted after numerous experiences with poisoned updates. I certainly have access to newer computers and newer OSes, and I have had other security issues with those machines.

This line of defense is quite similar to the old rule of thumb about not being a beta user. Most computer scientists are not early adopters of new technology. We know from hard experience that during the first roll-out of a new technology, all the users are still effectively beta users, helping to debug the software. If you want reliable technology, you purchase the stuff that is a year or two old and has been debugged quite well. If you are extremely conservative, you use only the oldest technologies.

I still have an old, very reliable cell phone from the early 2000s that I routinely pull out in emergencies. When my smartphone stops working and no analog phone line is available, this old cell phone works very reliably.

As another example, the USPS works incredibly reliably. This is because most mail collection and distribution is still done by hand, with very old technologies: hand-bags, trucks, and physical bins. Some of the mail routing is done by computers, but postal workers still hand-route every piece of mail that the computer is unable to read. I have had hand-addressed letters with the correct name and the wrong address delivered properly. This kind of reliability is due to the excellent employees at the USPS.


Homework

  1. Have your family from other places send old, still working technology to you. This is the most rapid way to improve security.
  2. Ask your boss to halt the introduction of new beta software. Tell them that the health of the whole community depends on their choice.
  3. Contribute to tracking down the health impact of new technologies by crowdsourcing using a distributed algorithm:
  4. Algorithmically and statistically savvy computer scientists should think of other statistically rigorous distributed algorithms that will help the community. Please ask your professor or a colleague to verify the correctness of your algorithm, and make sure to communicate very clearly with the non-technical people who help with the distributed process.
  5. Remind your boss of the 'power of two choices'. If your boss cannot roll back to an older technology, ask them to provide users with two ways to accomplish each critical use-case. The power of two choices is mathematically proven to improve load balancing, and the same redundancy improves fault tolerance.
  6. Theoretical computer scientists who specialize in randomized algorithms might be able to suggest other user-implemented algorithms that will improve security.


Mitzenmacher, M. (2001). The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, 12(10), 1094-1104.

© 2015-2021 Intrepid Net Computing. All rights reserved.