Vulnerabilities of Agentless Monitoring Mechanisms
In this note I consider the monitoring of computer systems in safety-critical environments, where continuous monitoring of computers and other devices is necessary to assure safety. I describe the vulnerabilities of agentless monitoring mechanisms and show why these mechanisms are not viable in safety-critical environments.
2 Continuous Monitoring
In safety-critical environments, machines must be monitored continuously in order to control their operation and to collect accurate statistics. In the absence of accurate statistics, the undecidable condition problem arises: for the unmonitored interval, the state of the machine cannot be determined. Computer-controlled medical equipment and any other system on which human life depends (aircraft systems, nuclear plant systems, etc.) are particularly subject to this problem. Even during network anomalies, machine statistics must be collected continuously, not only for logging and audit purposes but primarily to achieve actual safety. It is therefore essential that machine monitoring happens locally on the machine.
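The requirement that monitoring happen locally can be illustrated with a minimal store-and-forward sketch. It assumes a hypothetical `LocalAgent` class whose `read_metrics` and `send` callables stand in for whatever sampling and transport a real agent would use; none of these names come from an existing product. Sampling always happens on the machine, and samples taken while the collector is unreachable are buffered and delivered once the network returns.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# (timestamp, metrics) pair produced by one local sampling step.
Sample = Tuple[float, dict]


class LocalAgent:
    """Hypothetical on-machine agent: sampling is local, delivery is best-effort."""

    def __init__(self, read_metrics: Callable[[], dict],
                 send: Callable[[List[Sample]], bool],
                 max_buffer: int = 10_000) -> None:
        self.read_metrics = read_metrics
        self.send = send
        # Bounded store-and-forward buffer; if an outage outlasts it,
        # the oldest samples are discarded first.
        self.buffer: Deque[Sample] = deque(maxlen=max_buffer)

    def tick(self, now: float) -> None:
        # Sampling never depends on the network.
        self.buffer.append((now, self.read_metrics()))
        # Try to deliver everything collected so far; keep it on failure.
        if self.send(list(self.buffer)):
            self.buffer.clear()


# Simulated run: the collector is unreachable during ticks 2, 3 and 4.
delivered: List[Sample] = []
outage = {2, 3, 4}
current_tick = 0


def fake_send(batch: List[Sample]) -> bool:
    if current_tick in outage:
        return False
    delivered.extend(batch)
    return True


agent = LocalAgent(lambda: {"cpu_load": 0.5}, fake_send)
for current_tick in range(6):
    agent.tick(float(current_tick))

print([ts for ts, _ in delivered])  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

No sample is missing from the delivered record, even though three of the six ticks happened during the outage.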
3 Vulnerabilities of Agentless Monitoring
Agentless monitoring is often seen as the most economical type of remote monitoring, but in many usage scenarios the savings from not deploying agents are offset by vulnerabilities that increase the management burden.
3.1 Vulnerability To Network Disruption
If the network is disrupted, no monitoring can happen. Both physical transmission lines and network equipment are vulnerable to disruption, and router and switch configurations can also be disrupted for many reasons.
In many situations an agentless monitor cannot reach the remote monitored machine even though the machine and its application are still running. The remote machine then continues to provide its service without being monitored, so the remote monitoring console collects no statistics for that time interval. This is exactly when the undecidable condition arises.
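The undecidable condition can be made concrete with a small simulation. The hypothetical function `blind_intervals` below compares what an agentless console can observe (whether each poll reached the machine) with ground truth that only the simulation knows (whether the machine was actually running), and reports the poll ranges in which a running machine went unmonitored. The poll count and outage window are invented for illustration.

```python
from typing import List, Optional, Tuple


def blind_intervals(reachable: List[bool],
                    machine_up: List[bool]) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) poll indices where the machine was
    running but the agentless console could not reach it.

    `machine_up` is ground truth known only to the simulation; the
    console itself sees nothing but `reachable`.
    """
    if len(reachable) != len(machine_up):
        raise ValueError("both series must cover the same polls")
    gaps: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (r, up) in enumerate(zip(reachable, machine_up)):
        blind = up and not r  # service running, no statistics collected
        if blind and start is None:
            start = i
        elif not blind and start is not None:
            gaps.append((start, i - 1))
            start = None
    if start is not None:
        gaps.append((start, len(reachable) - 1))
    return gaps


# Ten polls; the network path is down for polls 3..6.
reachable = [not 3 <= i <= 6 for i in range(10)]

# Scenario A: the machine kept running throughout.
up_a = [True] * 10
# Scenario B: the machine actually went down at poll 5.
up_b = [i != 5 for i in range(10)]

print(blind_intervals(reachable, up_a))  # [(3, 6)]
print(blind_intervals(reachable, up_b))  # [(3, 4), (6, 6)]
```

Both scenarios produce the same `reachable` series, so the console sees exactly the same thing in each. It has no way to decide which of them actually happened during polls 3 to 6.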
3.2 Vulnerability To Network Congestion
When many hosts generate a high traffic load, congestion is likely even in a switched network, which eliminates shared collision domains. Congestion can cause packet loss, which triggers retransmissions, which in turn increase congestion. In a congested network the accuracy of the statistics provided by agentless monitoring is not guaranteed, since packets carrying monitoring data can arrive too late or be lost altogether. This is another instance of the undecidable condition.
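A sketch of how congestion degrades agentless statistics, assuming the console discards any sample that arrives after a fixed deadline. The function name `classify_samples`, the 250 ms deadline and the delay values are all hypothetical; the point is only that late and lost packets leave holes in the console's record.

```python
from typing import Dict, List, Optional


def classify_samples(delays_ms: List[Optional[float]],
                     deadline_ms: float) -> Dict[str, List[int]]:
    """Sort monitoring packets (by sequence number) into on-time, late and lost.

    A delay of None means the packet never arrived.
    """
    result: Dict[str, List[int]] = {"on_time": [], "late": [], "lost": []}
    for seq, delay in enumerate(delays_ms):
        if delay is None:
            result["lost"].append(seq)
        elif delay > deadline_ms:
            result["late"].append(seq)  # arrived, but too late to act on
        else:
            result["on_time"].append(seq)
    return result


# Illustrative (made-up) one-way delays for six monitoring packets
# crossing a congested link; None marks a dropped packet.
delays = [12.0, 15.0, None, 480.0, 20.0, None]
report = classify_samples(delays, deadline_ms=250.0)
coverage = len(report["on_time"]) / len(delays)

print(report)    # {'on_time': [0, 1, 4], 'late': [3], 'lost': [2, 5]}
print(coverage)  # 0.5
```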
3.3 Vulnerability To Power Outages
When a local power outage occurs, a significant amount of time can pass before the remote monitoring console can communicate with the monitored system again. During this time the system is not monitored, and the undecidable condition arises again.
4 Agent-based Monitoring
Now to the classic approach, which does work in critical situations.
4.2 Reduction of Network Traffic
4.3 Remote Configuration
4.4 Tolerance To Network Disruption
A disruption-tolerant mechanism adopts a peer-to-peer communication model rather than master/slave communication with a central monitor. Agent collaboration is an essential part of this model, because it allows agents to bypass network failures.
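One way agent collaboration could bypass a failed link is to relay reports through peers that can still reach the destination. The note does not prescribe a routing algorithm; the sketch below substitutes plain breadth-first search over the set of currently working links, and the function `relay_path` and the node names are hypothetical. The destination could be a console or any peer willing to store the reports.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def relay_path(links: Set[Tuple[str, str]], src: str,
               dst: str) -> Optional[List[str]]:
    """Shortest hop path from src to dst over currently working links
    (breadth-first search); None if dst is unreachable."""
    adj: Dict[str, List[str]] = {}
    for a, b in links:  # links are treated as bidirectional
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk back to src via recorded parents
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# The direct link agent-a <-> console has failed; agent-b still reaches it.
working = {("agent-a", "agent-b"), ("agent-b", "console"),
           ("agent-a", "agent-c")}
print(relay_path(working, "agent-a", "console"))
# ['agent-a', 'agent-b', 'console']
```

Breadth-first search is chosen here only because it is simple and returns a path with the fewest relay hops; a real agent mesh would also have to discover which links are working.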