(Note: This documentation was written when MAGI was based on the DETER Security Execution Environment, “SEER”. References to SEER have been removed or lightly updated. Generally, in this documentation SEER is the same as MAGI.)
SEER/MAGI started as a command and control structure for DDoS experimentation, where we wanted to control traffic generators, attack agents, defenses and basic data collection. A graphical interface was connected to a special control node, which in turn was connected to the experiment nodes via a simple event system. Because message size and format were restricted, so was the data that could be transmitted. Commands sent to a node were key/value string pairs of variables along with string commands. Data collection would periodically send back packet counter data that was stored at the control node.
As the event system started to wilt under the pressure of the number of messages sent, a new messaging system was created to replace it, but it followed the same structure as the previous system to reduce the number of changes required to the rest of the software. New requirements were placed on the system and it grew accordingly, but in directions that were not planned for. In the end, there were some inherent complexities in the system that confused those who wanted to add to it. Some of these are:
- Separation of communication across the control node. A distinct messaging network grew on the outside of the control nodes, but there was a separate multicast-down/TCP-up network between each control node and the experiment nodes it was responsible for. Messages from the GUI to the experiment nodes required software on the control node that could forward the messages on or process them accordingly. There was no end-to-end transmission.
- Overly constrained framework. Based on the activities we initially used SEER for, the user was expected to add to it by inheriting from Agent, Collector or Aggregator. While features that easily fit into the given APIs could be written with only a few lines of code and automatically have a panel created in the GUI, there was a little too much magic, and if the problem to be solved didn’t fit nicely, adding it became cumbersome.
- Software was automatically distributed to all nodes. Some software only runs on particular operating systems, so extra steps are needed to determine whether a failed software install is actually an error (it didn’t install on the system we need it on) or can be ignored (it didn’t install on Windows, but it isn’t used there).
- Demoware/Cancerware. As demos came and went, new concepts were pursued, and this pushed development in one direction or another. Hacks were sometimes created and never removed or properly integrated because planning for the next demonstration was already under way.
- Human-Computer Interaction is not my forte. There are many people dedicated to the interaction between the human user and software: how to display information, how to request information, what colors to use. The primary developer is not one of these people.
- Tied to a language. Everyone has their preferred language, and not everyone likes Python, yet features added to the backend were essentially required to be written in Python, or at least to include some Python glue. Writing a small file of glue just to pass a couple of items on to another process is cumbersome.
At the highest level, we are starting with a single messaging system to connect all nodes in the experiment: user interfaces, experiment nodes and any other supporting nodes. The messaging system has several different underlying transports, such as TCP and UDP multicast, but a node using it sees a single large cloud containing all of the nodes. Nodes can be addressed individually by name or collectively by group name, with nodes joining and leaving groups as needed. Messages can be flagged for different types of delivery, such as best effort, acknowledged, source ordered, etc.
The routing process on nodes that are in a position to route is hidden from the user process; however, a node can request that packets not destined for it be intercepted for the purposes of aggregation. The messaging system adds its own header to the data but does not care about the payload. Only the next layer (the user process) should attempt aggregation on the payload.
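To make the addressing model concrete, here is a minimal sketch of what a message in such a system might carry. The class, field names and delivery flags are illustrative assumptions for this document, not the actual MAGI wire format or API.

```python
from dataclasses import dataclass
from enum import Flag, auto

class Delivery(Flag):
    # Delivery options named in the text; the values here are placeholders.
    BEST_EFFORT = auto()
    ACKNOWLEDGED = auto()
    SOURCE_ORDERED = auto()

@dataclass
class Message:
    src: str                                   # sending node name
    dst: str                                   # a node name or a group name
    is_group: bool = False                     # True when dst names a group
    delivery: Delivery = Delivery.BEST_EFFORT  # how the cloud should deliver it
    payload: bytes = b""                       # opaque to the messaging layer

# Address a single node by name...
m1 = Message(src="control", dst="clienthost-3", payload=b"hello")
# ...or a whole group; the messaging layer fans the message out to whichever
# nodes have currently joined that group.
m2 = Message(src="gui", dst="httpclients", is_group=True,
             delivery=Delivery.ACKNOWLEDGED, payload=b"start")
```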
So, now we have connected all the nodes in the experiment on a logical level. What next?
Each MAGI message also has a type field: a variable-length string version of the TCP/UDP port number. The user process (daemon) on each node can dispatch messages based on this information. On non-user-interface nodes, this dispatch process has a built-in listener running for ‘EXEC’ messages. These messages contain code modules that are executed on the node and attached to the daemon as listeners for new types. Simple Python snippets can be executed in a new thread and connected via a Queue. Non-Python code can be executed in its own process and connected via pipes or AF_UNIX sockets. Code is no longer automatically loaded onto every node; the code needed on each node is pushed as it is needed.
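The following is a rough sketch of the dispatch idea for in-process Python listeners, assuming a simple mapping from type string to Queue; the class and function names are hypothetical, not the real daemon API.

```python
import queue
import threading
import time

class Dispatcher:
    """Hands each incoming message to the listener registered for its type."""

    def __init__(self):
        self.listeners = {}          # message type string -> inbound Queue

    def register(self, msgtype, handler):
        # Attach a handler for one message type and run it in its own thread,
        # connected to the daemon by a Queue as described above.
        q = queue.Queue()
        self.listeners[msgtype] = q
        threading.Thread(target=handler, args=(q,), daemon=True).start()

    def dispatch(self, msgtype, payload):
        if msgtype in self.listeners:
            self.listeners[msgtype].put(payload)

def exec_listener(inbox):
    # Stand-in for the built-in 'EXEC' listener: in the real system the payload
    # carries a code module to load and attach as a listener for new types;
    # here we only drain the queue to show the plumbing.
    while True:
        payload = inbox.get()
        print("EXEC message received, %d bytes" % len(payload))

d = Dispatcher()
d.register("EXEC", exec_listener)
d.dispatch("EXEC", b"\x00" * 42)
time.sleep(0.2)   # give the listener thread a moment to drain the queue
```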
Select nodes in the experiment now have agents listening for something to do. They can be written in different languages, so the old trick of extracting information using Python introspection is no longer possible. Instead, each agent can define its interface in a TBD IDL format (like XPIDL). With this information, automatic creation of panels and user dialogs is possible again, as is the formatting required of the payload. An agent can also choose to define only part of the interface, or none at all; this is up to the module and its interaction with the user. A module may choose to interact with other modules on experiment nodes outside of its communication with the user interface.
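Since the IDL format is still TBD, the following is only a guess at the kind of information such an interface description would need to carry for the GUI to build panels and format payloads; the agent name, method names and type strings are hypothetical.

```python
# Hypothetical interface description for an HTTP client agent (the format and
# all names are assumptions; the actual IDL is TBD).
http_client_interface = {
    "agent": "httpClient",
    "methods": {
        "startClient": {
            "args": {"rate": "float"},   # requests per second
            "doc": "Begin requesting pages at the given rate.",
        },
        "stopClient": {
            "args": {},
            "doc": "Stop requesting pages.",
        },
    },
}
```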
The interfaces are no longer restricted to the set of variable types supported by the Agent class or simple events. This makes it much easier to create agents for simple actions like pushing or pulling files, more interesting actions like integrating software that has its own C&C system (such as a botnet simulator), or obtuse operations like performing analysis that requires moving data between processing nodes. Using groups, the user interface can send a single message to command a series of nodes to all perform the same operation; for example, telling a group of nodes that will be acting as HTTP clients to start requesting pages at rate X.
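As a sketch of that HTTP client example, a single group command might be built like the following; the envelope fields, group name and payload encoding are illustrative assumptions rather than the real wire format.

```python
import json

# One message addressed to the 'httpclients' group: every node that has joined
# the group receives the same command and its agent performs the same operation.
payload = json.dumps({
    "method": "startClient",       # a name from the agent's interface description
    "args": {"rate": 10.0},        # "rate X" from the example above
})
envelope = {
    "src": "gui",
    "group": "httpclients",        # group name; membership can change over time
    "type": "httpClient",          # dispatch string used by the receiving daemon
    "delivery": "acknowledged",
    "data": payload,
}
```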
It should be noted that with the Collector interface no longer present, there are no modules that volunteer their data to the user. This is important as we strive to create larger and larger experiments. Rather than have modules volunteer data, the user interface must request data as it wants it, or ask the modules to send data periodically and then tell them to stop later. These interfaces are to be defined by each agent module. The messaging system just moves the data where it needs to go.
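Below is a minimal sketch of the agent side of that pull model, assuming hypothetical startReporting/stopReporting methods and a send callback into the messaging layer: the agent volunteers nothing on its own, reports only while asked, and stops when told.

```python
import threading

class ReportingAgent:
    """Illustrative agent that reports periodically only while requested."""

    def __init__(self, send):
        self.send = send             # callback that hands data to the messaging layer
        self._timer = None

    def startReporting(self, interval):
        # Send one report now, then schedule the next one.
        self.send({"requests_seen": self._collect()})
        self._timer = threading.Timer(interval, self.startReporting, [interval])
        self._timer.daemon = True
        self._timer.start()

    def stopReporting(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None

    def _collect(self):
        return 0                     # placeholder for real counters

agent = ReportingAgent(send=print)
agent.startReporting(interval=5.0)   # the UI asks for data every 5 seconds
agent.stopReporting()                # ...and later tells the agent to stop
```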