.. _dataman:

Data Management
===============

Data management is central to experimentation, which is why the data management layer is a core part of the MAGI framework.

The following terms are used in the context of MAGI's data management layer.

- Sensor: A MAGI agent that senses information and needs to store it.
- Collector: A database server that can be used to store data.
- Shard: In a distributed database setup, the data is partitioned and stored across multiple database servers. This partitioning is known as sharding, and each of the database servers is known as a shard.

MAGI's data management layer is highly configurable. Experimenters can set up a centralized or a distributed database, and can also configure, at the node level, where sensors collect data.

In a distributed (sharded) database setup, MAGI also sets up a global database server. This server gives a holistic view of the database.

MAGI data management is built on MongoDB.

Data Manager Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^

The data management layer configuration is part of MAGI's experiment-level and node-level configuration files. More information is available at :ref:`dbdl`.

MAGI's data management layer enables an experimenter to do the following.

Sense and Collect
^^^^^^^^^^^^^^^^^

An agent developer should follow these steps to populate MAGI's database:

1. Import the database management utility.

   .. code-block:: python

      from magi.util import database

2. Initialize a database collection, passing it a unique name. We suggest using the agent name. Each agent implementation that extends one of the predefined agents, such as DispatchAgent, has a variable ``name`` that stores the agent name.

   .. code-block:: python

      self.collection = database.getCollection(self.name)

3. Insert data. Each record can be inserted as a dictionary of key-value pairs.

   .. code-block:: python

      self.collection.insert({"key1": "value1", "key2": "value2"})

.. note::

   The database management utility adds three other entries to each record:

   .. code-block:: none

      host:
      created:
      agent:

Query and Analyze
^^^^^^^^^^^^^^^^^

In a distributed database setup, a user can connect to the MongoDB server running on the global server node to get an experiment-wide view. In an unsharded setup, however, a user has to connect to the appropriate collector, based on the sensor-to-collector mapping, to fetch data stored by a particular sensor.

By default, MAGI sets up a non-distributed database, with all the sensors collecting at the same collector.

.. code-block:: bash

   $ more /proj//exp//experiment.conf
   dbdl:
     isDBEnabled: true
     isDBSharded: false
     sensorToCollectorMap: {__DEFAULT__: node-1}
     collectorPort: 27018

.. code-block:: bash

   > mongo node-1.myExperiment.myProject:27018
   mongo> use magi
   switched to db magi
   mongo> db.experiment_data.find()
   { "agent" : "user_agent", "host" : "node-1", "created" : 1409075736.646182, "key1" : "value1", "key2" : "value2" }
   { "agent" : "user_agent", "host" : "node-2", "created" : 1409075737.514683, "key3" : "value3", "key4" : "value4" }

In a distributed setup, the configuration file also has information about a global server host. An experimenter can connect to the global server to get an experiment-wide view of the database, or connect to individual collectors to get their local view.

.. code-block:: bash

   $ more /proj//exp//experiment.conf
   dbdl:
     isDBEnabled: true
     isDBSharded: true
     globalServerHost: node-1
     globalServerPort: 27017
     sensorToCollectorMap: {node-1: node-1, node-2: node-2, __DEFAULT__: node-1}
     collectorPort: 27018

For more advanced queries, refer to the MongoDB documentation available at http://docs.mongodb.org/manual/tutorial/query-documents/.
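
The query shown in this section can also be scripted. The following is a minimal sketch, not part of MAGI itself, of how a filter on the ``agent``, ``host``, and ``created`` fields might be built and applied. The helper names ``build_filter`` and ``fetch_records`` and the ``InMemoryCollection`` stand-in are hypothetical and exist only for illustration. The stand-in mimics just enough of a ``find()`` call to run without a database, and the sample records are copied from the query output in this section.

.. code-block:: python

   def build_filter(agent=None, host=None, created_since=None):
       """Build a MongoDB-style query filter on the fields added to each record."""
       query = {}
       if agent is not None:
           query["agent"] = agent
       if host is not None:
           query["host"] = host
       if created_since is not None:
           # "created" appears as a Unix timestamp in the sample output.
           query["created"] = {"$gte": created_since}
       return query


   def fetch_records(collection, **criteria):
       """Return matching records from any object that offers find(filter)."""
       return list(collection.find(build_filter(**criteria)))


   class InMemoryCollection:
       """Hypothetical stand-in for a collection (equality and $gte only)."""

       def __init__(self, records):
           self._records = list(records)

       def find(self, query):
           return [r for r in self._records if self._matches(r, query)]

       @staticmethod
       def _matches(record, query):
           for key, condition in query.items():
               if key not in record:
                   return False
               if isinstance(condition, dict):
                   # Only the $gte operator is handled in this sketch.
                   if "$gte" in condition and not record[key] >= condition["$gte"]:
                       return False
               elif record[key] != condition:
                   return False
           return True


   # Sample records taken from the query output in this section.
   sample = InMemoryCollection([
       {"agent": "user_agent", "host": "node-1", "created": 1409075736.646182,
        "key1": "value1", "key2": "value2"},
       {"agent": "user_agent", "host": "node-2", "created": 1409075737.514683,
        "key3": "value3", "key4": "value4"},
   ])

   if __name__ == "__main__":
       # Print only the records stored by sensors running on node-2.
       for record in fetch_records(sample, host="node-2"):
           print(record)

Assuming pymongo is installed, a real collection could be passed in place of the stand-in, for example ``MongoClient("node-1.myExperiment.myProject", 27018)["magi"]["experiment_data"]``, which uses the host, port, database, and collection names from the session shown in this section.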