.. _dataman:

Data Management
===============
Data management is central to experimentation, which is why the data
management layer is a core part of the MAGI framework.

The following terms are used in the context of MAGI's data management layer.

- Sensor: MAGI agent that senses information and needs to store it.
- Collector: Database server that can be used to store data.
- Shard: In case of a distributed database setup, the data is partitioned and
  stored in multiple database servers. This concept of partitioning data is
  known as sharding, and each of the database servers is known as a shard.

MAGI's data management layer is highly configurable. Experimenters can set up
a centralized or a distributed database, and can also configure, at the node
level, where sensors collect data.

In case of a distributed/sharded database setup, MAGI sets up a global database
server. This server gives a holistic view of the database.

MAGI data management uses MongoDB at its base.


Data Manager Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^
The data management layer configuration is part of MAGI's experiment-level
and node-level configuration files.

As mentioned earlier, MAGI's data management layer is highly configurable.
More information is available at :ref:`dbdl`.

MAGI's data management layer enables an experimenter to do the following.

Sense and Collect
^^^^^^^^^^^^^^^^^
An agent developer should follow these steps to populate MAGI's database:

1. Import the database management utility

   .. code-block:: python

      from magi.util import database

2. Initialize a database collection, passing it a unique name. We suggest using
   the agent name. Each agent implementation that extends one of the predefined
   agents, such as DispatchAgent, has a ``name`` variable that stores the agent
   name.

   .. code-block:: python

      self.collection = database.getCollection(self.name)

3. Insert data. Each record can be inserted as a dictionary of key-value pairs.

   .. code-block:: python

      self.collection.insert({"key1": "value1", "key2": "value2"})

   .. note::

       The database management utility adds three other entries to each record:

       .. code-block:: none

           host: <node's hostname>
           created: <record creation time>
           agent: <agent name>
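
As a rough illustration of what ends up in the database, the sketch below
mimics the three extra entries in plain Python. The helper ``with_metadata`` is
hypothetical and is not part of ``magi.util.database``. It assumes that
``created`` is an epoch timestamp, as in the sample query output later in this
section, and it lets the metadata entries overwrite user keys of the same name,
which may differ from MAGI's actual behavior.

.. code-block:: python

    import socket
    import time


    def with_metadata(record, agent_name, host=None, created=None):
        """Return a copy of ``record`` with host, created and agent added.

        Hypothetical stand-in for what the database utility does on insert.
        """
        stored = dict(record)  # copy, so the caller's dictionary is untouched
        stored["host"] = host if host is not None else socket.gethostname()
        stored["created"] = created if created is not None else time.time()
        stored["agent"] = agent_name
        return stored


    # Fixed host and timestamp so the result is reproducible.
    example = with_metadata(
        {"key1": "value1", "key2": "value2"},
        "user_agent",
        host="node-1",
        created=1409075736.646182,
    )

Choosing keys other than ``host``, ``created`` and ``agent`` for your own data
avoids any ambiguity about which value is kept.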


Query and Analyze
^^^^^^^^^^^^^^^^^

In case of a distributed database setup, a user can connect to the MongoDB
server running on the global server node to get an experiment-wide view.

However, in case of an unsharded setup, a user would have to connect to the
appropriate collector based on the sensor-collector mapping to fetch data
stored by a particular sensor.

MAGI, by default, sets up a non-distributed database, with all the sensors
collecting at the same collector.

.. code-block:: bash

    $ more /proj/<project>/exp/<experiment>/experiment.conf
    dbdl:
      isDBEnabled: true
      isDBSharded: false
      sensorToCollectorMap: {__DEFAULT__: node-1}
      collectorPort: 27018
      
.. code-block:: bash

    $ mongo node-1.myExperiment.myProject:27018
    mongo> use magi
    switched to db magi
    mongo> db.experiment_data.find()
    { "agent" : "user_agent", "host" : "node-1", "created" : 1409075736.646182, 
    "key1" : "value1", "key2" : "value2" }
    { "agent" : "user_agent", "host" : "node-2", "created" : 1409075737.514683, 
    "key3" : "value3", "key4" : "value4" }
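
The same kind of query can also be issued from Python. The sketch below assumes
the third-party ``pymongo`` driver is installed and reuses the ``magi`` database
and ``experiment_data`` collection names from the shell session above.
``build_query`` and ``fetch_records`` are hypothetical helper names, not part of
MAGI.

.. code-block:: python

    def build_query(agent=None, host=None, since=None):
        """Build a MongoDB filter document from optional criteria."""
        query = {}
        if agent is not None:
            query["agent"] = agent
        if host is not None:
            query["host"] = host
        if since is not None:
            # "created" holds an epoch timestamp; $gte keeps records
            # created at or after the given time.
            query["created"] = {"$gte": since}
        return query


    def fetch_records(collector, port=27018, **criteria):
        """Return matching records from a collector as a list of dicts."""
        # Imported here so build_query can be used without pymongo installed.
        from pymongo import MongoClient

        client = MongoClient(collector, port)
        try:
            collection = client["magi"]["experiment_data"]
            return list(collection.find(build_query(**criteria)))
        finally:
            client.close()

For example, ``fetch_records("node-1.myExperiment.myProject",
agent="user_agent")`` would be expected to return the records stored by
``user_agent`` on that collector.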


In case of a distributed setup, the configuration file also contains
information about a global server host. An experimenter can connect to the
global server to get an experiment-wide view of the database, or connect to
individual collectors to get their local view.

.. code-block:: bash

    $ more /proj/<project>/exp/<experiment>/experiment.conf
    dbdl:
      isDBEnabled: true
      isDBSharded: true
      globalServerHost: node-1
      globalServerPort: 27017
      sensorToCollectorMap: {node-1: node-1, node-2: node-2, __DEFAULT__: node-1}
      collectorPort: 27018
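
One way to read ``sensorToCollectorMap`` is as a per-node lookup with a
fallback. The sketch below assumes that ``__DEFAULT__`` applies to any sensor
node without its own entry, which the key name suggests but this section does
not state. ``collector_for`` is a hypothetical helper, not a MAGI function.

.. code-block:: python

    # Mapping copied from the sharded configuration shown above.
    SENSOR_TO_COLLECTOR = {
        "node-1": "node-1",
        "node-2": "node-2",
        "__DEFAULT__": "node-1",
    }


    def collector_for(sensor, mapping=SENSOR_TO_COLLECTOR):
        """Return the collector for a sensor node, falling back to the default."""
        return mapping.get(sensor, mapping["__DEFAULT__"])

Under this reading, data sensed on ``node-2`` is stored on ``node-2``, while a
node that is not listed, such as a hypothetical ``node-3``, stores its data on
``node-1``.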

For more advanced queries, refer to the MongoDB documentation available at
http://docs.mongodb.org/manual/tutorial/query-documents/.