11 June 2014

Big Data Collections With MapDB

This article gives a short overview of the open source software MapDB, which is now at version 1.0.3.

What is MapDB?

Originally designed as a storage engine for an astronomical desktop application, MapDB had two design goals: minimal overhead and simplicity. Over time the engine evolved, and a third goal was added: to provide an alternative Java memory model. Today it is a storage engine specialized for big data collections, and it offers some useful features for that purpose.
For example:

Write to Heap, OffHeap, File or TempFile
Synchronization of Maps/TreeMaps/Sets and Queues
    Maps can also be built with composite keys
    Bidirectional maps
    Synchronization between maps (in case you have a 1-N association)
Caching
    Expiration on disk usage, access or write time
Compression
Faceting aka Histogram
Simulated Auto-Increment
Transactions (Note: a single transaction can only be used once)
Querying
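To make the "composite keys" item more concrete, here is a minimal sketch of the idea using only the JDK, not MapDB itself. It assumes a sorted map whose key combines a sensor name and a sequence number, so that all entries sharing the first key part can be fetched as one range. The names CompositeKeySketch, SensorKey, readingsOf and the sample data are hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical JDK-only illustration of composite keys; not MapDB code.
class CompositeKeySketch {

    // Composite key: ordered by sensor name first, then by sequence number.
    // Only compareTo is defined because the sorted map relies on ordering alone.
    static final class SensorKey implements Comparable<SensorKey> {
        final String sensor;
        final long seq;

        SensorKey(String sensor, long seq) {
            this.sensor = sensor;
            this.seq = seq;
        }

        @Override
        public int compareTo(SensorKey other) {
            int bySensor = sensor.compareTo(other.sensor);
            return bySensor != 0 ? bySensor : Long.compare(seq, other.seq);
        }
    }

    // Returns all readings of one sensor as a range view over the sorted map.
    static NavigableMap<SensorKey, Integer> readingsOf(
            NavigableMap<SensorKey, Integer> map, String sensor) {
        return map.subMap(new SensorKey(sensor, Long.MIN_VALUE), true,
                          new SensorKey(sensor, Long.MAX_VALUE), true);
    }

    // Made-up sample data.
    static NavigableMap<SensorKey, Integer> sample() {
        NavigableMap<SensorKey, Integer> map = new TreeMap<>();
        map.put(new SensorKey("kitchen", 1), 21);
        map.put(new SensorKey("kitchen", 2), 23);
        map.put(new SensorKey("garage", 1), 4);
        map.put(new SensorKey("garden", 1), -2);
        return map;
    }

    public static void main(String[] args) {
        for (Map.Entry<SensorKey, Integer> e : readingsOf(sample(), "kitchen").entrySet()) {
            System.out.println(e.getKey().sensor + "#" + e.getKey().seq + " = " + e.getValue());
        }
    }
}
```

The range query works because the ordering puts all keys with the same first component next to each other, which is also what makes such lookups cheap in a tree-based map.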

Small Example

The following example shows this simplicity in an IoT context: I put one million random temperature values into a collection backed by off-heap memory and group them into five categories (cold, fresh, warm, hot and burns). The keys for the cache come from an auto-increment counter.

import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

import org.mapdb.Atomic;
import org.mapdb.Bind;
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.HTreeMap;

public class TemperatureRepository {
    private final Atomic.Long keyinc;
    private final ConcurrentHashMap<String, Long> histogram;
    private final HTreeMap<Long, Integer> temperatureMap;

    public TemperatureRepository() {
        // Create the off-heap memory cache (size limit 1.0)
        temperatureMap = DBMaker.newCache(1.0);

        // Get the auto-increment counter from the engine behind the cache
        DB db = new DB(temperatureMap.getEngine());
        keyinc = db.getAtomicLong("map_temp");

        // Histogram: the category is the key, the count is the value
        histogram = new ConcurrentHashMap<String, Long>(); // any map will do

        // Bind the histogram to the primary map.
        // The function returns the category for each map entry.
        Bind.histogram(temperatureMap, histogram, (key, value) -> {
            if (value < 0) {
                return "cold";
            } else if (value < 10) {
                return "fresh";
            } else if (value < 20) {
                return "warm";
            } else if (value < 30) {
                return "hot";
            } else {
                return "burns";
            }
        });
    }

    // Store one temperature under the next auto-increment key
    public void add(int temperature) {
        temperatureMap.put(keyinc.incrementAndGet(), temperature);
    }

    public void printHistogram() {
        System.out.println(histogram);
    }

    public static void main(String[] args) {
        TemperatureRepository temperatureRepository = new TemperatureRepository();
        // One million random temperatures in the range [-10, 40), added in parallel
        new Random().ints(-10, 40).parallel().limit(1_000_000).forEach(temperatureRepository::add);
        temperatureRepository.printHistogram();
    }
}
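For readers without MapDB on the classpath, the binding idea from the example can be sketched with the JDK alone. This is not how MapDB implements Bind.histogram; it is a simplified illustration that assumes every write goes through one put method, which then adjusts the category counts. The class BoundHistogramSketch and its methods are hypothetical names; the category thresholds are the ones from the example above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical JDK-only illustration of a histogram bound to a map; not MapDB code.
class BoundHistogramSketch<K, V> {
    private final Map<K, V> primary = new ConcurrentHashMap<>();
    private final Map<String, Long> histogram = new ConcurrentHashMap<>();
    private final BiFunction<K, V, String> categorizer;

    BoundHistogramSketch(BiFunction<K, V, String> categorizer) {
        this.categorizer = categorizer;
    }

    // Insert or overwrite an entry and keep the category counts in sync.
    void put(K key, V value) {
        V old = primary.put(key, value);
        if (old != null) {
            // The overwritten value leaves its old category.
            histogram.merge(categorizer.apply(key, old), -1L, Long::sum);
        }
        histogram.merge(categorizer.apply(key, value), 1L, Long::sum);
    }

    Map<String, Long> histogram() {
        return histogram;
    }

    // Same thresholds as in the article's example.
    static String category(int temperature) {
        if (temperature < 0) return "cold";
        if (temperature < 10) return "fresh";
        if (temperature < 20) return "warm";
        if (temperature < 30) return "hot";
        return "burns";
    }

    public static void main(String[] args) {
        BoundHistogramSketch<Long, Integer> repo =
                new BoundHistogramSketch<Long, Integer>((key, value) -> category(value));
        repo.put(1L, -5);
        repo.put(2L, 5);
        repo.put(3L, 35);
        repo.put(1L, 25); // overwrite: key 1 moves from "cold" to "hot"
        System.out.println(repo.histogram());
    }
}
```

The point of the sketch is the design choice rather than the code itself: the histogram is maintained incrementally on each write, so reading the counts never requires a scan over all stored values.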

Conclusion

So far I haven't had the chance to use MapDB in a production environment, but on our playground at www.rapidpm.org it makes a very good impression.