The big data market continues to evolve, as I have written previously. Vendors are attempting to differentiate their offerings as they seek to encourage customers to pay for technology that they could potentially download for free.
MapR is one of those big data vendors. It entered the market in 2011 with a Hadoop distribution that used an alternative, POSIX-based file system that also offers HDFS API compatibility. While other Hadoop distribution vendors were chasing a volume-based business by using the Apache open source community surrounding Hadoop, MapR deliberately chose a more focused strategy of targeting enterprise capabilities and features. The MapR file system was designed to provide features that were missing in the early days of Hadoop, and it continues to form the backbone of the company’s offering today. I recently attended the company’s inaugural analyst day at its headquarters in San Jose to get an update on its progress and approach to the market.
The MapR file system (MapR-FS) provides NFS access, allowing organizations to load and share files using standard file system tools and applications. MapR-FS also supports full read-write access to files, whereas HDFS is essentially a write-once file system that permits appends. Using the NFS capabilities, MapR was able to provide high availability, snapshots and replication before other distributions. The company also claims NFS offers higher performance than HDFS since NFS is supported natively in the operating system. With these capabilities MapR secured early enterprise deals with customers such as American Express and comScore, both of which remain customers today.
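To make the read-write point concrete, here is a minimal sketch in Python of what NFS access implies for an application: ordinary file calls, including an in-place overwrite in the middle of an existing file. The mount location, the MAPR_NFS_ROOT environment variable and the function names are my own assumptions for illustration, not anything MapR documents, and the example falls back to a local temporary directory so it runs without a cluster.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at `offset` in an existing file without rewriting the file."""
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def demo(mount_root: str) -> bytes:
    """Create a file under `mount_root`, update part of it in place, and read it back."""
    path = os.path.join(mount_root, "events.log")

    # Plain file write, no Hadoop client library involved.
    with open(path, "wb") as f:
        f.write(b"status=PENDING;id=42\n")

    # Replace the 7-byte value "PENDING" with a same-length value, in place.
    overwrite_in_place(path, len(b"status="), b"DONE___")

    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # MAPR_NFS_ROOT is a hypothetical variable pointing at an NFS-mounted
    # cluster directory; a temporary local directory is used when it is unset.
    root = os.environ.get("MAPR_NFS_ROOT") or tempfile.mkdtemp()
    print(demo(root))
```

The same code works against any POSIX-style mount, which is the point: existing tools and scripts can work with files in the cluster without being rewritten against a Hadoop-specific API.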
In 2013, MapR added MapR-DB to provide NoSQL database capabilities running in the Hadoop cluster. MapR-DB offers a JSON interface for document database capabilities and provides an API for HBase applications. The high availability, snapshot and replication capabilities mentioned above are also available for MapR-DB because MapR-FS is the underlying platform for both parts of the system.
Then in 2015, the company introduced MapR Streams for processing streaming event data, and it began to call the combined products the MapR Converged Data Platform since it offered batch processing via Hadoop, operational processing via NoSQL and stream processing via MapR Streams. Stream processing is a critical capability for Internet of Things (IoT) applications. Our recently completed IoT and operational intelligence benchmark research shows that nearly half (46%) of those implementing IoT applications consider it essential to have low or very low latency for processing events.
Like other parts of its platform, MapR Streams supports an open source API, in this case the Kafka API. Combining open source APIs with its proprietary products enables MapR to participate in and benefit from the open source ecosystem surrounding the big data market. The MapR platform also supports Apache Spark, which I have written about, and provides a SQL interface via Apache Drill.
The strategy seems to be working. Many of MapR’s early customers have continued to use its products and increased their investments in them. Pairing proprietary components with open source APIs gives MapR clearly differentiated products that remain compatible with open source technology. However, it also creates some challenges. The divergence from the open source versions results in a smaller community for the MapR products and may cause MapR to be passed over by prospects who prefer to stay closer to the Apache version of Hadoop. Nevertheless, MapR has managed to establish a position as one of the top three Hadoop distributions. It also claims to be growing revenues significantly and shared some financial metrics under nondisclosure with the analysts in attendance.
MapR offers a robust platform that covers many big data requirements in a single product, requirements that otherwise often call for integrating separate products or open source projects. If you are considering a big data project, I recommend evaluating whether MapR meets your needs.
SVP & Research Director