A brief NoSQL primer:
There is a lot of discussion these days on the merits of selecting a data store that aligns with the unique requirements of your application. Gone, it would seem, are the days when relational databases were the de facto option for persisting your application's data. The “movement” to recognize and promote this trend is called NoSQL, or “Not Only SQL”. Much has been written on the topic of NoSQL, and as @sogrady pointed out well over a year ago, it’s not going anywhere for a while. But where is it going? What is it being used for? According to Todd Hoff in a post on the High Scalability Blog, there are numerous reasons “people throw around for using NoSQL”.
On the relationship between directories and NoSQL:
The Lightweight Directory Access Protocol, or LDAP, “is an application level protocol for reading and editing directories over an IP network.” Simply stated: LDAP is a protocol for accessing directories.
So, what are directories? Directories are data stores that have traditionally been used for storing personal or identity data (names, addresses, passwords, etc…). But there is nothing preventing the storage of any type of object in a directory. Most directories ship with a standard schema that defines objects related to personal data (profiles, roles, organizations, etc…), but they are also flexible in accommodating application-specific schemas. So, directories are not schema-less, but unlike relational schemas, their schemas are easily modifiable (over the protocol) at run time.
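To make "not schema-less, but modifiable at run time" concrete, here is a toy in-memory model in Python. It is only a sketch: `ToyDirectory`, its methods, and the `favoriteColor` attribute are made up for illustration, and it does not speak LDAP or reflect how any real directory server stores its schema.

```python
class ToyDirectory:
    """Hypothetical in-memory stand-in for a directory: a schema plus entries keyed by DN."""

    def __init__(self):
        # Object class name -> set of attribute names that class permits.
        self.schema = {"person": {"cn", "sn", "mail"}}
        self.entries = {}

    def add_attribute(self, object_class, attribute):
        # Extend the schema while the "server" keeps running; no restart, no migration.
        self.schema.setdefault(object_class, set()).add(attribute)

    def add_entry(self, dn, object_class, attributes):
        # Entries are validated against the schema, so the store is not schema-less.
        if object_class not in self.schema:
            raise ValueError(f"unknown object class: {object_class}")
        unknown = set(attributes) - self.schema[object_class]
        if unknown:
            raise ValueError(f"attributes not allowed by schema: {sorted(unknown)}")
        self.entries[dn] = {"objectClass": object_class, **attributes}


directory = ToyDirectory()
dn = "uid=jdoe,dc=example,dc=com"
entry = {"cn": "Jane Doe", "sn": "Doe", "favoriteColor": "blue"}

try:
    directory.add_entry(dn, "person", entry)
except ValueError as err:
    print("rejected:", err)

# Change the schema at run time, then retry the same write.
directory.add_attribute("person", "favoriteColor")
directory.add_entry(dn, "person", entry)
print(directory.entries[dn])
```

In a real directory the schema change would itself be an update sent over the protocol rather than a local method call; the point here is only that validation and run-time flexibility can coexist.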
Most directories, our Directory included, are simply key/value stores under the covers. For our underlying data store, we chose the Berkeley DB Java Edition for many good reasons, but it really could be any key/value store. Key/value stores are at the heart of many of the most popular NoSQL projects (Dynamo, Cassandra, Riak, Voldemort, etc…). In fact, the relationship between Berkeley DB and NoSQL was eloquently laid out in a recent blog post by @gregburd from the Berkeley DB team:
“So, is Berkeley DB a "NoSQL" solution? Not really, but it certainly is a component of many of the existing NoSQL solutions out there. Forgetting all the noise about how NoSQL solutions are complex distributed databases when you boil them down to a single node you still have to store the data to some form of stable local storage. DBMs solved that problem a long time ago. NoSQL has more to do with the layers on top of the DBM; the distributed, sometimes-consistent, partitioned, scale-out storage that manage key/value or document sets and generally have some form of simple HTTP/REST-style network API. Does Berkeley DB do that? Not really.”
We at UnboundID couldn’t agree more with the statement above. Berkeley DB provides an excellent foundation upon which to build a “NoSQL” solution.
In fact, the following diagram highlights the layers that we have built on top of the DBM offered by Berkeley DB:
Here are some of the highlights of the “layers” we have added:
- A fully distributed data replication model – multi-master replication that supports eventual consistency with no single point of failure, compression over the WAN, encryption, etc…
- A complete LDAPv3-compliant interface – a standards-compliant interface for accessing the data store that, in turn, makes it accessible to a host of third-party applications and clients
- Robust alerting, logging and monitoring – multiple channels and message formats for monitoring and troubleshooting the service
- High-availability, load balancing, and data partitioning layer – a proxy service that provides a separate layer for intelligently load balancing requests and partitioning or sharding large data sets into smaller, more manageable chunks
- Flexible data synchronization layer – a data synchronization service that supports synchronizing data bidirectionally in real time to simplify the migration of legacy data stores, or to support ongoing high-speed data integration
- Some SQL-like goodness – support for database-like batch and interactive transactions, relational joins, and triggers
All of this and a whole lot more, along with a proven track record of supporting some of the highest-performance environments in the world.
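As a rough illustration of the partitioning bullet above, a proxy can route each request to a shard by hashing the entry's key. The post does not say which partitioning scheme the proxy uses, so hash-based routing is an assumption here, and `shard_for` is a hypothetical helper written in Python.

```python
import hashlib


def shard_for(dn: str, num_shards: int) -> int:
    """Map an entry's DN to a shard index in the range [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Lowercasing is a crude stand-in for real DN normalization (an assumption).
    normalized = dn.strip().lower().encode("utf-8")
    # Use a stable hash so every proxy instance routes the same DN the same way.
    digest = hashlib.sha256(normalized).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


for dn in ("uid=jdoe,dc=example,dc=com", "uid=asmith,dc=example,dc=com"):
    print(dn, "-> shard", shard_for(dn, 4))
```

A plain modulo like this reshuffles most keys whenever the shard count changes; schemes such as consistent hashing exist to soften that, but they are beyond what this sketch tries to show.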
So why am I going through the trouble of drawing a correlation between NoSQL and directories? Good question. Well, if you follow the rationale and argument laid out above, then by definition directories are NoSQL solutions and deserve their place alongside all the other solutions being lumped into this category. This argument has been made before. In fact, @mmullany covered a lot of this ground in his post “LDAP Directories: The Forgotten NoSQL”.
One of the salient points made in the post mentioned above is that the demise of directories as a more general-purpose data store rests heavily on the shoulders of LDAP. I agree with this assertion because, let’s face it, the majority of the “cool kids” developing the types of applications that require NoSQL would balk at the notion of leveraging LDAP in developing their application. No surprise there. After all, there is nothing “cool” about ASN.1 and BER. That being said, LDAP is by no means obsolete, contrary to what many would like to believe, and still has an important role to play in the technology landscape. Lastly, I would argue that leveraging LDAP is a lot simpler than writing MapReduce functions in Erlang.
Today, directories are typically relegated to solving big identity and security data problems. We’ve recently been pushing the boundaries of this legacy, and have plans to continue this trend as we mature in the marketplace. Further, while LDAP and directories are almost always lumped together as a single construct, it wouldn’t take much for us to provide a little separation and make using our data store a little more palatable to the LDAP-averse “cool kids”. JSON via REST anyone? Wait, is that still considered cool?
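For a sense of how small that separation could be, here is a minimal Python sketch that renders a directory entry as a JSON document, the kind of payload a REST endpoint might return. The `entry_to_json` helper and the sample entry are hypothetical and say nothing about any actual product API.

```python
import json


def entry_to_json(dn, attributes):
    """Render a directory entry (DN plus multi-valued attributes) as a JSON string."""
    document = {"dn": dn}
    for name, values in attributes.items():
        # Directory attributes can hold several values, so always emit a list.
        document[name] = list(values)
    return json.dumps(document, sort_keys=True, indent=2)


entry = {
    "objectClass": ["person"],
    "cn": ["Jane Doe"],
    "mail": ["jdoe@example.com", "jane@example.com"],
}
print(entry_to_json("uid=jdoe,dc=example,dc=com", entry))
```

Emitting a list for every attribute keeps the shape predictable for clients, at the cost of a little verbosity for single-valued attributes.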