Roy Tang

Programmer, engineer, scientist, critic, gamer, dreamer, and kid-at-heart.


MarkLogic NoSQL

I recently attended a few training sessions for MarkLogic held at an office in a nearby business center. Now, I'll forgive you for not knowing what MarkLogic is, as even I hadn't heard of it until six months ago. MarkLogic is (apparently) the leading Enterprise NoSQL provider.
 
NoSQL is big and sexy right now because of the supposed advantages in handling big data, and large web companies like Google and Facebook use a lot of NoSQL in the backend. Most of the popular/well-known NoSQL solutions are open-source/free ones: MongoDB, Cassandra, CouchDB, and so on. But these aren't actually very popular on the enterprise side, hence "Enterprise NoSQL" isn't a very common phrase.
 
One of the reasons NoSQL isn't currently very popular for enterprise projects is that the popular open-source solutions such as MongoDB don't guarantee ACID transactions. In fact, MongoDB has the concept of "eventual consistency" for its distributed servers, which means a read from one server can briefly return stale data after a write lands on another.
 
MarkLogic does guarantee ACID transactions, along with government-grade security. Both of these are things that enterprise clients love. So that's the market they're in. Their highest-profile project to date was the backend datastore for Healthcare.gov (also known as Obamacare/ACA). That project involved consolidating data with different structures from multiple sources and needed to scale up to a ridiculously high capacity, so it seemed tailor-fit for a big-data-level NoSQL solution.
 
During the second-to-last day of the training, Jason Hunter, CTO of MarkLogic for Asia-Pacific, dropped in to answer some questions. He talked a bit about how MarkLogic started and how they got him on board, and a bit about their competition (some disparagement towards MongoDB and Oracle) and about Healthcare.gov.
 
He also had a really good sales pitch about why NoSQL is just a better approach compared to RDBMS. (Although it would have been more fun if he had given this talk with like an Oracle salesperson there to debate with him.)
 
One of his points was that RDBMS restrictions such as limits on column size are outdated. They were sensible in the old days when disk space was at a premium, but these days you don't need to limit how many characters you store in a last name field. MarkLogic stores all records as documents with no file size limit (AFAIK) to avoid such an issue. From experience developing web forms, there's the occasional client or system user who doesn't understand why we need to have character limits on fields. We also had this system where they wanted to be able to type unlimited-length rich text content (we stored it as HTML in the backend) in memoboxes. These were fields where they had to write stuff like notes and assessments and most of their data was pages of text. I feel like that sort of thing would have been a great use for NoSQL.
 
Another point he raised was that RDBMS systems were very bad at allowing the user to search like three columns at random. You needed to know in advance which three columns to index. For document stores like MarkLogic, typically the entire document is indexed so that problem is avoided. Of course, that means that functions like "sorting by last name" need a bit more setup. You need to build a range index in MarkLogic to sort by a specific field. So it's kind of a trade-off either way.
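The trade-off above can be sketched with a toy in-memory store. This is only an illustration of the idea, not how MarkLogic is implemented: the class `ToyDocumentStore`, its method names, and the sample field names are all hypothetical. Every word of every field goes into one shared index at insert time, while sorting by a field works only if a range index was explicitly created for it.

```python
from bisect import insort
from collections import defaultdict


class ToyDocumentStore:
    """Hypothetical toy model: all fields are word-indexed, sorting is opt-in."""

    def __init__(self):
        self.docs = {}                      # uri -> document (a plain dict)
        self.word_index = defaultdict(set)  # word -> set of uris containing it
        self.range_indexes = {}             # field -> sorted list of (value, uri)

    def add_range_index(self, field):
        # Opt-in step: build a sorted index for one field, covering existing docs.
        self.range_indexes[field] = sorted(
            (doc[field], uri) for uri, doc in self.docs.items() if field in doc
        )

    def insert(self, uri, doc):
        # Assumes each uri is inserted once; updates are not handled in this sketch.
        self.docs[uri] = doc
        for value in doc.values():
            for word in str(value).lower().split():
                self.word_index[word].add(uri)
        for field, entries in self.range_indexes.items():
            if field in doc:
                insort(entries, (doc[field], uri))

    def search(self, *words):
        # Return uris of documents containing every word, in any field.
        sets = [self.word_index.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()

    def sorted_by(self, field):
        # Sorting is only available for fields that have a range index.
        if field not in self.range_indexes:
            raise KeyError(f"no range index on {field!r}")
        return [uri for _, uri in self.range_indexes[field]]


store = ToyDocumentStore()
store.add_range_index("last_name")
store.insert("/p/1.json", {"first_name": "Ana", "last_name": "Reyes", "notes": "prefers email"})
store.insert("/p/2.json", {"first_name": "Ben", "last_name": "Cruz", "notes": "call after lunch"})
```

Here `store.search("email")` finds the first document without anyone having decided in advance that `notes` should be searchable, while `store.sorted_by("first_name")` raises an error until a range index is added for that field.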
 
Now to be fair, Oracle does support indexed full-text search over any number of columns via Oracle Text. But it's not the default behavior and definitely not straightforward. I used to work with Oracle Text a lot in some of my older projects, and the amount of time it took to index any nontrivial amount of data often gave us a headache.
 
I should do a test sometime to determine how well MarkLogic's indexing performs. The story goes that MarkLogic started out as a document search application before they changed gears to become a database. They even approached VCs with the intent of competing with Google. They've had a lot of time to get good at this, so I have high expectations.
 
The MarkLogic server itself is an interesting piece of engineering. It's basically a document store, a search engine and an application server all rolled into one package. Upon installation you get some administrative web applications for configuration purposes. The admin interface seems robust and thorough. Contrast that with Oracle, where you often find yourself needing to tinker around with configuration files and such.
 
You can run web applications on the MarkLogic server itself. The supported languages are XQuery and server-side JavaScript. Odd choices, I know. I suspect they started out with XQuery for historical reasons, but the SJS side has the same capabilities (or so we're told). If you're not a fan of either option, you can also expose the server's functionality via a REST interface. They also provide existing Java and Node.js APIs on top of that REST interface. All of this means you can deploy any kind of webapp in front of MarkLogic server.
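To give a rough idea of what talking to that REST interface looks like from an outside webapp, here is a minimal sketch that only builds the request URLs. It assumes the `/v1/documents` and `/v1/search` endpoints of the REST Client API as I understand them, and a REST app server on `localhost:8000`; the host, port, helper names, and sample document URI are assumptions for illustration, so check them against your own installation.

```python
from urllib.parse import urlencode

# Assumption: a MarkLogic REST app server is listening here.
BASE_URL = "http://localhost:8000"


def document_url(uri, base_url=BASE_URL):
    """URL for reading (GET) or writing (PUT) one document by its URI."""
    return f"{base_url}/v1/documents?{urlencode({'uri': uri})}"


def search_url(query, page_length=10, base_url=BASE_URL):
    """URL for a word search across documents, asking for JSON results."""
    params = {"q": query, "format": "json", "pageLength": page_length}
    return f"{base_url}/v1/search?{urlencode(params)}"


# Hypothetical document URI and query, for illustration only.
print(document_url("/people/1.json"))
print(search_url("last name"))
```

Any HTTP client can then send requests to these URLs with whatever authentication the app server is configured for, which is why the front end can be written in pretty much any language.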
 
The world is moving towards bigger data stores, so it's not unreasonable to think that NoSQL is on the way up and that NoSQL vendors will be big players in the future. So I think the training was worth it (even if I did have to stay in Ortigas for a while). It's early still. MarkLogic might still turn out to be as painful to work with as Oracle was. But at the very least it's interesting to try a different approach to enterprise data storage. Looking forward to seeing what kind of applications we can build with this tech.
Posted under #Software Development

Comments

i can't believe i actually read the whole post. my scope is slowly exposing me to these dbs and servers… but so far all our dbs are in sql. i guess the 'No' in NoSQL got me. haha