Investigating NoSQL for EHR Systems: MongoDB

Anyone who has worked solely with relational databases will likely find his/her first encounter with NoSQL data stores unsettling. Creating an ER diagram, normalizing tables, choosing primary keys, and setting relationships are so ingrained in RDBMS users that it is difficult to separate the data modeling process from the underlying goal of data management. After spending so much time building and understanding relational schemas, moving to a schema-less data store feels completely wrong.

Many RDBMS users either dismiss NoSQL as a fad (it's not!) or consider it unnecessary. The availability of high-quality, fast RDBMSs makes it easy to cling to those notions and ignore the explosion in NoSQL databases. My approach to NoSQL has focused not on how these systems compare to RDBMSs, but on the types of problems they were created to solve.

What is NoSQL?
As with many technical terms, pinning down an exact meaning is difficult. However, from a functional perspective, most NoSQL systems share a few common features: they are schema-less, tend not to be ACID-compliant, and are designed for distributed use across a large number of servers. Most (if not all) NoSQL data stores were conceived as solutions for managing data generated by large-scale web applications. In fact, Google, Amazon, and Facebook have entries in this space (Bigtable, Dynamo, and Cassandra, respectively). NoSQL data stores are generally grouped into categories such as key-value, column-family, document, and graph stores. Two of these types, document and graph data stores, seem to address problems I am working on and will be the subjects of posts.

MongoDB is a document-oriented data store. It was released in 2009 and uses JSON in its API. MongoDB stores documents, which are JSON-like objects of arbitrary complexity; the actual storage format is BSON, a binary encoding of JSON. Each document saved without an identifier is automatically assigned a unique key in its _id field, which may be used to access the document. Documents are stored in collections, and collections reside in databases. Collections are roughly analogous to tables, and documents to records, in an RDBMS. MongoDB supports indexes on arbitrary document fields (not typical in NoSQL). Its interactive shell is a JavaScript environment, and queries are written as JSON-like documents.

MongoDB is said to have a dynamic schema, meaning that documents are not required to conform to any particular format or structure. I have to admit that this was a bit jarring at first but, after a while, quite liberating.

Using MongoDB
Let’s take a look at basic CRUD (create, read, update, delete) operations. JSON objects can be passed to MongoDB for storage. Here is a JSON object that stores patient information for a disease registry. Notice that the problem list is an array. This is a simple object, but documents can be much more complex, with nested objects and arrays of objects. The maximum size of a single MongoDB document is 16 MB; larger objects require special handling, such as MongoDB's GridFS, which splits a file into smaller chunks.

{
    "UniqueID": "WH555444",
    "LastName": "Doe",
    "FirstName": "John",
    "Problems": [
        "Low back pain"
    ]
}

For those who would like to try MongoDB, a copy may be downloaded here. Be sure to choose the correct build (32-bit or 64-bit) for your operating system. If you would like to try MongoDB in your browser, go here for a basic tutorial.

Since the MongoDB shell uses JavaScript, we can save this document by first assigning it to a variable and then inserting that variable into MongoDB.

var testPatient = {
    "UniqueID": "WH555444",
    "LastName": "Doe",
    "FirstName": "John",
    "Problems": [
        "Low back pain"
    ]
};

MongoDB will create a collection (and its database) automatically if it doesn’t already exist. Saving the document is simple. I want to call the collection that will be created “registry,” so all that is required to save this document is the command:

db.registry.save(testPatient);

It is also possible to insert a patient directly using the insert command (i.e., no variable is used to hold the patient object prior to insertion).

db.registry.insert({
    "UniqueID": "LX111222",
    "LastName": "Doe",
    "FirstName": "Sally",
    "Problems": [
        "Chronic cough"
    ]
});

If you now execute the db.registry.find() command, both documents will appear. You will also see that each has been assigned an object ID in its _id field. Here is the ID for the second document inserted:

{ "_id": { "$oid": "5165e645cc93742c1604b1f8" } }

The object ID is a 12-byte value that combines a timestamp with bytes identifying the machine and process that generated it and an incrementing counter.
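Because the timestamp occupies the first four bytes (the first eight hex characters) as big-endian seconds since the Unix epoch, the creation time can be recovered from the ID alone. In the mongo shell, ObjectId("…").getTimestamp() does this for you; the plain JavaScript sketch below shows the same decoding without a database connection. The helper name objectIdTimestamp is hypothetical.

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId.
// An ObjectId is 12 bytes (24 hex characters); the first 4 bytes are a
// big-endian count of seconds since the Unix epoch.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("ObjectId must be 24 hexadecimal characters");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16); // first 4 bytes
  return new Date(seconds * 1000); // Date expects milliseconds
}

// The ID assigned to the second patient document above.
const created = objectIdTimestamp("5165e645cc93742c1604b1f8");
console.log(created.toISOString()); // 2013-04-10T22:23:01.000Z
```

Since the timestamp comes first, sorting by _id also sorts documents roughly by creation time (to one-second resolution).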

Calling find without any parameters returns all documents in the collection as a result set. To choose specific documents, supply a query document as a parameter:

db.registry.find();                          // Returns all documents
db.registry.find({ UniqueID: "LX111222" });  // Returns all documents for a specific MR#
db.registry.find({ Problems: "Rash" });      // Returns all documents with "Rash" in the problem list
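The problem-list query works because, in MongoDB, an equality condition on a field that holds an array matches when any element of the array equals the value. The in-memory sketch below imitates only that rule for top-level fields; it is a simplified illustration rather than how MongoDB evaluates queries, and the names matchesQuery and findDocs, as well as the third patient, are hypothetical.

```javascript
// Simplified in-memory imitation of MongoDB equality matching on top-level
// fields. A condition such as { Problems: "Rash" } matches a document whose
// Problems field equals "Rash" or is an array containing "Rash".
// Query operators ($gt, $in, ...) and dotted paths are not handled.
function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

// An empty query matches every document, like db.registry.find().
function findDocs(collection, query = {}) {
  return collection.filter((doc) => matchesQuery(doc, query));
}

const registry = [
  { UniqueID: "WH555444", LastName: "Doe", FirstName: "John", Problems: ["Low back pain"] },
  { UniqueID: "LX111222", LastName: "Doe", FirstName: "Sally", Problems: ["Chronic cough"] },
  // Hypothetical third patient, added so the "Rash" query has a match.
  { UniqueID: "HX000001", LastName: "Roe", FirstName: "Richard", Problems: ["Rash", "Asthma"] },
];

console.log(findDocs(registry).length);                                 // 3
console.log(findDocs(registry, { UniqueID: "LX111222" })[0].FirstName); // Sally
console.log(findDocs(registry, { Problems: "Rash" })[0].FirstName);     // Richard
```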

Updates are also simple. Because both patients in the collection share the last name Doe, the following command selects Mr. Doe by his UniqueID and changes his last name to Smith:

db.registry.update({ UniqueID: "WH555444" }, { $set: { LastName: "Smith" } });
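Two details of update() are easy to miss: $set changes only the named fields and leaves the rest of the document intact, and by default only the first matching document is modified unless { multi: true } is passed as a third argument. The sketch below imitates both behaviors on plain JavaScript objects; updateSet is a hypothetical helper that supports only top-level equality queries.

```javascript
// Simplified imitation of db.collection.update(query, { $set: ... }, { multi })
// on an in-memory array. Only top-level equality queries are supported.
function updateSet(collection, query, setFields, { multi = false } = {}) {
  let modified = 0;
  for (const doc of collection) {
    const matches = Object.entries(query).every(([k, v]) => doc[k] === v);
    if (!matches) continue;
    Object.assign(doc, setFields); // $set: overwrite only the listed fields
    modified += 1;
    if (!multi) break; // default behavior: stop after the first match
  }
  return modified;
}

const patients = [
  { UniqueID: "WH555444", LastName: "Doe", FirstName: "John" },
  { UniqueID: "LX111222", LastName: "Doe", FirstName: "Sally" },
];

// Both patients are named Doe, but without multi only the first is changed.
updateSet(patients, { LastName: "Doe" }, { LastName: "Smith" });
console.log(patients.map((p) => `${p.FirstName} ${p.LastName}`).join(", "));
// John Smith, Sally Doe
```

This is why selecting on a unique field such as UniqueID is safer than selecting on a last name when a specific patient is meant.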

The last of the basic commands, deletion, is accomplished as follows:

db.registry.remove({ LastName: "Smith" });

This command will remove all documents where “Smith” appears as the last name. Of course, multiple criteria or the object ID can be used to choose a specific record.
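Unlike update(), remove() deletes every document that matches the query. The in-memory sketch below imitates that behavior; removeWhere is a hypothetical helper, and the short string _id values stand in for real ObjectIds.

```javascript
// Simplified imitation of db.collection.remove(query): every document that
// matches all top-level equality conditions is removed, not just the first.
function removeWhere(collection, query) {
  const isMatch = (doc) => Object.entries(query).every(([k, v]) => doc[k] === v);
  const kept = collection.filter((doc) => !isMatch(doc));
  const removed = collection.length - kept.length;
  collection.splice(0, collection.length, ...kept); // mutate in place
  return removed;
}

// Hypothetical records: two share the last name Smith.
const records = [
  { _id: "a1", LastName: "Smith", FirstName: "John" },
  { _id: "a2", LastName: "Smith", FirstName: "Jane" },
  { _id: "a3", LastName: "Doe", FirstName: "Sally" },
];

console.log(removeWhere(records, { LastName: "Smith" })); // 2
console.log(records.map((r) => r.FirstName).join(", "));  // Sally
```

Querying on _id instead (for example { _id: "a1" }) would have removed only John, since each _id is unique.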

One fact that becomes immediately obvious when experimenting with MongoDB is the flexibility one has with documents.  Documents are not required to have the same number of fields; nor must all documents be of the same basic structure.  For example, if one is storing test results, a CBC with differential, Chem-7, and MRI report could be inserted into the same collection.
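To make this concrete, here is a hypothetical results collection in which a CBC with differential, a Chem-7, and an MRI report sit side by side with different fields. The contents are illustrative placeholders, not real clinical data. In MongoDB, a query such as { Impression: { $exists: true } } selects only documents that contain a given field; the findWithField helper below is a hypothetical in-memory imitation of that one operator.

```javascript
// Hypothetical "results" collection: three documents with different shapes
// stored together. Contents are illustrative placeholders only.
const results = [
  { UniqueID: "WH555444", TestType: "CBC with differential",
    Analytes: ["WBC", "RBC", "Hemoglobin", "Hematocrit", "Platelets"],
    Differential: ["Neutrophils", "Lymphocytes", "Monocytes", "Eosinophils", "Basophils"] },
  { UniqueID: "WH555444", TestType: "Chem-7",
    Analytes: ["Sodium", "Potassium", "Chloride", "CO2", "BUN", "Creatinine", "Glucose"] },
  { UniqueID: "WH555444", TestType: "MRI", BodyPart: "Lumbar spine",
    Impression: "(free-text report)" },
];

// Imitates a query of the form { <field>: { $exists: true } }:
// keep only documents that actually contain the field.
function findWithField(collection, field) {
  return collection.filter((doc) => Object.prototype.hasOwnProperty.call(doc, field));
}

console.log(findWithField(results, "Analytes").map((d) => d.TestType).join(", "));
// CBC with differential, Chem-7
console.log(findWithField(results, "Impression").length); // 1
```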

MongoDB has proven its worth in numerous web applications, but what about health care? Within an EHR, transactions and ACID capabilities are important for legal and patient safety purposes, and I would not use MongoDB for storing critical data (it guarantees atomic writes only within a single document and does not support multi-document transactions). However, many healthcare data storage requirements are not critical. MongoDB is designed for fast response times on large data sets. This might make it ideal for knowledge bases, online journals, a forms (e.g., surveys) repository, or even as storage for externally generated clinical information that cannot easily be ported to an RDBMS-based EHR. Such information could be linked to a patient’s active EHR profile and remain searchable while being kept separate, for legal and clinical reasons, from more trusted, internally generated data.

Another thing to keep in mind is that we no longer live in an age where applications are monolithic, LAN-based client/server systems. Systems with modular architectures that make use of APIs could store data in more than one type of database. For example, a workflow engine that uses a graph database could be integrated into an EHR while all critical patient data are kept in an RDBMS. EBM knowledge bases could reside in MongoDB and be linked to the EHR via an API.

Most NoSQL databases are less than six years old, and it will take a while for them to reach the maturity level that RDBMSs have achieved. Even so, they offer new ways of thinking about and doing data management that push the envelope not only in data access and storage, but also in software architecture and design. Indeed, the challenges that NoSQL systems present to traditional application design approaches may ultimately prove to be as valuable as the data storage options they provide.



  1. You may want to take a look at my recent presentation – see:

    which pulls together many of the topics I discuss in my blog articles.

    Healthcare has long been successfully using a NoSQL database – one that was developed long before NoSQL became a fashionable moniker. Unfortunately that database technology has been dismissed as a consequence of criticism of its associated (but now optional) language.

    1. Rob, thanks for your comment. You nailed it in the presentation. I first used MUMPS in 1986 and hated the language. Are there database drivers for PHP/Ruby/Python?
