After enduring the usual software development delays, I am moving ahead with the data store evaluations mentioned in the post Databases a la Carte. There are three immediate goals. The first is obtaining a deeper knowledge of MySQL, which is more or less a practice-makes-perfect situation. Thus far, I have been getting by on my knowledge of SQL Server, but now it is time to dig around inside MySQL to learn more about configuration, security, and its main gotchas.
Becoming better acquainted with a few of the major NoSQL databases is the second goal. One of Amazon’s NoSQL offerings will probably be the first taken out for a spin. Having spent the past year using local and regular ISP servers for development and testing, I am itching to try a purely cloud-based development/delivery platform. The final goal is testing workflow ideas with a graph database. I have located two promising systems (both have free editions), which, on first-pass, seem to have all of the required features.
While interested in trying new technologies, I have no desire to be on the bleeding edge. Acceptable data stores must be reliable, have good data protection/integrity features, be easy to connect to, and offer good security features. Since the release of Oracle in 1979, relational systems have climbed to the top of the database market. As the relational model is the current gold standard for modern data stores, features common to production-quality relational database management systems will be used as a starting point for evaluations. This is not to imply that relational data stores are innately superior to other types. Rather, it means that new data stores under consideration must provide capabilities equal to or better than MySQL or SQL Server. (Of course, in health care and banking, MUMPS is doing quite well against standard relational systems.)
Database management systems are not simply great places to store information until it is needed. They are also feature-rich software systems that provide a range of mechanisms for protecting data from accidental or deliberate alteration. These features are the most important for me, so they are at the top of my checklist.
Data Integrity Features
Data integrity is a key issue for any application developer. Ensuring that data are trustworthy requires capabilities such as range checking, format checking, conditional actions, etc. The table below offers a few examples of integrity checks commonly used for clinical data.
|Range check||Validation, abnormal lab values|
|Format check||Social security numbers, email addresses|
|Conditional action||Audit trails—when a record is inserted, updated, read, or deleted an entry is written to an audit table.|
Constraints are the simplest form of integrity check. Constraints operate at the table level and are used to ensure that added data conform to field requirements. Triggers are more complex than constraints. Like constraints, triggers are table-based. They contain procedural code and may be used for a variety of tasks, including those for which constraints are used. Both constraints and triggers fire automatically in response to table events (e.g., reads, updates, delete, etc.), and are not controlled by the application accessing the database. Often, triggers are used to create audit trails.
Audit trails are chronological records of database events. Audit trails may be as detailed as the database developer desires. For example, when a record is inserted into a table, a trigger may be used to capture the date/time, the person, or the fields affected. Stored procedures are similar in capability to triggers. A key difference is that, while stored procedures are stored in the database, they can be called from a software application. Another feature that differentiates triggers and stored procedures is that the latter can return record sets. This makes stored procedures a great way to grab data from a database for processing in an application.
A transaction is a unit of activity or work within a database. Transactions are used to prevent data inconsistencies, especially when a database action changes information in the database. Order entry provides a good example of the value of transactions. For example, suppose you are entering a set of complex orders for an ICU patient. Care quality would be diminished if some of those orders were written to the database and others were lost. Transactions have two major components — commit and rollback. When a complex order set is entered, the database will attempt to write all orders to the appropriate tables. If all of the writes (insertions) are successful, then the transaction is allowed to terminate successfully, and the orders are committed to the database and become permanent. However, if any errors occur and some writes are unsuccessful, a rollback occurs and none of the orders are stored. The acronym ACID (i.e., atomic, consistent, isolated, durable) is used to describe the ideal behavior of transactions.
Data security is always a challenge. At a minimum, I expect a DBMS to provide role-based access to data and encryption capability. Both features are essential in clinical databases.
Aside from data integrity features, I will be looking at the development tools available for each DBMS under consideration. RDBMS have good diagramming and modeling tools that can generate a database directly from the design. Systems that offer similar tools and functionality will receive greater consideration. Since many NoSQL databases are open source, documentation quality and system reliability are important. Fortunately, there are well-documented versions of NoSQL databases offered by Google, Amazon, and Microsoft as well as commercial products, so I will start my evaluations with these offerings.
I am eager to get back to these evaluations. When done, I hope to have an improved working knowledge of MySQL; to have identified a graph database suitable for testing Petri net ideas; and to have a simple application running on the Amazon cloud. Bring on the databases!
Hoffer JA, Prescott MB, Topi H. Modern Database Management. Eleventh ed. Upper Saddle River, NJ: Prentice Hall, 2012. This is my favorite database book; it is the reference used for this post.