XML is everywhere: data exchange, configuration files, web services, electronic documents. Its ubiquity is well earned because it addresses, quite well, the problem of sharing data between disparate computer systems. Released in 1998 as a W3C recommendation, XML stands for eXtensible Markup Language. The name also covers a family of related technologies that provide powerful data exchange and manipulation capabilities. However, one can grasp the utility of XML with only a basic understanding of XML documents and XML Schemas.
My investigations of XML grew out of an interest in data exchange. XML allows one to create structured data that can be validated and passed between systems with ease. One example of this usage is a web service. A web service is a way to share information between applications in real time over a network. For example, in an EHR, the demographics module might act as a service provider to other modules. In such a scenario, a preventive medicine module might require a patient’s age and gender. The preventive medicine module would then send a request for this information to the demographics module using a web service. Importantly, the two interacting modules need not be physically proximate (i.e., they need not run on the same computer).
XML usage is not limited to web services. The beauty of XML is that it allows the creation of a system of tags that together define a unique markup language for whatever kind of data one wishes to share. XML documents are plain text, making them human-readable and easy to create. XML Schemas are themselves XML documents, and they define what an XML file may contain. The contents of an XML document can be quickly checked for errors or missing information using its associated XML Schema.
Let’s look at an XML document example. Suppose there is a clinical research project that is registering patients with type-2 diabetes for an observational study. The research team has decided to collect the following information from all of the organization’s associated clinics: ID, name, date of initial diagnosis, comorbid conditions, initial HgbA1c, LDL cholesterol with result date, serum creatinine with result date, and care location. Below is an XML document designed to hold this information.
<Patient>
  <UniqueID>WH555444</UniqueID>
  <Name>
    <LastName>Doe</LastName>
    <FirstName>John</FirstName>
    <Middle>Frank</Middle>
  </Name>
  <IntakeCriteria>
    <DmDxDate>2012-03-12T00:00:00</DmDxDate>
    <InitialHgbA1c>6.5</InitialHgbA1c>
    <CoMorbid>Hypertension</CoMorbid>
    <CoMorbid>CAD</CoMorbid>
  </IntakeCriteria>
  <Labs>
    <LDLCholestrol>
      <LDLLevel>122.5</LDLLevel>
      <LDLResultDate>2012-07-06T00:00:00</LDLResultDate>
    </LDLCholestrol>
    <SerumCreatinine>
      <CreatinineLevel>1.4</CreatinineLevel>
      <CreatinineResultDate>2012-07-08T00:00:00</CreatinineResultDate>
    </SerumCreatinine>
  </Labs>
  <CareLocation>
    <FacilityName>East Side Clinic</FacilityName>
    <ContactEmail>firstname.lastname@example.org</ContactEmail>
  </CareLocation>
</Patient>
This XML document consists of a root element (Patient) and five major child elements (UniqueID, Name, IntakeCriteria, Labs, CareLocation). Aside from UniqueID, the child elements, in turn, have sub-elements. XML documents are hierarchical and are processed as trees (i.e., starting with the root element and then reading child elements until all are read). Much clinical data is naturally hierarchical, which makes XML a good fit when organizing it for sharing.
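To make the tree idea concrete, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree module to walk a trimmed copy of the Patient document from the root down. The PATIENT_XML constant and the walk helper are illustrative names chosen for this sketch, and only a few of the elements are kept to keep it short.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example Patient document (assumption: only a few
# elements are kept so the sketch stays short).
PATIENT_XML = """
<Patient>
  <UniqueID>WH555444</UniqueID>
  <Name>
    <LastName>Doe</LastName>
    <FirstName>John</FirstName>
    <Middle>Frank</Middle>
  </Name>
  <IntakeCriteria>
    <InitialHgbA1c>6.5</InitialHgbA1c>
    <CoMorbid>Hypertension</CoMorbid>
    <CoMorbid>CAD</CoMorbid>
  </IntakeCriteria>
</Patient>
"""


def walk(element, depth=0):
    """Yield (depth, tag, text) for an element and its descendants, parent first."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        yield from walk(child, depth + 1)


root = ET.fromstring(PATIENT_XML)
rows = list(walk(root))

# Print the hierarchy with two spaces of indentation per level.
for depth, tag, text in rows:
    print("  " * depth + tag + (": " + text if text else ""))

# Repeated elements come back as a list, in document order.
comorbid = [e.text for e in root.findall("./IntakeCriteria/CoMorbid")]
```

The root element is visited first, then each child in document order, which is the same reading order described above.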
Note that XML documents, when paired with a schema, carry data-type information. Therefore, when data moves from one device to another, dates are read as dates, numbers as numbers, and so on. Data validation is done by means of an XML Schema.
An XML file can have an associated schema that describes its contents. The schema for the example XML document may be found here. Look at the listing for the IntakeCriteria element.
<xsd:element name="IntakeCriteria">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="DmDxDate" type="xsd:dateTime" />
      <xsd:element name="InitialHgbA1c" type="xsd:decimal" />
      <xsd:element maxOccurs="unbounded" name="CoMorbid" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Notice how the schema requires specific data types for the DmDxDate and InitialHgbA1c elements. Placing data of another type in either field causes an error during validation, and the XML file is rejected. This is a great error-trapping tool and a critical feature for information exchange. The flexibility of XML can be seen in how the CoMorbid element is managed. Observe the value of the “maxOccurs” attribute (“unbounded”). It indicates that there is no upper limit on the number of comorbid conditions that can be submitted under the current schema. If desired, maxOccurs could be set to a specific number, in which case a document with more entries than that would fail validation. Finally, note the “sequence” element, which fixes the order in which the child elements must appear in a valid XML document linked to this schema. Using schemas, it is possible to tightly control the content of XML documents, which helps assure the quality of the data they contain.
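The checks that the IntakeCriteria schema listing implies can be sketched by hand in Python. This is not a real XML Schema validator: in practice a validating parser or schema library would do this work, and the check_intake function, its max_comorbid parameter, and the sample strings below are all hypothetical names made up for illustration. The sketch assumes that xsd:dateTime values are written in the form 2012-03-12T00:00:00 and approximates the type rules with the standard library only.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime


def is_datetime(text):
    """Rough stand-in for xsd:dateTime: accept only YYYY-MM-DDThh:mm:ss."""
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
        return True
    except ValueError:
        return False


def is_decimal(text):
    """Rough stand-in for xsd:decimal: optional sign, digits, optional fraction."""
    return re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)", text) is not None


def check_intake(xml_text, max_comorbid=None):
    """Return a list of problems; an empty list means the fragment passed."""
    problems = []
    root = ET.fromstring(xml_text)
    tags = [child.tag for child in root]

    # xsd:sequence: DmDxDate, then InitialHgbA1c, then one or more CoMorbid.
    if (tags[:2] != ["DmDxDate", "InitialHgbA1c"]
            or len(tags) < 3
            or any(tag != "CoMorbid" for tag in tags[2:])):
        problems.append("children do not follow the required sequence")

    if not is_datetime(root.findtext("DmDxDate", default="").strip()):
        problems.append("DmDxDate is not a dateTime")
    if not is_decimal(root.findtext("InitialHgbA1c", default="").strip()):
        problems.append("InitialHgbA1c is not a decimal")

    # maxOccurs: None plays the role of "unbounded" in this sketch.
    count = len(root.findall("CoMorbid"))
    if max_comorbid is not None and count > max_comorbid:
        problems.append(f"{count} CoMorbid entries exceed the limit of {max_comorbid}")
    return problems


GOOD = ("<IntakeCriteria><DmDxDate>2012-03-12T00:00:00</DmDxDate>"
        "<InitialHgbA1c>6.5</InitialHgbA1c>"
        "<CoMorbid>Hypertension</CoMorbid><CoMorbid>CAD</CoMorbid></IntakeCriteria>")
BAD = ("<IntakeCriteria><DmDxDate>2012/03/12</DmDxDate>"
       "<InitialHgbA1c>high</InitialHgbA1c>"
       "<CoMorbid>CAD</CoMorbid></IntakeCriteria>")

print(check_intake(GOOD))
print(check_intake(BAD))
```

Calling check_intake(GOOD, max_comorbid=1) reports a problem instead of silently dropping the extra entry, which mirrors how a bounded maxOccurs behaves during schema validation.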
If you are thinking that XML is great but do not see how you might use it, think again. Every major relational database management system allows for importing and exporting data in XML form. In practical terms, this means that sharing data becomes a question of what to share rather than how to share it. XML files are also portable: they can be emailed, placed on a jump drive or, since they are plain text, even cut and pasted.
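As a rough illustration of moving database rows into XML, the Python sketch below builds an XML document by hand from an in-memory SQLite table. It does not use any database's built-in XML export feature. The table name, column names, and the export_patients function are assumptions invented for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table holding a few fields from the Patient example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (unique_id TEXT, last_name TEXT, facility TEXT)")
conn.execute("INSERT INTO patient VALUES ('WH555444', 'Doe', 'East Side Clinic')")


def export_patients(connection):
    """Return all patient rows as one XML string, one <Patient> element per row."""
    root = ET.Element("Patients")
    cursor = connection.execute("SELECT unique_id, last_name, facility FROM patient")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        patient = ET.SubElement(root, "Patient")
        # Each column becomes a child element named after the column.
        for column, value in zip(columns, row):
            ET.SubElement(patient, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = export_patients(conn)
print(xml_text)
```

Because the result is plain text, it can be saved to a file, attached to an email, or parsed again on the receiving system.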
XML was designed as a system-independent means of sharing data between computer systems. The fact that it is everywhere testifies to its effectiveness. Some things simply make sense; XML is one of those things.