Data Tools: XML Design

Good XML Design Documentation for improved performance

Five questions must be asked before designing an XML data document (Font, 2010):

  1. Will this document be part of a solution?
  2. Will this document have design standards that must be followed?
  3. What part may change over time?
  4. To what extent is human readability or machine readability important?
  5. Will there be a massive amount of data? Does file size matter?

All XML data documents should be versioned, and key stakeholders should be involved in the XML data design process (Font, 2010).

A few rules (not a comprehensive list) on making a good XML design:

  1. Be consistent with your design and design for extensions by multiple people (Google, 2008; Font; 2010).
  2. Reuse existing XML formats (Google, 2008)
  3. Tag each unit of information, maintain a minimal amount of text that can be processed as a whole (Harold, 2003)
    1. An element has a start tag and an end tag only <menu></menu> but an attribute describes an element inside of a tag <menu-item portion-size =”500” portion-units=”g”></menu-item>; therefore an attribute provides some properties to the element (Harold, 2003; Ogbuji, 2004).
    2. The principle of core content: Know the difference of when to use an element versus an attribute: use elements when the information is an essential part of the material, and use an attribute if the information is peripheral or incidental to the main message. Essentially “Data goes in elements, metadata in attributes” (Ogbuji, 2004; Oracle, n.d.).
      1. Elements must be in a namespace, and attributes shouldn’t be in a namespace (Google, 2008)
    3. Avoid implicit structures, which occurs by the addition of white space (Harold, 2003)
      1. This can be seen easily with names, where white spaces are seen between the first name, middle name, and last name. Ogbuji (2004b) suggested to use well-established elements like: <firstname/>; <othername/>; <surname/>; <forename/>; <rolename/>; <namelink/>; <genname/>; and <addname/> to address the eccentricities of a person’s name from various cultures.
      2. Post office addresses pose this same issue, so Ogbuji (2004b) suggested these established elements: <street/>; <postcode/>; <pob/>; <city/>; <state/>; <country/>; <otheraddr/>; <phone/>; <fax/>; and <email/>/
    4. Use a standard and accepted element reference guide like DocBook Element Reference (Walsh & Muellner, 2006). Or something similar and stick with that convention.
      1. Use published standard abbreviations for constructing names (Google, 2008; Walsh & Muellner, 2006)
    5. Avoid using hyphens “-“ in your naming convention (Font, 2010)
    6. Avoid the use of boolean values (Google, 2008)
    7. Keep the document structure readable (Principle of readability), do not make it too troublesome to process or read (Harold, 2003; Ogbuji 2004). For example, use elements for readability and understandability by humans, and attributes for machine digest (Ogbuji, 2004).
    8. Comments should not be used to contain data, but rather to dos (Google, 2008)

Example of an XML Document (W3 Schools, n.d.)

<?xml version="1.0" encoding="UTF-8"?>
 
 <shiporder orderid="889923"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="shiporder.xsd">
   <orderperson>John Smith</orderperson>
   <shipto>
     <name>Ola Nordmann</name>
     <address>Langgt 23</address>
     <city>4000 Stavanger</city>
     <country>Norway</country>
   </shipto>
   <item>
     <title>Empire Burlesque</title>
     <note>Special Edition</note>
     <quantity>1</quantity>
     <price>10.90</price>
   </item>
   <item>
     <title>Hide your heart</title>
     <quantity>1</quantity>
     <price>9.90</price>
   </item>
 </shiporder>

Analysis of XML design document from the user’s perspective for improved performance

It should be best to have <shipto/> information to contain only the address, not just the two major datasets like name and address, which represents designing for extendibility as in Rule 1.  The tags <note/> and <price/> should be an attribute to the <title> per Rule 3a and 3b. Rule 4a. was not followed for <name>Ola Nordmann</name>.  Quantity is not an attribute of <item> thus should be a child element of <item> per Rule 4b. Tags like <name/>; <item/>; <quantity/>; and <price/> do not follow a naming convention as stated in Rule 5., but they could come from a naming convention that is internal to this company, so this one is hard to evaluate without much more information. Rules 6-9 were kept in this example.

References