Monday, 14 March 2011

XML Files

Introduction

This week I will talk about XML and its uses, but first I will give a brief introduction to the XML task. The first part of the task involves creating an XML file that can hold the following information about student projects:
  • Student Name
  • Student ID
  • Project Title
  • Project Category
  • Abstract
  • Date Submitted
Both elements and attributes must be used in this exercise. The XML code must be verified, and a DTD must be created so that the content of the XML file can be checked through DTD validation.

Introduction to XML and related Types

XML is basically a text file format that uses a structure to hold information. XML stands for Extensible Markup Language. The reason for this name is that XML, unlike HTML, has no predefined set of tags; you must define your own. Customized tags can be used to describe the data they carry and thus allow for easier data transmission, validation and interpretation between applications and organisations. If structured correctly, XML can be easily read even by users who are not computer experts, as XML was designed to be self-descriptive.

XML is a way to structure and send data, but it does nothing on its own. You cannot load a piece of XML into the browser and expect the browser to know what the data means. You can, however, transfer information between applications that understand the structure of a particular XML file. XML has a tree structure: a document has exactly one root element, which can have many child elements, which can have their own child elements and so on.
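The single-root rule can be checked with a few lines of code. The sketch below is only an illustration, assuming Python and its standard xml.etree.ElementTree module (neither is part of the original exercise); the helper name is_well_formed and the two sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as one well-formed XML tree."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One root element containing nested children: a valid tree.
one_root = (
    "<project><student>"
    "<projecttitle>Web Development</projecttitle>"
    "</student></project>"
)

# Two top-level elements: not a single tree, so the parser rejects it.
two_roots = "<project/><project/>"

print(is_well_formed(one_root))
print(is_well_formed(two_roots))
```

Note that this only tests whether the text is well-formed XML. It says nothing about whether the structure is the one a receiving application expects.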

Since XML allows the liberty of creating one's own structure, checking the content of the XML file is not enough when it is used for communication between different systems. The structure must also be verified so that it matches the one expected on the receiving end. This is where DTD comes in. A DTD defines the document structure with a list of allowed elements and attributes. It can be placed inside the XML file or linked by an external reference. An alternative to DTD is the XML Schema, a file that describes a type of XML document.

The XML Document


<?xml version="1.0"?>
<project>
   <student name="Mark" ID="762" >
      <projecttitle>Web Development</projecttitle>
      <projectcategory>IT</projectcategory>
      <projectabstract>usage of Javascript and CSS Styling</projectabstract>
      <submitteddate>2011-12-12Z</submitteddate>
   </student>
</project>

The listing above shows the XML structure I chose for this particular exercise. As can be seen, project is the root element of the document. The element student is the child of project and contains child elements of its own. Student has two attributes: name and ID. Assuming projects are per student, the project's title, category, abstract and submitted date are contained within student. I chose this particular structure because I find it easy to read and understand.
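To show how an application might read this structure, here is a minimal sketch, again assuming Python's standard xml.etree.ElementTree module rather than anything used in the original exercise. The dictionary keys in record are arbitrary names chosen for the example.

```python
import xml.etree.ElementTree as ET

# The same document as in the listing above.
DOCUMENT = """<?xml version="1.0"?>
<project>
   <student name="Mark" ID="762">
      <projecttitle>Web Development</projecttitle>
      <projectcategory>IT</projectcategory>
      <projectabstract>usage of Javascript and CSS Styling</projectabstract>
      <submitteddate>2011-12-12Z</submitteddate>
   </student>
</project>
"""

root = ET.fromstring(DOCUMENT)   # the root element, project
student = root.find("student")   # its only child element

record = {
    # Attributes are read with get().
    "name": student.get("name"),
    "id": student.get("ID"),
    # Child element text is read with findtext().
    "title": student.findtext("projecttitle"),
    "category": student.findtext("projectcategory"),
    "abstract": student.findtext("projectabstract"),
    "submitted": student.findtext("submitteddate"),
}

for key, value in record.items():
    print(f"{key}: {value}")
```

The student's name and ID come back through the attribute interface, while the project details come back as element text, which mirrors the split between attributes and elements in the document.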

DTD (Document Type Definition)


<!DOCTYPE project [
<!ELEMENT project (student)>
<!ELEMENT student (projecttitle,projectcategory,projectabstract,submitteddate)>
<!ELEMENT projecttitle (#PCDATA)>
<!ELEMENT projectcategory (#PCDATA)>
<!ELEMENT projectabstract (#PCDATA)>
<!ELEMENT submitteddate (#PCDATA)>
<!ATTLIST student name CDATA #REQUIRED>
<!ATTLIST student ID CDATA #REQUIRED>
]>


The listing above shows the DTD for the XML document illustrated in the previous section. <!ELEMENT is the declaration that describes an element. It should be immediately followed by the element's name and a pair of parentheses, which contain either the names of the child elements or the type of data expected. PCDATA stands for parsed character data, which means that the text inside the element is read by the parser: entity references such as &amp; are expanded and anything that looks like a tag is treated as markup rather than text. After the elements, the attributes need to be declared. <!ATTLIST is the declaration used for attributes. The name of the element having the attribute is entered first, followed by the attribute name and its type. In this case CDATA is used, which means the attribute value is treated as an ordinary string of characters. The keyword #REQUIRED implies that the attribute is required and must be included. The DTD was tested with a validator offered by w3schools.com.
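What "parsed" means for PCDATA can be seen by feeding small fragments to a parser. This is a rough sketch assuming Python's standard xml.etree.ElementTree module, which parses but does not validate against a DTD; the sample titles are invented for the example.

```python
import xml.etree.ElementTree as ET

# Entity references inside element text are expanded by the parser.
title = ET.fromstring(
    "<projecttitle>Javascript &amp; CSS &lt;styling&gt;</projecttitle>"
)
expanded_text = title.text

# A literal tag inside the text is parsed as a child element, not as characters.
# ElementTree does not validate, so it accepts this fragment; a DTD validator
# would reject it because (#PCDATA) allows no child elements.
mixed = ET.fromstring("<projecttitle>Web <b>Development</b></projecttitle>")
child_names = [child.tag for child in mixed]

print(expanded_text)
print(child_names)
```

In the first fragment the parser hands back the plain characters with the entities resolved. In the second it finds an element b inside projecttitle, which is exactly what the declaration (#PCDATA) forbids.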

XML Schema


<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
   <xs:element name="project">
        <xs:complexType>
             <xs:sequence>
                  <xs:element name="student" >
                       <xs:complexType>
                           <xs:sequence>
                               <xs:element name="projecttitle" type="xs:string"/>
                               <xs:element name="projectcategory" type="xs:string"/>
                               <xs:element name="projectabstract" type="xs:string"/>
                               <xs:element name="submitteddate" type="xs:date"/>
                           </xs:sequence>
                           <xs:attribute name="name" type="xs:string" use="required" />
                           <xs:attribute name="ID" type="xs:string" use="required"/>
                       </xs:complexType>
                  </xs:element>
              </xs:sequence>
        </xs:complexType>
   </xs:element>
</xs:schema>


In addition to the DTD, I created the corresponding XML Schema. XML Schema is a more powerful way than DTD to define the structure and limitations of your XML document. These schemas are XML documents themselves which reference the XML Schema namespace, and even have their own DTD. They improve on DTD as they also check for a wider range of data types, many of which are common to today's programming languages. Writing this type of schema is slightly more difficult and results in a larger volume of code than DTD. The schema shows that the element project, which is a complex type, contains an element student. This element contains a sequence of other elements and two attributes. use="required" is used instead of #REQUIRED. The element submitteddate is declared with the type xs:date, which accepts dates in yyyy-mm-dd format. In the XML file the submitted date is 2011-12-12Z, where the Z shows that the date is in UTC. The XML Schema and the XML document were tested at http://xmltools.corefiling.com/schemaValidate/.
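The kind of check that xs:date adds over a DTD's plain text can be approximated by hand. The sketch below assumes Python and its standard library only. The function parse_xs_date is a hypothetical helper written for this example, and it covers only a simplified form of the date format (four-digit years, optional time zone), not the full XML Schema definition.

```python
import re
from datetime import date, timedelta, timezone

# Simplified pattern for the xs:date lexical form: yyyy-mm-dd plus an
# optional time zone ("Z" or +hh:mm / -hh:mm). Negative years and years
# with more than four digits are deliberately left out.
_XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")


def parse_xs_date(text):
    """Return (date, tzinfo or None) for an xs:date-style string."""
    match = _XS_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in yyyy-mm-dd form: {text!r}")
    year, month, day, zone = match.groups()
    value = date(int(year), int(month), int(day))  # rejects e.g. month 13
    if zone is None:
        return value, None
    if zone == "Z":
        return value, timezone.utc
    sign = 1 if zone[0] == "+" else -1
    hours, minutes = int(zone[1:3]), int(zone[4:6])
    return value, timezone(sign * timedelta(hours=hours, minutes=minutes))


print(parse_xs_date("2011-12-12Z"))
```

A value such as 12/12/2011 raises ValueError here, in the same spirit as a schema validator rejecting it, whereas the DTD's #PCDATA would accept any text.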

Conclusion

Both DTD and XML Schema are good ways to verify the correct structure of your XML file. An important factor in the choice of which technology to use is obviously the type of validation required. In certain situations DTD cannot test for all the requirements and thus should be avoided. On the other hand, DTD avoids some of the complexity that comes with XML Schema and can be ideal for simpler situations.
