AI & Data

Working with XML Data

Lecture 5

Parsing and extracting data from XML documents using XPath and Python libraries for data analysis

How to handle XML?

XML = eXtensible Markup Language

XML: an example

<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <price>$5.95</price>
        <description>Two of our famous Belgian Waffles with...</description>
        <calories>650</calories>
    </food>
</breakfast_menu>

XML: targeting elements with XPath

XML: Read a document using Python (standard library option)

import xml.etree.ElementTree as xmlReader
# read from xml
tree = xmlReader.parse('menu.xml')
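
Since menu.xml itself is not reproduced here, the following minimal sketch parses a shortened copy of the sample menu from an inline string and walks its children. The MENU_XML constant and the ET alias are illustrative choices, not part of the lecture material.

```python
import xml.etree.ElementTree as ET

# Inline, shortened copy of the sample menu, used in place of menu.xml (assumption).
MENU_XML = """<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <price>$5.95</price>
        <calories>650</calories>
    </food>
</breakfast_menu>"""

# fromstring() returns the root element directly; parse() on a file returns
# a tree, and tree.getroot() then gives the same kind of element.
root = ET.fromstring(MENU_XML)

print(root.tag)  # name of the root element
for food in root:
    # Iterating over an element yields its direct children.
    fields = {child.tag: child.text for child in food}
    print(food.tag, fields)
```

Note that every value comes back as a string (or None for an empty element), so numeric fields such as calories need an explicit conversion before analysis.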

XML: Read a document using Python (with lxml)

from lxml import etree
# read from xml
tree = etree.parse('menu.xml')
root = tree.getroot()
print(root)

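lxml supports the full XPath 1.0 language through its xpath() method, whereas ElementTree implements only a limited subset, so lxml is the more convenient choice for complex queries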

XML: get the elements in a list using XPath

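# select every <food> element directly under the root
elems = root.findall('./food')
# build one [name, price] row per <food> element
data = [[elem.find("./name").text,
         elem.find("./price").text
         ] for elem in elems]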

print(data)

XPath 101:
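
A few basic XPath patterns can be sketched as follows, restricted to the subset that the standard library's ElementTree understands. The second menu entry is illustrative sample data added only so that the predicates have something to distinguish; the variable names are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Sample menu; the second <food> entry is illustrative data (assumption).
MENU_XML = """<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <price>$5.95</price>
        <calories>650</calories>
    </food>
    <food>
        <name>French Toast</name>
        <price>$4.50</price>
        <calories>600</calories>
    </food>
</breakfast_menu>"""
root = ET.fromstring(MENU_XML)

# "./food"   : <food> elements that are direct children of the current node
all_foods = root.findall("./food")

# ".//name"  : <name> elements at any depth below the current node
all_names = [elem.text for elem in root.findall(".//name")]

# "[1]"      : positional predicate (XPath positions start at 1, not 0)
first_food = root.find("./food[1]")

# "[name='...']" : keep only elements whose <name> child has this text
toast = root.find("./food[name='French Toast']")

# A predicate can be followed by further steps in the path
light_price = root.findtext("./food[calories='600']/price")

print(len(all_foods), all_names, first_food.findtext("name"),
      toast.findtext("price"), light_price)
```

Comparisons such as "calories greater than 600" or XPath functions are outside this subset; for those, lxml's xpath() method is the usual route.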

Exercise: load this XML file in your preferred Python environment, then do the same in Orange (using the Python Script widget)

Bind XML results to an Orange data table
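
One possible way to do this binding is sketched below. It assumes the sample menu structure, treats price and calories as numeric features and name as a text meta attribute, and uses hypothetical helper names (menu_rows, to_orange_table). The Orange import is deferred so that the extraction step can be tried without Orange installed.

```python
import xml.etree.ElementTree as ET

# Inline, shortened copy of the sample menu, used in place of menu.xml (assumption).
MENU_XML = """<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <price>$5.95</price>
        <calories>650</calories>
    </food>
</breakfast_menu>"""


def menu_rows(xml_text):
    """Return one [price, calories, name] row per <food> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for food in root.findall("./food"):
        # Strip the currency symbol so the price can be stored as a number.
        price = float(food.findtext("price").lstrip("$"))
        calories = float(food.findtext("calories"))
        rows.append([price, calories, food.findtext("name")])
    return rows


def to_orange_table(rows):
    """Wrap the rows in an Orange table (requires Orange to be installed)."""
    from Orange.data import ContinuousVariable, Domain, StringVariable, Table

    # Text columns cannot be regular features in Orange, so name goes in metas.
    domain = Domain(
        [ContinuousVariable("price"), ContinuousVariable("calories")],
        metas=[StringVariable("name")],
    )
    # Row values are expected in domain order: features first, then metas.
    return Table.from_list(domain, rows)


rows = menu_rows(MENU_XML)
print(rows)
```

Inside the Python Script widget, the table would then be passed downstream by assigning it to the widget's output variable, for example out_data = to_orange_table(rows).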

XML exercises with Orange

XML exercises with Orange (2)
