How To Read XML Files In Python?

In this tutorial, I will explain how to read XML files in Python. As a developer working on a project, I came across a scenario where I needed to read XML. I explored various ways to accomplish this task and will share my findings with suitable examples.

This Tutorial Covers:

XML Structure

Before getting into the code, let’s take a moment to understand the structure of an XML file in Python. XML uses tags to define elements, attributes, and their relationships. Here’s an example of a simple XML file representing a customer’s information:

<?xml version="1.0" encoding="UTF-8"?>
<customer>
  <name>John Doe</name>
  <email>johndoe@example.com</email>
  <address>
    <street>123 Main St</street>
    <city>New York</city>
    <state>NY</state>
    <zip>10001</zip>
  </address>
</customer>

In this example, we have a root element called <customer>, which contains child elements like <name>, <email>, and <address>. The <address> element further contains child street, city, state, and zip code elements.

Read How to Skip the First Line in a File in Python?

Parse XML with ElementTree

Python provides a built-in library called xml.etree.ElementTree that makes it easy to parse and navigate XML data. Let’s see how we can use it to read the XML file and extract the desired information.

First, we need to import the ElementTree class from the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET

Next, we’ll assume that the XML file is named customer.xml and is located in the same directory as our Python script. We can parse the XML file using the parse() function:

tree = ET.parse(r"customer.xml")
root = tree.getroot()

The parse() function reads the XML file and returns an ElementTree object. We then call the getroot() method to get the root element of the XML tree, which in this case is the <customer> element.

Check out How to Split a File into Multiple Files in Python?

Access XML Elements and Attributes

Once we have the root element, we can access its child elements using the find() or findall() methods. Python find() method returns the first matching element, while findall() returning a list of all matching elements.

For example, to extract the customer’s name and email, we can do the following:

name = root.find('name').text
email = root.find('email').text
print(f"Name: {name}")
print(f"Email: {email}")

Output:

Name: John Doe
Email: johndoe@example.com

You can see the output in the screenshot below.

To access the child elements of the <address> element, we can use a similar approach:

address = root.find('address')
street = address.find('street').text
city = address.find('city').text
state = address.find('state').text
zip_code = address.find('zip').text
print(f"Address: {street}, {city}, {state} {zip_code}")

Output:

Address: 123 Main St, New York, NY 10001

You can see the output in the screenshot below.

Read How to Read Tab-Delimited Files in Python?

Deal with Namespaces

Sometimes XML files in Python use namespaces to avoid naming conflicts between elements. Namespaces are defined using the xmlns attribute. Here’s an example:

<root xmlns="http://example.com">
  <child>Value</child>
</root>

To handle namespaces, we need to pass a namespace dictionary to the find() or findall() methods. The dictionary should map the namespace prefix to the namespace URI.

namespace = {'ns': 'http://example.com'}
value = root.find('ns:child', namespace).text
print(value)

Output:

Value

You can see the output in the screenshot below.

Check out How to Unzip a File in Python?

Iterate over XML Elements

When dealing with multiple elements of the same name, we can use a loop to iterate over them. Let’s consider an example where we have multiple <product> elements inside a <products> element:

<products>
  <product>
    <name>iPhone 12</name>
    <price>999.99</price>
  </product>
  <product>
    <name>MacBook Pro</name>
    <price>1999.99</price>
  </product>
</products>

To iterate over the <product> elements and extract their names and prices, we can use a for loop:

products = root.findall('product')
for product in products:
    name = product.find('name').text
    price = product.find('price').text
    print(f"Product: {name}, Price: ${price}")

Output:

Product: iPhone 12, Price: $999.99
Product: MacBook Pro, Price: $1999.99

Check out How to Get the File Size in MB using Python?

Handle Errors and Exceptions

When working with XML files, it’s important to handle potential errors and exceptions gracefully. Some common issues include:

File not found: If the specified XML file doesn’t exist, you’ll get a FileNotFoundError. Make sure to catch this exception and provide an appropriate error message.
Malformed XML: If the XML file is not well-formed or contains syntax errors, you’ll get an xml.etree.ElementTree.ParseError. Catch this exception and handle it accordingly.
Element not found: If you try to access an element that doesn’t exist using find(), it will return None. Check for None before accessing the element’s attributes or text to avoid AttributeError exceptions.

Here’s an example of handling exceptions:

try:
    tree = ElementTree.parse('customer.xml')
    root = tree.getroot()
    # Process the XML data
except FileNotFoundError:
    print("XML file not found.")
except ElementTree.ParseError:
    print("Invalid XML file.")

Read How to Check If a File Exists and Create It If Not in Python?

Conclusion

In this tutorial, I explained how to read XML files in Python. I discussed XML structure, parsing XML with ElementTree, accessing XML elements and attributes, dealing with namespace, iterating over XML elements, and handling errors and exceptions.

How to Read XML Files in Python?

XML Structure

Parse XML with ElementTree

Access XML Elements and Attributes

Deal with Namespaces

Iterate over XML Elements

Handle Errors and Exceptions

Conclusion

51 PYTHON PROGRAMS PDF FREE

Aspiring to be a Python developer?

Let’s be friends