How can I convert XML to a Pandas DataFrame efficiently?-Python Tutorial-php.cn

Barbara Streisand

Release: 2024-11-30 19:46:11

Converting XML to a Pandas DataFrame Efficiently

XML files often hold data worth analyzing with tools such as Pandas. One effective way to convert an XML file to a DataFrame is to parse it with xml.etree.ElementTree, turn each record into a dictionary, and pass the list of dictionaries to pd.DataFrame:

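import io
import xml.etree.ElementTree as ET

import pandas as pd

def iter_docs(author):
    """Yield one dict per <document> element below an author element."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()   # start from the author's attributes
        doc_dict.update(doc.attrib)     # add the document's own attributes
        doc_dict['data'] = doc.text     # text content of <document> (None if empty)
        yield doc_dict

xml_data = io.StringIO('''YOUR XML STRING HERE''')  # replace with your XML

etree = ET.parse(xml_data)  # create an ElementTree object; its root is the author
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))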

Explanation:

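The iter_docs generator walks every document element under the author element. For each one it builds a dictionary holding the author's attributes, the document's attributes (which overwrite author attributes of the same name), and the document's text under the key 'data'.
pd.DataFrame then builds the table from the list of dictionaries that iter_docs yields: each dictionary becomes a row, each distinct key becomes a column, and a record that lacks a key gets NaN in that column.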
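To make this concrete, here is a self-contained run of the same approach on a small sample. The XML itself and the attribute names name, country, title and year are hypothetical. They are chosen only to match the author/document element names the code above looks for, and, as in the code above, the root element is assumed to be the author.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data: one <author> root holding two <document> children.
SAMPLE_XML = """<author name="Smith" country="UK">
    <document title="Report A" year="2001">first report</document>
    <document title="Report B" year="2003" country="IE">second report</document>
</author>"""


def iter_docs(author):
    """Yield one dict per <document>: author attrs, then document attrs, then text."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()  # copy so the author's attrib is not mutated
        doc_dict.update(doc.attrib)    # document attributes win on name clashes
        doc_dict["data"] = doc.text    # element text; None for an empty element
        yield doc_dict


tree = ET.parse(io.StringIO(SAMPLE_XML))
doc_df = pd.DataFrame(list(iter_docs(tree.getroot())))
print(doc_df)
```

Because update runs after the copy, the second document's country="IE" replaces the author's "UK" in that row only. XML attribute values arrive as strings, so a column such as year may need pd.to_numeric before doing arithmetic on it.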

Additional Notes:

The code above assumes the root element is a single author. If the file holds several authors, a second generator, iter_author, can loop over every author element and yield all of that author's document dictionaries (for example by reusing iter_docs). The last line of the example code then becomes:

doc_df = pd.DataFrame(list(iter_author(etree)))
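iter_author is not defined above. One minimal sketch, assuming each author is an element named author somewhere under the root, simply delegates to iter_docs for every author it finds. The library wrapper element and the sample data below are hypothetical. ElementTree.iter(tag) walks the whole tree, so the ElementTree object can be passed in directly, as in the modified last line.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def iter_docs(author):
    """Yield one dict per <document> under a single author element."""
    author_attr = author.attrib
    for doc in author.iter("document"):
        doc_dict = author_attr.copy()  # author attributes first
        doc_dict.update(doc.attrib)    # then the document's attributes
        doc_dict["data"] = doc.text    # then the document's text
        yield doc_dict


def iter_author(etree):
    """Hypothetical helper: yield document dicts for every <author> in the tree."""
    for author in etree.iter("author"):
        yield from iter_docs(author)


# Hypothetical sample with a <library> root wrapping several authors.
xml_data = io.StringIO("""<library>
    <author name="Smith"><document title="A">first</document></author>
    <author name="Jones"><document title="B">second</document>
        <document title="C">third</document></author>
</library>""")

etree = ET.parse(xml_data)
doc_df = pd.DataFrame(list(iter_author(etree)))
print(doc_df)
```

Each row still carries its own author's attributes, so the name column records which author a document belongs to.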

For further guidance on working with XML in Python, see the tutorial section of the xml.etree.ElementTree documentation in the Python standard library.

The above is the detailed content of How can I convert XML to a Pandas DataFrame efficiently?. For more information, please follow other related articles on the PHP Chinese website!