How do I parse XML files of 300 MB and larger?

WBOY
Release: 2016-06-06 20:35:05

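Background:
1. I have several large XML files, mostly between 300 MB and 600 MB each.
2. The XML content includes title, co-author, abstract, Affiliation, and so on.
3. I am parsing them with XMLReader.

The problem:
If I parse all of the content, usually only part of the XML file gets parsed, which looks like a sign of running out of memory.
If I extract only title or only Affiliation, the whole XML file is parsed.

Code attached: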

<code>set_time_limit(0);
header("Content-Type: text/html;charset=utf-8");
$num = 0;
$reader = new XMLReader();
$reader->open("JACS.xml");
while ($reader->read()) {
    // Each <PubmedArticle> element starts a new record.
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "PubmedArticle") {
        $num++;
        echo 'Number:' . $num;

        // Publication date: find <PubDate>, then <Year>, <Month>, <Day> in turn.
        // Note: every inner loop reads forward until it finds its element, so a
        // record that lacks one of them makes the loop run on into later records.
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "PubDate") {
                while ($reader->read()) {
                    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Year") {
                        $reader->read(); // move to the text node inside <Year>
                        echo 'PublicationDate:' . $reader->value . ' ';
                        break;
                    }
                }
                while ($reader->read()) {
                    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Month") {
                        $reader->read();
                        echo $reader->value . ' ';
                        break;
                    }
                }
                while ($reader->read()) {
                    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Day") {
                        $reader->read();
                        echo $reader->value;
                        break;
                    }
                }
                echo '<br>';
                break;
            }
        }

        // Journal name
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Title") {
                $reader->read();
                echo 'JournalName:' . $reader->value . '<br>';
                break;
            }
        }

        // Article title
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "ArticleTitle") {
                $reader->read();
                echo 'ArticleTitle:' . $reader->value . '<br>';
                break;
            }
        }

        // Abstract
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "AbstractText") {
                $reader->read();
                echo 'Abstract:' . $reader->value . '<br><br>';
                break;
            }
        }

        // Affiliation
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Affiliation") {
                $reader->read();
                echo 'Affiliation:' . $reader->value . '<br><br>';
                break;
            }
        }
    }
}
// Close the reader once, after the whole file has been read.
$reader->close();
</code>

Replies:

For reference, take a look at this article: "Processing fairly large XML files in PHP" (PHP处理比较大的XML文件).
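A common pattern for large files is to stream with XMLReader but handle one record at a time. The sketch below is only an illustration under stated assumptions, not the method from the linked article. It uses the element names from the question's code; the `readArticles()` and `firstText()` helpers, the `<PubmedArticleSet>` and `<Journal>` wrappers, and the sample data are made up for the example. Each `<PubmedArticle>` is expanded on its own, so a field that is missing in one record comes back as `null` and the reader does not scan ahead into the next record. Judging from the posted code, that forward scanning may explain the partial output better than a memory limit does.

```php
<?php
// Sketch: stream a large XML file one <PubmedArticle> record at a time.

// Hypothetical helper: text of the first node matching $xpath, or null if absent.
function firstText(SimpleXMLElement $node, string $xpath): ?string
{
    $hits = $node->xpath($xpath);
    return $hits ? trim((string) $hits[0]) : null;
}

// Hypothetical helper: yields one associative array per record.
function readArticles(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Advance to the first <PubmedArticle> element.
        $found = false;
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT
                && $reader->localName === 'PubmedArticle') {
                $found = true;
                break;
            }
        }
        while ($found) {
            // Only the current record is expanded into memory.
            $article = simplexml_load_string($reader->readOuterXml());
            if ($article !== false) {
                yield [
                    'Year'         => firstText($article, './/PubDate/Year'),
                    'JournalName'  => firstText($article, './/Title'),
                    'ArticleTitle' => firstText($article, './/ArticleTitle'),
                    'Abstract'     => firstText($article, './/AbstractText'),
                    'Affiliation'  => firstText($article, './/Affiliation'),
                ];
            }
            // Skip the rest of this record and move to the next one.
            $found = $reader->next('PubmedArticle');
        }
    } finally {
        $reader->close();
    }
}

// Usage with a tiny made-up sample; the second record has no abstract.
$sample = <<<XML
<PubmedArticleSet>
  <PubmedArticle>
    <Journal><Title>J. Example</Title></Journal>
    <ArticleTitle>First</ArticleTitle>
    <AbstractText>Some abstract</AbstractText>
  </PubmedArticle>
  <PubmedArticle>
    <ArticleTitle>Second</ArticleTitle>
    <Affiliation>Some Lab</Affiliation>
  </PubmedArticle>
</PubmedArticleSet>
XML;
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $sample);
$records = iterator_to_array(readArticles($path), false);
unlink($path);

foreach ($records as $i => $r) {
    echo ($i + 1) . ': ' . $r['ArticleTitle'] . "\n";
}
```

Casting a SimpleXML node to a string returns only its direct text, so titles or abstracts that contain inline markup would need extra handling.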

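Why put that much into a single file? Even opening a text file that large would freeze the editor. Split it into several smaller files.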
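If splitting is the route taken, the pieces have to stay well-formed XML, so cutting the file at arbitrary byte offsets will not do. Below is a minimal sketch of one way to split by record with XMLReader. The `splitXml()` function, the `part-0001.xml` naming, and the `PubmedArticleSet` wrapper element are assumptions made for the example; only `PubmedArticle` comes from the question's code.

```php
<?php
// Sketch: split one large XML file into smaller well-formed files,
// each holding at most $perFile records.
// Assumption: the wrapper name "PubmedArticleSet" is invented for this example.
function splitXml(
    string $path,
    string $outDir,
    int $perFile,
    string $record = 'PubmedArticle',
    string $root = 'PubmedArticleSet'
): array {
    if ($perFile < 1) {
        throw new InvalidArgumentException('perFile must be at least 1');
    }
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the first record element.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $record) {
            $found = true;
            break;
        }
    }

    $files = [];
    $out = null;
    $count = 0;
    while ($found) {
        if ($count % $perFile === 0) {
            // Finish the previous part, if any, and start a new one.
            if ($out !== null) {
                fwrite($out, "</$root>\n");
                fclose($out);
            }
            $name = sprintf('%s/part-%04d.xml', $outDir, count($files) + 1);
            $out = fopen($name, 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot write $name");
            }
            fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<$root>\n");
            $files[] = $name;
        }
        // Copy the current record verbatim, then skip to the next one.
        fwrite($out, $reader->readOuterXml() . "\n");
        $count++;
        $found = $reader->next($record);
    }
    if ($out !== null) {
        fwrite($out, "</$root>\n");
        fclose($out);
    }
    $reader->close();
    return $files;
}

// Usage with a made-up three-record file, split two records per part.
$dir = sys_get_temp_dir() . '/xmlsplit_' . uniqid();
mkdir($dir);
$src = $dir . '/big.xml';
file_put_contents(
    $src,
    '<PubmedArticleSet>'
    . str_repeat('<PubmedArticle><ArticleTitle>t</ArticleTitle></PubmedArticle>', 3)
    . '</PubmedArticleSet>'
);
$files = splitXml($src, $dir, 2);

// Count the records in each part to confirm every part parses on its own.
$counts = [];
foreach ($files as $f) {
    $counts[] = count(simplexml_load_file($f)->xpath('/*/PubmedArticle'));
}
echo count($files) . " parts written\n";

// Clean up the demo files.
array_map('unlink', array_merge($files, [$src]));
rmdir($dir);
```

Since XMLReader already streams its input, splitting mainly helps other tools, such as editors, open the data. It is not required for reading the file from PHP.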
