How do I parse XML files of 300 MB or more?
Jun 06, 2016, 08:35 PM
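Background:
1. I have several large XML files, most of them between 300 MB and 600 MB;
2. The XML contains fields such as title, co-author, abstract, and Affiliation;
3. I am parsing them with XMLReader.
The problem:
When I parse all of the fields, often only part of the XML file gets parsed, which looks like a sign of running out of memory;
When I extract only the title, or only the Affiliation, the whole XML file is parsed.
Code attached: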
<code><?php
set_time_limit(0);
header("Content-Type: text/html;charset=utf-8");

$num = 0;
$reader = new XMLReader();
$reader->open("JACS.xml");

while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "PubmedArticle") {
        $num++;
        echo 'Number:' . $num;

        // Publication date: read forward to <PubDate>, then to <Year>, <Month>, <Day>.
        // Each inner loop keeps reading until its tag appears; none of them
        // stops at the end of the current article.
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "PubDate") {
                while ($reader->read()) {
                    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Year") {
                        $reader->read(); // step onto the text node
                        echo 'PublicationDate:' . $reader->value . ' ';
                        break;
                    }
                }
                while ($reader->read()) {
                    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Month") {
                        $reader->read();
                        echo $reader->value . ' ';
                        break;
                    }
                }
                while ($reader->read()) {
                    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Day") {
                        $reader->read();
                        echo $reader->value;
                        break;
                    }
                }
                echo '<br>';
                break;
            }
        }

        // Journal name
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Title") {
                $reader->read();
                echo 'JournalName:' . $reader->value . '<br>';
                break;
            }
        }

        // Article title
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "ArticleTitle") {
                $reader->read();
                echo 'ArticleTitle:' . $reader->value . '<br>';
                break;
            }
        }

        // Abstract
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "AbstractText") {
                $reader->read();
                echo 'Abstract:' . $reader->value . '<br><br>';
                break;
            }
        }

        // Affiliation
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == "Affiliation") {
                $reader->read();
                echo 'Affiliation:' . $reader->value . '<br><br>';
                break;
            }
        }
    }
}

$reader->close();
</code>
Replies:
You could take a look at this: "Processing fairly large XML files with PHP".
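One pattern that often comes up for large files is to let XMLReader stream the document and materialise only one record at a time. The sketch below assumes each record is a `PubmedArticle` element and uses the tag names from the question; `iterArticles()` and `articleFields()` are hypothetical helper names, not part of any library. It may also address the partial output: XMLReader does not load the whole file, so memory is a less likely cause than the forward scans in the question's code, which run past the end of an article whenever a tag such as `Day` or `AbstractText` is missing. Here a missing tag simply yields `null` for that article.

```php
<?php
// Sketch only. iterArticles() and articleFields() are hypothetical helpers;
// the tag names are the ones used in the question's code.

/** Stream the file and yield each <PubmedArticle> subtree as a small DOMDocument. */
function iterArticles(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open {$path}");
    }

    // Advance to the first <PubmedArticle> start tag, if there is one.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === 'PubmedArticle') {
            $found = true;
            break;
        }
    }

    while ($found) {
        // Only this one article is held in memory as a DOM tree.
        $doc = new DOMDocument();
        $doc->loadXML($reader->readOuterXml());
        yield $doc;

        // Skip the rest of this subtree and move to the next article.
        $found = $reader->next('PubmedArticle');
    }

    $reader->close();
}

/** Collect the fields from one article; a missing tag gives null. */
function articleFields(DOMDocument $article): array
{
    // Text of the first <$tag> under $scope, or null when absent.
    $first = function ($scope, string $tag): ?string {
        if ($scope === null) {
            return null;
        }
        $node = $scope->getElementsByTagName($tag)->item(0);
        return $node === null ? null : trim($node->textContent);
    };

    // Year/Month/Day are looked up inside <PubDate>, as in the question.
    $pubDate = $article->getElementsByTagName('PubDate')->item(0);

    return [
        'Year'         => $first($pubDate, 'Year'),
        'Month'        => $first($pubDate, 'Month'),
        'Day'          => $first($pubDate, 'Day'),
        'Title'        => $first($article, 'Title'),
        'ArticleTitle' => $first($article, 'ArticleTitle'),
        'AbstractText' => $first($article, 'AbstractText'),
        'Affiliation'  => $first($article, 'Affiliation'),
    ];
}
```

Usage would be along the lines of `foreach (iterArticles('JACS.xml') as $article) { $row = articleFields($article); }`, echoing or storing `$row` inside the loop. Because `textContent` is used, text inside inline markup in an abstract is included as well, whereas reading a single text node's `value` would stop at the first child element.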
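Why put that much into a single file? Even a txt file that large would freeze the machine when opened. Split it into several smaller files.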
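If splitting is the route taken, it can be done with the same streaming reader rather than by opening the file in an editor. This is a rough sketch under a few assumptions: records are `PubmedArticle` elements, the root element can be recreated as `PubmedArticleSet`, and the DOCTYPE and any attributes on the original root do not need to be carried over. `splitArticles()` and its parameters are hypothetical names.

```php
<?php
// Sketch only. splitArticles() is a hypothetical helper; the root element
// name <PubmedArticleSet> is an assumption about the input file.

/**
 * Copy the articles in $path into numbered files holding at most $perFile
 * articles each, and return the list of files written.
 */
function splitArticles(string $path, string $outPrefix, int $perFile): array
{
    if ($perFile < 1) {
        throw new InvalidArgumentException('perFile must be at least 1');
    }
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open {$path}");
    }

    // Advance to the first <PubmedArticle> start tag, if there is one.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === 'PubmedArticle') {
            $found = true;
            break;
        }
    }

    $files = [];
    $out = null;
    $count = 0;
    while ($found) {
        if ($count % $perFile === 0) {
            // Finish the previous chunk, then start a new one.
            if ($out !== null) {
                fwrite($out, "</PubmedArticleSet>\n");
                fclose($out);
            }
            $name = sprintf('%s-%03d.xml', $outPrefix, count($files) + 1);
            $out = fopen($name, 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot write {$name}");
            }
            fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<PubmedArticleSet>\n");
            $files[] = $name;
        }

        // Write this article's markup; only one article is in memory at a time.
        fwrite($out, $reader->readOuterXml() . "\n");
        $count++;
        $found = $reader->next('PubmedArticle');
    }

    if ($out !== null) {
        fwrite($out, "</PubmedArticleSet>\n");
        fclose($out);
    }
    $reader->close();

    return $files;
}
```

A call such as `splitArticles('JACS.xml', 'JACS-part', 5000)` would write `JACS-part-001.xml`, `JACS-part-002.xml`, and so on; the chunk size of 5000 is an arbitrary example value.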
