Best Way to Parse RSS/Atom Feeds with PHP
Magpie RSS is a popular library for parsing RSS and Atom feeds in PHP, but it is known to fail on malformed feeds, so an alternative is often needed.
One recommended alternative is PHP's built-in SimpleXML extension. SimpleXML exposes an XML document, including an RSS or Atom feed, as an intuitive object structure. It also reports XML problems: when parsing fails, simplexml_load_file() returns false and libxml raises warnings, which can be collected by calling libxml_use_internal_errors(true) instead of being printed. If an error occurs, the feed source can be cleaned using a tool like HTML Tidy before attempting to parse it again.
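The parse, repair, and retry flow described above can be sketched roughly as follows. This is only an illustration: loadFeedXml() is a hypothetical helper name, the Tidy options are an assumption about what suits XML input, and the repair step runs only if the tidy extension is installed.

```php
<?php
// Sketch: parse feed XML quietly, retrying once after a Tidy repair.
// loadFeedXml() is a hypothetical helper, not part of any library.
function loadFeedXml(string $xml): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);

    if ($doc === false && function_exists('tidy_repair_string')) {
        // First parse failed: attempt a repair with Tidy (requires ext-tidy),
        // then parse the repaired markup once more.
        libxml_clear_errors();
        $config = ['input-xml' => true, 'output-xml' => true, 'wrap' => 0];
        $repaired = tidy_repair_string($xml, $config, 'utf8');
        $doc = is_string($repaired) ? simplexml_load_string($repaired) : false;
    }

    if ($doc === false) {
        // Still unparseable: log what libxml reported for the last attempt.
        foreach (libxml_get_errors() as $error) {
            error_log(sprintf(
                'Feed XML error on line %d: %s',
                $error->line,
                trim($error->message)
            ));
        }
    }

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc === false ? null : $doc;
}
```

A caller would fetch the feed body itself (for example with file_get_contents()) and pass the string in; a null result means the feed could not be parsed even after the repair attempt.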
Here's a simple class using SimpleXML to parse an RSS feed:
class BlogPost
{
    public $date;
    public $ts;
    public $link;
    public $title;
    public $text;
    public $summary; // declared here so it is not created dynamically
}

class BlogFeed
{
    public $posts = array();

    public function __construct($file_or_url)
    {
        $file_or_url = $this->resolveFile($file_or_url);
        // simplexml_load_file() returns false when the feed cannot be parsed
        $x = simplexml_load_file($file_or_url);
        if ($x === false) {
            return;
        }

        foreach ($x->channel->item as $item) {
            $post = new BlogPost();
            $post->date  = (string) $item->pubDate;
            $post->ts    = strtotime((string) $item->pubDate);
            $post->link  = (string) $item->link;
            $post->title = (string) $item->title;
            $post->text  = (string) $item->description;

            // Create summary as a shortened body and remove images,
            // extraneous line breaks, etc.
            $post->summary = $this->summarizeText($post->text);

            $this->posts[] = $post;
        }
    }

    // Treat anything that is not an http(s) URL as a file under /shared/xml/
    private function resolveFile($file_or_url)
    {
        if (!preg_match('|^https?:|', $file_or_url)) {
            $feed_uri = $_SERVER['DOCUMENT_ROOT'] . '/shared/xml/' . $file_or_url;
        } else {
            $feed_uri = $file_or_url;
        }
        return $feed_uri;
    }

    private function summarizeText($summary)
    {
        $summary = strip_tags($summary);

        // Truncate summary line to 100 characters
        $max_len = 100;
        if (strlen($summary) > $max_len) {
            $summary = substr($summary, 0, $max_len) . '...';
        }
        return $summary;
    }
}
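This class loads and parses an RSS feed, stores each item as a BlogPost, and builds a short plain-text summary of each post for display. It reads the channel/item structure of a well-formed RSS feed. If the feed is malformed, simplexml_load_file() returns false and the constructor returns with an empty $posts array, so the source has to be repaired first (for example with HTML Tidy) before it can be parsed.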
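The class reads only RSS's channel/item layout, whereas Atom feeds put their posts in entry elements inside the http://www.w3.org/2005/Atom namespace. A minimal sketch of the same extraction for Atom might look like the following. The function name parseAtomEntries() and the array keys are assumptions made for illustration, chosen to mirror the BlogPost fields.

```php
<?php
// Sketch: extract posts from an Atom feed with SimpleXML.
// parseAtomEntries() and ATOM_NS are hypothetical names for this example.
const ATOM_NS = 'http://www.w3.org/2005/Atom';

function parseAtomEntries(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // suppress parse warnings
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($feed === false) {
        return [];
    }

    $posts = [];
    foreach ($feed->children(ATOM_NS)->entry as $entry) {
        $fields = $entry->children(ATOM_NS);

        // Atom links carry the URL in an href attribute; prefer the
        // "alternate" link (also the default when rel is absent).
        $link = '';
        foreach ($fields->link as $candidate) {
            $attrs = $candidate->attributes(); // un-namespaced attributes
            $rel = (string) $attrs['rel'];
            if ($rel === '' || $rel === 'alternate') {
                $link = (string) $attrs['href'];
                break;
            }
        }

        // Fall back to <summary> when the entry has no <content>.
        $text = (string) $fields->content;
        if ($text === '') {
            $text = (string) $fields->summary;
        }

        $date = (string) $fields->updated;
        $posts[] = [
            'date'  => $date,
            'ts'    => strtotime($date),
            'link'  => $link,
            'title' => (string) $fields->title,
            'text'  => $text,
        ];
    }
    return $posts;
}
```

Going through children() with the namespace URI keeps the lookup working whether the feed declares Atom as its default namespace or under a prefix.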