Diffstat (limited to 'vendor/fguillot/picofeed/docs/grabber.markdown')
-rw-r--r-- | vendor/fguillot/picofeed/docs/grabber.markdown | 35 |
1 files changed, 25 insertions, 10 deletions
diff --git a/vendor/fguillot/picofeed/docs/grabber.markdown b/vendor/fguillot/picofeed/docs/grabber.markdown
index 6a7dd2ada..2098b25d0 100644
--- a/vendor/fguillot/picofeed/docs/grabber.markdown
+++ b/vendor/fguillot/picofeed/docs/grabber.markdown
@@ -6,33 +6,48 @@ The web scraper is useful for feeds that display only a summary of articles, the
 How the content grabber works?
 ------------------------------
 
-1. Try with rules first (xpath patterns) for the domain name (see `PicoFeed\Rules\`)
+1. Try with rules first (XPath queries) for the domain name (see `PicoFeed\Rules\`)
 2. Try to find the text content by using common attributes for class and id
 3. Finally, if nothing is found, the feed content is displayed
 
-**The best results are obtained with Xpath rules file.**
+**The best results are obtained with XPath rules file.**
 
 How to use the content scraper?
 -------------------------------
 
+Before parsing all items, just call the method `$parser->enableContentGrabber()`:
+
 ```php
-use PicoFeed\Reader;
+use PicoFeed\Reader\Reader;
+use PicoFeed\PicoFeedException;
+
+try {
 
-$reader = new Reader;
-$reader->download('http://www.egscomics.com/rss.php');
+    $reader = new Reader;
 
-$parser = $reader->getParser();
+    // Return a resource
+    $resource = $reader->download('http://www.egscomics.com/rss.php');
 
-if ($parser !== false) {
+    // Return the right parser instance according to the feed format
+    $parser = $reader->getParser(
+        $resource->getUrl(),
+        $resource->getContent(),
+        $resource->getEncoding()
+    );
 
-    $parser->enableContentGrabber(); // <= Enable the content grabber
+    // Enable content grabber before parsing items
+    $parser->enableContentGrabber();
+
+    // Return a Feed object
     $feed = $parser->execute();
-    // ...
+}
+catch (PicoFeedException $e) {
+    // Do Something...
 }
 ```
 
 When the content scraper is enabled, everything will be slower.
-For each item a new HTTP request is made and the HTML downloaded is parsed with XML/Xpath.
+**For each item a new HTTP request is made** and the HTML downloaded is parsed with XML/XPath.
 
 Configuration
 -------------
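
Note on the fallback order described in the changed documentation: the three steps (site rules, then common class and id attributes, then the feed content) can be pictured with PHP's built-in DOM extension. This is only a minimal sketch, not PicoFeed's implementation. The function name `grabContent()`, its parameters, and the list of generic candidate queries are all hypothetical and chosen for illustration.

```php
<?php
// Hypothetical helper, NOT part of PicoFeed: it only mirrors the documented
// fallback order (rules, then common attributes, then the feed content).
function grabContent(string $html, array $rules, string $feedContent): string
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely well-formed, so collect libxml errors quietly
    $previous = libxml_use_internal_errors(true);
    $loaded = $html !== '' && $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (! $loaded) {
        return $feedContent;
    }

    $xpath = new DOMXPath($dom);

    // Step 1: site-specific XPath queries come first.
    // Step 2: generic guesses on class and id (these candidates are assumptions).
    $queries = array_merge($rules, [
        '//*[@id="content"]',
        '//*[contains(@class, "entry-content")]',
        '//*[contains(@class, "post-content")]',
    ]);

    foreach ($queries as $query) {
        $nodes = $xpath->query($query);

        if ($nodes !== false && $nodes->length > 0) {
            $text = trim($nodes->item(0)->textContent);

            if ($text !== '') {
                return $text;
            }
        }
    }

    // Step 3: nothing matched, keep what the feed itself provided
    return $feedContent;
}
```

Called with the HTML of one downloaded article, a list of XPath queries for that site, and the item's summary, the function returns the first non-empty match or falls back to the summary. Since the page has to be fetched once per item before it can be parsed this way, the sketch also shows why enabling the grabber slows everything down.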