author | Bernhard Posselt <dev@bernhard-posselt.com> | 2015-01-27 09:29:09 +0100 |
---|---|---|
committer | Bernhard Posselt <dev@bernhard-posselt.com> | 2015-01-27 09:29:09 +0100 |
commit | 73f65c8fbadbdd2098448e77b6d3f0464ad8613e (patch) | |
tree | f22ba63a222fb4f7d05427b661f3c008170047fd /vendor/fguillot/picofeed/docs | |
parent | be37aed9f5d923fe16e264c6ffc97db08503b791 (diff) |
update picofeed
Diffstat (limited to 'vendor/fguillot/picofeed/docs')
m--------- | vendor/fguillot/picofeed | 0 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/config.markdown | 286 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/debugging.markdown | 86 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/exceptions.markdown | 28 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/favicon.markdown | 81 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/feed-creation.markdown | 74 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/feed-parsing.markdown | 226 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/grabber.markdown | 136 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/image-proxy.markdown | 66 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/installation.markdown | 67 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/opml-export.markdown | 46 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/opml-import.markdown | 19 |
-rw-r--r-- | vendor/fguillot/picofeed/docs/tests.markdown | 14 |
13 files changed, 0 insertions, 1129 deletions
diff --git a/vendor/fguillot/picofeed b/vendor/fguillot/picofeed
new file mode 160000
+Subproject 0a1d0d3950f7f047dc8fb1d80aa6296e15f306d
diff --git a/vendor/fguillot/picofeed/docs/config.markdown b/vendor/fguillot/picofeed/docs/config.markdown
deleted file mode 100644
index 75546abd1..000000000
--- a/vendor/fguillot/picofeed/docs/config.markdown
+++ /dev/null
@@ -1,286 +0,0 @@
-Configuration
-=============
-
-How to use the Config object
------------------------------
-
-To change the default parameters, you have to use the Config class.
-Create a new instance and pass it to the Reader object like that:
-
-```php
-use PicoFeed\Reader\Reader;
-use PicoFeed\Config\Config;
-
-$config = new Config;
-$config->setClientUserAgent('My custom RSS Reader')
-       ->setProxyHostname('127.0.0.1')
-       ->setProxyPort(8118);
-
-$reader = new Reader($config);
-...
-```
-
-HTTP Client parameters
------------------------
-
-### Connection timeout
-
-- Method name: `setClientTimeout()`
-- Default value: 10 seconds
-- Argument value: number of seconds (integer)
-
-```php
-$config->setClientTimeout(20); // 20 seconds
-```
-
-### User Agent
-
-- Method name: `setClientUserAgent()`
-- Default value: `PicoFeed (https://github.com/fguillot/picoFeed)`
-- Argument value: string
-
-```php
-$config->setClientUserAgent('My RSS reader');
-```
-
-### Maximum HTTP redirections
-
-- Method name: `setMaxRedirections()`
-- Default value: 5
-- Argument value: integer
-
-```php
-$config->setMaxRedirections(10);
-```
-
-### Maximum HTTP body response size
-
-- Method name: `setMaxBodySize()`
-- Default value: 2097152 (2MB)
-- Argument value: value in bytes (integer)
-
-```php
-$config->setMaxBodySize(10485760); // 10MB
-```
-
-### Proxy hostname
-
-- Method name: `setProxyHostname()`
-- Default value: empty
-- Argument value: string
-
-```php
-$config->setProxyHostname('proxy.example.org');
-```
-
-### Proxy port
-
-- Method name: `setProxyPort()`
-- Default value: 3128
-- Argument value: port number (integer)
-
-```php
-$config->setProxyPort(8118);
-```
-
-### Proxy username
-
-- Method name: `setProxyUsername()`
-- Default value: empty
-- Argument value: string
-
-```php
-$config->setProxyUsername('myuser');
-```
-
-### Proxy password
-
-- Method name: `setProxyPassword()`
-- Default value: empty
-- Argument value: string
-
-```php
-$config->setProxyPassword('mysecret');
-```
-
-Content grabber
-----------------
-
-### Connection timeout
-
-- Method name: `setGrabberTimeout()`
-- Default value: 10 seconds
-- Argument value: number of seconds (integer)
-
-```php
-$config->setGrabberTimeout(20); // 20 seconds
-```
-
-### User Agent
-
-- Method name: `setGrabberUserAgent()`
-- Default value: `PicoFeed (https://github.com/fguillot/picoFeed)`
-- Argument value: string
-
-```php
-$config->setGrabberUserAgent('My content scraper');
-```
-
-Parser
--------
-
-### Hash algorithm used for item id generation
-
-- Method name: `setParserHashAlgo()`
-- Default value: `sha256`
-- Argument value: any value returned by the function `hash_algos()` (string)
-- See: http://php.net/hash_algos
-
-```php
-$config->setParserHashAlgo('sha1');
-```
-
-### Disable item content filtering
-
-- Method name: `setContentFiltering()`
-- Default value: true (filtering is enabled by default)
-- Argument value: boolean
-
-```php
-$config->setContentFiltering(false);
-```
-
-### Timezone
-
-- Method name: `setTimezone()`
-- Default value: UTC
-- Argument value: See https://php.net/manual/en/timezones.php (string)
-- Note: define the timezone for items/feeds
-
-```php
-$config->setTimezone('Europe/Paris');
-```
-
-Logging
--------
-
-### Timezone
-
-- Method name: `setTimezone()`
-- Default value: UTC
-- Argument value: See https://php.net/manual/en/timezones.php (string)
-- Note: define the timezone for the logging class
-
-```php
-$config->setTimezone('Europe/Paris');
-```
-
-Filter
--------
-
-### Set the iframe whitelist (allowed iframe sources)
-
-- Method name: `setFilterIframeWhitelist()`
-- Default value: See the Filter class source code
-- Argument value: array
-
-```php
-$config->setFilterIframeWhitelist(['http://www.youtube.com', 'http://www.vimeo.com']);
-```
-
-### Define HTML integer attributes
-
-- Method name: `setFilterIntegerAttributes()`
-- Default value: See the Filter class source code
-- Argument value: array
-
-```php
-$config->setFilterIntegerAttributes(['width', 'height']);
-```
-
-### Add HTML attributes automatically
-
-- Method name: `setFilterAttributeOverrides()`
-- Default value: See the Filter class source code
-- Argument value: array
-
-```php
-$config->setFilterAttributeOverrides(['a' => ['target' => '_blank']);
-```
-
-### Set the list of required attributes for tags
-
-- Method name: `setFilterRequiredAttributes()`
-- Default value: See the Filter class source code
-- Argument value: array
-- Note: If the required attributes are not there, the tag is stripped
-
-```php
-$config->setFilterRequiredAttributes(['a' => 'href', 'img' => 'src']);
-```
-
-### Set the resource blacklist (Ads blocker)
-
-- Method name: `setFilterMediaBlacklist()`
-- Default value: See the Filter class source code
-- Argument value: array
-- Note: Tags are stripped if they have those URLs
-
-```php
-$config->setFilterMediaBlacklist(['feeds.feedburner.com', 'share.feedsportal.com']);
-```
-
-### Define which attributes are used for external resources
-
-- Method name: `setFilterMediaAttributes()`
-- Default value: See the Filter class source code
-- Argument value: array
-
-```php
-$config->setFilterMediaAttributes(['src', 'href']);
-```
-
-### Define the scheme whitelist
-
-- Method name: `setFilterSchemeWhitelist()`
-- Default value: See the Filter class source code
-- Argument value: array
-- See: http://en.wikipedia.org/wiki/URI_scheme
-
-```php
-$config->setFilterSchemeWhitelist(['http://', 'ftp://']);
-```
-
-### Define the tags and attributes whitelist
-
-- Method name: `setFilterWhitelistedTags()`
-- Default value: See the Filter class source code
-- Argument value: array
-- Note: Only those tags are allowed everything else is stripped
-
-```php
-$config->setFilterWhitelistedTags(['a' => ['href'], 'img' => ['src', 'title']]);
-```
-
-### Define a image proxy url
-
-- Method name: `setFilterImageProxyUrl()`
-- Default value: Empty
-- Argument value: string
-
-```php
-$config->setFilterImageProxyUrl('http://myproxy.example.org/?url=%s');
-```
-
-### Define a image proxy callback
-
-- Method name: `setFilterImageProxyCallback()`
-- Default value: null
-- Argument value: Closure
-
-```php
-$config->setFilterImageProxyCallback(function ($image_url) {
-    $key = hash_hmac('sha1', $image_url, 'secret');
-    return 'https://mypublicproxy/'.$key.'/'.urlencode($image_url);
-});
-```
\ No newline at end of file
diff --git a/vendor/fguillot/picofeed/docs/debugging.markdown b/vendor/fguillot/picofeed/docs/debugging.markdown
deleted file mode 100644
index a9f8ab163..000000000
--- a/vendor/fguillot/picofeed/docs/debugging.markdown
+++ /dev/null
@@ -1,86 +0,0 @@
-Debugging
-=========
-
-Logging
--------
-
-PicoFeed log in memory the execution flow, if a feed doesn't work correctly it's easy to see what is wrong.
-
-### Reading messages
-
-```php
-use PicoFeed\Logging\Logger;
-
-// All messages are stored inside an Array
-print_r(Logger::getMessages());
-```
-
-You will got an output like that:
-
-```php
-Array
-(
-    [0] => Fetch URL: http://petitcodeur.fr/feed.xml
-    [1] => Etag:
-    [2] => Last-Modified:
-    [3] => cURL total time: 0.711378
-    [4] => cURL dns lookup time: 0.001064
-    [5] => cURL connect time: 0.100733
-    [6] => cURL speed download: 74825
-    [7] => HTTP status code: 200
-    [8] => HTTP headers: Set-Cookie => start=R2701971637; path=/; expires=Sat, 06-Jul-2013 05:16:33 GMT
-    [9] => HTTP headers: Date => Sat, 06 Jul 2013 03:55:52 GMT
-    [10] => HTTP headers: Content-Type => application/xml
-    [11] => HTTP headers: Content-Length => 53229
-    [12] => HTTP headers: Connection => close
-    [13] => HTTP headers: Server => Apache
-    [14] => HTTP headers: Last-Modified => Tue, 02 Jul 2013 03:26:02 GMT
-    [15] => HTTP headers: ETag => "393e79c-cfed-4e07ee78b2680"
-    [16] => HTTP headers: Accept-Ranges => bytes
-    ....
-)
-```
-
-### Remove messages
-
-All messages are stored in memory, if you need to clear them just call the method `Logger::deleteMessages()`:
-
-```php
-Logger::deleteMessages();
-```
-
-Command line utility
-====================
-
-PicoFeed provides a basic command line tool to debug feeds quickly.
-The tool is located in the root directory project.
-
-### Usage
-
-```bash
-$ ./picofeed
-Usage:
-./picofeed feed <feed-url> # Parse a feed a dump the ouput on stdout
-./picofeed debug <feed-url> # Display all logging messages for a feed
-./picofeed item <feed-url> <item-id> # Fetch only one item
-./picofeed nofilter <feed-url> <item-id> # Fetch an item but with no content filtering
-```
-
-### Example
-
-```bash
-$ ./picofeed debug https://linuxfr.org/
-Exception thrown ===> "Invalid SSL certificate"
-Array
-(
-    [0] => [2014-11-08 14:04:14] PicoFeed\Client\Curl Fetch URL: https://linuxfr.org/
-    [1] => [2014-11-08 14:04:14] PicoFeed\Client\Curl Etag provided:
-    [2] => [2014-11-08 14:04:14] PicoFeed\Client\Curl Last-Modified provided:
-    [3] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL total time: 1.850634
-    [4] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL dns lookup time: 0.00093
-    [5] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL connect time: 0.115213
-    [6] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL speed download: 0
-    [7] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL effective url: https://linuxfr.org/
-    [8] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL error: SSL certificate problem: Invalid certificate chain
-)
-```
diff --git a/vendor/fguillot/picofeed/docs/exceptions.markdown b/vendor/fguillot/picofeed/docs/exceptions.markdown
deleted file mode 100644
index 399ba3ef6..000000000
--- a/vendor/fguillot/picofeed/docs/exceptions.markdown
+++ /dev/null
@@ -1,28 +0,0 @@
-Exceptions
-==========
-
-All exceptions inherits from the standard `Exception` class.
-
-### Library Exceptions
-
-- `PicoFeed\PicoFeedException`: Base class exception for the library
-
-### Client Exceptions
-
-- `PicoFeed\Client\ClientException`: Base exception class for the Client class
-- `PicoFeed\Client\InvalidCertificateException`: Invalid SSL certificate
-- `PicoFeed\Client\InvalidUrlException`: Malformed URL, page not found (404), unable to establish a connection
-- `PicoFeed\Client\MaxRedirectException`: Maximum of HTTP redirections reached
-- `PicoFeed\Client\MaxSizeException`: The response size exceeds to maximum allowed
-- `PicoFeed\Client\TimeoutException`: Connection timeout
-
-### Parser Exceptions
-
-- `PicoFeed\Parser\ParserException`: Base exception class for the Parser class
-- `PicoFeed\Parser\MalformedXmlException`: XML Parser error
-
-### Reader Exceptions
-
-- `PicoFeed\Reader\ReaderException`: Base exception class for the Reader
-- `PicoFeed\Reader\SubscriptionNotFoundException`: Unable to find a feed for the given website
-- `PicoFeed\Reader\UnsupportedFeedFormatException`: Unable to detect the feed format
diff --git a/vendor/fguillot/picofeed/docs/favicon.markdown b/vendor/fguillot/picofeed/docs/favicon.markdown
deleted file mode 100644
index 1ac3ee1fc..000000000
--- a/vendor/fguillot/picofeed/docs/favicon.markdown
+++ /dev/null
@@ -1,81 +0,0 @@
-Favicon fetcher
-===============
-
-Find and download the favicon
------------------------------
-
-```php
-use PicoFeed\Reader\Favicon;
-
-$favicon = new Favicon;
-
-// The icon link is https://bits.wikimedia.org/favicon/wikipedia.ico
-$icon_link = $favicon->find('https://en.wikipedia.org/');
-$icon_content = $favicon->getContent();
-```
-
-PicoFeed will try first to find the favicon from the meta tags and fallback to the `favicon.ico` located in the website's root if nothing is found.
-
-- `Favicon::find()` returns the favicon absolute url or an empty string if nothing is found.
-- `Favicon::getContent()` returns the favicon file content (binary content)
-
-When the HTML page is parsed, relative links and protocol relative links are converted to absolute url.
-
-Get Favicon file type
----------------------
-
-It's possible to fetch the image type, this information come from the Content-Type HTTP header:
-
-```php
-$favicon = new Favicon;
-$favicon->find('http://example.net/');
-
-echo $favicon->getType();
-
-// Will output the content type, by example "image/png"
-```
-
-Get the Favicon as Data URI
----------------------------
-
-You can also get the whole image as Data URI.
-It's useful if you want to store the icon in your database and avoid too many HTTP requests.
-
-```php
-$favicon = new Favicon;
-$favicon->find('http://example.net/');
-
-echo $favicon->getDataUri();
-
-// Output something like that: data:image/png;base64,iVBORw0KGgoAAAANSUh.....
-```
-
-See: http://en.wikipedia.org/wiki/Data_URI_scheme
-
-Check if a favicon link exists
-------------------------------
-
-```php
-use PicoFeed\Reader\Favicon;
-
-$favicon = new Favicon;
-
-// Return true if the file exists
-var_dump($favicon->exists('http://php.net/favicon.ico'));
-```
-
-Use personalized HTTP settings
-------------------------------
-
-Like other classes, the Favicon class support the Config object as constructor argument:
-
-```php
-use PicoFeed\Config\Config;
-use PicoFeed\Reader\Favicon;
-
-$config = new Config;
-$config->setClientUserAgent('My RSS Reader');
-
-$favicon = new Favicon($config);
-$favicon->find('https://github.com');
-```
\ No newline at end of file
diff --git a/vendor/fguillot/picofeed/docs/feed-creation.markdown b/vendor/fguillot/picofeed/docs/feed-creation.markdown
deleted file mode 100644
index 35a24a9d7..000000000
--- a/vendor/fguillot/picofeed/docs/feed-creation.markdown
+++ /dev/null
@@ -1,74 +0,0 @@
-Feed creation
-=============
-
-PicoFeed can also generate Atom and RSS feeds.
-
-Generate RSS 2.0 feed
-----------------------
-
-```php
-use PicoFeed\Syndication\Rss20;
-
-$writer = new Rss20();
-$writer->title = 'My site';
-$writer->site_url = 'http://boo/';
-$writer->feed_url = 'http://boo/feed.atom';
-$writer->author = array(
-    'name' => 'Me',
-    'url' => 'http://me',
-    'email' => 'me@here'
-);
-
-$writer->items[] = array(
-    'title' => 'My article 1',
-    'updated' => strtotime('-2 days'),
-    'url' => 'http://foo/bar',
-    'summary' => 'Super summary',
-    'content' => '<p>content</p>'
-);
-
-$writer->items[] = array(
-    'title' => 'My article 2',
-    'updated' => strtotime('-1 day'),
-    'url' => 'http://foo/bar2',
-    'summary' => 'Super summary 2',
-    'content' => '<p>content 2 © 2015</p>',
-    'author' => array(
-        'name' => 'Me too',
-    )
-);
-
-$writer->items[] = array(
-    'title' => 'My article 3',
-    'url' => 'http://foo/bar3'
-);
-
-echo $writer->execute();
-```
-
-Generate Atom feed
-------------------
-
-```php
-use PicoFeed\Syndication\Atom;
-
-$writer = new Atom();
-$writer->title = 'My site';
-$writer->site_url = 'http://boo/';
-$writer->feed_url = 'http://boo/feed.atom';
-$writer->author = array(
-    'name' => 'Me',
-    'url' => 'http://me',
-    'email' => 'me@here'
-);
-
-$writer->items[] = array(
-    'title' => 'My article 1',
-    'updated' => strtotime('-2 days'),
-    'url' => 'http://foo/bar',
-    'summary' => 'Super summary',
-    'content' => '<p>content</p>'
-);
-
-echo $writer->execute();
-```
diff --git a/vendor/fguillot/picofeed/docs/feed-parsing.markdown b/vendor/fguillot/picofeed/docs/feed-parsing.markdown
deleted file mode 100644
index d00e08364..000000000
--- a/vendor/fguillot/picofeed/docs/feed-parsing.markdown
+++ /dev/null
@@ -1,226 +0,0 @@
-Feed parsing
-============
-
-Parsing a subscription
-----------------------
-
-```php
-use PicoFeed\Reader\Reader;
-use PicoFeed\PicoFeedException;
-
-try {
-
-    $reader = new Reader;
-
-    // Return a resource
-    $resource = $reader->download('http://linuxfr.org/news.atom');
-
-    // Return the right parser instance according to the feed format
-    $parser = $reader->getParser(
-        $resource->getUrl(),
-        $resource->getContent(),
-        $resource->getEncoding()
-    );
-
-    // Return a Feed object
-    $feed = $parser->execute();
-
-    // Print the feed properties with the magic method __toString()
-    echo $feed;
-}
-catch (PicoFeedException $e) {
-    // Do Something...
-}
-```
-
-- The Reader class is the entry point for feed reading
-- The method `download()` fetch the remote content and return a resource, an instance of `PicoFeed\Client\Client`
-- The method `getParser()` returns a Parser instance according to the feed format Atom, Rss 2.0...
-- The parser itself returns a `Feed` object that contains feed and item properties
-
-Output:
-
-```bash
-Feed::id = tag:linuxfr.org,2005:/news
-Feed::title = LinuxFr.org : les dépêches
-Feed::feed_url = http://linuxfr.org/news.atom
-Feed::site_url = http://linuxfr.org/news
-Feed::date = 1415138079
-Feed::language = en-US
-Feed::description =
-Feed::logo =
-Feed::items = 15 items
-Feed::isRTL() = false
-----
-Item::id = 38d8f48284fb03940cbb3aff9101089b81e44efb1281641bdd7c3e7e4bf3b0cd
-Item::title = openSUSE 13.2 : nouvelle version du caméléon disponible !
-Item::url = http://linuxfr.org/news/opensuse-13-2-nouvelle-version-du-cameleon-disponible
-Item::date = 1415122640
-Item::language = en-US
-Item::author = Syvolc
-Item::enclosure_url =
-Item::enclosure_type =
-Item::isRTL() = false
-Item::content = 18307 bytes
-....
-```
-
-Get the list of available subscriptions for a website
------------------------------------------------------
-
-The example below will returns all available subscriptions for the website:
-
-```php
-use PicoFeed\Reader\Reader;
-
-try {
-
-    $reader = new Reader;
-    $resource = $reader->download('http://www.cnn.com');
-
-    $feeds = $reader->find(
-        $resource->getUrl(),
-        $resource->getContent()
-    );
-
-    print_r($feeds);
-}
-catch (PicoFeedException $e) {
-    // Do something...
-}
-```
-
-Output:
-
-```php
-Array
-(
-    [0] => http://rss.cnn.com/rss/cnn_topstories.rss
-    [1] => http://rss.cnn.com/rss/cnn_latest.rss
-)
-```
-
-Feed discovery and parsing
---------------------------
-
-This example will discover automatically the subscription and parse the feed:
-
-```php
-try {
-
-    $reader = new Reader;
-    $resource = $reader->discover('http://linuxfr.org');
-
-    $parser = $reader->getParser(
-        $resource->getUrl(),
-        $resource->getContent(),
-        $resource->getEncoding()
-    );
-
-    $feed = $parser->execute();
-    echo $feed;
-}
-catch (PicoFeedException $e) {
-}
-```
-
-HTTP caching
-------------
-
-PicoFeed supports HTTP caching to avoid unnecessary processing.
-
-1. After the first download, save in your database the values of the Etag and LastModified HTTP headers
-2. For the next requests, provide those values to the `download()` method and check if the feed was modified or not
-
-Here an example:
-
-```php
-try {
-
-    // Fetch from your database the previous values of the Etag and LastModified headers
-    $etag = '...';
-    $last_modified = '...';
-
-    $reader = new Reader;
-
-    // Provide those values to the download method
-    $resource = $reader->download('http://linuxfr.org/news.atom', $last_modified, $etag);
-
-    // Return true if the remote content has changed
-    if ($resource->isModified()) {
-
-        $parser = $reader->getParser(
-            $resource->getUrl(),
-            $resource->getContent(),
-            $resource->getEncoding()
-        );
-
-        $feed = $parser->execute();
-
-        // Save your feed in your database
-        // ...
-
-        // Store the Etag and the LastModified headers in your database for the next requests
-        $etag = $resource->getEtag();
-        $last_modified = $resource->getLastModified();
-
-        // ...
-    }
-    else {
-
-        echo 'Not modified, nothing to do!';
-    }
-}
-catch (PicoFeedException $e) {
-    // Do something...
-}
-```
-
-
-Feed and item properties
-------------------------
-
-```php
-// Feed object
-$feed->getId(); // Unique feed id
-$feed->getTitle(); // Feed title
-$feed->getFeedUrl(); // Feed url
-$feed->getSiteUrl(); // Website url
-$feed->getDate(); // Feed last updated date
-$feed->getLanguage(); // Feed language
-$feed->getDescription(); // Feed description
-$feed->getLogo(); // Feed logo (can be a large image, different from icon)
-$feed->getItems(); // List of item objects
-
-// Item object
-$feed->items[0]->getId(); // Item unique id (hash)
-$feed->items[0]->getTitle(); // Item title
-$feed->items[0]->getUrl(); // Item url
-$feed->items[0]->getDate(); // Item published date (timestamp)
-$feed->items[0]->getLanguage(); // Item language
-$feed->items[0]->getAuthor(); // Item author
-$feed->items[0]->getEnclosureUrl(); // Enclosure url
-$feed->items[0]->getEnclosureType(); // Enclosure mime-type (audio/mp3, image/png...)
-$feed->items[0]->getContent(); // Item content (filtered or raw)
-$feed->items[0]->isRTL(); // Return true if the item language is Right-To-Left
-```
-
-RTL language detection
-----------------------
-
-Use the method `Item::isRTL()` to test if an item is RTL or not:
-
-```php
-var_dump($item->isRTL()); // true or false
-```
-
-Known RTL languages are:
-
-- Arabic (ar-**)
-- Farsi (fa-**)
-- Urdu (ur-**)
-- Pashtu (ps-**)
-- Syriac (syr-**)
-- Divehi (dv-**)
-- Hebrew (he-**)
-- Yiddish (yi-**)
diff --git a/vendor/fguillot/picofeed/docs/grabber.markdown b/vendor/fguillot/picofeed/docs/grabber.markdown
deleted file mode 100644
index b99b756ed..000000000
--- a/vendor/fguillot/picofeed/docs/grabber.markdown
+++ /dev/null
@@ -1,136 +0,0 @@
-Web scraper
-===========
-
-The web scraper is useful for feeds that display only a summary of articles, the scraper can download and parse the full content from the original website.
-
-How the content grabber works?
-------------------------------
-
-1. Try with rules first (XPath queries) for the domain name (see `PicoFeed\Rules\`)
-2. Try to find the text content by using common attributes for class and id
-3. Finally, if nothing is found, the feed content is displayed
-
-**The best results are obtained with XPath rules file.**
-
-Standalone usage
-----------------
-
-```php
-<?php
-
-use PicoFeed\Client\Grabber;
-
-$grabber = new Grabber($item_url);
-$grabber->download();
-$grabber->parse();
-
-// Get raw HTML content
-echo $grabber->getRawContent();
-
-// Get relevant content
-echo $grabber->getContent();
-
-// Get filtered relevant content
-echo $grabber->getFilteredContent();
-```
-
-Fetch full item contents during feed parsing
---------------------------------------------
-
-Before parsing all items, just call the method `$parser->enableContentGrabber()`:
-
-```php
-<?php
-
-use PicoFeed\Reader\Reader;
-use PicoFeed\PicoFeedException;
-
-try {
-
-    $reader = new Reader;
-
-    // Return a resource
-    $resource = $reader->download('http://www.egscomics.com/rss.php');
-
-    // Return the right parser instance according to the feed format
-    $parser = $reader->getParser(
-        $resource->getUrl(),
-        $resource->getContent(),
-        $resource->getEncoding()
-    );
-
-    // Enable content grabber before parsing items
-    $parser->enableContentGrabber();
-
-    // Return a Feed object
-    $feed = $parser->execute();
-}
-catch (PicoFeedException $e) {
-    // Do Something...
-}
-```
-
-When the content scraper is enabled, everything will be slower.
-**For each item a new HTTP request is made** and the HTML downloaded is parsed with XML/XPath.
-
-Configuration
--------------
-
-### Enable content grabber for items
-
-- Method name: `enableContentGrabber()`
-- Default value: false (content grabber is disabled by default)
-- Argument value: none
-
-```php
-$parser->enableContentGrabber();
-```
-
-### Ignore item urls for the content grabber
-
-- Method name: `setGrabberIgnoreUrls()`
-- Default value: empty (fetch all item urls)
-- Argument value: array (list of item urls to ignore)
-
-```php
-$parser->setGrabberIgnoreUrls(['http://foo', 'http://bar']);
-```
-
-How to write a grabber rules file?
-----------------------------------
-
-Add a PHP file to the directory `PicoFeed\Rules`, the filename must be the same as the domain name:
-
-Example with the BBC website, `www.bbc.co.uk.php`:
-
-```php
-<?php
-return array(
-    'test_url' => 'http://www.bbc.co.uk/news/world-middle-east-23911833',
-    'body' => array(
-        '//div[@class="story-body"]',
-    ),
-    'strip' => array(
-        '//script',
-        '//form',
-        '//style',
-        '//*[@class="story-date"]',
-        '//*[@class="story-header"]',
-        '//*[@class="story-related"]',
-        '//* |