author | Bernhard Posselt <dev@bernhard-posselt.com> | 2015-01-27 09:31:40 +0100 |
---|---|---|
committer | Bernhard Posselt <dev@bernhard-posselt.com> | 2015-01-27 09:31:40 +0100 |
commit | 8241180c6ce0cb19255d70a3394f891e08182542 (patch) | |
tree | 325996a06d9896567957871cc0f34865c46118da /vendor/fguillot/picofeed/docs | |
parent | 73f65c8fbadbdd2098448e77b6d3f0464ad8613e (diff) |
dont use picofeed submodule
Diffstat (limited to 'vendor/fguillot/picofeed/docs')
m--------- | vendor/fguillot/picofeed | 0 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/config.markdown | 286 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/debugging.markdown | 102 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/exceptions.markdown | 28 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/favicon.markdown | 96 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/feed-creation.markdown | 74 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/feed-parsing.markdown | 226 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/grabber.markdown | 136 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/image-proxy.markdown | 67 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/installation.markdown | 68 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/opml-export.markdown | 46 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/opml-import.markdown | 19 | ||||
-rw-r--r-- | vendor/fguillot/picofeed/docs/tests.markdown | 14 |
13 files changed, 1162 insertions, 0 deletions
diff --git a/vendor/fguillot/picofeed b/vendor/fguillot/picofeed deleted file mode 160000 -Subproject 0a1d0d3950f7f047dc8fb1d80aa6296e15f306d diff --git a/vendor/fguillot/picofeed/docs/config.markdown b/vendor/fguillot/picofeed/docs/config.markdown new file mode 100644 index 000000000..75546abd1 --- /dev/null +++ b/vendor/fguillot/picofeed/docs/config.markdown @@ -0,0 +1,286 @@ +Configuration +============= + +How to use the Config object +---------------------------- + +To change the default parameters, you have to use the Config class. +Create a new instance and pass it to the Reader object like that: + +```php +use PicoFeed\Reader\Reader; +use PicoFeed\Config\Config; + +$config = new Config; +$config->setClientUserAgent('My custom RSS Reader') + ->setProxyHostname('127.0.0.1') + ->setProxyPort(8118); + +$reader = new Reader($config); +... +``` + +HTTP Client parameters +---------------------- + +### Connection timeout + +- Method name: `setClientTimeout()` +- Default value: 10 seconds +- Argument value: number of seconds (integer) + +```php +$config->setClientTimeout(20); // 20 seconds +``` + +### User Agent + +- Method name: `setClientUserAgent()` +- Default value: `PicoFeed (https://github.com/fguillot/picoFeed)` +- Argument value: string + +```php +$config->setClientUserAgent('My RSS reader'); +``` + +### Maximum HTTP redirections + +- Method name: `setMaxRedirections()` +- Default value: 5 +- Argument value: integer + +```php +$config->setMaxRedirections(10); +``` + +### Maximum HTTP body response size + +- Method name: `setMaxBodySize()` +- Default value: 2097152 (2MB) +- Argument value: value in bytes (integer) + +```php +$config->setMaxBodySize(10485760); // 10MB +``` + +### Proxy hostname + +- Method name: `setProxyHostname()` +- Default value: empty +- Argument value: string + +```php +$config->setProxyHostname('proxy.example.org'); +``` + +### Proxy port + +- Method name: `setProxyPort()` +- Default value: 3128 +- Argument value: port number (integer) 
+ +```php +$config->setProxyPort(8118); +``` + +### Proxy username + +- Method name: `setProxyUsername()` +- Default value: empty +- Argument value: string + +```php +$config->setProxyUsername('myuser'); +``` + +### Proxy password + +- Method name: `setProxyPassword()` +- Default value: empty +- Argument value: string + +```php +$config->setProxyPassword('mysecret'); +``` + +Content grabber +--------------- + +### Connection timeout + +- Method name: `setGrabberTimeout()` +- Default value: 10 seconds +- Argument value: number of seconds (integer) + +```php +$config->setGrabberTimeout(20); // 20 seconds +``` + +### User Agent + +- Method name: `setGrabberUserAgent()` +- Default value: `PicoFeed (https://github.com/fguillot/picoFeed)` +- Argument value: string + +```php +$config->setGrabberUserAgent('My content scraper'); +``` + +Parser +------ + +### Hash algorithm used for item id generation + +- Method name: `setParserHashAlgo()` +- Default value: `sha256` +- Argument value: any value returned by the function `hash_algos()` (string) +- See: http://php.net/hash_algos + +```php +$config->setParserHashAlgo('sha1'); +``` + +### Disable item content filtering + +- Method name: `setContentFiltering()` +- Default value: true (filtering is enabled by default) +- Argument value: boolean + +```php +$config->setContentFiltering(false); +``` + +### Timezone + +- Method name: `setTimezone()` +- Default value: UTC +- Argument value: See https://php.net/manual/en/timezones.php (string) +- Note: define the timezone for items/feeds + +```php +$config->setTimezone('Europe/Paris'); +``` + +Logging +------- + +### Timezone + +- Method name: `setTimezone()` +- Default value: UTC +- Argument value: See https://php.net/manual/en/timezones.php (string) +- Note: define the timezone for the logging class + +```php +$config->setTimezone('Europe/Paris'); +``` + +Filter +------ + +### Set the iframe whitelist (allowed iframe sources) + +- Method name: `setFilterIframeWhitelist()` +- Default 
value: See the Filter class source code +- Argument value: array + +```php +$config->setFilterIframeWhitelist(['http://www.youtube.com', 'http://www.vimeo.com']); +``` + +### Define HTML integer attributes + +- Method name: `setFilterIntegerAttributes()` +- Default value: See the Filter class source code +- Argument value: array + +```php +$config->setFilterIntegerAttributes(['width', 'height']); +``` + +### Add HTML attributes automatically + +- Method name: `setFilterAttributeOverrides()` +- Default value: See the Filter class source code +- Argument value: array + +```php +$config->setFilterAttributeOverrides(['a' => ['target' => '_blank']]); +``` + +### Set the list of required attributes for tags + +- Method name: `setFilterRequiredAttributes()` +- Default value: See the Filter class source code +- Argument value: array +- Note: If the required attributes are missing, the tag is stripped + +```php +$config->setFilterRequiredAttributes(['a' => 'href', 'img' => 'src']); +``` + +### Set the resource blacklist (ad blocker) + +- Method name: `setFilterMediaBlacklist()` +- Default value: See the Filter class source code +- Argument value: array +- Note: Tags are stripped if they reference those URLs + +```php +$config->setFilterMediaBlacklist(['feeds.feedburner.com', 'share.feedsportal.com']); +``` + +### Define which attributes are used for external resources + +- Method name: `setFilterMediaAttributes()` +- Default value: See the Filter class source code +- Argument value: array + +```php +$config->setFilterMediaAttributes(['src', 'href']); +``` + +### Define the scheme whitelist + +- Method name: `setFilterSchemeWhitelist()` +- Default value: See the Filter class source code +- Argument value: array +- See: http://en.wikipedia.org/wiki/URI_scheme + +```php +$config->setFilterSchemeWhitelist(['http://', 'ftp://']); +``` + +### Define the tags and attributes whitelist + +- Method name: `setFilterWhitelistedTags()` +- Default value: See the Filter class source code +- 
Argument value: array +- Note: Only those tags are allowed; everything else is stripped + +```php +$config->setFilterWhitelistedTags(['a' => ['href'], 'img' => ['src', 'title']]); +``` + +### Define an image proxy URL + +- Method name: `setFilterImageProxyUrl()` +- Default value: Empty +- Argument value: string + +```php +$config->setFilterImageProxyUrl('http://myproxy.example.org/?url=%s'); +``` + +### Define an image proxy callback + +- Method name: `setFilterImageProxyCallback()` +- Default value: null +- Argument value: Closure + +```php +$config->setFilterImageProxyCallback(function ($image_url) { + $key = hash_hmac('sha1', $image_url, 'secret'); + return 'https://mypublicproxy/'.$key.'/'.urlencode($image_url); +}); +```
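
The image proxy callback in config.markdown signs each image URL with an HMAC, which only helps if the proxy checks that signature. The sketch below is an illustration and not part of PicoFeed: `IMAGE_PROXY_SECRET`, `buildProxyUrl()` and `isValidProxyRequest()` are hypothetical names, and the URL layout simply mirrors the callback example. It assumes PHP 5.6 or later for `hash_equals()`.

```php
<?php

// Hypothetical shared secret; in practice it would come from configuration.
const IMAGE_PROXY_SECRET = 'secret';

// Build the proxied URL the same way as the callback example:
// an HMAC-SHA1 key followed by the URL-encoded image address.
function buildProxyUrl($image_url)
{
    $key = hash_hmac('sha1', $image_url, IMAGE_PROXY_SECRET);
    return 'https://mypublicproxy/'.$key.'/'.urlencode($image_url);
}

// Hypothetical proxy-side check: recompute the HMAC for the decoded URL
// and compare it with the supplied key in constant time.
function isValidProxyRequest($key, $encoded_url)
{
    $expected = hash_hmac('sha1', urldecode($encoded_url), IMAGE_PROXY_SECRET);
    return hash_equals($expected, $key);
}

// Usage: split the path back into key and encoded URL, then verify.
$proxied = buildProxyUrl('http://example.org/a b.png');
$path = substr($proxied, strlen('https://mypublicproxy/'));
list($key, $encoded_url) = explode('/', $path, 2);

var_dump(isValidProxyRequest($key, $encoded_url)); // bool(true)
```

Because the key depends on the image URL, a request whose URL was altered after signing would fail the check, so the proxy could not be used as an open relay.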
\ No newline at end of file diff --git a/vendor/fguillot/picofeed/docs/debugging.markdown b/vendor/fguillot/picofeed/docs/debugging.markdown new file mode 100644 index 000000000..1356e0f72 --- /dev/null +++ b/vendor/fguillot/picofeed/docs/debugging.markdown @@ -0,0 +1,102 @@ +Debugging +========= + +Logging +------- + +PicoFeed can log the execution flow **in memory**, so if a feed doesn't work correctly, it's easy to see what is wrong. + +### Enable/disable logging + +Logging is **disabled by default** to avoid unnecessary memory usage. + +Enable logging: + +```php +use PicoFeed\Logging\Logger; + +Logger::enable(); + +// or change the flag value + +Logger::$enable = true; +``` + +### Reading messages + +```php +use PicoFeed\Logging\Logger; + +// All messages are stored inside an Array +print_r(Logger::getMessages()); +``` + +You will get output like this: + +```php +Array +( + [0] => Fetch URL: http://petitcodeur.fr/feed.xml + [1] => Etag: + [2] => Last-Modified: + [3] => cURL total time: 0.711378 + [4] => cURL dns lookup time: 0.001064 + [5] => cURL connect time: 0.100733 + [6] => cURL speed download: 74825 + [7] => HTTP status code: 200 + [8] => HTTP headers: Set-Cookie => start=R2701971637; path=/; expires=Sat, 06-Jul-2013 05:16:33 GMT + [9] => HTTP headers: Date => Sat, 06 Jul 2013 03:55:52 GMT + [10] => HTTP headers: Content-Type => application/xml + [11] => HTTP headers: Content-Length => 53229 + [12] => HTTP headers: Connection => close + [13] => HTTP headers: Server => Apache + [14] => HTTP headers: Last-Modified => Tue, 02 Jul 2013 03:26:02 GMT + [15] => HTTP headers: ETag => "393e79c-cfed-4e07ee78b2680" + [16] => HTTP headers: Accept-Ranges => bytes + .... 
+) +``` + +### Remove messages + +All messages are stored in memory; to clear them, call the method `Logger::deleteMessages()`: + +```php +Logger::deleteMessages(); +``` + +Command line utility +==================== + +PicoFeed provides a basic command line tool to debug feeds quickly. +The tool is located in the project's root directory. + +### Usage + +```bash +$ ./picofeed +Usage: +./picofeed feed <feed-url> # Parse a feed a dump the ouput on stdout +./picofeed debug <feed-url> # Display all logging messages for a feed +./picofeed item <feed-url> <item-id> # Fetch only one item +./picofeed nofilter <feed-url> <item-id> # Fetch an item but with no content filtering +``` + +### Example + +```bash +$ ./picofeed debug https://linuxfr.org/ +Exception thrown ===> "Invalid SSL certificate" +Array +( + [0] => [2014-11-08 14:04:14] PicoFeed\Client\Curl Fetch URL: https://linuxfr.org/ + [1] => [2014-11-08 14:04:14] PicoFeed\Client\Curl Etag provided: + [2] => [2014-11-08 14:04:14] PicoFeed\Client\Curl Last-Modified provided: + [3] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL total time: 1.850634 + [4] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL dns lookup time: 0.00093 + [5] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL connect time: 0.115213 + [6] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL speed download: 0 + [7] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL effective url: https://linuxfr.org/ + [8] => [2014-11-08 14:04:16] PicoFeed\Client\Curl cURL error: SSL certificate problem: Invalid certificate chain +) +``` diff --git a/vendor/fguillot/picofeed/docs/exceptions.markdown b/vendor/fguillot/picofeed/docs/exceptions.markdown new file mode 100644 index 000000000..399ba3ef6 --- /dev/null +++ b/vendor/fguillot/picofeed/docs/exceptions.markdown @@ -0,0 +1,28 @@ +Exceptions +========== + +All exceptions inherit from the standard `Exception` class. 
+ +### Library Exceptions + +- `PicoFeed\PicoFeedException`: Base exception class for the library + +### Client Exceptions + +- `PicoFeed\Client\ClientException`: Base exception class for the Client class +- `PicoFeed\Client\InvalidCertificateException`: Invalid SSL certificate +- `PicoFeed\Client\InvalidUrlException`: Malformed URL, page not found (404), unable to establish a connection +- `PicoFeed\Client\MaxRedirectException`: Maximum number of HTTP redirections reached +- `PicoFeed\Client\MaxSizeException`: The response size exceeds the maximum allowed +- `PicoFeed\Client\TimeoutException`: Connection timeout + +### Parser Exceptions + +- `PicoFeed\Parser\ParserException`: Base exception class for the Parser class +- `PicoFeed\Parser\MalformedXmlException`: XML Parser error + +### Reader Exceptions + +- `PicoFeed\Reader\ReaderException`: Base exception class for the Reader +- `PicoFeed\Reader\SubscriptionNotFoundException`: Unable to find a feed for the given website +- `PicoFeed\Reader\UnsupportedFeedFormatException`: Unable to detect the feed format diff --git a/vendor/fguillot/picofeed/docs/favicon.markdown b/vendor/fguillot/picofeed/docs/favicon.markdown new file mode 100644 index 000000000..b5021690d --- /dev/null +++ b/vendor/fguillot/picofeed/docs/favicon.markdown @@ -0,0 +1,96 @@ +Favicon fetcher +=============== + +Find and download the favicon +----------------------------- + +```php +use PicoFeed\Reader\Favicon; + +$favicon = new Favicon; + +// The icon link is https://bits.wikimedia.org/favicon/wikipedia.ico +$icon_link = $favicon->find('https://en.wikipedia.org/'); +$icon_content = $favicon->getContent(); +``` + +PicoFeed first tries to find the favicon in the meta tags and falls back to the `favicon.ico` located in the website's root if nothing is found. + +- `Favicon::find()` returns the favicon's absolute URL or an empty string if nothing is found. 
+- `Favicon::getContent()` returns the favicon file content (binary content) + +When the HTML page is parsed, relative links and protocol-relative links are converted to absolute URLs. + +Download a known favicon +------------------------ +It's possible to download a known favicon using the second optional parameter of `Favicon::find()`. The link to the favicon can be a relative or protocol-relative URL as well, but it has to be relative to the specified website. + +If the requested favicon cannot be found, the HTML of the website is parsed instead, with the fallback to the `favicon.ico` located in the website's root. + +```php +use PicoFeed\Reader\Favicon; + +$favicon = new Favicon; + +$icon_link = $favicon->find('https://en.wikipedia.org/','https://bits.wikimedia.org/favicon/wikipedia.ico'); +$icon_content = $favicon->getContent(); +``` + +Get Favicon file type +--------------------- + +It's possible to fetch the image type; this information comes from the Content-Type HTTP header: + +```php +$favicon = new Favicon; +$favicon->find('http://example.net/'); + +echo $favicon->getType(); + +// Will output the content type, for example "image/png" +``` + +Get the Favicon as Data URI +--------------------------- + +You can also get the whole image as a Data URI. +It's useful if you want to store the icon in your database and avoid too many HTTP requests. + +```php +$favicon = new Favicon; +$favicon->find('http://example.net/'); + +echo $favicon->getDataUri(); + +// Outputs something like: data:image/png;base64,iVBORw0KGgoAAAANSUh..... 
+``` + +See: http://en.wikipedia.org/wiki/Data_URI_scheme + +Check if a favicon link exists +------------------------------ + +```php +use PicoFeed\Reader\Favicon; + +$favicon = new Favicon; + +// Returns true if the file exists +var_dump($favicon->exists('http://php.net/favicon.ico')); +``` + +Use personalized HTTP settings +------------------------------ + +Like other classes, the Favicon class supports the Config object as a constructor argument: + +```php +use PicoFeed\Config\Config; +use PicoFeed\Reader\Favicon; + +$config = new Config; +$config->setClientUserAgent('My RSS Reader'); + +$favicon = new Favicon($config); +$favicon->find('https://github.com'); +```
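
The Data URI section of favicon.markdown shows output of the form `data:image/png;base64,...`. As a rough illustration of that format only, the sketch below combines a content type and raw bytes into such a string. `buildDataUri()` is a hypothetical helper, not PicoFeed's `getDataUri()` implementation, and the 8-byte PNG signature stands in for a real icon.

```php
<?php

// Illustrative helper (an assumption, not PicoFeed code): build a Data URI
// from a MIME type and the raw binary content of an image.
function buildDataUri($mime_type, $binary_content)
{
    // Without a type or content there is nothing meaningful to encode
    if ($mime_type === '' || $binary_content === '') {
        return '';
    }

    return 'data:'.$mime_type.';base64,'.base64_encode($binary_content);
}

// The 8-byte signature that starts every PNG file, used here as sample data
$png_signature = "\x89PNG\r\n\x1a\n";

echo buildDataUri('image/png', $png_signature), PHP_EOL;
// data:image/png;base64,iVBORw0KGgo=
```

The encoded signature matches the `iVBORw0KGgo` prefix visible in the documentation's sample output, which is why base64-encoded PNG files all begin with the same characters.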
\ No newline at end of file diff --git a/vendor/fguillot/picofeed/docs/feed-creation.markdown b/vendor/fguillot/picofeed/docs/feed-creation.markdown new file mode 100644 index 000000000..35a24a9d7 --- /dev/null +++ b/vendor/fguillot/picofeed/docs/feed-creation.markdown @@ -0,0 +1,74 @@ +Feed creation +============= + +PicoFeed can also generate Atom and RSS feeds. + +Generate RSS 2.0 feed +---------------------- + +```php +use PicoFeed\Syndication\Rss20; + +$writer = new Rss20(); +$writer->title = 'My site'; +$writer->site_url = 'http://boo/'; +$writer->feed_url = 'http://boo/feed.atom'; +$writer->author = array( + 'name' => 'Me', + 'url' => 'http://me', + 'email' => 'me@here' +); + +$writer->items[] = array( + 'title' => 'My article 1', + 'updated' => strtotime('-2 days'), + 'url' => 'http://foo/bar', + 'summary' => 'Super summary', + 'content' => '<p>content</p>' +); + +$writer->items[] = array( + 'title' => 'My article 2', + 'updated' => strtotime('-1 day'), + 'url' => 'http://foo/bar2', + 'summary' => 'Super summary 2', + 'content' => '<p>content 2 © 2015</p>', + 'author' => array( + 'name' => 'Me too', + ) +); + +$writer->items[] = array( + 'title' => 'My article 3', + 'url' => 'http://foo/bar3' +); + +echo $writer->execute(); +``` + +Generate Atom feed +------------------ + +```php +use PicoFeed\Syndication\Atom; + +$writer = new Atom(); +$writer->title = 'My site'; +$writer->site_url = 'http://boo/'; +$writer->feed_url = 'http://boo/feed.atom'; +$writer->author = array( + 'name' => 'Me', + 'url' => 'http://me', + 'email' => 'me@here' +); + +$writer->items[] = array( + 'title' => 'My article 1', + 'updated' => strtotime('-2 days'), + 'url' => 'http://foo/bar', + 'summary' => 'Super summary', + 'content' => '<p>content</p>' +); + +echo $writer->execute(); +``` diff --git a/vendor/fguillot/picofeed/docs/feed-parsing.markdown b/vendor/fguillot/picofeed/docs/feed-parsing.markdown new file mode 100644 index 000000000..d00e08364 --- /dev/null +++ 
b/vendor/fguillot/picofeed/docs/feed-parsing.markdown @@ -0,0 +1,226 @@ +Feed parsing +============ + +Parsing a subscription +---------------------- + +```php +use PicoFeed\Reader\Reader; +use PicoFeed\PicoFeedException; + +try { + + $reader = new Reader; + + // Return a resource + $resource = $reader->download('http://linuxfr.org/news.atom'); + + // Return the right parser instance according to the feed format + $parser = $reader->getParser( + $resource->getUrl(), + $resource->getContent(), + $resource->getEncoding() + ); + + // Return a Feed object + $feed = $parser->execute(); + + // Print the feed properties with the magic method __toString() + echo $feed; +} +catch (PicoFeedException $e) { + // Do Something... +} +``` + +- The Reader class is the entry point for feed reading +- The method `download()` fetch the remote content and return a resource, an instance of `PicoFeed\Client\Client` +- The method `getParser()` returns a Parser instance according to the feed format Atom, Rss 2.0... +- The parser itself returns a `Feed` object that contains feed and item properties + +Output: + +```bash +Feed::id = tag:linuxfr.org,2005:/news +Feed::title = LinuxFr.org : les dépêches +Feed::feed_url = http://linuxfr.org/news.atom +Feed::site_url = http://linuxfr.org/news +Feed::date = 1415138079 +Feed::language = en-US +Feed::description = +Feed::logo = +Feed::items = 15 items +Feed::isRTL() = false +---- +Item::id = 38d8f48284fb03940cbb3aff9101089b81e44efb1281641bdd7c3e7e4bf3b0cd +Item::title = openSUSE 13.2 : nouvelle version du caméléon disponible ! +Item::url = http://linuxfr.org/news/opensuse-13-2-nouvelle-version-du-cameleon-disponible +Item::date = 1415122640 +Item::language = en-US +Item::author = Syvolc +Item::enclosure_url = +Item::enclosure_type = +Item::isRTL() = false +Item::content = 18307 bytes +.... 
+``` + +Get the list of available subscriptions for a website +----------------------------------------------------- + +The example below returns all available subscriptions for the website: + +```php +use PicoFeed\Reader\Reader; + +try { + + $reader = new Reader; + $resource = $reader->download('http://www.cnn.com'); + + $feeds = $reader->find( + $resource->getUrl(), + $resource->getContent() + ); + + print_r($feeds); +} +catch (PicoFeedException $e) { + // Do something... +} +``` + +Output: + +```php +Array +( + [0] => http://rss.cnn.com/rss/cnn_topstories.rss + [1] => http://rss.cnn.com/rss/cnn_latest.rss +) +``` + +Feed discovery and parsing +-------------------------- + +This example automatically discovers the subscription and parses the feed: + +```php +try { + + $reader = new Reader; + $resource = $reader->discover('http://linuxfr.org'); + + $parser = $reader->getParser( + $resource->getUrl(), + $resource->getContent(), + $resource->getEncoding() + ); + + $feed = $parser->execute(); + echo $feed; +} +catch (PicoFeedException $e) { +} +``` + +HTTP caching +------------ + +PicoFeed supports HTTP caching to avoid unnecessary processing. + +1. After the first download, save the values of the Etag and LastModified HTTP headers in your database +2. 
For the next requests, provide those values to the `download()` method and check if the feed was modified or not + +Here an example: + +```php +try { + + // Fetch from your database the previous values of the Etag and LastModified headers + $etag = '...'; + $last_modified = '...'; + + $reader = new Reader; + + // Provide those values to the download method + $resource = $reader->download('http://linuxfr.org/news.atom', $last_modified, $etag); + + // Return true if the remote content has changed + if ($resource->isModified()) { + + $parser = $reader->getParser( + $resource->getUrl(), + $resource->getContent(), + $resource->getEncoding() + ); + + $feed = $parser->execute(); + + // Save your feed in your database + // ... + + // Store the Etag and the LastModified headers in your database for the next requests + $etag = $resource->getEtag(); + $last_modified = $resource->getLastModified(); + + // ... + } + else { + + echo 'Not modified, nothing to do!'; + } +} +catch (PicoFeedException $e) { + // Do something... +} +``` + + +Feed and item properties +------------------------ + +```php +// Feed object +$feed->getId(); // Unique feed id +$feed->getTitle(); // Feed title +$feed->getFeedUrl(); // Feed url +$feed->getSiteUrl(); // Website url +$feed->getDate(); // Feed last updated date +$feed->getLanguage(); // Feed language +$feed->getDescription(); // Feed description +$feed->getLogo(); // Feed logo (can be a large image, different from icon) +$feed->getItems(); // List of item objects + +// Item object +$feed->items[0]->getId(); // Item unique id (hash) +$feed->items[0]->getTitle(); // Item title +$feed->items[0]->getUrl(); // Item url +$feed->items[0]->getDate(); // Item published date (timestamp) +$feed->items[0]->getLanguage(); // Item language +$feed->items[0]->getAuthor(); // Item author +$feed->items[0]->getEnclosureUrl(); // Enclosure url +$feed->items[0]->getEnclosureType(); // Enclosure mime-type (audio/mp3, image/png...) 
+$feed->items[0]->getContent(); // Item content (filtered or raw) +$feed->items[0]->isRTL(); // Return true if the item language is Right-To-Left +``` + +RTL language detection +---------------------- + +Use the method `Item::isRTL()` to test if an item is RTL or not: + +```php +var_dump($item->isRTL()); // true or false +``` + +Known RTL languages are: + +- Arabic (ar-**) +- Farsi (fa-**) +- Urdu (ur-**) +- Pashtu (ps-**) +- Syriac (syr-**) +- Divehi (dv-**) +- Hebrew (he-**) +- Yiddish (yi-**) diff --git a/vendor/fguillot/picofeed/docs/grabber.markdown b/vendor/fguillot/picofeed/docs/grabber.markdown new file mode 100644 index 000000000..b99b756ed --- /dev/null +++ b/vendor/fguillot/picofeed/docs/grabber.markdown @@ -0,0 +1,136 @@ +Web scraper +=========== + +The web scraper is useful for feeds that display only a summary of articles, the scraper can download and parse the full content from the original website. + +How the content grabber works? +------------------------------ + +1. Try with rules first (XPath queries) for the domain name (see `PicoFeed\Rules\`) +2. Try to find the text content by using common attributes for class and id +3. 
Finally, if nothing is found, the feed content is displayed + +**The best results are obtained with an XPath rules file.** + +Standalone usage +---------------- + +```php +<?php + +use PicoFeed\Client\Grabber; + +$grabber = new Grabber($item_url); +$grabber->download(); +$grabber->parse(); + +// Get raw HTML content +echo $grabber->getRawContent(); + +// Get relevant content +echo $grabber->getContent(); + +// Get filtered relevant content +echo $grabber->getFilteredContent(); +``` + +Fetch full item contents during feed parsing +-------------------------------------------- + +Before parsing all items, just call the method `$parser->enableContentGrabber()`: + +```php +<?php + +use PicoFeed\Reader\Reader; +use PicoFeed\PicoFeedException; + +try { + + $reader = new Reader; + + // Return a resource + $resource = $reader->download('http://www.egscomics.com/rss.php'); + + // Return the right parser instance according to the feed format + $parser = $reader->getParser( + $resource->getUrl(), + $resource->getContent(), + $resource->getEncoding() + ); + + // Enable content grabber before parsing items + $parser->enableContentGrabber(); + + // Return a Feed object + $feed = $parser->execute(); +} +catch (PicoFeedException $e) { + // Do Something... +} +``` + +When the content scraper is enabled, everything will be slower. +**For each item a new HTTP request is made** and the downloaded HTML is parsed with XML/XPath. + +Configuration +------------- + +### Enable content grabber for items + +- Method name: `enableContentGrabber()` +- Default value: false (content grabber is disabled by default) +- Argument value: none + +```php +$parser->enableContentGrabber(); +``` +
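
The three-step lookup listed under "How the content grabber works?" in grabber.markdown can be pictured as a chain of XPath queries with a final fallback. The sketch below is a simplified illustration under stated assumptions, not PicoFeed's Grabber code: `extractContent()` is a hypothetical function, and the list of common class/id queries is invented for the example. It relies only on PHP's bundled DOM extension.

```php
<?php

// Illustrative only: a simplified version of the three-step fallback.
// $rules holds site-specific XPath queries, $feed_content is the summary
// that came with the feed item.
function extractContent($html, array $rules, $feed_content)
{
    if (trim($html) === '') {
        return $feed_content;
    }

    // Collect libxml warnings instead of printing them for imperfect HTML
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Step 1: site rules first, step 2: common attributes (hypothetical list)
    $candidates = array(
        '//*[@id="content"]',
        '//*[contains(@class, "post-content")]',
        '//article',
    );

    foreach (array_merge($rules, $candidates) as $query) {
        $nodes = $xpath->query($query);

        if ($nodes !== false && $nodes->length > 0) {
            return $dom->saveHTML($nodes->item(0));
        }
    }

    // Step 3: nothing matched, keep the content provided by the feed
    return $feed_content;
}

$html = '<html><body><div id="main"><p>Full text</p></div></body></html>';
echo extractContent($html, array('//div[@id="main"]'), 'Short summary');
```

Putting the site rules ahead of the generic queries reflects the note that rules files give the best results: a precise rule wins whenever one exists, and the generic guesses are only consulted afterwards.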