"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5 AutoPager/0.5.2.2 (http://www.teesoft.info/)"
AutoPager is a Firefox extension which automatically loads the next page of a site inline when you reach the end of the current page for infinite scrolling of content. By default AutoPager works with a ton of sites...
...except my site, it went BOINK!
I bounce everything with http:// in the user agent.
Too bad software hacks, deal with it ;)
I bounce everything with http:// in the user agent.
How come you dislike http:// in a UA?
A half dozen unique visitors have used it on each of the last two days. I really don't understand how this thing makes for a better web experience. Seems like some users will try anything just because somebody made it?
2.) The UAs are platform- and FF-specific. For example:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; en-US; rv:1.9.0.12) Gecko/2009070609 Firefox/3.0.12 AutoPager/0.5.2.2 (http://www.teesoft.info/)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 AutoPager/0.5.2.2 (http://www.teesoft.info/)
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5 GTB5 AutoPager/0.5.2.2 (http://www.teesoft.info/)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko/2009060308 Ubuntu/9.04 (jaunty) Firefox/3.0.11 AutoPager/0.5.2.2 (http://www.teesoft.info/)
3.) Like Bill, I also block all UAs containing http, and then I whitelist only a very few legit robots/crawlers hailing from their legit IPs/Hosts. It's simply easier to limit ALL new or iffy or misbehaving apps' access to (a cgi-served full-Disallow version of) robots.txt until such time as I decide to authorize them.
Therefore, one access-control logic implementation is that if the UA string is not on the whitelist of UAs allowed to contain "http://" and if it does contain "http://" then it gets immediately kicked to the curb with a 403 response.
So on some of my sites, known robots' UA strings are allowed to contain "http://" while all other UA strings (such as those for browsers) are not, so they get a 403. But since these attempts can still be logged, there's always the opportunity for redemption if we look at the logged UA string and go visit the robot's 'bot page' to see what it's doing.
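A minimal sketch of that whitelist-then-403 logic, in Python purely for illustration (in practice this usually lives in server config rather than application code; the helper name and the whitelist entries below are assumptions, not anyone's actual rule set):

    # Sketch: refuse any UA containing "http://" unless it matches a
    # whitelisted robot. Whitelist entries here are examples only; a real
    # setup would also verify the robot's IPs/hosts.
    ALLOWED_HTTP_UAS = (
        "Googlebot",
        "bingbot",
    )

    def allow_request(user_agent: str) -> int:
        """Return an HTTP status: 200 to serve the request, 403 to refuse it."""
        ua = user_agent.lower()
        if "http://" in ua:
            # UA advertises a URL; only whitelisted robots may do that.
            if not any(bot.lower() in ua for bot in ALLOWED_HTTP_UAS):
                return 403  # kicked to the curb, but still logged for review
        return 200

    if __name__ == "__main__":
        ua = ("Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1) "
              "Gecko/20090624 Firefox/3.5 AutoPager/0.5.2.2 (http://www.teesoft.info/)")
        print(allow_request(ua))  # 403: a browser UA carrying an http:// URL

The refused requests still show up in the logs with the full UA string, which is what makes the "redemption" step above possible.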
Beware of thinking that someone else's approach to access control is "wrong" just because of the necessarily-simplified descriptions posted here. There are a million ways to do things, and while some are objectively "wrong," there are many more that are subjectively "just right" for an individual Webmaster, even though those same methods may be subjectively "all wrong" for your own site. So there are lots of different approaches reflected or implied in the threads posted here, not necessarily right or wrong, just different.
Jim
Therefore, one access-control logic implementation is that if the UA string is not on the whitelist of UAs allowed to contain "http://" and if it does contain "http://" then it gets immediately kicked to the curb with a 403 response.
Exactly.
If a bot is to let us know about its bot page wherein it tells us what it's doing, how to stop it, slow it down, and things like that, how else do you propose it do so?
I agree we need the URL in the UA to find those bot pages.
However, I can use that same URL in the UA to help ID and stop unauthorized bots as well.
URLs in the UA cut both ways ;)
I installed it and took it for a spin on a couple of sites, including my own. While it does not live up to its promise of loading consecutive web pages at *my* sites (because of my intentional architecture), it does offer several manipulative tools that I do not want altering my web pages, although this type of "ripping" mentality is out of the Genie's bottle nowadays.
Anyway, I'm blocking it. Over the last week I only see a half dozen daily users with it. If I see an increasing trend in use, I may reevaluate this decision.
Beware of thinking that someone else's approach to access control is "wrong" just because of the necessarily-simplified descriptions posted here
Anyway, I'm blocking it.
The AutoPager Firefox extension automatically loads the next page of a site inline when you reach the end of the current page for infinite scrolling of content.
It includes Adblock-like features that let you filter the ads out of the loaded page content.
It works well with most Greasemonkey scripts.
By default AutoPager works with a ton of sites, including Lifehacker, the New York Times, Digg, and, of course, Google. If you want to add your own custom autopaging to unsupported sites, the site wizard feature makes it easy (first pick the Next link, then pick only the content you want loaded). The site workshop provides more features, like auto-discovery of the links and content.
Its configuration is based on XPath. There is a built-in function to create an XPath by clicking some links on the page. The extension will also import online configurations from these sources; they include support for some widely used sites and some general support for forums.
You can also share your site rules by clicking the public button in the settings dialog.
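As a rough illustration of what an XPath-based paging rule amounts to (the expressions and HTML below are assumptions for the example, not AutoPager's actual configuration format), here is how a next-page link and a content block might be picked out:

    # Illustration of XPath-based paging rules like those described above.
    # The XPath expressions and sample page are assumed for this example.
    from lxml import html

    NEXT_LINK_XPATH = "//a[@rel='next']/@href"   # where the next page lives
    CONTENT_XPATH = "//div[@id='content']"       # which part to load inline

    PAGE = """
    <html><body>
      <div id="content"><p>Page one of the article.</p></div>
      <a rel="next" href="/article?page=2">Next</a>
    </body></html>
    """

    doc = html.fromstring(PAGE)
    print(doc.xpath(NEXT_LINK_XPATH))                       # ['/article?page=2']
    for node in doc.xpath(CONTENT_XPATH):
        print(html.tostring(node, encoding="unicode"))      # the block to append inline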
They obnoxiously, if cleverly, turned this prefetcher loose on Google searches/results with proper names very specific to our site and our site's name -- in other words, not accidental. They could've simply come in the front door but no-o-o: They ran not one but seven similar searches using:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729) AutoPager/0.5.2.2 (http://www.teesoft.info/)
All results were met with a 403 (which states: no auto-anything). Finally they stopped throwing themselves against a barred door, switched browsers, and browsed normally without FF+AutoPager:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.1; .NET CLR 3.5.30729; .NET CLR 3.0.30618)
A pain for them? Maybe. A (thwarted) strain on us? Definitely.