
AutoPager

Egads


incrediBILL

8:14 am on Aug 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the category of more junk you don't want messing with your website we find AutoPager.

"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5 AutoPager/0.5.2.2 (http://www.teesoft.info/)"

AutoPager is a Firefox extension which automatically loads the next page of a site inline when you reach the end of the current page for infinite scrolling of content. By default AutoPager works with a ton of sites...

...except my site, it went BOINK!

I bounce everything with http:// in the user agent.

Too bad software hacks, deal with it ;)
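In Apache terms, a blanket rule like that can be sketched with mod_rewrite (a hypothetical illustration of the approach described above, not incrediBILL's actual configuration):

```apache
# Hypothetical .htaccess sketch: return 403 Forbidden to any request
# whose User-Agent header contains "http://".
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} http:// [NC]
RewriteRule .* - [F]
```

The [F] flag sends the 403 immediately; the attempt still shows up in the access log, so you can review what you bounced.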

GaryK

4:14 pm on Aug 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I bounce everything with http:// in the user agent.

For me, at least, an http:// in a UA means a link to a bot page. I like seeing those pages, provided they give me some useful information. Often it's just enough information to decide I should ban them. But at least it gives them a shot at not being banned.

How come you dislike http:// in a UA?

keyplyr

5:52 pm on Aug 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I agree w/ Gary about favoring the bot info URL in the UA string. However, I'm on the fence about whether I want this gizmo messing with how my web pages do or do not load. Seems like it would create false positives for page loads?

A half dozen unique visitors have used it on each of the last two days. I really don't understand how this thing makes for a better web experience. Seems like some users will try anything just because somebody made it?

Pfui

6:43 pm on Aug 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



1.) Last June, I started blocking UAs containing this pre-fetching/pre-loading/free-loading plugin's names. Since that time I've never had a single real person touch base upon being met with a special 403.

2.) The UAs are platform- and FF-specific. For example:

Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; en-US; rv:1.9.0.12) Gecko/2009070609 Firefox/3.0.12 AutoPager/0.5.2.2 (http://www.teesoft.info/)

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 AutoPager/0.5.2.2 (http://www.teesoft.info/)
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5 GTB5 AutoPager/0.5.2.2 (http://www.teesoft.info/)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko/2009060308 Ubuntu/9.04 (jaunty) Firefox/3.0.11 AutoPager/0.5.2.2 (http://www.teesoft.info/)

3.) Like Bill, I also block all UAs containing http, and then whitelist only a very few legit robots/crawlers hailing from their legit IPs/hosts. It's simply easier to limit ALL new or iffy or misbehaving apps' access to (a CGI-served full-Disallow version of) robots.txt until such time as I decide to authorize them.
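That combination might look roughly like this in .htaccess form. This is a hypothetical sketch of the scheme described above; the bot name, IP prefix, and CGI script path are illustrative assumptions, not Pfui's actual rules:

```apache
RewriteEngine On

# Whitelist: let a known crawler through when it comes from its
# known IP range (bot name and prefix are example values).
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} ^66\.249\.
RewriteRule .* - [S=2]

# Everything else with a URL in its UA gets bounced with a 403...
RewriteCond %{HTTP_USER_AGENT} http:// [NC]
RewriteRule .* - [F]

# ...and unvetted visitors asking for robots.txt get a CGI-served,
# full-Disallow version until they're explicitly authorized.
RewriteRule ^robots\.txt$ /cgi-bin/robots.cgi [L]
```

The [S=2] flag skips the next two rules for whitelisted crawlers, so they see the real robots.txt and are never hit by the http:// block.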

incrediBILL

8:06 pm on Aug 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How come you dislike http:// in a UA?

Who said I dislike the http in the UA?

I love it, makes it easier to spot junk to bounce!

Anything that attempts to advertise in the UA gets kicked to the curb unless it's whitelisted.

Very effective method.

GaryK

6:15 pm on Aug 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I bounce everything with http:// in the user agent.

This made it seem to me you dislike http:// in the UA.

If a bot is supposed to let us know about its bot page, where it tells us what it's doing, how to stop it, how to slow it down, and things like that, how else do you propose it do so?

jdMorgan

6:53 pm on Aug 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remember that incrediBill and many others (myself included) use whitelisting and not blacklisting, as he notes above.

Therefore, one access-control logic implementation is that if the UA string is not on the whitelist of UAs allowed to contain "http://" and if it does contain "http://" then it gets immediately kicked to the curb with a 403 response.

So on some of my sites, known robots' UA strings are allowed to contain "http://" while all other UA strings (such as those for browsers) are not, so they get a 403. But since these attempts can still be logged, there's always the opportunity for redemption if we look at the logged UA string and go visit the robot's 'bot page' to see what it's doing.
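One hypothetical way to express that logic in mod_rewrite (the robot names here are examples, not an actual whitelist):

```apache
RewriteEngine On
# If the UA string contains "http://"...
RewriteCond %{HTTP_USER_AGENT} http:// [NC]
# ...and it is NOT one of the whitelisted robots...
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|msnbot|Slurp) [NC]
# ...kick it to the curb with a 403. The attempt still lands in
# the access log, so the UA's 'bot page' can be reviewed later.
RewriteRule .* - [F]
```

Because the conditions are ANDed, browsers (no http:// in the UA) and whitelisted robots both pass through untouched; everything else advertising a URL gets the 403 but keeps its shot at redemption via the logs.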

Beware of thinking that someone else's approach to access control is "wrong" just because of the necessarily-simplified descriptions posted here; there are a million ways to do things, and while some are objectively "wrong," there are many more that are subjectively "just right" for an individual Webmaster, despite the fact that the same methods may be subjectively "all wrong" for your own site. So there are lots of different approaches reflected or implied in the threads posted here: not necessarily right or wrong, just different.

Jim

incrediBILL

10:12 pm on Aug 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Therefore, one access-control logic implementation is that if the UA string is not on the whitelist of UAs allowed to contain "http://" and if it does contain "http://" then it gets immediately kicked to the curb with a 403 response.

Exactly.

If a bot is to let us know about its bot page wherein it tells us what it's doing, how to stop it, slow it down, and things like that, how else do you propose it do so?

I agree we need the URL in the UA to find those bot pages.

However, I can use that same URL in the UA to help ID and stop unauthorized bots as well.

A URL in the UA cuts both ways ;)

keyplyr

5:43 am on Aug 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[back to topic]

I installed it and took it for a spin on a couple of sites, including my own. While it does not live up to its promise of loading consecutive web pages at *my* sites (because of my intentional architecture), it does offer several manipulative tools that I do not want altering my web pages, although this type of "ripping" mentality is out of the Genie's bottle nowadays.

Anyway, I'm blocking it. Over the last week I only see a half dozen daily users with it. If I see an increasing trend in use, I may reevaluate this decision.

GaryK

5:03 pm on Aug 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Beware of thinking that someone else's approach to access control is "wrong" just because of the necessarily-simplified descriptions posted here

I don't need to be reminded about the basics, Jim. I never said Bill's methods were wrong. I was just trying to understand his seeming conflicting statements better. With his last post he's cleared that up for me. :)

Anyway, I'm blocking it.

Same here. But more because of creating false page loads that skew my reports than anything else.

jdMorgan

6:33 pm on Aug 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> several manipulative tools that I do not want altering my web pages

What 'tools' does it have?

Jim

GaryK

9:15 pm on Aug 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From the AutoPager page on addons.mozilla.org:

The AutoPager Firefox extension automatically loads the next page of a site inline when you reach the end of the current page for infinite scrolling of content.
It includes adblock-like features that let you filter the ads out of the loaded page content.
It works well with most Greasemonkey scripts.
By default AutoPager works with a ton of sites, including Lifehacker, the New York Times, Digg, and, of course, Google. If you want to add your own custom autopaging to unsupported sites, the site wizard feature makes it easy (first pick the Next link, then pick only the content you want loaded). The site workshop provides more features, like auto-discovery of the links and content.
Its configuration is based on XPath, and there is a built-in function to create an XPath by clicking some links on the pages. The extension will import online configurations from these sources; they include support for some widely used sites and some general support for forums.
You can also share your site rules by clicking the Publish button in the settings dialog.

Pfui

10:03 am on Sep 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I mentioned upthread that 'I've never had a single real person touch base upon being met with a special 403.' I still haven't. But a bit ago I saw a real person behind the wheel...

They obnoxiously, if cleverly, turned this prefetcher loose on Google searches/results with proper names very specific to our site and our site's name -- in other words, not accidental. They could've simply come in the front door but no-o-o: They ran not one but seven similar searches using:

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729) AutoPager/0.5.2.2 (http://www.teesoft.info/)

All results were met with a 403 (which states no auto-anything). Finally they stopped throwing themselves against a barred door, switched browsers, and browsed normally without FF+AutoPager:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.1; .NET CLR 3.5.30729; .NET CLR 3.0.30618)

A pain for them? Maybe. A (thwarted) strain on us? Definitely.