

Barring Teleport

It's very naughty!

         

Matt Probert

7:06 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We are having trouble with thieves sucking our entire site using Teleport Pro/1.29.

Testing Teleport reveals that it can send a generic browser user-agent string (my test showed 'Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0)'), while the session that caught my attention identified itself as 'Teleport Pro/1.29'.

It doesn't respect robots.txt.

It does make multi-threaded requests for server-side scripts, which puts a heck of a strain on the server.

Does anyone have any bright ideas on barring this pest?
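For illustration, a user-agent check of the kind that would catch the self-identifying session might look like the sketch below. The pattern list and the function name `is_blocked_agent` are assumptions made for this example, not part of any existing script, and a check like this cannot catch a session that sends a generic browser string.

```python
import re

# Hypothetical blocklist of user-agent patterns (an assumption for this sketch).
BLOCKED_AGENTS = [re.compile(r"Teleport\s*Pro", re.IGNORECASE)]


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header matches a blocked pattern."""
    # Treat a missing header as an empty string so the search never fails.
    return any(p.search(user_agent or "") for p in BLOCKED_AGENTS)


# The self-identifying session is caught; the generic browser string is not.
print(is_blocked_agent("Teleport Pro/1.29"))
print(is_blocked_agent("Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0)"))
```

Because the user-agent is entirely under the client's control, this is at best a first filter.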

Matt

Dijkgraaf

7:16 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



See
Blocking Badly Behaved Bots [webmasterworld.com]

Matt Probert

10:23 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the suggestion. That looks like a very effective solution. Unfortunately, we get over 50,000 page requests a day, so the processing overhead of that script would be a problem.

Matt

Dijkgraaf

10:26 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it always coming from the same IP address?
If so, you can try blocking that.
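A minimal sketch of an address check, assuming the offending client keeps the same IP. The addresses below come from the reserved documentation ranges and are placeholders, and `is_blocked_ip` is a hypothetical helper name.

```python
import ipaddress

# Placeholder entries (documentation ranges), not real offenders.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),   # a single address
    ipaddress.ip_network("198.51.100.0/24"),  # a whole range
]


def is_blocked_ip(remote_addr):
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparsable address: this sketch does not block it.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A single-address block only helps while the client stays on that address; a dial-up or proxy user will move.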

inbound

4:32 pm on Feb 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



50,000 page requests per day means that you will probably peak at 1 to 2 page requests a second. That's not a problem for this script (I've tested it on our dedicated server under much larger loads than that).
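The arithmetic behind that estimate can be sketched as follows. The daily figure is the one quoted in the thread; the peak-to-average factor of 3 is purely an assumption for illustration, since real traffic can be far burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

requests_per_day = 50_000
average_rps = requests_per_day / SECONDS_PER_DAY

# Assumed ratio of busiest-hour rate to the daily average (hypothetical).
peak_factor = 3
peak_rps = average_rps * peak_factor

print(f"average: {average_rps:.2f} requests/second")
print(f"assumed peak: {peak_rps:.2f} requests/second")
```

The average works out to roughly 0.58 requests per second, so a peak of 1 to 2 per second corresponds to a busy period running at about two to three times the average.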

I'm seriously considering using a version of this on a large yellow-pages style site (with 14 million pages in a directory structure - bot fodder). All of the testing that I've done suggests that if you are on a budget then this script is a very good solution.

Remember that it also allows you to 'white flag' known IP ranges (thus reducing the added processing requirements when being hit by the big search engines). Blocking rogue bots also has a positive effect on processing requirements: a bot that is sucking up pages is much more likely to slow page delivery if left unchecked.

Matt Probert

5:50 pm on Feb 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"50,000 page requests per day means that you will probably peak at 1 to 2 page requests a second"

Believe me, it's not that linear. We regularly see 800 simultaneous connections (yes, I know, they're not all page requests; most of them are for images and the like) and 12 or more scripts running simultaneously (these can be an issue).

Please don't misunderstand me. I'm sure that script is excellent; it's just not right for our particular circumstances. Thanks anyway.

Matt