
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
followthatpage
Not on my site!
Mokita

5+ Year Member



 
Msg#: 4384802 posted 11:08 am on Nov 8, 2011 (gmt 0)

Seen freshly today:

Agent: www.followthatpage.com
Host: followthatpage.com

Didn't get anything from my site ... except a 403.

 

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4384802 posted 11:20 am on Nov 8, 2011 (gmt 0)

Looks like a potential source for abuse, especially with the 'bulk upload' feature to upload tons of pages to track.

Block block block...

Um, what IP did it crawl from? 82.161.140.128?
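For anyone who wants to act on the "block" advice at the application level, here is a minimal Python sketch. It assumes the tool keeps sending the user-agent string reported in the first post; the `BLOCKED_AGENT_SUBSTRINGS` list and the `status_for_user_agent` function are made-up names for illustration, and many sites would do the same thing in the web server config instead.

```python
# Hypothetical deny list. The single entry is taken from the agent
# string reported in this thread ("www.followthatpage.com").
BLOCKED_AGENT_SUBSTRINGS = ("followthatpage",)


def status_for_user_agent(user_agent):
    """Return 403 if the user agent matches the deny list, else 200."""
    agent = (user_agent or "").lower()
    if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
        return 403
    return 200


if __name__ == "__main__":
    print(status_for_user_agent("www.followthatpage.com"))
    print(status_for_user_agent("Mozilla/5.0 (X11; Linux x86_64)"))
```

A user-agent match is only as reliable as the string the client chooses to send, which is why the IP question matters as well.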

Mokita

5+ Year Member



 
Msg#: 4384802 posted 11:40 am on Nov 8, 2011 (gmt 0)

Sorry Bill, can't give you a definitive IP, as my logs (on a shared server) only showed the following:

followthatpage.com - - [08/Nov/2011:xx:49:59 +xx00] "GET / HTTP/1.0" 302 219 "-" "www.followthatpage.com"
followthatpage.com - - [08/Nov/2011:xx:50:00 +xx00] "GET /403.htm HTTP/1.0" 200 181 "-" "www.followthatpage.com"
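To pull the fields out of lines like these, a small Python sketch follows. It assumes the log is in the common "combined" layout that the two lines appear to use (host, two placeholder fields, timestamp, request, status, size, referer, agent); `LOG_PATTERN` and `parse_line` are illustrative names, not part of any tool mentioned here.

```python
import re

# Assumed layout: host ident user [time] "method path protocol" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        'followthatpage.com - - [08/Nov/2011:xx:49:59 +xx00] '
        '"GET / HTTP/1.0" 302 219 "-" "www.followthatpage.com"'
    )
    fields = parse_line(sample)
    print(fields["host"], fields["path"], fields["status"], fields["agent"])
```

Run against the two lines above, this gives status 302 for `/` and status 200 for `/403.htm`, so the block page was delivered through a redirect rather than with a literal 403 status code.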

Robtex gives 80.126.0.111 as the IP, 80.126.0.0/15 (XS4ALL Internet) as the range:

[robtex.com...]
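A quick way to check whether an address falls inside the quoted range is Python's standard `ipaddress` module. This is only a sketch: the two addresses are the ones mentioned in this thread, and `in_range` is a made-up helper name.

```python
import ipaddress

# Range quoted from Robtex in the post above.
XS4ALL_RANGE = ipaddress.ip_network("80.126.0.0/15")


def in_range(ip, network=XS4ALL_RANGE):
    """Return True if the dotted-quad address lies inside the network."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    print(XS4ALL_RANGE[0], "-", XS4ALL_RANGE[-1])
    for candidate in ("80.126.0.111", "82.161.140.128"):
        print(candidate, in_range(candidate))
```

The /15 spans 80.126.0.0 through 80.127.255.255, so the Robtex address is inside it, while the 82.161.140.128 address asked about earlier is not; that one would need its own rule if it turns out to be the source.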

topr8

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4384802 posted 1:21 pm on Nov 8, 2011 (gmt 0)

He says he obeys robots.txt. Is this true?

As I whitelist in robots.txt, it would be blocked by default, but I haven't seen this bot yet, so I don't know whether it actually obeys.
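To illustrate what "blocked by default" means for a whitelist-style robots.txt, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content is an assumption (the actual file is not shown in this thread), `Googlebot` is only an example of a whitelisted agent, and `is_allowed` is a made-up helper. It shows what a compliant bot would conclude, not what any given bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Assumed whitelist layout: named agents are allowed, everyone else is not.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, path="/"):
    """Return True if a robots.txt-compliant bot may fetch the path."""
    parser = RobotFileParser()
    parser.parse(WHITELIST_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("Googlebot:", is_allowed("Googlebot"))
    print("www.followthatpage.com:", is_allowed("www.followthatpage.com"))
```

Under this assumed file, any agent without its own record falls through to the `*` record and is disallowed everywhere.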

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4384802 posted 2:21 pm on Nov 8, 2011 (gmt 0)

>>whitelist in robots.txt then it would by default be blocked


I hope that's not your only line of defense (...because robots.txt blocks only those bots that read AND heed it -- a minority on my sites nowadays).

Mokita

5+ Year Member



 
Msg#: 4384802 posted 7:35 pm on Nov 8, 2011 (gmt 0)

>>he says he obeys robots.txt is this true?


It didn't request robots.txt.
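The same check can be run over a whole log file. The sketch below assumes the log layout seen in the lines quoted earlier in this thread; `REQUEST_PATTERN`, `THREAD_LOG_LINES`, and `requested_robots_txt` are illustrative names only.

```python
import re

# Matches the request, status, size, referer and agent part of a log line.
REQUEST_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# The two lines quoted earlier in this thread.
THREAD_LOG_LINES = [
    'followthatpage.com - - [08/Nov/2011:xx:49:59 +xx00] '
    '"GET / HTTP/1.0" 302 219 "-" "www.followthatpage.com"',
    'followthatpage.com - - [08/Nov/2011:xx:50:00 +xx00] '
    '"GET /403.htm HTTP/1.0" 200 181 "-" "www.followthatpage.com"',
]


def requested_robots_txt(log_lines, agent_substring):
    """Return True if any matching agent asked for /robots.txt."""
    wanted = agent_substring.lower()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if not match or wanted not in match.group("agent").lower():
            continue
        # Ignore any query string when comparing the path.
        if match.group("path").split("?", 1)[0] == "/robots.txt":
            return True
    return False


if __name__ == "__main__":
    print(requested_robots_txt(THREAD_LOG_LINES, "followthatpage"))
```

A negative result over a short window only shows that no fetch was logged in that window; a bot could have cached robots.txt from an earlier visit.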

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4384802 posted 8:18 pm on Nov 8, 2011 (gmt 0)

It's not actually a crawler; technically, only crawlers need to obey robots.txt.

This is more of a link-checker type of thing: one page requested, one page checked, not a crawl.

topr8

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4384802 posted 7:27 am on Nov 9, 2011 (gmt 0)

>>I hope that's not your only line of defense

Absolutely not. However, some bots, as you know, do obey it, so I consider it worth using.

>>It's not actually a crawler, technically only crawlers need to obey robots.txt

He says on his site that he obeys robots.txt; I was just wondering whether he actually does.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved