
Forum Moderators: Ocean10000


Flight Deck Bot (experimental)

Experiment fail

     
12:26 am on Jul 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


UA: Flight Deck Bot (experimental)
robots.txt: No
IP: 172.203.51.205 (Rackspace USA, reverse DNS flightdeckreports.com)

Only fetched the root page, but even so, experiment FAIL, as it didn't fetch robots.txt
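
For anyone wanting to check their own logs for the same behaviour, here is a minimal sketch that lists user agents which requested pages without ever asking for robots.txt. It assumes combined-format access log lines; the helper name `agents_without_robots`, the sample lines, and the documentation-range IPs are made up for illustration and are not real log data.

```python
import re
from collections import defaultdict

# Pulls the request path and the user agent out of a combined-format log line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def agents_without_robots(lines):
    """Return the set of user agents that never requested /robots.txt."""
    paths_by_agent = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            path, agent = match.groups()
            # Ignore any query string when comparing paths.
            paths_by_agent[agent].add(path.split("?")[0])
    return {
        agent
        for agent, paths in paths_by_agent.items()
        if "/robots.txt" not in paths
    }

# Hypothetical sample lines (illustration only).
SAMPLE = [
    '198.51.100.7 - - [16/Jul/2010:00:20:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Flight Deck Bot (experimental)"',
    '192.0.2.10 - - [16/Jul/2010:00:21:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [16/Jul/2010:00:21:02 +0000] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    print(agents_without_robots(SAMPLE))
```

A bot that fetched robots.txt on an earlier day, outside the log window being scanned, would show up as a false positive, so the result is a hint rather than proof.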
3:05 am on July 16, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


172.203.51.205 is in an AOL IP range, not Rackspace.

rDNS: accb33cd.ipt.aol.com
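
A one-digit difference in an IP changes the owner entirely, so it is worth checking reverse DNS against a forward lookup before drawing conclusions. Below is a rough sketch of a forward-confirmed reverse DNS check using Python's standard `socket` module. The function name `forward_confirmed_rdns` and its injectable resolver parameters are assumptions made here for illustration; live results depend on your resolver and will change over time.

```python
import socket

def forward_confirmed_rdns(ip,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the PTR hostname for ip if it resolves back to ip, else None.

    The resolver functions are parameters only so the logic can be
    exercised without network access.
    """
    try:
        hostname = reverse(ip)[0]          # PTR lookup
        addresses = forward(hostname)[2]   # A lookup on the PTR name
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None
    return hostname if ip in addresses else None

if __name__ == "__main__":
    # Needs working DNS; the output is whatever your resolver returns today.
    print(forward_confirmed_rdns("173.203.51.205"))
```

A hostname that does not resolve back to the same IP proves nothing about who runs the bot, since anyone controlling the reverse zone can set a PTR record to any name.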
4:53 pm on July 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1988
votes: 73


173.203.51.205 as Flight Deck Bot (experimental) - (Slicehost RACKS-8(Rackspace Hosting RSCP-NET-4))

Nuked on the first try.
7:13 pm on July 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2041
votes: 1


Came around a bit ago:

mail.flightdeckreports.com
Flight Deck Bot (experimental)

robots.txt? NO

Goo shows that subdomain's same rIP = 173.203.51.205 (San Antonio Slicehost; a.k.a. Rackspace).

FWIW: Goo for flightdeckreports.com shows viewable cached versions of their admin log files.....
8:50 pm on July 16, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3254
votes: 18


Same here, from a couple of days ago. I couldn't find any real evidence of their existence (e.g. a web site) at the time, and nothing except logs in Google.
6:47 am on July 21, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2041
votes: 1


UA's changed, bot's bad conduct hasn't:

mail.flightdeckreports.com
Flight Deck Bot 1.3 beta (http://www.flightdeckreports.com/bot)

robots.txt? NO
10:30 pm on July 21, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Their web page says: "Yes. While we do crawl the home page of your site, we do not crawl beyond that if your robots.txt file prohibits it."
How does it do that without actually fetching the robots.txt file? Unless it is using a totally different UA and IP address (which is bad practice).
10:48 pm on July 24, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1988
votes: 73


In a similar fashion, 20 stories UP, 173.203.71.246 has been causing some mess on one of my sites for the past couple of days. Requesting everything from phpinfo.php and /cgi-bin/cgihelper.cgi to sending binary data in the request body. Pesky little fellow.
12:44 am on Aug 3, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Well, this one isn't that pesky, as it has only asked for the root page so far. It just visited me again with the new UA mentioned above, and still didn't request robots.txt.
7:46 pm on Oct 22, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 11, 2004
posts: 62
votes: 0


Hello All,

This is my bot, and from reading this thread it looks like I need to get it to check the robots.txt file. I wasn't sure if I really had to, since I'm only visiting the homepage.

Thanks and see you all at Pubcon!
Jeff
12:45 am on Oct 23, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2041
votes: 1


The standard Disallow --

User-agent: *
Disallow: /

-- means everything is off-limits, including home pages, so thanks in advance for coding your bot to read and heed robots.txt, ditto robots META tags.

FWIW:

The Robots Exclusion Standard, a.k.a. the Robots Exclusion Protocol, dates back to the mid 1990s. [en.wikipedia.org...] See also: "The Web Robots Pages" [robotstxt.org...]
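
To illustrate why the home page is covered too, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `may_fetch`, the example.com URL, and the shortened UA string are assumptions for illustration; this is not how the bot in question is implemented.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The blanket disallow quoted above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    # The root URL "/" matches "Disallow: /", so the home page is off-limits.
    print(may_fetch(BLOCK_ALL, "Flight Deck Bot", "http://www.example.com/"))
```

In a live crawler, the same parser can fetch the file itself via `set_url()` and `read()` before the first page request, which is the step the site logs in this thread show being skipped.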
1:05 am on Dec 11, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2041
votes: 1


@devitnow, I'm eagerly awaiting your bot finally reading/heeding robots.txt, etc. You're up to v1.9 now, and sporting another name change --

mail.flightdeckreports.com
FlightDeckReports Bot 1.9 beta (http://www.flightdeckreports.com/bot)
robots.txt? NO

-- so here's hoping your very next update will respect the standard (& sites that do). Thank you!