
Flight Deck Bot (experimental)

Experiment fail


Dijkgraaf

12:26 am on Jul 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



UA: Flight Deck Bot (experimental)
robots.txt: No
IP: 172.203.51.205 (Rackspace USA, reverse DNS flightdeckreports.com)

It only fetched the root page, but even so, experiment FAIL, as it didn't fetch robots.txt.

keyplyr

3:05 am on Jul 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



172.203.51.205 is in an AOL IP range, not Rackspace.

rDNS: accb33cd.ipt.aol.com

blend27

4:53 pm on Jul 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



173.203.51.205 as Flight Deck Bot (experimental) - (Slicehost RACKS-8(Rackspace Hosting RSCP-NET-4))

Nuked on a first TRY.

Pfui

7:13 pm on Jul 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Came around a bit ago:

mail.flightdeckreports.com
Flight Deck Bot (experimental)

robots.txt? NO

Goo shows that subdomain's same rIP = 173.203.51.205 (San Antonio Slicehost; a.k.a. Rackspace).
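The usual way to tie a visiting IP to a hostname like this is forward-confirmed reverse DNS: look up the PTR name for the IP, then resolve that name and check that it points back to the same IP. Below is a minimal Python sketch, assuming IPv4 only (`socket.gethostbyname_ex` does not return IPv6 addresses). The offline resolver tables are hypothetical stand-ins seeded with the mapping reported above (`mail.flightdeckreports.com` and 173.203.51.205), plus a made-up spoofed PTR on a TEST-NET address. They are not live DNS data.

```python
import socket
from typing import Callable, Optional

Resolver = Callable[[str], tuple]


def forward_confirmed_rdns(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> Optional[str]:
    """Return the PTR hostname for ip only if that hostname resolves back to ip."""
    try:
        hostname, _aliases, _ = reverse(ip)          # PTR lookup on the IP
        _name, _aliases2, addresses = forward(hostname)  # A lookup on the PTR name
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    return hostname if ip in addresses else None


# Hypothetical offline stand-ins for DNS, used instead of live lookups.
_PTR = {
    "173.203.51.205": "mail.flightdeckreports.com",  # as reported in this thread
    "192.0.2.7": "mail.flightdeckreports.com",       # made-up spoofed PTR (TEST-NET)
}
_A = {"mail.flightdeckreports.com": ["173.203.51.205"]}


def offline_reverse(ip: str) -> tuple:
    if ip not in _PTR:
        raise socket.herror("Unknown host")
    return _PTR[ip], [], [ip]


def offline_forward(host: str) -> tuple:
    if host not in _A:
        raise socket.gaierror("Name or service not known")
    return host, [], _A[host]
```

With the real resolvers (the defaults), `forward_confirmed_rdns("173.203.51.205")` performs live lookups, and the result depends on what DNS says at the time you run it.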

FWIW: Goo for flightdeckreports.com shows viewable cached versions of their admin log files.....

dstiles

8:50 pm on Jul 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Same here a couple of days ago. I couldn't find any real evidence of their existence (e.g. a web site) at the time, and nothing except logs in Google.

Pfui

6:47 am on Jul 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA's changed, bot's bad conduct hasn't:

mail.flightdeckreports.com
Flight Deck Bot 1.3 beta (http://www.flightdeckreports.com/bot)

robots.txt? NO

Dijkgraaf

10:30 pm on Jul 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On their web page: "Yes. While we do crawl the home page of your site, we do not crawl beyond that if your robots.txt file prohibits it."
How does it do that without actually fetching the robots.txt file, unless it is fetching it with a totally different UA and IP address (which is bad practice)?
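For a bot author, the polite order is: fetch robots.txt first, from the same IP and with the same User-Agent used for pages, then decide whether even the home page may be requested. Below is a minimal Python sketch using the standard library's `urllib.robotparser`. `BOT_UA` is a hypothetical value modeled on the UA strings quoted in this thread. `fetch_robots_txt` sends that UA explicitly, because `RobotFileParser.read()` would otherwise identify itself with urllib's default UA. The 4xx handling copies the policy `RobotFileParser.read()` applies.

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity, modeled on the UA strings quoted in this thread.
BOT_UA = "FlightDeckReports Bot 1.9 beta (http://www.flightdeckreports.com/bot)"


def fetch_robots_txt(site: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download <site>/robots.txt with the SAME User-Agent the crawler sends for pages."""
    request = urllib.request.Request(
        site.rstrip("/") + "/robots.txt", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Same policy as RobotFileParser.read(): 401/403 means "keep out",
        # any other 4xx (e.g. 404) means "no rules".
        if err.code in (401, 403):
            return "User-agent: *\nDisallow: /"
        if 400 <= err.code < 500:
            return ""
        raise  # 5xx etc.: do not assume permission; let the caller decide


def may_fetch(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """True if the robots.txt text lets user_agent fetch path (default: the home page)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

A crawl loop would then call something like `may_fetch(fetch_robots_txt("https://example.com", BOT_UA), BOT_UA)` (with `example.com` as a placeholder) before requesting `/`, and skip the site entirely when it returns `False`.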

blend27

10:48 pm on Jul 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In a similar fashion, 20 stories UP, 173.203.71.246 has been causing a mess on one of my sites for the past couple of days, requesting everything from phpinfo.php to /cgi-bin/cgihelper.cgi, and even sending binary data in the request body. Pesky little fellow.

Dijkgraaf

12:44 am on Aug 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, this one isn't that pesky, as it has only asked for the root page so far. It just visited me again with the new UA mentioned above, and still no robots.txt.

devitnow

7:46 pm on Oct 22, 2010 (gmt 0)

10+ Year Member



Hello All,

This is my bot, and from reading this thread it looks like I need to get it to check the robots.txt file. I wasn't sure I really had to, since I'm only visiting the home page.

Thanks and see you all at Pubcon!
Jeff

Pfui

12:45 am on Oct 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The standard Disallow --

User-agent: *
Disallow: /

-- means everything is off-limits, including home pages, so thanks in advance for coding your bot to read and heed robots.txt, ditto robots META tags.
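Heeding robots META tags means checking each fetched page for `<meta name="robots" content="...">` (or a tag addressed to the bot by name) before indexing it or following its links. Below is a minimal Python sketch using the standard library's `html.parser`. The bot name `flightdeckbot` is a hypothetical placeholder. Treating `none` as shorthand for `noindex, nofollow` follows widely supported convention, but it is an assumption about which directives a given bot chooses to honor.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="<bot name>"> tags."""

    def __init__(self, bot_name: str) -> None:
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        meta_name = attributes.get("name", "").strip().lower()
        if meta_name in ("robots", self.bot_name):
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_meta_directives(html: str, bot_name: str) -> set[str]:
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    parser.close()
    return parser.directives


def may_index(directives: set[str]) -> bool:
    return not directives & {"noindex", "none"}


def may_follow_links(directives: set[str]) -> bool:
    return not directives & {"nofollow", "none"}
```

For example, a home page carrying `<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">` yields `{"noindex", "nofollow"}`, so the bot should neither record the page nor follow its links.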

FWIW:

The Robots Exclusion Standard, a.k.a. the Robots Exclusion Protocol, dates back to the mid 1990s. [en.wikipedia.org...] See also: "The Web Robots Pages" [robotstxt.org...]

Pfui

1:05 am on Dec 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@devitnow, I'm eagerly awaiting your bot finally reading/heeding robots.txt, etc. You're up to v1.9 now, and sporting another name change --

mail.flightdeckreports.com
FlightDeckReports Bot 1.9 beta (http://www.flightdeckreports.com/bot)
robots.txt? NO

-- so here's hoping your very next update will respect the standard (& sites that do). Thank you!
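Since the UA has now changed name twice, a webmaster filtering on an exact string would miss the newer versions. Below is a minimal Python sketch of a looser, case-insensitive match covering the three UA strings quoted in this thread. The pattern is an assumption about how the name might keep changing, not anything the bot's author has published. A UA match is also easy to spoof, so it is best paired with an IP or rDNS check.

```python
import re

# Covers the UA strings quoted in this thread:
#   "Flight Deck Bot (experimental)"
#   "Flight Deck Bot 1.3 beta (http://www.flightdeckreports.com/bot)"
#   "FlightDeckReports Bot 1.9 beta (http://www.flightdeckreports.com/bot)"
# The optional spacing and "Reports" suffix are guesses at future renames.
FLIGHT_DECK_UA = re.compile(
    r"flight\s*deck(?:\s*reports)?\s+bot|flightdeckreports\.com", re.IGNORECASE
)


def is_flight_deck_bot(user_agent: str) -> bool:
    """True if the User-Agent looks like any known variant of this bot."""
    return FLIGHT_DECK_UA.search(user_agent) is not None
```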