Forum Moderators: DixonJones & mademetop

Bots Picking up weird files

Google, MSN mainly, but Slurp sometimes too

     
6:49 pm on Dec 27, 2005 (gmt 0)

10+ Year Member



Googlebot and MSNbot have both been crawling my site looking for files that do not exist. I have been hoping that they would give up after some time, but they haven't. I set up a redirect for one file, but it still isn't taking effect. I wish I had more to say, but that is all I have for now.

I'll update with better examples.

7:08 pm on Dec 27, 2005 (gmt 0)

g1smd, WebmasterWorld Senior Member, 10+ Year Member



Have you seen any with almost random-letter filenames?

or any with 404probe as a part of the URL?

7:22 pm on Dec 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google, MSN mainly, but Slurp sometimes too

That looks as if there are links pointing to those files. And as long as those links exist, bots won't give up asking.

7:27 pm on Dec 27, 2005 (gmt 0)

10+ Year Member



Yeah, I am thinking there are links out there, but I can't find them. Any tips on how to locate these links? I promise as soon as I get another weird one I will post it here.

7:42 pm on Dec 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could try searching for www.example.com strangefilename.html

2:26 pm on Dec 28, 2005 (gmt 0)

10+ Year Member



Perfect, I found one BIG reason for my problems. Some research team in Mannheim was using my robots.txt file for research purposes without asking! They put the links online ... but incorrectly! Geez!

Okay ... but then I have things like this ...

They were trying to access http://www.example.com/Folder name/'name of client' from (Direct Request)

They were using the following browser:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; YPC 3.0.0; .NET CLR 1.0.3705)

First, notice that the requested URL contains literal spaces rather than the %20 that the original file name uses (and the original file name doesn't look anything like this anyway).

Any ideas?
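
On the spaces versus %20 point: a literal space and %20 normally refer to the same path, and a notification script may simply be showing the URL after decoding it. Here is a minimal Python sketch of that equivalence, using the standard library's urllib.parse. The path /Folder name/report.html and the helper names are made up for illustration, not taken from the site in question.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path, leaving the '/' separators alone."""
    return quote(path, safe="/")


def same_resource(requested: str, stored: str) -> bool:
    """Compare two paths after decoding, so ' ' and '%20' count as equal."""
    return unquote(requested) == unquote(stored)


# Hypothetical paths, for illustration only.
requested = "/Folder name/report.html"
stored = "/Folder%20name/report.html"

print(encode_path(requested))
print(same_resource(requested, stored))
```

If the decoded paths match, the spaces are only a display difference. If they don't match (as with a file name that looks nothing like the original), the request is probably coming from a mangled link rather than from an encoding quirk.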

2:41 pm on Jan 6, 2006 (gmt 0)

10+ Year Member



A visitor to your site just got a 404 error.

They were trying to access http://www.example.com/services_custom_reporting.htm from (Direct Request)

They were using the following browser:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
---------------------------------------------
This page has never existed, and I can't find a link to it anywhere on the web by searching. Does anyone know how I can find out where Googlebot is getting this link from?
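
One way to dig further is to pull every 404 out of the raw access log along with its referrer and user agent. The sketch below assumes the server writes the common "combined" log format; the sample line is invented for illustration, and find_404s is a hypothetical helper name. Note that crawlers usually send no referrer (hence "Direct Request"), so the log may only confirm which URLs are being requested, not where the link lives.

```python
import re
from collections import Counter

# Combined log format (assumed):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.*?) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def find_404s(lines):
    """Return (path, referer, agent) for every log line with status 404."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group("status") == "404":
            hits.append(
                (match.group("path"), match.group("referer"), match.group("agent"))
            )
    return hits


# Invented sample data, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [06/Jan/2006:14:41:00 +0000] '
    '"GET /services_custom_reporting.htm HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [06/Jan/2006:14:42:00 +0000] '
    '"GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

missing = find_404s(SAMPLE)
per_path = Counter(path for path, _referer, _agent in missing)
for path, count in per_path.most_common():
    print(count, path)
```

To run it against a real log, pass an open file object instead of SAMPLE. Any 404 line whose referrer is not "-" points straight at the page carrying the bad link.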

 
