Bots Picking up weird files

Forum Moderators: DixonJones

Google, MSN mainly, but Slurp sometimes too

Kate82

6:49 pm on Dec 27, 2005 (gmt 0)

Googlebot and MSNbot have both been crawling my site looking for files that do not exist. I had hoped they would give up after some time, but they haven't. I set up a redirect for one of the files, but it doesn't seem to have taken effect yet. I wish I had more to say, but that is all I have for now.

I'll update with better examples.

g1smd

7:08 pm on Dec 27, 2005 (gmt 0)

Have you seen any with almost random-letter filenames? Or any with 404probe as part of the URL?

Span

7:22 pm on Dec 27, 2005 (gmt 0)

Google, MSN mainly, but Slurp sometimes too

That looks as if there are links pointing to those files. And as long as those links exist, bots won't give up asking.

Kate82

7:27 pm on Dec 27, 2005 (gmt 0)

Yeah, I am thinking there are links out there, but I can't find them. Any tips on how to locate these links? I promise as soon as I get another weird one I will post it here.

Span

7:42 pm on Dec 27, 2005 (gmt 0)

You could try a web search for your domain together with the odd filename, for example: www.example.com strangefilename.html
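Besides a web search, the server's own access log may already name the linking page in its referrer field. Here is a minimal sketch, assuming the log is in the Apache "combined" format; the sample log lines, the research.example.org address, and the function name referrers_for_404s are made up for illustration and are not from this thread.

```python
import re
from collections import defaultdict

# Assumption: Apache/Nginx "combined" log format. Adjust the pattern otherwise.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.*?) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referrers_for_404s(lines):
    """Map each requested path that returned 404 to the referrers seen for it."""
    found = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "404":
            found[m.group("path")].add(m.group("referer"))
    return dict(found)

# Hypothetical sample lines, only to show the shape of the input.
sample = [
    '192.0.2.1 - - [27/Dec/2005:18:49:00 +0000] "GET /strangefilename.html HTTP/1.1" '
    '404 512 "http://research.example.org/links.html" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.2 - - [27/Dec/2005:18:50:00 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

result = referrers_for_404s(sample)
for path, refs in result.items():
    print(path, sorted(refs))
```

Crawlers usually send no referrer (logged as "-"), so this tends to help only when a human visitor follows the same bad link.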

Kate82

2:26 pm on Dec 28, 2005 (gmt 0)

Perfect, I found one BIG reason for my problems. Some research team in Mannheim was using my robots.txt file for research purposes without asking! They put the links online ... but incorrectly! Geez!

Okay ... but then I have things like this ...

They were trying to access http://www.example.com/Folder name/'name of client' from (Direct Request)

They were using the following browser:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; YPC 3.0.0; .NET CLR 1.0.3705)

First, notice that the requested URL contains literal spaces rather than %20, which is what the original file's URL uses (and the original file name doesn't look anything like this).

Any ideas?
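The space versus %20 difference can be checked directly. Here is a small sketch using Python's standard urllib.parse; the path below is a hypothetical stand-in, not the real file name from the log.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a literal space, as in the 404 report.
raw_path = "/Folder name/file.html"

# quote() percent-encodes the space and leaves "/" untouched by default.
encoded = quote(raw_path)
print(encoded)

# unquote() reverses the encoding, so both forms name the same resource.
print(unquote(encoded) == raw_path)
```

A request logged with raw spaces suggests the client, or the page it got the link from, did not encode the URL before sending it.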

Kate82

2:41 pm on Jan 6, 2006 (gmt 0)

A visitor to your site just got a 404 error.

They were trying to access http://www.example.com/services_custom_reporting.htm from (Direct Request)

They were using the following browser:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
---------------------------------------------
This page has never existed, and I can't find a link to it anywhere through a web search. Does anyone know how I can find out where Googlebot is getting this link from?