Spidering pages that do not exist

     
12:49 pm on Feb 6, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:May 20, 2003
posts:493
votes: 0


I just checked my logs and saw that Jeeves is spidering a whole bunch of pages that don't exist and never existed. What it's doing is taking a valid file name and adding a space and a 1 to it, something like "/filename.html 1" but it also took "/ 4" and "/ 19" which don't exist at all. It had no problem getting my robots.txt file though. Any reason why this would happen?

Jennifer

1:47 pm on Feb 7, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 21, 2001
posts:2488
votes: 0


Hi Jennifer,

Is it definitely AJ, or is it Teoma's crawler? Or could it be bogus?

8:04 pm on Feb 7, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:May 20, 2003
posts:493
votes: 0


Unfortunately, this thing has dropped out of my error log, but if I see it again, I'll be sure to check the IP. Thanks for the suggestion.

Jennifer

8:57 pm on Feb 7, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:May 14, 2002
posts:378
votes: 0


It was AJ for sure. I saw this in the logs this week also. I've seen almost every engine do this at one time or another. My conclusion is that they are testing 404s.

11:44 pm on Feb 7, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:May 20, 2003
posts:493
votes: 0


Hmm, why would they want to test 404s?

Jennifer

12:15 am on Feb 8, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Sept 19, 2002
posts:269
votes: 0


We get lots of requests of this kind: "GET /example.html%201 HTTP/1.0", agent "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)". The IP points to UUnet.
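
For anyone who wants to pull these requests out of their own logs, here is a rough Python sketch. It assumes the Apache "combined" log format, which nobody in this thread has confirmed they use, and the names `find_suffixed_requests` and `agent_substring` are made up for illustration. It looks for paths that end in a space (literal or encoded as %20) followed by digits, requested by an agent whose string contains "Ask Jeeves".

```python
import re
from urllib.parse import unquote

# Assumed Apache "combined" log layout; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>.*?) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# After decoding %20, the odd requests end in a space followed by digits.
SUFFIX = re.compile(r' \d+$')


def find_suffixed_requests(lines, agent_substring="Ask Jeeves"):
    """Return (ip, decoded_path, status) for matching log lines."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        path = unquote(m.group("path"))
        if agent_substring in m.group("agent") and SUFFIX.search(path):
            hits.append((m.group("ip"), path, int(m.group("status"))))
    return hits
```

Feed it the lines of an access log (for example `find_suffixed_requests(open("access.log"))`, where the file name is just a placeholder) and check the IPs it returns against the crawler's known ranges. The user-agent string alone can be faked, so a match here does not prove the request came from Ask Jeeves.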