216.39.48.2 - - [15/Nov/2002:00:08:14 -0500] "GET /company/wedding_gifts/wedding_guestbooks/flower_girl_baskets/bridesmaid_gifts/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.171 - - [15/Nov/2002:00:08:14 -0500] "GET /ring_bearer_pillows/wedding_guestbooks/wedding_guestbooks/flower_girl_gifts/css/wedding_gifts/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.81 - - [15/Nov/2002:00:08:14 -0500] "GET /ring_bearer_pillows/wedding_guestbooks/wedding_gifts/flower_girl_baskets/wedding_gifts/wedding_shower_gifts/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.2 - - [15/Nov/2002:00:08:20 -0500] "GET /ring_bearer_pillows/wedding_paintings/wedding_shower_gifts/resources/css/main HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.81 - - [15/Nov/2002:00:08:20 -0500] "GET /bridal_accessories/ring_bearer_pillows/wedding_guestbooks/css/wedding_paintings/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.171 - - [15/Nov/2002:00:08:22 -0500] "GET /company/bridesmaid_gifts/ring_bearer_pillows/wedding_shower_gifts/ring_bearer_pillows/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.2 - - [15/Nov/2002:00:08:28 -0500] "GET /ring_bearer_pillows/flower_girl_gifts/wedding_gifts/flower_girl_baskets/wedding_guestbooks/bridal_accessories/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
It's looking for files that don't exist by sticking my entire top nav into one path. It has been coming by three times a day for a week or so AFTER I PAID FOR INCLUSION in AltaVista, and each time it has been looking for these concocted paths.
My husband keeps telling me I need to do something with my HUGE error log, and I keep telling him he needs to disallow Scooter in our robots.txt. However, I paid for AV inclusion and I just want to get spidered once.
Any help out there?
Brina
I would guess that it is a mistake on your side; somewhere your linking is confusing it.
Does Scooter request these from the get-go, or does it get what it wants a couple of times and then start on all of these nonexistent pages? Compare your error logs with your access logs; it might give you an idea of where things are going wrong.
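A minimal sketch, in Python, of one way to pull Scooter's requests out of an access log so that the first few can be compared with the later ones. It assumes the log is in the combined format shown in the sample above; the names LOG_PATTERN, scooter_requests and summarize are made up for illustration, and SAMPLE simply reuses two lines from the original post.

```python
import re
from collections import Counter

# Combined log format, as in the sample lines posted above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def scooter_requests(lines):
    """Yield (time, path, status) for each request whose user agent mentions Scooter."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and "Scooter" in match.group("agent"):
            yield match.group("time"), match.group("path"), int(match.group("status"))

def summarize(lines):
    """Return Scooter's requests in file order plus a count per status code."""
    hits = list(scooter_requests(lines))
    by_status = Counter(status for _, _, status in hits)
    return hits, by_status

# Two lines taken from the log sample in the original post.
SAMPLE = [
    '216.39.48.2 - - [15/Nov/2002:00:08:14 -0500] "GET /company/wedding_gifts/wedding_guestbooks/flower_girl_baskets/bridesmaid_gifts/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"',
    '216.39.48.2 - - [15/Nov/2002:00:08:20 -0500] "GET /ring_bearer_pillows/wedding_paintings/wedding_shower_gifts/resources/css/main HTTP/1.1" 206 51072 "-" "Scooter/3.2"',
]

hits, by_status = summarize(SAMPLE)
for time, path, status in hits:
    print(time, status, path)
print(dict(by_status))
```

To run it against a real log, pass an open file instead of SAMPLE, for example `summarize(open("access_log"))`, where the file name is a placeholder. The earliest entries in `hits` show whether Scooter started with valid URLs or with the stacked ones.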
This is just something I noticed, and may not help at all, but here goes...
Based on your short logfile sample, the links that seem to confuse Scooter (as resolved by my browser) are all of the form:
yourdomain/subdirectory/
That is, they appear as "/subdirectory/" in your source code; there is no page name following the subdirectory name, and you rely on the server to default to "index.html" or "main.shtm" or whatever. You might want to try adding "index.html" (or whatever is appropriate) to these links - or maybe to half of them - and see if that helps.
Again, your log file example didn't include them, but I'd also be interested to know if Scooter chokes on the ones that also have a page anchor, in the form:
yourdomain/subdirectory/#anchor
- again, without the page name.
Might be worth a try, and if this changes or fixes the behaviour, you might want to report this to AV crawlsupport. Somewhat surprisingly, I have found them to be responsive recently.
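For comparison, here is a rough Python sketch of what standard URL resolution gives for links of the two forms above. The domain yourdomain.example and the page paths are placeholders, not the real site, and request_url is a name invented for this example. It uses the standard library's urljoin and urldefrag.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical placeholders: the real domain and directory names are not known here.
BASE_PAGES = [
    "http://yourdomain.example/",
    "http://yourdomain.example/subdirectory/index.shtml",
    "http://yourdomain.example/other/deeper/index.shtml",
]
LINKS = ["/subdirectory/", "/subdirectory/#anchor"]

def request_url(base_page, href):
    """Resolve href against the page it appears on and drop any #fragment,
    since the fragment is not sent to the server."""
    absolute = urljoin(base_page, href)
    return urldefrag(absolute).url

for page in BASE_PAGES:
    for href in LINKS:
        print(page, href, "->", request_url(page, href))
```

Whatever page the link sits on, a link that starts with a slash resolves to the same single-level URL, with or without the anchor. So if the links really are of this form, the stacked paths in the log are not what a standards-following resolver would produce.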
Jim
Well, those URLs should work, but that doesn't mean all spiders will be able to handle them. For example, I have had a few problems with home-brew spiders issuing requests like
"GET /top_dir/sub_dir/sub_sub_dir/../../other_top_page.html HTTP/1.1"
If you do figure out the problem and it's something that should work, please do report it to AV crawl support: Others who have used the same technique and aren't dialed-in to WebmasterWorld will be suffering the same problem.
Jim
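As an illustration of the home-brew spider request quoted in the post above: a well-behaved client is expected to collapse the ".." segments before sending the request. The following is a simplified Python sketch of that step, not any particular spider's code. The function name remove_dot_segments is chosen for this example, and unlike a full implementation it also collapses empty segments such as "//".

```python
def remove_dot_segments(path):
    """Collapse '.' and '..' segments in an absolute URL path (simplified)."""
    resolved = []
    for segment in path.split("/"):
        if segment == "..":
            # Step up one directory, but never above the root.
            if resolved:
                resolved.pop()
        elif segment not in ("", "."):
            resolved.append(segment)
    normalized = "/" + "/".join(resolved)
    # Keep a trailing slash when the original path pointed at a directory.
    if path.endswith(("/", "/.", "/..")) and normalized != "/":
        normalized += "/"
    return normalized

print(remove_dot_segments("/top_dir/sub_dir/sub_sub_dir/../../other_top_page.html"))
# prints /top_dir/other_top_page.html
```

A spider that skips this step sends the raw path with the ".." segments still in it, which is the kind of request shown in the quoted log line.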