Scooter 3.2 IS Ruining My Marriage - Crawler, Spider, and User Agent ID forum at WebmasterWorld - WebmasterWorld

Scooter 3.2 IS Ruining My Marriage

Someone tell me what is going on please

brina

4:07 pm on Nov 16, 2002 (gmt 0)

10+ Year Member

Recent excerpt of my access log:

216.39.48.2 - - [15/Nov/2002:00:08:14 -0500] "GET /company/wedding_gifts/wedding_guestbooks/flower_girl_baskets/bridesmaid_gifts/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.171 - - [15/Nov/2002:00:08:14 -0500] "GET /ring_bearer_pillows/wedding_guestbooks/wedding_guestbooks/flower_girl_gifts/css/wedding_gifts/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.81 - - [15/Nov/2002:00:08:14 -0500] "GET /ring_bearer_pillows/wedding_guestbooks/wedding_gifts/flower_girl_baskets/wedding_gifts/wedding_shower_gifts/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.2 - - [15/Nov/2002:00:08:20 -0500] "GET /ring_bearer_pillows/wedding_paintings/wedding_shower_gifts/resources/css/main HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.81 - - [15/Nov/2002:00:08:20 -0500] "GET /bridal_accessories/ring_bearer_pillows/wedding_guestbooks/css/wedding_paintings/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.171 - - [15/Nov/2002:00:08:22 -0500] "GET /company/bridesmaid_gifts/ring_bearer_pillows/wedding_shower_gifts/ring_bearer_pillows/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"
216.39.48.2 - - [15/Nov/2002:00:08:28 -0500] "GET /ring_bearer_pillows/flower_girl_gifts/wedding_gifts/flower_girl_baskets/wedding_guestbooks/bridal_accessories/index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"

It's looking for files that don't exist by stringing my entire top nav into one path. It has been coming by three times a day for a week or so, AFTER I PAID FOR INCLUSION in AltaVista, and each time it requests these concocted paths.

My husband keeps telling me I need to do something about my HUGE error log, and I keep telling him he needs to disallow Scooter in our robots.txt. However, I paid for AV inclusion, and I just want to get spidered once.

Any help out there?

Brina

littleman

5:26 pm on Nov 16, 2002 (gmt 0)

Wow, I wonder if you have some looping links in your site somewhere. It would be interesting to run your site through a link extractor and see if it comes up with a similar pattern.
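For anyone who would rather script that check than download a tool, here is a minimal sketch of a link extractor in Python. It only shows the general idea: pull every href out of a page and resolve it the way a well-behaved spider should. The domain, page URL, and nav markup are made up for illustration and are not taken from the site in question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin applies the standard relative-URL resolution rules.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page and nav bar (assumptions, not the real site).
page_url = "http://www.example.com/company/index.shtml"
page_html = (
    '<a href="wedding_gifts/">Gifts</a> '
    '<a href="/ring_bearer_pillows/">Pillows</a>'
)
for link in extract_links(page_html, page_url):
    print(link)
```

Note that the first link has no leading slash, so it resolves underneath /company/. If every page carries nav links like that, each crawl step can go one directory deeper, which is one way to end up with a pattern like the one in the log.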

jatar_k

5:42 pm on Nov 16, 2002 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Somehow poor Scooter is getting lost. Try a link extractor, or the Sim Spider [searchengineworld.com]; either might help.

I would guess that it is a mistake on your side; somewhere your linking is confusing it.

Does Scooter request these from the get-go, or does it get what it wants a couple of times and then start on all of these nonexistent pages? Compare your error logs with your access logs; that might give you an idea of where things are going wrong.
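One way to answer that question is to list Scooter's requests in order and flag the paths that repeat a directory name. The sketch below assumes the log is in the Apache "combined" format, which is what the excerpt in the first post looks like. The function names are made up for illustration, and the one log line in it is copied from that excerpt.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, as in the excerpt above.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def scooter_requests(lines):
    """Yield (path, status) for each request whose user agent names Scooter."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Scooter" in match.group("agent"):
            yield match.group("path"), int(match.group("status"))


def repeated_segments(path):
    """Return the path segments that occur more than once in a path."""
    counts = Counter(segment for segment in path.split("/") if segment)
    return sorted(segment for segment, n in counts.items() if n > 1)


# One line from the posted excerpt; in practice, pass an open log file instead.
sample = [
    '216.39.48.171 - - [15/Nov/2002:00:08:14 -0500] "GET /ring_bearer_pillows/'
    'wedding_guestbooks/wedding_guestbooks/flower_girl_gifts/css/wedding_gifts/'
    'index.shtml HTTP/1.1" 206 51072 "-" "Scooter/3.2"',
]
for path, status in scooter_requests(sample):
    print(status, repeated_segments(path), path)
```

Reading the output from the top of the log shows whether the first few requests were for real pages and where the first concocted path appears.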

weesnich

6:43 pm on Nov 16, 2002 (gmt 0)

10+ Year Member

Or try (if you are running Windows) Xenu, a free link checker, and have it check only your site. You should get a very simple sitemap from it as well.
http://home.snafu.de/tilman/xenulink.html

brina

11:11 pm on Nov 16, 2002 (gmt 0)

10+ Year Member

Thanks, everyone. That will definitely get me going in the right direction.

Brina

brina

3:41 pm on Nov 17, 2002 (gmt 0)

10+ Year Member

I ran the site through SiteSim (very cool little tool, BTW), then downloaded Xenu and used it to check all my links. I fixed all the issues they came up with, and as of this morning, Scooter is back in its loop.

Any other suggestions? Anyone care to look at my home page code?

Brina

jdMorgan

4:32 pm on Nov 17, 2002 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Brina,

This is just something I noticed, and may not help at all, but here goes...

Based on your short logfile sample, the links that seem to confuse Scooter (as resolved by my browser) are all of the form:

yourdomain/subdirectory/

That is, they appear as "/subdirectory/" in your source code; there is no page name following the subdirectory name, and you rely on the server to default to "index.html" or "main.shtm" or whatever. You might want to try adding "index.html" (or whatever is appropriate) to these links - or maybe to half of them - and see if that helps.

Again, your log file example didn't include them, but I'd also be interested to know if Scooter chokes on the ones that also have a page anchor, in the form:

yourdomain/subdirectory/#anchor

- again, without the page name.

Might be worth a try, and if this changes or fixes the behaviour, you might want to report this to AV crawlsupport. Somewhat surprisingly, I have found them to be responsive recently.
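For reference, here is a small sketch of what a standards-following client should request for links of those two forms, using Python's urllib.parse. It says nothing about what Scooter actually does internally; the domain, the directory name, and the helper name fetch_target are all placeholders.

```python
from urllib.parse import urldefrag, urljoin


def fetch_target(page_url, href):
    """Return the URL a client should request for a link found on page_url.

    The link is resolved against the page, then the #fragment is dropped,
    because fragments are never sent to the server.
    """
    absolute = urljoin(page_url, href)
    return urldefrag(absolute).url


# Hypothetical page and links (assumptions for illustration only).
page = "http://www.example.com/company/index.shtml"
for href in ("/wedding_gifts/", "/wedding_gifts/#top", "/wedding_gifts/index.shtml"):
    print(href, "->", fetch_target(page, href))
```

The first two links come out as the same directory URL, and the server then picks the default page. If a spider gets either form wrong, spelling out the page name in the link sidesteps the problem.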

Jim

brina

5:00 pm on Nov 17, 2002 (gmt 0)

10+ Year Member

You know, that might be it. It was recommended to me to make my main directory links in that format for better spidering. How ironic. I will try that and report back in case anyone else ever has this issue.

Brina

jdMorgan

6:07 pm on Nov 17, 2002 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Brina,

Well, those URLs should work, but that doesn't mean all spiders will be able to handle them. For example, I have had a few problems with home-brew spiders issuing requests like

"GET /top_dir/sub_dir/sub_sub_dir/../../other_top_page.html HTTP/1.1"

when I used relative URLs on one of my sites. They failed to resolve the "../" relative links.
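To make the difference concrete, here is a short Python sketch comparing a naive join with proper resolution. The naive_join function is only a guess at the kind of shortcut such a spider might take, and the domain and "page.html" are invented; the directory names come from the request above.

```python
from urllib.parse import urljoin, urlsplit


def naive_join(base_url, href):
    """Glue the link onto the base page's directory without resolving "../"."""
    directory = base_url.rsplit("/", 1)[0] + "/"
    return directory + href


# Hypothetical page containing the relative link (assumption).
base = "http://www.example.com/top_dir/sub_dir/sub_sub_dir/page.html"
href = "../../other_top_page.html"

# Path a spider would request if it skips the "../" handling.
print(urlsplit(naive_join(base, href)).path)

# Path after standard resolution, which collapses the "../" segments.
print(urlsplit(urljoin(base, href)).path)
```

The first path matches the request in my log; the second is what the spider should have asked for.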

If you do figure out the problem and it's something that should work, please do report it to AV crawl support. Others who have used the same technique and aren't dialed in to WebmasterWorld will be suffering from the same problem.

Jim