Metacarta and FAST-WebCrawler

Robos.txt?

         

pendanticist

11:59 pm on Dec 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Greetings,

Check this out.

In this first request, each "e" in "Resume" came through as "%e9" (a percent sign, the "e" and the number 9), so the server returned a 404 error code.

The file is supposed to be Marketing_Yourself_Resume.html


66.77.73.213 - - [05/Dec/2002:15:26:47 -0800] "GET /Marketing_Yourself_R%e9sum%e9.html HTTP/1.0" 404 2140 "-" "FAST-WebCrawler/3.6 (atw-crawler at fast dot no; [fast.no...]

And this one asks for robos.txt instead of robots.txt.

66.28.23.147 - - [05/Dec/2002:15:29:48 -0800] "GET /robos.txt HTTP/1.0" 404 2140 "-" "metabot (crawler@metacarta.com)"

Is there anything I can do to prevent such occurrences, or is this something misconfigured on their end?

I allow all in my robots.txt.

Thank You.

Pendanticist.

pendanticist

12:06 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Uh, here's another one.

66.77.73.63 - - [05/Dec/2002:14:18:42 -0800] "GET /%22 HTTP/1.0" 404 2140 "-" "FAST-WebCrawler/3.6 (atw-crawler at fast dot no; [fast.no...]

I have no idea what this one is looking for.

You reckon FAST has found some really righteous smoke somewhere? <he said scratching his head>

Lately I've found a whole slew of 404s in my stats, the majority of which come from bots/spiders rather than from typed-in URLs.
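One way to check where the 404s come from is to tally them by user agent straight from the access log. The following is only a rough sketch, assuming the server writes Apache "combined" format lines like the ones quoted in this thread. The function name `count_404s_by_agent` and the two `ExampleBot` lines are made up for illustration; only the metabot line is taken from the post above.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of an
# Apache "combined" format log line (an assumption about the log format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_404s_by_agent(lines):
    """Return a Counter mapping each user-agent string to its number of 404s."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("agent")] += 1
    return counts

sample = [
    # Taken from the thread.
    '66.28.23.147 - - [05/Dec/2002:15:29:48 -0800] "GET /robos.txt HTTP/1.0" '
    '404 2140 "-" "metabot (crawler@metacarta.com)"',
    # Hypothetical lines, invented for this example.
    '192.0.2.10 - - [05/Dec/2002:15:30:00 -0800] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [05/Dec/2002:15:30:05 -0800] "GET /missing.html HTTP/1.0" '
    '404 2140 "-" "ExampleBot/1.0"',
]

counts = count_404s_by_agent(sample)
for agent, hits in counts.most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file object instead of `sample`; the path to that file depends on the host.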

Thanks.

Pendanticist.

sugarkane

7:52 pm on Dec 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could well be a glitch on their end - especially the robos.txt one. Another possibility is that a site is incorrectly linking to yours, and the spider is following these links resulting in the 404s.

jdMorgan

8:03 pm on Dec 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



pendanticist,

The first one looks like someone may have linked using "resume" with accented e's, and confused the 'bot.

I'd report the robos.txt request to Fast, especially if you see it again - it could have been a logging glitch, though.

Jim

pendanticist

1:08 am on Dec 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks sugarkane.


sugarkane
Could well be a glitch on their end - especially the robos.txt one. Another possibility is that a site is incorrectly linking to yours, and the spider is following these links resulting in the 404s.

The only time I've seen an incorrect link, it turned out to be a mid-western university professor who'd copied the entire content of a particular index (including my root URL at the bottom of my pages) and placed the link on his page. In that case the e's were replaced as Jim has noted, but it did not produce a 404; it simply appeared as an inbound link.

Don't ask me how, but that same link the professor had up on his website is also listed as a backlink in Google, MSN, Hotbot and two or three others...replete with % signs. Go figure.


jdMorgan
The first one looks like someone may have linked using "resume" with accented e's, and confused the 'bot.

Thanks Jim,

I'm more inclined to agree with you here. A couple of examples:

Marketing_Yourself_R%e9sum%e9.html HTTP/1.0" 404 2140 "-" "FAST-WebCrawler/3.6 (atw-crawler at fast dot no; [fast.no...]

FAST does this many times and not just with %.

Marketing_Yourself_R%C3%A9sum%C3%A9.html HTTP/1.0" 404 2140 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

This is Google's first time.

You'll notice both bots got confused, uhm, differently?
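For what it's worth, the two requests look like the same accented name written in two encodings: %e9 is the single byte for "é" in ISO-8859-1 (Latin-1), and %C3%A9 is the two-byte UTF-8 form of the same character. Here is a small sketch that decodes both and strips the accents to get back to the plain file name. The helper names `decode_request_path` and `strip_accents` are made up for this example, and trying UTF-8 first with a Latin-1 fallback is just one reasonable guess at how to handle both cases.

```python
import unicodedata
from urllib.parse import unquote_to_bytes

def decode_request_path(path):
    """Percent-decode a path, trying UTF-8 first and falling back to Latin-1."""
    raw = unquote_to_bytes(path)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")

def strip_accents(text):
    """Drop combining marks so accented letters compare equal to plain ones."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

fast_path = "/Marketing_Yourself_R%e9sum%e9.html"          # as requested by FAST
google_path = "/Marketing_Yourself_R%C3%A9sum%C3%A9.html"  # as requested by Googlebot

for requested in (fast_path, google_path):
    decoded = decode_request_path(requested)
    print(decoded, "->", strip_accents(decoded))
```

Both paths decode to the same accented name, and stripping the accents gives /Marketing_Yourself_Resume.html, which fits Jim's guess that someone linked to the page with accented e's. The same helper turns the /%22 request quoted earlier into a slash followed by a double quote, which suggests a stray quotation mark in somebody's link.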


I'd report the robos.txt request to Fast, especially if you see it again - it could have been a logging glitch, though.

Next time I see it Jim, I will.

On another note, Slurp keeps digging up file names I changed more than two years ago.

/BookM.html HTTP/1.0" 404 2140 "-" "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; [inktomi.com...]

I dunno what the deal is, but it sure does get a mite confusing. That's why I put Redirect permanent lines in my .htaccess file, so the bots/spiders and my viewers would be seamlessly redirected to the newer file names. <shrug>
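For anyone keeping a long list of renamed files, the Redirect permanent lines can be generated from a simple old-to-new mapping rather than typed by hand. This is only a sketch: the function name `redirect_lines`, the host www.example.com and the new file name are all invented for illustration, since the thread gives only the old name /BookM.html.

```python
def redirect_lines(renames, site="http://www.example.com"):
    """Build one mod_alias 'Redirect permanent' line per (old, new) path pair."""
    return [
        f"Redirect permanent {old} {site}{new}"
        for old, new in sorted(renames.items())
    ]

# Hypothetical mapping: only /BookM.html appears in the thread;
# the new file name is invented for this example.
renames = {"/BookM.html": "/Book_Marketing.html"}

for line in redirect_lines(renames):
    print(line)
```

The printed lines can then be pasted into .htaccess. Keeping the mapping in one place makes it easier to spot an old name that never got a redirect.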

Seems like the more I think I know, understand and/or make provisions for, the more convoluted my log files become as a result.

Maybe someday all these bots will get on the same page - figuratively speaking. :-)

Thanks again.

Pendanticist.

mvl22

11:12 pm on Dec 10, 2002 (gmt 0)

10+ Year Member



I've also had a lot of requests recently for

/robos.txt
and
/robotsxx.txt

but they appear to have stopped now.

See also my thread at
[webmasterworld.com...]

which appears related.