Google-bot gone wild? (Log full of "File does not exist" errors)

Anyone else seeing unusual G-bot behavior?

         

Meta_Vision

1:22 am on Sep 21, 2004 (gmt 0)

10+ Year Member



For the second time in a week I'm getting a log
full of "file not found" errors from Google-bot 2.1.

I haven't changed anything in the directories
Google-bot is "getting confused" in.

(I've seen other bots in the past generate
wrong file names -- and yes, I've read
about how bot algorithms sometimes
do this to "check for cheating" of some kind,
but I've never seen the Google-bot
go "this wild" before.)

Of course, I suspect if anyone else was
seeing this behavior, there'd be a comment
already ... but anyway. Just checking. {smile}
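
For anyone wanting to measure how "wild" the bot has gone, here is a minimal sketch that counts Googlebot requests ending in a 404. It assumes an Apache-style "combined" access log rather than the error log quoted in the thread title, and the sample lines (addresses, paths, timestamps) are invented for illustration only.

```python
import re
from collections import Counter

# Assumption: Apache "combined" access log format.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_404s(lines):
    """Count 404 responses per path for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404" and "Googlebot" in match.group("agent"):
            counts[match.group("path")] += 1
    return counts

# Invented sample data (192.0.2.x is a documentation-only address range).
sample_log = [
    '192.0.2.1 - - [21/Sep/2004:01:22:00 +0000] "GET /file1.html HTTP/1.0" 404 208 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [21/Sep/2004:01:23:00 +0000] "GET /file1.html HTTP/1.0" 404 208 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [21/Sep/2004:01:24:00 +0000] "GET /subdirectory/file1.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [21/Sep/2004:01:25:00 +0000] "GET /missing.html HTTP/1.0" 404 208 "-" "SomeOtherBot/1.0"',
]

counts = googlebot_404s(sample_log)
for path, hits in counts.most_common():
    print(hits, path)
```

Sorting by hit count makes it easier to spot whether the bad requests cluster around one directory or are spread across the site.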

jenkido

5:47 am on Sep 21, 2004 (gmt 0)



I recently got 2400 requests from Googlebot for relative Amazon urls.

e.g.:
mydomain.com/exec/obidos/tg/detail/-/B0002NY8GW

I can't figure out where Google is getting these links. I tried doing a search for the links in Google via the "link:" prefix and they're not in the index. Also tried with a plain text search, and still nothing.

Google replied to my email and said Googlebot crawls links it finds on other pages and to search Google to find the page(s) that are linking to me. They also mentioned that there's no way for them to provide the referrer. :(
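
One way such requests can arise, offered as an assumption rather than a diagnosis: if a page containing root-relative Amazon links is copied or framed under another host, the same href resolves against the new host. The sketch below uses Python's standard `urljoin`; both page URLs are hypothetical.

```python
from urllib.parse import urljoin

# The href as it might appear in Amazon's own markup (root-relative).
relative_href = "/exec/obidos/tg/detail/-/B0002NY8GW"

# Hypothetical page URLs, used only to show how the base changes the result.
amazon_page = "http://www.amazon.com/some/page.html"
copied_page = "http://mydomain.com/links.html"

on_amazon = urljoin(amazon_page, relative_href)
on_copy = urljoin(copied_page, relative_href)

print(on_amazon)
print(on_copy)
```

The second result has the same shape as the URLs in the log, which is why the linking page (wherever it lives) is the thing to hunt for.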

Meta_Vision

5:23 pm on Sep 21, 2004 (gmt 0)

10+ Year Member




Thanks. Mine's not a relative link problem,
but I think you may have pointed me in the right
direction. (Someone making a page
of links that pointed to my content in the
"confused directory" ... and they just happened
to mess up the path name)

Since there's no one else talking about
"Google bot gone wild" but us {smile}
that's probably what happened.

Thanks again.

BillyS

6:07 pm on Sep 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not seeing this behavior on my sites.

Josefu

6:54 pm on Sep 21, 2004 (gmt 0)

10+ Year Member



I'm getting this behaviour. Big time.

What Mr. Googlebot is doing is looking for a file from one directory in another - very odd. He's been doing this for three weeks now, but all the errors don't seem to be affecting my site's SERPs. I was going nuts for a while though, trying to find the 'mistake' in my coding that was causing Googlebot to do this.

renee

7:22 pm on Sep 21, 2004 (gmt 0)

10+ Year Member



I found a few cases where G gets confused trying to crawl items in scripts. For example, I have the following:

<a href=javascript:window.open('a'+'b')>text</a>

Googlebot then tries to crawl mydomain.com/a and mydomain.com/b.

I was using JavaScript precisely to prevent bots from crawling the URL 'ab'.
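
How Googlebot actually handles script hrefs is not known from this thread. The sketch below is only a guess at the kind of heuristic that would reproduce the requests described: pull every quoted string out of the script text and treat each one as a relative URL. The function name and base URL are made up for the example.

```python
import re
from urllib.parse import urljoin

# Matches single- or double-quoted string literals (no escape handling).
STRING_LITERAL = re.compile(r"'([^']*)'|\"([^\"]*)\"")

def naive_candidates(script, base):
    """Treat every quoted string in the script as a relative URL (hypothetical heuristic)."""
    return [urljoin(base, single or double)
            for single, double in STRING_LITERAL.findall(script)]

href = "javascript:window.open('a'+'b')"
candidates = naive_candidates(href, "http://mydomain.com/")
print(candidates)
```

A crawler working this way never evaluates the `+`, so it sees 'a' and 'b' as separate targets and never sees 'ab'.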

Macro

7:28 pm on Sep 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That js may be one issue. Any of you guys using Actinic? I have this problem too and some hints that Actinic may be behind the problem.

Josefu

8:05 pm on Sep 21, 2004 (gmt 0)

10+ Year Member



No Actinic here. In addition, my JavaScript-guided URLs use the <a href="URL.html" onClick="javascript:function(this.href);return false"> method.

Meta_Vision

11:04 pm on Sep 21, 2004 (gmt 0)

10+ Year Member



Josefu, yes, right file name in the wrong directory. And yes,
I too went through "everything" to see if I'd goofed. Nope.

So much "fun" running around screaming: "Oh God what have I done?
Forgive me, Google-bot, I didn't mean to confuse you"
Not kidding. {smile}

bakedjake

11:19 pm on Sep 21, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the Googlebot then tries to crawl mydomain.com/a and mydomain.com/b

heh heh heh... trying to auto-detect javascript cloaking.

Good work G. Try harder. ;-)

Meta_Vision

6:09 am on Sep 23, 2004 (gmt 0)

10+ Year Member



Quick follow-up

Another wave of "file not found" errors tonight.

No javascript linking on my site.
Nothing "sneaky" about the directory structure
(normal use of Unix structure organization)

WHERE STRUCTURE IS

.../domain/subdirectory/file1.html

.../domain/subdirectory/file2.html

GOOGLE-BOT IS INSTEAD LOOKING IN

.../domain/file1.html

.../domain/file2.html

While I do have a handful of domains on the
same server, Google-bot seems to be
not just looking for "something wrong"
with that ... but even under just one domain
Google-bot is leaving out subdirectory names
to generate the bad file paths. Surely Google
doesn't care how I organize a Unix file structure.

This baffles me. {smile} Not the first time,
but this is NEW -- getting a log full of junk
from Google-bot generating path errors.

abates

11:09 am on Sep 23, 2004 (gmt 0)

10+ Year Member



Meta_Vision: I had the same thing today. Also, Googlebot was requesting URLs for subdomains from the main www.domain.com. There's definitely something screwy going on there...

Trax

11:26 am on Sep 23, 2004 (gmt 0)

10+ Year Member



I don't have this on all of my sites.
Very strange.

Macro

12:24 pm on Sep 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got directories/files organised as
root>products>widgets>widget1.htm, widget2.htm etc and
root>products>gizmos>gizmo1.htm, gizmo2.htm etc

Google is looking for files like products/widgets/gizmos/widgets/widgets/widgets/gizmos/gizmos/gizmos/widget1.htm! and
products/widgets/products/widgets/products/widgets/products/widgets/gizmos/widget1.htm!

It ran up more notfound pages in one day than I get traffic all month. And it's done this several times in the past year.
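
To pull looping paths like these out of a long list of not-found requests, a rough filter is to flag any path in which a directory name occurs more than once. This is a heuristic sketch only: a site that legitimately reuses a directory name at two levels would be flagged too.

```python
def has_repeated_directory(path):
    """Return True if any directory segment appears more than once in the path."""
    segments = [part for part in path.split("/") if part]
    directories = segments[:-1]  # drop the file name itself
    return len(directories) != len(set(directories))

requested = [
    "products/widgets/widget1.htm",
    "products/widgets/gizmos/widgets/widgets/widgets/gizmos/gizmos/gizmos/widget1.htm",
    "products/widgets/products/widgets/products/widgets/products/widgets/gizmos/widget1.htm",
]

looping = [path for path in requested if has_repeated_directory(path)]
print(len(looping), "of", len(requested), "paths repeat a directory")
```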

billygg

12:45 pm on Sep 23, 2004 (gmt 0)

10+ Year Member



Wow, that's crazy stuff. This is a little off topic, but on topic. As of last month, the linking structure on my site was directory based, with the page in each directory being default.aspx. When I set up the links throughout my site, I didn't link to the specific page; instead, I would just link to the directory, so Google was picking up the default document for each folder without me specifying it. So basically an internal URL on my site would look like this... [index.com...] , never specifying a file name.

Well, Google loved it and ranked all my pages. After about 3 full months, it started dropping them all because I wasn't specifying a page name like default.aspx. Not sure if Google is checking each directory for a default page now or what. It kind of struck me as odd. I know Macromedia used the same type of linking structure, and when that happened to my site, Macromedia lost lots of internal PR as well. Maybe Googlebot is checking directories for types of files, and not finding them...

dirkz

5:15 pm on Sep 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[index.com...] is perfectly all right, never had problems with it.

Is your internal link structure consistent? Or do you link with .../default.aspx in some places?

What's up with external backlinks? Are you sure they all use the pure directory link?
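
A quick way to check that consistency question, given a list of links gathered from your own pages or from backlink reports, is to group them by directory and report any directory that is linked in more than one form. This is a minimal sketch: the example.com URLs are placeholders, and default.aspx is assumed to be the only default document.

```python
from collections import defaultdict

# Assumption: the only default document name in use on the site.
DEFAULT_DOCS = ("default.aspx",)

def canonical(url):
    """Strip a trailing default document so both link forms map to the directory URL."""
    for doc in DEFAULT_DOCS:
        if url.endswith("/" + doc):
            return url[: -len(doc)]
    return url

def mixed_link_forms(urls):
    """Return the directories that are linked in more than one form."""
    forms = defaultdict(set)
    for url in urls:
        forms[canonical(url)].add(url)
    return {key: sorted(seen) for key, seen in forms.items() if len(seen) > 1}

# Placeholder links for illustration.
links = [
    "http://example.com/widgets/",
    "http://example.com/widgets/default.aspx",
    "http://example.com/gizmos/",
]

mixed = mixed_link_forms(links)
print(mixed)
```

Any directory that shows up in the result is being linked both ways, so it may be treated as two separate URLs.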