Added %c0 in urls - problem in google or in my server?

   
2:06 pm on Aug 22, 2006 (gmt 0)

5+ Year Member



Two directories of my website have been missing from Google's cache since April 2006.
Some pages from these directories have now come back in Google search, but the address is strange.
Example:
- the correct address is: www.mywebsite.com/web/musica/povia.asp
- the address in Google search is:
www.mywebsite.com%c0/web/musica/povia.asp
What is the problem? Is it a Google problem or a problem with my web server?
None of the other directories on my website have this problem in Google search.

Thanks, and excuse me for my bad English, Franco

10:07 am on Aug 23, 2006 (gmt 0)

5+ Year Member



Sorry to reply to my own post, but today I made another discovery:

1) site:www.mysite.com/directory_disappeared --> no pages in cache

2) site:www.mysite.com%c0/directory_disappeared --> all pages in cache

Can anyone help me?

5:28 pm on Sep 6, 2006 (gmt 0)

5+ Year Member



Hi,

Same problem with my site.

www.mysite%c0.com is showing a cached page in Google,

but www.mysite.com isn't.

Please help us.

Thanks.

5:59 am on Sep 7, 2006 (gmt 0)

10+ Year Member




Just bumping this post; I find it interesting.

I think Google has serious basic issues at the moment. I wonder why this one happened.

6:16 am on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I was trying to research this -- to see how various servers handled the extra character after the domain extension. What I discovered is that different BROWSERS do different things:

1. IE removes the extra character, but apparently still looks for the bogus domain name because I get a 404
2. Firefox keeps the extra character, and of course I get a 404
3. Opera removes the extra character and then requests the corrected domain name, which resolves.

I am not sure why Google would keep these erroneous urls in their index, unless perhaps the server is resolving it anyway -- or perhaps googlebot's crawling code corrects the urls, in the manner that Opera does.

But where is the character coming from in the first place? Is it a glitch in Google? Competitive sabotage against the domain? A typo in an inbound link somewhere?

5:59 pm on Sep 7, 2006 (gmt 0)

5+ Year Member



I have decided to post the address of my website because I need your help.
In Google, type:

<Sorry, absolutely no personal urls, per Forum Charter [webmasterworld.com]>

With the strange command 1, all the pages that disappeared from the normal cache come back.
Before April 2006, the web section brought me about 10,000 visits from Google search; now it gets zero, and I use AdSense in this section...
This is a serious problem for me, as you can understand.
Thanks, Franco

[edited by: tedster at 6:00 am (utc) on Sep. 8, 2006]

6:06 pm on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That appears to be a Google display problem: the %c0 isn't in the actual URL, but it does get used in the attempt to locate the cache entry, which it then won't find.

I'd classify that as a Googlebug ...

5:47 am on Sep 8, 2006 (gmt 0)

5+ Year Member



I tried to contact the Google team about this "googlebug", but without success.
What can I do now?
Thanks

6:16 am on Sep 8, 2006 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I assume you have already looked into this, but I still think I'll say it. Do whatever you can to make sure that the bizarre url is not resolved by your server -- through any kind of misconfiguration of the server itself or through a problem with your DNS. If googlebot gets a 200 code returned along with content when it tries to spider the bad urls, that could make fixing things a lot more complicated.
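One way to make sure a bogus hostname never gets a 200 is to answer only for the hostnames you expect. The following is a minimal sketch of that idea as WSGI middleware in Python, assuming a Python-served site; `ALLOWED_HOSTS`, `host_allowlist`, `site` and `status_for` are hypothetical names, not anything from the sites discussed here. On other servers the same rule would normally live in the virtual-host configuration.

```python
# Sketch: refuse any request whose Host header is not an expected hostname.
# ALLOWED_HOSTS and every name below are illustrative assumptions.
ALLOWED_HOSTS = {"www.example.com", "example.com"}


def host_allowlist(app, allowed_hosts=ALLOWED_HOSTS):
    """Wrap a WSGI app so that unknown Host headers get a 404, never a 200."""

    def wrapped(environ, start_response):
        # HTTP_HOST may carry a port ("www.example.com:80"); compare the name only.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host not in allowed_hosts:
            body = b"Unknown host\n"
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return wrapped


def site(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(host):
    """Helper for trying the wrapper out: return the status line for a Host value."""
    seen = []
    app = host_allowlist(site)
    app({"HTTP_HOST": host}, lambda status, headers: seen.append(status))
    return seen[0]
```

With this in place, `status_for("www.example.com%c0")` yields a 404 status line while the real hostname still gets a 200, which is the behaviour the advice above is aiming for.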
9:26 am on Sep 8, 2006 (gmt 0)

5+ Year Member



The other search engines (Yahoo, MSN, ...) have no problem finding the correct web address.
Could it be a server misconfiguration that only affects Googlebot? If so, how can I verify that?

10:19 am on Sep 8, 2006 (gmt 0)

10+ Year Member



A similar error going around involves URLs that end with one or more underscores (e.g. domain.com/dir.html__). For some reason Googlebot does correct these and lists them, as it seems to get a 200 returned, although if you request them yourself you get a server error (404). I've actually added a line to my .htaccess that resolves and 301s those types of URLs, although I did find two larger spammers linking to me that way (not sure if it was on purpose).
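The actual .htaccess line is not shown in this post. As a rough sketch of the same idea in Python, the function below decides whether a requested path should be answered with a 301 to a cleaned-up version; `redirect_for` and `TRAILING_UNDERSCORES` are hypothetical names, and the rule (strip any run of underscores at the very end of the path) is an assumption about what that line does.

```python
import re

# One or more underscores at the very end of the path.
TRAILING_UNDERSCORES = re.compile(r"_+$")


def redirect_for(path):
    """Return (301, cleaned_path) if the path ends in underscores, else None."""
    cleaned = TRAILING_UNDERSCORES.sub("", path)
    if cleaned != path:
        return 301, cleaned
    # Nothing to strip: serve the path as requested.
    return None
```

Underscores inside the path are left alone, so a name like /my_page.html is not redirected.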
12:13 pm on Sep 8, 2006 (gmt 0)

5+ Year Member



Same problem for me.

I wrote up this description of the Googlebot bug on the 17th of August but held it back, as apparently I was the only one with the problem. Here is what I noted then; it is still happening regularly on our sites:

------access log-----------
66.249.72.1 - - [17/Aug/2006:06:39:47 +0000] "GET /any-folder/any-sub-folder/any-file-name.%20... HTTP/1.1" 404 1736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
------end access log-------

Essentially, it tries to spider the file

widgets/blue-ones/list.%20...

instead of the correct, existing URL that is already indexed on Google:

widgets/blueones/list.html

This is happening on several of our websites (at least 4-6 times every day since the middle of August).

Another case is attempting to fetch (get)

GET /index.html')

instead of the correct

GET /index.html

adding ') at the end of the string...

Of course we have never had such badly named files on our web servers, so Googlebot is working from "almost right" information about our URLs.

Please note that the wrong attempted URL is also reported in Google Sitemaps, in the Diagnostic / Web Crawl / Not Found section.

The way it happens, it looks to me as though Googlebot is taking the "abbreviated listing" of our actual URLs and using it in the spidering phase, which is crazy...

You could try

grep -F '%20...' accesslog.filename

from your unix shell (if you use unix/linux, of course).
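For anyone who wants to narrow the search to Googlebot requests that actually returned a 404, here is a small Python sketch. It assumes the log is in the Apache "combined" format, which the sample line quoted above appears to follow; `LOG_LINE`, `SAMPLE` and `googlebot_404s` are hypothetical names, and the match on the literal text `%20...` is a plain substring test.

```python
import re

# Apache "combined" log format (assumed; adjust if your server logs differently).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The access log line quoted earlier in this post, used as sample input.
SAMPLE = (
    '66.249.72.1 - - [17/Aug/2006:06:39:47 +0000] '
    '"GET /any-folder/any-sub-folder/any-file-name.%20... HTTP/1.1" '
    '404 1736 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)


def googlebot_404s(lines, needle="%20..."):
    """Yield paths that Googlebot requested, got a 404 for, and that contain needle."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a combined-format line; skip it
        if (m["status"] == "404"
                and "Googlebot" in m["agent"]
                and needle in m["path"]):
            yield m["path"]


# Usage against a real file (the file name is a placeholder):
#     with open("accesslog.filename") as f:
#         for path in googlebot_404s(f):
#             print(path)
```

Passing a different `needle`, such as `')`, covers the other malformed request described above.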

Thanks

12:18 pm on Sep 8, 2006 (gmt 0)

5+ Year Member



Interesting discussion, folks. I always thought it was just me having these issues.

I am seeing %C0 on one site but %5C on another. As far as I remember, %5C is an actual \ backslash.

I see the report in Sitemaps too.
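The two escapes can be checked with Python's standard library. The sketch below decodes an escape to its raw byte and reports how that byte reads in Latin-1 and whether it is valid UTF-8 on its own; `describe_escape` is a hypothetical helper name. It only shows what the bytes are, not where Google picks them up.

```python
from urllib.parse import unquote_to_bytes


def describe_escape(escape):
    """Return (raw bytes, Latin-1 reading, valid-as-UTF-8 flag) for a %XX escape."""
    raw = unquote_to_bytes(escape)
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return raw, raw.decode("latin-1"), valid_utf8
```

%5C comes out as byte 0x5C, the backslash, confirming the recollection above. %C0 comes out as byte 0xC0, which reads as "À" in Latin-1 and is not a valid UTF-8 sequence by itself.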

12:26 pm on Sep 8, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Those extra characters could be from a bad link on your site or someone else's site.

12:47 pm on Sep 8, 2006 (gmt 0)

5+ Year Member



Trinorthlighting,

Nope.

If someone else (even someone outside my own domains) had linked to me at

my-domain.com/invalid-path

I would have spotted it in the referral logs, which we carefully analyze every day, one by one, even if it was a 404 as in this case.

In the last 4 years, nobody but Googlebot has ever requested anything with those wrong characters in the URL...

Also, the mistake is too common and too new not to have left some trace in the access/error logs, yet the only traces are Googlebot's...

1:01 pm on Sep 8, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Ahh,

It could be coming from someone's browser via the Google Toolbar. That would support tedster's theory and make a lot of sense.

1:10 pm on Sep 8, 2006 (gmt 0)

5+ Year Member



Why the Google Toolbar?

This is what Google says:

"This page lists URLs from your site that Googlebot had trouble crawling."

This is not coming from the Google Toolbar: this is Googlebot trying to crawl that wrong URL.

Please explain.. thanks.

2:26 pm on Sep 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The bot works from a URL database; one of the sources of those URLs is, IIRC, the Google Toolbar. In other words, the toolbar phones home. I don't run the toolbar; such items are not installed on my systems.

The other sources are links on other pages, both from your site(s) and other site(s).

Mistakes in links happen, and when Google's crawling picks them up they get stored for later crawling.

The thing about the thread starter's site is that clicking on the page title in the SERPs returns the actual page, that is, the one without the %c0 in it. The %c0 is not in the link that Google has for the page. It is, however, shown in the green URL display, and it is also used in the link to Google's cache entry. Following the cache link results in the cache entry not being found; removing the %c0 from the cache link gets you to the cache copy. That cache copy is a bit messed up as well.

I use Mozilla as my browser, and when the %c0 is used in any of the links on the site I get the "www.example.com%c0 was not found" message, which IIRC comes from a DNS lookup failure.

In other words ya can't even get there using %c0 in that manner.
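A percent sign is not a legal character in a conventional hostname label (letters, digits and hyphens only), which is consistent with the lookup failing. As a rough illustration, here is a Python sketch of that syntax check; `LABEL` and `is_plausible_hostname` are hypothetical names, and the check covers hostname syntax only, not whether a name actually resolves.

```python
import re

# One DNS label in the usual "letters, digits, hyphen" form, 1-63 characters,
# not starting or ending with a hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_plausible_hostname(host):
    """True if every dot-separated label uses only letters, digits and hyphens."""
    if not host or len(host) > 253:
        return False
    return all(LABEL.fullmatch(label) for label in host.split("."))
```

Both www.example.com%c0 and a name with the escape in the middle, such as www.example%c0.com, fail this check, while www.example.com passes.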

Too many things to contemplate, too little time, and not enough real information.

[edited by: theBear at 2:30 pm (utc) on Sep. 8, 2006]

 
