google bot problem

         

indiandomain

6:48 am on Apr 13, 2003 (gmt 0)

10+ Year Member



i have a url /example_Limo.html
and when i check my logs the google bot shows this
64.68.82.41 - - [13/Apr/2003:02:45:23 -0400] "GET /Example%20Limo.html HTTP/1.0" 200 12810 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

does the bot replace "_" by a "%20" in the url?

what can i do to make the bot index the url the way it is
/example_Limo.html

[edited by: Brett_Tabke at 7:05 am (utc) on April 13, 2003]

Brett_Tabke

7:07 am on Apr 13, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



%20 is the URL encoding of hex 20 (decimal 32) on your ASCII chart, which is a space.
The underscore char is hex 5f. Two completely different characters.
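
To make the difference concrete, here is a minimal Python sketch using only the standard library. The variable names are made up for illustration. It decodes the path from the log line above and shows that an underscore never gets percent-encoded, so a crawler has no reason to turn "_" into "%20".

```python
from urllib.parse import quote, unquote

# Character codes: space is hex 20, underscore is hex 5f.
space_code = hex(ord(" "))
underscore_code = hex(ord("_"))
print(space_code, underscore_code)

# Path exactly as it appears in the Googlebot log line.
requested = "/Example%20Limo.html"
decoded = unquote(requested)
print(decoded)  # the %20 decodes to a literal space

# Encoding goes the other way: a space becomes %20,
# while an underscore is left untouched.
encoded_space = quote("/Example Limo.html")
encoded_underscore = quote("/Example_Limo.html")
print(encoded_space)
print(encoded_underscore)
```

So a request for /Example%20Limo.html means the crawler was handed a URL with a space in it, not one with an underscore.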

indiandomain

7:19 am on Apr 13, 2003 (gmt 0)

10+ Year Member



yes
but then why is the bot reading my url as
Example%20Limo.html instead of Example_Limo.html

will my url appear as Example%20Limo.html or Example_Limo.html in google after indexing?

vincevincevince

8:34 am on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



is the link it is `following` correct? sure you didn't make a mistake there?

indiandomain

8:51 am on Apr 13, 2003 (gmt 0)

10+ Year Member



my website link is site.com/Example_Limo.html

when i checked my log files it shows

64.68.82.50 - - [13/Apr/2003:04:49:44 -0400] "GET /Example%20Limo.html HTTP/1.0" 200 14060 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

my link has a '_' in it.
doesn't google identify this?

killroy

10:51 am on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you click on the link yourself, what does the log line look like?

SN

le_gber

11:25 am on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

isn't %20 the code for 'space'?

Leo

indiandomain

11:27 am on Apr 13, 2003 (gmt 0)

10+ Year Member



if i click the link on my site the address bar in the browser shows site.com/Example_Limo.html

its just that the bot is replacing _ by %20

vincevincevince

2:49 pm on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I didn't see that in my logs.

Mixing up "_" and " " is an easy mistake to make when coding, depending on the package, I'll admit... once it puts that annoying link underline on it, you can't tell `_` from ` `.

[widgetfinder.com...]

[can you see if that's a `_` or a ` ` between test and url above? i think not!]

jomaxx

10:59 pm on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has no problem with underscores. What's happening is that some link, somewhere on the Internet, maybe on your site maybe not, references that page with a space where the underscore should be.
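
One way to rule out your own pages is to scan their HTML for hrefs that contain a space. Below is a rough Python sketch of that check. `SpaceLinkFinder` is a hypothetical helper name and `sample_html` is a made-up stand-in for a real page, so treat it as an illustration rather than a finished tool. It obviously cannot find a bad link on somebody else's site.

```python
from html.parser import HTMLParser


class SpaceLinkFinder(HTMLParser):
    """Collect <a href> values containing a literal or encoded space."""

    def __init__(self):
        super().__init__()
        self.suspect = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value can be None for a bare attribute like <a href>
            if name == "href" and value and (" " in value or "%20" in value):
                self.suspect.append(value)


# Made-up sample page: one good link, one with a space.
sample_html = """
<a href="Example_Limo.html">good link</a>
<a href="Example Limo.html">link with a space</a>
"""

finder = SpaceLinkFinder()
finder.feed(sample_html)
print(finder.suspect)
```

Any href it reports is one a crawler would request with %20 in the path.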

pendanticist

11:14 pm on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"What's happening is that some link, somewhere on the Internet, maybe on your site maybe not, references that page with a space where the underscore should be."

...and eventually some bot or spider will come along and index it. After that, you'll play hell trying to get it rectified, because the bad link keeps getting perpetuated.

Just one of those little 'ole idiosyncrasies of the Internet.

Pendanticist.

killroy

11:43 pm on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's why I asked what it shows in the log if you click yourself (not what it says in the address bar)

SN

pendanticist

12:24 am on Apr 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Understood, killroy.

I think there are other factors involved here, but I don't know what they are. Perhaps something to do with browsers or servers, I can't say. All I know is that once links like that get into the system and one clicks on them, whatever force caused the anomaly in the first place may work the same way in reverse, thus adding to that perpetuation.

In the past, I've had to ask those who were linking to me to fix the messy url they had listed, because it wasn't exactly what I had set up, and to reduce this very issue.

Ex: MSN-WebTV's ReadOnly browser almost always puts a '?' at the end of my root url.

www.blahblah.com/?

I can click those links and the '?' will remain in my address bar when you would think it would set off a 404 because that link is not a part of anything I've ever published.
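
The reason there is no 404 is that a "?" only starts the query string, and here the query string is empty, so the server still sees the same path. A small Python sketch of that, using the example url from this post (the variable names are just for illustration):

```python
from urllib.parse import urlsplit

plain = urlsplit("http://www.blahblah.com/")
with_q = urlsplit("http://www.blahblah.com/?")

# Both urls have the same path; the trailing "?" only adds an empty query.
print(repr(plain.path), repr(plain.query))
print(repr(with_q.path), repr(with_q.query))
```

Both lines print the same path and an empty query, which is why the root page is served either way.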

So, in the scheme of things and to answer indiandomain's question, albeit indirectly: I don't think Google has anything to do with this, other than they end up perpetuating the problem.

So, what's a Webmaster to do, create a White List of acceptable URLs to prevent this? Wow, the concept is staggering.

Or, is it? Hmmmmm... A White List of acceptable urls...
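
For what it's worth, the white list idea could look something like the Python sketch below. Everything in it is an assumption for illustration: the `ALLOWED` set, the `resolve` function and the choice to answer space/underscore mix-ups with a 301 redirect are not taken from any real server setup.

```python
from urllib.parse import unquote

# Hypothetical white list: the only paths this site actually publishes.
ALLOWED = {"/", "/Example_Limo.html"}


def resolve(requested_path):
    """Return (status, path) for a requested path.

    200 for a listed url, 301 to the listed url when the request only
    differs by spaces instead of underscores, otherwise 404.
    """
    path = unquote(requested_path)  # turn %20 back into a space
    if path in ALLOWED:
        return 200, path
    candidate = path.replace(" ", "_")
    if candidate in ALLOWED:
        return 301, candidate
    return 404, None


print(resolve("/Example_Limo.html"))
print(resolve("/Example%20Limo.html"))
print(resolve("/no_such_page.html"))
```

A permanent redirect has the advantage that a bot following the bad link is pointed at the one url you actually want indexed.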

Bots mess things up on the Grand Scale and we have to use the Minuscule Scale to correct them.

Pendanticist.

indiandomain

3:51 am on Apr 14, 2003 (gmt 0)

10+ Year Member



"That's why I asked what it shows in the log if you click yourself (not what it says in the address bar) "

when i click on the link my logs show

64.88.163.46 - - [13/Apr/2003:23:48:41 -0400] "GET /example_Limo.html HTTP/1.1" 200 11237 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; YComp 5.0.2.6)"

it's only the google bot that uses a %20 instead.
even the inktomi bot requests the underscore version:

66.196.72.71 - - [13/Apr/2003:21:28:36 -0400] "GET /example_Limo.html HTTP/1.0" 200 11818 "-" "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; [inktomi.com...]

really funny situation guys....my html clearly shows example_Limo.html so its confusing why google is using a %20

any google expert please advise..
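
For anyone comparing log entries like the ones in this thread, here is a small Python sketch that pulls out the user agent, the decoded path and the referer from each line. It assumes Apache combined log format, which is what the lines quoted here look like. `LOG_RE` and `summarize` are illustrative names, not part of any library.

```python
import re
from urllib.parse import unquote

# Combined log format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Two log lines quoted earlier in this thread.
lines = [
    '64.68.82.41 - - [13/Apr/2003:02:45:23 -0400] "GET /Example%20Limo.html HTTP/1.0" 200 12810 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.88.163.46 - - [13/Apr/2003:23:48:41 -0400] "GET /example_Limo.html HTTP/1.1" 200 11237 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; YComp 5.0.2.6)"',
]


def summarize(line):
    """Return (agent name, decoded path, referer), or None if no match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    agent_name = m["agent"].split("/")[0]
    return agent_name, unquote(m["path"]), m["referer"]


for line in lines:
    print(summarize(line))
```

Note that the referer field in the Googlebot line is "-", meaning no referer was sent, so the log by itself cannot say which page carried the link with the space.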