Does the bot replace "_" with "%20" in the URL?
What can I do to make the bot index the URL the way it is?
/example_Limo.html
[edited by: Brett_Tabke at 7:05 am (utc) on April 13, 2003]
I'll admit the "_" versus " " mix-up is an easy mistake to make when coding, depending on the package: once a link gets that annoying underline, you can't tell `_` from ` `.
[widgetfinder.com...]
[Can you see whether that's a `_` or a ` ` between "test" and "url" above? I think not!]
What's happening is that some link, somewhere on the Internet, maybe on your site maybe not, references that page with a space where the underscore should be.
...and eventually some bot or spider will come along and index it. After that, you'll play hell getting it rectified, because the bad URL keeps getting perpetuated.
Just one of those little 'ole idiosyncrasies of the Internet.
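To make the mechanism concrete, here is a minimal sketch of what percent-encoding does to the two spellings. The second path (with a space) is a hypothetical mistyped link, not something taken from anyone's actual site, and `encode_path` is just an illustrative helper name. An underscore never needs encoding, so a `%20` in a request can only have come from a space.

```python
from urllib.parse import quote, unquote

good_path = "/example_Limo.html"  # the path as published
bad_path = "/example Limo.html"   # hypothetical mistyped link containing a space


def encode_path(path: str) -> str:
    """Percent-encode a path roughly the way a client does before requesting it."""
    return quote(path, safe="/")


print(encode_path(good_path))  # /example_Limo.html  (underscore is left alone)
print(encode_path(bad_path))   # /example%20Limo.html (space becomes %20)

# Decoding goes the other way: %20 turns back into a space, never an underscore.
print(unquote("/example%20Limo.html"))  # /example Limo.html
```

So if a bot is asking for the `%20` version, the likeliest explanation is that it found a link written with a space somewhere.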
Pendanticist.
I think there are other factors involved here, but I don't know what they are. Perhaps it has something to do with browsers or servers; I can't say. All I know is that once links like that get into the system and someone clicks on them, whatever caused the anomaly in the first place may work the same way in reverse, adding to that perpetuation.
In the past, I've had to ask those linking to me to fix the messy URL they had listed, both because it wasn't exactly what I'd set up and to reduce this very issue.
Ex: MSN-WebTV's ReadOnly browser almost always puts a '?' at the end of my root url.
www.blahblah.com/?
I can click those links and the '?' stays in my address bar, when you would think it would set off a 404, because that link isn't part of anything I've ever published.
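One plausible reason there's no 404: a bare '?' only starts an empty query string, and most servers pick the file to serve from the path alone. A quick sketch with Python's standard URL parser shows both forms split into the same path. The `describe` helper is just a name made up for this example.

```python
from urllib.parse import urlsplit


def describe(url: str):
    """Return the (path, query) pair a server would see for this URL."""
    parts = urlsplit(url)
    return parts.path, parts.query


print(describe("http://www.blahblah.com/"))   # ('/', '')
print(describe("http://www.blahblah.com/?"))  # ('/', '')
```

Same path, same (empty) query, so the server has no reason to treat the second one as a different page, even though it is a different string as far as links and indexes go.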
So, in the scheme of things, and to answer indiandomain's question, albeit indirectly: I don't think Google has anything to do with this, other than that they end up perpetuating the problem.
So, what's a Webmaster to do, create a White List of acceptable URLs to prevent this? Wow, the concept is staggering.
Or, is it? Hmmmmm... A White List of acceptable urls...
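For what it's worth, a rough sketch of how such a White List might look. Everything here is hypothetical: the `ALLOWED_PATHS` set, the `resolve` function, and the choice to 301 a space variant to its underscore twin are assumptions for illustration, not a description of any real server feature.

```python
from urllib.parse import unquote

# Hypothetical white list: the only paths this site actually publishes.
ALLOWED_PATHS = {"/", "/example_Limo.html"}


def resolve(raw_path: str):
    """Return (status, path) for a requested path.

    200 if the decoded path is on the white list,
    301 to the underscore form if a space/%20 variant matches a listed path,
    404 for everything else.
    """
    path = unquote(raw_path)  # "%20" -> " "
    if path in ALLOWED_PATHS:
        return 200, path
    candidate = path.replace(" ", "_")
    if candidate in ALLOWED_PATHS:
        return 301, candidate  # send bots and visitors to the real URL
    return 404, path


print(resolve("/example_Limo.html"))
print(resolve("/example%20Limo.html"))
print(resolve("/no_such_page.html"))
```

A permanent redirect, rather than quietly serving the page, at least tells the bot which URL is the real one.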
Bots mess things up on the Grand Scale and we have to use the Minuscule Scale to correct them.
Pendanticist.
When I click on the link, my logs show:
64.88.163.46 - - [13/Apr/2003:23:48:41 -0400] "GET /example_Limo.html HTTP/1.1" 200 11237 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; YComp 5.0.2.6)"
It's only the Google bot that uses a %20 instead.
Even the Inktomi bot shows this:
66.196.72.71 - - [13/Apr/2003:21:28:36 -0400] "GET /example_Limo.html HTTP/1.0" 200 11818 "-" "Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; [inktomi.com...]
Really funny situation, guys... my HTML clearly shows example_Limo.html, so it's confusing why Google is using a %20.
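One way to narrow this down would be to pull every request whose path contains %20 out of the access log and look at the status code and user agent. Below is a minimal sketch, assuming the log is in Apache combined format like the lines above. The first sample line is the browser hit from above; the second is a made-up line (placeholder IP, status, and user agent) standing in for a %20 request, since no such line has been posted here.

```python
import re

# Apache combined log format:
# host ident user [time] "method path protocol" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def requests_with_encoded_space(lines):
    """Return (host, path, status, agent) for each request whose path has %20."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and "%20" in m.group("path"):
            hits.append(
                (m.group("host"), m.group("path"), m.group("status"), m.group("agent"))
            )
    return hits


sample = [
    # Browser request copied from the log excerpt above.
    '64.88.163.46 - - [13/Apr/2003:23:48:41 -0400] "GET /example_Limo.html HTTP/1.1" '
    '200 11237 "http://example.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; YComp 5.0.2.6)"',
    # Hypothetical line: placeholder IP, status, and agent for illustration only.
    '192.0.2.1 - - [13/Apr/2003:22:00:00 -0400] "GET /example%20Limo.html HTTP/1.0" '
    '404 312 "-" "ExampleBot/1.0"',
]

for hit in requests_with_encoded_space(sample):
    print(hit)
```

On a real log you would pass the open file instead of `sample`; the referer field of any hits may point at the page carrying the bad link.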
Any Google expert, please advise.