
Google News Archive Forum

    
How could this have happened?
Hexadecimal %20 in my URL
Snookered




msg:116980
 12:52 am on Jun 23, 2004 (gmt 0)

Hi,

I've discovered that in the Google index there is a second listing for my site:

[%20bluewidgets.com...]

Of course, if clicked it leads to a "The page cannot be displayed" error. I don't understand how it got there; my main page is .shtml and I don't have another. This might help to explain why Google hasn't indexed pages that I've had up since February. Has this happened to anyone else?

Thanks

 

ciml




msg:116981
 5:00 pm on Jun 23, 2004 (gmt 0)

I haven't seen that for a long time. This used to happen in Google if you had wildcard DNS and hosting that served the content regardless of the domain used to access it. Then, when someone links to that address (with the %20 in the domain) Google would be served the content and merge it with the valid URL for the page.

It's easy enough to link to someone that way by accident using DTP-style HTML authoring software, but ideally the search engine will notice that the domain is wrong (in HTTP terms) and ignore it.

john_k




msg:116982
 5:01 pm on Jun 23, 2004 (gmt 0)

I believe that %20 is the encoding for a space. So look for someone linking to your site with an erroneous space before the www.
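As an illustration of the point above, here is a minimal Python sketch (not from the thread) showing that %20 decodes to a space, using the host string from the first post. The helper name clean_host is hypothetical and only serves the example.

```python
from urllib.parse import unquote


def clean_host(raw_host):
    """Percent-decode a host string and drop stray surrounding whitespace."""
    return unquote(raw_host).strip()


listed = "%20bluewidgets.com"      # host as shown in the first post
print(repr(unquote(listed)))       # ' bluewidgets.com' (note the leading space)
print(repr(clean_host(listed)))    # 'bluewidgets.com'
```

A host name that begins with a space is not valid, which is why the listing cannot resolve to a normal page unless the server answers for any name.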

sublime1




msg:116983
 7:15 pm on Jun 23, 2004 (gmt 0)

We have recently seen lots of incorrectly encoded URLs for our sites showing up in the index (that is, in the "site:mycompany.com" results). I think they are alarming, and they also seem to be persistent.

When I discovered this, I put in rewrite rules to do a 301 redirect from the junk URLs to the correct ones.

What is surprising is that they got put in the index at all, since they would have resulted in a 404 error prior to my fixup -- it looks like the Google crawler found a string starting with http:// and grabbed it, even though this was a fragment of another URL that would have worked. Worse, I think they also got a session id that is tacked onto the end of the URL so we have multiple instances of the same page showing up. Doh!

In our case, I believe these links all came from the "pseudo-directories" which were harvesting queries for popular keywords from Yahoo and other sources of paid listings. Because we add tracking codes for these, we can see where the URL originally came from.

trimmer80




msg:116984
 12:30 am on Jun 24, 2004 (gmt 0)

"since they would have resulted in a 404 error prior to my fixup"

So you don't have wildcard DNS enabled? This usually only occurs if you do. In fact, a 301 redirect would benefit you, as you can still obtain PR from sites that do this type of misspelling.

g1smd




msg:116985
 9:10 pm on Jun 24, 2004 (gmt 0)

Google indexes every link that it finds on every page that it crawls. It might not show those pages in the SERPs, but it does record that the link exists (note: not the place that the link points to). There are a great many such typos indexed already.

Look at this search [google.com] that lists malformed links to PDF files for example (final trailing "/" is an error).

my3cents




msg:116986
 5:52 am on Jun 25, 2004 (gmt 0)

I have the same problem with incorrect URLs being indexed. When they result in a 404, Google is showing the title and description of my default 404 page.

Also, it looks like it is treating my home page as duplicate content for:

[mysite.com...]
www.mysite.com
www.mysite.com/index.html
www.mysite.com/?trackingurl
www.mysite.com/%1Fdynamicstringfromsomesearchengine

g1smd




msg:116987
 3:36 pm on Jun 25, 2004 (gmt 0)

Yep. Those are all different pages as far as Google is concerned.

A 301 redirect from domain.com to www.domain.com will take care of one problem.

For links to the index page just use www.domain.com/ without mentioning the exact filename. That may take care of another problem (but you also need to get all other sites that link to you to change to that format).
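The two suggestions above (redirect the bare domain to the www host, and refer to the index page as "/") can be sketched as one normalization step. This is a minimal Python sketch under stated assumptions, not a server configuration: the function name canonical_url, the CANONICAL_HOST constant, and the set of index file names are all hypothetical, and domain.com is the placeholder used in the post.

```python
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "domain.com"                 # placeholder host from the post
CANONICAL_HOST = "www.domain.com"        # assumed preferred host
INDEX_FILES = {"/index.html", "/index.shtml"}  # assumed default documents


def canonical_url(url):
    """Return the URL a 301 should point to, or None if already canonical."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    if host == BARE_HOST:
        host = CANONICAL_HOST
    if path in INDEX_FILES:
        path = "/"
    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target


print(canonical_url("http://domain.com/"))
print(canonical_url("http://www.domain.com/index.html"))
print(canonical_url("http://www.domain.com/"))
```

The sketch deliberately leaves query strings alone, so a URL such as www.domain.com/?trackingurl would still count as a separate address.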

my3cents




msg:116988
 3:59 pm on Jun 25, 2004 (gmt 0)

g1smd,

I don't see how I can write redirects for all of the tracking URLs and dynamic strings that are being generated from other search engines. I pointed out a few obvious ones; a few I can 301, but there are several more I don't know what to do with. By the time G indexes some dynamic URL it's too late for me, and it seems like I am getting a duplicate content penalty. The dynamic strings are not from our website (we are completely static); they look like results from another search engine or directory.

sublime1




msg:116989
 4:01 pm on Jun 25, 2004 (gmt 0)

g1smd -- thanks for the clarification on what shows up in the "site:" query. From that I feel better.

my3cents -- you say:
I have the same problem with incorrect urls being indexed, when they result in a 404, google is showing title and description of my default 404 page.
Also, it looks like it is treating my home page as duplicate content for:

[mysite.com...]
www.mysite.com
www.mysite.com/index.html
www.mysite.com/?trackingurl
www.mysite.com/%1Fdynamicstringfromsomesearchengine

Is your server returning a 404 error for the 404 page (many sites send back error pages with a 200 status)?

If your page is correctly returning a 404 error, then this would be a bizarre situation, since it implies not only that Google is finding bogus URLs (which is natural), but that after it retrieves them it fails to notice the 404 (Not Found) response and goes ahead and indexes the page.

Make sure your bad pages are returning the 404 response! And if they are, please let us know, as this could explain a lot of things :-)
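One way to run the check described above is to request a nonexistent path and look at the raw status code without following redirects. The following is a minimal Python sketch, assuming a plain HTTP server on port 80; the function names fetch_status and describe are hypothetical, and the wording of the returned messages is only illustrative.

```python
import http.client


def fetch_status(host, path, port=80, timeout=10):
    """Request `path` without following redirects.

    Returns a (status code, Location header or None) tuple.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location=None):
    """Summarize what a crawler is being told by a given response."""
    if status == 404:
        return "real 404: the crawler is told the page does not exist"
    if status in (301, 302, 303, 307, 308):
        return f"redirect to {location}: the original error status is not sent"
    if status == 200:
        return "200 OK: an error page served this way looks like a normal page"
    return f"other status {status}"


# Example call (hypothetical host and path, needs network access):
# print(describe(*fetch_status("www.mysite.com", "/no-such-page")))
```

http.client does not follow redirects on its own, so a 301 or 302 shows up as such rather than being hidden behind the final page.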

my3cents




msg:116990
 4:04 pm on Jun 25, 2004 (gmt 0)

I am going to check into this further

my3cents




msg:116991
 4:14 pm on Jun 25, 2004 (gmt 0)

OK, I have a .htaccess file that says:

ErrorDocument 404 [mysite.com...]

If you go to [mysite.com...]

you get the errorpage.shtml page.

The incorrect URLs are being indexed with the title and description snippet from errorpage.shtml.

Also, I am calling my admin to see if there is anything he has done that could be causing a problem or if he has any suggestions.
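A possible explanation worth checking: per the Apache documentation, when the ErrorDocument target is a full URL (one with a scheme such as http in front of it), the server sends the client a redirect instead of the original error status, so the crawler never sees a 404. The directive in the post is truncated, so whether that applies here cannot be confirmed from the thread. The Python sketch below only classifies a directive line; the function name error_document_mode and the example targets are hypothetical.

```python
def error_document_mode(directive):
    """Classify the target of an Apache ErrorDocument directive line.

    Returns "redirect" for a full URL, "local" for a path starting with "/",
    and "message" for anything else (treated here as literal text).
    """
    parts = directive.split(None, 2)
    if len(parts) < 3 or parts[0].lower() != "errordocument":
        raise ValueError("not an ErrorDocument directive: %r" % directive)
    target = parts[2].strip()
    if target.lower().startswith(("http://", "https://")):
        return "redirect"   # client gets a redirect, not the error status
    if target.startswith("/"):
        return "local"      # served internally, error status preserved
    return "message"


# Hypothetical examples; the real target in the post is elided.
print(error_document_mode("ErrorDocument 404 http://www.example.com/errorpage.shtml"))
print(error_document_mode("ErrorDocument 404 /errorpage.shtml"))
```

If the real directive falls in the "redirect" case, switching to a local path such as /errorpage.shtml is the usual way to keep the 404 status on the response.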


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved