Strange URL error (%20 ) in SERP (dominic & esmeralda)

http://www.mydomain.com/%20 should be http://www.mydomain.com/

         

rattlesnakedriver

9:41 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



Over the last two indexes I have noticed an error in a URL Google is returning for my site. For a few important search terms, the page that should be returned as [mydomain.com...] is being returned as [mydomain.com...] in the SERPs. This is the index page of my site. Something else has an impact here: although [mydomain.com...] does not exist, our site is set up to redirect to the index page if a wrong URL is typed in (for example http://www.mydomain.com/duh.html), so it will never return a 404 error.

So to lay it out step by step:

1) When I do a search for "keywords x", the URL for our index page gets returned with "%20" tacked on to the end of it in the SERP.
2) If you click on it, it goes to the homepage of our site, but only after being redirected by a page-not-found redirect.
3) Basically www.mydomain.com/%20 does not exist and we would like it to be changed to just www.mydomain.com/. I see that there has been a new update, and for the 2nd month in a row it is happening (we waited for a new indexing because we thought it would automatically get fixed, but since it didn't I thought it was time to speak up about it). This is affecting our page ranking and we would like to know if anything can be done to fix it.

This does not happen for all searches that send people to our site - If we do a search for "keywords y" the result that comes back is correct [mydomain.com...] (the correct URL).

I would be happy to email anyone the specifics of this to get it resolved, as it is having a major effect that I don't think will go away automatically anytime soon.


buckworks

4:51 pm on Jun 25, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Well, the error is back ...

Things seem normal with the internal pages that I checked, but Google is listing the index page as mysite.com/%20 again, and mysite.com/ has sunk out of sight.

Puzzling ...

rattlesnakedriver

6:39 pm on Jun 25, 2003 (gmt 0)

10+ Year Member



My index url is still fluctuating between mydomain.com and mydomain.com/%20.

As my site is a yahoo store, I've figured out that any rogue backlink out there that points to a page that doesn't exist in my site is inappropriately redirected to a subdirectory homepage at mydomain.com/mydomain instead of using proper header protocol.

The most frustrating thing about this scenario is yahoo shows a duplicate homepage of my site at mydomain.com/mydomain. So when google finds the rogue link it follows it to this mirror index page and then starts indexing my inner pages as mydomain.com/mydomain/innerpage.html! Now I have several duplicate pages of content at both mydomain.com/mydomain/innerpage.html and mydomain.com/innerpage.html (the url it is supposed to be).

So two questions. First, could mydomain.com/%20 be google's way of showing mydomain.com/mydomain in the SERPs? When I check all my pages indexed in google, it shows mydomain.com/%20 instead of mydomain.com/mydomain, while all the other inner pages are shown as mydomain.com/mydomain/innerpage.html.

Secondly, is google seeing all the pages with duplicate content and filtering them with some sort of penalty?

hetzeld

7:11 pm on Jun 25, 2003 (gmt 0)

10+ Year Member



Buckworks,

Your ErrorDocument line is faulty. If you use a full URL (starting with [...)...] Apache redirects the client to the error page, so the spider ends up with a 200 header instead of a 404.

You should use:
ErrorDocument 404 /myerrorpage.html

as this will return the right 404 header, as it should. ;)

Dan
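
A minimal sketch of the difference, for anyone comparing the two forms side by side. The host name www.example.com is a placeholder, not anything from this thread.

```apache
# Local path: Apache serves this page itself and the response keeps
# its 404 status, which is what spiders need to see.
ErrorDocument 404 /myerrorpage.html

# Full URL (shown commented out): Apache instead sends a redirect to that
# URL, so the client never receives the original 404 status.
# ErrorDocument 404 http://www.example.com/myerrorpage.html
```

The second line is left commented out purely for contrast.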

hetzeld

7:17 pm on Jun 25, 2003 (gmt 0)

10+ Year Member



Rattlesnakedriver,

if you are allowed to edit the .htaccess file (and if your server is running Apache), a single line will sort this out for spiders and visitors.

# note: the pattern is tested against the decoded path, so %20 is written as \s
RewriteRule ^\s$ [yourdomain.tld...] [R=301,L]

This will tell the spiders that the page named %20 has been permanently moved and send the according 301 header.
It will also redirect your visitors to the right page.

Dan

PS: make sure to use an external redirect (starting with [...)...]
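
For reference, a fuller sketch of that rule as it might sit in a .htaccess file at the site root. It assumes mod_rewrite is enabled, uses www.example.com as a placeholder host, and has not been tested against any site in this thread. One thing worth knowing: mod_rewrite tests the pattern against the decoded URL-path, so a request for /%20 reaches the rule as a single space, which is why the pattern uses \s rather than a literal %20.

```apache
RewriteEngine On

# In .htaccess the leading slash is stripped before matching, so a request
# for /%20 is seen here as one space character. \s+ also covers
# a run of several spaces.
RewriteRule ^\s+$ http://www.example.com/ [R=301,L]
```

The absolute URL in the substitution together with R=301 makes this an external, permanent redirect, so spiders and browsers both see the new address.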

buckworks

7:46 pm on Jun 25, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hetzeld, thanks!

I'm also going to add the RewriteRule you suggested, even though the site that had the rogue links has corrected them. Fortunately it's not common for someone to capture an extra space when they copy and paste a URL, but it's a foreseeable circumstance and having a fix in place Just In Case seems like a good idea.

Now if you'll excuse me, I'm off to edit some .htaccess files ... for several sites!

rattlesnakedriver

7:49 pm on Jun 25, 2003 (gmt 0)

10+ Year Member



Thanks much Hetzeld.

I have contacted yahoo a couple of times about this but unfortunately yahoo has been uncooperative with the issue.

I can't just drop them and move on at this point though because much sweat equity has gone into building the site.

I am currently trying to figure out a client-side script that would live in the page template. The script would check what the page URL is, and if it is something other than the root directory (such as mydomain.com/mydomain/) then the page would get redirected to the proper index page at mydomain.com.

Once this was up and working I could then try and get google to manually remove all the existing mydomain.com/mydomain/innerpage.html pages.

However my hopes are big and ability level limited at this point. Anyone know if this type of script exists?

hetzeld

8:01 pm on Jun 25, 2003 (gmt 0)

10+ Year Member



Buckworks and Rattlesnakedriver,

You're welcome!

In fact, a more general rule could be written:

RewriteRule (.*)\s(.*) [domain.tld...] [R=301,L]

In plain English, it means you permanently redirect any string starting with, containing or ending with a %20 to the same string without the %20.
The pattern is tested against the decoded path, so the %20 is written as \s (a whitespace character).
(.*) means any string, even the null string.

Dan
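
A hedged sketch of the general version, again with www.example.com as a placeholder and assuming a .htaccess file at the site root. $1 and $2 are the back-references to the two (.*) groups.

```apache
RewriteEngine On

# Match any path containing a whitespace character (a decoded %20) and
# redirect to the same path with that character removed.
# Because the first (.*) is greedy, each redirect drops only the last
# whitespace character; a URL with several takes several hops.
RewriteRule ^(.*)\s(.*)$ http://www.example.com/$1$2 [R=301,L]
```

Note that this also rewrites URLs where a space is legitimate (a file name containing one, for example), so it only suits sites that never use spaces in paths.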

hetzeld

8:11 pm on Jun 25, 2003 (gmt 0)

10+ Year Member



Rattlesnakedriver,

If I get you right, you want to move all pages in the /mydir subdirectory to the root of your web?
If this is the case, it is quite easy with mod_rewrite as well.

RewriteRule mydir/(.*) [yourdomain.tld...] [R=301,L]

This will redirect all pages in the /mydir directory to the same page in the root directory, sending a "301 - moved permanently" header.
This header tells the spiders to update their index entry.

Dan
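
Sketched in full, with www.example.com and mydir as placeholders and assuming the .htaccess file sits at the site root:

```apache
RewriteEngine On

# ^ anchors the match so only paths that begin with mydir/ are redirected;
# $1 carries the rest of the path over to the root.
RewriteRule ^mydir/(.*)$ http://www.example.com/$1 [R=301,L]
```

A request for /mydir/innerpage.html would then be answered with a 301 pointing at http://www.example.com/innerpage.html.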

SlyOldDog

8:21 pm on Jun 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The strange thing is that the %20 page returning the Yahoo standard "Page not found" server page didn't trip the duplicate content filter. There must be zillions of identical error pages out there.

rattlesnakedriver

8:50 pm on Jun 25, 2003 (gmt 0)

10+ Year Member



Thankfully it hasn't tripped the google duplicate content filter to the point where these pages are completely banned, otherwise I'd be totally out of business. However, I can tell you that there are millions of duplicate pages in google's index because yahoo hasn't felt the need to change their page error rewrite to something that will eliminate this.

I encourage other yahoo store users to contact yahoo about this so they can see just how vast this problem is.

SlyOldDog

9:20 pm on Jun 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



But I don't really understand what you're worried about? Your index page won't get banned because it has unique content.

The %20 page should get dropped by Google because it will deliver a standard yahoo 404 page. But that's what you want isn't it?

If you're worried that you're losing customers who find the %20 page, you can follow the advice on this thread, but I can't imagine this page being top on one keyword search is going to bring down your business?

g1smd

12:20 am on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've seen a number of sites with a similar problem due to hasty cut and paste. Good idea to fix it.

buckworks

1:26 am on Jun 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



<<I can't imagine this page being top on one keyword search is going to bring down your business?>>

SlyOldDog, it's not the rogue URL being on top for a search or two that's the problem, it's that the "real" URL drops out of sight from other SERPs where it was ranking well.
