So to lay it out step by step:
1) When I search for "keywords x", the URL for our index page is returned in the SERP with "%20" tacked on the end.
2) Clicking it does reach our homepage, but only after a page-not-found redirect.
3) Basically, www.mydomain.com/%20 does not exist, and we would like it listed as just www.mydomain.com/. I see there has been a new update, and this is happening for the second month in a row (we waited for a new indexing because we thought it would get fixed automatically, but since it didn't, I thought it was time to speak up). This is affecting our page ranking, and we would like to know if anything can be done to fix it.
This does not happen for every search that sends people to our site: if we search for "keywords y", the result that comes back is correct, [mydomain.com...] (the correct URL).
I would be happy to email anyone the specifics to get this resolved, as it is having a major effect that I don't think will go away on its own anytime soon.
As my site is a Yahoo store, I've figured out that any rogue backlink pointing to a page that doesn't exist on my site is redirected to a subdirectory homepage at mydomain.com/mydomain instead of getting a proper 404 header.
The most frustrating thing about this scenario is that Yahoo serves a duplicate homepage of my site at mydomain.com/mydomain. So when Google finds a rogue link, it follows it to this mirror index page and then starts indexing my inner pages as mydomain.com/mydomain/innerpage.html! Now I have many duplicate pages, with the same content at both mydomain.com/mydomain/innerpage.html and mydomain.com/innerpage.html (the URL it is supposed to be).
So, two questions. First, could mydomain.com/%20 be Google's way of showing mydomain.com/mydomain in the SERPs? When I check all my pages indexed in Google, it lists mydomain.com/%20 instead of mydomain.com/mydomain, while all the other inner pages are shown as mydomain.com/mydomain/innerpage.html.
Second, is Google seeing all the pages with duplicate content and filtering them with some sort of penalty?
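Your ErrorDocument line is faulty: if you give it a full URL (starting with [...]), Apache sends the visitor a redirect to that page instead of a 404, and the error page itself then answers with a 200, so spiders never see a 404.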
You should use:
ErrorDocument 404 /myerrorpage.html
as this returns the proper 404 header. ;)
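To see which status a URL really sends (rather than the page the browser ends up showing), here is a minimal Python sketch using the standard http.client module, which does not follow redirects. The helper name raw_status is made up for illustration; it sends a HEAD request and reports the status plus any Location header.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def raw_status(url: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location header) for a HEAD request, without following redirects."""
    parts = urlsplit(url)
    # Pick plain HTTP or HTTPS based on the URL scheme.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Pointed at a missing page, a 404 with no Location is what you want to see; a redirect status pointing at the error page is what the full-URL form of ErrorDocument tends to produce.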
Dan
If you are allowed to edit the .htaccess file (and your server runs Apache with mod_rewrite), a single RewriteRule will sort this out for both spiders and visitors.
RewriteEngine On
RewriteRule ^\s+$ [yourdomain.tld...] [R=301,L]
This tells spiders that the page named %20 has moved permanently and sends the corresponding 301 header.
It will also redirect your visitors to the right page.
Dan
PS: make sure to use an external redirect (starting with [...]).
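Why a literal %20 in the pattern is a problem: as far as I know, mod_rewrite matches RewriteRule patterns against the %-decoded URL-path, and in .htaccess context the leading slash is stripped first, so the request /%20 reaches the pattern as a single space. The Python sketch below imitates only that matching step with the standard re module (for patterns this simple, Python regex and PCRE agree); rewrite_matches is a made-up helper, not an Apache API.

```python
import re
from urllib.parse import unquote


def rewrite_matches(pattern: str, request_path: str) -> bool:
    """Approximate what a RewriteRule pattern in a root .htaccess sees.

    Assumptions: the URL-path is %-decoded before matching, and the
    leading "/" is removed in per-directory (.htaccess) context.
    """
    decoded = unquote(request_path)         # "/%20" -> "/ "
    per_dir_path = decoded.removeprefix("/")  # "/ " -> " "
    return re.search(pattern, per_dir_path) is not None


print(rewrite_matches(r"^%20$", "/%20"))  # False: the pattern never sees "%20"
print(rewrite_matches(r"^\s+$", "/%20"))  # True: the decoded space matches
print(rewrite_matches(r"^\s+$", "/"))     # False: the real homepage is left alone
```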
I'm also going to add the RewriteRule you suggested, even though the site that had the rogue links has corrected them. Fortunately, it's not common for someone to capture an extra space when they copy and paste a URL, but it's foreseeable, and having a fix in place just in case seems like a good idea.
Now if you'll excuse me, I'm off to edit some .htaccess files ... for several sites!
I have contacted Yahoo a couple of times about this, but unfortunately they have been uncooperative.
I can't just drop them and move on at this point, though, because a lot of sweat equity has gone into building the site.
I am currently trying to figure out a client-side script that would live in the page template. The script would check the page URL and, if it is something other than the root directory (such as mydomain.com/mydomain/), redirect to the proper index page at mydomain.com.
Once this is up and working, I could then try to get Google to manually remove all the existing mydomain.com/mydomain/innerpage.html pages.
However, my hopes are big and my ability level is limited at this point. Does anyone know if this type of script exists?
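The check such a template script would make is small. In the browser it would have to be written in JavaScript; the sketch below shows only the decision logic, in Python, assuming the duplicate tree lives under /mydomain/ as described above. MIRROR_PREFIX and canonical_url are hypothetical names. Keep in mind that a script-driven redirect cannot send a 301 status, so spiders may treat it as a weaker signal than a server-side redirect.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

MIRROR_PREFIX = "/mydomain/"  # hypothetical: the duplicate store directory


def canonical_url(url: str) -> Optional[str]:
    """Return the URL this page should redirect to, or None if it is already canonical."""
    parts = urlsplit(url)
    path = parts.path
    if path == MIRROR_PREFIX.rstrip("/"):
        # The mirror index itself (mydomain.com/mydomain) maps to the real homepage.
        new_path = "/"
    elif path.startswith(MIRROR_PREFIX):
        # Re-root everything after the mirror prefix at "/".
        new_path = "/" + path[len(MIRROR_PREFIX):]
    else:
        return None
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


print(canonical_url("http://mydomain.com/mydomain/innerpage.html"))  # http://mydomain.com/innerpage.html
print(canonical_url("http://mydomain.com/innerpage.html"))           # None
```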
You're welcome!
In fact, a more general rule could be written:
RewriteRule ^(.*)\s(.*)$ [domain.tld...] [R=301,L]
In plain English, it permanently redirects any URL that starts with, contains, or ends with a %20 (an encoded space) to the same URL without that space. (Apache matches against the decoded URL, hence \s instead of %20.)
(.*) means any string, even the empty string.
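To see what a space-stripping rule of this shape does over several requests, here is a Python stand-in for a rule of the form RewriteRule ^(.*)\s(.*)$ followed by a target ending in $1$2, assuming that is what the elided target contains. Because (.*) is greedy, each pass removes only the last space, so a path with several spaces goes through one 301 per space. redirect_chain is a hypothetical helper; the paths are decoded per-directory paths with no leading slash.

```python
import re

# Python-regex stand-in for the rewrite pattern; $1$2 becomes group(1) + group(2).
SPACE_RULE = re.compile(r"^(.*)\s(.*)$")


def redirect_chain(path: str, max_hops: int = 10) -> list[str]:
    """Apply the rule repeatedly, as a client following each 301 would, listing every hop."""
    hops = [path]
    for _ in range(max_hops):
        m = SPACE_RULE.match(hops[-1])
        if m is None:
            break  # no space left, so no further redirect
        hops.append(m.group(1) + m.group(2))
    return hops


print(redirect_chain("about us.html"))  # ['about us.html', 'aboutus.html']
print(redirect_chain("a b c.html"))     # ['a b c.html', 'a bc.html', 'abc.html']
```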
Dan
If I understand you correctly, you want to move all pages in the /mydir subdirectory to the root of your website?
If this is the case, it is quite easy with mod_rewrite as well.
RewriteRule ^mydir/(.*)$ [yourdomain.tld...] [R=301,L]
This redirects every page in the /mydir directory to the same page in the root directory, sending a "301 Moved Permanently" status.
This header tells the spiders to update their index entry.
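Anchoring matters for this kind of rule: a mod_rewrite pattern matches anywhere in the path unless it is anchored, so an unanchored mydir/(.*) also catches deeper paths such as other/mydir/page.html. The small Python stand-in below (target_for is a hypothetical helper, and the paths are per-directory paths with the leading slash stripped) shows the difference.

```python
import re
from typing import Optional


def target_for(pattern: str, path: str) -> Optional[str]:
    """Return "/" + first capture if the pattern matches, else None.

    Stand-in for a rule whose target is the site root followed by $1.
    """
    m = re.search(pattern, path)  # like mod_rewrite, unanchored patterns match anywhere
    return None if m is None else "/" + m.group(1)


print(target_for(r"mydir/(.*)", "mydir/innerpage.html"))     # /innerpage.html
print(target_for(r"mydir/(.*)", "other/mydir/page.html"))    # /page.html (unintended)
print(target_for(r"^mydir/(.*)$", "other/mydir/page.html"))  # None
```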
Dan
I encourage other Yahoo store users to contact Yahoo about this so they can see just how widespread this problem is.
The %20 page should get dropped by Google because it will deliver a standard Yahoo 404 page. But that's what you want, isn't it?
If you're worried that you're losing customers who find the %20 page, you can follow the advice in this thread, but I can't imagine this one page ranking top for a single keyword search is going to bring down your business.