Forum Moderators: Robert Charlton & goodroi
Old pages are showing up in site:mysite.com, listed as Supplemental Results. These include .html pages (301'd to .shtml in November 2001) and .shtml pages (301'd to .php around January). The .php move came after the first signs of the problem; I'd thought it a good time to update to PHP while the pages weren't ranking as well :S Yeah, yeah, I know now that was not the best thing to do.
When these pages were moved, the originals were deleted and 301'd to the current URL in .htaccess. (The stray .html redirects were updated to point straight to .php, so there shouldn't be any double-redirect problems.)
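One way to double-check that no old URL still goes through two hops is to flatten the redirect rules offline. The following is only a sketch: the `rules` dict and its paths are hypothetical stand-ins for whatever is actually in the .htaccess file, and it assumes each old path has exactly one redirect target.

```python
def flatten_redirects(redirects):
    """Map every old path straight to its final target.

    `redirects` is a dict of old path -> new path (hypothetical data,
    not read from a real .htaccess). Raises ValueError on a loop.
    """
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a path that is not redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat


rules = {
    "/page.html": "/page.shtml",  # the 2001 move
    "/page.shtml": "/page.php",   # the later move
}
print(flatten_redirects(rules))
# {'/page.html': '/page.php', '/page.shtml': '/page.php'}
```

Any entry whose flattened target differs from its original target was a double redirect and would need its rule updated.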
Outgoing links are tracked using PHP 301s, but these tracking URLs are showing up among the pages listed for my domain, even ones that have always been 301s. (Before I knew about 302 hijackings, a few had been 302s. That was cleaned up many months ago, and header tests were run to make sure it was done properly.)
Also, many www. pages are now showing up, even www.site.com/page.html and the like, which haven't existed since 2001.
Here's a brief history of this domain:
----------------------------------------------
- I've always used the non-www. version of my URL; some do link to the www. version, though (organic linkers tend to link however they prefer. Normally I'm thrilled to get these links anyway, and up till now I'd not had a problem.)
- Online and ranking reasonably well for a variety of search terms since 1998, many inner pages were doing well until this problem.
- Hosted at the same server since 1998.
- No black hat or questionable SEO as far as I know (just the usual: h1 for the title, descriptive text, etc., and exchanging links with relevant sites.)
- No major server downtime (less than 15 minutes total for the entire year according to logs)
- a wide variety of linkers, mostly related and mostly organic. No bought links whatsoever.
- Updated almost daily since 1998
- PHP pages are used only to include a poll, no other dynamic content.
What I've checked:
-----------------------------------
- Header tests all show clean 301s.
- site:www.mysite.com shows 602 pages and site:mysite.com shows 978 pages; the actual page count is around 100!?
- httpd.conf goes directly to the non-www. version (I removed the custom 404 and looked at the error message; there's no www. in the URL on the page), so that won't be causing a loop as far as I can see. So why would I drop from other engines when I 301 www. -> non-www.? Coincidence?
- CSS and HTML code both tested and passed without errors with the W3C validator.
- Googlebot still spiders without problems, even through the tracking links.
- Tested robots.txt, passed as well. Only disallowing my stats.
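The header tests in the list above can be scripted so that every hop is visible, not just the final page. This is a rough sketch, not the tool actually used in the thread: `http_head` and `trace_redirects` are made-up helper names, the example.com URLs are placeholders, and the demo at the bottom uses canned responses so it runs without touching the network.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def http_head(url):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, head=http_head, max_hops=5):
    """Return [(url, status), ...], stopping at the first non-redirect."""
    hops = []
    for _ in range(max_hops):
        status, location = head(url)
        hops.append((url, status))
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops


# Canned responses standing in for a real server (hypothetical URLs).
fake = {
    "http://www.example.com/page.html": (301, "http://example.com/page.php"),
    "http://example.com/page.php": (200, None),
}
for hop_url, status in trace_redirects("http://www.example.com/page.html",
                                       head=lambda u: fake[u]):
    print(status, hop_url)
```

A clean setup shows exactly one 301 followed by a 200. Two 301s in a row would indicate a double redirect, and any 302 in the chain is worth a second look. Calling `trace_redirects(url)` without the `head` argument sends real requests.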
What I've recently tried:
-----------------------------------
- A few months ago, 301'd all www. to the non-www. version. A few weeks after, I lost rank in Yahoo and MSN. Removed the 301, and rank came back in both engines within 2 weeks. Some inner pages never recovered. No change in G during this time.
- Changed all internal links to absolute around 5-6 months ago.
- Tried briefly to disallow Gbot from going through my tracking links, thought better of that and removed that from robots.txt to allow him/her to go through again (I don't know how that little creature thinks, but I don't want it to think I'm hiding stuff. I'd rather it go and give my linkers credit.)
- Tried waiting it out :S
- Tried to not pull my hair out :S
Any ideas? What have I missed?
Should I try 301ing www -> non www version again and risk dropping in other engines?
Why are pages that have always been 301s (i.e. outgoing links) even showing up in the pages listed?
Why are the .html pages that were removed and 301'd in 2001 coming back to haunt me and showing as supplemental?
Still more .html pages are making their way in, even though these were removed in 2001, and many tracking links that've always been 301's (even ones added this month) are making their way in the listing.
pages listed at:
site:www.mysite.com 611
site:mysite.com 1,040
This should really be around 100 pages.
As to why they're appearing, I can only speculate. Possibly it's attempted page jacks or:
<hare brained theory>Google doubled the size of their index circa November '04 and enjoyed some criticism for the low quality pages they added. At that time, they added many URLs of pages that no longer existed. Since then, there have been similar reports of old and non-existent pages being added to their index. Those pages usually went supplemental quickly and eventually disappeared. My suspicion is this is an ongoing project. I suspect Google is adding pages from old indexes and/or old spider crawls periodically, perhaps so those pages can be reevaluated under the present algo.</hare brained theory>
After making a major change to your site (301 non-www to www for example), the worst thing you can do is reverse the change after two weeks! At that point some of your site has been spidered with the new domain version, possibly some of it is indexed under that new version on some SEs, and then you reverse your change and that process begins anew...
Examine your backlinks and determine which domain version is most commonly linked to and use that version for your site. "linkdomain:yourdomain.com" at Yahoo is the most reliable way to see your incoming links that I've found. If that suggests a change/301 to the www version, make the change and WAIT to see how the SEs react. Personally, if the results are ambiguous, I might wait through two Google updates, up to six months but I'm patient. :)
Regardless of which version of your domain name is most frequently linked to, 301 the least used version to it. And ensure that all of your absolute links use the same version you 301 to.
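As a rough illustration of the two steps above (pick the most-linked host, then make every absolute link agree with it), here is a small sketch. The backlink list is made up, example.com is a placeholder domain, and `most_linked_host` and `to_canonical` are hypothetical helper names. Real backlink data would have to be exported by hand from whatever tool you trust.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def most_linked_host(backlink_targets):
    """Return the hostname that appears most often in the linked-to URLs."""
    hosts = Counter(urlsplit(u).hostname for u in backlink_targets)
    return hosts.most_common(1)[0][0]


def to_canonical(url, canonical_host, bare_domain):
    """Rewrite the www/non-www variants of `bare_domain` to `canonical_host`.

    URLs on any other host are returned unchanged. An explicit port on
    a rewritten URL is dropped, which is fine for plain port-80 sites.
    """
    parts = urlsplit(url)
    if parts.hostname in (bare_domain, "www." + bare_domain):
        parts = parts._replace(netloc=canonical_host)
    return urlunsplit(parts)


# Made-up list of URLs that other sites link to.
targets = [
    "http://example.com/",
    "http://www.example.com/a.php",
    "http://example.com/b.php",
]
host = most_linked_host(targets)
print(host)  # example.com
print(to_canonical("http://www.example.com/a.php", host, "example.com"))
# http://example.com/a.php
```

Running `to_canonical` over the internal absolute links is a quick way to spot any that still point at the version being redirected away from.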
And be patient! :) When rankings fluctuate, ask yourself:
1. Is this caused by something I changed?
2. Is this caused by something the SEs changed?
3. Is this caused by something my competitors changed?
Don't make changes until you can answer those questions with some certainty.