1. Is robots.txt at the root, is it valid, and do the rules actually say, technically, what you hope they do?
2. Does the domain root - http://www.example.com/ - resolve directly with a 200 OK status? Anything else and you've got a potential problem. No 301 redirects to default.htm, to an internal folder or anything like that. If you "must" redirect the domain root (and you really, really should avoid this), use a 302.
3. Look at all your "Home" links in the internal pages: they should also point to the domain root, not to default.htm or anything like that. Beware of typos (they happen!)
4. If you have a sitemap of any kind (HTML or XML), again be sure that the pure "domain root" is used for the Home Page.
5. If you have a secure certificate, make sure that [example.com...] does not return a 200 OK status. The best practice is only to install the cert on a dedicated subdomain, such as secure.example.com
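Checks 1, 2, 4 and 5 can be partly scripted. Below is a minimal Python sketch, assuming you already have the robots.txt and XML sitemap text at hand; the function names and the HOME_ALIASES list are hypothetical, while urllib.robotparser, http.client and xml.etree.ElementTree are standard library modules. root_status deliberately does not follow redirects, so a 301 on the domain root shows up as a 301 instead of being silently followed.

```python
from __future__ import annotations

import http.client
import urllib.robotparser
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Assumption: root-level file names that commonly alias the home page.
HOME_ALIASES = {"/index.html", "/index.htm", "/index.php", "/default.htm", "/default.asp"}


def robots_allows(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check 1: do the robots.txt rules really permit `url` for `agent`?"""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def root_status(host: str, use_https: bool = False) -> tuple[int, str | None]:
    """Checks 2 and 5: request "/" WITHOUT following redirects and return
    (status code, Location header). For check 2 you want 200 on http://;
    for check 5 you do not want 200 on https:// for the main host."""
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    conn = conn_cls(host, timeout=10)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def sitemap_home_aliases(sitemap_xml: str) -> list[str]:
    """Check 4: return sitemap <loc> URLs that point at a root index file
    instead of the bare domain root."""
    root = ET.fromstring(sitemap_xml)
    found = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if urlsplit(url).path.lower() in HOME_ALIASES:
            found.append(url)
    return found
```

For example, robots_allows("User-agent: *\nDisallow: /", "http://www.example.com/") returns False, which is exactly the kind of rule that "technically" says something other than what was hoped. root_status makes a real network request; over HTTPS it may raise an SSL error if the certificate does not cover the host, which is itself worth knowing.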
This list does not account for every case that people are reporting of missing Home Pages right now, just some. Other cases I've looked at have me baffled too, and I will add to this thread if any other reasons surface.
The site linked internally to "/index.html" and both www and non-www were open to indexing.
A site:domain.com -inurl:www search showed the root index page in plain view, listed as domain.com without the www on the front.
It's a bit of a conundrum when the listing is split like this, with both www and non-www spidered and indexed. Do you go with the one that has the root page included, the one with the highest PR, or the one with the most pages indexed?
There were more pages listed as www URLs than as non-www URLs, so www is what we are going with to list the site.
The fixes have been applied. There is now a 301 redirect from "/index.html" to "www.domain.com/" and all non-www URLs site-wide now 301 redirect to the www version, preserving the path and filename in the redirect. Over the next few weeks all of the internal links that point to "/index.html" will be edited to point to just "/" instead.
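The redirect rule described here boils down to a URL mapping. A minimal Python sketch, using the hypothetical hosts domain.com and www.domain.com from this post and assuming /index.html is the only root index filename in play:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames standing in for the site in the post.
CANONICAL_HOST = "www.domain.com"
BARE_HOST = "domain.com"


def canonical_url(url: str) -> str:
    """Map any variant URL to its canonical form: non-www -> www,
    "/index.html" -> "/", everything else (path, filename, query) kept."""
    scheme, host, path, query, fragment = urlsplit(url)
    host = host.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    # Collapse the root index file (or an empty path) onto the domain root.
    if path in ("", "/index.html"):
        path = "/"
    return urlunsplit((scheme, host, path, query, fragment))


def needs_301(url: str) -> bool:
    """True if a request for `url` should be answered with a 301."""
    return canonical_url(url) != url
```

The server only needs to act when needs_301 is true, answering with a 301 and canonical_url(...) in the Location header; a URL that is already canonical is served normally with a 200.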
The non-www URLs will mostly drop into Supplemental for a few months, before disappearing. The number of www URLs should increase a bit over that time too.
I'm sure that your problem isn't as simple as that, but that is what I have done for this site.
[edited by: g1smd at 12:30 pm (utc) on Oct. 31, 2007]
Stolen Content: What To Do First [webmasterworld.com]
Proxy Server URLs Can Hijack Your Google Ranking - how to defend? [webmasterworld.com]
Over the next few weeks all of the internal links that point to "/index.html" will be edited to point to just "/" instead.
I am looking for just a little clarity on that point above.
So if I have:
I should change it to:
And then, I would imagine, eventually set it up so that file extensions aren't needed at all?
They are treated as four different URLs by search engines, and therefore as four different but identical pages.
That is called Duplicate Content.
If your internal pages link back to /index.html, all of their PageRank is channeled there. External sites probably link to www.domain.com/, and their PageRank is channeled to that URL.
With the PageRank split, neither page is as strong as it otherwise would have been.
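To make the split concrete, here is a toy calculation using the classic published PageRank formula (damping 0.85, power iteration). The link graphs are invented for illustration: two external pages link to www_root, while two internal pages link back either to index_html (split) or to www_root (merged). It says nothing about how Google actually weights links; it only shows the arithmetic of splitting versus consolidating.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Classic PageRank by power iteration. `links` maps every page to the
    pages it links to; in these toy graphs every page has an outlink."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets the "random jump" share, then its inbound shares.
        new = {node: (1.0 - damping) / n for node in nodes}
        for src, targets in links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        rank = new
    return rank


# Hypothetical site: externals link to "/", internal pages link to "/index.html".
split = {
    "ext1": ["www_root"], "ext2": ["www_root"],
    "www_root": ["page1", "page2"],
    "index_html": ["page1", "page2"],  # same content, different URL
    "page1": ["index_html"], "page2": ["index_html"],
}
# Same site after fixing internal links (and 301ing /index.html to "/").
merged = {
    "ext1": ["www_root"], "ext2": ["www_root"],
    "www_root": ["page1", "page2"],
    "page1": ["www_root"], "page2": ["www_root"],
}

if __name__ == "__main__":
    r_split, r_merged = pagerank(split), pagerank(merged)
    print("split:", round(r_split["www_root"], 3), round(r_split["index_html"], 3))
    print("merged:", round(r_merged["www_root"], 3))
```

With these hypothetical graphs the merged root ends up around 0.48, against roughly 0.07 for www_root and 0.42 for index_html when split. The two graphs have different page counts (6 versus 5), so compare the ordering rather than the exact figures.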
Additionally, Google does not want to list the same content multiple times. They will pick one to list and hide the other, or else drop it into the Supplemental Index.
They usually favour listing the shorter URL, such as www.domain.com/, so that is the one your website should promote internally too.
However, even if all of your links point at the canonical version you chose, all the alternative URLs will still work, and all of them will still return a "200 OK" HTTP status code when accessed. You haven't fully solved the problem at this point.
Search engines might still find and then try to index those other URLs. If you add a 301 redirect for the three versions that you do not want to be indexed, then none of those can be indexed. Only your one chosen canonical URL can be indexed.
The 301 redirect ensures that anyone who tries to access the other three URLs is redirected to the correct URL before the content is served to them.
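Here is a minimal server-side sketch of that behaviour as a Python WSGI app, assuming the hypothetical canonical host www.domain.com and /index.html as the index filename. Only the canonical URL gets a 200; the other variants get a 301 pointing at it. It compares the Host header exactly, so a non-standard port (for example www.domain.com:8000 during local testing) would also be redirected.

```python
from __future__ import annotations

# Hypothetical values; substitute your own canonical host and index file.
CANONICAL_HOST = "www.domain.com"
INDEX_PATH = "/index.html"


def redirect_target(environ: dict) -> str | None:
    """Return the canonical URL if the request used a non-canonical
    variant (wrong host or the index filename), else None."""
    host = environ.get("HTTP_HOST", "").lower()
    path = environ.get("PATH_INFO", "") or "/"
    query = environ.get("QUERY_STRING", "")
    canonical_path = "/" if path == INDEX_PATH else path
    if host == CANONICAL_HOST and path == canonical_path:
        return None
    target = f"http://{CANONICAL_HOST}{canonical_path}"
    return f"{target}?{query}" if query else target


def app(environ, start_response):
    """WSGI app: 301 for the non-canonical variants, 200 only for the
    canonical URL, so only that one URL can be indexed."""
    target = redirect_target(environ)
    if target is not None:
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Type", "text/plain")])
        return [b"Moved Permanently\n"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Home page</body></html>\n"]


def probe(host: str, path: str) -> tuple[str, str | None]:
    """Call `app` in-process and return (status line, Location header)."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    app({"HTTP_HOST": host, "PATH_INFO": path, "QUERY_STRING": ""}, start_response)
    return seen["status"], seen["headers"].get("Location")


if __name__ == "__main__":
    # The four variants: only one should answer 200 OK.
    for host in ("domain.com", "www.domain.com"):
        for path in ("/", INDEX_PATH):
            print(f"http://{host}{path}", *probe(host, path))
```

In practice the same rule is usually written in the web server's own configuration rather than in application code; the WSGI version just makes the status codes easy to see and test.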
We lost all rankings for the main domain root page, while most of our long-tail traffic to internal pages remains strong.
Any idea what kind of penalty this is?
We are nowhere to be found for the main keywords we used to rank for.
The English Google versions have penalized us for over 50 days, yet all the non-English Googles still hold us at our old positions.
We did fix a potential duplicate content issue 1.5 weeks ago: an index.php that showed a blank page but carried meta data.
Also, the site gets indexed every day, and more pages are added every day. According to Google Webmaster Tools, our home page has 70k links from other sites, and the cache of the root domain is updated every day too.