I have worked with a few sites that had this problem. Here's a list of the things I check:
1. Is robots.txt at the root, is it valid, and do the rules actually say, technically, what you hope they do?
2. Does the domain root - http://www.example.com/ - resolve directly with a 200 OK status? Anything else and you've got a potential problem. No 301 redirects to default.htm, to an internal folder or anything like that. If you "must" redirect the domain root (and you really, really should avoid this) use a 302.
3. Look at all your "Home" links in the internal pages - they should also point to the domain root, and not to default.htm or anything like that. Beware of typos (it happens!)
4. If you have a SiteMap of any kind (html, xml) again be sure that the pure "domain root" is used for the Home Page.
5. If you have a secure certificate, make sure that https://www.example.com/ does not return a 200 OK status. The best practice is only to install the cert on a dedicated subdomain, such as secure.example.com
This list does not account for every case that people are reporting of missing Home Pages right now, just some. Other cases I've looked at have me baffled too, and I will add to this thread if any other reasons surface.
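For anyone who wants to script checks 1 and 2 from the list above, here is a minimal sketch in Python. The function names and the wording of the verdicts are my own, and `www.example.com` is the same placeholder used in the list. Python's standard `robotparser` applies simple prefix matching, so it may not agree with Google's own reading of every rule; treat it as a first sanity check only.

```python
from urllib.robotparser import RobotFileParser

ROOT_URL = "http://www.example.com/"  # placeholder domain from the checklist


def root_allowed(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Check 1: do the robots.txt rules permit crawling the domain root?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, ROOT_URL)


def judge_root_status(status: int) -> str:
    """Check 2: classify the HTTP status returned by the domain root."""
    if status == 200:
        return "ok"
    if status == 302:
        return "tolerable: temporary redirect, but better avoided"
    if status == 301:
        return "problem: root is permanently redirected elsewhere"
    return "problem: root does not resolve with 200 OK"
```

Feed `root_allowed` the text of your own robots.txt, and feed `judge_root_status` whatever status code your header checker reports for the bare domain root.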
Thanks, tedster. I will make sure these changes are followed up for the site.
I just helped a site out that showed no root index page when they searched for site:www.domain.com.
The site linked internally to "/index.html" and both www and non-www were open to indexing.
When a site:domain.com -inurl:www search was done, there was the root index page in plain view. It was listed as domain.com without the www on the front.
It's a bit of a conundrum when the listing is split like this, with both www and non-www spidered and indexed. Do you go with the one that has the root page included, the one with the highest PR, or the one with the largest number of pages indexed?
There were more pages listed as www URLs than non-www URLs so www is what we are going with to list the site.
The fixes have been applied. There is now a 301 redirect from "/index.html" to "www.domain.com/" and all non-www URLs site-wide now 301 redirect to the www version, preserving the path and filename in the redirect. Over the next few weeks all of the internal links that point to "/index.html" will be edited to point to just "/" instead.
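The redirect logic described above can be sketched as a small decision function. This is only an illustration in Python, not the actual server configuration used on that site; `redirect_target` and `CANONICAL_HOST` are hypothetical names, and plain http is assumed.

```python
from typing import Optional

CANONICAL_HOST = "www.domain.com"  # placeholder host used in the post


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL a request should 301 to, or None if it is already canonical."""
    # The root index file collapses to the bare root; every other path is preserved.
    new_path = "/" if path == "/index.html" else path
    if host.lower() == CANONICAL_HOST and new_path == path:
        return None  # canonical URL: serve it with 200 OK
    return f"http://{CANONICAL_HOST}{new_path}"
```

For example, `redirect_target("domain.com", "/widgets/blue.html")` gives `http://www.domain.com/widgets/blue.html`. Both fixes are resolved in a single step, so a request for the non-www "/index.html" goes straight to the final URL instead of through a chain of two redirects.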
The non-www URLs will mostly drop into Supplemental for a few months, before disappearing. The number of www URLs should increase a bit over that time too.
I'm sure that your problem isn't as simple as that, but that is what I have done for this site.
[edited by: g1smd at 12:30 pm (utc) on Oct. 31, 2007]
The site in question isn't a blog, by any chance...?
Another thing to check is whether your Home Page has been scraped or proxy-hijacked. See:
Stolen Content: What To Do First [webmasterworld.com]
Proxy Server URLs Can Hijack Your Google Ranking - how to defend? [webmasterworld.com]
Our home page has been missing for 45 days...
I just found it on page 32 of a site: search
|Over the next few weeks all of the internal links that point to "/index.html" will be edited to point to just "/" instead. |
I am looking for just a little clarity on that point above.
So if I have:
I should change it to:
And then eventually get it set up so not even file extensions are needed I would imagine...?
Yes, that's right. Actually, I prefer <a href="http://www.example.com/">Home</a>.
Related thread from our Hot Topics area [webmasterworld.com], which is always pinned to the top of this forum's index page.
Yes, never include the index file filename in the link.
End the link with a trailing slash on the end of the URL.
Finally add a 301 redirect from .../index.html to .../ for the root and for all folders, preserving any folder names in the redirect.
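If there are a lot of links to edit, the clean-up can be scripted. A rough sketch in Python follows; `clean_href` is a hypothetical helper, and the list of index filenames is an assumption, so adjust it to whatever your server uses as its default document.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_FILENAMES = ("index.html", "index.htm")  # assumed default document names


def clean_href(href: str) -> str:
    """Drop a trailing index filename so the link ends with a slash."""
    parts = urlsplit(href)
    folder, sep, filename = parts.path.rpartition("/")
    if filename.lower() in INDEX_FILENAMES:
        # A bare relative "index.html" becomes "./" (the current folder).
        new_path = folder + "/" if sep else "./"
        parts = parts._replace(path=new_path)
    return urlunsplit(parts)
```

So `clean_href("/directory/index.html")` returns `/directory/`, while a link to an ordinary page such as `/products/page.html` is returned unchanged.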
Can anyone explain to me why we have to link the internal pages to the root www.example.com and not to www.example.com/index.htm? I just need to know the fundamentals behind that. Also, you can see that many popular websites do have their internal pages linked to index.htm and not to the pure root itself, and this has been done for a long time. Also, is redirecting example.com/directory/index.html to example.com/directory/ a valid procedure?
These four URLs likely all show the same content:
They are treated as four different URLs by search engines, and therefore as four different but identical pages.
That is called Duplicate Content.
If your internal pages link back to /index.html, all your internal PageRank is channeled there. External sites probably link to www.domain.com/ and their PageRank is channeled to that URL.
With split PageRank, neither page is as strong as it otherwise would have been.
Additionally, Google does not want to list the same content multiple times. They will pick one to list and hide the other, or else drop it into the Supplemental Index.
They usually favour listing the shorter URL, like www.domain.com/ and so that is the one that your website should internally promote too.
However, even if all of your links point at the canonical version that you chose, the fact is that all the alternative URLs will still work, and all of them will still send a "200 OK" HTTP status code when accessed. You haven't fully solved the problem at this point.
Search engines might still find and then try to index those other URLs. If you add a 301 redirect for the three versions that you do not want to be indexed, then none of those can be indexed. Only your one chosen canonical URL can be indexed.
The 301 redirect ensures that anyone that does try to access the other three URLs is redirected to the correct URL before the content is served to them.
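To see which URLs on a site collapse into one another, you can group a list of crawled URLs by their canonical form. The sketch below is Python with hypothetical names. It assumes the four URLs in question are the www and non-www forms of the root, each with and without index.html, and that www.domain.com is the version you chose to keep.

```python
from collections import defaultdict
from urllib.parse import urlsplit

BARE_HOST = "domain.com"           # placeholder non-www host
CANONICAL_HOST = "www.domain.com"  # placeholder host chosen as canonical


def canonical_form(url: str) -> str:
    """Map a URL to the single version that should be indexed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # keep the trailing slash
    return f"http://{host}{path}"


def duplicate_groups(urls):
    """Return {canonical URL: [variants]} for every group with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return {canon: found for canon, found in groups.items() if len(found) > 1}
```

Every URL in a group other than the canonical one is a candidate for a 301 redirect.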
We did a mass 301 for the entire site almost 2 months ago...
Until last week I thought our "ROOT" had been removed from the site:domain search, but it seems it is just suppressed to pages 30 - 32, lol.
We lost all rankings for the Main domain root page while most of our internal traffic for long tails remains strong..
Any Idea what kind of penalty this is?
For the main KWs we used to rank for, we are nowhere to be found.
All the non-ENGLISH Googles still hold us at our old positions, over 50 days after our rankings in the English versions were punished.
We did fix a potential duplicate issue 1.5 weeks ago: an index.php that showed a blank page but with meta data.
ALSO... the site gets indexed every day. More pages are added every day. Our home page (according to Google Webmaster Tools) has 70k links from other sites... The cache of the root domain is also updated every day.