I understand that this information is not enough to help me, but since I can't understand at all why this happened, I really can't tell you much else.
Hmmm... any help would be appreciated :)
Out of 500+ pages, I know I have 13 pages that are linked in two different ways inside the site. This is an issue with the CMS I am using (Joomla), but I know of other sites using the same CMS with no big Google index problems.
Since I noticed I had some problems with Google, I tried to solve this possible duplicate content problem by banning the duplicate URLs in my robots.txt.
So I had the same page linked both as [---------.com...] and [---------.com...] and put a Disallow on the second URL in robots.txt.
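For what it's worth, a rule like that would look something like this in robots.txt (the path is hypothetical, since the real URLs were edited out; Joomla's non-SEF URLs usually carry the article id in the query string):

```
User-agent: *
# Hypothetical duplicate Joomla URL for an article that is also
# reachable at a second, preferred address
Disallow: /index.php?option=com_content&view=article&id=13
```

Keep in mind that Disallow only stops crawling; URLs that are already indexed can stay in the index, which is one reason a 301 is the better fix.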
After a week I removed the rules because I realized it would be better to put a 301 redirect in place, which I still have to do.
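A 301 for that case could go in .htaccess, assuming Apache with mod_rewrite (the URLs here are made up, since the real ones were edited out):

```apache
RewriteEngine On
# Hypothetical: redirect the duplicate Joomla URL to the canonical one.
# The trailing "?" on the target strips the old query string.
RewriteCond %{QUERY_STRING} ^option=com_content&view=article&id=13$
RewriteRule ^index\.php$ http://www.example.com/my-article.html? [R=301,L]
```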
Anyway, I had the indexing problem before I even touched my robots.txt.
Another potential duplicate content problem could be that my homepage and my RSS feed, both indexed, look very similar. I would assume, though, that Google can recognize that they are different things.
Another note, maybe important: on my homepage I currently have 4 outbound external links and 57 internal ones. Sometimes I may have even fewer externals.
I have a bunch of inbound links pointing to pages all over my site, and no link exchanges.
Of course, if someone wants to take a look at my website, just send me an email.
Absolutely not true. I have two client sites that have no affiliate or duplicate content problems whatsoever and are suffering from this.
The only thing I see is that the cache dates go back to July of 2005, as some people have mentioned. Basically, it seems like anything I turned out since August 2005 has gone supplemental.
1. At one time your pages were indexed, and since Google has changed its algo they no longer rate being in the active index. If you don't change anything, these supplemental pages will eventually go away.
2. The reason the pages changed in status from actively indexed to supplemental is that your site's real PageRank is too low for the distance (the number of clicks) these pages are from your home page under Google's new algo.
3. If you want to change this, either increase your PageRank or reduce the number of clicks these pages are from the home page.
4. Or you can wait for Google to change its algo. They do it constantly, and I'm sure they will do it again.
Make sure that the response code for HTTP/1.1 accesses to non-www really is 301. Use WebBug, or similar, to check it.
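If you don't have WebBug handy, this check is easy to script. Here is a minimal sketch in Python (standard library only; example.com is a stand-in for your real domain):

```python
import http.client

def redirect_status(host, path="/"):
    """Fetch path from host over HTTP without following redirects
    and return (status code, Location header)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

# Hypothetical usage: the non-www host should answer 301
# with a Location header pointing at the www page.
# status, location = redirect_status("example.com")
# print(status, location)
```

Anything other than a 301 (a 302, or a 200 that serves the page directly at the non-www address) is worth fixing.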
Run Xenu LinkSleuth over the site and make sure that all internal links always point to www pages. Make sure that there are no links to any non-www URLs anywhere within the site.
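Xenu crawls the live site; if you also keep the pages on disk, a rough scan for stray non-www links can be scripted. A sketch in Python, assuming your non-www host is example.com and your pages are .html files under a local folder:

```python
import re
from pathlib import Path

# Hypothetical non-www host -- replace with your real domain.
NON_WWW = re.compile(r'href=["\']https?://example\.com(/[^"\']*)?["\']', re.I)

def find_non_www_links(root):
    """Return (file, link) pairs for every href pointing at the non-www host."""
    hits = []
    for page in Path(root).rglob("*.html"):
        text = page.read_text(errors="ignore")
        for match in NON_WWW.finditer(text):
            hits.append((str(page), match.group(0)))
    return hits

# Hypothetical usage:
# for page, link in find_non_www_links("./public_html"):
#     print(page, link)
```

This is only a crude regex check, not a parser, but it is enough to spot absolute non-www links that a template is emitting.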
The non-www pages may well take several years to drop out of the index, but don't worry about that. They are already marked as being Supplemental and are not causing any problems to you. If they ever appear in search results, then visitors will be redirected to the correct www page automatically by your redirect. That is what you want to happen.
My internal pages do not link to www.mysite.com/internalpages.html; they link to /internalpages.html (without the domain name). Could that be a problem?
If you are on a non-www page, then a relative link on that page resolves to a non-www page.
Make sure that you have the 301 redirect from non-www to www in place so that you cannot access anything directly at the non-www locations.
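For reference, a non-www to www 301 in .htaccess usually looks like this, assuming Apache with mod_rewrite (substitute the real domain for example.com):

```apache
RewriteEngine On
# Redirect every request for the bare domain to the www host,
# preserving the requested path.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With that in place, relative links are harmless, because no page can ever be viewed at a non-www address in the first place.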
1) 791 in the index, down from +/- 1600 and still dropping; all but the first 13 return supplemental on a site: search.
2) 73 in the index; all supplemental except the main page.
The first site is 4 years old, the second a year+ old.
301s are in place, so I don't have that problem; Xenu has been run and everything is OK. I use it to help generate site maps.
These are obviously very small sites compared to the sites being run by the folks here. However, up until about last September, the 1,600-page site was getting about 6,000 Googlebot hits a month. Currently it's just 500 a month, with more pages on the site. All original content.
Our site has lots of text links on the home page. We have a site map as well.
The site was designed with lots of text links so the crawlers can get through easily.
However, we currently have loads of supplemental results and very few indexed pages.
So, again, is there a way I can check if the Google bots are having trouble crawling our site?
I guess it's because of duplicate content.
Well, I guess anything with Google these days is possible, but if it's duplicate content I cannot imagine where it's coming from. It's all original content. It's a small one-man site that sells books. Some terms naturally occur in more than one listing on a page, such as "page" or "history" or whatever. The only duplication I can possibly see is that the info on the "product detail" page for one product also appears in the listing pages of 10 items per page.