2 pages out of 150 still show no www.
No description in SERPs either.
It's been many moons now since I set up a 301 redirect to send all non-www requests
to the proper www.mysite.net pages.
I went through the entire site making sure all internal links were www also.
That seemed to help a little in the SERPs, but other factors could explain things.
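For reference, the kind of non-www to www 301 described above is usually done with an Apache mod_rewrite rule along these lines (a sketch only; mysite.net stands in for the real domain, and this assumes mod_rewrite is enabled):

```apache
# Redirect any request on the bare host to the www host, keeping the path.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.net$ [NC]
RewriteRule ^(.*)$ http://www.mysite.net/$1 [R=301,L]
```

The `R=301` flag is what makes it a permanent redirect rather than the default 302, which is the part that matters to search engines.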
I just now did an allinurl: check for my site, leaving out the www.
All my regular www pages show up as such, but 3 pages stubbornly remain
listed without the www. Worse, they are URL only, with no description.
Those 3 pages are listed both ways: www with a short snippet, and non-www
which is URL-only.
Searching by keywords, only the proper www-version shows at all.
1) What are the likeliest causes of this? Old incoming links maybe?
2) Is it any cause for concern? PageRank splitting or whatever?
3) Is there something I should do about it? If so what?
It doesn't look like any emergency in any case. Thanks in advance -Larry
I have written extensively about a site that had been online for 3 or 4 years without a redirect, and was badly listed in Google. There were about 150 listings in Google, some were shown as www and some were non-www pages. Most were without title and description. Some pages were listed twice (both www and non-www versions) and some were not listed at all. The site really has only 118 pages.
In March the redirect was added from www to non-www (the opposite of what I normally do). All of the non-www pages appeared within days, and almost all had a title and description. The www site listings were slowly fixed in the index, either losing their title and description, or just disappearing from the index. The cache date was updated for all of the pages that remained in the index.
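The opposite-direction canonicalisation described here (www to non-www) amounts to a simple host rewrite: every www request is answered with a 301 to the same path on the bare host. A minimal Python sketch of that mapping (hostnames are placeholders, not the actual site):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_non_www(url: str) -> str:
    """Return the non-www form of a URL, mirroring a www -> non-www 301."""
    parts = urlsplit(url)
    host = parts.netloc
    # Strip a leading "www." from the hostname, leave everything else intact.
    if host.lower().startswith("www."):
        host = host[4:]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

print(canonical_non_www("http://www.example.com/page.html?x=1"))
# -> http://example.com/page.html?x=1
```

Whichever direction you pick, the point is that every URL has exactly one canonical form, so links and listings stop splitting between two hosts.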
After a few weeks, only a few www pages were left, then many suddenly reappeared in the index again. This time they were fixed by putting a "fake sitemap" page on another site, linking to all the stuff we didn't want indexed, so that the bot would recrawl those URLs and see the redirect. After a few weeks, all was fine again; at the beginning of May, most of the www pages dropped out again.
At the end of May, just as the Bourbon update was beginning, the index suddenly went back to the version that Google had displayed back in January, and the cache dates were all from December 2004 and January 2005 too. The index contained both www and non-www pages again, and many URL-only listings too.
It stayed this way for nearly a month, and then was fixed all by itself, all except for two www pages which still remain. The latest changes happened much less than two weeks ago.
I have to admit that I was surprised that so many webmasters had problems with the www and non-www versions. I'm a newbie and early on I realized that my PR was being split between the two versions. I did some reading and someone here mentioned to do the 301 - so I did. Honestly, I thought that was webmaster 101 and I just didn't have that class yet. I think I put the redirect in last fall.
Google did struggle with the two versions for about two updates before it was completely straightened out. For me, the PR consolidated too - which was the benefit I was after.
Here is that thread:
The interesting thing about that thread was that g1smd posted in the thread. I have a lot of respect for g1smd and read his? posts carefully. I was surprised to see him posting later on about the 301 - he could have solved his problem back in October...
The site that I fixed was that of a friend, and it was fixed by adding the redirect the very same day that I first discovered that there was a problem with the site.
Yes, I am aware of the redirects and have been recommending them for a very long time, but I haven't yet personally gotten around to every webmaster to suggest they use them too. :-)
<<301..... Honestly, I thought that was webmaster 101>>
It is now!
I also have some stubborn non-www pages that won't go away.
I wonder if Google Sitemaps could be used to force G to crawl the remaining pages?
So submit a site map of the non-www pages (or whatever you want to get rid of), forcing Google to crawl them, see the 301, and close out that version of the site?
Anyone want to give it a shot?
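The idea above, sketched out: generate a Sitemaps-protocol XML file listing the stale non-www URLs, so that Googlebot recrawls each one and sees the 301. This is only an illustration; the URLs are placeholders, and whether a sitemap actually forces a recrawl is exactly the open question being asked.

```python
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Build a minimal Sitemaps-protocol XML document for the given URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

# The stale non-www listings you want Google to recrawl (placeholders).
stale = ["http://example.com/old-page.html", "http://example.com/other.html"]
print(build_sitemap(stale))
```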
Thanks for all the comments people. I suppose I should leave well enough alone,
and just be grateful that I don't have a much bigger mess.
The 2 or 3 pages in question aren't that important. -Larry
OK. Another development.
One of the two remaining URL-only listings for www pages has today dropped out of a site: search. The page that that URL pointed to has not existed for several years.
The other rogue www listing still remains. The 118 non-www pages are still listed, all of them with full title and description. The 301 redirect (from www to non-www) has been in place since mid-March.
Here is the odd bit.
Today, 223 of the 224 /cgi-bin pages that exist have suddenly reappeared in a site: search. They are all shown as URL-only listings. These pages have been disallowed in the robots.txt file since mid-March. The URL of that robots.txt file was submitted to the Google URL Console in late March too (and at that time, all of the /cgi-bin pages dropped out of the index within a few days), and they had stayed out until today.
g1smd, man, you is busted: hiding those pages first as fully indexed duplicate content, then invisible, now URL-only but visible. And ciml wonders why folks worry about Google cache entries being in the index.
They found them and put them all back as a special holiday gift just for you ;).
The /cgi-bin pages were all added to the robots.txt file as it was not necessary for Google to index them at all.
Half of them need a password to get in. Without it you get a 401 error. It seemed pointless to have them in the index as URL-only entries.
The other half of the 224 cgi-bin pages are pages where people can submit information. Those pages are near identical to each other. Again, it was not necessary for Google to index those. We didn't want people coming to a submission page directly from a Google result. We wanted them to see the site content first. If they then want to submit something, then there is a link on every page to do so.
In March adding the /cgi-bin folder URL to robots.txt got them all out of the index within a few days (the robots.txt URL was submitted to the Google URL Console for removal).
Google has correctly listed the 118 real HTML content pages; we didn't need the 224 site management cgi-bin pages listed too. The cgi-bin pages are not duplicates of the site content, 112 are public submission pages, and 112 require a password to get in.
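The robots.txt entry being described would look something like this (the source only names the /cgi-bin folder; the rest is standard syntax):

```
User-agent: *
Disallow: /cgi-bin/
```

Note that Disallow only stops crawling, not indexing: URLs that are linked from elsewhere can still appear as URL-only entries, which is why the URL Console removal was submitted as well.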
OK. I have partially found the answer.
In the Google URL Console, pages that were requested for removal between 2005-03-28 and 2005-04-04 are showing as expired and so have been added back into the listings.
I don't know why they have been re-added, as the robots.txt file is still on the site, and still says that they should not be indexed.
Note: A few months ago many people were reporting that this removal would be for 6 months. These pages are back in after only 3 months.
However, there are other pages (on other sites) that were removed in March (in fact all of the pages removed in the rest of March, all before the 27th) that still show as "complete" and are still removed from the index.
I wish now that I had made a note of which pages were removed by submitting the URL of the robots.txt file, and which were removed by submitting the URL of a page or folder that needed delisting and letting the bot see the 404 status that some were giving out.
<edit>It looks to me, that pages asked to be removed by submitting the URL of the page are dropped for 6 months or more (that is, they are still out after 4 months), and that pages removed by submitting the URL of the robots.txt file that mentions them as "disallowed" are removed for only three months (as they are back in on the 91st day).</edit> -- no, see the rest of this, I already disproved it:
Oh my! I removed all trace of a local site that had closed down, by submitting the URL of each page, individually, to the URL Console. I did that in mid-March. The entries from March 26th still show as "complete" in the console, but those from March 28th show as "expired".
A load of pages have actually been added back into the Google index, with full title and description and a cache from nearly a year ago. Every link in Google's index goes to a 404 error page, as the site has been offline for 9 months (there is a "we have closed down" notice on the main index page; all of the rest of the content is gone, long ago).
I cannot understand why Google didn't check the status of the pages before adding them back into the index. As they are all 404 (and were submitted as 404 pages), and have been 404 for 9 months, why add them back in?