One question: would it be a good idea to use the removal tool and get rid of these two other versions once and for all? Or will that kick my actual page out of the index now that I'm 301 redirecting the ones to be deleted?
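For reference, the kind of 301 being described here can be issued from PHP before any other output is sent; this is only a generic sketch, with example.com standing in for the real domain:

<?php
// Generic sketch of a 301: send an unwanted URL variant permanently to the
// canonical page. Must run before any other output; example.com is a stand-in.
header('HTTP/1.1 301 Moved Permanently');
header('Location: http://www.example.com/page.html');
exit;
?>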
At the moment my site seems to be taking a huge hit in the SERPs, although it again got a PR7 with the latest PR update. I wonder if all these supplemental listings are harming my actual site.
G just came through and did a full deep crawl; within 2 days G now shows only the new pages with fresh tags, and all of the 1200 old pages are gone.
It appears that it cleaned everything up. Side note: I did not notice any toolbar PageRank updates.
My thoughts are that they added an old index to bump up the total indexed page numbers when MSN launched their new SE into the mix, and are now going through and cleaning everything up.
Mito99:
>One question: would it be a good idea
>to use the removal tool and get rid
>of these two other versions once and
>for all?
Be very careful with the removal tool. I just nuked a bunch of my www pages while trying to remove the non-www ones, only to find out there is a 6-month wait before reinclusion. See this thread: [webmasterworld.com...] msg #:232
Make a list of all the URLs that you do not want in the index. Make that list into a page of links and get that loaded onto another site somewhere. It is likely that Google isn't crawling the old URLs and therefore has not seen the redirect. A page of links to crawl will get them started. In the short term the number of rogue pages in the index might rise, but will then fall to almost zero. It takes at least a few weeks to sort out.
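If the list is long, a throwaway script can build that page of links for you; a minimal sketch, assuming the unwanted URLs sit one per line in a text file (the filename is made up):

<?php
// Minimal sketch: turn a plain-text list of unwanted URLs (one per line;
// old-urls.txt is a made-up filename) into a simple page of crawlable links.
$urls = file('old-urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

echo "<html><head><title>Old URLs</title></head><body><ul>\n";
foreach ($urls as $url) {
    $safe = htmlspecialchars($url, ENT_QUOTES);
    echo "<li><a href=\"$safe\">$safe</a></li>\n";
}
echo "</ul></body></html>\n";
?>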
Check your internal links to folders. Make sure they all end in a trailing / on the link. This avoids the redirect from entered-domain.com/folder to $default-domain-name$.com/folder/.
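One way to keep that consistent in templated pages is a tiny helper that appends the slash when it is missing; a hypothetical sketch (the function name is made up):

<?php
// Hypothetical helper: append the trailing slash to internal folder links if
// it is missing, so the server never has to 301 /folder to /folder/.
function folder_link($path) {
    if (substr($path, -1) != '/') {
        $path .= '/';
    }
    return $path;
}

echo '<a href="' . folder_link('/widgets') . '">Widgets</a>'; // href is /widgets/
?>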
> Make a list of all the URLs that you do not want in the index. Make that list into a page of links and get that loaded onto another site somewhere. It is likely that Google isn't crawling the old URLs and therefore has not seen the redirect. A page of links to crawl will get them started. In the short term the number of rogue pages in the index might rise, but will then fall to almost zero. It takes at least a few weeks to sort out.
This is done as of 5/23. Placed the list on the main page of another (unused) domain and submitted that URL to G. Will wait to see how long it takes either for a visit or for the 301 removals.
> Check your internal links to folders. Make sure they all end in a trailing / on the link. This avoids the redirect from entered-domain.com/folder to $default-domain-name$.com/folder/.
I think I am ok here, but will recheck.
This should be a start; now if I could only get rid of those nasty "supplementals" (:
My feeling is that my newer pages are in the index but have not yet had PageRank transferred to them properly, or my newer pages aren't in the index at all. It is conceivable that my pages are in the "per domain" index file, but not the regular results.
Also, my old supplemental pages haven't been crawled en masse by Googlebot for quite some time: at least several months. I only see one or two being 301'd here and there in my logs.
This is the exact same thing that's happened to me as of 3/23. Phrases that I'm used to monitoring now return "supplemental results" that rank *very* poorly, and my newer pages are nowhere to be found. However, if I request "more results from www.mydomain.com" my newer pages show up in the results.
I'm coming at it from a different approach, i.e. not the SERPs, instead using site: or allinurl: to get an idea of the total in the index. I'd estimate perhaps 90% of my pages are either non-www, URL-only, not cached, or supplemental.
> Also, my old supplemental pages haven't been crawled en masse by Googlebot for quite some time: at least several months. I only see one or two being 301'd here and there in my logs.
Well, I thought mine were being crawled en masse. I'm getting +/-6000 Googlebot requests per month (it's a small site). Yet some of the cached files have dates as old as March 04 and Oct 04. The files in question are certainly being requested but, as g1smd mentions above, evidently the non-www URLs are not. Just checking the logs now, I don't see many 301s associated with Googlebot, though I do with a few others.
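For anyone wanting to run the same check without eyeballing the raw log, a rough sketch that tallies Googlebot requests per HTTP status code (the log path is a placeholder):

<?php
// Rough sketch: tally Googlebot requests per HTTP status code in a
// combined-format access log. The log path is a placeholder.
$counts = array();
foreach (file('/var/log/apache/access.log') as $line) {
    if (stripos($line, 'Googlebot') === false) { continue; }
    // Combined format: ... "GET /page.html HTTP/1.1" 301 1234 ...
    if (preg_match('/" (\d{3}) /', $line, $m)) {
        $status = $m[1];
        $counts[$status] = isset($counts[$status]) ? $counts[$status] + 1 : 1;
    }
}
print_r($counts); // e.g. Array ( [200] => 5200 [301] => 12 )
?>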
One question I have on the supplemental, URL-only, etc. issue is page title and meta content. My page 1 for blue-widget has the same <title>, meta keywords, and description as pages 2 and up. The head section is generated dynamically with a PHP .inc file for each product category. So blue-widget.php is indexed, cached, etc. all fine, but page 2, which is blue-widget.php?offset=10, is not. I'm code-challenged, so I'd have to hire someone to make any programming changes. Would the simple addition of "Page 2" to the <title> and some different keywords/description make a difference?
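For what it's worth, the change being asked about is small; a hedged sketch, assuming 10 products per page and an offset parameter as in the URLs above (the variable names are made up):

<?php
// Sketch of the change being asked about: vary <title> and the meta
// description per page, assuming 10 items per page and an offset parameter
// as in blue-widget.php?offset=10. The variable names are made up.
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$page   = (int) ($offset / 10) + 1;

$title = 'Blue Widgets';
$desc  = 'Our full range of blue widgets.';
if ($page > 1) {
    $title .= " - Page $page";
    $desc   = "Blue widgets, page $page of the product listing.";
}
echo '<title>' . htmlspecialchars($title) . "</title>\n";
echo '<meta name="description" content="' . htmlspecialchars($desc) . '">' . "\n";
?>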
As an aside, none of the above issues occur with MSN. Pages are in the cache, cache dates are fresh (often less than a month old), new pages appear quickly, etc.
Once a page is a supplemental result for a particular search query, the title and snippet are not updated again (for that search query).
You can change the content on the page and it will rank for the new content, and the new content will be cached; but the page will also continue to appear in SERPs for searches on the old content, even if that content is no longer actually on the page or in the cache.
This can happen even if Google does update the cache for the page.
A page can be a normal result for searches based on current content, but can be a supplemental result when making a search that includes words that used to be on the page but no longer are.
Got a reply back from support: ". . . Please note that we searched for example.com and found that it is currently included in our search results. . . " Unfortunately, they made no mention of the underlying problem with the non-www pages, or of getting the www pages which I inadvertently nuked back into the index before the 6 months are up.
g1smd: If you're still following this thread, an update on your suggestion re linking to the bad (non-www) URLs from elsewhere. Submitted the URL on 4/23 (not 5/23 as I previously posted). Yahoo found the site on 4/27 and again on 4/29. Googlebot hasn't been around yet. Will continue to wait.
Within a few days of sorting out both the links and the redirect, nearly all of the non-www pages with a trailing / on the URL had a title and description, but there were still many of the other 3 variations also with a title and description. All of the non-www pages with a trailing / were now in the index. There were also a large number of www URLs with and without a trailing /, and a large number of non-www URLs without a trailing /, also listed but mostly without a title and description.
Within another week, all non-www URLs with a trailing / had a title and description, and only about 10 of the non-conforming URLs still did. The number of URLs without title and description (the ones we wanted removed) originally shrank to a few dozen, and then got "stuck". I then put the "external sitemap" in place, and the number of URLs without title and description started falling again, then a few days later rose to well over 100. After a few days it started falling again, and has continued to fall: it drops by 6 to 8 URLs every 2 to 3 days, and is now down to 14. All of the non-www URLs with a trailing / continue to be fully indexed, and every one of those pages is indexed. Everything else is without title and description, and is fast disappearing.