|Deindex Website from Google first and Start with a Clean Slate?|
I was given a portal that was doing poorly in Google, given its decent link popularity. Alexa rank in the range of 10k. Tens of thousands of pages. (No instances of Alexa rank being influenced :-))
It was a Herculean task getting my head around the overall site structure and navigation. Digging deeper, I ascertained that only about 10% of the pages are useful. The remaining 90% are duplicate pages, search result pages, various forms, print-version pages, test folders, and so on. The next task was getting the non-useful pages out of the index through robots.txt, 404ing duplicate pages, and setting URL parameters in WMT (there are about 40 different types of URL parameters!). Overall, cleaning up the mess. I told the client it will take time for Google to notice and readjust its index.
The client asked: how about deindexing the site, making sure it is completely out of the Google index, and then letting it be indexed again, so that only the useful 10% of pages are in the index? With the URL removal tool it can be done in a matter of days. I searched for an answer and said that it might be too drastic, but I am not convinced by my own answer.
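For anyone facing a similar audit, here is a rough sketch of how a URL list (from a crawl or the server logs) could be sorted into the buckets described above. The path prefixes and parameter names (`/test/`, `/search`, `/print/`, `q`, `print`) and the example.com URLs are made up for illustration; they are not the real site's structure.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def classify(url):
    """Return a rough bucket name for a URL (all patterns are hypothetical)."""
    parts = urlsplit(url)
    path = parts.path.lower()
    params = parse_qs(parts.query)

    if path.startswith("/test/"):
        return "test"
    if path.startswith("/search") or "q" in params:
        return "search"
    if path.startswith("/print/") or "print" in params:
        return "print"
    if params:
        # Any other query string is treated as a candidate duplicate.
        return "parameterized"
    return "useful"


def summarize(urls):
    """Count how many URLs fall into each bucket."""
    return Counter(classify(u) for u in urls)


if __name__ == "__main__":
    sample = [
        "https://www.example.com/articles/widgets",
        "https://www.example.com/articles/widgets?print=1",
        "https://www.example.com/search?q=widgets",
        "https://www.example.com/test/old.html",
    ]
    for bucket, count in summarize(sample).items():
        print(bucket, count)
```

A count per bucket gives a quick idea of how much of the site falls outside the useful 10% before any robots.txt or parameter rules are written.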
Anyone ever done that? Can it be thought of as an option to speed up the process?
I agree with your initial assessment. The client may be impatient for results, but a complete deindexing in this situation is probably too drastic.
Deindexing from Google will not fix the problem of having poor quality pages on the site or the 404 errors you will have if you only delete those poor quality pages.
Thanks for the confirmation, Tedster.
|Deindexing from Google will not fix the problem of having poor quality pages on the site or the 404 errors you will have if you only delete those poor quality pages. |
During the cleanup, many entries were made in robots.txt: a few folders, a few files and many wildcards are disallowed. URL parameter handling was defined in WMT. A majority of these disallowed pages continue to be present in SERPs and it may be awhile before they are out. To cut the time, the suggestion was to remove all the pages and let only the useful pages back in, since the non-useful pages are blocked in robots.txt. Is that so bad an idea that it should be dismissed outright?
Compromise? Leave the index alone but remove everything from Google's cache, so at least you don't have all those garbage pages flopping around in the background.
|A majority of these disallowed pages continue to be present in SERPs and it may be awhile before they are out. |
these may never be out of the index.
a disallow excludes googlebot from crawling but it's not an indexing directive.
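To make the crawl-versus-index distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the example.com URLs are invented for illustration. The check only answers "may the bot fetch this URL?"; it says nothing about whether the URL is, or stays, in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not the real site's file.
RULES = """\
User-agent: *
Disallow: /print/
Disallow: /test/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def crawl_allowed(path, agent="Googlebot"):
    """True if the rules above let `agent` fetch the given path."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/articles/widgets", "/print/widgets", "/test/old.html"):
        print(path, "crawlable" if crawl_allowed(path) else "blocked from crawling")
```

A blocked URL is never fetched, so the crawler cannot see a noindex on it either; if other pages link to it, the bare URL can still sit in the index.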
|these may never be out of the index |
Pages in a folder can be removed from Google's index through the URL removal tool in WMT. But there are many, thousands in fact, such as search result pages, that are not in a particular folder but are generated through dynamic queries. They are now disallowed through a wildcard in robots.txt. Will they always remain in the index, unless you deindex the whole site?
P.S: Adding META noindex/nofollow to those pages doesn't seem to be possible, given the way the site is built.
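For checking which dynamic URLs a wildcard rule actually catches, here is a small sketch of Google-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The patterns and paths are hypothetical, and the sketch ignores Allow rules and longest-match precedence, so treat it as a rough test aid rather than a faithful copy of Googlebot's behaviour.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where a * stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """True if any Disallow pattern matches from the start of the path."""
    return any(
        robots_pattern_to_regex(p).match(path_and_query) is not None
        for p in disallow_patterns
    )


if __name__ == "__main__":
    patterns = ["/*?q=", "/*print=", "/test/"]  # hypothetical Disallow lines
    for url in ("/search?q=widgets", "/articles/widgets", "/articles/widgets?page=2&print=1"):
        print(url, is_disallowed(url, patterns))
```

Running a sample of the dynamic URLs through a check like this shows whether the wildcards cover them, but as noted above, a match only means the URL will not be crawled.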