
Google SEO News and Discussion Forum

Removing large quantities of product pages

 12:54 pm on May 8, 2012 (gmt 0)

We have a large old site that has a high number of pages (around 200) that are old products and very weak content-wise. As the site has grown they have been pushed back but never updated and not removed from the server. They are still linked in but very poorly and not from any top level pages.

We are doing a massive overhaul on the site with a new design and navigation structure which will obviously link to the remaining useful pages on the site. But what to do about these old pages?

They don't get any traffic and I am sure G ignores them completely, but I don't want G to think that the site has suddenly shrunk, if that is going to affect how it views the site.

Do you think it would be best to:

- 301 all the pages (even though this is a massive task)
- if so to the index page, or spread out to other pages?

- Delete the pages completely from the server and put a custom 404 page up

- Break all the links to any of these pages but leave them on the server

Or any other ideas?



 2:58 pm on May 8, 2012 (gmt 0)

"Delete the pages completely from the server and put a custom 404 page up"

If you're not still actively selling the products, then I like this option. Except I'd maybe try serving a 410 to see if that would get the pages out of Google faster. 410 means gone gone gone never to be returned.


 1:11 pm on May 22, 2012 (gmt 0)

OK I am going down the 410 route and have found this code that needs to go into the htaccess file:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) - [G]

ErrorDocument 410 /gone.html

Do I need to list somehow all the pages that have been deleted or does this code just say "any pages not found have permanently been deleted"?



 1:23 pm on May 22, 2012 (gmt 0)

I wouldn't send 410 for URLs that have never existed (your code does that, and that is a problem).

You're creating a problem when a new page comes online and the status changes from "410 Gone" to "200 OK".

Is the site database driven? If so it is a simple task to tie this to that data. Set up a separate table that records the ID of products as they are deleted from the main product table. Amend your script to check both tables. If product found, then show it. If the ID is found in the "deleted" list then return 410, otherwise return 404.

Do the job properly. A five minute hack will create more long term problems than it solves.
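
For a database-driven site, the check described above might be sketched roughly as follows. This is a minimal Python sketch, not anyone's actual script: `status_for`, `live_ids` and `deleted_ids` are hypothetical names, and plain sets stand in for the main product table and the separate "deleted" table.

```python
from http import HTTPStatus

def status_for(product_id, live_ids, deleted_ids):
    """Pick the HTTP status for a requested product ID."""
    if product_id in live_ids:
        return HTTPStatus.OK         # product found: show it
    if product_id in deleted_ids:
        return HTTPStatus.GONE       # existed once and was removed: 410
    return HTTPStatus.NOT_FOUND      # never existed (or not yet): 404

# Hypothetical data standing in for the two tables.
live_ids = {101, 102}
deleted_ids = {7, 8}

for pid in (101, 7, 999):
    print(pid, int(status_for(pid, live_ids, deleted_ids)))
```

The point of the third branch is that an ID that has never been used still gets a plain 404, so a product added later is not starting from a URL that previously returned 410.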

[edited by: g1smd at 1:38 pm (utc) on May 22, 2012]


 1:28 pm on May 22, 2012 (gmt 0)

The pages DO exist, they are just old and no longer relevant/needed and we are restructuring the site and want to remove these from the server and tell Google that the pages should be removed from their index.

So, is the code above right?
Do I somehow need to list all the pages that we are removing?


 1:37 pm on May 22, 2012 (gmt 0)

The code above sends "410 Gone" for anything and everything that does not exist.

That is a problem for URLs that will exist in the future. You should send 410 only for pages that did once exist and now no longer exist.


 2:07 pm on May 22, 2012 (gmt 0)

The site is not database driven, it is not dynamic - it is all static html & php pages. As I said, these pages that we want to remove are dead pages never to be resurrected...

So this code will tell G that they are just that, gone?

If that's not the case and we should be listing all the pages that are being removed what is the code?


 2:29 pm on May 22, 2012 (gmt 0)

Yes, the code you supplied will tell Google that those pages are gone, but with a huge side effect that will negatively impact the rest of your site.

Is there any commonality between all the various URLs that will be "Gone"?

You should make a list of those and examine it, as it may push you towards one solution being better than another.


 2:46 pm on May 22, 2012 (gmt 0)

No, the pages are all named different things like barcelona_suite.html / verona-chairs.html / etc named as the product the page is about.

What "huge side effect that will negatively impact the rest of your site" will this be? Everything I have read says this is the best way to deal with permanently removing pages from a site.


 2:47 pm on May 22, 2012 (gmt 0)

I've never had large numbers of old product pages to remove as in the OP, but what seems to work well for me is to just change the robots meta tag on the pages to "noindex, nofollow".

There have been a couple of times when a product was later updated, improved, or a new product was released. I was then able to update the old page, change the robots meta to "index, follow" and the page returned to the SERPs in a few weeks.

I've always felt that Google in particular doesn't like it when a page suddenly disappears. It either wants to be redirected to a new location, or at least find the old page, even if there's nothing on it. I usually strip down the old pages to remove most of the old content from them, and make sure I'm not linking to them from someplace else on my site. I also add a "Sorry, this page is no longer being maintained, please see..." message to the page to catch anyone who somehow manages to find it, perhaps by using an old link from an external site.

I value netmeg and g1smd's advice very much, so their suggestions might be best, but this has worked well for me over the years, and it's another option to consider.


 3:19 pm on May 22, 2012 (gmt 0)

Google apparently does now respect the 410, and surely it is better to tell it the pages have gone than to strip 200 pages to their bare bones. That would just leave hundreds of incredibly thin pages, albeit not linked in. Adding a robots meta tag to each one would take forever, and we really need to clean these off the server!


 3:42 pm on May 22, 2012 (gmt 0)

Returning "410 Gone" for the exact URLs previously used for those pages which are now removed from the site is a good idea.

However, the code you supplied returns "410 Gone" for any and every URL that does not exist, not just for those pages that you are in the process of removing from the site.

This includes returning "410 Gone" for URLs which will come into use as new pages next week, next month and next year. You should not return "410 Gone" for those URLs.


 3:58 pm on May 22, 2012 (gmt 0)

Ok, so how do I code the htaccess file so I can list these 200 page URLs?


 5:47 pm on May 22, 2012 (gmt 0)

Google treats 410 and 404 the same. In my case, both scenarios took about 3-4 months for Google to completely remove the pages from its index.


 6:01 pm on May 22, 2012 (gmt 0)

Google treats 410 and 404 slightly differently. They recrawl every URL they have ever seen, forever; however, those last seen returning a 410 status are scheduled at a lower priority than those returning a 404 HTTP status code.*

*Conversation with Vanessa Fox and Pierre Far at SMX London last week.

[edited by: g1smd at 6:55 pm (utc) on May 22, 2012]


 6:41 pm on May 22, 2012 (gmt 0)

I have used both 404 and 410 many times throughout the years. As far as having the content removed from Google's index goes, they are both the same in my experience. Google has even said they treat them the same. It is also listed in this Google article: [support.google.com...]

I have also 410'd entire sections of websites, only to return them at a later date, and they returned very quickly, just as quickly as the 404s did.


 7:08 pm on May 22, 2012 (gmt 0)

Simply sending the default 404 will not hurt you. I can confirm that by having removed an entire section of my site and watched the remaining pages climb in rank/traffic. Having fewer low quality pages is definitely the way to go right now.

Tip: look at your analytics over the past 30-90 days and find pages that receive less than a visitor per day on average from search and no direct traffic. If these pages have little to no incoming links, consider removing them and/or merging them into similar content pages via 301. You'll remove low quality pages (as determined by Google) and bolster existing pages. Ask yourself "is this a great article?" If the answer is no or you're on the fence, ditch it. If the answer is yes, build backlinks to it or merge it asap.


 11:02 pm on May 22, 2012 (gmt 0)

Keep in mind that if you 404 or 410 those pages they'll stay in WMT (Google Webmaster Tools) as crawl errors.


 11:06 pm on May 22, 2012 (gmt 0)

They will show as errors in WMT but generally drop out of the report about three months after no links point to the URL.


 12:02 am on May 23, 2012 (gmt 0)

"I can confirm that by having removed an entire section of my site and watched the remaining pages climb in rank/traffic. Having fewer low quality pages is definitely the way to go right now."

How long did it take before you saw improvement? Did it require a Panda update (or multiple updates) or did you see improvement through normal spidering?


 12:11 am on May 23, 2012 (gmt 0)

I'd guess two to three months, but am also interested in the real answer.


 8:13 am on May 23, 2012 (gmt 0)

Right, I have decided I am going to do 410.

Back to my question - how do I code the htaccess file so it lists all the pages, if you are saying the one above will harm the site in the future...


 10:07 am on May 24, 2012 (gmt 0)

If this code:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) - [G]

is going to harm future content, is there anyone who can tell me how to amend it so I can list the pages that are to be deleted?


 7:30 pm on May 24, 2012 (gmt 0)

# End each pattern with $ so only that exact URL returns 410, not every URL that starts with it
RewriteRule ^the-url-path-and-file$ - [G]
RewriteRule ^the-other-url-path-and-file$ - [G]

These rules go before all other rules.
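
With around 200 URLs to list, typing one rule per page by hand is error prone, so the list could be generated instead. The following is only a sketch in Python under some assumptions: the .htaccess file sits in the document root, the file names contain no spaces, and `gone_rules` and `escape_path` are hypothetical helper names. The two file names are the examples mentioned earlier in the thread.

```python
# Characters with a special meaning in a mod_rewrite (regex) pattern.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")

def escape_path(path):
    """Backslash-escape regex metacharacters so the path matches literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in path)

def gone_rules(paths):
    """Build one anchored RewriteRule per removed URL path."""
    rules = []
    for path in paths:
        # In .htaccess the pattern is matched without the leading slash.
        path = path.lstrip("/")
        rules.append("RewriteRule ^" + escape_path(path) + "$ - [G]")
    return rules

removed = ["barcelona_suite.html", "verona-chairs.html"]
print("\n".join(gone_rules(removed)))
# RewriteRule ^barcelona_suite\.html$ - [G]
# RewriteRule ^verona-chairs\.html$ - [G]
```

The printed lines can be pasted into the .htaccess file ahead of the other rules. Each pattern is anchored at both ends and has its dot escaped, so it matches only the one removed page and leaves every other missing URL as a normal 404.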

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved