Forum Moderators: open
I have 3 different subdirectories on my site (example.com/widgets/index.htm + example.com/discount_widgets/index.htm + example.com/cheap_widgets/index.htm). Each subdirectory has roughly 10,000 pages, because my site sells roughly 10,000 different products with a page optimized for each product. The 3 subdirectories were used to optimize each product 3 different ways, to capture hits based on how the user typed the product into the search engine: "widgets", "discount widgets" and "cheap widgets". However, the content of the pages only varies by 10% to 15% between the 3 subdirectories, which seems like spam and/or duplicate content.
I’m redoing my site and eliminating 2 of the subdirectories, because I believe they are hurting my site due to duplicate content. Now for the big question: should I 404 or 301 the removed pages?
My site is dynamic, so I have control over what I hand back to the user (or robot) based on the page they request. I could hand back a 404 for the removed pages while rendering the matching product from the subdirectory I’m keeping, so it’s not a dead end for the user, -OR- I could hand back a 301 that redirects the removed pages to the matching product in the subdirectory I’m keeping. Either way the user gets where they want; I just want to know what’s better to hand Google. (By the way, I don’t mean cloaking; the user and Google will get the same result. I’m strictly speaking about the HTTP result code in the header being a 301 or 404.)
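Either answer needs the same piece of plumbing first: a mapping from a removed page to its match in the kept subdirectory. A minimal sketch of that mapping (the directory names are taken from the post; the function name is hypothetical), with the 301-vs-404 choice reduced to what you do with the result:

```python
# Hypothetical mapping: the two removed subdirectories collapse
# into the one being kept.
REMOVED_DIRS = ("/discount_widgets/", "/cheap_widgets/")
KEPT_DIR = "/widgets/"

def map_removed_path(path):
    """Return the kept-subdirectory URL for a removed page,
    or None if the path is not in a removed subdirectory."""
    for old in REMOVED_DIRS:
        if path.startswith(old):
            return KEPT_DIR + path[len(old):]
    return None

# The question in the thread is then only about the status line:
#   301 option: send "301 Moved Permanently" with a
#               Location: header set to map_removed_path(path)
#   404 option: send "404 Not Found" but render the mapped
#               product page in the body, so the user isn't stuck
```

The mapping itself is identical in both cases; only the HTTP status and whether the browser's address bar changes differ.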
I would prefer a 404 myself, the thought being to show Google the duplicate content is DEAD, not redirected. But since I’ve never had experience with this, I want to see what the general consensus thinks is better.
[edited by: ciml at 4:33 pm (utc) on Dec. 17, 2004]
I would start fresh but this is your site so ...
oh no, it works brilliantly for directories. You put a robots.txt rule for THAT directory and submitted the robots.txt to G, and everything will be nuked (or saved for the day MSFT announces something so Google can have 16 billion pages ;)).
User-agent: Googlebot
Disallow: /this_directory/