
404 vs 301


lost in space

8:39 pm on Dec 16, 2004 (gmt 0)

10+ Year Member



Before starting this thread I searched WebmasterWorld regarding this issue, but the only results I found were from 2003, and I’m looking for more recent answers since things are always changing, especially with Google.

I have 3 different subdirectories on my site (example.com/widgets/index.htm, example.com/discount_widgets/index.htm and example.com/cheap_widgets/index.htm). Each of the 3 subdirectories has roughly 10,000 pages, because my site sells roughly 10,000 different products with a page optimized for each product. The 3 subdirectories were used to optimize each product 3 different ways, to capture hits based on how the user typed the product into the search engine: "widgets", "discount widgets" and "cheap widgets". However, the content of the pages only varies by 10% to 15% between the 3 subdirectories, which seems like spam and/or duplicate content.

I’m redoing my site and eliminating 2 of the subdirectories because I believe they are hurting my site due to duplicate content. Now for the big question, should I 404 or 301 the removed pages?

My site is dynamic, so I have control over what I hand back to the user (or robot) based on the page they request. I could hand back a 404 for the removed pages whose body shows the matching product in the subdirectory I’m keeping, so it’s not a dead end for the user, or I could hand back a 301 for the removed pages that redirects to the matching product in the subdirectory I’m keeping. Either way the user gets where they want; I just want to know what’s better to hand Google. (By the way, I don’t mean cloaking; the user and Google will get the same result. I’m strictly speaking about the HTTP result code in the header being a 301 or a 404.)
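To make the two options concrete, here is a minimal sketch of the decision the poster describes. The function name, paths, and mapping logic are assumptions for illustration, not the poster's actual code; only the choice between the two status codes comes from the post.

```python
def respond(path, use_redirect=True):
    """Return (status, headers) for a requested path.

    Hypothetical sketch: maps URLs in the two removed subdirectories
    to their counterpart in the subdirectory being kept, then either
    301-redirects there or answers 404.
    """
    removed = ("/discount_widgets/", "/cheap_widgets/")
    for prefix in removed:
        if path.startswith(prefix):
            new_path = "/widgets/" + path[len(prefix):]
            if use_redirect:
                # Option A: 301 tells Google the page moved permanently.
                return 301, {"Location": new_path}
            # Option B (the poster's preference): 404 tells Google the
            # URL is dead; the response body can still show the matching
            # product so the visitor isn't stranded.
            return 404, {}
    return 200, {}
```

With a 301 the engine is told where the content went; with a 404 plus a helpful body, only human visitors get the pointer.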

I would prefer a 404 myself, the thought being to show Google the duplicate content is DEAD, not redirected, but since I’ve never had experience with this I want to see what the general consensus thinks is better.

[edited by: ciml at 4:33 pm (utc) on Dec. 17, 2004]

lost in space

6:26 pm on Dec 18, 2004 (gmt 0)

10+ Year Member



Anybody?

walkman

6:47 pm on Dec 18, 2004 (gmt 0)



If I were you, I'd bite the bullet and DELETE everything in the two directories via Google's URL removal tool. Within a day, everything will be deleted by Google. I used a 301 to move domain.com to www.domain.com and I still had pages on domain.com 5-6 months later. Google is way too slow on 301s, in my opinion. Plus, who knows, maybe any "penalty" will be transferred to the directory.
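For reference, the kind of domain-wide 301 described above is typically done with an Apache mod_rewrite rule like the following (a generic sketch, assuming Apache with mod_rewrite enabled; "domain.com" stands in for the actual domain):

```apache
# Hypothetical .htaccess sketch for a domain.com -> www.domain.com move.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
```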

I would start fresh but this is your site so ...

Powdork

6:53 pm on Dec 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm with walkman, although I thought the remove URL tool worked for pages only, meaning you would have to manually enter the 20,000 URLs. Not sure, though. If you just take them off the server they'll be gone soon enough (although G will probably bring them back when they want to say they have 12 billion pages). I have had the same problem with 301s and Google. Yahoo seems to be handling them correctly at the moment.

walkman

7:02 pm on Dec 18, 2004 (gmt 0)



"although I thought the remove URL thing works for pages only"

Oh no, it works brilliantly for directories. You add a Disallow rule for that directory to your robots.txt (the file lives at the site root), submit the robots.txt to G, and everything will be nuked (or saved for the day MSFT announces something so Google can have 16 billion pages ;)).

User-agent: Googlebot
Disallow: /this_directory