301 dupes into nonexistent 404.html

I have 5000 pages of duplicate content I need to kill. I screwed up, and these 5000 pages took on the content of my home page. I need to get these 5000 pages out of the SERPs, so I can get my ranking back.

This is what I did to fix it:

In my httpd.conf, I used a RewriteRule to 301 the pages into [mysite.com...]

It works. When I use Webmaster Tools - Fetch as Googlebot, I get this:

HTTP/1.1 301 Moved Permanently
Date: Sun, 18 Oct 2009 15:03:25 GMT
Server: Apache/2.0.52 (CentOS)
Location: [mysite.com...]
Content-Length: 322
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.mysite.com/404.html">here</a>.</p>
<hr>
<address>Apache/2.0.52 (CentOS) Server at www.mysite.com Port 80</address>
</body></html>

+++++++++++++++++++++++++++++

Of course, 404.html doesn't actualy exist, so, when you try THAT URL, you get:

HTTP/1.1 404 Not Found
Date: Tue, 20 Oct 2009 14:37:51 GMT
Server: Apache/2.0.52 (CentOS)
Content-Length: 288
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /404.html was not found on this server.</p>
<hr>
<address>Apache/2.0.52 (CentOS) Server at www.mysite.com Port 80</address>
</body></html>

+++++++++++++++++++++++++++++++++++++++++++++++++

So, in the SERPs, I had just under 5000 pages which were indexed, and were showing my home page content. Now, it appears that some of these URLs are losing their titles and descriptions in the SERPs, but they are still listed in the SERPs, with just the URL, and a "Similar" link underneath.

+++++++++++++++++++++++++++++++++++++++++++++++++

I am aware of the URL Removal Request form. However, I am nervous about using it. I theorized that I need to make the fix, and wait for Google to crawl the fix, and 404 all of these pages. I was hoping that I would see this occuring by watching the number of affected URLs start going down from 5000. However, the number of affected URLs is still 5000, they are simply losing their titles/descs. I assume this is good from the standpoint of getting all the duplicates of my home page out of the SERPs. However, I still have all of these dead empty URLs now in the SERPs.

So, this has caused me to lose my 100% confidence in my solution. Am I dealing with this correctly?

1. Do I simply wait for google to update all of these URLs, like I am doing?

2. If so, when all 5000 are updated and none display my home page title/desc, I assume at that point I could do some housekeeping and use the URL Removal Request form and delete the now-empty-URLs once and for all? (I would at that point implement nofollow/noindex to make sure they never got back in. I can't do that yet because they won't get my 301->404 fix if I block googlebot from getting to them...)

3. Am I barking up the wrong tree? Am I solving this the wrong way, or in an unnecessary fashion?

4. Do I need to wait for the URLs to all 301 into the 404? Can I just do the URL Removal Request now, and delete them all right now, regardless if they are showign my home page content or not? Would a URL Removal Request for the 5000 URLs right now solve my dupe content problem in one quick move? Or would that be dangerous?

++++++++++++++

Long post. Sorry. I've read all threads I can, my fix is the culmination of much of what I've read here, I think I have this fixed, my headers look right, the SERPs are updating, but I just want to get a 2nd opinion while I sit here and wait that I am doing this right.

Thank you!

301 dupes into nonexistent 404.html

helpnow

g1smd

helpnow

g1smd

helpnow

helpnow

helpnow

g1smd

helpnow

g1smd

helpnow

g1smd

helpnow

g1smd

helpnow

jdMorgan

g1smd

helpnow

jdMorgan

helpnow

helpnow

g1smd

helpnow

jdMorgan

helpnow

jdMorgan

g1smd

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week