|Duplicate content vs. redirect vs. 404?|
I need to steer the bots in the right direction - my PR is bad
I've got a slight problem.
I once added a bunch of content (hundreds of pages) but didn't put much thought into the URL names. For example:
My content did pretty well and my PR started going up on some of these.
Then I added AdSense. I noticed that in many cases AdSense was serving ads that were not relevant to my content. Often it was the part of the URL that had nothing to do with my site that triggered the ads.
So I created a parallel directory of content and removed the links to my old path. Now the directory in my URLs is named something meaningful to my content. For example,
The problem: my old URLs are still indexed and have PR, while my new URLs are not getting indexed quickly and their PR stinks, even after several months.
By the way, I know that folks have links to my content on other sites using the old URL.
So, today I've redirected folks from the old URLs to the new ones. I am not returning a full 404 response.
Question: Should I use a full 404 to just deny access to my site? Should I do something else to ensure my site is properly indexed?
|So, today I've redirected folks |
If you redirected with 301, you probably need nothing more.
301 is good. Do that.
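A minimal sketch of that old-to-new redirect, written as a Python WSGI app. The directory names `/old-dir/` and `/new-dir/` are hypothetical placeholders (the thread does not show the real ones), and many sites would do this in the web server's configuration rather than in application code.

```python
# Hypothetical directory names; the real old/new paths are not given in the thread.
OLD_PREFIX = "/old-dir/"
NEW_PREFIX = "/new-dir/"


def redirect_target(path):
    """Map an old-directory path to the same page in the new directory, or return None."""
    if path.startswith(OLD_PREFIX):
        # Keep the rest of the path so each old page points at its own
        # replacement, not at the home page.
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    return None


def app(environ, start_response):
    """Minimal WSGI app: 301 for old-directory URLs, placeholder page otherwise."""
    target = redirect_target(environ.get("PATH_INFO", "/"))
    if target is not None:
        query = environ.get("QUERY_STRING", "")
        if query:
            target += "?" + query  # carry any query string over unchanged
        # 301 Moved Permanently: the old URL has been replaced by the new one.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content here"]
```

For a quick local check, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it, and a request for `/old-dir/page.html` should come back as a 301 with `Location: /new-dir/page.html`.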
Question along the same line.
What is the best way to handle pages that have been renamed or deleted: 404 or 301?
We no longer carry a line of products, so all the associated HTML pages were deleted. Would it be better to 301-redirect those pages to pages with similar content where possible, or just let them bring up a custom 404 page that has navigation back to the site for the customer? This 404 page does return a 404 status when checked with a server header checker.
Same question applies to pages that have been renamed.
Example: we renamed our furniture sale pages, e.g. furnituresale.html to sale.html; same basic content, just a different page name. Should we use a 301 from the old page name to the new one, or use the custom 404 and let the old pages drop out of the index?
For content gone - 404.
For content moved - 301.
That is exactly what those codes are for.
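To make the "moved: 301, gone: 404" rule concrete, here is a small Python sketch of the per-path decision. The furnituresale.html to sale.html rename comes from the question above; the set of live pages is a hypothetical stand-in for whatever the site actually serves.

```python
from http import HTTPStatus

# Renamed pages: old path -> new path (the furniture sale rename from the thread).
MOVED = {"/furnituresale.html": "/sale.html"}

# Pages that currently exist; a hypothetical stand-in for the real site.
LIVE = {"/index.html", "/sale.html"}


def status_for(path):
    """Return (status, extra_headers): moved -> 301, live -> 200, otherwise 404."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}
    if path in LIVE:
        return HTTPStatus.OK, {}
    # Deleted pages (e.g. a discontinued product line) fall through to here.
    # A custom 404 page can still offer navigation, but the status must be 404.
    return HTTPStatus.NOT_FOUND, {}
```

Keeping the rename map separate from the live-page list means renamed pages keep pointing visitors and links at their new names, while anything truly removed returns a plain 404.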
Thanks for your reply about the 301 and 404.
Let me throw out another example. Suppose you have 4 pages of, let's say, tricycles for sale, but you reduce the number of tricycles you carry, so now you only have 2 pages of products. Would it be better to redirect the 2 deleted pages to one of the remaining tricycle pages?
My confusion is over whether to redirect deleted pages to pages with similar content or to 404 them. I have been told that too many redirects look spammy to the search engines, but I have also been told that too many 404s look the same way.
|For content gone - 404. |
That is exactly what those codes are for.
Had an interesting exchange with G help the other day regarding 404s and 410s. I am returning 410s for gone pages.
I enquired whether 410s were acceptable. The reply back was to the effect " . . . removal system . . . only processes pages returning true 404 errors . . . ". Researching this further, I found a reference to a January '06 conference where someone (Vanessa Fox?) said 410s were ok. Posted this info off to help @ g and have not heard back.
404s are mentioned here frequently, though it would seem 410 is more appropriate to remove a definitely "gone" page if one chooses to go that route instead of a 301 redirect.
According to the w3.org specs [w3.org]:
404 is Not Found, and the spec continues: ". . . No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable . . ." So a 404 can denote a temporary situation.
410, by contrast, is Gone: ". . . resource is no longer available . . ." and ". . . condition is expected to be considered permanent . . ." This is explicit: gone, done, dead, kaput.
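The spec's "internally configurable mechanism" can be as simple as a list the site owner maintains. A hypothetical Python sketch, assuming such a list exists, that returns 410 only for URLs deliberately retired and 404 for anything else with no content behind it:

```python
from http import HTTPStatus

# Hypothetical list of URLs the owner removed on purpose and knows will not return.
RETIRED = {"/discontinued-widget.html"}


def missing_page_status(path):
    """Status for a path that has no content: 410 if known retired, else 404."""
    if path in RETIRED:
        return HTTPStatus.GONE  # 410: the server knows the removal is permanent
    return HTTPStatus.NOT_FOUND  # 404: says nothing about whether it is permanent
```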
One would think G would be jumping all over the 410s instead of advising that only true 404s work.
There may be a downside to using 410 that I am not aware of, since I am not that technically inclined, but so far it does not appear to be a problem.
fwiw, bad pages I am intentionally trying to remove from the index are being removed successfully with the 410.
Matt Cutts has said that 404 and 410 are treated the same by Google.
However, I don't see how you can say that a resource is permanently unavailable. Let's say you have a page www.domain.com/widgets.html which you delete and return a 410 response for. What if you later reactivate a page at that URL? What if you sell the domain, and the new owner activates a page at that URL? Should the previous 410 response mean that it is never looked at ever again, never indexed, never can show in the SERPs?
If a 410 were to be interpreted as "never look at this URL again" then that is what would happen. In reality, a search engine needs to check the status of every URL that has ever existed at least several times per year, just to make sure the status, and/or content, has not changed in the meantime.
|Matt Cutts has said that 404 and 410 are treated the same by Google |
Yes, I missed that reference, but picked up something similar elsewhere. Help @ Google is still saying 404 only, though; perhaps that is just their boilerplate response.
|However, I don't see how you can say that a resource is permanently unavailable. Let's say you have a page www.domain.com/widgets.html which you delete and return a 410 response for. What if you later reactivate a page at that URL? What if you sell the domain, and the new owner activates a page at that URL? Should the previous 410 response mean that it is never looked at ever again, never indexed, never can show in the SERPs? |
Again, I understand your point. However, the pages I am talking about are **bad**: they never should have existed, don't really exist, shouldn't have been spidered, and will never be used again.
These are not in the form /widgets.html. They are in the form /widgets.php?query_string=-252 (note the negative, strange number in the query string). They are being spidered and returned because of bad coding on my end in the script/db. I am simply trying to apply temporary 410 patching for the problem URLs in G until I can get the programming sorted.
I do not want these specific URLs to be looked at again.
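A minimal sketch of that temporary patch in Python, assuming (hypothetically) that a valid `query_string` value is a non-negative integer; the real rule depends on the script and database. The widgets.php script would run this check before touching the database and send back the returned status:

```python
import re
from http import HTTPStatus
from urllib.parse import parse_qs

# Assumed validity rule: exactly one query_string value made of ASCII digits.
VALID_ID = re.compile(r"[0-9]+")


def widgets_status(query):
    """Return 410 Gone for malformed /widgets.php query strings, else 200 OK."""
    # parse_qs drops blank values by default, so "query_string=" yields no entry.
    values = parse_qs(query).get("query_string", [])
    if len(values) != 1 or not VALID_ID.fullmatch(values[0]):
        # Covers "-252", missing or blank values, and repeated parameters.
        return HTTPStatus.GONE
    return HTTPStatus.OK
```

Once the script and database are fixed, the check can be tightened to the real ID range or removed.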