|301 redirects and duplicate content penalty|
I'm wondering if 301s can lead to a duplicate content penalty. I see pages of my old site still indexed in Google days or weeks after the 301 was put in place.
At what point would a duplicate content penalty occur?
Or am I worrying unnecessarily?
Fred, I think you're worrying unnecessarily. This is assuming that you've checked the 301s, and that the redirected requests return the desired URLs with a 301 header response.
While I have no special inside track on Google's indexing structure, I do know that it's a very large and complex system, processing so much data that there are databases handling the order in which other databases execute various operations. It makes complete sense to me that, in such a large system, there are likely to be latency issues. I've observed myself that display of new data is liable to lag behind actual indexing.
I'm sure that among the rules in such a system, there is a routine that would prevent old redirected content, removed from the index but still displaying in the SERPs, from being seen as duplicate material.
Conceivably, though, if your old pages aren't getting crawled, the redirects might not have been picked up yet. Have you checked your logs to see if Googlebot has seen your redirects? You might want to use "Fetch as Googlebot" in Webmaster Tools, which in effect would both confirm what Googlebot sees and prompt a Googlebot visit.
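A rough sketch of the log check suggested above, assuming an Apache/Nginx "combined" format access log. The sample lines, paths, and the function name are made up for illustration, and matching on the user-agent string alone can be fooled by fake Googlebots, so treat the counts as a first indication only.

```python
import re
from collections import Counter

# Combined log format (assumed):
# ip - - [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_status_counts(lines):
    """Count HTTP status codes served to requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    return counts

# Hypothetical sample lines, not taken from any real log.
SAMPLE = [
    '66.249.66.1 - - [10/Oct/2011:13:55:36 +0000] "GET /old-page.html HTTP/1.1" '
    '301 234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2011:13:56:02 +0000] "GET /other.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2011:13:57:10 +0000] "GET /old-page.html HTTP/1.1" '
    '301 234 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

if __name__ == "__main__":
    # With a real log: googlebot_status_counts(open("access.log"))
    print(googlebot_status_counts(SAMPLE))
```

If Googlebot shows up with mostly 301s on the old domain, it has at least seen the redirects; no Googlebot lines at all would point to a crawl problem rather than a penalty.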
Can you check the above and describe the situation in more detail?
@fred9989: I'm seeing the same. On my site the redirect has been in place for a little less than a month. The old site still has all its pages indexed, and they show up in the SERPs. And scraper sites are ranking higher, possibly because of a duplicate content penalty.
As for the latency issues, I doubt that is the cause. If a new post can be indexed and show up in the SERPs within a few hours, why should it take months to understand that a 301 means "permanently moved"?
|If a new post can be indexed and show up in the SERPs within a few hours, why should it take months to understand that a 301 means "permanently moved"? |
Simply because Goliath is not stupid. To remain a black box, they must provide as little feedback to webmasters as possible, especially those webmasters who are consciously looking to change their results (which is nearly ALL webmasters). So, all webmasters are perceived by Goliath as "web spammers" or black hatters. Wait and you might see results...in a few months.
Relax, Fred. I had a website with 100,000 pages indexed and a PR2, and I 301'd all of it. Of course, not the kind of 301 where you send every old page to the same old index page: I sent all 100,000 to similar pages on the new site. As Google says, if you don't have an exact page to send them to, then use the closest one you have. As the pages were indexed, many did drop out of the index on the old site, and I watched the old site's PR drop from PR2 to PR1. About 6 months later I still had 20,000 of the 100,000 indexed, and lo and behold the PR began to climb and went back to PR2. How the heck? Google must be dumb as a rock. But I let it go, since my new site had come off rock bottom and finally moved to a PR1. I did fear that as the old site gained PR, my new site was gonna drop back to zero. Never happened. A year later the new site is sporting a PR2, and the old site has finally dropped to less than 100 pages indexed and a whopping PR2. So don't try to outguess Google. Even they have no idea what they will do next.
Someone may have noticed that I 301 redirected 100,000 pages to similar pages and will say: isn't that darn near impossible? That would be 100,000 301s, wouldn't it? Well, yes, at least it would appear so. If I tell you how, I will have to kill you, but if you really want to know, PM me. LOL.
Oh, forgot to tell you: search Google for "header checker" and use one of the many free header checker tools to visit maybe a random dozen of your pages on the old site. Make sure the header returns a 301 permanent redirect. If not, fix it quick.
Also visit a random dozen pages and make sure the browser is redirected to the new site as expected. If every page lands on the index page of the new site, you have just met the kiss of death. Google crawls your old site, where each page is different, and gets taken straight to the index page no matter which old page it requests. Wouldn't that piss you off too if you were Google? Your old site had 20,000 pages 301'd, and all 20,000 times you got forwarded you had the same old index page stuffed down your throat, as if to keep telling Google "well, that is the closest page to matching my old site". That will transfer ZERO PR from the old site and will never index a single page. Well, maybe it will index a single page: the index page, 20,000 times. Talk about dupe content: 20,000 times Google was fed the same identical page. But there is an easy solution: just send Google to a similar page on the new site. Not the index page, NOT EVER!
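For anyone who would rather script the two spot checks above than use a web-based header checker, here is a minimal sketch using only the Python standard library. The function names and the example.com-style URLs are assumptions made up for illustration. `fetch_redirect` asks for the headers without following the redirect, and `audit_redirects` flags anything that is not a 301 and the case where every sampled page lands on the new home page.

```python
import http.client
from urllib.parse import urlsplit

def fetch_redirect(url, timeout=10):
    """Return (status, Location header) for url WITHOUT following the redirect."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def audit_redirects(results, new_home):
    """results maps old URL -> (status, location). Returns a list of problem strings."""
    problems = []
    for old_url, (status, location) in results.items():
        if status != 301:
            problems.append(f"{old_url}: expected 301, got {status}")
    # Collect where the genuine 301s point.
    targets = {location for status, location in results.values() if status == 301}
    if len(results) > 1 and targets == {new_home}:
        problems.append("every page redirects to the home page")
    return problems

# Usage with a hypothetical random sample of old URLs:
#   sample = ["http://old.example/a.html", "http://old.example/b.html"]
#   results = {url: fetch_redirect(url) for url in sample}
#   print(audit_redirects(results, "http://new.example/"))
```

The home page comparison is an exact string match, so a real check would probably want to normalize trailing slashes and `www.` prefixes first.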
I tried the Google Webmaster Tools option > Change Of Address. And here's what it said after I submitted the move request.
Your old site's server should be configured to serve 301 permanent redirects to your new site.
Ask webmasters to update their links to your new domain, and make sure incoming links to your old site are redirected correctly using 301 redirects.
For other general questions refer to our guidelines for moving your site to a new domain.
Duration of effect
Your change of address notification remains in effect for 180 days, by which time Google's index will be fully updated with your new site's information. After 180 days, you can extend the period by submitting the change of address again.
No truer words were ever spoken, Varun. However, here is where all the work comes in, with that tiny little statement by Google: "Your old site's server should be configured to serve 301 permanent redirects to your new site.
Ask webmasters to update their links to your new domain, and make sure incoming links to your old site are redirected correctly using 301 redirects."
Which can be a massive job for those who cannot automate the process. It ain't no FUN to build 100,000 301 from the old to the new site. Not only that, if the server has to go through 100,000 301 rules each time before it can serve a page, it can slow the site down significantly. So the best method is to generate the 301 on the fly, meaning you only build the 301 when it is needed. For example, if you have 12,000 pages that Google never gets around to crawling, why bother with a 301 for them? Just build a 301 on the fly as Google finds your pages. Make sense?
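The post does not say how the on-the-fly redirects were actually built, so the following is only one possible reading: a small WSGI app on the old domain that computes the target from a handful of pattern rules at request time instead of storing 100,000 individual redirects. The host name, the URL patterns, and the function names are all hypothetical.

```python
import re

NEW_SITE = "http://new.example"  # hypothetical new domain

# (pattern on the old path, template on the new site); both made up.
RULES = [
    (re.compile(r"^/listing/(\d+)\.html$"), r"/homes/\1/"),
    (re.compile(r"^/city/([a-z-]+)/.*$"), r"/areas/\1/"),
]

def target_for(old_path):
    """Compute the closest new URL for an old path, or None if no rule matches."""
    for pattern, template in RULES:
        match = pattern.match(old_path)
        if match:
            return NEW_SITE + match.expand(template)
    return None

def app(environ, start_response):
    """Minimal WSGI app for the old domain: 301 when a rule matches, else 404."""
    target = target_for(environ.get("PATH_INFO", "/"))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

The redirect only exists at the moment a crawler or visitor asks for an old URL, and each request costs a few regex matches rather than a scan of a huge rule list. Unmatched paths return a 404 here rather than being dumped on the home page, in line with the advice earlier in the thread.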
@tommytx: I'm just using .htaccess to redirect the entire domain to the new one with a 1:1 URL redirection. That saves me time and forwards the entire site, be it an image, CSS or anything else.
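The actual .htaccess rules are not shown in the thread. As a rough illustration of what a 1:1 mapping means, this Python sketch swaps only the host name and keeps the path and query string untouched; the host name and function name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

NEW_HOST = "www.new-site.example"  # hypothetical new domain

def one_to_one(old_url, new_host=NEW_HOST):
    """Map an old URL to the same path and query string on the new host."""
    parts = urlsplit(old_url)
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))

if __name__ == "__main__":
    print(one_to_one("http://www.old-site.example/css/style.css?v=2"))
```

Because nothing but the host changes, images, stylesheets and pages all follow the same rule, which is why this only fits when the new site keeps the old URL structure.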
|why should it take months to understand that a 301 means "permanently moved"? |
Google does not commit to a 301 redirect quickly after they first see it. They need to trust-test it and make sure it stays stable. This is partly because a 301 redirect can often be a tool for deception, spamming and phishing. A site's overall trust history can come into play here. A high historical trust level can get a new 301 accepted much more quickly.
Another issue is that even technically knowledgeable webmasters make errors with 301 redirects and then need to fix things. So Google is cautious in accepting this "instruction".
|it ain't no FUN to build 100,000 301 from the old to the new site |
We've got just the forum for you :)
The htaccess approach only works when you are transferring one site to a clone of the old site, which is pretty rare. What you see much more often is, say, a real estate site with 50,000 pages indexed under an IDX. It's rare for the new site to have the same pages as the old one, especially when a change of IDX is usually the reason for the new site. And of course when going from a website to a blog, very few of the pages match up. It's a simple matter when you clone one site onto another domain, or just change domain names: htaccess is great for changing only the domain name, since you simply reuse the rest of the original URL. But a clone is not the normal scenario that I have seen.
Yeah, in your case it's about a two-line entry in htaccess and done.