Forum Moderators: Robert Charlton & goodroi
Recently, in one of Matt's videos, he also commented that the matter was complex.
When I looked through these forums [ unless I missed something ] I could see nothing that described the elements in a high-level format that could be broken down and translated into a framework for easy management.
Does anyone believe they have mastered the comprehensive management of dupe content on Google into a format that can be shared on these forums?
You need to make sure that three of the four variations are served with a <meta name="robots" content="noindex"> tag on the page, so that only one variation can be indexed.
OR
You need to set up the server so that any URL with extra parameters just does a 301 redirect to the canonical form of the URL. That will help your PageRank a little too.
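For the redirect approach, a minimal .htaccess sketch might look like this (assuming the duplicates are created by a hypothetical "sessionid" query parameter; substitute whatever parameter names your site actually appends):

```apache
RewriteEngine On
# If the query string contains the (hypothetical) sessionid parameter,
# 301-redirect to the same path with the query string stripped.
# The trailing "?" in the substitution discards the query string.
RewriteCond %{QUERY_STRING} (^|&)sessionid= [NC]
RewriteRule ^(.*)$ /$1? [L,R=301]
```

Test the pattern against a handful of URLs before deploying; a rule that matches too broadly will redirect pages you wanted to keep.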
To recap, we have had problems with the following, which have been causing a large number of pages to go supplemental, primarily since April.
1) HTTPS pages being indexed for some, but not all, pages. A 301 redirect is now in place.
2) Deep pages were pointing to default.htm instead of /
3) Many pages with little content
4) Many pages with similar title or meta description tags
5) Poor inbound links
We have sorted points 1 and 2 and are in the process of addressing 3, 4 and 5.
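For point 2, one way the fix can be done (a sketch, assuming Apache with default.htm as the DirectoryIndex) is an external 301 from every .../default.htm request to the bare folder URL:

```apache
RewriteEngine On
# Only fire on real client requests for default.htm (THE_REQUEST holds
# the original request line), not on internal DirectoryIndex rewrites,
# otherwise this rule would loop.
RewriteCond %{THE_REQUEST} /default\.htm [NC]
RewriteRule ^(.*)default\.htm$ /$1 [L,R=301]
```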
We are an e-commerce site, so we tend to include the brand name of the product in the title of each page. That means we may have several hundred products with the same brand name as the initial part of the title. I had thought this could be a big problem and was considering changing it, but I have seen many competitors' websites using the same methods, and yet they are not supplemental; they also have little content on each page.
So in summary: could I be going too far in trying to fix my site? Maybe I am wasting my time if Google has other problems that could be impacting me.
It would be a fantastic feature of Google Sitemaps if it listed all supplemental pages and also indicated the reason why they were supplemental.
It is running phpBB (16,000 posts). I had done a mod_rewrite to have search engine friendly URLs.
I figured out it was from 2 issues:
1) Multiple URLs pointing to the same page
2) The same meta description on all pages
I modified robots.txt to tell the spiders to ignore all URLs but the "correct URL". However, the same meta description on all pages is still a problem.
What I have done to try to fix it is to add the "correct URL" to the robots.txt file as well, basically telling the spiders to skip ALL the files.
Next, I added an .htaccess 301 redirect from the now-blocked "correct URL" to a "new correct URL" (with the duplicated meta description removed).
I am assuming that blocking ALL the old pages via robots.txt and starting with new correct ones is the fastest way to fix the problem. I was considering just having a redirect from the correct URL to the new correct URL, without adding anything to robots.txt, but I thought that having all the old URLs blocked and starting clean was best.
Anyone care to comment on this path I chose to "fix it"?
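One caveat with the plan above: a URL that robots.txt blocks will never be fetched by Googlebot, so a 301 placed on that blocked URL will never be seen, and the old URLs can sit in the index for a long time. If the goal is to pass the old URLs' value to the new ones, it is usually better to leave them crawlable and let the redirect do the work. A sketch, using made-up URL patterns rather than your actual rewrite scheme:

```apache
RewriteEngine On
# 301 each old rewritten topic URL to its new equivalent.
# "old-topic" and "topic" are placeholder patterns only.
RewriteRule ^old-topic-([0-9]+)\.html$ /topic-$1.html [L,R=301]
```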
I had thought that this could be a big problem, so was considering changing this, but I have seen many competitors web sites using the same methods and yet they are not supplemental, they also have little content on each page.
What one domain can get away with doesn't necessarily apply to another domain. An older, more established domain, for example, can get away with more because it has Google's trust that it's not out to spam its index. A newer site, or a site with other shortcomings, doesn't necessarily have that luxury. In that case, I'd take a more conservative approach.
It produces results consistently at the top of the SERPs for "exact match searches", and consistently at the bottom of the SERPs for "broad match searches".
Is this a sign that more work has to be done on dupe content or that there is something else in play?
When Google loses its massive market share, or that share dwindles, then duplicate content will again be insignificant to the webmaster.
It takes boffins no time at all to introduce these kinds of problems in a search engine and if it was not duplicate content causing a problem it would be something else.
The best thing to do for yourself is to fix these problems on your site if you want Google to play right by you but at the same time you should spend some energy telling people who don't know any better to give the other search engines a try too. Some may even be converted which in the long run is good for your duplicate content problems.
Google does not use the meta description on internal pages; who said it was crucial?
How can I 301-redirect non-www requests to www.domain for a site on shared hosting? (The "normal redirect statements" for that purpose in .htaccess do not work as well as rewrite code.)
Or is this special task solved by the Google webmaster feature "Preferred Domain"?
Thanks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^maindomain\.com [NC]
RewriteRule ^(.*)$ http://www.maindomain.com/$1 [L,R=301]
If you have some other domain that also needs to deliver the user to the same website then also add:
RewriteCond %{HTTP_HOST} ^otherdomain\.com [NC]
RewriteRule ^(.*)$ http://www.maindomain.com/$1 [L,R=301]
RewriteCond %{HTTP_HOST} ^www\.otherdomain\.com [NC]
RewriteRule ^(.*)$ http://www.maindomain.com/$1 [L,R=301]
All of these redirects preserve the original folder and filename request in the redirect.
You only need their "preferred domain" tool if you cannot set up the redirect on your site. Even then I hear from some that it is not reliable.
Be aware that redirected URLs will continue to appear in the SERPs as Supplemental Results for one year after the redirect is put in place. This is normal. You cannot change that - and you don't need to. Everything is still OK if that does happen.
I cannot use a redirect because many pages are active listings, so I'm wondering if it is enough to add this inside robots.txt:
Disallow: /cgi-bin/
as I did?
Will the 48,000 pages inside the cgi-bin directory be removed after one year? Or should I do something else?
Thanks in advance for your answer.
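On the robots.txt point: note that a Disallow line only takes effect inside a User-agent group, so the file needs at least:

```
User-agent: *
Disallow: /cgi-bin/
```

Also be aware that robots.txt only stops crawling; URLs already in the index can linger as URL-only or Supplemental entries. Google's URL removal tool (which at the time accepted robots.txt-blocked directories) is the usual way to hurry that along.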
Maybe only a crawl for meta title/description content to change - so potentially days.
PR and backlinks [ 1 update only ] have occurred; I cannot verify how long it takes for results to return, but I'm hoping it will be in the next month or so. Our only "clue" is exact match results which rank, which seems to suggest to me that another BL update will help complement the 1st update.
However, we may not be a perfect example [ I just hope we haven't missed something! ]
However, it does seem to me that Google is slowly but surely sifting this problem from their side -- at least when the dupe urls only appear in links from external domains. At least I am optimistic -- in recent weeks I've seen a few troubles sort out with no intervention from the site owners.
We have found a competitor site that offers a 'home service'. They have used the same (1500 word) article to create hundreds of pages, but each page is focused on a unique city - the only unique content on the page is the city name in the title, h1 tag, and in the content, and the page also lists a unique dealer contact info for that city's region.
Essentially, all they are changing out is the city names and contact info; the rest of the page is an exact duplicate (1,500 word article!).
BUT - they rank #1 on G for almost all the locations they have created pages for when searching: "home service city" (without quotes)...
Is this white, black, or grey? - and if it's black or grey, why does Google allow this type of duplicate content? :)
Thanks in advance.
Oh yeah - they link to all these pages from their home page...
Whitey Re: your index page.
If it is one that does, out of hundreds that do not, it is unlikely to cause a lot of grief. The "/" will be "stronger".
However, do ask them to amend their link, and/or set up an "index to / redirect" on your site.
Sounds like quite a low quality site
Quite the opposite in my experience. I watch major brands do similar things, and it appears it's the trust element that allows them to get away with it. It appears the focus is on the MO of spam sites rather than duplicate content itself. Therefore it appears Google is not bothered with duplication if it trusts you not to be a spammer or low-end website.
So, it seems that if G trusts you're not a spammer - you can spam.
Interesting. Seems to be verified by all the reports of large or corporate sites getting away with what us smaller guys get hammered for. We're not 'known' (i.e. trusted), thus disposable?
g1smd - I need to consider this technique for our site if it is proven to be working - but here is my moral dilemma, tell me what you think:
When a user searches 'home service cityname', and they come to my competitor's page - the page itself is useful - it tells them what they need to know about the topic, and provides a local resource they can contact for further help. So, if the page is useful, BUT the content is duplicate, other than the cityname - IS THIS REALLY SPAM?
This is killing me, because I don't want to be considered a spammer, but I want to do what's best for my business...
Is 'grey' the most beautiful color?
Firstly, a site:website.com *** keyword
It seems to reveal 2-3 commonly used words around the core term, in this case "duplicate content"
e.g. site:www.webmasterworld.com *** duplicate content
Note there are no supplementals as everything is unique, but try this on another site and see what you get.
Not only is it picking up common terms around the core term; if you try it on other sites, you may find that "commonly" used terms throughout the site are also highlighted. For example, observe ISO currency codes which may exist on an e-commerce site.
I wonder if this means that G is accounting for the use of common terms surrounding the core term to decide if it's duplicate or not.
Secondly, a site:website.com or site:www.website.com may show pages as supplemental,
but when you try it again for a specific page e.g.
site:website.com/ABCD/ or site:www.website.com/ABCD/ it may be "clear"
Is this site command [ even if it is "buggy" ] saying anything about the way Google is analysing the number of words/characters and the positioning of terms and characters it calculates to establish pages as "duplicate" and assign "supplemental" status to them?
Yet for other sites, a couple of pages that are say 60% identical can mean severe drops... it's clearly about the type of site rather than the duplication itself...
Put it another way: is there a % of dupe content on the overall site which tips the balance of the site's pages in the rankings, and is G's filter applied to the whole site, regardless of the keyword searches chosen?