For example, if two sites have identical content, what determines which content is the original? Is it the first site to be indexed? That seems somewhat arbitrary to me.
Also, what if I have a site that provides info about widgets, and the site has a different page for purchasing widgets in each state? Each page would have identical content (the widget product description), and the only differences would be the state name and the contact info. From what I've read, many of these pages would either be penalized or dropped from Google's index due to the high degree of duplicate content. But in reality each page is unique and would be useful to people in different geographic locations.
In the end, I'm hoping someone can explain to me why I'm wrong, or, if I'm not, why Google would be so arbitrary.
If this is true, I think it really encourages 'bad behavior', so to speak, because it allows people to copy a competitor's content that hasn't been indexed yet and present it in a more search-engine-friendly way. I assume you can imagine all the complications that might arise from this.
I originally wanted a specific ".com" domain name, but it was not available. I purchased the ".net" variety and began to build a web site while waiting for the ".com" to become available...
During this time I knew nothing about search engine submissions, so the site went un-submitted... (8 years ago)
(there is a point to all this...bear with me...) 8-)
All of the back end web issues were handled by the ".net" domain (mail, forms, etc).
I purchased the ".com" domain when it became available and configured it to point to my current (.net) web site. By this time I was starting to submit my site everywhere... I decided that the ".com" was what I really wanted and began to submit it...
so far...everything is great...my listings are happening....(I'm on my way to fame and fortune...lol).
(the point is coming......) 8-)
My Error: hardcoding the links to my forms so that users were always sent to the ".net" domain... (my forms only worked if they were submitted from my original ".net" domain name...) (this was before I knew about mods and such)
The Point: When I got indexed as ".com", the SE located my internal link to my form page on the ".net" domain and began to index the ".net" web site along with the ".com" site... IDENTICAL CONTENT...
My current situation is as follows:
Google shows the ".com" results... If I search for the ".net" I show up as a result, but only if I go looking for it... Google decided the duplicate content fell on the ".com" side and only shows ".com" results...
Yahoo shows the ".net" results... likewise on a search for the ".com", but it only displays the ".net" in searches... Yahoo decided that the duplicate content fell on the ".net" side.
It's definitely something to consider when linking the two sites together. You may not achieve what you're looking for, and you may just kill one of the domains...
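For what it's worth, the usual fix is to pick one hostname as canonical and permanently redirect the other to it, so the engines only ever see one copy instead of splitting the domains the way Google and Yahoo did here. A minimal sketch of the decision logic, with placeholder domain names:

```python
# Sketch only: the domain names are placeholders, not the poster's site.
# Pick one hostname as canonical and send every request that arrives on
# any other host to its twin URL on the canonical domain.

def canonical_url(host, path, canonical_host="www.example.com"):
    """Return the 301 redirect target for a request on the wrong host,
    or None if the request is already on the canonical host."""
    if host.lower() == canonical_host:
        return None  # already canonical: serve the page normally
    return f"http://{canonical_host}{path}"
```

In practice you'd implement this in the web server itself as a permanent (301) redirect, e.g. an Apache mod_rewrite rule, rather than in application code; a 301 tells the crawlers the move is permanent so they consolidate everything onto one domain.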
Just my 2-cents...
I'm just surprised they don't have some more sophisticated way of dealing with it.
But the current setup suits the SEs' purposes -- to present non-duplicate results in response to a searcher's query.
The SE doesn't really care which domain the page is on, just that it (at least somewhat) answered the user's question.
How does blogging software avoid this issue of duplicate content?
Supposedly, Google is able to first strip out all the shared information between pages. The duplicate content comparison is only for the 'content' portion of a page.
But then, this might be more Google hot air. I have a site with hundreds of totally unique articles, but most of them do not show up in a site: search unless you select "repeat the search with the omitted results included". I'm guessing that the shared page template confuses Google into thinking that the content is similar. If so, then Google's shared-element-removal algo is really weak, and many innocent pages are going to be eliminated as similar content.
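To make the idea concrete, here is a toy sketch of how such a filter *might* work (my guess at the general approach, not Google's actual algorithm): discard the shingles contributed by the shared template, then compare what's left using Jaccard similarity over word shingles.

```python
# Toy duplicate-content check: strip template shingles, then compare
# the remaining "content" shingles of two pages.

def shingles(text, k=4):
    """Return the set of k-word shingles in a piece of text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(page_a, page_b, template=""):
    """Jaccard similarity of two pages' shingle sets, after discarding
    any shingles that also occur in the shared template."""
    common = shingles(template)
    a = shingles(page_a) - common
    b = shingles(page_b) - common
    if not a and not b:
        return 1.0  # nothing left to compare: treat as identical
    return len(a & b) / len(a | b)
```

If the template-stripping step is weak, two pages that share a heavy template but carry unique articles can still score as near-duplicates, which would explain unique pages getting filtered out of a site: search.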
Perhaps this is why all those auto-generated spam sites ranking high in the SERPs have virtually no page structure: purely unique "content".
>>Supposedly, Google is able to first strip out all the shared information between pages. The duplicate content comparison is only for the 'content' portion of a page.
But that's my point about blogs: it is the content that is duplicated, on the main page and in the archives. It should be identical. And this is standard on every blog, so how does that not trigger the duplicate content penalty?
It is nearly impossible for a dupe to beat the original content just because the original hasn't been indexed yet. Almost all new pages are found within 0-48 hrs of being linked to.
Even if the 2nd page is indexed first, there is still the opportunity for the original to win based upon significantly higher PR.
Using the web archive I could prove that a site ripped off a 300-word paragraph six months after I first published it, but because I amend the templates of my site often, it was my page that got dropped by Google. My page had higher PR, and my site had been running for longer.
I sent a cease-and-desist letter to the offending site, and they have deleted the duplicate content, but my page is still not in the SERPs two weeks later.
Surely Google could run an automatic query against the web archive, check the older incarnation, and penalise the newer pages with a dupe-dump.
For example -- if a variety of sites have the same basic product information (e.g. technical specifications), but they all present a very different commentary surrounding the same products, are they still clumped together, with the first indexed or higher PR version coming out on top?
And if significant duplicate content *does* effectively penalize ranking of an entire site, what might the threshold be at which point you don't warrant a rank?
I'm just trying to establish that whichever content is indexed first is what Google considers original, and that they don't check the date the content was actually published...
I have a question on the duplicate content that I just discovered.
My site (www.mysite.com) is indexed by Google, but I have dropped from #2-3 to about 9, and down to 12, on my two-word keyword phrase. I held the #2-3 position for years. I'm still 1-2 in MSN, Yahoo, Ask Jeeves, etc.
I found out today there are pages named www.mysite.net and www2.mysite.com that bring up my exact home-page content but still show their own URL in the address bar (mysite.net, etc.).
Neither of these duplicate pages has any PR or BLs.
Am I being penalized by Google?
They filed/saved our ENTIRE website. They then changed our company name to a fictitious name. They left everything else the same! They left our phone numbers, address, etc. The WHOIS for the domain was fake. It was pretty clear that someone was trying to get US penalized, as, even to me, it did look like we were trying to put another site up and monopolize the rankings.
QUESTION: CAN THIS REALLY HURT US? WHAT IF WE HADN'T FOUND THIS SITE? Could our site, which has had top-5 organic rankings for years, really be banned due to a bad person's attempt to defraud us?
Needless to say, we did traceroutes and found it to be a local hosting company (SURPRISE!) and the site was shut down within 24 hrs.
>>It is nearly impossible for a dupe to beat the original content because the original hasn't been indexed. Almost all new pages are found within 0-48 hrs of being linked to.
So only if the person who created the site, or who wrote the content on a "homepages" site, understands SEO and the importance of getting linked to quickly to avoid duplicate-content filters will they "qualify" for first consideration? Seems a bit of a hare-brained detection scheme to me, or is "unrealistic expectations" a better phrase?
>>Even if the 2nd page is indexed first, there is still the opportunity for the original to win based upon significantly higher PR.
Again, this means that the person with the better understanding of SEO (which of course would be the spammer 99% of the time) wins.
>>But the current setup suits the SEs' purposes -- to present non-duplicate results in response to a searcher's query.
It "suits" them? Since when can they make decisions based on what "suits" them? How about "law-suits", do those suit them? Google lost many cases where they allowed competitors to advertise in AdWords using trademarked names. If they promote competitors who have stolen copyrighted content, or who simply kept some trademarked names on the page, by "penalising" the real owners, then surely Google will be held accountable? If they want to go down this route, they either have to make sure it works 100% of the time or just forget it and let the results fight it out among themselves; then it's not their problem.
I have learnt the hard way not to update pages.
My post has already gotten too long, can anyone else field this one?
>>Every blog has content on their blog home page. That exact content is duplicated in the archives pages.
Not totally exact, but close enough. It depends on the software. Some entries are archived on individual pages and some in groups of posts.
It's not handled; it's just sort of dealt with. I put some pages up on a blog and let them roll naturally, as-is, to see what happens. It's a total mess with the duplicates; a lot go URL-only.
It isn't a penalty as far as I'm concerned, it's just purging the index and not fully indexing extraneous pages that add nothing to the value of the index. With a blog, I don't see how they'd know which are the correct pages to fully index - they can't unless something is indicated server-side.
I have somehow managed to create a number of links that contain at least one upper-case character. When one of these is clicked, all the links begin displaying the upper-case character. I don't know for sure, but I assume this is a ColdFusion issue.
I am concerned that Google thinks my site is saturated with duplicate content because of this. If so, that might be why it was sandboxed.
Should I change all the links to lower-case, or would that be a waste of time? Do I run the risk of losing PR by halving my (incorrectly perceived) number of pages?
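If it helps, here's a small sketch of what "change all links to lower-case" amounts to, assuming you fix the link-generation code rather than editing pages one by one: normalize every internal URL to a single lower-case form, so the same page can only ever be linked one way. (The function and URLs are illustrative, not taken from the poster's site.)

```python
# Sketch: canonicalize internal link URLs to lower-case. ColdFusion
# serves /Page.cfm and /page.cfm as the same file, but to a crawler
# they are two distinct URLs, i.e. potential duplicate pages.
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Lower-case the scheme, host, and path of a URL, leaving the
    query string untouched (query values can be case-sensitive)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.lower(), parts.query, parts.fragment))
```

Running every generated link through a normalizer like this means the engines only ever see one spelling of each URL, so the (incorrectly perceived) duplicates stop accumulating.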