Forum Moderators: Robert Charlton & goodroi

Blogspot duplicate content - I expected Google to be smarter than this

chrisv1963

3:23 pm on Mar 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since Panda I have been searching intensively for duplicate content. It is astonishing to see the number of articles that have been copied and placed on blogs, mainly Blogspot and Onsugar.

One example: my text and image, together with texts and images stolen from other websites, are posted on a Blogspot blog. That same blog has been copied more than 20 times, so there are now more than 20 new blogs with exactly the same group of stolen images and texts (from my site and the other sites). Everything is nicely indexed by Google.

The algo should be able to detect this and remove the duplicate Blogspot blogs from the index.
Blogspot should be able to detect duplicate blogs, do a manual check and delete them.

I checked 30 articles. More than 20 were copied and posted on Blogspot. Google, please do something about this!

goodroi

4:40 pm on Mar 29, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Being indexed and actually ranking and driving traffic are very different things.

Google does not like duplicate content and they try their best to deal with it. With a site like Blogspot, which has over 600 million pages (according to a site: search), manual checks are not easy.

How would you code an automated crawler that visits billions of webpages every month to distinguish between sites that are duplicate content vs sites that are quoting some original content and then discussing it? I do not envy Google.
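To make the problem concrete, here is a minimal sketch of one classic approach to exactly this distinction: w-shingling with Jaccard similarity. A page that copies an article wholesale shares almost all of its word n-grams with the original, while a page that quotes a sentence and then discusses it shares only a few. All text and thresholds below are illustrative, not Google's actual method.

```python
# Sketch: compare pages by overlapping word n-grams ("shingles").
# Near-duplicates share most shingles; quote-and-discuss pages share few.

def shingles(text, w=4):
    """Return the set of all w-word shingles in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity between two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "the quick brown fox jumps over the lazy dog every single day"
scraped = "the quick brown fox jumps over the lazy dog every single night"
commentary = "i disagree with the quick brown fox article because dogs are not lazy"

print(round(jaccard(shingles(original), shingles(scraped)), 2))     # 0.8 -> likely a copy
print(round(jaccard(shingles(original), shingles(commentary)), 2))  # 0.06 -> quoting, not copying
```

At web scale this is done with sampled fingerprints (e.g. MinHash) rather than full shingle sets, but the scoring idea is the same.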

FranticFish

5:10 pm on Mar 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google might make their life a bit easier if they sorted out the Blogger CMS.

As standard, each post appears:
- in full on each tag page,
- in full on each month page,
- in full on its own page, and
- in full on the blog's home page.

That's a minimum of four URLs for every article.
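The fan-out above is easy to enumerate. This sketch lists the duplicate URLs one post generates under that scheme; the URL patterns are approximations of Blogger's conventions for illustration, not an exact specification.

```python
# Sketch: the set of URLs at which one Blogger post appears in full,
# per the list above (home, post page, month archive, one per tag).
# URL patterns are illustrative approximations of Blogger's scheme.

def duplicate_urls(blog, slug, year, month, tags):
    base = f"https://{blog}.blogspot.com"
    urls = [
        f"{base}/",                                       # home page
        f"{base}/{year}/{month:02d}/{slug}.html",         # the post's own page
        f"{base}/{year}_{month:02d}_01_archive.html",     # month archive page
    ]
    urls += [f"{base}/search/label/{tag}" for tag in tags]  # one URL per tag
    return urls

# A post with two tags already lives at five URLs:
for url in duplicate_urls("example", "my-post", 2011, 3, ["seo", "google"]):
    print(url)
```

A rel=canonical pointing at the post's own page from the tag and archive views would collapse this for the crawler, which is the kind of fix being suggested here.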

chrisv1963

5:15 pm on Mar 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How would you code an automated crawler that visits billions of webpages every month to distinguish between sites that are duplicate content vs sites that are quoting some original content and then discussing it?


I was actually trying to make a point about duplicate blogs (a number of blogs with exactly the same content, copies of entire blogs, maybe created with an auto-blogger or something), not about duplicate quotes.
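Whole-blog duplication is in fact the easier case to flag automatically: if you fingerprint each post with a hash of its normalized text, two blogs whose fingerprint sets overlap almost completely are near-certain copies, with no risk of confusing them with legitimate quoting. A minimal sketch, with illustrative names and toy data:

```python
# Sketch: detect entire-blog copies by comparing sets of per-post
# content fingerprints. High overlap across a whole blog is a much
# stronger signal than any single duplicated page.

import hashlib

def fingerprint(post_text):
    """Hash of the post text with case and whitespace normalized."""
    normalized = " ".join(post_text.lower().split())
    return hashlib.sha1(normalized.encode()).hexdigest()

def blog_overlap(posts_a, posts_b):
    """Jaccard overlap between two blogs' fingerprint sets."""
    fa = {fingerprint(p) for p in posts_a}
    fb = {fingerprint(p) for p in posts_b}
    return len(fa & fb) / max(len(fa | fb), 1)

blog_a = ["Stolen article one.", "Stolen article two.", "Stolen article three."]
blog_b = ["stolen   article one.", "Stolen article two.", "Stolen article three."]

print(blog_overlap(blog_a, blog_b))  # 1.0: every post matches -> duplicate blog
```

An auto-blogger that clones a blog twenty times would produce twenty fingerprint sets with near-total overlap, which is exactly the pattern being complained about.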

koan

8:03 pm on Mar 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The fact that Blogger/Blogspot does not provide a standard way to contact the owner of a blog, like an email address or a contact form, just makes it more aggravating. How can you have a website and no contact page? The only people I know who do this are plagiarists.