Locating duplicate content / pages

Different results for filter=0 and standard search

jrokesmith

5:30 pm on Sep 5, 2003 (gmt 0)

10+ Year Member



I have a site that appears to be downgraded in the SERPs for duplicate content. I tested this by appending &filter=0 to the search string. Question: does anyone know a good way to determine which pages are duplicated or are causing the problem? We recently altered the navigation structure and, after looking around, we found some duplicate pages, but the site has many pages and it is difficult to track down all of them. Does anyone know a good (quick) way to do this?

SlyOldDog

7:30 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why don't you load your site onto a computer and rank the pages by file size? If you are lucky the duplicates will show themselves ;)
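In modern Python, the size-then-hash idea looks something like this (a rough sketch; the directory path is whatever your local copy of the site lives in). Sorting by size alone only surfaces candidates, so the sketch confirms true duplicates by hashing files that share a size:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under `root` by size, then confirm real duplicates by hash."""
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, cannot be a byte-for-byte duplicate
        for path in paths:
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            by_hash[digest].append(path)

    # keep only hashes shared by two or more files
    return {h: p for h, p in by_hash.items() if len(p) > 1}
```

Note this only catches byte-for-byte copies; near-duplicates (same body, different header) would need a fuzzier comparison.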

killroy

7:53 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since all my sites are completely dynamically generated, even I often don't know what URLs there are.

The easiest approach is to leave it to the spiders and users to find out.

So I regularly rebuild URL lists of my site from the logs. This has helped me, and I'm pretty much guaranteed to catch everything.
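Rebuilding a URL list from logs can be sketched in a few lines of Python, assuming NCSA common-log-format access logs (the quoted "METHOD /path HTTP/x.x" request field):

```python
import re

# NCSA common log format: the request is the quoted "METHOD /path HTTP/x.x" field
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def urls_from_log(lines):
    """Collect the unique request paths seen in access-log lines."""
    urls = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            urls.add(match.group(1))
    return sorted(urls)
```

Feeding every log line through this gives a deduplicated list of every URL that spiders and users have actually hit, query strings included.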

SN

Jenstar

8:04 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The duplicate content filter doesn't apply just to pages within your own site; it takes pages on other domains into account as well.

If someone has stolen your content, you could be penalized by the duplicate content filter, even though you are the rightful owner and yours appeared first. In theory, Google should discount the newest duplicate page. But this is not what I have seen in practice since the new duplicate filter was introduced a couple of months ago.

Another possibility is that some of your content is taken from another site - e.g. product description pages. This could also trigger the duplicate content filter.

LukeC

10:22 pm on Sep 7, 2003 (gmt 0)

10+ Year Member



Stupid question no doubt, but can you get penalised for duplicate content within a site? For instance, you could have two pages:

Page 1 - Guide to Red Widget Shops
Page 2 - Guide to Blue Widget Shops

If you have one shop that falls into both categories and you carry the same description on each page, is this dodgy?

Or are we talking about straightforward copying of other people's content?

SlyOldDog

10:32 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



LukeC

You do get penalized within your own site, but it doesn't matter. All that happens is Google nominates one page as good and the rest duplicates. This means users still find your good page.

If you have similar content I haven't seen that penalized.

[edited by: SlyOldDog at 10:35 pm (utc) on Sep. 7, 2003]

SlyOldDog

10:33 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



oops - stupid thing double posted. Sorry

Jenstar

10:39 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This means users still find your good page

Unfortunately, Google might think "Guide to Blue Widget Shops" is the good page, while you think "Guide to Red Widget Shops" is the good one ;)

SlyOldDog

11:01 pm on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, it's quite interesting to search your own domain for duplicate content.

I've just done a search and about 20% of our pages are deemed duplicate by Google. That seems high, and I'm pretty sure we don't have any duplicates on our site, although most of our pages are similar because we have the same content in several languages.

So maybe you do get good content dropped after all.

AnonyMouse

12:34 am on Sep 8, 2003 (gmt 0)

10+ Year Member



Locating duplicate content sounds like a good thing, but I'm not clear as to how to do it, and what the results mean - let alone how to do it in my own site.

Could you please give me a quick-start guide?!

jrokesmith

5:52 pm on Sep 9, 2003 (gmt 0)

10+ Year Member



Jenstar, you have raised an important point. The best way I know to find duplicate content on our site is to check each file. The problem is that this is not systematic or quick enough, and I probably miss some dupes. As Jenstar wrote, there may be sites out there duplicating our content that we don't even know about. In that case, our log files would not be sufficient. What I am trying to find is an automated tool or a way (preferably using Google, since the duplicate content filter is Google's in the first place) to locate which pages or sets of pages are triggering the duplicate content filter.

Jenstar

6:19 pm on Sep 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a number of pages that get ripped off more than others. What I do is find a unique 6 or 7 word phrase within the content - usually in the second half of the article, because if copyright infringers do make any changes, it will be in the first couple of paragraphs before they get lazy. I make sure not to use a sentence with any sort of branding names in it, because those will often get changed to the infringer's own brand names.

I then plug it into Google with " " around it, click the filter link to bring up all the results, and see what comes up. The results will often surprise you.

I am not aware of a tool to do this automatically. Perhaps someone has done something with the Google API, because automated queries are not permitted otherwise.
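The phrase-picking step can be sketched in Python. This is only an illustration of the idea above, not a real tool: it grabs a run of words from the second half of the text (where copiers tend to leave content untouched) and builds the quoted Google query URL to paste into a browser, since automated queries are not permitted:

```python
from urllib.parse import quote_plus

def fingerprint_query(text, phrase_len=7):
    """Pick a phrase from the second half of `text` and build a
    quoted Google search URL to check for copies by hand."""
    words = text.split()
    start = len(words) // 2  # skip the first half, which copiers often rewrite
    phrase = " ".join(words[start:start + phrase_len])
    return "http://www.google.com/search?q=" + quote_plus('"%s"' % phrase)
```

You would still want to eyeball the chosen phrase first, to make sure it contains no brand names an infringer would have swapped out.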

SlyOldDog

6:30 pm on Sep 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One thing I noticed is that if I search Google for allinurl:www.mydomain.com I find several other sites which have downloaded my whole site and display it as a subfolder of their own, e.g. www.otherdomain.com/www.mydomain.com/etc.

This is close to being duplicate content (apart from the header they display above my page).

I don't really know what these sites are for, but they must be creating a lot of duplicate content.

It's a jungle out there!

Jenstar

6:34 pm on Sep 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That will definitely trigger the duplicate content filter. I have seen plenty of evidence that the same main content/article, with different headers, footers and side nav bars, will still trip the duplicate content filter at Google. This never used to be the case, but it has been happening since the new dup filter appeared a couple of months ago.

oodlum

3:47 am on Sep 10, 2003 (gmt 0)

10+ Year Member



I've just done a search and about 20% of our pages are deemed duplicate by Google.

What is the best way to determine this? Is it just allinurl:www.domain.com? When I do this, only 2 URLs show, and then the old

"In order to show you the most relevant results, we have omitted some entries very similar to the 2 already displayed."

Surely this doesn't mean Google considers almost all of my site to be duplicate?

Or should I just do a

site:www.domain.com keyword

to determine which pages, if any, are considered duplicates by Google for that keyword?

Thanks

SlyOldDog

7:22 pm on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



deleted

[edited by: SlyOldDog at 7:26 pm (utc) on Sep. 10, 2003]

SlyOldDog

7:26 pm on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Oodlum

Just do the allinurl:mydomain.com search and check how many results get returned (the number in the top right corner of the screen).

Then go to the navigation bar and add &filter=0 onto the end of the existing query.

Now you may get a different number of results returned. The difference between the first number and the second is the number of pages Google thinks are duplicates.
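For anyone who wants to build the two query URLs, here is a rough sketch in Python (the domain name is just a placeholder; the two result counts still have to be read off the pages by hand, since automated queries aren't permitted):

```python
from urllib.parse import quote_plus

def filter_comparison_urls(domain):
    """Build the filtered and unfiltered allinurl: queries for `domain`.
    The gap between the two result counts is what Google treats as duplicate."""
    base = "http://www.google.com/search?q=" + quote_plus("allinurl:" + domain)
    return base, base + "&filter=0"
```

Open both in a browser, note the "Results 1 - 10 of about N" figure on each, and subtract.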

AnonyMouse

1:34 am on Sep 11, 2003 (gmt 0)

10+ Year Member



Here's a weird one - I've seen a couple of sites where that number goes DOWN not up...how can that be? Add in the duplicates, and you get FEWER pages?!

oodlum

2:25 am on Sep 11, 2003 (gmt 0)

10+ Year Member



SlyOldDog

Thanks very much.