&filter=0

duplicate content?

         

textex

5:30 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My site is missing for some of my terms in this update.
So, I tried adding "&filter=0" to the end of search strings and lo and behold my site is listed #1 for my targeted terms.
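For anyone who wants to repeat this check without editing the address bar by hand, here is a minimal sketch of appending the parameter to a search URL. The function name `with_filter_off` and the example URL are only illustrative assumptions; the thread itself only says that "&filter=0" goes on the end of the search string.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_filter_off(search_url: str) -> str:
    """Return search_url with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(search_url)
    # Keep every existing parameter except a previous "filter" setting.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical example query; substitute your own search URL.
    print(with_filter_off("http://www.google.com/search?q=red+widgets"))
```

Comparing the result pages for the original URL and the rewritten one shows which listings only appear with the filter switched off.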

I checked some of my competition listed in the top 10 and did find one site that scraped some text from me.
What percentage is considered duplicate content? What percentage would trip this filter and why would my site be the one considered the duplicate content?

WarmGlow

11:34 pm on Jun 18, 2003 (gmt 0)

10+ Year Member



chrisnrae wrote:
So, while you can take care of the issue CAUSING the problem, taking care of the problem itself (dup penalty) does not have a solution, aside to wait until the next update, by which time, my site may hit the same issue again. Sigh.

I am also an innocent victim of the Google duplicate content filter.

My home page disappeared from the -fi data center last night when I searched for the top query that brings SE referrals to my site. My home page is returned in the 1st through 8th positions in the SERPs on the other eight data centers. When I included the "&filter=0" query string in the -fi search, my home page was returned in the 7th position. I then searched -fi for an exact match of a unique keyword that I have on my home page, and a listing that appeared to be my home page was returned as the only match, but it had a URL for a page at a different domain. I went to the displayed URL and found a cached copy of my home page in a web directory that I had never heard of before.

The cached page on the directory site has the following text inserted:

(Note: This page is a text only capture of the url below - images have been stripped from it to make loading faster).
Cached page for [URL removed to comply with WebmasterWorld TOS]

They also included their own CSS on the page. They included all of my META elements within the HEAD element which includes my META robots noarchive element and my META copyright element. My original copyright information is also clearly displayed within the body text.

I could not find a contact E-Mail address on the directory web site. I used their contact input form to submit notification of their copyright violation and direct them to remove my content from their cache.

Even if I am successful in getting the directory to remove my content from their cache, the Google duplicate content filter problem will continue. Google has already indexed 23,300 cached pages that are located on the directory web site. I think that Google should take immediate action to correct the problem created by their duplicate content filter.

WarmGlow

5:55 pm on Jun 19, 2003 (gmt 0)

10+ Year Member



I see an improvement today with the duplicate content filter problem at the -fi data center. My home page, which was totally missing from the serps on -fi for the past two days, is now listed in the 8th position when I search for my relevant keywords.

Searched the web for keyword1 keyword2. Results 1 - 10 of about 2,390,000.

When I search for my unique keyword, the unauthorized cached copy on the directory site is still returned as the only match instead of my home page. I have to add the "&filter=0" query string to see my home page in the serps. It appears that the problem with the duplicate content filter has only been partially fixed.

textex

12:13 pm on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have one site that is in and out of results. I noticed my competitor (who copied my title tag and scraped lots of text from my site) is in when I'm out and vice versa.

When I am in and I type &filter=0, my site disappears and his site appears. The same occurs when he is in and I am out.

Let me note that my site is at least 1 1/2 years older than my competitor's.

Anyone else see this? This filter is too sensitive. It would be very simple to drag someone into this predicament.

lk125

1:20 pm on Jun 20, 2003 (gmt 0)

10+ Year Member



I am having this issue as well with duplicate content of my home page (i.e. domain.com and www.domain.com).

I'm on a Windows server, and I have some options (below) regarding the redirect, but none seem to work very well. Can anyone offer any insight?

1. Global.asa file - The problem with this solution is it only works for server side pages. You would not have to change the file extension from .htm to .asp but you would have to map the .htm/.html extension to run in IIS like an ASP page.

2. IIS 301 Redirect - I thought this would be the solution. However, it requires two IIS accounts or two accounts on the same server. In your DNS record you would have domain.com going to one IP (account) and www.domain.com going to another IP. Both would physically point to the same directory on the server. Then in IIS you would redirect www.domain.com to domain.com.

3. DNS record - Make the change at the DNS record. Right now the A records for www.domain.com and domain.com point to an IP address (11.222.333.444). This IP address points to the IIS virtual directory that points to the physical directory.

Thanks for any help.
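Whichever of the options above ends up doing the work, the redirect rule itself is small. Below is a rough sketch of the decision in Python, not IIS configuration: the names `CANONICAL_HOST`, `ALIASES` and `canonical_redirect` are made up for illustration, and choosing the www form as the canonical one is an assumption (the reverse works the same way).

```python
from typing import Optional, Tuple

# Assumption: the www form is the one to keep. Swap the two values to
# keep the bare domain instead, as in option 2 above.
CANONICAL_HOST = "www.domain.com"
ALIASES = {"domain.com"}


def canonical_redirect(host: str, path_and_query: str) -> Optional[Tuple[int, str]]:
    """Return (301, target URL) for an alias host, or None for the canonical host."""
    # Host headers are case-insensitive and may carry a port.
    bare_host = host.lower().split(":")[0]
    if bare_host in ALIASES:
        return 301, f"http://{CANONICAL_HOST}{path_and_query}"
    return None


if __name__ == "__main__":
    print(canonical_redirect("domain.com", "/page.htm?x=1"))
    print(canonical_redirect("www.domain.com", "/page.htm?x=1"))
```

The point of the permanent (301) status is that every URL on the alias host maps to exactly one URL on the canonical host, so only one copy of each page is left to index.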

WarmGlow

6:33 pm on Jun 20, 2003 (gmt 0)

10+ Year Member



textex,
I am also seeing the In-N-Out behavior. I agree that it has the potential to adversely affect thousands of innocent webmasters when unauthorized copies of their content are found by Google on other web sites.

WebMistress

6:54 pm on Jun 20, 2003 (gmt 0)

10+ Year Member



For 4 of my 5 main keyword phrases, I now come out #1 with www.mydomain.com, where I once was only showing low in the SERPs with mydomain.com. That is a huge fix. However, while those 4 phrases were low in the SERPs with mydomain.com, the 1 other phrase was #2 with www.mydomain.com. When the other 4 went to #1 with www.mydomain.com, the 1 previously showing #2 with www.mydomain.com dropped to low in the SERPs with mydomain.com. Hope that all made sense. At least it probably will to anyone experiencing the same thing.

Anyways, I went into my htaccess file and redirected mydomain.com to www.mydomain.com. But then I discovered a huge problem: All my guestbooks were having problems, saying no text was entered, because the .pl file that processes the guestbook entries is pointed at mydomain.com and not www.mydomain.com, so the redirect messed this up. I can't simply change the .pl file because it is used for several different subdomains individually, which would mean I have to go into each one and change it...100's...not an option. Any other suggestions of getting google to see www.mydomain.com instead of mydomain.com?

Thanks for any help...also for any similarities you're seeing that might shed some light on this duplicate content filter behavior.
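One possible way around the guestbook breakage, offered only as an untested idea: redirect page requests (GET and HEAD) but leave form submissions alone, since a form POST that gets redirected usually arrives without its form data. The sketch below shows just that decision in Python; `redirect_target` is a hypothetical name, and an equivalent condition would still have to be written in the .htaccess file itself.

```python
from typing import Optional


def redirect_target(method: str, host: str, path: str) -> Optional[str]:
    """Return the www URL to 301 to, or None to serve the request as-is."""
    if method.upper() not in ("GET", "HEAD"):
        # Leave form posts (e.g. guestbook submissions) untouched.
        return None
    if host.lower() == "mydomain.com":
        return "http://www.mydomain.com" + path
    return None


if __name__ == "__main__":
    print(redirect_target("GET", "mydomain.com", "/index.html"))
    print(redirect_target("POST", "mydomain.com", "/cgi-bin/guestbook.pl"))
```

Spiders fetch pages with GET, so they would still only see the www version, while the .pl script keeps receiving posts at the address it expects.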

polarmate

6:59 pm on Jun 20, 2003 (gmt 0)

10+ Year Member



Anyone have a clue what the other reason might be for index pages not to appear in the index? Apart from the reason that the page has an unauthorized duplicate? Another of my subpages appeared in the index just above the one listed. But my index page is nowhere to be found. It shows up with &filter=0 but the sub pages are buried. My backlinks are 3 to 4 times as many as the sites whose index pages show up on the first few pages of the SERPs. Plus my content is quite exhaustive compared to any of those sites. So any idea why?

textex

7:07 pm on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Should I make changes to my .htaccess so that mysite.com and www.mysite.com are not confusing Google, or just sit tight and hope Google figures it out?

polarmate

5:01 am on Jun 21, 2003 (gmt 0)

10+ Year Member



Now my index page does not show up at all - even with the &filter=0...
but my return policy and secure shopping guarantee page do. And they are not optimized or even in the race for representation for my main keyword phrase.

Jenstar

5:03 am on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



textex, I wouldn't always count on Google figuring it out. .htaccess seems to be the easiest route to go so that Google does figure it out ;)

mmr82

6:25 am on Jun 21, 2003 (gmt 0)

10+ Year Member



No one has copied my website so far, it's unique, and yet I am filtered out for one of my main keywords :( Should I worry? Am I the only one here? Or will all of this be fixed soon?

Jenstar

6:34 am on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can follow the steps listed earlier in this thread if someone has copied your content.

You could wait until the update finishes to see if it "fixes" although even if the other person removes the content, you may not see the difference until the next update. GoogleGuy mentioned they are still tweaking the algo for this filter, so you may be back when the update has settled.

If it isn't related to duplicate content, there could be a wide variety of other reasons why your site isn't coming up.

mmr82

6:38 am on Jun 21, 2003 (gmt 0)

10+ Year Member



I don't think it's related to duplicate content. I've been searching and no one has copied my content.

But for one of my main key-phrases I only appear when I add "&filter=0"!

McMohan

7:36 am on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, I am afraid I am not too well versed in the duplicate content issue. But I know that &filter=0 includes the results that Google would otherwise leave out as duplicate content. I would appreciate it if someone could clarify a simple doubt I have about this:

When I search for a particular competitive phrase, "red widgets", my site lists at #11. All of the first 10 results are different sites with no duplicate content. When I add &filter=0, my site ranks at 9, with 2 previously higher-ranked sites now at 10 and 15 respectively. Ideally, my site should have gone down in rank after adding &filter=0, so why is it ranked higher?

Thanx

Mc

senkron

8:29 am on Jun 21, 2003 (gmt 0)

10+ Year Member



Another victim here :(
"blue widget" #1, "red widget" #1 still hopefully :)
But
"blue widgets" nowhere, with &filter=0 #1
"red widgets" nowhere, with &filter=0 #1
and a very interesting point: "widgets" #10 --> can you imagine, this is not filtered. I should also tell you that "widgets" is a really competitive keyword with 14 million results.

There should be something wrong with Google, I'm sure my index page is unique, not copied and duplicated.

Still waiting for an answer from GoogleGuy about &filter=0

I want to learn what this filter does exactly?

mmr82

8:37 am on Jun 21, 2003 (gmt 0)

10+ Year Member



I support senkron's request!

senkron

12:53 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



I support senkron's request too!
Upppssss!

Just laughing instead of crying :)
I know GoogleGuy would never answer, or even if he answers he would tell some futuristic stories about 2020 and so on.
Or maybe he would say about GoogleX, there is no search engine, there is no top ten listing, all is virtual! Take this blue pill and forget about everything or the red one for seeing ugly but real world :))

Offff!
Imagine you're in court and found guilty, but you don't know what you've done. This makes it impossible to correct your fault and, worse, you feel very discouraged from building new content.
Anyway, I've put a lot of work into my site and I won't give up before finding out why!

Why does Google filter a site? This is the main question, and I'm still hoping for an explanation.

Net_Wizard

1:46 pm on Jun 21, 2003 (gmt 0)



I think this would answer some of the questions partially...

Queries without &filter=0...
Would return a serp with clustering turned on.

Queries with &filter=0...
Would return a serp with clustering turned off.

Why the change in ranking(sometimes drastically)?

Best described by example...

Query = widget
SiteA.com = 100 pages, all URLs are titled 'Site A - Widget'
Yoursite.com = 1 page, title = 'yoursite - Widget', highly optimized

Query 'widget' with &filter=0

SiteA.com URLs would be spread throughout this serp and ranking is done on a page-by-page basis, so it is highly possible that the single-page Yoursite.com would come up at the top of the serp because it was optimized for 'Widget', assuming everything else is equal.

Query 'widget' without &filter=0 (clustering turned on)

SiteA.com URLs would be clustered to at most two URLs. Rankings, in a manner of speaking, are now condensed for SiteA.com, which would weigh a little bit more compared to your 1 page Yoursite.com even if it's optimized for 'Widget'.
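To make the clustering half of that concrete, here is a toy sketch in Python. It only illustrates the "at most two URLs per site" rule described above; it does not model the "condensed" ranking weight, and it is in no way Google's actual code. The function name `cluster_by_host` and the sample URLs are invented for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_by_host(ranked_urls, per_host=2):
    """Keep at most per_host results from each host, preserving rank order."""
    seen = defaultdict(int)
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if seen[host] < per_host:
            kept.append(url)
            seen[host] += 1
    return kept


if __name__ == "__main__":
    # Hypothetical unclustered serp, i.e. what &filter=0 would show.
    unclustered = [
        "http://sitea.com/1",
        "http://sitea.com/2",
        "http://sitea.com/3",
        "http://yoursite.com/",
    ]
    for position, url in enumerate(cluster_by_host(unclustered), start=1):
        print(position, url)
```

In this toy case Yoursite.com moves from position 4 to position 3 once the third SiteA.com URL is folded away, which is why positions can differ between the two views even when no duplicate content is involved.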

Cheers

swones

2:17 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



I finally figured out that my site is suffering from the duplicate content penalty, i.e. it is buried somewhere in the 400s. With &filter=0 added it comes up on page 2, which for me is an improvement (it was previously on page 4). I've searched for another site duplicating my homepage content and I can't find any. BUT what I have found is that sites that link to me have taken the first paragraph of my site and used it as the link description on their site. In fact, I submitted to DMOZ with that same paragraph. Could this be what is tripping the dupe filter for me and others? If so, it's a pretty poor situation that Google must be made aware of, if they are not already. Let's hope that GoogleGuy is still out there and can offer some thoughts on this.

Simon.

Napoleon

2:42 pm on Jun 21, 2003 (gmt 0)



>> Is there anybody else out there who's site(s) appear when the &filter=0 is on and cannot find any other sites who seem to have copied your contect. <<

That's me! Whatever I try I can't find any duplicates... and yes, I've tried everything suggested above.

Conclusion: There are no duplicates.

So why does it appear with the &filter=0, and not without it?

Yes.... at this juncture we certainly need input from GoogleGuy, because without that we can only speculate and guess.

Something else, other than duplicate, must be being applied. Either that, or there is a problem with the duplication filter.

It's looking like this could be the cause of a pretty hefty percentage of those missing index files. It's very widespread indeed.

otnot

2:47 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



I have found that if I type mydomain.com in the address bar, I find my site but with a lower PR than if I use www.mydomain.com. The SERPs that Google is displaying on its data centers show mydomain.com, and thus, I think, the duplicate site penalty.

senkron

3:57 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



In many topics, many people are saying that www.mysite.com and mysite.com would be treated by Google as duplicates.
Come on guys! A billion dollar "Algo" shouldn't make such a simple mistake! I will not accept this as duplication unless GoogleGuy says so.
It means never :=)

What should we do to get GoogleGuy's attention here?

Napoleon

4:04 pm on Jun 21, 2003 (gmt 0)



Is this frustrating or what?

Great links (unsolicited) from all over the place, excellent unique content, squeaky clean.... and it gets nowhere on its main term.

I have spent the afternoon changing every other sentence on the front page. Great fun!

There's no logic to doing that... I can't find any duplicate anywhere... it's just that it was the only thing I COULD do!

I think there is a problem for Google lurking here. This site should rank well on any rational basis, as indeed it does with &filter=0.

The filter has netted this site, and no doubt thousands of other unique content and innocent sites as well.

I frankly have no idea where to go with this from here. Sit on my hands and hope they fix it.

mmr82

7:23 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



Napoleon, I have the exact same feelings. I'll be sitting here waiting for an input from GoogleGuy.

HayMeadows

8:44 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



Patiently waiting for an answer myself.

chrisnrae

2:08 am on Jun 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"This filter is too sensitive."

Agreed. I have seen two sites suffering the dup penalty with only 2 of their 5 main paragraphs copied word for word. And I will state again how wide open this leaves the door for sabotage by competitors. Hopefully, they are realizing the potential and current issues and tweaking away.
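For anyone trying to put a number on how much of a page has been lifted, a crude sketch follows. It counts paragraphs of one text that reappear word for word in another; this is an illustrative measure only, nobody in this thread knows what threshold (if any) Google uses, and `copied_fraction` is a made-up name.

```python
def copied_fraction(original: str, other: str) -> float:
    """Fraction of paragraphs in `original` that appear verbatim in `other`."""

    def paragraphs(text):
        # Paragraphs are separated by blank lines; case and spacing are ignored.
        return [" ".join(p.split()).lower() for p in text.split("\n\n") if p.strip()]

    mine = paragraphs(original)
    theirs = set(paragraphs(other))
    if not mine:
        return 0.0
    return sum(p in theirs for p in mine) / len(mine)


if __name__ == "__main__":
    # Hypothetical page texts: 2 of the 5 paragraphs are copied.
    my_page = "One.\n\nTwo.\n\nThree.\n\nFour.\n\nFive."
    their_page = "Intro.\n\nTwo.\n\nFour.\n\nOutro."
    print(copied_fraction(my_page, their_page))
```

A measure like this only catches exact copies; a page reworded sentence by sentence would score zero here.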

Rae

Stefan

3:19 am on Jun 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hopefully, they are realizing the potential and current issues and tweaking away.

Google doesn't care how our sites do; they're only in it for the money.

I would like to give Google a PR0 for having nothing but duplicate content on their site. There is nothing original to be found, and what they use of ours is an unpredictable hodge-podge that apparently is dependent on the latest recommendations of their marketing department. (Geez, isn't it great to see Amazon.com doing so well in the new serps).

By the way Google, my site has a www at the start of the URL. Sorry if that's too confusing for you.

chrisnrae

3:59 am on Jun 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nope, they don't, but without good search, they won't keep searchers coming back to them. If they allow such an easy sabotage penalty, it won't be long before good sites that ARE useful to users are targeted by competitors and filtered, which in the end means a less fulfilling experience for the searcher. I am not asking Google to consider my site personally. My comments are directed at the overall problems this filter, at its current tweak, can cause.

Stefan

4:06 am on Jun 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Easy sabotage indeed. I'm sure we're not the only ones who see how simple it is to eliminate other sites' pages using these techniques.

WebChick

4:41 am on Jun 22, 2003 (gmt 0)

10+ Year Member



If I do a search for www.mydomain.com, the index page of my site shows up. If I click on "find web pages containing the term..." there are over 140 results, mostly other pages of my site that have been indexed. Why would I not be showing up in the SERPS? My site still doesn't show up in the SERPS with the "&filter=0" thing. A couple of months ago I had pages that ranked highly in Google and now none of them even rank. I don't understand why my site is indexed but not showing up. Googlebot spiders my site each month and is currently spidering. Any ideas?
This 114 message thread spans 4 pages.