Incorrect URLs and Mirror URLs

Causing duplication penalties.

         

crobb305

12:39 am on Nov 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has indexed numerous incorrect URLs and mirror URLs all pointing to my index page. Subsequently, the original URL (www.mydomain.com) has been suppressed to the bottom of the results for any search (presumably a duplication penalty). This problem was also mentioned in message 11 of the following thread:

[webmasterworld.com...]

The URLs pertaining to my website that all point to my index page take the following form.

www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/?SID=xRSUNVW8R9P44HSYQ6UWED&
www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/default.asp?S=AC3&am
www.some-other-URL.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-2.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-3.com/file/callink.php?linkid=3

I have emailed Google, but have received no reply. I am unsure what I can do to A) eliminate the incorrect URLs that appear to originate from my site and B) eliminate the mirror URLs that originate from unrelated websites.

Any help would be greatly appreciated.
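The id values in the go.php URLs listed above look like base64, which suggests the redirect script simply carries its target URL in encoded form. A minimal Python sketch for checking what such a link points at follows. The function name decode_redirect_target and the default parameter name "id" are illustrative assumptions; nothing here is known about the actual go.php script.

```python
import base64
from urllib.parse import urlsplit, parse_qs


def decode_redirect_target(url, param="id"):
    """Return the base64-decoded value of a query parameter, or None.

    Hypothetical helper: it assumes the redirect script stores its target
    as plain base64 in the query string, which is only a guess.
    """
    # The URLs quoted in the post have no scheme, so add one before parsing.
    if "://" not in url:
        url = "http://" + url
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return None
    raw = values[0]
    # Restore any '=' padding that may have been stripped from the value.
    raw += "=" * (-len(raw) % 4)
    try:
        return base64.b64decode(raw, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Not valid base64 (or not text), so this is some other kind of id.
        return None


if __name__ == "__main__":
    link = "www.some-other-URL.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v"
    print(decode_redirect_target(link))
```

A link such as callink.php?linkid=3 has no "id" parameter, so the helper returns None for it. Numeric ids like that one are presumably looked up on the redirecting server and cannot be decoded from the URL alone.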

zeus

11:28 am on Jan 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One thing that is a bit of a pain is that every day I get 1-2 hits from Google that come from my old rankings and keyword searches. At the start I thought, YES, things are getting better, but when it happens every day I know it's just a joke.

Mostly they were from Google India, Malaysia or other Asian versions, but today I got a single hit from co.uk.

crobb305

11:44 pm on Jan 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A few days ago, I had an optimistic attitude as I was seeing some positive changes at Google. Since then, those changes have disappeared, and the number of tracker2.php links to my site has increased. Google is so stale it's ridiculous. I like the idea someone proposed a while back about issuing a press release. After all, the general public and their investors should know how vulnerable their search results are to malicious sabotage/hijacking.

walkman

11:54 pm on Jan 11, 2005 (gmt 0)



I think Google completely forgot about or ignored (most likely scenario) this. GB now barely gets any pages from my site. They can't say they don't know either, because many people have sent them examples.

[edited by: walkman at 12:37 am (utc) on Jan. 12, 2005]

crobb305

11:56 pm on Jan 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When Yahoo realized this was a problem early in 2004, they got it resolved in one month!

At the very least they could hard code the alg to ignore tracker2's. Granted, new scripts will arise but it would be a good short-term solution.

Google is being beaten by spammers. How embarrassing.

zeus

12:47 am on Jan 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Crobb, the press release thing was from me. I do a lot of trading on the market, and any releases are watched carefully by investors. If I were investing in Google and were told that with a simple redirect you could steal another site's PR and make that site drop out of the search results, I would drop the stock in a minute. Google has a P/E of about 230, so it is very sensitive to any bad news, and of course there is MSN in 2006.

Just wait until the next update. You will see more of this hijacking/redirecting, because it is so easy and legal.

I was also a Google fan, but I was always afraid it would change as soon as they went on the market, and yes, it changed. Now I use Yahoo and WiseNut.

P.S. I still have AdSense active.

crobb305

1:37 am on Jan 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, it is a serious issue for investors. Some may argue that investors are making money from profits earned through AdSense, etc., so what does the hijacking of a single website have to do with anything? It displays the gross incompetence that Google seems to possess right now. Google is incapable of solving these problems within a REASONABLE amount of time (it's been over a year!). If webmasters can manipulate Google SERPs so easily by creating simple redirects, then Google is clearly an unstable, incompetent entity and long-term investments may be in jeopardy.

Zeus, I agree with you... let's wait to see if a new update comes soon.

C

Shurik

2:32 am on Jan 12, 2005 (gmt 0)

10+ Year Member



Just received a reply from Google to my request to remove a 302 link to my site. It appears they didn't even read what I wrote them. They just quoted the "crawler-friendly" guidelines and stated that "...there is almost nothing a competitor can do to harm your ranking..." It just made me laugh :)

crobb305

3:01 am on Jan 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just proves my point. Investors should take notice and be very cautious. Google is obviously incompetent and incapable of resolving these types of technical issues... and they clearly don't care.

kwasher

9:02 pm on Jan 14, 2005 (gmt 0)

10+ Year Member



I was able to 'steal' one of my own sites right out of google.

walkman

9:15 pm on Jan 14, 2005 (gmt 0)



"I was able to 'steal' one of my own sites right out of google."

What do you mean? Got yourself penalized, or what?

eyezshine

10:26 pm on Jan 14, 2005 (gmt 0)

10+ Year Member



When you email Google they will send you a canned response, so you have to keep replying 2-3 times until they do something about it.

Just annoy them and it will get fixed for your site.

Marshall Clark

10:56 pm on Jan 25, 2005 (gmt 0)

10+ Year Member



Google does seem to be moving very slowly to fix this problem but there's something that we can do in the meantime to both remove the hijacking sites from the index and make Google aware of the extent of this situation.

One of my sites has been hijacked by several spam websites. These sites use a variation of the tracker/go.php redirect script. Many have also scraped a snippet of text from the page content of my website.

Now I'm no lawyer - but based on what I know of intellectual property law this use of my content constitutes illegal use of copyrighted material. Under the DMCA Safe Harbor Provisions online service providers (Google) are free from liability when their customers engage in copyright violation only if:

- They set up an agent to deal with copyright complaints

- After being notified of copyright infringement they expeditiously remove or disable access to the material

What this means is that Google must respond to notifications of copyright infringement and, if positive proof is given of the infringement, must act to immediately remove the infringing material from their index.

This is not something they can write an algorithm for - this is manual work done by real, breathing, salary collecting Googleplex employees. I've got a list of several dozen scraped/redirected sites infringing on my material and I bet the other WebmasterWorld hijackees can come up with several thousand more.

All those Safe Harbor notices would probably get Google's attention pretty quick:
[google.com...]

BTW - there are serious penalties for filing a fraudulent DMCA notice so do your own research and make certain that infringement is in fact occurring before filing a notice.

Here's some additional info on the DMCA Safe Harbor Provisions:
[chillingeffects.org...]

londoh

11:10 pm on Jan 25, 2005 (gmt 0)

10+ Year Member



"After being notified of copyright infringement they expeditiously remove or disable access to the material"

Anybody know how to define 'expeditiously' in this respect?

I've reported several 302 redirects where Google is caching my page against the redirecting URL.

I've reported these as DMCA violations and requested they remove the cached pages, but so far, after 3 weeks, they are still showing.

Shurik

11:14 pm on Jan 25, 2005 (gmt 0)

10+ Year Member



As one 302 hijacker, who was kind enough to respond to my emails, put it, the content of the redirected pages does not physically reside on his web server and therefore does not constitute copyright infringement. He also added that he cannot be liable for Google's screw-ups.

londoh

11:34 pm on Jan 25, 2005 (gmt 0)

10+ Year Member




In my case, any email addresses I can find for the jackers' sites just bounce, so that approach doesn't work.

But Google's cache of my page showing against the jacker's URL is on Google's server.

As I understand it, that situation is indeed Google's problem, and I've asked Google to deal with it,

but so far without success.

Marval

11:50 pm on Jan 25, 2005 (gmt 0)

10+ Year Member



In my experience, having actually filed the paperwork per Google's guidelines for DMCA, it takes about 2 weeks for the site to be pulled from the SERPs once you fax it to the correct Google number.

Marshall Clark

12:14 am on Jan 26, 2005 (gmt 0)

10+ Year Member



It may take a bit of time to get the hijacked sites removed but that's only part of the point.

A backlog of DMCA notices caused by a faulty Google algo may get Google's attention where our asking them nicely to fix the problem hasn't.

AlexK

1:42 am on Jan 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I recently discovered that the reason for the 302s is that PHP automatically sends a 302 when a "Location:" header is sent (see PHP header docs [uk.php.net]):

The second special case is the "Location:" header. Not only does it send this header back to the browser, but it also returns a REDIRECT (302) status code to the browser unless some 3xx status code has already been set.

Looking at the RFC (HTTP/1.1, update to RFC 2068), it also seems to be the best general default response (see 302 Found [salemioche.com]):

Status Code 302 Found: The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.

The above may well be old hat to many, but for me it answered for the first time why so many of these redirection sites send 302 status codes. It is not some kind of evil conspiracy, simply that they do not alter PHP's default behaviour when using a "Location:" redirect header.
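To make the difference between the default 302 and an explicit permanent redirect concrete, here is a small self-contained sketch. It is written in Python rather than PHP and only imitates the behaviour described in the quoted documentation: the /temporary path stands in for a script that sets nothing but "Location:", and any other path stands in for one that sets a 301 explicitly. The handler, the paths and the example.com target are all made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "http://www.example.com/"  # hypothetical redirect destination


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /temporary imitates a script that only sets Location (302 Found);
        # every other path imitates one that explicitly asks for a 301.
        status = 302 if self.path.startswith("/temporary") else 301
        self.send_response(status)
        self.send_header("Location", TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_demo_server():
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_redirect(host, port, path):
    """Return (status, Location header) without following the redirect."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_demo_server()
    host, port = server.server_address
    for path in ("/temporary", "/permanent"):
        print(path, fetch_redirect(host, port, path))
    server.shutdown()
    server.server_close()
```

The same fetch_redirect helper could be pointed at a real host to see which status code a given redirect link returns. On the PHP side, header() accepts a response code as its third argument, so a script author who wants a permanent redirect can pass 301 there instead of relying on the default.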

eyezshine

7:04 am on Jan 26, 2005 (gmt 0)

10+ Year Member



The 302 redirects are causing a duplicate penalty for our sites, though. I have added random text and random links to my pages, and my sites have begun to come back into the results.

Everything is becoming stable again, like it used to be before the 302 redirect problem. It hasn't totally gone back to the way it was before, but my traffic is slowly coming back.

I think by the next update it should come back a lot more, and I will let you know what happens.

Tallon

1:35 am on Jan 28, 2005 (gmt 0)

10+ Year Member



An update from my earlier post:

Since reading this thread and looking at my linking practices, I've noticed a few things.

#1. Two sites with a fairly high number of php redirect/tracking links (over 100) no longer show the php redirect link when I do:

site:www.mydomain.com in Google

Previously the links would show in the results. No title, no description, just the php redirect script. Although I noticed the links showing in the results, I never thought much of it or paid much attention to it. I don't think the links ever showed up in Yahoo or MSN. They don't now if they did. Sites that only have a handful of php redirect links still show in the results for that query.

One more site that had a handful of php redirect links no longer shows them in the results for a site:www.mydomain.com in Google. A 4th site I'm watching still has them. It seems this 'fix', if that's what it is, isn't being applied straight across the board. All four sites use the same php script.

I decided to wait and watch the site linking to me with a cgi scripted link - it still shows up as a result in site:www.mysite.com

crobb305

2:02 am on Jan 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A lot of people have noticed that the damaging redirects often contained "tracker2.php". It is interesting that only 62,000 or so tracker2.php URLs remain in the Google index. 2 months ago, this number was greater than 400,000. So something is being done, albeit very slowly.

inbound

2:32 am on Jan 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is a disturbing story.

One of my sites has 126 legitimate php-powered redirects (they allow tracking of outbound traffic which is sent to partner sites).

The problem is that my robots file forbids all robots from any php file AND specifically lists every URL that is used as a link tracker. This was done to stop them from indexing what they think is a page.

Guess how many of these 'pages' are indexed by google?

...ALL of them, it does better at indexing them than the rest of the site!

Now it's really easy to find these pages as they are uniquely named as uniquebutdescriptivestring.php

MSN and Yahoo manage to not have a single one in their indices, plus they do a better job of finding more pages on the site. MSN has 100% of pages and Yahoo has 25% more than Google.

Looks as though I'll be getting emails from unhappy partners if the indexed redirects start acting like the spam 302s.

If it does affect my partners I will consider a legal route, but will probably get nowhere.
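For anyone who wants to double-check that a robots file really does forbid a tracker URL, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the www.mydomain.com host and the helper name is_crawl_allowed are assumptions made up for the example; the actual file from the post above is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the spirit of the post: the tracker URL is
# listed explicitly. The file name is the placeholder used in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /uniquebutdescriptivestring.php
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/uniquebutdescriptivestring.php", "/index.html"):
        url = "http://www.mydomain.com" + path
        print(path, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Two caveats. As far as I know, this parser does plain prefix matching, so a wildcard rule such as "Disallow: /*.php" would not be interpreted the way some crawlers treat it, which is one reason to list each tracker URL explicitly. Also, a Disallow rule only asks robots not to fetch a URL; a search engine may still list the bare URL if it finds links to it, which could explain entries that show up with no title or description.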

This 172 message thread spans 6 pages.