Incorrect URLs and Mirror URLs

Causing duplication penalties.

         

crobb305

12:39 am on Nov 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has indexed numerous incorrect URLs and mirror URLs all pointing to my index page. Subsequently, the original URL (www.mydomain.com) has been suppressed to the bottom of the results for any search (presumably a duplication penalty). This problem was also mentioned in message 11 of the following thread:

[webmasterworld.com...]

The URLs pertaining to my website that all point to my index page take the following form.

www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/?SID=xRSUNVW8R9P44HSYQ6UWED&
www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/default.asp?S=AC3&am
www.some-other-URL.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-2.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-3.com/file/callink.php?linkid=3
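
A side note on the go.php entries in the list above: the id value looks like a Base64-encoded target address, which would explain why two unrelated domains carry the identical string. The following is a minimal sketch for checking that, assuming the parameter really is plain Base64. The function name decode_redirect_target is hypothetical and is not part of any of the scripts mentioned in this thread.

```python
import base64
from urllib.parse import urlsplit, parse_qs


def decode_redirect_target(url: str, param: str = "id") -> str:
    """Return the Base64-decoded value of `param`, or "" if it is absent.

    Assumption: the redirect script stores its destination as plain Base64.
    """
    if "://" not in url:
        url = "http://" + url  # the URLs in the post omit the scheme
    values = parse_qs(urlsplit(url).query).get(param, [])
    if not values:
        return ""
    # parse_qs turns "+" into a space, so put any "+" back before decoding.
    token = values[0].replace(" ", "+")
    token += "=" * (-len(token) % 4)  # restore padding that may have been stripped
    return base64.b64decode(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_redirect_target(
        "www.some-other-URL.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v"
    ))
```

Run against the first go.php entry, this prints an http:// address, so the redirecting page's destination can be read straight out of its URL. The callink.php?linkid=3 entry has no such parameter, so the function returns an empty string for it.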

I have emailed Google, but have received no reply. I am unsure what I can do to A) eliminate the incorrect URLs that appear to originate from my site and B) eliminate the mirror URLs that originate from unrelated websites.

Any help would be greatly appreciated.

Rick_M

5:44 pm on Dec 16, 2004 (gmt 0)

10+ Year Member



Interesting thread. On my main domain that got hit Sept 22 (which still has not returned), I just found 20 copies of my index page with urls like:
www.mydomain.com/?session=fjdsaklfjdfda

They are in the supplemental index, with a cache date of 1969. I've just added those URLs to my robots.txt to be disallowed and submitted them to the URL removal tool. I also notice a few other domains that redirect to my site in the supplemental index, but obviously I can't do anything about those. Finally, I see that my domain name without www is listed - I suppose I should read the threads on what people recommend doing for that.
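
For anyone with a long list of such session URLs, the robots.txt entries can be generated rather than typed by hand. This is only a sketch: the function name disallow_rules is hypothetical, and it assumes the duplicates are known in advance and differ from the real page only in the query string.

```python
from urllib.parse import urlsplit


def disallow_rules(urls):
    """Build robots.txt text that disallows the path and query of each URL."""
    lines = ["User-agent: *"]
    seen = set()
    for url in urls:
        if "://" not in url:
            url = "http://" + url  # the URLs in the post omit the scheme
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        if target not in seen:  # skip exact repeats
            seen.add(target)
            lines.append("Disallow: " + target)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(disallow_rules(["www.mydomain.com/?session=fjdsaklfjdfda"]), end="")
```

Because robots.txt rules match by prefix, a single shorter rule such as Disallow: /?session= may cover every variant at once. How a given crawler treats query strings in rules is worth checking before relying on it.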

I'd be very surprised if any of this is the reason my rankings dropped Sept 22, but I'd be happy if removing those URLs returns the rankings nonetheless.

addendum:
when you search for the name of my site, it ranked #1 before Sept 22, then afterwards has ranked anywhere from 6th to 25th for the site name - today, if I add &filter=0 to the search string, I'm first. Not sure if that means anything or not.
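
The &filter=0 comparison mentioned above is easy to repeat for several queries with a small helper. This is a sketch only: the function name search_url is hypothetical, and the base address is assumed to be the standard Google search URL of the time.

```python
from urllib.parse import urlencode


def search_url(query, unfiltered=False):
    """Return a Google search URL, optionally with filter=0 appended."""
    params = {"q": query}
    if unfiltered:
        params["filter"] = "0"  # asks Google not to collapse similar results
    return "http://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("my site name"))
    print(search_url("my site name", unfiltered=True))
```

Comparing where the site ranks in the two result pages shows whether the duplicate filter is what pushes it down.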

Spine

6:58 pm on Dec 16, 2004 (gmt 0)

10+ Year Member



Sounds very similar to me Rick. I dropped Sept 23rd, and found recently that I had 2 different kinds of dup problems, one that google 'invented' overnight, and one that was very much my fault.

I used the removal page around Dec 1st, and most of the stuff was removed within a day. A directory I had somehow submitted for removal twice, once as /directory/ and once as /directory, took a while before it was removed. Exactly 7 days after that was gone, my site is back at #1 for its index page and 'obvious' search term.

Inner pages are still hit and miss, but better than before for sure.

Not out of the woods yet, but being back where I was still #1 with &filter=0 is nice.

Variable

12:48 am on Dec 21, 2004 (gmt 0)

10+ Year Member



Ok, here's where I am at...

I've implemented a dynamic robots.txt that puts up two different robots.txt based on the subdomain:

www.example.com/robots.txt ---> accept robots.txt
valid.example.com/robots.txt ---> accept robots.txt

invalid.example.com/robots.txt ---> deny robots.txt

where the deny robots.txt is simply:

User-agent: *
Disallow: /

As far as the behavior for the index.htm of each subdomain, I've implemented the following:

www.example.com ---> valid main domain index.htm
valid.example.com ---> valid subdomain index.htm

invalid.example.com ---> 404

I'm not sure if it's better to do a 404 on an invalid subdomain or a 301 redirect to the main domain. My hope is the 404 with a deny all robots.txt will get rid of my invalid subdomain problem.

Unfortunately, I have so many invalid subdomains in Google's index that I can't use their tool to remove all of them. I guess I'll just have to wait to see if they get removed through subsequent crawls.

Also, I tried using an absolute URL in a robots.txt and both the validator and Google's removal tool choked on it.
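
The host-based setup described in this post can be sketched roughly as follows. This is not the poster's actual code: the names VALID_HOSTS, robots_txt_for and index_status_for are hypothetical, and the contents of the "accept" robots.txt are an assumption (an empty Disallow line, which permits everything), since the post only shows the deny file.

```python
# Hosts that should stay indexed; everything else is treated as invalid.
VALID_HOSTS = {"www.example.com", "valid.example.com"}

# Assumed "accept" file: an empty Disallow permits all paths.
ACCEPT_ROBOTS = "User-agent: *\nDisallow:\n"
# The deny file as given in the post.
DENY_ROBOTS = "User-agent: *\nDisallow: /\n"


def normalize_host(host_header):
    """Lower-case the Host header and drop any port or trailing dot."""
    return host_header.split(":", 1)[0].strip().lower().rstrip(".")


def robots_txt_for(host_header):
    """Return the robots.txt body to serve for this Host header."""
    if normalize_host(host_header) in VALID_HOSTS:
        return ACCEPT_ROBOTS
    return DENY_ROBOTS


def index_status_for(host_header):
    """Return the HTTP status for the index page: 200 if valid, else 404."""
    return 200 if normalize_host(host_header) in VALID_HOSTS else 404


if __name__ == "__main__":
    for host in ("www.example.com", "valid.example.com", "invalid.example.com"):
        print(host, index_status_for(host), repr(robots_txt_for(host)))
```

One thing to keep in mind with this combination: a crawler that obeys the deny-all file will not request the index page on that host, so it may never see the 404 at all. In that case, removal would depend on the robots.txt alone.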

walkman

2:10 am on Dec 21, 2004 (gmt 0)



"I've implemented a dynamic robots.txt that puts up two different robots.txt based on the subdomain"

from MY experience, Google takes a long time on 404s and 301s. Removing them, no matter how much of a pain, is better.

crobb305

2:34 am on Dec 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One week ago, a Google rep escalated my problem to engineers. Since then, the number of redirects and tracker2 urls to my site has doubled. The site:mysite.com search shows an increasing number of spammy domains being incorrectly linked to my site. I am flabbergasted at how this problem is expanding with no resolution. Absolutely amazed.

c

walkman

3:06 am on Dec 22, 2004 (gmt 0)



"One week ago, Google rep escalated my problem to engineers.."

how does that work? I mean how were you able to do that?

crobb305

3:08 am on Dec 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Walkman,

There was a request by Googleguy to send examples using a special keyword in the email subject line. See message 336 of the following thread:

[webmasterworld.com...]

walkman

3:32 am on Dec 22, 2004 (gmt 0)



"See message 336 of the following thread"

yeah, I posted it ;). I hope they're working on it at least

rocco

4:16 am on Dec 22, 2004 (gmt 0)

10+ Year Member



sorry, post deleted

papamaku

9:57 pm on Dec 28, 2004 (gmt 0)

10+ Year Member



walkman - what was the email + subject line needed for reporting 302s etc?

walkman

10:29 pm on Dec 28, 2004 (gmt 0)



sent it via PM. Not sure if I could post a link to another forum here.

crobb305

1:16 am on Dec 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I got a nice reply from google telling me that they are investigating this issue. This is the most promising info I have heard. Could take some time, but at least I got a response ;)

walkman

1:35 am on Dec 29, 2004 (gmt 0)



"I got a nice reply from google telling me that they are investigating this issue. This is the most promising info I have heard. Could take some time, but at least I got a response "

easy there...not letting you escape with just a brief comment ;). Can you please share more either here or via PM?

thanks,

zeus

1:51 am on Dec 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Walkman, I got that too from Googleguy, but there are no changes here: another url is still taking over the search for www.mydomain.com, there are a lot of supplemental results, I have lost 80% of my indexed pages, and redirecting sites are still listed under site:mydomain and inurl:mydomain searches. So Google has done nothing yet. I first got hit by this on Nov. 3, but I know many got hit a lot earlier and still nothing.

A lot of this has to do with redirecting. Isn't cloaking some kind of redirecting? So why don't they just ban meta redirects? There is no use for them anyway in the longer term.

crobb305

1:58 am on Dec 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The reply said to be assured they are "investigating the issue."

C

energylevel

2:34 am on Dec 29, 2004 (gmt 0)

10+ Year Member



crobb305 ... please keep us updated ... I am still seeing several redirects to my site in the allinurl results, and my site has not recovered to any degree since it dropped dramatically in Google search results. I sent an email to Google in the same fashion as you but didn't get anything back that I can recall.

crobb305

2:46 am on Dec 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Energylevel,

What happens if you do a site:mydomain.com search? Numerous redirects are being incorrectly tied to my site, as is evident in this search.

crobb305

5:14 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have an update to the redirect problem as it pertains to my site. I have been able to get some of the webmasters to remove the links that are getting indexed by Google, and attributed to my site (showing in a site:mysite.com search).

Furthermore, the remaining redirect sites that are showing in the site:mysite.com search are cached November 2, whereas my own pages are cached earlier this week. This MAY indicate that Google is correcting the problem (if the algorithm is no longer requesting those pages). Perhaps another update or two will show them gone! Just speculation :)

I suppose it may only indicate that those pages were deemed mirrors/redirects, and not worthy of a recrawl. Fingers crossed for the best.

energylevel

5:40 pm on Jan 3, 2005 (gmt 0)

10+ Year Member



crobb305 .. are you seeing any improvement in your position in Google search results as a result?

crobb305

5:46 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Energylevel:

Not yet. About a week ago I created a robots.txt to remove clutter from the Google index (old, outdated, incorrect, and duplicate urls). Google has removed those and recrawled my site. Unfortunately, the internal pages are still indexed with url only, and the home page is not showing at all in a site:mysite.com search. The redirects are still there, as I mentioned, with cache of Nov 2.

So for now, I am not showing any improvement in my position, but I did get my inbound links back this week!

I am also finding that the inurl: command may not be working normally on some datacenters. I have been periodically monitoring the inurl:tracker2.php for changes in the way those urls were indexed. Recently I noticed the urls listed by url only (possibly the result of penalty/action by Google). Now when I run the search inurl:tracker2.php, I get a 403 Forbidden Access page from Google when I click past the first page. Very odd to me. Wonder what's up with that.
C

crobb305

7:12 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Above, I posted:

I am also finding that the inurl: command may not be working normally on some datacenters. I have been periodically monitoring the inurl:tracker2.php for changes in the way those urls were indexed. Recently I noticed the urls listed by url only (possibly the result of penalty/action by Google). Now when I run the search inurl:tracker2.php, I get a 403 Forbidden Access page from Google when I click past the first page. Very odd to me. Wonder what's up with that.

Additional Info: The Forbidden Access page is telling me "... we can't process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected." Again, this message appears only when I click pages beyond the first page of serps for the inurl:tracker2.php search. All other inurl: searches I have tested (for comparison) are performing normally.

My laptop is brand new (just got it yesterday), so there is no virus causing "repeated requests" for that particular search. And, I myself have only attempted that search two or three times and I get the same result from another computer. I wonder what, if anything, it may signify w.r.t. the future of tracker2.php/redirects/hijacking.

Lorel

7:42 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"... we can't process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected."

I tried the same search on a Mac (immune to PC viruses) and got the same message.

crobb305

7:49 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Lorel,

Do you get that message when you try the inurl: search for other terms? I don't. Of equal interest to me is the fact that the datacenters serving even one page of inurl:tracker2.php results are showing the listings as url only.

energylevel

9:18 pm on Jan 3, 2005 (gmt 0)

10+ Year Member



I see the same as you guys .... something else I noticed was that I am being automatically redirected to google.co.uk when I type google.com in my address bar (I use google.co.uk more often than not as I'm in the UK) .. so I thought maybe this was being done with a cookie based on my usual Google usage.

I cleared my cookies and it still happens .. so is this redirect being done based on my IP address then?

Lorel

9:47 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




"Do you get that message when you try the inurl: search for other terms? I don't."

No, but now I see a bunch more redirects to my site. Where's my dust pan? Women's work is never done!

crobb305

10:39 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"No. but now I see a bunch more redirects to my site. Where's my dust pan? Women's work is never done!"

Google clearly has a mess on their hands. I had convinced myself that the redirects containing the tracker2 were the ones responsible for the removal of my index page. However, today I discovered that a simple search of www.mydomain.com does reveal my title and description, but if you mouse over the url, or click the "Google Cache" link, it is apparent that this url is a very simple redirect of the following form:

[some-other-site...]

No tracker2! Google actually thinks my original url was replaced by this one. Of course, searching site:mydomain.com reveals all the other redirects that Google thinks are mine. As I mentioned, those have a Nov. 2 cache date. Maybe they will go away soon. No new redirects have been tied to my site in the past week or so.

AlexK

1:28 am on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's an interesting comparison (search performed on g.co.uk):

allinurl: index.php (any other than 1st page) =>

"... we can't process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected."

allinurl: index.html => normal result

Yup, Google is well buggered (as we say in Yorkshire).

energylevel: the redirect to G.co.uk is also normal for myself and is, AFAIK, based on IP geo-location.

crobb305

2:35 am on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep...all forms of inurl:example.php searches are showing this Forbidden Access Message.

Indeed, inurl:index.php -and- inurl:go.php both show the error. Among others. Not sure what the significance is yet.

crobb305

3:54 am on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



UPDATE:

Seeing major changes on 216.239.39.104 with respect to redirects to my site. There are no redirects remaining when I do a site:mysite.com search. Furthermore, the tracker2.php urls are listed with urls only...no title/descriptions pulled from the sites they hijacked. The tracker2.php urls that once were showing in site:mysite.com search are still in the index, just no longer tied to my site. I think we are seeing some changes in the right direction with respect to this hijacking fiasco.

Chris

Marval

4:26 am on Jan 4, 2005 (gmt 0)

10+ Year Member



Crobb - or anyone else that has been experiencing this problem - are you also seeing the &filter=0 difference in results with your site?
I am also one of the people that reported a site of mine through the special email subject and got a reply that the message was being directed to the senior levels - but I am still seeing the hijacks, redirects and filter=0 problem as well. I have taken the stance that a similar problem occurred last year (2003) and one earlier in 2004, where after a few weeks the problem went away, but I sure would like to see this resolved once and for all.

Forgot to mention - I was also able to get one url removed with the DMCA process and the results posted on chillingeffects, but it looks like one of the others is still there although it was contained in the DMCA as well. Of course the host took the pages down as well, but I don't know how much that feeds into this.

This 172 message thread spans 6 pages.