
Incorrect URLs and Mirror URLs

Causing duplication penalties.

         

crobb305

12:39 am on Nov 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has indexed numerous incorrect URLs and mirror URLs, all pointing to my index page. As a result, the original URL (www.mydomain.com) has been suppressed to the bottom of the results for any search (presumably a duplication penalty). This problem was also mentioned in message 11 of the following thread:

[webmasterworld.com...]

The URLs pertaining to my website that all point to my index page take the following form.

www.mydomain.com/?S=AC3%26Document=document
www.mydomain.com/?SID=xRSUNVW8R9P44HSYQ6UWED&
www.mydomain.com/default.asp?S=AC3&am
www.some-other-URL.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-2.com/go.php?id=aHR0cDovL3d3dy5jcmVkaXRjaGFtcGlvbi5jb20v
www.some-other-URL-3.com/file/callink.php?linkid=3

I have emailed Google, but have received no reply. I am unsure what I can do to A) eliminate the incorrect URLs that appear to originate from my site and B) eliminate the mirror URLs that originate from unrelated websites.

Any help would be greatly appreciated.

crobb305

5:00 am on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Marval,

Have you tried searching site:yoursite.com on the datacenter I mentioned above to see if there are any positive changes? Also, take a look at the inurl:tracker2.php search results. I am encouraged by the fact that the urls containing tracker2.php once were indexed using the title/descriptions from the pages they hijacked. But on that datacenter (and a few others), they are now listed as url only. We are seeing baby steps. But they are steps in the right direction.

kwasher

5:29 am on Jan 4, 2005 (gmt 0)

10+ Year Member



Would you say these people have hijacked WebmasterWorld's site? Type webmasterworld into the search there, then click one of the links. They've framed WebmasterWorld inside their own site and trapped all the outgoing links so they redirect back into their site. Is this really OK to do?

[look4ithere.com...]

crobb305

7:16 am on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Type webmasterworld into the search

There isn't anything wrong with 'webmasterworld' appearing on other sites, or in the urls of other webpages (as file names, perhaps). The problem lies in Google's inability to distinguish between someone else's url and your own. The site: command is designed to return just the pages truly associated with a particular site, yet the site:mysite.com search has been showing many unrelated redirect urls as being part of that site. In my case, when I ran the site: search over the past 6 months, there were over 20 redirect urls (increasing weekly), none of which I owned or created, being tied to my site. Google ultimately penalizes the original site because the algorithm must believe that one person owns all of those spammy urls and is trying to spam the engine.

C

Marval

12:05 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



Crobb - it does seem that that datacenter has a little better results although I still see at least 6 pages still using a go.php to imitate my site.
Of course - now I also see an additional problem that has just been introduced in the last 24 hours - the dreaded no www before the URL of the index page - up until yesterday the site command and all of the search results showed the URL with the www in fromt - today on that new datacenter its showing a url without the www, and the cached page is from a week ago (which on the other data centers shows a cache page of yesterday) and what makes it worse, the page they have in the cache isnt the one I had on that date a week ago (I only know because I decorate the site for the Holidays)

kwasher

1:31 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



You missed the point. These people are taking all of our sites and framing them into THEIR site, and trapping all the links.

You can view everything on webmasterworld, without ever going to webmasterworld.

Do a search there on any site, then try the link for the site, and you will see what I mean. Please don't dismiss this without even looking.

A quote from a friend: "The difference is that they aren't caching the pages or making their results spiderable so it appears that it's their content."

p.s.
This site is now the ONLY site that shows up when I look for my site in google. Otherwise, it is totally gone.

energylevel

1:57 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



Marval .. what's the significance of the www. missing from the URL?

MissusC

2:48 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



My site is experiencing the same thing: suppressed to the bottom of Google's results since December 16th. And I noticed in the Google results that I have two home pages listed, one with the www and one without. Can someone please tell me what the significance of this is and what I can do about it?
Thanks for ANY help.

Marval

3:30 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



There are a number of threads here about the splitting of domains with and without the www. in the url. Basically, it boils down to Google thinking that they are two different pages and ranking one over the other as a dupe page. It can occur for a number of reasons, including an incorrect server setup, backlinks to both urls, and Google having trouble distinguishing that they are the same.

kwasher

3:38 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



Subdomains are treated as separate domains.

And www.domain.com is a subdomain of domain.com, potentially resulting in two pages with identical content.

There are some threads here on using mod_rewrite to redirect one to the other.

[webmasterworld.com...]
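Those threads typically recommend a permanent redirect from the bare domain to the www host (or vice versa). A minimal sketch for an Apache .htaccess, assuming mod_rewrite is available and example.com stands in for the real domain:

```apache
# Canonicalize the host: 301-redirect example.com/... to www.example.com/...
# so search engines see only one version of every page.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The permanent (301) status matters: it tells crawlers the two hosts are one site rather than two duplicates.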

crobb305

6:39 pm on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



kwasher,

My apologies if I misunderstood you. We are all in this together, and I certainly understand your frustration. Hell I started this thread two months ago. LOL

I, too, have the 'www' missing when I search for my site. As I mentioned before, some of the redirects that were being incorrectly connected to my site are gradually disappearing. Still, if I do a specific search for my site using the 'www', I get a redirect site, cached as my own, and indexed in Google with full title and description. Google actually thinks I have replaced my original domain/url with these silly redirects. Fortunately, that particular redirect was the result of an honest link a webmaster gave me a couple of months ago. He agreed to remove the link, which has ultimately left a 404-Page-Not-Found error. I was then able to submit the url through Google yesterday for removal, which normally takes 24 hours using the Google URL removal tool.

Google's notion that these redirects were created by me to replace my own original domain is absolutely ludicrous, given that the original domain is 4 years old and has an order-of-magnitude more backlinks (which should signify its importance/legitimacy as the 'original'). There is clearly some very flawed logic in Google's algorithm(s).

crobb305

1:39 am on Jan 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The positive changes I saw yesterday are intermittent today, and do not appear on any of the known datacenters. They occasionally appear from a different center.

Incidentally,
I have read that it is possible to block redirects via htaccess. Is this advisable? I have identified the IP block from which the tracker2 redirects to my site are coming.

I may make this question a new post.
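For what it's worth, the mod_access syntax of that era for blocking an IP range in .htaccess looks like this (a sketch; 192.0.2.0/24 is a placeholder for the block you identified):

```apache
# Refuse requests from the offending IP block (Apache 1.3/2.0 mod_access).
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
```

One caveat: this only stops the redirect service's own server from fetching your pages. If Googlebot itself follows the 302 to your site, the hijacked listing can persist regardless.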

AlexK

2:40 am on Jan 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



kwasher:
(http://www.look4ithere.com/) They've framed webmasterworld into their own site. And trapped all the outgoing links so they redirect back into their site.
...
You missed the point. These people are taking all of our sites and framing them into THEIR site, and trapping all the links.

The first point (the framing) is accurate but I cannot agree with the second (trapping all the links). It is simple HTML4 frames:

<FRAMESET ROWS="100,*"...

<FRAME NAME="BANNER" SRC="http://www.look4ithere...
<FRAME NAME="MAIN" SRC="http://www.webmasterworld...


Thus, all normal links retain the frame structure.

I tried it with my site. Every link (html anchor) on my site contains a target='_top', which causes a break-out from any framing (this is not a new issue). I checked, and it works fine. There is no "trapping all the links" occurring other than standard Frames behaviour. This is the format to follow if you want to make use of it:
    <a href='http ...' target='_top'>the link text</a>

One thing I did see very recently that I thought was odd was the use of javascript to effect a link:

    <javascript document.location='http ...'>

(just one line on the page). Very odd, and I cannot see the point.
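As an aside, the per-link target='_top' approach can also be done page-wide with a small "frame-busting" script of that era (a sketch, not something from the thread):

```html
<!-- Frame-busting sketch: if this page is loaded inside someone else's
     frameset, replace the whole browser window with this page. -->
<script type="text/javascript">
if (top !== self) {
    top.location.href = self.location.href;
}
</script>
```

Placed in the document head, it breaks out of any frameset, including ones whose links you cannot rewrite.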

guddu

3:40 am on Jan 5, 2005 (gmt 0)

10+ Year Member



I am facing a major problem with one of my client's websites.

The duplicate URLs contain %09 in the query strings, and Google has termed all such pages SUPPLEMENTAL PAGES.

www.domainname.com/dir/filename.aspx?varid=%0924

When you click on the above URL, it opens the same page as:

www.domainname.com/dir/filename.aspx?varid=24

These cannot be removed through the Google removal tool as they "still exist".

These duplicate URLs are creating a major problem for us, as no new content is being indexed by Google and only the home page is being refreshed.

Is there a way to remove these duplicate URLs?

please.....
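For what it's worth, %09 is just the URL-encoding of a tab character, which is likely why both URLs serve the same page: the server trims the whitespace when parsing the parameter. A quick Python check (a sketch, not from the thread):

```python
from urllib.parse import unquote

# %09 decodes to a tab, so "varid=%0924" is really "varid=<TAB>24".
raw = "varid=%0924"
decoded = unquote(raw)
print(repr(decoded))            # prints 'varid=\t24'

# Many server stacks trim whitespace when converting the value to a number,
# which would make it identical to varid=24.
value = decoded.split("=", 1)[1].strip()
print(value)                    # prints 24
```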

crobb305

3:55 am on Jan 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



guddu,

Are the urls generated from your domain, just indexed incorrectly?

If so, you can solve the problem the way I did. You are correct that Google will not remove the urls because they "still exist". However, they WILL remove them if you create a robots.txt file, then go back to the Google URL removal tool and submit the url of the robots.txt file. They are removed within 24 hours, typically; mine were gone in 3 hours. You may already understand robots.txt, but I had to learn real quick last month. Be careful not to disallow files that you DO want indexed. If you see specific incorrect urls being indexed, I would just disallow the files you see, followed by a slash:

User-agent: *

Disallow: /dir/filename.aspx?varid=24/

Leaving the slash off at the end is an implied wildcard, and you will end up disallowing everything in the /dir/filename.aspx?varid= realm.
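One nuance to be aware of: robots.txt Disallow rules are prefix matches, so a rule blocks every URL that begins with it. Python's standard-library robotparser makes this easy to verify (a sketch with made-up paths, not the actual file from this thread):

```python
import urllib.robotparser

# A Disallow rule matches any URL path that starts with the given string,
# so "/dir/" blocks everything under that directory.
rules = """\
User-agent: *
Disallow: /dir/
"""
rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "http://www.example.com/dir/page.aspx"))  # blocked
print(rp.can_fetch("*", "http://www.example.com/index.html"))     # allowed
```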

guddu

4:11 am on Jan 5, 2005 (gmt 0)

10+ Year Member



Thanks crobb305 for a quick answer

User-agent: *

Disallow: /dir/filename.aspx?varid=24/

But don't you think it would also remove the original file

www.domainname.com/dir/filename.aspx?varid=24

crobb305

4:20 am on Jan 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I used the wrong path in my example. In your case, you would want to use:

Disallow: /dir/filename.aspx?varid=%0924/

You are correct, the bad example I used above would disallow the file that you DO want indexed.

Chris

kwasher

4:33 am on Jan 5, 2005 (gmt 0)

10+ Year Member



CROBB - They make the content of your site appear as their own. When I search for my site in G, the only listing out of 1,500 that shows up is theirs, like this...

www.theirdomain.com/rd/results/rdq_them/www.mydomain.com/ f=searchemall_

The landing page contains my site in their frame.

Sorry, I thought this was another example of what you were talking about.

guddu

4:33 am on Jan 5, 2005 (gmt 0)

10+ Year Member



Thanks

I will do as you suggest and keep my fingers crossed for the Googlebot crawl.

Wish you a very Happy New Year.

idoc

2:57 pm on Jan 5, 2005 (gmt 0)

10+ Year Member



"modern day SEO is also about quality webmastering and not just page optimisation and link building. You need to write watertight scripts, have correct server configurations, accurate usage of mod_rewrite etc to make sure that a site can not be manipulated to make it appear like there is a lot of duplication"

or... you could make the point that modern day SEO is really about using the above defined methods... redirect scripts and other black hat means to duplicate existing well written content and you need to be defensive these days just to have a presence on the web. SEO used to be an art... it was about finesse. Now it is something else altogether.

walkman

4:03 pm on Jan 5, 2005 (gmt 0)



"modern day SEO is also about quality webmastering and not just page optimisation and link building. You need to write watertight scripts, have correct server configurations, accurate usage of mod_rewrite etc to make sure that a site can not be manipulated to make it appear like there is a lot of duplication"

I assume you've done all that. Wanna bet that anyone can take down your perfect SEOed site with just a few 302 redirects and all you can do is sit and watch as your rankings=0?

crobb305

3:50 am on Jan 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The tracker2.php urls are disappearing and being devalued. A site:mysite.com search for my domain shows no more of these urls. I think the next PageRank update and the next significant serps modification will reflect this. inurl:tracker2.php now shows about 120,000 urls, indexed with no title or description. A month ago, this number was over 400,000.

Fingers crossed.

energylevel

10:02 am on Jan 7, 2005 (gmt 0)

10+ Year Member



crobb305 ... another thing I have noticed is that the error message you get when trying to go beyond the first page of search results for inurl:tracker2.php is now also happening for other commonly used redirect script names. For example, try inurl:goto.php. I bet there are many more. Maybe this is an indication that Google is trying to address redirect issues right across the board.

zeus

10:42 am on Jan 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry to say this, but I get 411,000 sites which include tracker2.php, and that code is not the only one on the market, so nothing has changed.

Hm, when I wanted to click to page two, I got this from Google:

We're sorry...
... but we can't process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected.

We'll restore your access as quickly as possible, so try again soon. In the meantime, you might want to run a virus checker or spyware remover to make sure that your computer is free of viruses and other spurious software.

We apologize for the inconvenience, and hope we'll see you again on Google.

I'm almost 100% sure nothing is wrong with my computer, and when I try a new search nothing is wrong.

energylevel

10:52 am on Jan 7, 2005 (gmt 0)

10+ Year Member



Hi Zeus ... that's the error that's appearing now for many inurl: searches for commonly used redirect scripts, not just the tracker2 one...

zeus

10:57 am on Jan 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is a little interesting. Maybe they really are cleaning up, but let's stay grounded; we have seen changes before, and then nothing happened.

Localizer

3:03 pm on Jan 7, 2005 (gmt 0)

10+ Year Member



Hi guys,

I decided to remove some of my links without title and description.

I just reviewed the robots.txt material and have 1 question left.

How should I set up the robotfile when I want to remove files from a subdomain?

Let's say I have the following structure:

black.widget.com/small
black.widget.com/medium
black.widget.com/big

I want to keep the medium index, but remove all pages in the medium dir.

Should I set it up as follows?

User-agent: Googlebot
Disallow: /black.widget.com/medium/

So...how to cope with a subdomain?

TIA
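Not an answer from the thread, but one thing worth knowing: robots.txt is per-host, so each subdomain needs its own file served from its own root, and the paths inside are relative to that host. A sketch assuming the black.widget.com layout above:

```
# Served as http://black.widget.com/robots.txt
# The host name never appears in the paths; they are relative to this host.
User-agent: Googlebot
Disallow: /medium/
```

A file at www.widget.com/robots.txt has no effect on black.widget.com.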

nolen1

6:08 am on Jan 8, 2005 (gmt 0)

10+ Year Member



The tracker2.php urls that were listed when I used the allinurl: command are gone, but others like
www.theirdirectory.com/resources/callink.php?linkid=2162 and www.theirdirectory.com/Link.cfm?ListingID=2538917 remain. I sure hope Google gets rid of all of the 302 redirects, not just the tracker2.php ones.

walkman

2:51 pm on Jan 8, 2005 (gmt 0)



Suppose Google removes all of these. Are there any dupe penalties that last for months, or do they disappear as soon as the "dupe" page is removed?

nolen1

10:26 pm on Jan 8, 2005 (gmt 0)

10+ Year Member



The duplicate penalties should be removed. I had this happen to a site in September. I exchanged links with a site, not knowing they were using 302 redirects. My site was replaced by theirs in the serps. When I saw my site listed at their url, I went to their site looking for contact info and saw that I could modify the link. Two weeks after I removed the link, their site was gone and mine was back at its original positions.

crobb305

11:15 pm on Jan 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nolen, that is exactly my problem right now. On some datacenters, the tracker2.php urls that once pointed to my index page are gone. Now only one redirect remains when I search site:mysite.com. It is a 302 from someone who was simply trying to link to me. I contacted them, the link was deleted, and I submitted it to Google's url removal tool about 3 days ago. I am waiting for it to be gone, and hopefully my site will soon reappear. Right now, searching for www.mysite.com shows that sinking 302.