Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

302 Redirects continue to be an issue

         

japanese

6:23 pm on Feb 27, 2005 (gmt 0)

10+ Year Member



recent related threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]



It is now 100% certain that any site can destroy a low- to mid-range PageRank site by getting Googlebot to pick up a 302 redirect served by a script (PHP, ASP, CGI, etc.), backed by an unseen, randomly generated meta refresh page pointing at the unsuspecting site. In many cases the encroaching site actually writes your website's URL into a 302 redirect on its own server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.

Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which hold your website's info in their databases, feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully tuned PHP-based redirection script then goes to work: it issues a 302 redirect to your site and also includes an affiliate click checker. The really sneaky part is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
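
For anyone who wants to check a suspect goto.php-style URL themselves, a quick header interrogation can be done with a few lines of PHP. This is only a rough sketch: the URL below is a placeholder, and it inspects just the response headers (the 302 status line and the Location target), not the meta refresh page, which sits in the response body.

<?php
// Rough header-interrogation sketch (placeholder URL): fetch only the
// response headers and report the status line and any redirect target.
$url = 'http://example.com/goto.php?path=yoursite.com%2F';
$headers = @get_headers($url, 1); // associative array of response headers
if ($headers === false) {
    exit("Could not fetch headers\n");
}
echo $headers[0], "\n"; // first status line, e.g. "HTTP/1.1 302 Found"
if (isset($headers['Location'])) {
    $location = $headers['Location'];
    // Location becomes an array when several redirects were followed in turn.
    echo 'Redirects to: ', is_array($location) ? implode(' -> ', $location) : $location, "\n";
}
?>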

Googlebot and MSNBot follow these PHP scripts to either an internal sub-domain containing the 302 redirect or to the server side, and BANG, down goes your site if its PageRank is below the offending site's. Your index page is crippled because Googlebot and MSNBot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be much help in tracing this for a long time. Note that these scripts mostly apply your URL stripped down or without the www, making detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving Google of having to decide which of your site's two URLs to index higher (usually the one with more links and higher PageRank).
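
Setting up that 301 yourself is straightforward. Here is a minimal sketch in PHP of canonicalising the short (non-www) hostname onto the www hostname; www.example.com is a placeholder for your own domain, and the same thing is commonly done with an .htaccess rewrite rule instead.

<?php
// Minimal canonicalisation sketch (placeholder hostname): answer any request
// on the short hostname with a permanent 301 to the www hostname, so search
// engines only ever index one URL per page.
if (strcasecmp($_SERVER['HTTP_HOST'], 'www.example.com') !== 0) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>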

Your only hope is that your PageRank is higher than the offending site's. Even that is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of them. This is reinforced by hundreds of other hidden 301 permanent redirects to PageRank 7+ sites, again in the hope of stripping a high-PageRank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main source of revenue, though I am sure Google does not approve of its AdSense program being used in this manner.

Many such offending sites have no e-mail contact, a hidden WHOIS record and no telephone number. Even if you do manage to contact them, in most cases the owner or webmaster cannot remove your links from their site, because the feeds come from the affiliate databases.

There is no point in contacting Google or MSN: this problem has been around for at least nine months, and only now is it escalating at an alarming rate. All sites of PageRank 5 or below are susceptible; if your site is PR 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach PageRank 4 or 5, without having to strip other sites.

Caution: trying to exclude these scripts via robots.txt will not help, because they are able to change almost daily.

Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google's index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that window.

I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast every one of them, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. In essence, a program in perpetual motion, creating millions of 302 redirects for as long as it stays on. Because every page is a unique URL, the script should continue to create redirects and bombard any site that generates dynamic pages with PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.

If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need the offending site to return a 404 or 403 for a few seconds for Google's URL console to detect what it needs: either the site itself or the damaging link.

I hope I have been informative and of help to anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the offending links are removed. Talking to offending site owners usually results in their denying that they are causing problems and claiming that they are only counting outbound clicks, and they seem reluctant to remove your links... Yeah, pull the other one.

[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]

Bobby

12:00 pm on Mar 14, 2005 (gmt 0)

10+ Year Member



How about a robots.txt directive for those who want to use 302s?

That would be great if you could convince the hijackers to do so.

Wouldn't it be ironic if you could publish sensitive data from another website simply by bypassing its robots.txt with a 302?

Reality or fiction?

kaled

12:58 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A robots.txt directive would work - I think I suggested it myself in a previous mega-thread (or perhaps I just thought it).

The trick is that the 302 is ignored unless accepted by a directive in the robots.txt file of the target domain.
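
Purely as an illustration of the idea, such a directive might look something like the lines below. This is hypothetical syntax: no search engine supports anything of the sort, and the directive name and domain are invented for the example.

# robots.txt on the target (redirected-to) domain (hypothetical syntax)
User-agent: Googlebot
Accept-302-from: www.trusted-partner-example.com
# A 302 arriving from any domain not listed here would be ignored, or treated as a plain link.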

Kaled.

claus

1:33 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is very possible to find examples involving big name-brand domains in the SERPs (where a page on the big name-brand domain has the wrong URL listed). Just don't post them here, though; it would be against the TOS (and remember that "the hijacker" probably doesn't even know about this, and certainly has no malicious intent).

>> robots.txt

There's a problem here with vanity domains, parked domains and the like. Anyway, the biggest problem is a different one: if you don't know about this, you have implicitly said "yes", and afaik the majority out there just don't know about this and never will.

Bobby

1:54 pm on Mar 14, 2005 (gmt 0)

10+ Year Member



The trick is that the 302 is ignored unless accepted by a directive in the robots.txt file of the target domain

kaled, please explain how this would work.
I presume it means Google would have to implement a change in the way their bot gathers information.

To me it looks like Google can't figure out where the target really is, at least in my case.
The redirect string is so long that it gets 'forgotten', and Google simply attributes the content of the target frame to the hijacker's dynamic page.

I posted an example of the script at the beginning of the "Lost in Google" thread, but it got edited out (even though I didn't mention any specific domains). I can sticky you the source code to show you what I mean.

kaled

4:46 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, Google would have to implement a change in their system. However, this approach still allows for the legitimate use of 302 redirects. Ultimately, all other solutions will require such redirects to be ignored by search engines entirely, or ignored whenever they cross domains.
Treating redirects as simple links should be OK.

Personally, I can see no problem with indexing the URLs that actually deliver the content - to hell with redirects. However, I appreciate others may disagree with this sentiment.

Kaled.

sunzon

5:26 pm on Mar 14, 2005 (gmt 0)

10+ Year Member



Maybe we should distinguish between cause, effect, and method more clearly.
Treating the method (the 302) as a black box, to be defined in a footnote, helps keep the focus.

*Google has a method to avoid duplicate content in the SERPs, in itself a commendable objective.
*If Google has to choose between two apparently identical sites, it chooses the one with the higher PageRank and pushes the other, "duplicate" site out of the SERPs or way down them.
*PageRank is a Google method for determining the highest relevance in the SERPs.
*Webmasters understand well that they should not submit duplicate content to Google, or they face the consequences.

The problem:
Google's duplicate content filter can make an unfair choice about which of two duplicate sites is the more relevant.
The unfairness arises when someone other than the webmaster (termed a hijacker) uses 302 redirects to the webmaster's website to create duplicate content in Google, which triggers Google to select the hijacker's website and push the webmaster's website down in, or out of, the Google SERPs.

Problem explanation:
The terms "unfair" and "hijacker" are used above because this is completely beyond the webmaster's control, and Google's methods are allowing it to happen.

Footnote:
The legitimate 302 method used by hijackers works like this........bla bla......

crobb305

5:54 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



*If Google has to choose between two apparently identical sites, it chooses the one with the higher PageRank and pushes the other, "duplicate" site out of the SERPs or way down them.
*PageRank is a Google method for determining the highest relevance in the SERPs.

A page of mine that was hijacked was (and is) a PR 7, while the offending URLs were PR 2 through PR 6. The PageRank system is NOT without flaw, nor is it used to weed out duplicates (from my observations).

C

sunzon

6:18 pm on Mar 14, 2005 (gmt 0)

10+ Year Member



Crobb305,

Either way, the point is that the duplicate content filter "lets the best page win", and the hijacker's strategy depends on its page winning.
If we could just fully understand how that duplicate filter works... but that's part of the black box.

zeus

6:35 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm on crobb's side here; it has absolutely nothing to do with PR. If a 302 link to you has been created and Google has treated it as a site (a Google bug), then your site slowly loses its PR down to 0 and the other site sometimes gets a higher PR. My site got its PR back about a month or two ago, but there are still NO changes in the Googlejacking situation, and the hijacker is still having a good time.

stargeek

6:49 pm on Mar 14, 2005 (gmt 0)

10+ Year Member



I'm seeing a hijacked site back in google.com

and gone again.

[edited by: stargeek at 6:51 pm (utc) on Mar. 14, 2005]

sunzon

6:49 pm on Mar 14, 2005 (gmt 0)

10+ Year Member



Granted, we all know that high PR does not necessarily mean the highest rank in the SERPs.
Whichever way Google selects the winner when it sees duplicate content, that is where hijackers are successful.

stargeek

7:18 pm on Mar 14, 2005 (gmt 0)

10+ Year Member



Hijacked websites, back in.

Seems like an update or something is brewing.

kaled

7:23 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only reliable way to determine which is the legitimate (or preferred) page of two duplicates is to compare indexation dates. If page a.html was first indexed in Jan 2001, it should be preferred to page aa.html, first indexed in March 2004, etc.

However, Google does not seem to store this information, and so it is in all sorts of trouble as a result.

For the record, I am less than convinced that duplicate content algos have any relevance to Googlejacking.

Kaled.

crobb305

7:26 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



the point is that the duplicate content filter "lets the best page win"

Again, you are incorrect. The "best page" should always be the original author's. Scraper directory sites and hijacking URLs that come along and use my content should NOT outrank me.

I am not sure which side of the fence you are on. It sounds like you are arguing against the actions of the hijackers, yet you are claiming Google's method of removing "duplicate" content is correct and flawless. Above, you say that Google lets the "best page win". Are you saying the "best page" is the hijacker's URL? Clarify what you are arguing, and stop trying to kiss up to Google while simultaneously condemning the actions of the hijackers.

crobb305

7:33 pm on Mar 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only reliable way to determine which is the legitimate (or preferred) page of two duplicates is to compare indexation dates. If page a.html was first indexed in Jan 2001, it should be preferred to page aa.html, first indexed in March 2004, etc.

I like the idea others have proposed: the development of a redirect meta tag. If you do not want any other URL to redirect to your site and outrank you because of it, maybe a tag along the lines of

<meta name="redirection" content="noredirect">

would help Google determine which page is the intended original and NOT allow anything redirecting to that page to be listed in front of it. If the author of the page is setting up legitimate 302 redirects, they could set that tag to "redirect", which would authorize the search engines to use any and all 302s accordingly.
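
To make the proposal concrete, here is a rough sketch of how a crawler might consult such a tag before crediting a cross-domain 302 to the target page. Everything in it is hypothetical: the "redirection" tag and its "redirect"/"noredirect" values come from the proposal above and exist in no search engine specification.

<?php
// Hypothetical sketch: should a cross-domain 302 pointing at $targetUrl be
// honoured? A missing tag is treated the same as "noredirect".
function allowsCrossDomainRedirect($targetUrl)
{
    $tags = @get_meta_tags($targetUrl); // parses <meta name="..."> tags from the page head
    if ($tags === false || !isset($tags['redirection'])) {
        return false; // no tag present: do not credit the redirect to this page
    }
    return strtolower($tags['redirection']) === 'redirect';
}

// Example: only list the redirecting URL in front of the target if this returns true.
// var_dump(allowsCrossDomainRedirect('http://www.example.com/page.html'));
?>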

This 713 message thread spans 48 pages: 713