Forum Moderators: Robert Charlton & goodroi


302 Redirects continues to be an issue

         

japanese

6:23 pm on Feb 27, 2005 (gmt 0)

10+ Year Member



recent related threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]



It is now 100% certain that any site can destroy low to midrange PageRank sites by causing Googlebot to snap up a 302 redirect, served via scripts written in PHP, ASP, CGI etc. and supported by an unseen, randomly generated meta refresh page pointing to an unsuspecting site. In many cases the encroaching site actually writes your website's URL into a 302 redirect on its own server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.

Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. The companies that hold your website info in their databases feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted PHP-based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
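For anyone without a header interrogation tool to hand, the kind of check described in this paragraph can be approximated in a few lines. This is only a minimal sketch in Python, not the tool the poster used: `inspect_response` and `MetaRefreshFinder` are made-up names, the status, headers and body are assumed to have been fetched already without following redirects, and `yoursite.example` is a placeholder.

```python
import re
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the target URL of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/"
        match = re.search(r"url\s*=\s*['\"]?([^'\"\s>]+)",
                          attr.get("content", ""), re.I)
        if match:
            self.targets.append(match.group(1))


def inspect_response(status, headers, body):
    """Summarise how one response forwards the visitor, if at all."""
    lowered = {name.lower(): value for name, value in headers.items()}
    finder = MetaRefreshFinder()
    finder.feed(body)
    return {
        "http_redirect": status in (301, 302, 303, 307, 308),
        "location": lowered.get("location"),
        "meta_refresh": finder.targets,
    }


if __name__ == "__main__":
    page = ('<html><head><meta http-equiv="Refresh" '
            'content="0; URL=http://yoursite.example/"></head></html>')
    print(inspect_response(200, {}, page))
```

A page that answers 200 but carries a meta refresh shows up with `http_redirect` false and a non-empty `meta_refresh` list, which is the case a plain status-code check would miss.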

Googlebot and MSNBot follow these PHP scripts to the 302 redirect, held either on an internal sub-domain or server-side, and “BANG”, down goes your site if it has a PageRank below that of the offending site. Your index page is crippled because Googlebot and MSNBot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be of much help in tracing it for a long time. Note that these scripts mostly use your URL in its short form, without the WWW, which makes detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short URL problem, relieving Google of deciding which of your site's two URLs to index higher (more often the one with the higher linked PageRank).
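The 301 fix for the short URL amounts to answering every request for the non-preferred host name with a permanent redirect to the preferred one. A rough sketch of that decision in Python follows; `canonical_redirect` and `www.yoursite.example` are placeholders invented for illustration, and on a real server this would normally live in the server configuration rather than in a script.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred host name for the site.
CANONICAL_HOST = "www.yoursite.example"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, location) if url uses the other www/non-www form, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Work out the bare domain so both the www and non-www forms are recognised.
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == canonical_host or host not in (bare, "www." + bare):
        return None  # already canonical, or not one of this site's hosts
    location = urlunsplit((parts.scheme, canonical_host,
                           parts.path or "/", parts.query, parts.fragment))
    return 301, location


if __name__ == "__main__":
    print(canonical_redirect("http://yoursite.example/page?x=1"))
    print(canonical_redirect("http://www.yoursite.example/"))
```

The sketch drops any port number and only handles the www/non-www pair, which is all the paragraph talks about.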

Your only hope is that your PageRank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher PageRank sites within its system on the off chance that it strips at least one of them. This is reinforced by hundreds of other hidden 301 permanent redirects to PageRank 7 or above sites, again in the hope of stripping a high PageRank site. That would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam. They know it is going on, and google adwords is probably the main target of revenue, though I am sure Google does not approve of its AdSense program being used in such a manner.

Many such offending sites have no e-mail contact, a hidden WHOIS record and no telephone number. Even if you do manage to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site because the feeds come from affiliate databases.

There is no point in contacting GOOGLE or MSN because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites with a PageRank of 5 or below are susceptible, and if your site is a 3 or 4 then be very alarmed. A skyscraper site need only create child page linking to get PageRank 4 or 5, without needing to strip other sites.

Caution: trying to exclude them via robots.txt will not help, because these scripts are able to change almost daily.

Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that time.

I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So in essence it is a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create redirects and bombard any site that dynamically generates pages using PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.

If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of 404 or 403 responses from the offending site for Google’s URL console to detect what it needs, whether that is the whole site or just the damaging link.

I hope this has been informative and helps anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the removal of the offending links. Talking to offending site owners often results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links.... Yeah, pull the other one.

[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]

twist

1:36 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Alright, looking around at some random htaccess examples I noticed one possible(?) solution although I wouldn't know where to begin to create the code for it.

Rough example (nobody should actually use this),

RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{If no TIME_MIN at end of url}
RewriteRule {Append TIME_MIN to end of url}

RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{If TIME_MIN on url == current TIME_MIN}
RewriteRule ^(.*)$ [example.com...] [R=permanent,L]

Then remove the appended TIME_MIN in a php script.

You could of course set it for 5 or 10 seconds instead of a full minute.

boredguru

1:45 am on Mar 16, 2005 (gmt 0)

10+ Year Member



RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{If no TIME_MIN at end of url}
RewriteRule {Append TIME_MIN to end of url}

RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{If TIME_MIN on url == current TIME_MIN}
RewriteRule ^(.*)$ [example.com...] [R=permanent,L]

Then remove the appended TIME_MIN in a php script.

You could of course set it for 5 or 10 seconds instead of a full minute.

Here is the problem that I see with this.
How do you plan to remove the appended time in a PHP script? Let's suppose the script removes the time from the URL and redirects to the original URL (it will have to, there is no other way if you want to remove the appended time). The code in the htaccess again catches the URL without the time, appends the time and sends it to the PHP script, which again....... you get the drift.

Maybe I missed something. If I'm wrong, I love getting corrected.
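The loop described here is easy to see in a toy simulation. This is only an illustrative sketch in Python with made-up stand-ins: `htaccess_rule` plays the rewrite rule, `php_script` plays the script that strips the time, and `?t=36` is an arbitrary marker standing in for the appended TIME_MIN.

```python
MARKER = "?t=36"  # hypothetical stand-in for the appended TIME_MIN value


def htaccess_rule(url):
    """Stand-in for the rewrite rule: add the marker when it is missing."""
    return url if url.endswith(MARKER) else url + MARKER


def php_script(url):
    """Stand-in for the script: strip the marker and send the clean URL back."""
    return url[:-len(MARKER)] if url.endswith(MARKER) else url


def count_redirects(start, max_redirects=10):
    """Follow the two steps until the URL stops changing or the cap is hit."""
    url, redirects = start, 0
    while redirects < max_redirects:
        nxt = htaccess_rule(url)
        if nxt == url:
            # The rule had nothing to add, so the script gets its turn.
            nxt = php_script(url)
        if nxt == url:
            return redirects  # settled on a final URL
        url, redirects = nxt, redirects + 1
    return redirects  # never settled: the redirect loop


if __name__ == "__main__":
    print(count_redirects("/page"))
```

Whatever cap is chosen, the count always reaches it, because each step undoes the previous one.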

Trawler

1:48 am on Mar 16, 2005 (gmt 0)

10+ Year Member



Boredguru:

But... and this is a big but... what if Gbot does not go "yippee, one more new URL"? It already knows that the redirected URL exists in its index. It just by default assigns that URL to the hijacker's URL without doing a fetch.
_____________

Sorry to shoot that down but,

Gbot does fetch new data at the target and immediately indexes it under the "302er's" URL with an updated cache date. It is routine.

boredguru

1:52 am on Mar 16, 2005 (gmt 0)

10+ Year Member



If that is true Trawler and you are a beautiful babe, then I love you! Else thanks.

PS: Been writing lots of if loops lately

twist

2:36 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How do you plan to remove the appended time in a PHP script? Let's suppose the script removes the time from the URL and redirects to the original URL (it will have to, there is no other way if you want to remove the appended time). The code in the htaccess again catches the URL without the time, appends the time and sends it to the PHP script, which again....... you get the drift.

How about this for an approach,

RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ /.*googletime.*$
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ /.*\?.*$
RewriteRule {Append "?googletime={TIME_MIN}" to url} [L]

RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ /.*googletime.*$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\?.*$
RewriteRule {Append "&googletime={TIME_MIN}" to url} [L]

RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{If TIME_MIN on url == current TIME_MIN}
RewriteRule ^(.*)$ [example.com...] [R=permanent,L]

In php just check for variable and pass it along from page to page,

if (!empty($_GET['googletime'])) { /* pass it along so googlebot won't get stuck in a loop again */ }

Once again, have no idea if it will work but maybe it will spark an idea in someone smarter than I who can create something that will work.
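The first two rule groups only differ in whether the URL already has a query string ("?" versus "&"). A small sketch of that step in Python, leaving aside whether the overall scheme works: `add_googletime` is a made-up helper, and the zero-padded two-digit minute is my assumption about how the TIME_MIN value would look.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_googletime(url, minute):
    """Append googletime=<minute> unless the URL already carries one."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if any(name == "googletime" for name, _ in params):
        return url  # already tagged, so do nothing (this is what avoids the loop)
    params.append(("googletime", f"{minute:02d}"))
    # urlencode joins the pairs with "&"; urlunsplit adds the leading "?".
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(add_googletime("http://example.com/page", 7))
    print(add_googletime("http://example.com/page?id=3", 7))
```

Letting the URL library rebuild the query string removes the need for two separate rules, since it picks "?" or "&" by itself.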

Reid

2:38 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, here are some detailed results of my homepage problem, where a#*$!y is a bogus name for the directory (which is owned by Tucows, by the way, according to whois).

Here is the SERP from site:mysite (this is my homepage)
MY TITLE
MY TITLE. Snippet from page.............
www.axxxy.com/cgi/axxxy/go.cgi?id=175653 - 7k - Supplemental Result - Cached - Similar pages

The cached copy is an older page from Nov 1. I've changed the page about 4 times since then, but Google is not updating this cache even though it has been crawling my site every day.

They removed my link but have not associated anything else with this id#; when you click it you get a 404 page on their site.

I ran it through a server header checker. Result:
Domain [axxxy.com...]
IP Address www.axxxy.com/cgi/axxxy/go.cgi?id=175653
Server Location
Host Name
Server Type Apache/1.3.33 Sun Cobalt (Unix) Chili!Soft-ASP/3.6.2 mod_ssl/2.8.22 OpenSSL/0.9.7e PHP/4.3.10 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_perl/1.29

I ran it through a page header checker. Result:
Page [axxxy.com...]
Response 302 Found
Last Modified No data returned
Content Type text/html; charset=iso-8859-1
Last Cached (Google) 1 Nov 2004 12:26:10 GMT

I went to the Google URL removal tool, chose 'remove URL' and checked 'anything associated with this URL', and got this message:
response:
The web server has given us a redirection but has not provided a final destination. The "Location: " HTTP header is missing.

Tried again, but checked 'cached version only' this time.
Same response.
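The condition the removal tool complained about, a redirect status with no "Location: " header, can be checked directly from the raw response headers. A minimal sketch in Python, assuming the headers have already been captured as text; `redirect_problem` is a made-up name and the sample header text is only modelled on the checker output in this post.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_problem(raw_headers):
    """Return a description if a redirect response lacks a Location header."""
    lines = raw_headers.strip().splitlines()
    # Status line looks like "HTTP/1.1 302 Found"
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    if status in REDIRECT_CODES and not headers.get("location"):
        return f"{status} redirect with no Location header"
    return None


if __name__ == "__main__":
    sample = ("HTTP/1.1 302 Found\r\n"
              "Content-Type: text/html; charset=iso-8859-1\r\n\r\n")
    print(redirect_problem(sample))
```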

Reid

2:48 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not even sure what I want this directory to do.
It seems they have left the page associated with my id# intact but have removed my URL from it.
If they point it at a 404, won't Google then see my homepage as a 404?

Reid

2:53 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh yeah, my real homepage also appears in site:mysite but further down the list - looks all good (updated March 8 2005)

edit in - another thing....
since the last googledance (last week) all my descriptions have changed from snippets to my actual META descriptions, even on the few 'supplemental results', except this bogus one, which is still showing a snippet from the cached page.
The link was removed after the descriptions changed.

another thing.... when I click 'similar pages' on my bogus homepage link it shows 28 directories with the one in question at #1. When I click 'similar pages' on THAT link.... exact same results.

Reid

3:44 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry Boredguru, my access logs from Nov 2004 are gone (they begin on Nov 15), gone to never-never land.

When they removed that link my traffic almost doubled instantly - getting double the amount of robots too.

coincidence or related?

jk3210

3:49 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When you type in the original url (http://www.a#*$!y.com/cgi/a#*$!y/go.cgi?id=175653 ) into your browser's address window does your page come up?

If it returns a 404, then you can delete it via the url console. If it returns your page, you're out of luck.

Reid

4:04 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When I type that URL into my browser (IE6) I get the IE error page "the page cannot be displayed"

"Cannot find server or DNS Error
Internet Explorer "

Reid

5:14 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting - very interesting
OK, I did site:a#*$!y.com 175653 (that's my id#)

I get 2 pages


A#*$!Y - Search Engine Directory
A#*$!Y SEARCH engine, Domain Name Search, Whois World Wide Search and Free URL Submit.
www.axxxy.com/cgi/axxxy/ reviews.cgi?id=175653&cid=1096 - 22k - Supplemental Result - Cached - Similar pages

The link itself is dead like the other, but the cache is a voting page with a thumbnail of my current homepage, up to date, almost a framed copy of alexa.

The other one is the same thing (exact same title and description, just like thousands of others, only with different id's) but in a different directory, with a picture of my homepage. It's even got that alexa graph thing, 'not in the top 100'. It's not a voting page; it's info about my link.

Here's the real kicker:
on both of those pages, MY page title (which is an active link) points to the same URL as the link in site:mysite

blend27

5:50 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I dont speak PHP, nor Jane or Billy. Should I read more? or go back to HTML Forum?

[webmasterworld.com...]

walkman

6:08 am on Mar 16, 2005 (gmt 0)



so after 580+ posts:

we can't do anything about it. Google and MSN have to step up to the plate and fix this.

Reid

6:10 am on Mar 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Read more, blend - it's not all PHP. Go back a few pages to get the picture.

here is something else very very interesting about a#*$!y.com

I found six documents, all exactly the same but with one difference: each points to some poor joe that they must hate.

here is what it looks like in site:a#*$!y.com

[code]
302 Found
Found. The document has moved here.
www.example.com/cgi/example/go.cgi?id=91523 - 1k - Cached - Similar pages
[/code]

When you go to the page it just says the document has moved here. The word "here" is a real link to some poor shmuck. Actually the link goes to 'page cannot be displayed' in IE, but the cached copy is viewable.

[edited by: rogerd at 5:02 pm (utc) on Mar. 16, 2005]
[edit reason] examplified [/edit]

This 713 message thread spans 48 pages: 713