
302 Redirects continue to be an issue

     
6:23 pm on Feb 27, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 27, 2005
posts:93
votes: 0


recent related threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]



It is now 100% certain that any site can destroy a low- to mid-range PageRank site by causing Googlebot to snap up a 302 redirect served by a PHP, ASP, CGI or similar script, backed by an unseen, randomly generated meta refresh page pointing at the unsuspecting site. In many cases the encroaching site actually writes your website's URL into a 302 redirect on their own server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.

Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. The companies that hold your website's info in their databases feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted PHP-based redirection script then goes to work, issuing a 302 redirect to your site with an affiliate click checker built in. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
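For anyone who wants to check a suspect link themselves, here is a minimal sketch of such a header interrogation in PHP (it assumes PHP 5's get_headers() is available; the URL is a made-up placeholder, not a real offender):

<?php
// Fetch the response headers for a suspect redirect script (placeholder URL).
$headers = get_headers('http://www.example.com/goto.php?path=yoursite.com%2F', 1);
print_r($headers);
// A hijacking redirect shows up as "HTTP/1.1 302 Found" plus a Location:
// header pointing at your site. The meta refresh page has to be caught by
// fetching the body as well, e.g. with file_get_contents().
?>

Note that get_headers() follows the redirect chain, so you will see the headers of every hop, including the final 200 from the target.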

Googlebot and MSNBot follow these PHP scripts to either an internal sub-domain containing the 302 redirect or to the server side, and BANG, down goes your site if its PageRank is below the offending site's. Your index page is crippled, because Googlebot and MSNBot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offenders know that Google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be much help in tracing them for a long time. Note that these scripts mostly apply your URL stripped, or without the www, making detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving Google of deciding which of your site's two URLs to index higher (usually the one with the higher-linked PageRank).
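If you want to close off that with/without-www duplication yourself, a one-shot canonical-host 301 is enough. A minimal sketch in PHP (www.example.com stands in for your preferred hostname, and Apache-style $_SERVER variables are assumed):

<?php
// 301 any request for the bare domain over to the www version, so Google
// only ever sees one canonical URL for each page.
if (strtolower($_SERVER['HTTP_HOST']) != 'www.example.com') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>

The same thing can be done with an .htaccess rewrite rule if you would rather not touch the PHP.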

Your only hope is that your PageRank is higher than the offending site's. Even that is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of them. This is reinforced by hundreds of other hidden 301 permanent redirects to PageRank 7+ sites, again in the hope of stripping a high-PageRank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main source of revenue. Though I am sure Google does not approve of its AdSense program being used in such a manner.

Many such offending sites have no e-mail contact, hidden WHOIS data and no telephone number. Even if you do contact them, in most cases the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.

There is no point in contacting Google or MSN: this problem has been around for at least nine months, and only now is it escalating at an alarming rate. All sites of PageRank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach PageRank 4 or 5, without stripping other sites at all.

Caution: trying to exclude them via robots.txt will not help, because these scripts can change almost daily.

Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google's index for an indefinite period, at least 90 days, and you cannot get re-indexed within that timeline.

I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast every one of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs: in essence a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script should continue to create and bombard any site that generates dynamic pages via PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.

If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of a 404 or 403 from the offending site for Google's URL console to detect what it needs: either the site or the damaging link.

I hope I have been informative and of help to anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank even after the offending links are removed. Talking to offending site owners usually results in denial that they are causing any problem; they say they are only counting outbound clicks, and they seem reluctant to remove your links... Yeah, pull the other one.

[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]

6:12 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1901
votes: 56


--- we can't do anything about it ---

yes we can: register 2 domain names - g-jokes dot com and so on -
make jokes, have big companies sponsor space on the site, use that to pay for hosting, and most important, make people feel better about what they are best at.

The POWER of CAN DO

edit: [g-jokes dot com] already taken...

6:40 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1901
votes: 56


It’s very simple,

I have more than $80,000 in inventory of widgets, purchased within the last 14 months, relying on traffic from just BIG G$. If they don't fix the ALGO, I will spend my planned budget for this year on ads elsewhere.

Karl Marx said a long time ago: CAPITALIZE ON SOMEONE ELSE'S EFFORTS.

That simple, really.

6:43 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Nothing wrong with Overture
7:23 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1901
votes: 56


Reid -- don't go there, I mean the topic; I am a retailer, and all of them are the same.
10:44 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


>> The web server has given us a redirection but has not provided a final destination. The "Location: " HTTP header is missing.

Reid, I think they have removed the URL from the database, so their script still works, but it no longer has a target URL to send the visitor off to. Of course, if you send the visitor "out into nothing" then there is nothing to delete. However, if it points to "nothing" then it does not point to your page :)

The script URL should return a 404 for it to be deleted. It will not be your homepage URL that returns the 404; it will be the script URL.

boredguru, in other news I heard that you nailed it :)

>> How do you know if, when Gbot visits, it thinks it is fetching your domainname.com or thinks it is fetching hijacker.com/url.php?domainname.com?

I'm sorry I overlooked this; there are just too many threads and posts.

I don't know this - the 302 script by itself sends a referrer, which is the page the script is on, not the exact script URL. It only does this when accessed from the page via a click on a link, not when the script URL is accessed directly.

It's easy to show: just put up a PHP/ASP/CGI page that does nothing but display the referrer. Then set up a 302 redirect from some other page to this page (by means of a script). Click the URL on "the other page" and look at the referrer string the first page prints out. Then try entering the script link directly in the browser.
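A minimal sketch of that test in PHP (both file names are made up for illustration):

<?php
// show_referrer.php - does nothing but display the referrer it was called with.
echo 'Referrer: ' . (isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '(none)');
?>

<?php
// redirect.php - 302s to the page above, standing in for the hijacker's script.
header('HTTP/1.1 302 Found');
header('Location: http://www.example.com/show_referrer.php');
exit;
?>

Click a link to redirect.php from some page and, with most browsers, show_referrer.php prints that page's URL; request redirect.php directly and the referrer comes up empty.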

Add to this that Googlebot does not send referrer information when it fetches your pages, so there is no way we can know if it's going straight to your URL or if it's going via a redirect script.

[edited by: claus at 10:45 am (utc) on Mar. 16, 2005]

10:44 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 2, 2003
posts:3710
votes: 0


>> But... and this is a big but... what if Gbot does not go "yippee, one more new URL"? It already knows that the redirected URL exists in its index. It just assigns that URL to the hijacker's URL by default, without doing a fetch.

That seems highly likely to me. I see no reason whatsoever why a robot should follow redirects immediately. They don't follow links immediately; they simply build a database of pages to fetch later. Redirects, especially ones to other domains, may well be treated the same way.

It should be possible to check the logs of sites where redirects are used to find the answer. However, bearing in mind the potential to send robots into infinite loops, I doubt redirects are followed immediately.

Kaled.

12:42 pm on Mar 16, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 13, 2005
posts:2
votes: 0


Maybe I'm missing something obvious or perhaps I've misunderstood something, but there's something I don't understand.

So... people are using new (throwaway) domains to replace the existing domains of genuine sites in the SERPs. To do this, they need to show a stronger site (PR-wise?) than the original? If so, how can the blackhats do that with brand new, weak, non-popular domains?

Sorry if I'm being ultra-dim here.

1:38 pm on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


rob_casino,

They don't have to use the throwaway domains for the site receiving the hijacked traffic, although they can.

What you are missing is that the script allows the jacker to do any number of things.

The jacker installs the script on a throwaway domain and places anchors on the new site's home page pointing to his script, targeting several high-value SERP results.

The jacker sends the throwaway domain several inbound medium-PR links, thus starting the bots off to look at the wonderful new domain. This places the entries in the search engines.

Now maybe this first round serves only to trip Google's duplicate-content filter for just the targeted pages, or to sit dormant until needed.

It does its job and the jacker is happy, because his other, clean site now has that place in the SERPs.

But on the off chance that isn't enough, he can change how the now-implanted page operates by changing a database entry on his throwaway domain.

The new action is to take advantage of relative addressing, and of information likely to be available, to insert more duplicate content into Google's index, this time using Googlebot to walk the target site and duplicate every possible page. Now Google has several copies of most pages of your site, starting with your home page. Ask all kinds of folks on this forum, or me; I just spent a bit of time dealing with that part of this. This part can be done without a script, but hey, it saves setting up links by hand every time.

If this still doesn't work for them they can resort to changing how that script works again.

This time, to point links to bad places. Remember, that is a script that will be run when Googlebot hits it.

The are "rumored" bad "places" on the net that Google might say you are a bad site too so down you go. Or inserts indications that you run a site that isn't suitable for younger surfers.

Remember, what you see when you visit that page may or may not be what Googlebot sees. It is a script that runs.

For the jacker to derive profit, he needs only to remove you from placing for the terms he is interested in, and he has many ways to do this.

And then we have the case where the object is to just see what it can do.

And we have the case where a newbie website owner gets a copy of the script from the internet, installs it, and doesn't even know all of its abilities; and the script has a nice administrative backend.

Now the jacker is using newb's site.

2:30 pm on Mar 16, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 13, 2005
posts:2
votes: 0


Thank you, bear - much appreciated.
2:31 pm on Mar 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 24, 2004
posts:95
votes: 0


I finally got the script working.
What it does is give one 301 redirect to itself and then a 200 OK. When Gbot fetches again it will give a 301, then again a 200, no matter the time between fetches. The reason I have kept it alternating is the previously stated reasons, which I shall post now.
Unfortunately this script will only work with PHP and MySQL, and even more unfortunately the whole algorithm will only work on sites that use a database and a server-side language. Sites with static HTML pages and no database cannot benefit from it (but I think there is a workaround, which will take two days to a week if it is successful).

Firstly:

1) You will need to create a table in your database with these two fields (a sketch of the matching setup follows this list):
i)
a) fieldname = url
b) fieldtype = varchar
c) fieldlength = 255
d) fieldattributes = unique
ii)
a) fieldname = value
b) fieldtype = int
c) fieldlength = 2
d) fieldattributes = none
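A minimal sketch of that table setup, run once through PHP (the table name "noredir" matches the queries in the code below; the connection details are placeholders):

<?php
// One-off setup: create the "noredir" table described above.
$dbh = mysql_connect('localhost', 'username', 'password');
mysql_select_db('database name');
mysql_query("CREATE TABLE noredir (
    url VARCHAR(255) NOT NULL UNIQUE,
    value INT(2)
)");
?>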

2) Now to the code part

$url2 = $_SERVER["REQUEST_URI"];
$ua = $_SERVER["HTTP_USER_AGENT"];
$ip = $_SERVER["REMOTE_ADDR"];
$url = 'http://www.example.com' . $url2;

// Treat the visitor as Googlebot if the user-agent or the IP matches.
// Both IP tests must fail (note the &&) for the visitor to be treated
// as a normal user.
if (strpos($ua, "Googlebot/2.1") === false && strpos($ip, "216.239.") === false && strpos($ip, "66.249.") === false)
//if (strpos($ip, "69.36.190.") === false)   // test-phase line, never executed
{
    // your normal code goes here - the page you serve to ordinary visitors.
}
else
{
    $dbh = mysql_connect('localhost', 'username', 'password');
    mysql_select_db('database name');

    $sql = mysql_query("SELECT value FROM noredir WHERE url='$url'");
    // Guard against an empty result so mysql_result() does not raise a warning.
    $value = (mysql_num_rows($sql) > 0) ? mysql_result($sql, 0) : 0;

    if ($value) {

        if ($value == 1) {
            // Gbot got a 200 last time: flip the flag and 301 the page to itself.
            $value = $value + 1;
            $sql1 = "UPDATE noredir SET value='$value' WHERE url='$url'";
            mysql_query($sql1);
            header("HTTP/1.1 301 Moved Permanently");
            header("Location: " . $url);
            header("Connection: close");
        }

        elseif ($value == 2) {
            // Gbot got the 301 last time: flip the flag back and serve the page.
            $value = $value - 1;
            $sql1 = "UPDATE noredir SET value='$value' WHERE url='$url'";
            mysql_query($sql1);

            // your normal code goes here, a second time. It has to appear twice
            // but is only executed once per request: the copy above runs for all
            // other users, while this copy runs only when it is Googlebot and the
            // conditions are met for Gbot to get a 200 response.
        }
    }

    else {
        // First time this URL is seen: record it and start with the 301.
        $sql2 = "INSERT INTO noredir (url, value) VALUES ('$url', '2')";
        mysql_query($sql2);
        header("HTTP/1.1 301 Moved Permanently");
        header("Location: " . $url);
        header("Connection: close");
    }
}

The commented-out IP line is searchengineworld.com's; I included it for the test phase, to see what headers were being given. You can check that the headers alternate between 301 and 200 at [searchengineworld.com...] for the URL [example.com...]. If lots of people are checking the header simultaneously, you might not see it alternate every time, because user 1 checks it and gets a 301, user 2 immediately or simultaneously checks it and gets a 200, and when user 1 checks again he gets a 301.

3) Now to the algorithm:
i) Check whether the visitor is Googlebot.
ii) If not, serve the normal page with a 200.
iii) If yes, check the stored value for this URL.
iv) If a value is present:
a) if value = 1, then
value = value + 1
301 permanent redirect to the same page
b) if value = 2, then
value = value - 1
execute your normal code and serve your normal page.
v) If no value is present:
take the current URL ($url) and insert it into the url field of the database, with 2 as the corresponding value for the value field
301 redirect to the same page
vi) Exit.

Every time the value is 1 it redirects after changing the value to 2, and every time the value is 2 it serves the normal page after changing the value to 1.
So 1 = 301 and
2 = 200.
That is why, when creating a new row for a new URL, you have to set the value to 2: the redirection has already happened.

As you can see, the advantage is that you don't have to worry about how many pages you have. Just inserting this into your PHP file will add new URLs dynamically, so you don't have to enter each URL in the database and set a value yourself.
But I think using it only on your homepage is enough.

Now to the FAQ

Q1) Why so many continuous redirections? Why not do it just once and leave it?
A1) How do you know whether, when Gbot visits, it thinks it is fetching your domainname.com or thinks it is fetching hijacker.com/url.php?domainname.com?

Because when you redirect it, Gbot could really have come asking for yourdomain.com, but the next time (more like the next day) it could be asking for the hijacker.com page which it thinks has moved to your homepage.

And as your homepage will be visited far more often than some page three levels deep on your hijacker's site, we would have to be pretty lucky to catch the bot at just the right time to make it think the hijacker's page has moved permanently - hence the constant alternation.

Now, how it will work:

Day 1: Gbot asks for yourdomain.com. You redirect it once that day to yourdomain.com. No harm done today, and no gain either.
Day 2: Gbot asks for yourdomain.com. You redirect it once that day to yourdomain.com. No harm done today, and no gain either.
Day 3: ditto
Day 4: ditto
Day 5: ditto
Day 6: Gbot asks for yourdomain.com thinking it is fetching hijacker.com/url.php?url=yourdomain.com. Today no harm is done, but lots of good.

Q2) How much will it slow down my server?
A2) Not much (and only in the eyes of Gbot), as long as you don't plan on implementing this on thousands of pages. As Gbot will mostly request a page once a day (at most twice for the same page), it depends on the number of pages you implement it on. If you implement it on all 10,000 pages of your site (for example), you will face a server slowdown noticeable to everyone when Gbot comes on a full crawl. It's mostly the homepage that gets hijacked, so implement it on your homepage alone. Otherwise it's your call, depending on the resources you have.

Q3) I don't use PHP/Perl/ASP etc., or a database. Can I implement this technique?
A3) You can't. But I have been thinking of a way; give me two days to a week and I'll try to work something up, as lots of static sites face this problem.

Q4) I use PHP and dynamic URLs; is there anything I have to add if I am implementing it sitewide?
A4) No. Even though you might have 1,000 URLs generated by a single file, it will work, as the script keeps track of URLs and not of how many times the file is called.
E.g. yourdomain.com/forum/viewtopic.php?id=34
and yourdomain.com/forum/viewtopic.php?id=55 will be recognised as different URLs even though the call is to the same file, viewtopic.php.

Q5) Will you take responsibility for the code and any liabilities that occur from our use of it?
A5) Nope. We are all free-thinking individuals who have the right to use what is offered, or not. So the script is free to use, implement, copy and change (add anything else you might want to add, except taking credit for this idea).

Q6) Will it change my Google rankings?
A6) Only G knows (God? Google? You select!). But this script is more of a preventive step. Lots of people have had their hijacker's link removed, yet G still shows the hijacker's link in its index with an almost three-month-old cache. If the hijacker's cache of your site is recent and Gbot is still crawling you, then you have a good fighting chance.

Q7) Have you tried it?
A7) Thankfully, G (again, you select) willing, my site was never successfully hijacked. Though I had lots of 302 links from all over the digital wilderness pointed at me, G removed them. But I have been using this script on my homepage for the last 12 hours, and I will report if anything diabolical happens to my site.

Q8) What is the theory this is based on?
A8) The theory that G respects the HTTP protocols a lot. If it can respect the 302 protocol to the letter and misjudge your page as someone else's, then it must respect the 301 protocol just as much, disowning all previous URLs for that page and taking the new URL as the new destination for that page (which is the same URL, by the way).

Q9) Any presumptions?
A9) Yes, a few.
1) G is good. All this is a glitch caused by the HTTP protocols, not a purposeful effort by G to target your page. The glitch could be because G gives more emphasis to educational and gov sites, where it is pretty common for a resource to be temporarily relocated somewhere else. This might seem odd to mom & pop stores and small businesses on the net, but it is quite common there. G has, after all, come from the same terrain (edu), so it probably knows how things could be affected if it changed its ways and ignored the protocols.
2) The HTTP protocol is not quite clear about a temporary redirect chained to a permanent redirect. That is, if site A 302-redirects to site B, which 301-redirects to site C:
Case 1) (favourable) All previous URLs are dropped and only site C is taken to be the de facto site for that content.
Case 2) (unfavourable) Content for site B can be found only at site C, so update the URL for site B; and, here is the kicker, content for site A is at site C, but do not update the URL, as site A has only temporarily stored the content at site C.
We will never know how G will react when faced with this, though I think it is Case 1, as the protocol states:

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use the returned URI/URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.
emphasis mine[1]

As you can see, it does talk a lot about dropping all previous references and taking the last URI as the only one. What do you think?

Q10) We are an SEO firm. Can we use this script for our clients?
A10) Yup, go ahead and get your clients back in the SERPs. But don't fleece them!
And you agree that I have no liabilities if you use it; actually, using it means you agree there are no liabilities.

Q11) For how long should I keep it up?
A11) 15 days to a month's time should be enough, I guess, to catch Gbot when it comes asking for the hijacker's URL. But I will be keeping it for some more time, to see if anything else happens.

Q12) Can't think of any more. If you have any, ask, and I know lots of the more experienced members here can answer you to your satisfaction.

[1][edited by: rogerd at 5:07 pm (utc) on Mar. 16, 2005]
[edit reason] examplified [/edit]

3:58 pm on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


boredguru,

I agree with your defense setup.

Now, using a little bit of LWP magic and/or file reading, it might be extended to help the static folks as well.

Provided PHP and MySQL are installed on the server.

In the spots where you say "your normal code goes here", the code that went there would be file reads of the requested page by the PHP script, along with an echo of the read content (see the sketch below).
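A minimal sketch of that idea, assuming the static HTML files sit directly under the document root (real code would need path sanitising and subdirectory handling):

<?php
// Stand-in for "your normal code goes here" on a static site: read the
// HTML file that was actually requested and echo it back out.
$path = $_SERVER['REQUEST_URI'];
if (($q = strpos($path, '?')) !== false) {
    $path = substr($path, 0, $q);     // strip any query string
}
$page = basename($path);
if ($page == '') {
    $page = 'index.html';             // bare domain request
}
readfile($_SERVER['DOCUMENT_ROOT'] . '/' . $page);
?>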

This is what I was calling 1 shot 301 last evening when I posted about site hardening.

Of course, site design can play havoc with this. Trust me, I work on a site that has evolved since 1998, and a lot of the old is still running (but not the same way).

6:24 pm on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:June 4, 2002
posts:1830
votes: 3


I'm trying to reach the end of this thread (3 more pages were posted since I started reading about an hour ago--sigh!), so please excuse me if this is "old" news:

>> Persist by contacting another similar site until you find a suitable volunteer. But like I suggested, do not accept a refusal easily; what right do they have to deny us this opportunity to exercise our ability to bring their websites to their knees? They also have the reassurance from Google that nobody can affect their ranking.

Hmmmm, this sounds just like the response I got from Alexa when I asked them to remove all their 302 redirects to my sites:


If you're referring to the fact that we redirect before the site leaves Alexa.com, but still deliver the visitors to the site in question, it is our right to use redirects to track where people go on our site and that behavior will not be changed.
6:58 pm on Mar 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 27, 2005
posts:93
votes: 0


Lorel,

Alexa and similar sites that automatically add a redirect to their search results are a time bomb waiting to explode.

""I bet no webmaster here would accept googlebot to follow that redirect""

It could herald the end of their site in google's index.

Any volunteers? It is only a link from Alexa's redirecting system; it may even help your inbound link count.

7:04 pm on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


You guys are losing me with that code etc., but keep up the good work.

We have caused quite a stir this time; 302 is the buzzword in the SEO world right now.

That a#*$!y.com site I had trouble with - their server is down today.
I found they are doing the exact same thing to one of my clients, so I was going to try a new method I learned from the other thread on this forum, "not all 302's are hijackers".

Here's the trick:
set up a noindex tag on the page being hijacked, and then use the URL removal tool to remove the offending link.

I was going to try this on my client's site, but the offending server is down - probably not for long though.
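For the record, the noindex tag meant here is the robots meta tag. On a PHP page it could be switched on just for the duration of the removal request (a sketch only; remove it again once the removal has gone through, or the page stays out of the index):

<?php
// Emit a robots noindex tag while the Google URL removal request runs.
// This belongs inside the page's <head> section.
echo '<meta name="robots" content="noindex">';
?>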

7:09 pm on Mar 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 27, 2005
posts:93
votes: 0


THE ARROGANCE OF ALEXA

They refuse to remove their 302 redirect to my sites.

Despite my sending them evidence that they have rank-popularity pages with crawlable redirect links that will cause Googlebot to create duplicate content.
