


302 Redirects continue to be an issue

     
6:23 pm on Feb 27, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 27, 2005
posts:93
votes: 0


recent related threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]



It is now 100% certain that any site can destroy a low- to mid-range pagerank site by causing googlebot to snap up a 302 redirect via scripts such as php, asp, cgi etc., supported by an unseen, randomly generated meta refresh page pointing at an unsuspecting site. In many cases the encroaching site actually writes your website's URL with a 302 redirect inside their server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.

Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which have your website info within their databases, feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully adjusted variant of a php-based redirection script, which causes a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected via the use of a good header interrogation tool.

Googlebot and msnbot follow these php scripts to either an internal sub-domain containing the 302 redirect or to the server side, and BANG, down goes your site if its pagerank is below the offending site's. Your index page is crippled because googlebot and msnbot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be much help as a trace for a long time. Note that these scripts mostly apply your URL stripped, or without the www, making detection harder. This also causes googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving google of deciding which of the two URLs of your site to index higher (usually the one with the higher linked pagerank).
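
To at least close that short-URL hole on your own server, something like this minimal php sketch may do (only a sketch, and www.example.com is just a placeholder for your own canonical host): it 301s any request on the bare domain over to the www form, so google only ever has one URL per page to consider.

-------------------------
<?php
// canonical-301.php - a sketch only: include at the top of each page.
// If the request arrived on the bare domain, answer with a 301 to the
// www form, so google has a single URL per page to index.
$host = isset($_SERVER['HTTP_HOST']) ? strtolower($_SERVER['HTTP_HOST']) : '';
if ($host == 'example.com') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>
-------------------------

The same consolidation can be done with a mod_rewrite rule in .htaccess, which comes up later in this thread.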

Your only hope is that your pagerank is higher than the offending site's. Even that is no guarantee, because the offending site will have targeted many higher-pagerank sites within its system on the off chance that it strips at least one of the targets. This is further applied by hundreds of other hidden 301 permanent redirects to pagerank 7 or above sites, again in the hope of stripping a high-pagerank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and google adwords is probably the main source of revenue. Though I am sure google does not approve of its adsense program being used in such a manner.

Many such offending sites have no e-mail contact, a hidden WHOIS, and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.

There is no point in contacting GOOGLE or MSN, because this problem has been around for at least nine months; only now is it escalating at an alarming rate. All sites of pagerank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach pagerank 4 or 5, without any need to strip other sites.

Caution: trying to exclude them via robots.txt will not help, because these scripts can change almost daily.

Trying to remove a link through google that looks like new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from google's index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that timeline.

I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So, in essence, a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create and bombard a site that generates dynamic pages via php, asp, or cgi redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that continually requests pages in split seconds throughout the day and week.

If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via google's URL removal tool. You only need a few seconds of a 404 or a 403 from the offending site for google's URL console to detect what it needs: either the site or the damaging link.

I hope I have been informative and of some help to anybody with a hijacked site whose natural revenue has been unfairly taken. Also note that your site may never regain its rank, even after the removal of the offending links. Talking to offending site owners usually results in their denying that they are causing problems and claiming that they are only counting outbound clicks. And they seem reluctant to remove your links... Yeah, pull the other one.

[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]

8:51 pm on Mar 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


> how many big sites do you know that have URLs the length of your hand, and which keep changing

Those sites tend to have an enormous amount of link power, continuously pointed at one of those URLs. I tend to assume that Google's decision is easier in those cases.

11:05 pm on Mar 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


I'm getting nervous now, after reading those earlier threads and after getting a few of those 302 links to my site removed.

Here is a quote from a post last September in another WebmasterWorld thread:

They took over a week to answer my first email. I sent it to webmaster@google.com and the replies were coming from help@google.com. I tried a different address for them and the reply still came from help@google.com. I implored them to please refer my questions to somebody higher up and put ATTN:Googleguy in the message title. I started getting responses from googlebot@google.com.
I can tell you that the responses make me want to laugh, cry, and scream at the same time.

In the meantime, my hijacked index page has moved up from number seven to number three in the SERPs, even though they did remove the redirect and the link now goes to a 404 error page.

11:21 pm on Mar 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


twist and others, here's an .htaccess rule you could use:

-------------------------
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
-------------------------

Make it eg. the first rule (or the last rule) in your .htaccess file. It does this:

If Googlebot requests a file (any file), redirect that request one time only to the exact same URL with a 301 status code, and do no more. What happens after this is that Googlebot will get the file with a "200 OK" code, or whatever code your webserver would otherwise throw at it (eg. if it's a dead link it will of course get a 404). [NC] means that the spelling of "GooGLeBot" is not case sensitive.

It also makes sure that Googlebot will always just see the domain with "www." in front of it (if you don't want this, just remove "www." from the rule).

This way, each and every URL that Googlebot requests will get some sort of "extra verification stamp" saying "the right URL for the file you requested is the same URL as the one you used".

(actually it says: "the URL you requested has been moved permanently to the exact same place - ie. to the location you already requested once". So, if there were no hijackers this would be pure nonsense. The "www." part adds a small bit of real and useful functionality.)

It is a bit similar to the (second part of the) method posted by boredguru, but it does not change any URLs and it does not use 302 status codes, so it will not create extra duplicate content for you.

>> slashdot

yeah, i noticed that ;) Too bad the slashdot crowd only need to see the word "adult" one time in an article to be talking about pr0n for hours. However, the point was picked up after a few screens of off-topic posts.

[edited by: claus at 11:33 pm (utc) on Mar. 15, 2005]

11:23 pm on Mar 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


>> Googleguy is here to help with the small things website/google related and a big thanks for that.

Respect, Zeus, but I'm unsure of what that help could be. Other than advice on whether I should post a pic of my dog on the site, I don't know what might be forthcoming. It ain't like the good old days when one could almost believe they meant that "Do no evil" stuff (and GG would actually check on obvious injustices). IPO or not, there's little credibility left if they can't even comment on this.

11:55 pm on Mar 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Thanks Claus - that sounds like an excellent solution.

My only worry now is one link that I asked a directory to remove. It was hijacking my homepage, and they removed it at my request, but when I went to the URL removal tool it responds "www(dot)othersite.com/go.php?id=58585 returns 302 found but the HTTP response header is empty".
In other words, they removed my link, but the php redirect now sends to an empty URL, which resolves to a 404 on their server; google can't seem to figure that out.
I e-mailed them back, but they don't seem to care anymore.

12:04 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


Reid, did you check that URL with a server header checker? It is more important that the script URL itself returns a 404 than it is to get the link removed from a physical page.
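
If you don't have a header checker handy, here is a bare-bones php sketch of one (a sketch only, assuming plain HTTP on port 80; the host and path below are placeholders to swap for the go.php URL in question). It sends a HEAD request and prints the raw status line and headers without following any redirect, so you see the 302 itself.

-------------------------
<?php
// headercheck.php - a sketch only: print the status line and raw
// response headers for one URL; redirects are not followed.
function check_headers($host, $path) {
    $fp = fsockopen($host, 80, $errno, $errstr, 10);
    if (!$fp) return "connect failed: $errstr ($errno)";
    fwrite($fp, "HEAD $path HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $out = '';
    while (!feof($fp)) $out .= fgets($fp, 1024);
    fclose($fp);
    return $out;
}

echo check_headers('www.example.com', '/go.php?id=58585');
?>
-------------------------
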
12:07 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


It should also be noted, Claus, that this would cancel out any META refresh 302 pages on your own site as well.

12:09 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


Claus,

Have you tested that rewrite setup?

I added it to my .htaccess file on a test site and I get a redirect loop detected... but the site has tons of things that get remapped...

I also changed the agent to test... and I have a tool that allows UA changes...

12:15 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


>> you tested that rewrite setup?

theBear, i have red ears. No, i did not - i just wrote it like that.

Don't use it. I'm sorry, it will loop of course. Back to the drawing board.

12:33 am on Mar 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 24, 2004
posts:95
votes: 0


Claus
I think this could be it. But I have one doubt (which I posted elsewhere, and I guess you did not notice).

How do you know, when Gbot visits, whether it thinks it is fetching your domainname.com or it thinks it is fetching hijacker.com/url.php?domainname.com?

Because when you redirect it, Gbot could really have come asking for yourdomain.com, but the next time (more like the next day) it could be asking for the hijacker.com webpage, which it thinks has moved to your homepage.

And as your homepage will be visited more often than some page three levels deep on your hijacker's site, we would be pretty lucky to catch the bot at the right time to make it think that the hijacker's page has permanently moved.

This is the only flaw, but it is countless times safer and cooler (no, ciml, I really do think cool urls don't change :) ) than what I suggested. I think taking this idea a step further will bring us closer to realizing our goals.

How about doing it once every day, for Gbot alone?

That is:
Day 1: Gbot asks for yourdomain.com. You redirect it once that day to yourdomain.com. No harm done today, and no gain either.
Day 2: Gbot asks for yourdomain.com. You redirect it once that day to yourdomain.com. No harm done today, and no gain either.
Day 3: ditto
Day 4: ditto
Day 5: ditto
Day 6: Gbot asks for yourdomain.com thinking it is fetching hijacker.com/url.php?url=yourdomain.com. Today no harm done, but lots of good done.

I need to refine this, so I am planning to look at my logs for the past year to see how Gbot has requested my pages, starting from the homepage, and how many times a day.

Will post if i think i see any pattern and ask for your ideas.

12:37 am on Mar 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 24, 2004
posts:95
votes: 0


Claus
But that idea can be implemented without looping, using a server-side language like php.

I will take a shot at writing it and post it later in the day.

But it will involve using a database, to have a sort of memory of what happened in the recent past.
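
Something along these lines, perhaps - only a sketch, with a flat file standing in for the database, and www.example.com a placeholder for your canonical host. It 301s Gbot to the exact URL it asked for at most once per day, and serves the page normally otherwise.

-------------------------
<?php
// oneshot301.php - a sketch only: include before any output is sent.
// The first Googlebot request of the day gets a 301 to the very same
// URL it asked for; the immediate re-request (and every other request
// that day) falls through and gets the page as usual.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stristr($ua, 'Googlebot')) {
    $statefile = '/tmp/gbot-301.txt';   // stand-in for the database
    $today = date('Y-m-d');
    $last  = @file_get_contents($statefile);
    if ($last !== $today) {
        $fp = @fopen($statefile, 'w');
        if ($fp) { fwrite($fp, $today); fclose($fp); }
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://www.example.com' . $_SERVER['REQUEST_URI']);
        exit;
    }
}
?>
-------------------------

The date check is what breaks the loop in claus's .htaccess version: the second request in the redirect cycle finds today's date already recorded, so it gets a 200 instead of another 301.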

12:48 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 8, 2004
posts:865
votes: 0


claus - Don't use it. I'm sorry, it will loop of course. Back to the drawing board.

I made the same mistake in msg #:552

I wonder if jdmorgan (jim) from the apache forum could come up with a workable solution, he's good.

-------

Another idea,

For example, when people link to your site, ask them to use www, and then have this in your .htaccess (or vice versa):

RewriteCond %{HTTP_HOST} ^www\.example\.com
RewriteRule ^(.*)$ http://example.com/$1 [R=permanent,L]

That way, even if they use a 302 to www.example.com, it would be corrected automatically when your .htaccess 301'ed it over to example.com. Although this could potentially mess with your backlinks. Thoughts?

12:59 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


I know of several ways to harden your site.

1. random content shuffle
2. programmed 1 shot 301 redirects (kinda like the random content).
3. massive invasive insertion of code.

Now, I already do a lot of 1, 2 is on the boards, and 3 I just did for another reason (related to this) and had a sore wrist for a week or so after (not going to do it again). There's a rough sketch of 1 below.
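
For 1, only a bare-bones illustration of what I mean (the content blocks are dummy text): rotate a fragment of each page per request, so scraped or hijacked copies drift out of sync with the live page.

-------------------------
<?php
// shuffle.php - a sketch only: print one randomly chosen content
// block per request, so no two fetches of a page look identical.
$blocks = array(
    'Tip: check your server headers regularly.',
    'Note: a 302 is "Found"; a 301 is "Moved Permanently".',
    'Reminder: robots.txt will not stop a redirect script.',
);
echo $blocks[mt_rand(0, count($blocks) - 1)];
?>
-------------------------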

1:11 am on Mar 16, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 12, 2001
posts:1150
votes: 0


<<Back to the drawing board>>

Keep 'em coming, Claus. We appreciate your efforts.

1:35 am on Mar 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 24, 2004
posts:95
votes: 0


I have got this one sinking feeling, because whatever we do depends on how G is treating 302 redirections.

You stated, claus, that Gbot sees the 302 redirection, goes "yippee, one more url", and indexes it again as the hijacker's url. Are you certain? Because if that is the way it's done, then we can get over it.

But... and this is a big but... what if Gbot does not go "yippee, one more new url"? It already knows that the redirected url exists in its index. It just assigns that url to the hijacker's url by default, without doing a fetch.

With the help of gregdi & idoc & other victims (no, too strong a word... more like casualties), we can find out.

I suggest this: gregdi & idoc, check for the hijacker's page in the index and note its cache date. If you have more than one hijacker, check all their cache dates, and then check your logs for gbot activity on those dates. Is there any difference, like your hijacked page being fetched twice, etc.? If you see any peculiarities, please post them here concisely. Also, don't be afraid to use your gut instincts; after all, no one knows your site better than you. Because in those peculiarities lies our answer. Really.
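
If it helps with the digging, here is a rough log-scanning sketch (the log path, the combined log format, and the date are all assumptions - adjust them for your own server). It prints every line from one day that mentions Googlebot.

-------------------------
<?php
// gbotscan.php - a sketch only: print each access-log line that is
// from the chosen day and mentions Googlebot.
$log  = '/var/log/apache/access_log';   // adjust to your log path
$date = '15/Mar/2005';                  // the day to inspect
$fp = fopen($log, 'r');
while ($fp && !feof($fp)) {
    $line = fgets($fp, 4096);
    if (strpos($line, $date) !== false && stristr($line, 'Googlebot')) {
        echo $line;
    }
}
if ($fp) fclose($fp);
?>
-------------------------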

<edit reason> Corrected typos</edit>

[edited by: boredguru at 1:36 am (utc) on Mar. 16, 2005]
