
Forum Moderators: Robert Charlton & goodroi


302 Redirects continue to be an issue

     
6:23 pm on Feb 27, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 27, 2005
posts:93
votes: 0


recent related threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]



It is now 100% certain that any site can destroy low- to mid-range PageRank sites by getting Googlebot to snap up a 302 redirect issued by a PHP, ASP, CGI or similar script, backed by an unseen, randomly generated meta-refresh page pointing at an unsuspecting site. In many cases the encroaching site actually writes your website's URL into a 302 redirect on its own server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.

Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which hold your website's info in their databases, feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully adjusted PHP-based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta-refresh page, which can only be detected with a good header interrogation tool.
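
A minimal sketch of the kind of redirection script being described (the file name goto.php, the path parameter and the comments are illustrative assumptions, not taken from any particular offending site):

    <?php
    // goto.php -- hypothetical sketch of the redirection script described above.
    // Called as e.g. goto.php?path=yoursite.com/
    $target = 'http://' . $_GET['path'];
    // ...the "affiliate click checker" would record the click here...
    header('Location: ' . $target, true, 302);  // 302: the engine keeps goto.php's own URL
    exit;
    ?>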

Googlebot and MSNBot follow these PHP scripts, either to an internal sub-domain containing the 302 redirect or server-side, and "BANG", down goes your site if it has a lower PageRank than the offending site. Your index page is crippled because Googlebot and MSNBot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be much help as a trace for a long time. Note that these scripts mostly apply your URL stripped, or without the www, making detection harder. This also causes Googlebot to generate another URL listing for your site that can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving Google of deciding which of your site's two URLs to index higher (usually the one with the higher-linked PageRank).

Your only hope is that your PageRank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of the targets. This is backed up by hundreds of other hidden 301 permanent redirects to PageRank 7 or above sites, again in the hope of stripping a high-PageRank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main revenue target, though I am sure Google does not approve of its AdSense program being used in such a manner.

Many such offending sites have no e-mail contact, hidden WHOIS data and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the pages are fed by affiliate databases.

There is no point in contacting GOOGLE or MSN, because this problem has been around for at least nine months; only now is it escalating at an alarming rate. All sites of PageRank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site only needs to create child-page linking to reach PageRank 4 or 5, without needing to strip other sites.

Caution: trying to exclude them via robots.txt will not help, because these scripts can change almost daily.

Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google's index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that window.

I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. The script will spider and detect all pages, including sub-domains, within an offending site and blast every one of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs: in essence, a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create and bombard a site that generates dynamic pages via PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.

If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of a 404 or 403 from the offending site for Google's URL console to detect what it needs: either the site or the damaging link.

I hope I have been informative and of help to anybody with a hijacked site whose natural revenue has been unfairly affected. Also note that your site may never regain its rank even after the removal of the offending links. Talking to offending site owners usually results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links... Yeah, pull the other one.

[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]

10:39 am on Mar 21, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 8, 2005
posts:38
votes: 0


This problem, assuming people are using it to gain PR or SERP positions, could possibly also be done with 301s.

What if I have a PR 6 site, then create a blank directory with an .htaccess 301 redirect to the most popular file/page on a site with a PR 2?
eg:
Redirect 301 /mydir/mysite.html h**p://somesite.com/somefile.html

Googlebot would also see this as duplicate content, and also think that the content it redirects to belongs to my site.

1:22 pm on Mar 21, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


I thought this thread had been closed and the discussion was continued here: [webmasterworld.com...]
----------------------------------------------------

entropicus,

If you point a 302 at another page from one of your pages, then the search engine will keep your URL and throw away the URL of the target page.

If you do it with a 301, then the search engine will throw away your URL and keep the URL of the target page.

So, 301's and 302's work in opposite directions.
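
A minimal sketch of the difference from the sending side (PHP and the URL are just for illustration; only the status code changes):

    <?php
    // Placeholder target URL; the two cases differ only in the status code.
    $target = 'http://www.example.com/page.html';

    // 302 Found ("temporary"): the engine keeps the redirecting page's URL and
    // may file the target's content under it.
    // header('Location: ' . $target, true, 302);

    // 301 Moved Permanently: the engine drops the redirecting page's URL and
    // keeps the target's URL instead.
    header('Location: ' . $target, true, 301);
    exit;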

10:28 pm on Mar 21, 2005 (gmt 0)

New User

10+ Year Member

joined:Jan 12, 2005
posts:21
votes: 0


I do not know which is more disturbing: that I have had to spend so much of my time trying to figure out what has happened to my customer's web site, or the recent comments by Japanese and Zeus...

I have to say that I am disturbed by the comments of Japanese "...Also note that your site may never gain its rank even after the removal of the offending links. " and Zeus... "...I dont see any solution to this topic, so maybe we most face the music and start to build pages with some redirecting scripts to good sites and then our own content, that way the scripts will be a form of SEO"

What do I say to a client that has invested $300,000 in web site development and SEO since 1997? Give up? Start over?

10:49 pm on Mar 21, 2005 (gmt 0)

Full Member

joined:Mar 17, 2005
posts:296
votes: 0


For many folks who previously benefited from G traffic, the SERPs have become an unreliable source of income.

Anyone who has been SEOing for a few years probably thinks of pleasing G most of the time. We do things we think G will like, and avoid perfectly reasonable things we think G won't like, and that's in addition to building good content day after day. It's time to try something different.

I have decided to start thinking of G as a welcome but unreliable source of traffic. I'm going to concentrate on other engines and I'm going to go back to cross linking my network of sites where appropriate.

You see, I've actually deleted scads of self referencing links over the past couple of years as my SERP positions dropped. I used to get some nice cross traffic that way, which was nothing compared to G traffic when my positions were good. But it was real, legitimate traffic that I could predict and control.

Why should I sell myself short and expend all my energy on bowing and scraping before a god that will not be appeased?

On a final note -- do you recall those days of wandering in the desert after the fall of AltaVista but before the rise of Google? We took our traffic where we could get it. We got by, and we will continue to get by. And we'll meet with high profits again some sunny day...

5:15 pm on Mar 23, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 3, 2003
posts:963
votes: 0


Here are responses directly from GoogleGuy (or someone posing as him):
[threadwatch.org...]
[slashdot.org...]

Please discuss this at:
[webmasterworld.com...]

7:07 pm on Mar 24, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 24, 2005
posts:3
votes: 0


Yesterday, I cooked up an idea for a web server-based defense against this exploit and posted it to Slashdot ([slashdot.org ]), where it received no comments. I'm not sure if I should take this as a good sign (nobody found a serious flaw) or a bad one (nobody thought it worth discussing).

I'm considering recommending that my organization implement this, but am airing it out in public first to see if someone can find a flaw in it.

Proposed Defensive Solution

Problem Statement
Robots that index pages for search engines may be tricked into believing that content from one site actually belongs to another. The sequence of events looks like this:

  1. The robot visits [badguy.xyz...]
  2. The web server at badguy.xyz responds with an HTTP 302 redirect that informs the robot that the content has been temporarily moved to [victim.xyz...]
  3. The robot dutifully follows the redirect to [victim.xyz...]
  4. The robot receives content from the web server at www.victim.xyz and indexes it. However, because it believes that the content has been moved only temporarily, it indexes it under the www.badguy.xyz domain instead of the www.victim.xyz domain.
  5. Some time later, a user hits the robot's search service (google in most examples) and types in some keywords that appear at [victim.xyz....] The search engine finds the keywords which it has indexed under www.badguy.xyz, so it returns a link to [badguy.xyz....]
  6. The user selects the link and is taken to the [badguy.xyz...] site where badguy has complete control over the content.

Proposed Defense
To protect against the scenario above, the administrator of victim.xyz can install a filter on her web server which will issue an HTTP 301 redirect back to itself if it thinks that the request might be the result of a malicious/erroneous HTTP 302 redirect.

Here is how it works:

  1. The robot visits [badguy.xyz...]
  2. Badguy issues its 302 redirect as above
  3. The robot follows the redirect to [victim.xyz...]
  4. The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz.
  5. The filter determines if it has seen this particular web client recently. (This check could be as simple as scanning the last few lines of the Web server's access log.)
  6. If the filter has not seen this client (the robot) recently, it issues an HTTP 301 ("moved permanently") redirect pointing to [victim.xyz...]
  7. The robot follows the redirect to [victim.xyz...]
  8. The filter at victim.xyz intercepts the request. This time, it recognizes that it has seen the robot before and lets the request through normally.
  9. The robot receives Web content from the server at victim.xyz and indexes it. Because it reached this site from a 301 (moved permanently) rather than a 302 (moved temporarily) redirect, it knows that the content belongs to victim.xyz rather than badguy.xyz and indexes it under victim.xyz. badguy.xyz never gets associated with the content.

Because a robot might be smart enough to recognize that it is being redirected back to the current page, it would probably be a good idea to obfuscate the HTTP 301 redirect by rewriting the URL in a technically insignificant way. For example, "http://www.victim.xyz/" might be rewritten as "http://www.victim.xyz/?"

Exactly how this filter would be implemented depends on the Web server platform and possibly the requirements of the organization. For example, it could be implemented as an Apache httpd module, an IIS ISAPI filter (or whatever the .Net equivalent is. It's been a few years since I've worked with Microsoft products), or a servlet in a J2EE setup. In some cases, it could even be implemented in a more localized scope using globally included PHP or ASP scripts, although I think I'd steer away from this because of the performance penalty.
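
A minimal sketch of that filter as a globally included PHP script (the host name, log path, log-window size and referrer test are illustrative assumptions; an Apache module or ISAPI filter would apply the same logic):

    <?php
    // Hypothetical sketch of the proposed filter, included at the top of every page.
    $canonical = 'http://www.victim.xyz';            // placeholder canonical host
    $accessLog = '/var/log/apache2/access.log';      // placeholder access log path
    $referer   = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

    // Step 4: only intercept requests with no referrer, or an external referrer.
    if ($referer === '' || strpos($referer, $canonical) !== 0) {
        // Step 5: has this client appeared in the last few lines of the access log?
        $lines  = is_readable($accessLog) ? file($accessLog) : array();
        $recent = implode('', array_slice($lines, -50));
        if (strpos($recent, $_SERVER['REMOTE_ADDR']) === false) {
            // Step 6: not seen recently, so bounce the client back to ourselves
            // with a 301, obfuscated with a trailing "?" (assumes no query string).
            header('Location: ' . $canonical . $_SERVER['REQUEST_URI'] . '?', true, 301);
            exit;
        }
    }
    // Steps 8-9: client was seen recently (or came from an internal link),
    // so the page is served normally.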

I'd greatly appreciate feedback.

7:25 pm on Mar 24, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


>> The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz. <<

A web browser visits pages by going from one to the next via links, and may leave a referrer in your log (it might not, as some people surf with referrers off). The referrer is the URL of the previous page the browser was visiting, if that page linked to you. If someone typed the URL in, clicked on a bookmark, or has referrers off, then you will not see a referrer. Don't confuse the Referrer with the User Agent: the User Agent part of the log entry says which browser and OS were used.

Search engines do not crawl the web by going from one page to the next. They spider a page and add all links found on that page to a database. When they finish that page, they ask their own database for the URL of the next page to spider, which might be one on a different site. Multiple bots will be adding to that database, and getting their next job from it, so you can have several bots from the same search engine on your site at the same time. Search engine bots leave User Agent information in your log, but they do NOT leave any referrer information, ever.
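
In other words, a referrer test cannot tell a robot that followed a hostile 302 from a robot doing a normal crawl; if you want to treat robots differently at all, the User-Agent is the only signal they leave. A hedged sketch (the substrings are examples only, and serving robots special responses carries its own risks):

    <?php
    // Hypothetical sketch: spotting a crawler by User-Agent rather than referrer,
    // since search engine bots never send a Referer header.
    $ua    = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $isBot = (stripos($ua, 'Googlebot') !== false) || (stripos($ua, 'msnbot') !== false);
    if ($isBot) {
        // ...a robot-specific response (e.g. the once-per-visit 301 idea discussed
        // later in this thread) could be applied here...
    }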

8:26 pm on Mar 24, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 27, 2003
posts:28
votes: 0


>> Here is how it works: 1. The robot visits [badguy.xyz...] ... 9. The robot receives Web content from the server at victim.xyz and indexes it ... badguy.xyz never gets associated with the content. <<

You mean well, but there are flaws in the defense, mainly the already-posted fact that a robot will look at badguy.foo?url=victim.foo, see the 302 to victim.foo, and just record it as if your content existed there under the badguy site. It then goes off, at a later date, to index victim.foo.

So when Googlebot is doing a normal crawl of your site, your log checker will trigger and you'll start tossing 301s at Googlebot left and right.

It has already been mentioned as one of the helpful possibilities (tossing 301s at robots on their next visits to your site after you've been 302 serpjacked), amongst other things, all of them mostly hail-mary type defenses, not really sure things.

to recap the long thread your possible defenses are:

1. adding dynamic elements to all your urls
2. sending 301s once per page to robots
3. dynamic content elements on all pages
4. 301/302s from victim.foo to www.victim.foo (see the sketch after this list)
5. contacting badguy.foo, asking them to remove listing
6. change domain names
7. try to get PR boosted over badguy.foo
8. changing link/navigation structure
9. removing badguy.foo via google removal tool
10. reporting site as spam to google
11. yelling and screaming at ww and /.

That's most of them. As you see, none are optimal in terms of SEO practices.
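
A minimal sketch of defense #4 as a globally included PHP check (victim.foo is the thread's placeholder name; an .htaccess rewrite would do the same job):

    <?php
    // Hypothetical sketch: force one canonical hostname with a 301, so that the
    // engine only ever keeps www.victim.foo and drops the bare victim.foo variant.
    $host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
    if (strcasecmp($host, 'www.victim.foo') !== 0) {
        header('Location: http://www.victim.foo' . $_SERVER['REQUEST_URI'], true, 301);
        exit;
    }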



Thread continued [webmasterworld.com]

[edited by: ciml at 4:21 pm (utc) on Mar. 25, 2005]
