
Google SEO News and Discussion Forum

This 713 message thread spans 24 pages; this is page 16.
302 Redirects continues to be an issue
japanese
msg:748407
6:23 pm on Feb 27, 2005 (gmt 0)

recent related threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]



It is now 100% certain that any site can destroy a low to midrange pagerank site by causing googlebot to snap up a 302 redirect served by a script (php, asp, cgi etc.), supported by an unseen, randomly generated meta refresh page pointing at the unsuspecting site. In many cases the encroaching site actually writes your website's URL into a 302 redirect on its own server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.

Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which hold your website's info in their databases, feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully adjusted php-based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
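For anyone who wants to see this for themselves, a minimal header-interrogation sketch in Python follows; the redirector URL is a made-up placeholder, and any suspect goto.php-style link can be substituted:

import http.client
from urllib.parse import urlsplit

# Placeholder for a suspect redirector URL of the kind described above.
url = "http://skyscraper.example/goto.php?path=yoursite.com%2F"

parts = urlsplit(url)
conn = http.client.HTTPConnection(parts.hostname, parts.port or 80)
path = parts.path + ("?" + parts.query if parts.query else "")
conn.request("GET", path)        # http.client never follows redirects
resp = conn.getresponse()

print("Status:  ", resp.status, resp.reason)      # e.g. 302 Found
print("Location:", resp.getheader("Location"))    # where the redirect points
body = resp.read().decode("latin-1")
if "refresh" in body.lower():
    print("Body also carries a meta refresh: the hidden page described above.")
conn.close()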

Googlebot and MSNbot follow these php scripts to either an internal sub-domain containing the 302 redirect or to the server side, and BANG, down goes your site if its pagerank is below the offending site's. Your index page is crippled, because googlebot and msnbot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be of much help in tracing this for a long time. Note that these scripts mostly apply your URL stripped, or without the WWW, making detection harder. This also causes googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving google of having to decide which of your site's two URLs to index higher (more often the one with the higher-linked pagerank).

Your only hope is that your pagerank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher pagerank sites within its system on the off chance that it strips at least one of the targets. This is reinforced by hundreds of other hidden 301 permanent redirects to pagerank 7 or above sites, again in the hope of stripping a high pagerank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and google adwords is probably the main source of revenue. Though I am sure google does not approve of its adsense program being used in such a manner.

Many such offending sites have no e-mail contact, a hidden WHOIS and no telephone number. Even if you do contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from the affiliate databases.

There is no point in contacting GOOGLE or MSN, because this problem has been around for at least 9 months; only now is it escalating at an alarming rate. All sites of pagerank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach pagerank 4 or 5, without any need to strip other sites.

Caution: trying to exclude these scripts via robots.txt will not help, because the scripts can change almost daily.

Trying to remove a link through google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from google's index for an indefinite period, at least 90 days, and you cannot get re-indexed within this timeline.

I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic ones, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. In essence, a programme in perpetual motion, creating millions of 302 redirects for as long as it stays switched on. As every page is a unique URL, the script will hopefully continue to create and bombard any site that generates dynamic pages through php, asp or cgi redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.

If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via google's URL removal tool. You only need a few seconds of a 404 or 403 from the offending site for google's URL console to detect what it needs: either the site or the damaging link.

I hope I have been informative and of help to anybody with a hijacked site whose natural revenue has been unfairly treated. Also note that your site may never regain its rank, even after the removal of the offending links. Talking to offending site owners usually results in their denying that they cause any problems and insisting that they are only counting outbound clicks. And they seem reluctant to remove your links.... Yeah, pull the other one.

[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]

 

theBear
msg:748857
2:42 pm on Mar 13, 2005 (gmt 0)

japanese, your answer of ignoring the 3** location field is fine, except that G tells webmasters to use a 301 to move a site to a new domain. So the proper answer is to store the page found at the location under the location's own URL; the location is the important part here. They simply need to put the location's page into the slot for the location's URL.

Doing what you suggested for all 3** would cause a problem as big as, or bigger than, the one you want to fix.

The point here, folks, is that Google's bot is not a web browser; it is a tool that Google wrote to collect web page content for indexing. If Google wishes to index web content, it should store that content under the location (URL) where the content was actually found.

[edited by: theBear at 3:19 pm (utc) on Mar. 13, 2005]

japanese
msg:748858
3:06 pm on Mar 13, 2005 (gmt 0)

theBear,

In my original post at the head of this thread, and in a couple of others, I explain how the 302 has been exploited, used incorrectly and misinterpreted by many people using the 302 status code in conjunction with incompatible scripts that were never approved by google.

What I am trying to demonstrate is that google should have seen this problem coming and built its bots to ignore the LOCATION FIELD instruction produced during the redirect and to cache only the code-produced page.

That way, the inexperienced webmaster cannot do any damage to another site. Whether inadvertently or intentionally, he cannot get googlebot to take a snapshot of the target page. It really is as simple as that.

And why is google not explaining anything about this problem?

Their bots seem to handle 307 redirects marginally better. Hang on, that's not the correct terminology; "less worse" is a better term.

And do their bots actually conform to the standards as laid out?

This is where we all get lost and anybody with a little detective ability can exploit the loopholes.

I think that this is a solution. Not the best, but the best available so far.
======================
1. No matter what script is used, googlebot can detect serverside directives.

2. Within this environment, be it a 301, 302, 303, 305 (proxy) or 307, the bot must accept that a redirect is indeed being implemented.

3. The bot must ignore the LOCATION FIELD.

4. The bot must take a snapshot of the generated CODE PAGE.

5. The generated CODE PAGE is indexed in google as the final destination of the redirect.

6. The end user can click on the redirect if the user so wishes.

7. If a META REFRESH exists in the generated CODE PAGE, then the bot must ignore it.

Simple, effective and a robust solution.
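To make the proposal concrete, here is a rough Python sketch of points 2 to 5 and 7; index_page() is a hypothetical stand-in for the indexer, and nothing here claims to describe how googlebot actually works:

import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Point 3: refuse to follow the Location field of any 3xx answer.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def index_page(url, status, body):
    # Hypothetical indexer: the snapshot is stored under the URL that
    # was requested, never under the redirect target (points 4 and 5).
    print("indexing %s (status %d, %d bytes)" % (url, status, len(body)))

opener = urllib.request.build_opener(NoRedirect)

def crawl(url):
    try:
        resp = opener.open(url)
        status = resp.status
    except urllib.error.HTTPError as err:
        resp, status = err, err.code   # the unfollowed 3xx lands here
    body = resp.read()                 # the generated CODE PAGE
    # Point 7 would strip any meta refresh from body before indexing.
    index_page(url, status, body)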

jk3210
msg:748859
3:13 pm on Mar 13, 2005 (gmt 0)

<<unless removed by URL-console or returning a 404 or 410 >>

Exactly. And I was only able to figure this out because it happened to me. I got one offending site to remove the "=www.mydomain.com" from their link, but the link still wouldn't return a 404 to allow it to be deleted, because the script kept doing what it was designed to do.

I had a 20 minute screaming match on a conference call with a bunch of suits at this one site, and they kept telling me "THE LINK HAS BEEN REMOVED!"

I'd say "BUT, TRY IT....SEE?...IT STILL GOES TO MY SITE!"

Then one of the suits said something really stupid like "THAT'S A \\\GOOGLE LINK/// YOU'RE SEEING." And while I was thinking to myself "geez, what a moron," I realized he was right: that link *with my url in it* only existed in Google's database.

Thankfully, I got the suits to put a true geek on the phone, and before I had finished describing the situation he understood exactly what the problem was and said, "No problem, dude, I'll set it to return a 404."

5 minutes later the problem was solved.

japanese
msg:748860
3:22 pm on Mar 13, 2005 (gmt 0)

theBear,

Don't forget, we are talking about the loophole. If you wanted, could you not tell googlebot that CNN is now your 301 permanent redirect?

The redirect is an alternative method of providing a near-seamless link to another page. It is the standard status code definitions that are being exploited.

It is clear and without a shadow of a doubt.

302
My page has moved to a temporary location whilst the old server is repaired; I have no choice but to use another shared IP because I originally used a static IP; or my now-useless domain name no longer works for my product; or I simply want a domain name change temporarily. The reasons can be endless.

Keep coming back here for further info, because I may temporarily park my page elsewhere again, or I may want to keep moving it around like a merry-go-round.

It is also possible to create a website purely to exploit these status codes. Just fill it up with thousands of internal links pointing to CGI, ASP, PHP etc. redirectors and point them all over the internet haphazardly, covering the whole 3** range, with a nice meta refresh in every one and no html link whatsoever. Make all the links pointing to the redirector a mile long, to make them look jazzy.

The end result is havoc. And it could be stopped by google and msn by restricting their bots from caching the target page, as I have described.

theBear
msg:748861
3:25 pm on Mar 13, 2005 (gmt 0)

japanese, except that Google has already created a problem: they tell webmasters to use a 301 permanent redirect to move sites.

The truly robust solution is for Google to store the page found at the location under the location's URL.

That way, if a page exists at the location (the target site's page), that page gets stored in the slot for the location's URL (the target page's own URL).

Your solution causes major problems.

rehabguy
msg:748862
3:29 pm on Mar 13, 2005 (gmt 0)

> How to Make Google Fix This? - Publish a HowTo Everywhere

This post by kenmcd is right on the money... Google's money, that is.

Now that Goog is a public company, they have to be aware of their stock price, which is often more important to the media than the actual service they offer.

I for one plan to follow kenmcd's advice and get this info out there.

Since it's too hard to explain the problem at length, we just need to come up with a good headline like "Google Glitch Puts Companies Out of Business" and get testimonials from 9-10 businesses that have been severely affected by the problem.

Feel free to sticky me with the name and url of your business if you want to participate.

theBear
msg:748863
3:33 pm on Mar 13, 2005 (gmt 0)

Actually, status 302 is "Found", and the Location header is where the resource can currently be found.

Status 307 is the temporary-redirect status.

activeco
msg:748864
3:34 pm on Mar 13, 2005 (gmt 0)

...or I simply want a domain name change temporarily

Not to be too picky about this, but it is worth noting that 3xx redirections don't have to involve a domain change at all.
They are just location redirections, whether on the same domain or not.

theBear
msg:748865
3:41 pm on Mar 13, 2005 (gmt 0)

activeco,

You are correct on the redirects.

japanese
msg:748866
3:42 pm on Mar 13, 2005 (gmt 0)

theBear,

Perhaps you are correct. I am not a highly paid software engineer in charge of googlebot's algorithm. Nor are you, nor is anybody else outside of google's current, stock-market-driven ambitions.

And I certainly am not dictated to by fat-cigar-smoking, cognizant individuals who took a couple of educated chaps' dream machine and pontificated that it be used to bring in revenue for stockholders.

I would bet that more money is lost by sites that are missing from search results than google is worth on Wall Street.

theBear
msg:748867
3:47 pm on Mar 13, 2005 (gmt 0)

If I 301 my pages to CNN, I basically just transfer my PR, as my equivalent locations will return 404s or 410s, depending on CNN's setup for things not found.

Speaking of which, anyone with custom error-handling pages had best run a header check on what is actually returned; you might get a huge surprise (think 302 Found, followed by a 200 with the custom error page) if it wasn't done correctly.

If yours is set up that way and you want to get things removed from the SEs, you will only wind up expending bandwidth.
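For anyone who wants to run that check, a small Python sketch follows: it requests a page that cannot exist and prints every raw status code along the way (www.example.com is a placeholder for your own domain):

import http.client
from urllib.parse import urlsplit

# A path that cannot exist; a correct setup answers 404 (or 410) directly.
host, path = "www.example.com", "/no-such-page-check.html"

for _ in range(5):                        # follow at most five hops by hand
    conn = http.client.HTTPConnection(host)
    conn.request("GET", path)
    resp = conn.getresponse()
    loc = resp.getheader("Location")
    print(resp.status, resp.reason, "->", loc)
    conn.close()
    if resp.status // 100 != 3 or not loc:
        break                             # a 302 then a 200 here means trouble
    parts = urlsplit(loc)                 # follow the hop ourselves
    host = parts.hostname or host
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")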

theBear
msg:748868
3:58 pm on Mar 13, 2005 (gmt 0)

rehabguy,

There is still another shoe to drop.

So far, what we appear to have here is a surefire means to knock specific pages out via duplicate content.

I suspect that the same script may also grab the target page's content and present it to Google as its own, with maybe a meta refresh to the hijacker's buddy site somehow.

japanese
msg:748869
4:27 pm on Mar 13, 2005 (gmt 0)

theBear,

Actually: 302, Found; the requested resource resides temporarily under a different URL.

So unless I am mistaken, this status code's definition indicates that a temporary URL exists, and that the description of a 302 is "found". The temporary URL must, by definition, be a totally different URL.

Very little exists in the above to misinterpret. You must take the whole as the directive and not break it up. And how are we to expect Joe Public to understand this crap?

Besides, I think we should stay on track in this thread and not deviate into pedantic, illustrative descriptions of exactly what the status code standards are, other than the 302, which is very relevant to this topic. And it surely means, without a shadow of a doubt, that it is a temporary redirect instruction.

I still stand by my suggestion until it can be proved to be flawed. If it is, then we look for an alternative until we find one that works.

ciml
msg:748870
4:41 pm on Mar 13, 2005 (gmt 0)

> The generated CODE PAGE be indexed in google as the final destination of the redirect

japanese, I would hope for that not to happen.

Imagine URL-A, which has links from many sites, giving a useful variety of anchor text. Then, if URL-A redirects to URL-B, the standard link from the body of URL-A will carry the PageRank (minus about 0.03 of Toolbar PR), but not the anchor text or the benefit given by the diversity of links.

Also, while Apache's default behaviour for a 302 redirect is to output the following text in the body, this cannot be relied upon in other circumstances:
The document has moved <A HREF="http://www.example.com/">here</A>.<P>


According to HTTP/1.1 [faqs.org] (10.3.3):
Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.

So, the problem we see with Google and 302s is quite consistent with the standard.

Google is not "unconditionally compliant" with HTTP. So, when Googlebot comes across a link from URL-A to URL-B that then 302 redirects to URL-C, it would seem sensible to treat it as a link from URL-A to URL-C.
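In crawler terms that treatment is just an extra edge in the link graph. A tiny sketch of the idea, with add_link() as a hypothetical stand-in for the link store:

def add_link(from_url, to_url):
    # Hypothetical link-graph store.
    print("link: %s -> %s" % (from_url, to_url))

def handle_link(from_url, to_url, status, location):
    # A link from URL-A to URL-B that answers 302 to URL-C is credited
    # as a plain link from URL-A to URL-C; URL-B's own listing is untouched.
    if status == 302 and location:
        add_link(from_url, location)   # URL-A -> URL-C
    else:
        add_link(from_url, to_url)     # ordinary link, URL-A -> URL-B

# URL-B 302-redirects to URL-C:
handle_link("http://a.example/", "http://b.example/goto", 302, "http://c.example/")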

That way, a mischievous webmaster would be able to pass links to a destination (which can be done with straight links anyway), but he would not be able to usurp someone else's listings.

Also, this should stop accidental "hijackings", where some software on the web site uses a 302 instead of the more usually sensible 301.

This might help certain types of webmasters to collect links to their pages more easily, but such a webmaster would probably know to use a 301. This would also allow many PPC listings to count as straight links, but many count now anyway.

For some sites this would show the content URL in Google's results, rather than the vanity URL that redirects (e.g. www.example.com/default.aspx rather than www.example.com). Yahoo! have opted for a fairly contrived set of assumptions for identifying what to do with 302s, but I believe that not listing vanity URLs is a price worth paying for fixing the serious problem of good and important content not being found.

One important risk of not adhering to the above recommendation in HTTP's specification is that if a webmaster purposely uses a 302 instead of a 301 redirect, then the search engine could link to an out of date URL after the redirect is removed or changed. However such a use of a 302 redirect is a rare occurrence, and Google has such a fresh index for the vast majority of important pages that this risk is close to negligible.

theBear
msg:748871
5:01 pm on Mar 13, 2005 (gmt 0)

japanese,

My solution leaves both sites intact; your blanket 3** status-code handling is not the correct way for an indexing system to work.

You are confusing a data collector's (Googlebot's) visiting of pages with the actual index of the web's data.

They are two different things. Googlebot is nothing more than a data harvester. The data it harvests is always at the location it retrieved it from.

Googlebot has only two choices that make any sense: ignore 302s, or catalog the data it finds by following them under the location at which it found it.

To do anything else causes massive duplicate content problems.

Which is exactly what is happening now.

japanese
msg:748872
5:15 pm on Mar 13, 2005 (gmt 0)

ciml,

Noted, good post. Excellent, descriptive and informative.
---------------------

theBear,

Likewise, Nice post.
--------------------

I withdraw my offered solution. At least I accept that a flaw exists in that method, which was quickly pointed out by learned individuals.

Back to the drawing board.

martingale
msg:748873
5:18 pm on Mar 13, 2005 (gmt 0)

I agree: the least damage is done by treating the 302 as an ordinary href link.

deanril
msg:748874
5:18 pm on Mar 13, 2005 (gmt 0)

In response to Post #366 from Dave
a possible indication of a fix on some datacenters?

deanril: I saw the same phenomenon on Liane's site a few weeks ago and posted about it at Danny's forum. I think it is part of a fix.
Liane uses absolute addressing and a 301 from non-www to www. If memory serves, your site uses absolute addressing?

It's strange that people would rather debate "Google is broke" than fix their sites, but there are many things I'll never understand.

Yes, my site is using absolute addressing, and recently, after reading your suggestion, I placed a 301 from non-www to www.

Again I did an allinurl:mysite.com and found an offending site. When I clicked the cache, my page showed, but some of the images were missing: the ones where we didn't use the full URL. So it seems to me that google is caching it from their domain. If I view the cache of any of my pages from my own domain, the pics show up.

On some data centers this offending site no longer appears in my allinurl:mysite.com results. On 64.233.167.104 it currently does; on 64.233.161.99, for example, it is not there; the site is gone.

So maybe google has a fix in the works?

theBear
msg:748875
6:00 pm on Mar 13, 2005 (gmt 0)

deanril,

I take it you got hit by the site split problem.

deanril
msg:748876
6:55 pm on Mar 13, 2005 (gmt 0)

No, I was following Dave's suggestions in post #112.

ncgimaker
msg:748877
6:58 pm on Mar 13, 2005 (gmt 0)

What I think is happening:

Firstly, I think *my* site's problem is simply caused by the change of IP address.

I think Google stores its pages indexed under a number built from the current IP address and a hash of the URL.

So it doesn't know the page www.fredbloggs.com/page.html; it knows an 8-digit number that represents the page. If you change the IP address but not the URL of the page, it still thinks it's a new page on a new site.

All the links change too: because their index numbers resolve differently, they look like new links to Google.
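If that guess were right, the effect is easy to picture. A toy illustration (pure speculation, mirroring the theory above; nothing here is known about Google's internals):

import hashlib

def page_key(ip, url):
    # Speculative index key built from the host's IP address plus the URL.
    return int(hashlib.md5((ip + "|" + url).encode()).hexdigest(), 16) % 10**8

url = "http://www.fredbloggs.com/page.html"
print(page_key("203.0.113.7", url))    # key under the old IP address
print(page_key("198.51.100.9", url))   # same URL, new IP: a "new" page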

That would explain why Google is reporting approximately twice as many pages for my URL as actually exist, and it would explain our drop in rank and the loss of our pages from the index, I think.

It also means time will fix it.

The 302 problem, I think, is different. Notice that Google groups results, so that 5 sites from the same company on the same subject will fight each other for rank. Google seems to determine that some sites are connected and so should displace each other.
I think they have a run that looks for 302s, linking patterns, manual spam reports etc., and they build a list.

I think Allegra simply introduced a new list.

So it isn't the hijacker's page that is displacing your page; it is the *other* sites the hijacker has stolen, some of which will be of higher rank. And if Google thinks you're all related, from the same company, his page gets shown and yours doesn't.

That is what I think is happening. With Allegra, it was probably just a fresh run of the program, and some new hijackers appeared. If I'm right, Google will fix that table and time will fix the problem; but of course new hijackers may appear, so new, different sites may disappear the next time they do this run.

The only fix for them is to check the table manually, and the Craig list job might be related to this.

Just my opinion.

incrediBILL
msg:748878
7:16 pm on Mar 13, 2005 (gmt 0)

"incompatible scripts that were never approved by google"

Google doesn't approve scripts.

theBear
msg:748879
7:42 pm on Mar 13, 2005 (gmt 0)

ncgimaker,

You had best look at those urls ....

2 wagers:

1: You have your site listed under www.yourdomain.com and yourdomain.com, possibly under the IP addy, and maybe under a parked domain as well.

and

2: You have been whacked by Google's duplicate content filter.

and possibly this one as well.

3: There may or may not be a 302 problem with your site; my bet is you may find that as well.

Solution for the split site problem:

Search your server software documentation for canonical hostnames:

Canonical Hostnames

Description:
The goal of this rule is to force the use of a particular hostname, in preference to other hostnames which may be used to reach the same site. For example, if you wish to force the use of www.example.com instead of example.com, you might use a variant of the following recipe.
Solution:

# For sites running on a port other than 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R=301]

# And for a site running on port 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R=301]

Then:

You might want to go through the results of a site:yourdomain.com search and look at the green highlighted URLs; they should all have yourdomain.com before the first slash.

If any don't then you have the other problem as well.

surfgatinho
msg:748880
8:22 pm on Mar 13, 2005 (gmt 0)

I haven't read every reply in this thread yet so apologies if this has already been suggested.

How about a robots.txt directive for those who want to use 302s? You would have to specifically state that you want to use 302s; otherwise the default would be to treat 302s as 301s.

Sounds a little simple, so I've probably missed something, like the fact that G$ would have to admit the problem first!

ncgimaker
msg:748881
9:20 pm on Mar 13, 2005 (gmt 0)

theBear, I don't think so. I searched under the subdomains, and the only one Google knows is www.host.com; it doesn't know about host.com or the IP address, as they don't show anything.

I also don't think there is a 302 problem. All the pages that come up when I search for my site with a site: command are mine (at least the first 1000 I can check).

If I pick obscure phrases from my pages and search, I get 1 copy and no exact duplicates. And although Google says I have twice as many pages, 2 copies of my pages' text are *not* appearing, only 1.

allinurl: shows other sites, but then it is supposed to show pages with the specified words in the URL, so it *should* show other sites if they use a server-side redirect script. It has been doing this for a year or more with this site, and this is not new.

Many URLs Google lists are shown blank, which is consistent with a new site not yet pulled. But then, I changed IP addresses, and if they index their pages by IP address, this is what I'd expect.

So I don't think there is a problem; it just needs time to adjust to the new IP address, drop the old pages from the old IP address and pull the new ones. A few update cycles will restore my site without problems, I think.

theBear
msg:748882
10:00 pm on Mar 13, 2005 (gmt 0)

incrediBILL,

Yep, google sure doesn't put its stamp of approval on any script.

zeus
msg:748883
12:52 am on Mar 14, 2005 (gmt 0)

I see some changes here on 66.102.11.99 in link counts and site counts/listings, BUT fewer indexed pages and backlinks for me, and the hijacking/redirecting/scraper sites are having a great day once again.

Another thing: I made a scraper site at the start of this month, because I know it's the future for google and it's what it wants, so I'm making myself ready if nothing changes this month. I'm not waiting another 6 months to create quality content if that's not what google wants.

Within 3 weeks the whole site was indexed fully, with descriptions and all, and it is respidered every week. That site took me 20 minutes; my other site, the one suffering from the google bug, took 3 years.

surfgatinho
msg:748884
1:17 am on Mar 14, 2005 (gmt 0)

I see some changes here 66.102.11.99

I noticed this a couple of days back on 66.102.11.104; it looks pre-Allegra to me.
Whatever it is, it suits me. My site is back where it "should" be on this data centre.

What's weird is that I don't check the data centres by IP address; I think I just went to google.co.uk (without the www) and the results were different (to www).

stargeek
msg:748885
10:41 am on Mar 14, 2005 (gmt 0)

That DC does show some reduction in ranking for redirect-based sites, but I've been seeing that index on various DCs for a while now. It's not just pre-Allegra.

The_Hitcher
msg:748886
11:46 am on Mar 14, 2005 (gmt 0)

What TOTALLY staggers me is that Google STILL hasn't got a grip on this, and it's now leaking out all over the place. Three months ago you'd have found nothing for a search on 'hijacking'; now it's all over the Net. Clearly, unless Google sorts this out, it'll be the ruin of them. As a fellow SEO said this week, "They seem to have their head in the sand". One wonders if they tweaked their algo so far that it's way too complicated to fix now. It must be; it's been known about since last Summer.

Bobby
msg:748887
12:00 pm on Mar 14, 2005 (gmt 0)

How about a robots.txt directive for if you want to use 302s

That would be great, if you could convince the hijackers to comply.

Wouldn't it be ironic if you could publish sensitive data from another web site simply by bypassing its robots.txt with a 302..?!

Reality or fiction?
