Solutions for 302 Redirects and META Refreshes in Google
Ideas for Google or webmasters to help with the "hijacking".
ciml

Msg#: 28741 posted 4:14 pm on Mar 25, 2005 (gmt 0)

We have had plenty of discussion on the META refresh and 302 redirect issue, and this thread is intended as a repository for ideas that could help with or solve the problem.

If you have an idea that Google could use to alleviate this problem, or that a webmaster could use to fix or avoid this problem, please post it here.

Each post should contain only one idea. Each idea should have only one post. There's no need for a long code example, just the mechanism.

Any followup discussion belongs in the Google's 302 Redirect Problem [webmasterworld.com] thread, not here.

 

g1smd

Msg#: 28741 posted 4:28 pm on Mar 25, 2005 (gmt 0)

I already posted it yesterday, and could not get anyone to take any notice of it.

[webmasterworld.com...]

.

Hmm. Having read the threads, seen the SERPs, and thought about it some more.... <tidied wording>

If there are two pages, URLa and URLb, Google would cache, index, and rank both of them. If one provided a normal link to the other then it would "pass" some PR too. Both could appear in SERPs. If their content was the same, then the one with the higher PR would win out; if one used stolen content, there are laws that can deal with that.

If URLa provided a 301 redirect to URLb then I would assume that just the URL for URLa would be stored internally in Google (there is no content there to index), and would be internally marked as being a redirect. I also assume that URLa would be dropped from the SERPs for the period that it returned that status, and that URLa would be respidered occasionally to see what its status was. The content residing at URLb would be spidered and indexed and would appear in the SERPs with the URL for URLb against it. If at any time URLa went 404 then it would be dropped from the index, likewise URLb.

If URLa did a 302 redirect to URLb, then this is a temporary redirect. URLa is saying that the content temporarily resides at URLb. There is no reason to include URLa in the search results though. Google could quite easily include URLb in the results with its associated content being cached and indexed. However, Google should also be keeping an internal note that it had been redirected there from URLa, and if the status of URLb ever changed from 200 to 404 then Google would know to go back to URLa and ask it for the new location of the information. That is, Google "remembers" URLa as being the starting point for the 302 redirect but does NOT show URLa in the SERPs as there never was any content AT that location.

Does this make sense? What flaws would there be in that?
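A rough, self-contained sketch of those rules in plain PHP (the $index array layout and the function name are invented purely for illustration; this is obviously not how Google actually stores anything):

<?php
// $index maps a URL to what the engine remembers about it.
$index = array();

function recordFetch(&$index, $url, $status, $location = null) {
    if ($status == 301) {
        // Permanent: remember only that $url redirects; the content is
        // indexed under $location, and $url never appears in the SERPs.
        $index[$url] = array('type' => 'redirect', 'target' => $location);
    } elseif ($status == 302) {
        // Temporary: index the content under $location (the URL that
        // actually serves it), but remember $url as the starting point
        // in case $location later returns a 404.
        if (!isset($index[$location])) {
            $index[$location] = array();
        }
        $index[$location]['redirect_source'] = $url;
        unset($index[$url]);                 // never list $url itself
    } elseif ($status == 200) {
        if (!isset($index[$url])) {
            $index[$url] = array();
        }
        $index[$url]['type'] = 'page';       // a normal page, listed under its own URL
    } elseif ($status == 404) {
        unset($index[$url]);                 // drop it, whichever kind it was
    }
}

// URLa answers with a 302 pointing at URLb; URLb serves the content itself.
recordFetch($index, 'http://urla.example.com/redir', 302, 'http://urlb.example.com/page.html');
recordFetch($index, 'http://urlb.example.com/page.html', 200);
print_r($index);   // only URLb is listed, with URLa noted as the 302 source
?>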

idoc

Msg#: 28741 posted 5:10 pm on Mar 25, 2005 (gmt 0)

Trawler, Interesting... this has been worse since Florida.

I posted this in the other thread:
IMHO having at least *some* absolute urls in your site back to the site index helps to immunize against this. I can't explain why convincingly, as I am not privy to the inner workings of the bot. The only thing I know is that *if* this page is spidered and is attributed to some site "b", it will contain an absolute link to a site with the original and duplicate content. I think that is poison to that url for site "b" with the bot.

Expanding on that, how about having an absolute link on each page served, e.g. http://site.com/page.html, that links to itself. This way the hijacked page contains a regular non-302 link back to the original site "a" page. I need to think this through, but this could be served just to googlebot, though I don't believe it would really hurt anything with the other bots. A simple include file could then insert the canonical page url into each page served (a rough sketch of such an include follows below).
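A minimal sketch of such an include in PHP (the canonical host is a placeholder you would hard-code for your own site; the file name is invented):

<?php
// selflink.inc.php -- print an absolute link from each page back to itself,
// so any copy of the page served under someone else's URL still carries a
// plain link to the original location.
$canonical_host = 'www.example.com';
$self = 'http://' . $canonical_host . $_SERVER['REQUEST_URI'];
echo '<p><a href="' . htmlspecialchars($self) . '">' . htmlspecialchars($self) . "</a></p>\n";
?>

Each page would then pull it in with something like <?php include 'selflink.inc.php'; ?>, or via an SSI equivalent.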

ciml

Msg#: 28741 posted 5:59 pm on Mar 25, 2005 (gmt 0)

Idea for Google: Treat links via 302s as straight links to the destination.

From msg 464 [webmasterworld.com]:

According to HTTP/1.1 [faqs.org] (10.3.3):
Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.

So, the problem we see with Google and 302s is quite consistent with the standard.

Google is not "unconditionally compliant" with HTTP. So, when Googlebot comes across a link from URL-A to URL-B that then 302 redirects to URL-C, it would seem sensible to treat it as a link from URL-A to URL-C.

That way, a mischievous webmaster would be able to pass links to a destination (which can be done with straight links anyway), but he would not be able to usurp someone else's listings.

Also, this should stop accidental "hijackings", where some software on the web site uses a 302 instead of the more usually sensible 301.

This might help certain types of webmasters to collect links to their pages more easily, but such a webmaster would probably know to use a 301. This would also allow many PPC listings to count as straight links, but many count now anyway.

For some sites this would show the content URL in Google's results, rather than the vanity URL that redirects (e.g. www.example.com/default.aspx rather than www.example.com). Yahoo! have opted for a fairly contrived set of assumptions for identifying what to do with 302s, but I believe that not listing vanity URLs is a price worth paying for fixing the serious problem of good and important content not being found.

One important risk of not adhering to the above recommendation in HTTP's specification is that if a webmaster purposely uses a 302 instead of a 301 redirect, then the search engine could link to an out of date URL after the redirect is removed or changed. However such a use of a 302 redirect is a rare occurrence, and Google has such a fresh index for the vast majority of important pages that this risk is close to negligible.
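A toy sketch of that rule in PHP (the $redirects map and link-graph array are invented for illustration; a real crawler would discover the 302 by fetching URL-B itself):

<?php
// Map of URLs known to answer with a 302, to their Location targets.
$redirects = array(
    'http://tracker.example.net/out?id=42' => 'http://www.example.com/page.html',
);

function recordLink($from, $to, $redirects, &$linkGraph) {
    if (isset($redirects[$to])) {
        $to = $redirects[$to];   // treat the 302 as a straight link to URL-C
    }
    $linkGraph[] = array('from' => $from, 'to' => $to);
}

$linkGraph = array();
recordLink('http://directory.example.org/links.html',
           'http://tracker.example.net/out?id=42',
           $redirects, $linkGraph);
print_r($linkGraph);   // the link is credited to www.example.com/page.html
?>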

markus007

Msg#: 28741 posted 6:21 pm on Mar 25, 2005 (gmt 0)

Why can't Google just support a new tag in the robots.txt file?

Something like "NO302", which would basically mean: ignore and discard all 302 redirects to this domain.

Problem solved?
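Purely hypothetical syntax -- "NO302" is not a real robots.txt directive and nothing currently supports it -- but the idea might look something like this:

User-agent: *
# Hypothetical: discard any 302 redirects pointing at this domain
NO302: /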

g1smd

Msg#: 28741 posted 6:27 pm on Mar 25, 2005 (gmt 0)

That would need to be retrofitted to billions of pages -- which just isn't going to happen.

You could invert the logic so that you add a tag only if you ARE going to allow the 302-page-pointing-at-you to be indexed as the canonical URL for the content.

However, that raises a new problem. If you do allow the 302 page to be indexed instead, then what is there to stop a spammer putting up their own additional 302 redirect pointing at you? Faced with two 302 redirects pointing to your content, which one does Google choose now? It can't possibly know which of the 302 pages you are authorising and which belongs to the hijacker.

Lorel

Msg#: 28741 posted 6:36 pm on Mar 25, 2005 (gmt 0)

This fix involves those who are finding a 302 redirect in the site:domain.com results and ALSO have a shared IP address. Contact the hosting company if you're not sure. (A lot of hosts started offering this cheaper service a few years ago so they could lower their rates, without explaining the consequences: if one site gets banned, for whatever reason, they all get banned.)

Also, a bug in Google's algorithm finds a 302 redirect on one of the sites and attributes this redirect to other site(s) on the same IP address.

Check the IP address of both sites in a server header checker tool. If they match, you have a shared IP address.

Solution: upgrade to a dedicated IP address*, which is usually only $1.00 more per month.

*(Note: this is not a dedicated server, which is much more expensive.)
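A quick way to make that comparison, along the lines Lorel describes (gethostbyname() is standard PHP; the hostnames are placeholders):

<?php
$a = gethostbyname('www.example.com');       // your site
$b = gethostbyname('www.other-site.com');    // the other site on the suspected shared host
echo "$a vs $b - " . (($a == $b) ? "same IP (shared)" : "different IPs") . "\n";
?>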

jimbeetle

Msg#: 28741 posted 6:45 pm on Mar 25, 2005 (gmt 0)

Simply do it the way, or something similar to the way, Yahoo does. Tim refreshed our memories in another forum yesterday on the presentation he gave at Orlando Pubcon.

[ysearchblog.com...]

Go down to slides 14 and 15, Yahoo Redirect Handling. The important points are the general rules:

For 301s or 302s between domains, Y keeps the target
A.com >> 301 or 302 >> B.com

For 301s or 302s within a domain:

Y keeps the source if it is the root:
A.com >> 301 >> A.com/asp?y=1

And keeps the target if it is between deep pages:
A.com/page1 >> 301 >> A.com/asp?y=1

For 302s within a domain, Y keeps the source:
A.com/page1 >> 302 >> A.com/asp?y=1

I remember we were all quite happy with it then as a way to solve Y's redirect problems. It seemed to work. It's simple. It requires no algorithmic magic, no playing around with PR, popularity or any of that nonsense, just a couple of simple yes and no questions.

Here's the book you wanted, The Big Sleep by John Grisham. Grisham? I want the one by Hammett! Well, Hammett's not as popular today so I taped over his name and put Grisham's name on the book.
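A compact sketch of those Yahoo rules in PHP (illustrative only; $source redirects to $target with status $status, and the function returns which URL would be kept):

<?php
function urlToKeep($source, $target, $status) {
    $s = parse_url($source);
    $t = parse_url($target);
    if (strtolower($s['host']) !== strtolower($t['host'])) {
        return $target;                            // across domains: keep the target
    }
    $sourceIsRoot = (!isset($s['path']) || $s['path'] === '' || $s['path'] === '/');
    if ($sourceIsRoot) {
        return $source;                            // redirect from the root: keep the source
    }
    return ($status == 301) ? $target : $source;   // deep pages: 301 keeps the target, 302 the source
}

echo urlToKeep('http://a.example.com/', 'http://a.example.com/asp?y=1', 301), "\n";      // keeps the root
echo urlToKeep('http://a.example.com/page1', 'http://a.example.com/asp?y=1', 302), "\n"; // keeps /page1
?>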

kire1971

Msg#: 28741 posted 7:08 pm on Mar 25, 2005 (gmt 0)

Why can't google just support a new tag in the robots.txt file. Something like, "NO302" which would basically mean ignore and discard all 302 redirects to this domain.

You could invert the logic so that you add a tag only if you ARE going to allow the 302-page-pointing-at-you to be indexed as the canonical URL for the content.

There are many reasons for 302 redirects: from within your own website, from your own other websites, and from other people redirecting to you. They all need to function and should at least count as a link.

If the solution is to be a robots.txt or meta tag, then part of it would need to define the allowed page that does the redirecting. Basically, if no tag is present, you're stating "This is the permanent location of this page. Index this page and assign all content and PR to this page. Any 302 redirects should be counted as links." OR if you have a valid 302 redirect going, add the tag which states to the robot "This is a temporary location for this page. Assign all content and PR to the page located at http:..." This way, the owner of the actual content decides what the correct page is. Essentially it would be a check and balance for a redirect. If a redirect is valid, the page points back to it.

<meta name="valid302" content="http://www.onlyvalidredirectingpage.com"

crobb305

Msg#: 28741 posted 7:43 pm on Mar 25, 2005 (gmt 0)

Removing a 302 redirect from the Google index:

You can do this by using their URL removal tool. Since the url redirects to YOUR page, you have control over its removal by using the meta robots tag on the destination page. When you authorize removal using the removal tool, the program will instantly check the meta robots tag on the destination page to make sure it is set to "noindex". This is to protect you from unauthorized removals of your page(s).

Removing a url that redirects to any page on your site will not harm the intended url's indexing/listing for that page (unless you forget to immediately change the meta robots back to "index"... I will reiterate this point a few more times below).

To remove a 302:

1) Make a list of 302s that are indexed in Google that you want removed. The inurl:mysite.com search is very helpful for finding these.

Then, set the meta robots tag to "noindex" on the page that gets the redirect and submit the redirect url. I would work on only one page at a time (i.e., do not set more than one of your pages to "noindex"; the longer you leave a page set to "noindex", the greater the risk of the intended url being dropped the next time it is spidered).

Once you get the offending url submitted, instantly return the meta tag to "index". (Within a few seconds of clicking "submit" on the removal tool, you will get a "Success!" notification; it is then safe to change your meta robots back. Again, if you forget to change the tag back, you obviously risk having the intended url removed the next time Googlebot - or any other bot - checks your site.)
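For clarity, these are the two states of the tag being toggled (generic markup; not necessarily crobb305's exact tag):

<!-- while submitting the hijacker's url to the removal tool -->
<meta name="robots" content="noindex">

<!-- immediately after the "Success!" notification, switch it back -->
<meta name="robots" content="index,follow">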

I have had 30+ redirects removed using this method.
The only thing I am not sure about is whether Google still knows about the url(s) that are removed and uses them in ranking calculations while hiding them from view. Does Google only remove the url from the visible index, or does it retain the url/information somewhere else for its own purposes?

Chris

zeus

Msg#: 28741 posted 7:47 pm on Mar 25, 2005 (gmt 0)

As some have said before, it is important to understand that most of the problems some sites have are with 302 redirects, and NOT from bad webmasters who try to hijack sites with trackers....

Now, the solution is still in Google's hands; we can not expect every webmaster to add a possible solution to their sites. I can tell you what I have done since the problem started.

I knew it was not my fault that my site went down in Google's SERPs and was losing pages, so I was VERY careful with making changes to my site.

First, I have hardly made any real changes; only on the frontpage have I changed over 60% of the text, because of still-active caches of 302 sites that are still listed.

Then I tried to remove some hijackers/302s with the Google removal tool: I added a noindex meta tag on my frontpage, typed the bad url into the removal tool, waited a few sec., then removed my meta tag again.

About contacting the bad sites: mostly you will not get a response, so start with the removal tool.

Now, the problem with googlebot, which doesn't like my site anymore: I tried to offer it some cookies :), but nothing. Then I downloaded the Google toolbar, went through my main pages with the toolbar active, and after that I removed the toolbar. I also uploaded the whole site again to the server so I got a new date.

Now, I'm sorry to say I'm still in trouble; I got my PR back last month, but still nothing.

zeus

Idaho

Msg#: 28741 posted 8:02 pm on Mar 25, 2005 (gmt 0)

Google Removal Tool

A search for "allinurl:mysite.com" often didn't show my index page at all but instead showed somebody else's domain (located in Turkey). When I clicked on this link, my index page came up. When I clicked on the cached version of the site, it showed a very old cache of my page. This same site also showed up after all my results when doing a "site:www.mysite.com"

Using a header checker tool on the hijacker's URL (found in the "allinurl" search, I was able to see it was using a 302 link to my site.

Once sufficiently convinced I was the victim of a hijacking I removed the page from Google's index like this:

1. Go to Google's removal tool page:
[google.com...]

2. Click on the "urgent" link.

3. Sign up for an account with Google and reply to the email they will send you.

4. Place this meta tag on your page that was hijacked:
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
Upload it to your site (and pray that Googlebot doesn't stumble on to it in the next 3 minutes.)

5. Using the instructions to remove a single page from the Google index, add the hijacker's URL that is pointing to your site. (copy and paste from the result found on "allinurl" search)

(If you get a message back saying something about the invalid character " ", it is telling you there is a space in the URL you pasted. Find it and remove it and then try again.)

6. You will get a message back saying that the request will be taken care of within 24 hours and it will show your request as "pending." This means the spider has already checked the page and found the tag. (If you don't add the tag it will immediately tell you that it can't find the tag.)

7. Don't wait for 24 hours! Immediately remove the "noindex" tag from your page. Hopefully this whole process will only take you about 2 minutes and your tag won't be up there if Google comes looking around your site.

This works because of the same Google flaw that creates the whole problem. Google thinks there are two pages instead of one: one on your site, and one that it thinks is on the hijacker's site. All you are doing is sending Googlebot to the hijacker's site to look for the "noindex" meta. When it finds it through the hijacker's URL, it removes it from the hijacker's site. If it happens to find it through your URL, it will remove it from your site too.

In my experience, the offending page will probably disappear from the "allinurl" search within 4 to 8 hours. You can then log back into your account with Google and check the status. Within 24 hours it will say "removed."

I did this to several of my index pages that had gone missing. One of them, which had been gone since January/December, came right back to its positions in the SERPs within about 7 days. The other one (which seemed to be sandboxed before the hijacking) is still nowhere to be found.

I have tried to contact the webmaster to remove the link but he doesn't respond. If the page ever gets reindexed, I guess I'll have the same problem again. Judging from the age of the cached version of his page before I removed it, Google doesn't refresh his site very often. Hopefully Google will fix this problem before the next time the bot visits his pages.

zeus

Msg#: 28741 posted 8:09 pm on Mar 25, 2005 (gmt 0)

Yes, just the fact that we are able to remove another site must be evidence enough for Google that they have a 302 bug on their hands.

JanFer

Msg#: 28741 posted 8:09 pm on Mar 25, 2005 (gmt 0)

For weeks nothing showed up in a site:www.mysite.com search in Google, except for sites that were not mine.

I emailed Google about it but never received a reply.

I lost all SERPs; the problem was so severe that I stopped getting any traffic at all from Google. Even though the site is #1 on Yahoo for great phrases, such as "XXXXXX", I had been getting more traffic from Google than from Yahoo and MSN combined for over a year, so it was a big loss to me. MSN SERPs were schizo, taking me from the first to the third page and back again - another loss of traffic.

I made a desperate attempt to correct this and shake off the hijackers. I had the whole site removed from Google's cache and had it 'de-indexed' using the Google tool for that purpose.

I still saw the hijackers on a site:www.mysite.com search.

I repeated the removal three times, and finally, today, when I do a site:www.mysite.com I see my site's urls. I about fell off my chair when I saw that after all this time.

Using the removal tool was a desperate measure, and I'm not entirely sure that having the site removed was the reason it is now back on. I didn't just remove the bad urls, I removed the site. I'm not all that educated in this stuff, you see. (Not yet, but thanks to Webmasterworld, I am getting educated.)

I also rewrote the index page, as I found another site had copied the first two paragraphs word for word.

Warning: The removal tool is just that - removal. Removing a page, a site, or a directory REMOVES it from google. This includes all backlinks and PR.

But I had nothing to lose.

Can't wait for the update to be complete so I can see where I stand.

crobb305

Msg#: 28741 posted 8:10 pm on Mar 25, 2005 (gmt 0)

Thanks Idaho. Your post was a bit more descriptive. I didn't know there was an "urgent" option when setting up your removal account.

Yes, just the fact that we are able to remove another site must be evidence enough for Google that they have a 302 bug on their hands.

Well, consider the fact that in November over 400,000 redirect urls using the tracker2.php script were indexed in Google; today that number is down to about 10,000. That tells me they are aware of the problem and are working to resolve it someway, somehow, maybe.

Chris

claus

Msg#: 28741 posted 8:13 pm on Mar 25, 2005 (gmt 0)

Just a quick post clearing up the misunderstandings with the capital lettered "should" word. For RFC's, words such as "must", "shall", "may", and so on have very well defined meanings - there's actually a whole RFC (RFC 2119 [faqs.org]) that only deals with how these words are to be interpreted.

About the word "should" it states:
SHOULD
This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

So, for Google, it would actually be perfectly in line with RFC2616 (linked to by ciml above) if they should decide not to follow it regarding 302's.

theBear

Msg#: 28741 posted 8:44 pm on Mar 25, 2005 (gmt 0)

In keeping with ciml's request, here is one that at first may not seem relevant:

Solution for the split site problem:

Search your server software documentation for canonical hostnames:

Canonical Hostnames

Description:
The goal of this rule is to force the use of a particular hostname, in preference to other hostnames which may be used to reach the same site. For example, if you wish to force the use of www.example.com instead of example.com, you might use a variant of the following recipe.
Solution:

# For sites running on a port other than 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R=301]

# And for a site running on port 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R=301]

Then:

You might want to go through the results of a site:yourdomain.com search and look at the green highlighted urls; they should all have yourdomain.com before the first slash.

If any don't then you have the other problem as well.

theBear

Msg#: 28741 posted 9:16 pm on Mar 25, 2005 (gmt 0)

Page hardening can be an effective block to the duplicate content problem that the 302 injection causes.

What follows is a PHP fragment that retrieves and caches the news feed of a website devoted to Java. If your site is about working with or using Java, a collection of newsfeeds from other sites can be used on your pages to provide changing content.

Similar scripts can be written in other languages and invoked by various include facilities to randomly display related content on your page.

Please note that this example is in PHP and ran on an Ensim server.

As with all scripts YMMV.

This method of hardening probably stopped a large site from totally tanking.

<?
// Configuration: where to cache the feed, how long to keep it, and which
// RSS backend to pull headlines from.
$link_prefix="";
$link_postfix="<br>";
$cache_file="/home/virtual/example.com/var/www/html/include/javaboutiqe.article.cache";
$cache_time=3600;
$max_items=12;
$target="_top";
$backend="http://javaboutique.internet.com/articles.rdf";
$items=0;
$time=split(" ", microtime());
srand((double)microtime()*1000000);
// Randomise the cache lifetime a little so every page doesn't refresh at once.
$cache_time_rnd=300 - rand(0, 600);

// Refresh the cache if it is missing, stale, or empty.
if ( (!(file_exists($cache_file))) || ((filectime($cache_file) + $cache_time - $time[1]) + $cache_time_rnd < 0) || (!(filesize($cache_file))) ) {

    $fpread = fopen($backend, 'r');
    if(!$fpread) {
        //echo "$errstr ($errno)<br>\n";
        //exit;
    } else {

        $fpwrite = fopen($cache_file, 'w');
        if(!$fpwrite) {
            //echo "$errstr ($errno)<br>\n";
            //exit;
        } else {
            fputs($fpwrite, "<tr><td bgcolor=\"#000080\"><font face=\"sans-serif,arial,helvetica\" size=2 color=\"#FFFFFF\"><b>Java articles</b></font></td></tr><tr><td class=dept></td></tr>");
            // Walk the feed line by line and pull out each <item>'s title and link.
            while(! feof($fpread) ) {

                $buffer = ltrim(Chop(fgets($fpread, 10000)));

                if (($buffer == "<item>") && ($items < $max_items)) {
                    $title = ltrim(Chop(fgets($fpread, 10000)));
                    $link = ltrim(Chop(fgets($fpread, 10000)));

                    $title = ereg_replace( "<title>", "", $title );
                    $title = ereg_replace( "</title>", "", $title );
                    $link = ereg_replace( "<link>", "", $link );
                    $link = ereg_replace( "</link>", "", $link );

                    fputs($fpwrite, "<tr><td class=dept>$link_prefix<a href=\"$link\"><b>$title</b></a>$link_postfix</td></tr>");

                    $items++;
                }
            }
            fclose($fpwrite);
        }
        fclose($fpread);
    }
}
// Serve whatever is in the cache (freshly written or not).
if (file_exists($cache_file)) {
    include($cache_file);
}
?>

You will need to have a number of these and a means of randomly picking a cache to display, or of randomly rearranging the display order. The more related content the better.

This may also cause a bit of variation in serp placement. But it is better than getting your pages completely blown out of the water.

Look into SSI and executing scripts within html.

Google can be your friend in this matter.

DaveAtIFG

Msg#: 28741 posted 12:19 am on Mar 26, 2005 (gmt 0)

There is nothing wrong with Google! They need not fix anything!
Now that I have your attention... :)

I've posted many of the following comments already in other threads, but I've added a little detail here to support some of my opinions. Bear with me.

PageJacking and the "302 redirect bug" both result from ambiguity about what pages are associated with what domain. It is a webmaster's responsibility to remove that ambiguity, not Google's. Google's job is to run the Google web site in the way that they feel best serves their users. If a webmaster is concerned with Google traffic or rankings, it is the webmaster's job to make their site as Google friendly as possible.

CONSOLIDATE YOUR SUBDOMAINS
dave.example.com, ciml.example.com, www.example.com, and example.com are all unique and legitimate domains. Unless webmasters take appropriate action to control ALL of them (such as redirecting them all to www.example.com), they have not done their job thoroughly. Their PR may be split among the various domains but, more importantly, they have left the door open for PageJackers and "302 redirect problems."

Server admins often configure servers to serve pages from either www.example.com or example.com by default, and novice webmasters just don't realize the implications. (It's not Google's fault, it's those damn server admins!) ;) This is one of the biggest sources of ambiguity to SEs. Webmasters concerned with SE rankings must manage ALL subdomain variations associated with their domain name. Use a 301 redirect to point them all to one single version!

USE ABSOLUTE ADDRESSING
The other major source of ambiguity is using relative addressing on a web site, i.e. /somepage.html, instead of absolute addressing, i.e. http://example.com/somepage.html. claus and others have described Google's "disjointed spidering" and discussed at length, in other threads, how the canonical domain (the domain rightfully associated with a page) can be "misplaced."

In a recent SlashDot post GoogleGuy sez:
Here's the skinny on "302 hijacking" from my point of view, and why you pretty much only hear about it on search engine optimizer sites and webmaster forums. When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.

If your domain name is embedded in each page (as it must be when using absolute addressing) your page is very unlikely to be associated with the wrong domain. Additionally, it is equally unlikely to face GoogleGuy's "incorporating PageRank into the decision for the canonical url" test.

FIXING A PAGEJACK
Over the years, I've seen numerous unsuccessful PageJack attempts to numerous sites that use the above techniques. Also, benign 302 redirects have not yet caused a problem. Will adopting these techniques fix a PageJack? After reflecting on this question for a few weeks, I'm convinced it will, eventually.

As your site loses prominence/rankings, spidering will become less frequent, so it will take Google a while to find your changes, should you choose to make them. Major Google updates occur every three months post-Florida and it may require several update cycles for Google to get it all sorted out. That may mean six months, nine months, or longer to sort it out. And using Google's remove URL tool where appropriate certainly won't hurt while you wait.

In this thread [webmasterworld.com] from last fall, I PageJacked a few pages of my own. Both domains used absolute addressing and the time between being spidered and the PageJacker page being dropped was surprisingly short! The same was true when I removed the PageJacks.

WHY HAS THIS BECOME SO MUCH MORE COMMON RECENTLY?
Hell, I don't know! I'm not convinced it has. This discussion [webmasterworld.com] is dated January 2002.

Google made big changes with Florida; perhaps some of those changes were to their spider squadron. Subsequently, they've made numerous changes to the algo, how PR weighs into the algo, and which links count toward PR. Some of those types of changes could certainly impact who has enough PR to PageJack whom, I suppose.

THE REAL WORLD
Webmasters MAY be able to make enough noise to get Google to handle 302s differently eventually, but it's not Google's responsibility to do so. If a webmaster wants to rank on Google, it's their job to give Google a web site that Google wants, unambiguously.

That's how the relationship between SEs and webmasters has always worked.

g1smd

Msg#: 28741 posted 12:54 am on Mar 26, 2005 (gmt 0)

I don't see any connection between assimilating all your subdomains into one version and the ability of someone to place a 302 redirect to one of them.

They would all return a status of 200, as duplicates. If you redirect all but one to a main domain that one will still serve a status of 200. So I assume that it would still be vulnerable to a redirect pointing at it, no?

Collecting subdomains together is good for one reason: eliminating what appears to Google to be duplicate content, such that in reality you are no longer "competing against yourself". Other than rationalising all your domains into one and boosting your own PR, I don't see what effect this would have against a site that 302 redirects to you when Google now thinks that your content belongs to them.


jimbeetle

Msg#: 28741 posted 12:55 am on Mar 26, 2005 (gmt 0)

it's their job to give Google a web site that Google wants, unambiguously.

However much I would like to throw in a few ands, ifs, buts or maybes, can't much argue with that.

claus

Msg#: 28741 posted 1:40 am on Mar 26, 2005 (gmt 0)

I will not be able to put only one idea into this post, as it will be more of a listing of the various suggestions I have seen so far. Some of these have already been mentioned.

Suggestions for hardening your site or preventing hijacks

  • Put some always-updated content on your site (counter, timestamp, random quote, newsfeed...)
  • Use the Base HREF [w3.org] tag (see the snippet after this list)
  • Rewrite the www-subdomain to non-www or the other way round - using a 301
  • Rewrite all "vanity domains" to the main domain - using a 301
  • "Hack your site": Make sure that it does not return content on URL's that should not return content (ie. always 404 on URL's that you would not use yourself). If it does, then fix it.
  • Use absolute URL's for your internal linking
  • Make all page requests that do not have a referrer 301 redirect to the same URL once per request (not easy)
  • Include an absolute link on every page to the same page (self-referencing)
  • Include an absolute link on every page to the main domain
  • Make sure you've got lots of direct links to your pages (ie. not redirect links)
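For the Base HREF suggestion above, a minimal example (the URL is a placeholder for your own canonical address):

<head>
  <base href="http://www.example.com/">
  <title>Example page</title>
</head>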

Suggestions for people using 302's to avoid hijacking others

  • Change all 302 redirects to 301 redirects
  • Change all redirects to absolute links
  • Put your redirect script URL(s) in your robots.txt file, AND
  • Search Google for your script URL's and if you find them, ask to get them removed.

Suggestions for Google, MSN, and other SE's

  • Treat a 302 between two pages on the same domain (same host) like a 302 (keep the "source" URL)
  • Treat a 302 between two pages on different domains as a straight link
  • Treat a 302 between two pages on different domains as a 301 (keep the "target" URL)
  • If it is a redirect script (origin is a link on a page - click tracking), treat it as a link. If it is a server redirect (origin is a server header - load balancing, caching, etc.) treat it as a 302.
  • Treat a meta refresh as a comment tag with a link in it.

--
All of the above are suggestions, nothing more than that. You might want to try some of them or all of them, or you might find them all totally stupid.


theBear

Msg#: 28741 posted 1:57 am on Mar 26, 2005 (gmt 0)

Some more suggestions for people using 302's to avoid hijacking others.

* Place the redirector script call inside another page whose robots meta tag says noindex, noarchive, nofollow (i.e. <meta name="robots" content="noindex,noarchive,nofollow">)

DaveAtIFG

Msg#: 28741 posted 4:24 am on Mar 26, 2005 (gmt 0)

They would all return a status of 200, as duplicates. If you redirect all but one to a main domain that one will still serve a status of 200. So I assume that it would still be vulnerable to a redirect pointing at it, no?
If that one, single, consolidated URL/page also contains your preferred URL (absolute addressing), my experience is that it is impervious to PageJacks or redirects pointed at it.
kilonox

Msg#: 28741 posted 6:41 am on Mar 26, 2005 (gmt 0)

Adding dynamic addressing

Basically, add any changing element to the HTTP GET request of all pages on your site. Keep in mind that you can add elements such as your.domain.foo?var=dynamic1&var2=dynamic2 etc., and it doesn't internally affect your site (unless you iterate the variables sent via GET in an array and expect the array to always be a finite size, etc.). Just remember to put up 301 redirects on your old pages if you try this. A rough sketch follows the list below.

Some examples of dynamic elements you could add into a get variable:

  • session id (if you start a session for every user)
  • date (suggested in an earlier thread)
  • random numbers
  • md5 hash or hex encode the remote ip address
  • increment a counter
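A rough PHP sketch of that idea (the parameter name and the choice of a date stamp are arbitrary examples):

<?php
// Append a changing query-string element to an internal link.
function dynamicUrl($path) {
    $stamp = date('Ymd');   // or a session id, a counter, a random number, an md5 of the IP...
    $sep = (strpos($path, '?') === false) ? '?' : '&';
    return $path . $sep . 'v=' . $stamp;
}
echo '<a href="' . dynamicUrl('/page.html') . '">Page</a>';
// As noted above, 301 redirect the old URLs to the new ones if you try this.
?>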
kilonox

Msg#: 28741 posted 7:04 am on Mar 26, 2005 (gmt 0)

Reporting a hijacker via spam reports, abuse complaints

Simply go wild reporting a hijacker site as a spammy site.

Here are some methods for reporting them:

  • Google spam report - [google.com...]
  • Contact Google support - [google.com...] use the word "canonicalpage" in the subject.
  • Yahoo spam report - [add.yahoo.com...]
  • MSN abuse reporting - [support.msn.com...]
  • Report them to their Internet Provider and their Internet Providers upstream.

Normally single reports don't get a lot of mileage; aim to send a report every-other-day-ish when really trying to get a message across (this comes from experience on the ISP side). Also, getting a friend or two to help with posting spam reports will get you great mileage as well.

geekay

Msg#: 28741 posted 9:14 am on Mar 26, 2005 (gmt 0)

>> it's their job to give Google a web site that Google wants, unambiguously.

How nice! I for one know that now. But I believe 99% of the world's webmasters will never get that message, so they will continue to design their sites with just the VISITOR in mind. However, the internet is not only about doing business; non-commercial sites are of importance too.

"It's totally their own fault if they fall out of the SERPs", I hear you saying. So Google could end up showing mainly professionally SE-optimised money-making sites on the first half dozen result pages, as well as out-of-control 302 pirate copies of web pages. But Google users may become more and more dissatisfied and switch to Yahoo or MSN. An arrogant Google would no doubt eventually get what it deserves.

GuinnessGuy

Msg#: 28741 posted 11:19 am on Mar 26, 2005 (gmt 0)

Hi Claus,

I've taken great notice of the need to do a 301 redirect from non-www to www, and it was implemented on one of my domains yesterday. However, I have since checked out some MAJOR sites and they ALL seem to be doing 302s from non-www to www -- not 301s. These include CN++, M$, Or*cle, S*n, Newswe*k, IB*, and Genital Motors. You get the picture. (I assume I can't use real company names, but I hope I've been clear nonetheless.)

Given this, and the knowledge that they have a far higher IT budget than anyone posting here, do they know something that we don't? It would seem natural to me to do the 301 instead of the 302, but I'm not technically endowed. Why do they do this instead of the highly recommended 301 that I read about here?

Any comments?

GuinnessGuy

zeus

Msg#: 28741 posted 12:14 pm on Mar 26, 2005 (gmt 0)

DaveAtIFG - Google does have a problem, because googlebot creates pages from spidering 302 links, and those show up in a site:yourdomain.com search, which is also not normal. And not everyone uses subdomains; some have 301 redirected mydomain to www.mydomain and use full url links within the site, but are still in trouble because of the 302-created pages.

Your suggestions are good, but they don't protect every site.

zeus

Msg#: 28741 posted 6:12 pm on Mar 26, 2005 (gmt 0)

GuinnessGuy - just because they have more cash to work with doesn't make their method the right one. I also use a 301 because, as you said, it sounds best, and it is a permanent redirect, so it's the right one.
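For reference, a common .htaccess recipe for exactly this non-www to www 301 (mod_rewrite must be enabled; example.com is a placeholder for your own domain):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]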
