can mod rewrite fix 302 googlebug? - need opinions - Apache Web Server forum at WebmasterWorld - WebmasterWorld

Here is the problem and a possible solution

Reid

7:02 pm on Mar 17, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

When a site links to your homepage through a 302 redirect, Googlebot happily fetches your homepage and sometimes associates the two domains. (A 302 tells the client: keep coming to this URL in future, but go to that one to get the page for now.) This has become a popular technique to gain rankings in Google and MSN. There are several ways to do it, some innocent and some sinister. The homepage is most vulnerable because it is the URL that gets picked up and submitted all over. The homepage is also targeted by auto-generated scripts exploiting this bug.
After much discussion about this in the Google forum, we decided that the only way to combat this (since Google hasn't in over a year) is to feed Googlebot a 301 redirect when it comes to fetch the target.
(A 301 means moved permanently: for future reference, come to this URL for this page.)

Here is how Googlebot fetches the index.html on my site:
"GET / HTTP/1.0" 200 7987 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

So what I'm thinking is: if I rewrite "/" to /index.html with a 301, Googlebot will fetch /index.html and get a 200 OK. That would protect my homepage from 302 redirects.

I'd like to hear your thoughts on this: whether it would work, and the best way to do it.
How does Apache know that / means index.html, and could this be overridden with a mod_rewrite rule that hands Googlebot a 301 on its way to the target?
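
The idea in this post can be sketched as a small decision function. This is only an illustration of the proposed behaviour, not an Apache configuration: the name `respond`, its return shape, and matching Googlebot by a user-agent substring are all assumptions made for the example.

```python
def respond(path, user_agent):
    """Return (status, headers) for a request, modelling the proposed rule.

    Hypothetical helper: it only mirrors the idea of answering Googlebot's
    request for "/" with a 301 that points at /index.html.
    """
    # Assumption: Googlebot is recognised by a substring of its user-agent.
    if path == "/" and "Googlebot" in user_agent:
        return 301, {"Location": "/index.html"}
    # Everyone else, and Googlebot's follow-up request, gets the page itself.
    if path in ("/", "/index.html"):
        return 200, {}
    return 404, {}


googlebot = "Googlebot/2.1 (+http://www.google.com/bot.html)"
first = respond("/", googlebot)             # the redirect
second = respond("/index.html", googlebot)  # the page, after following it
```

Ordinary visitors requesting "/" are not redirected in this sketch, which is one design choice among several; redirecting every client would be the simpler variant.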

Reid

7:40 pm on Mar 17, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

One more possible solution.
Is the "-" in that log line a referer string that could be queried?
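
On the "-" question: in Apache's access log a "-" stands for a field with no value, so here it most likely means Googlebot sent no Referer header at all, not a literal "-" string. A minimal sketch of pulling the fields out of the log fragment quoted earlier, with the "-" mapped to "no referer"; the regular expression and the name `parse_request` are assumptions written for this one fragment, not a general log parser.

```python
import re

# Matches the fragment quoted in the thread: request line, status, size,
# referer, user-agent. Real log lines carry more fields in front of this.
LOG_FRAGMENT = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_request(line):
    """Return the logged fields as a dict, or None if the line does not match."""
    match = LOG_FRAGMENT.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # "-" is the log's placeholder for an absent header, so treat it as None.
    if fields["referer"] == "-":
        fields["referer"] = None
    return fields


line = '"GET / HTTP/1.0" 200 7987 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"'
hit = parse_request(line)
```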

jdMorgan

8:18 pm on Mar 17, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

In one of the earlier threads about this problem, I suggested the 301 technique, and it was tested and failed to make any difference.

The behaviour we're seeing from 302 redirects is inherent in the HTTP definition of a 302 redirect. The problem is not that Google is doing anything wrong; the problem is that a 302 means to take the content from the new URL but keep the old URL. This is what they are doing, and this is what leads to the hijacking problem.
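
The distinction can be written down as a toy model of which URL a client keeps on record after following a redirect. This is a sketch of the HTTP semantics described above, not of how any search engine actually works; `url_to_keep` and the example URLs are hypothetical.

```python
def url_to_keep(requested_url, status, location):
    """Return the URL a client should remember after following a redirect."""
    if status == 301:
        # Moved permanently: the new URL replaces the old one.
        return location
    if status == 302:
        # Temporary: content comes from `location`, but the old URL is kept.
        return requested_url
    raise ValueError("only 301 and 302 are modelled here")


# A 302 on another site pointing at your homepage: your content ends up
# filed under the other site's URL.
kept = url_to_keep("http://othersite.example/go?id=1", 302, "http://www.example.com/")
```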

To reiterate a comment I posted in a more recent thread: "Google and Yahoo are now working to perfect ways to determine when to treat a 302 like a 302-Moved Temporarily redirect, and when to treat it like an exit-tracker. It's far from a simple problem, so it's going to take some time."

There is one technique that may work temporarily, and that is to ask Google to remove your hijacked page from the index, or to rename it temporarily. But in that case the cure is almost as bad as the disease, with attendant loss of incoming links and PR.

Other than that, I'm afraid we just have to wait. :(

Jim

Reid

8:55 pm on Mar 17, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Well, there you have it.
For those who have been hit by this:
After reading about this and talking to lots of people, this is the only thing that can be done for now.

First, identify the problem. Forget allinurl:, it doesn't mean anything here.
Use site:w*w.yoursite.com
All URLs listed should be your own; any foreign URLs in this search are being associated with your domain.
Typically they will have your title and description with a URL from another domain, or it could be just a URL with no title or description. Look at the cache (likely a picture of your page).
Typically it will be a dynamic URL (with a ? in it)
or an appended URL (w*w.yoursite.othersite.com)
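
If the site: results are copied into a list by hand, the check above can be sketched as a hostname comparison. The function name `foreign_urls` and the example URLs are hypothetical, and the sketch assumes every entry is a full URL including its scheme.

```python
from urllib.parse import urlsplit


def foreign_urls(result_urls, own_hosts):
    """Return the result URLs whose hostname is not one of your own hosts."""
    own = {host.lower() for host in own_hosts}
    foreign = []
    for url in result_urls:
        # Exact hostname match, so an "appended" host such as
        # www.example.com.othersite.example is still reported as foreign.
        host = (urlsplit(url).hostname or "").lower()
        if host not in own:
            foreign.append(url)
    return foreign


results = [
    "http://www.example.com/",
    "http://directory.example.net/go.php?id=42",
    "http://www.example.com.othersite.example/",
]
suspects = foreign_urls(results, ["www.example.com", "example.com"])
```

An exact match is used on purpose: a prefix or substring test on the hostname would let the appended-URL form slip through.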

The fix is to place a Disallow in robots.txt for the affected page (your page). Then use Google's URL removal tool to remove the hijacking link (which points at your page). Then remove the Disallow, or else the page won't get crawled again. Then go to the website that owns the offending link and ask them to remove it (ask nicely; it's not always on purpose).
Just keep your "site:" results clean until Google fixes it. Nothing else we can do.
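
Since forgetting the last robots.txt step keeps the page out of the crawl, it can be worth checking the file before and after. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the /index.html path and the example.com host are assumptions for illustration, and the parser only approximates how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Step 1: temporary Disallow for the affected page (hypothetical path).
during_removal = "User-agent: Googlebot\nDisallow: /index.html\n"
# Step 3: Disallow removed again once the removal request has gone through.
after_removal = "User-agent: *\nDisallow:\n"

page = "http://www.example.com/index.html"
blocked = not googlebot_allowed(during_removal, page)
crawlable_again = googlebot_allowed(after_removal, page)
```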

Same thing with MSN (but I know nothing about removing links in MSN).

Reid

9:11 pm on Mar 17, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Thanks again Jim - I forgot to reply to you on another thread - actually I just got a reply about it. Well, it's a different matter but related, so here's the thread (with my reply).

[webmasterworld.com...]