Here is how Googlebot fetches index.html on my site:
"GET / HTTP/1.0" 200 7987 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
So what I'm thinking is: if I rewrite "/" to /index.html with a 301, then Googlebot will fetch /index.html and get a 200 OK. That would protect my homepage from 302 redirects.
I'd like to hear your thoughts on this: whether it would work, and the best way to do it.
How does Apache know that / means index.html, and could this be overridden with mod_rewrite to send Googlebot a 301 on its way to the target?
The behaviour we're seeing from 302 redirects is inherent in the HTTP definition of a 302 redirect. The problem is not that Google is doing anything wrong; the problem is that a 302 means to take the content from the new URL but keep the old URL. This is what they are doing, and this is what leads to the hijacking problem.
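To make that distinction concrete, here is a toy sketch of the rule as described above. It is not how any search engine is actually implemented; the function name url_to_index and the example URLs are made up for illustration.

```python
def url_to_index(status, requested_url, location=None):
    """Return (url_kept_in_index, url_content_is_taken_from).

    Hypothetical model of the behaviour described in this thread,
    not a real crawler's logic.
    """
    if status == 301 and location:
        # Permanent move: the new URL replaces the old one.
        return (location, location)
    if status == 302 and location:
        # Temporary move: keep the old URL, take content from the new one.
        return (requested_url, location)
    # Anything else (e.g. a plain 200): the URL stands for itself.
    return (requested_url, requested_url)


# A 302 from another site to your page keeps THEIR URL with YOUR content.
kept, source = url_to_index(
    302,
    "http://other.example/redirect?id=1",
    "http://www.example.com/",
)
print(kept, "<-", source)
```

Under this model the 302 case is exactly the hijack: the redirecting URL stays listed, but it shows the target page's content.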
To reiterate a comment I posted in a more recent thread: "Google and Yahoo are now working to perfect ways to determine when to treat a 302 like a 302-Moved Temporarily redirect, and when to treat it like an exit-tracker. It's far from a simple problem, so it's going to take some time."
There is one technique that may work temporarily, and that is to ask Google to remove your hijacked page from the index or rename it temporarily. But in that case, the cure is almost as bad as the disease -- with attendant loss of incoming links and PR.
Other than that, I'm afraid we just have to wait. :(
Jim
First, identify the problem. Forget allinurl:, it doesn't mean anything.
Use site:w*w.yoursite.com
All URLs listed should be your own; any foreign URLs in this search are being associated with your domain.
Typically they will have your title and description with a URL from another domain. Or it could be just a URL with no title or description. Look at the cache (likely a picture of your page).
Typically it will be a dynamic URL (with a ? in it)
or an appended URL (w*w.yoursite.othersite.com).
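If you have copied the result URLs out of a site: search by hand, a rough sketch like the one below can sort them. The function name classify_result and the example hostnames are made up, and the "appended" check is only a heuristic based on the pattern described above (your host name minus its last label, followed by someone else's domain).

```python
from urllib.parse import urlsplit


def classify_result(url, own_host):
    """Label one URL from a site: result list (heuristic, hypothetical helper)."""
    # Tolerate URLs pasted without a scheme.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    own = own_host.lower()
    if host == own:
        return "own"
    # "www.yoursite.com" -> "www.yoursite." ; matches www.yoursite.othersite.com
    prefix = own.rsplit(".", 1)[0] + "."
    if host.startswith(prefix):
        return "foreign-appended"
    if parts.query:
        return "foreign-dynamic"
    return "foreign"


results = [
    "http://www.example.com/page.html",
    "http://other.example/go.php?url=www.example.com",
    "http://www.example.other.example/",
]
for r in results:
    print(classify_result(r, "www.example.com"), r)
```

Anything not labelled "own" is a candidate for the cleanup described next.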
The fix is to place a disallow in the robots.txt for the affected page (your page). Then use Google's URL removal tool to remove the hijacking link (which points at your page). Then remove the disallow, or else the page won't get crawled again. Then go to the website that owns the offending link and ask them to remove it (ask nicely; it's not always on purpose).
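Before submitting the removal request, and again after taking the disallow back out, it may be worth checking that robots.txt says what you think it says. A minimal sketch using Python's standard urllib.robotparser; the helper name is_blocked, the page name, and the host are placeholders, and the real removal tool may apply its own rules.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Temporary rule for the affected page (placeholder path).
robots_txt = """\
User-agent: *
Disallow: /affected-page.html
"""

print(is_blocked(robots_txt, "http://www.example.com/affected-page.html"))
print(is_blocked(robots_txt, "http://www.example.com/other-page.html"))
```

The first check should report the page as blocked while the disallow is in place; once you remove the rule, the same check should report it as crawlable again.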
Just keep your "site:" results clean until Google fixes it. Nothing else we can do.
Same thing with MSN (but I know nothing about removing links in MSN).
[webmasterworld.com...]