Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 467 message thread spans 16 pages (page 3 shown).
Google's 302 Redirect Problem
ciml




msg:732619
 4:17 pm on Mar 25, 2005 (gmt 0)

(Continuing from Google's response to 302 Hijacking [webmasterworld.com] and 302 Redirects continues to be an issue [webmasterworld.com])

Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.

Normally in these cases, a search for cache:[destination URL] in Google shows "This is Google's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.

Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".

Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
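
To make the two mechanisms concrete, here is a minimal Python sketch that reports whether a response sends visitors elsewhere through an HTTP 302 status or through an HTML META refresh. The function and class names are hypothetical, the response is passed in as plain values rather than fetched, and this says nothing about how Google itself processes such pages.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class MetaRefreshFinder(HTMLParser):
    """Collects the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://www.example.com/page.html"
        _, _, rest = attr.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def redirect_target(status: int, headers: dict, body: str) -> Optional[Tuple[str, str]]:
    """Return (kind, url) if the response points somewhere else, otherwise None."""
    location = {k.lower(): v for k, v in headers.items()}.get("location")
    if status == 302 and location:
        return ("http-302", location)
    finder = MetaRefreshFinder()
    finder.feed(body)
    if finder.target:
        return ("meta-refresh", finder.target)
    return None
```

Run against the output of a click-counting script, a check like this shows which of your own URLs a spider could mistake for a copy of the destination page.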

There has been much discussion on the topic, as can be seen from the links below.

How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)

Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)

302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]

This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.

<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>

[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]

 

claus




msg:732679
 11:34 am on Apr 4, 2005 (gmt 0)

Joeduck, you should ask for removal of these links from Google.

You will probably have to re-enter them in your robots.txt (*), but it will be easier if they share a generic component, because there's a limit to the size of your robots.txt file when you request removal from Google with the URL console.

Also, you might have to enter them one-by-one in the url-console, which will take some time.

---
(*) That is, if you can't make them return html with the meta tag <meta name="robots" content="noindex">

joeduck




msg:732680
 5:00 pm on Apr 4, 2005 (gmt 0)

Thanks Claus - this makes sense, though I'm worried these are a symptom rather than the problem itself? The links all appear to be from our cgi and cf directories that are referenced to send people to our major affiliates. Hopefully we can use wildcards in the Google process but I have not checked yet. We'd excluded these directories when the problem started, now we allow them.

vincentg




msg:732681
 5:31 pm on Apr 4, 2005 (gmt 0)

Reid

I would be interested in seeing it

Email me the info so I can take a look at it.
I think if I can look at a few cases I may be able to see if there is anything we can do to either stop it or identify it.

Vin

claus




msg:732682
 5:34 pm on Apr 4, 2005 (gmt 0)

Yeah, they are symptoms, but the problem is not on your end, it's on Google's. In this case they see a URL (your redirect URL) and assume that this equals a document, even though it does not.

Once a URL is indexed it will not be removed by putting it in "robots.txt" - this will only keep the spider from revisiting the URL. In order to get it removed you must specifically request removal.

If you've got your redirect script in some folder, like, say:

example.com/redir/redir.php?id=1234567890

... then you can just put "/redir/" (or "/redir/redir.php") in your "robots.txt", you don't need to put in every single redirect.

(i have removed such URLs from my own sites a few times, so i know the process)
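
As a rough illustration of the folder rule described above, the following Python sketch uses the standard library's urllib.robotparser to check that a single "Disallow: /redir/" line covers every redirect URL under that folder. The robots.txt content and the URLs are made-up examples, and the result only reflects how Python's parser reads the file; a real spider may interpret it differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one prefix rule for the redirect script folder.
REDIR_ROBOTS_TXT = """\
User-agent: *
Disallow: /redir/
"""

redir_parser = RobotFileParser()
redir_parser.parse(REDIR_ROBOTS_TXT.splitlines())

# Any URL whose path starts with /redir/ is covered by the single rule.
redirect_ok = redir_parser.can_fetch(
    "Googlebot", "http://example.com/redir/redir.php?id=1234567890"
)
# Ordinary pages outside that folder stay crawlable.
page_ok = redir_parser.can_fetch("Googlebot", "http://example.com/index.html")

print(redirect_ok, page_ok)
```

As noted in the post, a rule like this only keeps the spider from revisiting the URLs; already indexed URLs still need a removal request.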

joeduck




msg:732683
 5:45 pm on Apr 4, 2005 (gmt 0)

Excellent and thanks for any advice. Claus - do you think these bogus "link pages" replace other legitimate pages?

g1smd




msg:732684
 7:04 pm on Apr 4, 2005 (gmt 0)

They can do, I suspect, if Google sees that they are duplicates of something else (which may well happen with the screwy way that they treat some sorts of redirects these days).

claus




msg:732685
 7:47 pm on Apr 4, 2005 (gmt 0)

I'm with g1smd here: If SE's are allowed to follow those links you might be "hijacking" some of the target pages before you know it - i've got all mine robots.txt'ed for the same reason

(and the additional reason being that i like to have control over what is indexed - i especially don't want "internal" things or "errors" to be indexed. All i want in the index is my real pages and nothing more - one URI per page. For that reason i do remove all kinds of different stuff that should not be there whenever i see it. I like to keep things clean, as this helps me avoid "surprises" of many different kinds.)

Reid




msg:732686
 12:30 am on Apr 5, 2005 (gmt 0)

Claus this could be a real problem for people running adsense because you're not allowed to exclude googlebot from any part of your site.
If you run a robots.txt file you have to allow google full access in the first line.

g1smd




msg:732687
 12:49 am on Apr 5, 2005 (gmt 0)

What about the rel="nofollow" (or was it rel="noindex") attribute that Google "invented" just a few months ago...

Can you use that? Would it work?

joeduck




msg:732688
 1:10 am on Apr 5, 2005 (gmt 0)

Reid why are you saying that? We've had several excluded directories and have run adsense for some time. To Google's credit (but our frustration) our adsense reps have been nice talking about this but unable to help with our problems because they are very separated from search side of things.

RE: Nofollow - we've been discussing that and I favor placing them at most of our outbound links.

sunzon




msg:732689
 2:57 am on Apr 5, 2005 (gmt 0)

Non-riot thinking (points we all agree on?):
1. Search Engines cannot function properly if one webpage can influence the serps of another webpage.
2. Protocols must exist, and redirects serve a valid purpose.
3. It is commendable that Search Engines want to avoid duplicate content.

As I read all the discussion, everyone is barking at points 1 and 2, because of point 3.

If it wasn't for point 3 (duplicate content), who cares about redirects from other webpages, as long as your own page is still in serps on its own merit.

If bark we must, then I suggest we bark at Google for their method of eliminating duplicate content (define fair/unfair or show certain duplicates).
Some suggest the current Google choice is based on PageRank, arguing that the redirect bad guys have a higher PageRank and focus on webpages with lower PageRank so that they get chosen in the duplicate content dilemma, and not the other guy. Maybe so.
The problem of criteria for eliminating duplicate content has always existed. Maybe it is aggravated by the inventiveness of bad guys using redirects, and maybe protocols need to be changed (point 2) to deal with that, but duplicate content is close to "similar" content, and before you know it we are talking about how serps ranking choices are made.
A level playing field is impossible....except maybe for napoleon.
We can only watch as Google copes.

my 2cts

Reid




msg:732690
 7:01 am on Apr 5, 2005 (gmt 0)

Reid why are you saying that? We've had several excluded directories and have run adsense for some time. To Google's credit (but our frustration) our adsense reps have been nice talking about this but unable to help with our problems because they are very separated from search side of things.

I should not have said that. Last month I read in the adsense guidelines 'do not use a robots.txt file', and then buried in the optimization tips there is a line:


If you have a robots.txt file, remove the file or add the following two lines to the top of the file:

User-agent: Mediapartners-Google*
Disallow:

This change will allow our bot to crawl the content of your site, so that we may provide you with the most relevant Google ads.

I wonder if Mediapartners-Google* includes 'googlebot'?
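
One way to explore that question is to see how a robots.txt parser treats the two user-agent names. The sketch below uses Python's urllib.robotparser with a made-up file; the agent is written without the trailing asterisk because Python's parser matches names by substring, and the "/redir/" rule is an assumption added for illustration. It shows only this parser's behaviour, not a statement of how Google's crawlers resolve the groups.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: full access for the AdSense bot, one blocked
# folder for everyone else.
ADSENSE_ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /redir/
"""

adsense_parser = RobotFileParser()
adsense_parser.parse(ADSENSE_ROBOTS_TXT.splitlines())

test_url = "http://example.com/redir/redir.php?id=1"

# The empty Disallow line lets the AdSense bot fetch everything.
mediapartners_ok = adsense_parser.can_fetch("Mediapartners-Google", test_url)
# "Googlebot" does not match the Mediapartners group, so the "*" group applies.
googlebot_ok = adsense_parser.can_fetch("Googlebot", test_url)

print(mediapartners_ok, googlebot_ok)
```

In this parser the two names are handled as separate agents, so granting full access to the AdSense bot does not by itself open the excluded folder to the search spider.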

claus




msg:732691
 8:53 am on Apr 5, 2005 (gmt 0)

Claus this could be a real problem for people running adsense because you're not allowed to exclude googlebot from any part of your site.
If you run a robots.txt file you have to allow google full access in the first line.

I have started a new thread on this topic in the AdSense forum, so let's continue with regard to that issue over there:

How much should Google be allowed to spider WRT AdSense? [webmasterworld.com]

Reid




msg:732692
 9:53 am on Apr 5, 2005 (gmt 0)

If it wasn't for point 3 (duplicate content), who cares about redirects from other webpages, as long as your own page is still in serps on its own merit.

I think the problem is much deeper than that.

If your home page is the temporary location of the hijacking page, then the hijacking page takes the home page's place in the SERPs.

This could also happen aside from duplicate content issues, i.e. googlebot sees that it has indexed the page AND the temporary location of the page. Aside from duplicate content it says 'I have indexed the same page twice', so it removes the temp page (your home page) and leaves the hijack page which points at the temp.

larryhatch




msg:732693
 10:19 am on Apr 5, 2005 (gmt 0)

Having followed this thread from the start, I took action.
I found 302 redirects to 3 of my pages, low traffic 3rd level stuff,
but the jackers rated above me for test phrases of mine.

Here's what I did: I temporarily renamed files at the host forcing 404 errors.
I used the Google emergency removal tool and selected "remove all".

I changed the filenames back quickly before they might get spidered by anyone.
The next day, I checked on my requests, and all three reqs were "DENIED"!

I have no idea why denied, but it seems to have done some good anyhow!
All 3 pages fell to the bottom dead end of the listings when
I ask for site:mysite.net. Dead last, all three, plus a fourth one I forgot about.
AND, all four jacked URLs are now marked 'supplemental result'.
None of my pages are so marked.
None of the jacks rate above me for my original
test snippets any more.

Comments anyone? Do I have this under control at least? - Larry

Reid




msg:732694
 10:43 am on Apr 5, 2005 (gmt 0)

larryhatch - I had the same thing. When I removed one of these URLs I got a 'successful' but the URL is not gone; it only fell into the 'omitted pages' in a site: search. It didn't change the sandbox situation though, but that may be another issue.

On another site I had a similar url from the same place appearing in site: this was before the URL removal tool became known to me.
I contacted the offending directory and asked them to remove my site. They removed it without a response. My site then came out of the sandbox and has been rising rapidly in google traffic. On this one I can't remove the url from google though because the go.cgi file is not returning a header for that particular id#.
On the previous site they are refusing to remove the site from their directory; they must have been hit hard by the 302 scare. They have banned my IP too.

larryhatch




msg:732695
 10:51 am on Apr 5, 2005 (gmt 0)

Hi Reid:

On a separate thread, there is discussion of Google looking into
files that they ordinarily don't .. java stuff and the like.
I wonder if there's a chance they are looking for dodgy 302 redirects.
I can dream, can't I? - Larry

Reid




msg:732696
 12:48 pm on Apr 5, 2005 (gmt 0)

Personally I think this is the reason behind this whole strange activity going on at the plex since mid March. Everyone is reporting rollbacks and rolling 'old database' results.

I seriously think they are running tests and doing rollbacks changing algos trying to fix the 302 thing.

I bet they know exactly what the problem is (but they'll never tell) but it is not so easy to get rid of.

larryhatch




msg:732697
 12:45 pm on Apr 9, 2005 (gmt 0)

Hi Reid:

I've seen things like this a thousand times in the semiconductor
industry. Somebody screws up royally due to an oversight.
They will never admit it, but quietly, they make sure the problem gets fixed.

If that's the case, I don't need mea-culpas .. just some reassurance
that the problem is indeed being addressed.

I can't help but feel that my "denied request" did have some beneficial effects.

One odd thing. Unlike so many pond scum scrapers, the particular one
I dealt with actually had good taste! He only scraped the very best
sites in my arcane field (UFOs) avoiding all the loonies and
amateurish junk. If he put all his black-hat efforts into some honest
research, he could actually be a positive influence in the field!

As it is, he has the most authoritative people in the field ready
to cut his head off; those who can figure out what he's doing that is.

Very ironic. - Larry


zeus




msg:732698
 12:56 pm on Apr 9, 2005 (gmt 0)

We also have not seen a real update since Allegra.

We can also see it's a big problem because we have NEVER seen so many pages in the serps as supplemental results.

[edited by: zeus at 1:08 pm (utc) on April 9, 2005]

zgb999




msg:732699
 1:06 pm on Apr 9, 2005 (gmt 0)

Did anybody remove a hijacked page with the Google emergency removal tool more than 90 days ago?

After those threads
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]

I wonder whether Google will visit the pages again after 90 days (or maybe more) and the problem will start all over again.

g1smd




msg:732700
 7:07 pm on Apr 10, 2005 (gmt 0)


Are all the non-dmoz.org URLs, found in a site:www.dmoz.org search, an example of the problem that Google says doesn't exist?

claus




msg:732701
 8:20 pm on Apr 10, 2005 (gmt 0)

g1smd : Exactly!

All these seem to be flagged as "Supplemental" which, i think, means that they will probably not show up in a regular search. So, DMOZ might not have problems because of these - it's when they turn up in regular searches that they can cause problems.

larryhatch




msg:732702
 11:13 pm on Apr 10, 2005 (gmt 0)

Keeeeripes! Thanks for the tip.

I Googled up site:www.dmoz.org .. what a menagerie!

<snip>

This is scraper central.

If nothing else comes to the attention of G, this should. - Larry

[edited by: ciml at 9:39 am (utc) on April 11, 2005]
[edit reason] No specifics please. [/edit]

larryhatch




msg:732703
 11:15 pm on Apr 10, 2005 (gmt 0)

Ooops! I should have read Claus's post first.
Yup, all 'supplemental'.
That brings up another question. Can I assume that
supplemental results are penalized in some way? -Larry

theBear




msg:732704
 11:24 pm on Apr 10, 2005 (gmt 0)

larry,

The supplemental results rarely show in a search and the flip side is that the non supplemental duplicated page shows a lot further down in the serps.

g1smd




msg:732705
 11:34 pm on Apr 10, 2005 (gmt 0)


A page might not be a Supplemental Result for all search queries that it is returned for.

claus




msg:732706
 3:10 pm on Apr 11, 2005 (gmt 0)

I'm not 100% clear on the effect of a "supplemental" stamp, i have to admit that. Thinking about it, i do see these in results for regular queries sometimes (which is probably also what they're there for)

g1smd




msg:732707
 5:22 pm on Apr 11, 2005 (gmt 0)

The snippet for a Supplemental Result is never updated. It comes from an ancient archive deep in the Googleplex. It can easily represent content last seen on the page 3 or 4 years ago.

For a different search query the same page might be returned in the results, but might be a normal result and with a more up to date snippet.

At no time is there a rule to say that the words in the snippet can still be found in the cached page or on the real live site.

larryhatch




msg:732708
 8:35 am on Apr 18, 2005 (gmt 0)

Something surprising. [Thanks to the fellow who stickied me this tip]

site:www.dmoz.org brings up NOTHING AT ALL any more.
site:dmoz.org (no www.) yields 11.2 million pages, all real dmoz URLs
as far as I looked (several pages worth) and not a scraper in sight!

Did this thread embarrass somebody, or is it just coincidence?
I don't understand why the www should make any difference. -Larry

theBear




msg:732709
 1:46 pm on Apr 18, 2005 (gmt 0)

larry,

I can confirm that the "jacker" urls within a site view are no longer showing.

I had a sticky from a fellow member who I was working with, he went looking for the 302's I stickyed to him that were showing up as being part of his site.

I also confirmed that the leeches attached to one of our sites also no longer show up in a site: search.

And a certain Drudge no longer has any attached to his site. In fact I looked at 15 sites that I knew about having leeches and they were all gone.

Now is the problem fixed?

I don't know; it could just be hidden.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved