Forum Moderators: open
A recent thread (http://www.webmasterworld.com/forum3/25638.htm) highlighted the problem of page jacking. We've suffered a lot from this - hundreds of our pages appearing in Google with our titles and descriptions, but under the 'evil' site's URL.
We tried writing to Google, which was a waste of time, and tried writing to the offenders, first nicely and then in a more legal, threatening tone. Both resulted in silence.
We are really annoyed at this latest threat to our business. We have worked harder than we thought we could to create a legitimate, valuable and popular network of sites. We struggled financially in the first couple of years and have only recently enjoyed a healthy income from the business. And we earned it.
Now, a combination of Google's irrational behavior and page jacking by cheap link sites has severely dented our finances and traffic.
What we see here and on other forums is that we are one among many suffering the same. Some have suffered much more than us. It seems like more and more of our time and energy now has to be devoted to 'guarding the shop'.
We do have a couple of ideas, we don't know if they'll stand up to technical scrutiny, but here goes:
1: Is there any way of blocking the referer? As in .htaccess / IP Deny. Can anyone tell us why this would or would not work? Although we do get the traffic from the sites in question, it's traffic we can do without.
2: Can we redirect the redirect? If 'evil.com' redirects to oursite.com/page1.html and then gets credit for 'owning' oursite.com/page1.html, can we then put a redirect from oursite.com/page1.html to oursite.com/page2.html, so that we get credit for owning our own page? In effect, we could 'shift' along all our pages, redirecting old page addresses to the new page addresses.
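If anyone wants to picture idea 2, the 'shift' could be expressed in .htaccess with mod_rewrite along these lines (the page names are placeholders, and whether Google would then re-credit the pages is exactly the open question):

```apache
# Sketch only: 'shift' an old page address to its new one.
# Repeat (or generalise with a pattern) for each page to be moved.
RewriteEngine On
RewriteRule ^page1\.html$ /page2.html [R=301,L]
```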
Hope this is clear enough and that someone out there can help.
Welcome to WebmasterWorld!
In the previous thread [webmasterworld.com], I proposed redirecting the redirect (msg#34). One of that thread's participants reported trying it (msg#70), and that it did not work.
If the search engine spider provides a referrer, you could block the spider when it was referred by the offending site. But practically no SE spiders provide a referrer when spidering.
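For completeness, if spiders *did* send a referrer, the block itself would only take a few lines of .htaccess (the domain below is a placeholder for the offending site) - which is why the missing header is the whole problem:

```apache
# Sketch: tag requests whose Referer is the offending site, then deny them.
# 'evil.com' is a placeholder for the hijacking domain.
SetEnvIfNoCase Referer "evil\.com" bad_ref
Order Allow,Deny
Allow from all
Deny from env=bad_ref
```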
I'm afraid it's up to Google and other leading search engines to incorporate a "quality filter" to detect these sites. If a large percentage of outgoing links on a site are implemented with 302 redirects or meta-refreshes, the links should be ignored. This would require legitimate directory sites to use more advanced exit-click tracking, and they might like to use Google's own exit-click-tracking method [webmasterworld.com] as an example of a "non-destructive" tracking technique.
I wouldn't call Google's behaviour irrational. The 302 redirect needs to be supported as defined by the HTTP protocol [w3.org], and meta-refreshes should be supported so that sites hosted on limited-capability hosting accounts (such as free hosting) have a method to implement a pseudo-redirect. But like many techniques, these are now being abused, so the solution will probably have to be algorithmic in order to preserve functionality of "good" sites while discouraging abuse by "bad" sites.
Implementing a filter is also likely to break link PR transfer from thousands of sites that use PHP's "location" method without specifying a "301 status" to go with it. By default, the "location" method produces a 302.
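On the PHP point: a site that wants its redirect treated as permanent only needs a one-line change - sending an explicit 301 status along with the Location header. A sketch (the target URL is a placeholder):

```php
// header("Location: ...") on its own produces a 302 status by default.
// Sending an explicit 301 status first marks the redirect as permanent.
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com/page.html");
exit;
```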
I don't have anything more to add; I just wanted to point out that these subjects were previously covered in the original thread.
Jim
How about I take the whole site and change every URL, say by adding an 'a' to each, redirecting one page only (index.html) for Googlebot's benefit? Each site redirecting to us would then receive an error page.
The Google consequences of doing such a thing are quite frightening but really we are so sick of this problem we are prepared to consider this.
Any thoughts?
As I said, I doubt there is any technical solution, short of the search engines treating this as a "quality" problem, and applying filters to prevent it. You *could* do special redirects for Googlebot, but the problem is that you have no way to tell if it is Googlebot spidering your site from a "good" link -- one that is not based upon a 302 redirect or a meta-refresh -- or if it is Googlebot spidering through the redirect-link on the "bad" site. If Googlebot provided an HTTP-Referer header it would be possible, but like almost all other SE spiders, it does not.
Jim
Exactly what I have seen, and I have been systematically denying these by IP address. Most turn out to be from servers on shared web hosts, cheap colocation facilities and home DSL lines. Though some would doubt the worth of blocking these, the sites try their best to come back in under other IPs, so it must adversely affect them in some way - if only because they need to confirm your site is up before they redirect. I am beginning to believe that, at least in *some* cases, there are two domains involved in a hijack: i.e. domain 1 redirects to domain 2, and domain 2 meta-refreshes to your site. I do know that since blocking the trash bots and other scrapers, my Google SERPs seem to be slowly returning.
That's given me something to think about. Do you use .htaccess to deny them?
Also, when you say:
I do know that since blocking the trash bots and other scrapers... my google serps seem to be slowly returning.
How exactly would that be done? This is all fairly new to me, as I innocently went along for the last few years never suspecting that I would be a target. Now I'm having to learn - fast!
Thanks again...
Seriously though, it's bad news, not just for me but for many site owners.
I've been looking around, and what strikes me is that the solution seems fairly simple for Google to implement. The thing that bugs me is: why hasn't it been fixed, if it is so simple and so obviously unfair?
I don't blame the offending sites. Whilst I wouldn't consider what they are doing a legitimate way of doing business, I am grown up enough to know that if you leave the door open then someone will come in.
I'm starting another round of writing to ISPs, hosts and site owners... but in the meantime, come on Google, live up to your mission in life and fix this unfairness!
Anyone know of a Google address I can write to that might not just be a black hole?
Cheers
ighandy
I was pretty heavily involved in that other thread. I had two sites that had this problem with two directories.
The way I solved the problem was by contacting the webmaster of the two offending directories via email and telling them that if my link was not removed, I was going to report the infraction to google, yahoo, dmoz and every other search engine I could find them on. I am not sure if you tried this yet, but some of these directories also may have click-through scams and any threat to their income may spur them to action. I'm sure they don't want to be dropped from any engine or directory or even take the chance that they will be.
Both the directories responded by removing my link. The next problem I had was that in the SERPs, the link that no longer existed was now a 301 redirect to their main home pages. You have to sit back after that and let google figure it out. Eventually, their links disappeared and mine came back.
It was hard though, knowing all my traffic was going to their homepages. It took about a month for this to straighten out.
Just my 2 cents
I've got about 20 different sites hijacking my pages. I have emailed them nicely, then threatened them, and I even showed them how to block the bots from indexing those redirect URLs. About half of the emails bounced back to me, and most of the rest didn't even respond.
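For anyone wanting to make the same suggestion to an offending directory, the advice boils down to a couple of robots.txt lines that keep spiders away from the redirect script (the paths below are hypothetical, matching the patterns these directories tend to use):

```
User-agent: *
Disallow: /redir.asp
Disallow: /links/click.php
```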
It's hopeless?
Come on Google! Let's fix this?
I use Apache 2.x on Linux and I deny the IPs in the Apache config. To this point that has been OK... as the list grows, I might move some of the larger blocks out to the firewall level soon so that the server doesn't have to deal with those requests at all. .htaccess should work too, but the list will undoubtedly get larger daily when you really watch your logs. You can also use a spider trap that modifies .htaccess, and then manually update your Apache config or firewall to keep the .htaccess cleaner. If you have a large site and really watch the logs, you will get a good feel for these guys. I have to say, the longer this goes on and the more I learn, the less of a bot bug I believe this is and the more of an exploit I see it as. Still, bug or not, the bots are going to need to deal with this soon.
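In .htaccess (or the main server config) the deny list is just a block like the sketch below - the addresses are documentation placeholders, so substitute the IPs and CIDR ranges from your own logs:

```apache
# Sketch: deny known hijacking hosts by single IP or by CIDR block.
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 198.51.100.0/24
```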
After having my index page hijacked - via a 302 redirect using the meta-refresh tag, as described in the first thread - and my site slowly disappearing from the SERPs due to duplicate content, and after the tests done by "DaveAtIFG" and the intervention of "GoogleGuy", I thought that Google had finally resolved this problem.....
I was at that moment so happy that my index page came back in the SERPs and that, since last week, backlinks are showing again.
BUT here we go again....
I again have 4 other URLs showing the exact same content as my index page.
The new hijacking pages are in the format of:
www.foo1.com/links.php?action=link_id=111
www.foo2.com/redir.asp?link=222
www.foo3.info/get_url.asp?SiteID=333
www.foo4.com/links/click.php?id=444
All of the above are directory-style websites that are using the:
<meta http-equiv="refresh" content="0; url=http://www.widget.com/">
Meta-Tag to redirect (link) to my site.
I do not think that the hijacking is done intentionally by those sites, as their whole linking structure is set up with 302's and Meta-Refreshes.
I have checked many outgoing links (302s) from those directories, and the strange thing is that Googlebot only registers about 5 to 8 percent of those links as a 302 and not the others.
So... what is the extra factor that makes the bot decide to cache some pages under another URL and others not?
I am starting to presume that pages that are updated often are more likely to be hijacked by Googlebot - something to do with the "last modified date", maybe.
The new hijacking pages have not yet replaced my index page in the SERPs, but I can feel that, after regaining position and momentum, I am now sliding once more, as surely the automated algo of Google has again applied a duplicate content penalty for having 5 identical pages in the database.
This is really becoming hopeless
Have you done header checks on the offending links in the SERPs? I am finding that this is either an Apache problem or a PHP problem or a combination of both.
I have been sticky'd a lot of these types of directories and I have found a common theme on all of them...
PHP
Apache
Could this be a canned directory software problem?
Plumsauce supposedly was getting together information on these types of sites, but I have yet to hear any conclusions from him.
Anyway, just some thoughts. The only way I finally got the issue resolved was by having the directories delete my link.
One question that needs to be asked though, did you submit to these directories? Or did they just up and grab your pages for a redirect?
I think this is very important.
Just my 2 cents
The response in the header check is a 302 redirect.
Most of them are on Apache servers and some of them on Windows servers.
3 are from directories created in PHP, 1 is from a directory created in perl.
For me, the reason you are seeing a lot of PHP is that many of those new commercial "directory software" packages are written in PHP and only a few in Perl.
Same goes for Apache or Windows, as more Apache servers are online than Windows servers.
BUT ... I do not think this has anything to do with all of the above; it's a simple misuse (or call it non-authorized use) of a 302 redirect.
What many of those directories are doing is using a redirect instead of a LINK, and until now Google has been following and accepting this 302 redirect as if it were used as originally intended: a redirect from one of YOUR own pages to another of YOUR own pages.
When the 302 and the meta-refresh were created, in my opinion, no-one ever thought they would be used in place of a normal link.
It's because of this NEW STYLE of use of the 302 and meta-refresh that everything is going wrong.
There was a time when search engines did not index pages containing a meta-refresh with a refresh time of less than 15 or 30 seconds, but for one reason or another this seems to have changed.
And I did not submit my site to those directories, but our widget.com site is very important on the topic of widgets, so any directory or portal with a widget category will add our site there themselves.
I'm just taking a break from composing letters and research to wonder...
If this is such a big subject that is affecting so many people, not just in terms of traffic but in terms of revenue and paying real world bills, and, if Google could fix this relatively easily and are not as yet doing so....
Then surely this is something the media would like to get their hands on.
ighandy
I think this is maybe the main reason the "bots that be" won't acknowledge the situation even exists.
"There was a time that search-engines did not index pages that contained a Meta-Refresh"
I don't see the worth in indexing a placeholder to an actual site. I guess the folks that write the algorithms will have to decide for themselves.
It gives hope to the rest of us.
I've all but given up on finding a way of blocking these referers - at least, a way that doesn't shoot myself in the foot at the same time.
I'm concentrating on building the evidence to present to the appropriate people and will report back.
Here's hoping....