Forum Moderators: open
A recent thread (http://www.webmasterworld.com/forum3/25638.htm) highlighted the problem of page jacking. We've suffered a lot from this - hundreds of our pages appearing in Google with our titles and descriptions, but under the 'evil' site's URL.
We tried writing to Google, which was a waste of time, and tried writing to the offenders, first nicely and then in a more legal, threatening tone. Both resulted in silence.
We are really annoyed at this latest threat to our business. We have worked harder than we thought we could to create a legitimate, valuable and popular network of sites. We struggled financially in the first couple of years and have only recently enjoyed a healthy income from the business. And we earned it.
Now, a combination of Google's irrational behavior and page jacking by cheap link sites has severely dented our finances and traffic.
What we see here and on other forums is that we are one among many suffering the same. Some have suffered much more than us. It seems like more and more of our time and energy now has to be devoted to 'guarding the shop'.
We do have a couple of ideas, we don't know if they'll stand up to technical scrutiny, but here goes:
1: Is there any way of blocking the referer? As in .htaccess / IP Deny. Can anyone tell us why this would or would not work? Although we do get the traffic from the sites in question, it's traffic we can do without.
2: Can we redirect the redirect? If 'evil.com' redirects to oursite.com/page1.html and then gets credit for 'owning' oursite.com/page1.html, can we then put a redirect from oursite.com/page1.html to oursite.com/page2.html, so that we get credit for owning our own page? In effect, we could 'shift' along all our pages, redirecting old page addresses to the new page addresses.
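If anyone wants to picture idea 2, the 'shift' could be expressed in .htaccess with mod_rewrite along these lines (the page names are placeholders, and whether Google would then re-credit the pages is exactly the open question):

```apache
# Sketch only: 'shift' an old page address to its new one.
# Repeat (or generalise with a pattern) for each page to be moved.
RewriteEngine On
RewriteRule ^page1\.html$ /page2.html [R=301,L]
```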
Hope this is clear enough and that someone out there can help.
Welcome to WebmasterWorld!
In the previous thread [webmasterworld.com], I proposed redirecting the redirect (msg#34). One of that thread's participants reported trying it (msg#70), and that it did not work.
If the search engine spider provides a referrer, you could block the spider when it was referred by the offending site. But practically no SE spiders provide a referrer when spidering.
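For completeness, if spiders *did* send a referrer, the block itself would only take a few lines of .htaccess (the domain below is a placeholder for the offending site) - which is why the missing header is the whole problem:

```apache
# Sketch: tag requests whose Referer is the offending site, then deny them.
# 'evil.com' is a placeholder for the hijacking domain.
SetEnvIfNoCase Referer "evil\.com" bad_ref
Order Allow,Deny
Allow from all
Deny from env=bad_ref
```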
I'm afraid it's up to Google and other leading search engines to incorporate a "quality filter" to detect these sites. If a large percentage of outgoing links on a site are implemented with 302 redirects or meta-refreshes, the links should be ignored. This would require legitimate directory sites to use more advanced exit-click tracking, and they might like to use Google's own exit-click-tracking method [webmasterworld.com] as an example of a "non-destructive" tracking technique.
I wouldn't call Google's behaviour irrational. The 302 redirect needs to be supported as defined by the HTTP protocol [w3.org], and meta-refreshes should be supported so that sites hosted on limited-capability hosting accounts (such as free hosting) have a method to implement a pseudo-redirect. But like many techniques, these are now being abused, so the solution will probably have to be algorithmic in order to preserve functionality of "good" sites while discouraging abuse by "bad" sites.
Implementing a filter is also likely to break link PR transfer from thousands of sites that use PHP's "location" method without specifying a "301 status" to go with it. By default, the "location" method produces a 302.
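On the PHP point: a site that wants its redirect treated as permanent only needs a one-line change - sending an explicit 301 status along with the Location header. A sketch (the target URL is a placeholder):

```php
// header("Location: ...") on its own produces a 302 status by default.
// Sending an explicit 301 status first marks the redirect as permanent.
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com/page.html");
exit;
```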
I don't have anything more to add; I just wanted to point out that these subjects were previously covered in the original thread.
Jim
How about I take the whole site and change every URL, say by adding an 'a' to each, redirecting one page only (index.html) for Googlebot's benefit? Each site redirecting to us would then receive an error page.
The Google consequences of doing such a thing are quite frightening but really we are so sick of this problem we are prepared to consider this.
Any thoughts?
As I said, I doubt there is any technical solution, short of the search engines treating this as a "quality" problem, and applying filters to prevent it. You *could* do special redirects for Googlebot, but the problem is that you have no way to tell if it is Googlebot spidering your site from a "good" link -- one that is not based upon a 302 redirect or a meta-refresh -- or if it is Googlebot spidering through the redirect-link on the "bad" site. If Googlebot provided an HTTP-Referer header it would be possible, but like almost all other SE spiders, it does not.
Jim
Exactly what I have seen, and I have been systematically denying these by IP address. Most turn out to be from servers on shared web hosts, cheap colocation facilities and home DSL lines. Though some would doubt the worth of blocking these, the sites try their best to come back in under other IPs, so it must adversely affect them in some way - if only because they need to confirm your site is up before they redirect. I am beginning to believe that, at least in *some* cases, there are two domains involved in a hijack: i.e. domain 1 redirects to domain 2, and domain 2 meta-refreshes to your site. I do know that since blocking the trash bots and other scrapers, my Google SERPs seem to be slowly returning.
That's given me something to think about. Do you use .htaccess to deny them?
Also, when you say:
I do know that since blocking the trash bots and other scrapers... my google serps seem to be slowly returning.
How exactly would that be done? This is all fairly new to me, as I innocently went along for the last few years never suspecting that I would be a target. Now I'm having to learn - fast!
Thanks again...
Seriously though, it's bad news, not just for me but for many site owners.
I've been looking around, and what strikes me is that the solution seems fairly simple for Google to implement. The thing that bugs me is: why hasn't it been fixed, if it is so simple and so obviously unfair?
I don't blame the offending sites. Whilst I wouldn't consider what they are doing a legitimate way of doing business, I am grown up enough to know that if you leave the door open then someone will come in.
I'm starting another round of writing to ISPs, hosts and site owners... but in the meantime, come on Google, live up to your mission in life and fix this unfairness!
Anyone know of a Google address I can write to that might not just be a black hole?
Cheers
ighandy
I was pretty heavily involved in that other thread. I had two sites that had this problem with two directories.
The way I solved the problem was by contacting the webmaster of the two offending directories via email and telling them that if my link was not removed, I was going to report the infraction to google, yahoo, dmoz and every other search engine I could find them on. I am not sure if you tried this yet, but some of these directories also may have click-through scams and any threat to their income may spur them to action. I'm sure they don't want to be dropped from any engine or directory or even take the chance that they will be.
Both the directories responded by removing my link. The next problem I had was that in the SERPs, the link that no longer existed was now a 301 redirect to their main home pages. You have to sit back after that and let google figure it out. Eventually, their links disappeared and mine came back.
It was hard though, knowing all my traffic was going to their homepages. It took about a month for this to straighten out.
Just my 2 cents
I've got about 20 different sites hijacking my pages. I have emailed them nicely, then threatened them, and I even showed them how to block the bots from indexing those redirect URLs. About half of the emails bounced back to me, and most of the rest didn't even respond.
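For anyone wanting to make the same suggestion to an offending directory, the advice boils down to a couple of robots.txt lines that keep spiders away from the redirect script (the paths below are hypothetical, matching the patterns these directories tend to use):

```
User-agent: *
Disallow: /redir.asp
Disallow: /links/click.php
```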
It's hopeless?
Come on Google! Let's fix this?
I use Apache 2.x on Linux and I deny the IPs in the Apache config. To this point that has been OK... as the list grows, I might move some of the larger blocks out to the firewall level soon so that the server doesn't have to deal with those requests at all. .htaccess should work too, but the list will undoubtedly get larger daily when you really watch your logs. You can also use a spider trap that modifies .htaccess, and then manually update your Apache config or firewall to keep the .htaccess cleaner. If you have a large site and really watch the logs, you will get a good feel for these guys. I have to say, the longer this goes on and the more I learn, the less of a bot bug I believe this is and the more of an exploit I see it as. Still, bug or not, the bots are going to need to deal with this soon.
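In .htaccess (or the main server config) the deny list is just a block like the sketch below - the addresses are documentation placeholders, so substitute the IPs and CIDR ranges from your own logs:

```apache
# Sketch: deny known hijacking hosts by single IP or by CIDR block.
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 198.51.100.0/24
```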
After having my index page hijacked - via a 302 redirect using the meta-refresh tag, as described in the first thread - and my site slowly disappearing from the SERPs due to duplicate content, and after the tests done by "DaveAtIFG" and the intervention of "GoogleGuy", I thought that Google had finally resolved this problem.....
I was at that moment so happy that my index page came back in the SERPs and that, since last week, backlinks are showing again.
BUT here we go again....
I again have 4 other URLs showing the exact same content as my index page.
The new hijacking pages are in the format of:
www.foo1.com/links.php?action=link_id=111
www.foo2.com/redir.asp?link=222
www.foo3.info/get_url.asp?SiteID=333
www.foo4.com/links/click.php?id=444
All of the above are directory-style websites that are using the:
<meta http-equiv="refresh" content="0; url=http://www.widget.com/">
Meta-Tag to redirect (link) to my site.
I do not think that the hijacking is done intentionally by those sites, as their whole linking structure is set up with 302's and Meta-Refreshes.
I have checked many outgoing links (302s) from those directories, and the strange thing is that Googlebot only registers about 5 to 8 percent of those links as a 302 and not the others.
So... what is the extra factor that makes the bot decide to cache some pages under another URL and others not?
I am starting to presume that pages that are updated often are more likely to be hijacked by Googlebot - something to do with the "last modified date", maybe.
The new hijacking pages have not yet replaced my index page in the SERPs, but I can feel that, after regaining position and momentum, I am now sliding once more, as surely the automated algo of Google has again applied a duplicate content penalty for having 5 identical pages in the database.
This is really becoming hopeless
Have you done header checks on the offending links in the SERPs? I am finding that this is either an Apache problem or a PHP problem or a combination of both.
I have been sticky'd a lot of these types of directories and I have found a common theme on all of them...
PHP
Apache
Could this be a canned directory software problem?
Plumsauce supposedly was getting together information on these types of sites, but I have yet to hear any conclusions from him.
Anyway, just some thoughts. The only way I finally got the issue resolved was by having the directories delete my link.
One question that needs to be asked though, did you submit to these directories? Or did they just up and grab your pages for a redirect?
I think this is very important.
Just my 2 cents
The response in the header check is a 302 redirect.
Most of them are on Apache servers and some of them on Windows servers.
3 are from directories created in PHP, 1 is from a directory created in perl.
For me, the reason you are seeing a lot of PHP is that many of those new commercial "directory software" packages are written in PHP and only a few in Perl.
Same goes for Apache or Windows, as more Apache servers are online than Windows servers.
BUT ... I do not think this has anything to do with all of the above; it's a simple misuse (or call it non-authorized use) of a 302 redirect.
What many of those directories are doing is using a redirect instead of a LINK, and until now Google has been following and accepting this 302 redirect as if it were used as originally intended: a redirect from one of YOUR own pages to another of YOUR own pages.
When the 302 and the meta-refresh were created, in my opinion, no-one ever thought they would be used in place of a normal link.
It's because of this NEW STYLE of use of the 302 and meta-refresh that everything is going wrong.
There was a time when search engines did not index pages containing a meta-refresh with a refresh time of less than 15 or 30 seconds, but for one reason or another this seems to have changed.
And I did not submit my site to those directories, but our widget.com site is very important on the topic of widgets, so any directory or portal with a widget category will add our site there themselves.
I'm just taking a break from composing letters and research to wonder...
If this is such a big subject that is affecting so many people, not just in terms of traffic but in terms of revenue and paying real world bills, and, if Google could fix this relatively easily and are not as yet doing so....
Then surely this is something the media would like to get their hands on.
ighandy
I think this is maybe the main reason the "bots that be" won't acknowledge the situation even exists.
"There was a time that search-engines did not index pages that contained a Meta-Refresh"
I don't see the worth in indexing a placeholder to an actual site. I guess the folks that write the algorithms will have to decide for themselves.
It gives hope to the rest of us.
I've all but given up on finding a way of blocking these referers - at least, a way that doesn't shoot myself in the foot at the same time.
I'm concentrating on building the evidence to present to the appropriate people and will report back.
Here's hoping....