Forum Moderators: Robert Charlton & goodroi
Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies hold your website info in their databases and feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted PHP-based redirection script then goes to work: it issues a 302 redirect to your site and includes an affiliate click checker. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
Googlebot and MSNbot follow these PHP scripts either to an internal subdomain containing the 302 redirect or to a server-side redirect, and “BANG”, down goes your site if its PageRank is below that of the offending site. Your index page is crippled because Googlebot and MSNbot now consider your home page, at best, a supplemental page of the offending site: the offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be of much help in tracing the problem for a long time. Note that these scripts mostly use your URL stripped of the WWW, which makes detection harder. This also causes Googlebot to generate another URL listing for your site that can be seen as duplicate content. A 301 redirect at least resolves the short-URL problem, relieving Google of deciding which of your site's two URLs to rank higher (more often the one with the higher linked PageRank).
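For anyone wondering what the 301 fix for the short-URL problem looks like in practice, here is a minimal Python WSGI sketch. It assumes your preferred hostname is www.example.com (a placeholder) and that your pages are served through a WSGI application; any request arriving under another name, such as the bare domain or the server's IP address, is answered with a permanent redirect to the same path on the canonical host.

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # assumption: the hostname you want indexed


def canonical_host_middleware(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so any other Host (bare domain, IP address) gets a 301 to canonical_host."""

    def wrapper(environ, start_response):
        # HTTP_HOST may include a port ("example.com:8080"); compare the name only.
        # (IPv6 literal hosts are not handled by this simple split.)
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != canonical_host:
            path = quote(environ.get("PATH_INFO") or "/")
            query = environ.get("QUERY_STRING", "")
            location = f"http://{canonical_host}{path}" + (f"?{query}" if query else "")
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Type", "text/plain")],
            )
            return [b"Moved permanently\n"]
        # Already on the canonical host: serve the page normally.
        return app(environ, start_response)

    return wrapper
```

You would wrap your existing application once, e.g. app = canonical_host_middleware(app). On Apache the same effect is normally achieved in the server configuration rather than in application code; the sketch only shows the logic.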
Your only hope is that your PageRank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance of stripping at least one of them. This is reinforced by hundreds of other hidden 301 (permanent) redirects to sites of PageRank 7 or above, again in the hope of stripping a high-PageRank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main revenue target. Though I am sure Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, a hidden WHOIS record and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.
There is no point in contacting GOOGLE or MSN, because this problem has been around for at least 9 months; only now is it escalating at an alarming rate. All sites with PageRank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach PageRank 4 or 5 without having to strip other sites.
Caution: trying to exclude these URLs via robots.txt will not help, because these scripts can change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F
will result in your entire website being removed from Google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that period.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including subdomains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So in essence it is a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully keep creating and bombarding a site that generates dynamic pages with PHP, ASP or CGI redirecting scripts. A fed SKYSCRAPER site can have its server totally occupied by a single efficient spider that requests pages every split second throughout the day and week.
If the repeatedly spidered site runs out of bandwidth, it may then be possible to remove it via Google's URL removal tool. Google's URL console only needs a few seconds of 404 or 403 responses from the offending site to detect what it needs, either for the whole site or for the damaging link.
I hope I have been informative and have helped anybody with a hijacked site whose natural revenue has suffered unfairly. Also note that your site may never regain its rank, even after the offending links are removed. Talking to offending site owners often results in denial that they are causing problems; they say they are only counting outbound clicks. And they seem reluctant to remove your links.... Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
Everything but someone else's domain.
So if you are www.example.com, you are also example.com and, by extension, the IP address and port on the server... so now you see a so-called split site.
In addition, you can park a domain, which is what folks normally do when they buy the .com, .net, .darnx's (huh, Brett) versions of their site name.
This can lead to many sets of duplicate content if the site is built using relative hrefs.
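To see why relative hrefs multiply the duplicates, it helps to resolve one link the way a crawler does. The Python sketch below uses the standard urllib.parse.urljoin; the alias list (www and bare host, an IP address with port, a parked .net) and the page paths are hypothetical examples of the aliases described above.

```python
from urllib.parse import urljoin

# Hypothetical names under which the same files are reachable.
ALIASES = [
    "http://www.example.com/",
    "http://example.com/",
    "http://192.0.2.10:80/",
    "http://www.example.net/",  # a parked domain
]


def resolved_variants(page_path, href, aliases=ALIASES):
    """Absolute URL a crawler would build for `href` found on `page_path`, under each alias."""
    return [urljoin(urljoin(base, page_path), href) for base in aliases]


# A relative href inherits whichever host the page was crawled under...
for url in resolved_variants("products/index.html", "widgets.html"):
    print(url)

# ...while an absolute href points at one host no matter where the page was found.
print(set(resolved_variants("products/index.html", "http://www.example.com/products/widgets.html")))
```

The relative link yields four distinct URLs for one file; the absolute link yields one. Absolute hrefs do not stop an alias page itself from being crawled, but they keep the links from it pointing back at a single host.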
[edited by: theBear at 9:00 pm (utc) on Mar. 16, 2005]
C
[edited by: crobb305 at 8:59 pm (utc) on Mar. 16, 2005]
Yes, you are correct about site:mysite.com. That command should ONLY show pages that are truly part of your site. In my case, that search (for my homepage) still shows 3 unrelated URLs (redirect URLs); at one point, there were as many as 20 redirect URLs showing. My site was hijacked by malicious tracker2-scripted URLs last May and disappeared from Google SERPs 2 weeks later. It had previously disappeared from Yahoo because of the same problem, but thanks to Yahoo's quick action, the problem there was resolved within 3 months.
[webmasterworld.com...]
Chris
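Spotting foreign redirect URLs among site:mysite.com results can be partly automated once you have the listings in hand. The Python sketch below assumes you have copied the listed URLs into a list yourself (it does not query Google), that mysite.com stands for your own domain, and that the hijacking URL names your site in its query string, as in the goto.php?path=yoursite.com%2F example earlier in the thread. Scripts that hide the target some other way will slip through, and the substring test is deliberately crude.

```python
from urllib.parse import urlsplit, parse_qsl

MY_DOMAIN = "mysite.com"  # assumption: your own domain, without www


def is_my_host(host, my_domain=MY_DOMAIN):
    """True for my_domain itself and any subdomain such as www.my_domain."""
    host = host.lower()
    return host == my_domain or host.endswith("." + my_domain)


def suspected_redirect_listings(urls, my_domain=MY_DOMAIN):
    """Yield (listing, parameter, value) for foreign-host URLs whose query string names my_domain."""
    for url in urls:
        parts = urlsplit(url if "://" in url else "http://" + url)
        if is_my_host(parts.hostname or "", my_domain):
            continue  # a genuine page of your own site
        for name, value in parse_qsl(parts.query):
            # parse_qsl percent-decodes values, so "mysite.com%2F" arrives as "mysite.com/"
            if my_domain in value.lower():
                yield url, name, value


listings = [
    "http://redirector.example/goto.php?path=mysite.com%2F",  # hypothetical redirect URL
    "http://www.mysite.com/about.html",
    "http://tracker.example/out.php?id=17",
]
for hit in suspected_redirect_listings(listings):
    print(hit)
```

Only the first listing is reported, with its decoded target, which is the kind of URL the first post warns against feeding to the removal tool.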
Is anybody familiar with the PostNuke/PHPNuke open source content management/portal system? Well, its links module uses, guess what, 302 redirects. When I look at site:myurl, I find one of my Nuke sites listed in with the other site, and sure enough the cached page brings up the page from my (hijacked) site.
When I look in the code, the redirect is done using the PHP header("Location: ...") method.
So this shows that you don't have to be a nasty scraper site to **** things up for someone you're actually trying to do a favour.
Does anyone know how to create a 301 redirect using the PHP header?
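On the PHP question: header() takes the response code as its optional third argument, so header("Location: http://www.example.com/", true, 301); sends a 301, whereas a bare Location header makes PHP answer with a 302. For comparison, here is the same idea as a minimal Python WSGI sketch of a links-module style click-through URL such as /links/link.php?id=622. The LINKS table and CLICKS counter are hypothetical stand-ins for the module's database, and whether a 301 fully avoids the hijack effect is this thread's working theory, not something the sketch proves.

```python
from urllib.parse import parse_qs

# Hypothetical link table standing in for the links module's database.
LINKS = {"622": "http://www.example.com/index.cfm/action/cat/tp/cost"}
CLICKS = {}  # link id -> outbound click count


def link_redirect_app(environ, start_response):
    """Count an outbound click for ?id=N, then send a permanent (301) redirect to the target."""
    link_id = parse_qs(environ.get("QUERY_STRING", "")).get("id", [""])[0]
    target = LINKS.get(link_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown link\n"]
    CLICKS[link_id] = CLICKS.get(link_id, 0) + 1
    # 301 rather than 302: per this thread, a 302 risks the engine filing the
    # target's content under this redirect URL instead of the target's own URL.
    start_response(
        "301 Moved Permanently",
        [("Location", target), ("Content-Type", "text/plain")],
    )
    return [b"Moved permanently\n"]
```

The click is still counted, so the site owner keeps the outbound statistics that tracking scripts like this exist to collect.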
Claus wrote:
Related Threads - topic: Redirect bug
I've collected a few related threads; please contribute if you see one that I haven't mentioned. List the name of the thread and its starting date:
- Is there a new filter? [webmasterworld.com] (May 4, 2004)
- Big problem with Yahoo [webmasterworld.com] (Apr 30, 2004)
- PR 7 - 0 and Address Nightmare [webmasterworld.com] (Apr 28, 2004)
- Problem with Googlebot and robots.txt? [webmasterworld.com] (Apr 12, 2004)
- Meta Refresh leads to ... [webmasterworld.com] (Mar 18, 2004)
- weird link showing up for my site in Web results [webmasterworld.com] (Feb 10, 2004)
- Google indexing redirect pages [webmasterworld.com] (Jan 31, 2004)
- free hosting sites banned from google? [webmasterworld.com] (Jan 31, 2004)
- Is using a redirect to track outward bound links bad? [webmasterworld.com] (Jan 27, 2004)
- Our company Lisiting is being redirected. [webmasterworld.com] (Jan 5, 2004)
- 302 Redirects showing ultimate domain [webmasterworld.com] (Dec 21, 2003)
- Strange results in Allinurl [webmasterworld.com] (Dec 20, 2003)
- Domain name mixup [webmasterworld.com] (Dec 9, 2003)
- Using Redirects [webmasterworld.com] (Nov 17, 2003)
- redesigns, redirects, & google -- oh my! [webmasterworld.com] (Oct 22, 2003)
- Google Partial Indexing? [webmasterworld.com] (Oct 21, 2003)
- Not sure but I think it is Page Jacking [webmasterworld.com] (Oct 9, 2003)
- Unindexed URL Google Ranking Trick [webmasterworld.com] (Oct 9, 2003)
- http://click.fastsearch.com.... [webmasterworld.com] (Oct 8, 2003)
- Duplicate content - a google bug? [webmasterworld.com] (Sept 26, 2003)
- Banner ad redirect-page indexed as mirror site by Google [webmasterworld.com] (Aug 13, 2003)
- Indexed AlltheWeb pages causing Google duplicates [webmasterworld.com] (Aug 14, 2003)
- DeepFreshBot's 301 Handling [webmasterworld.com] (June 16, 2003)
Okay, I admit the last one might be a little too old by now, but it seems the problem goes back to August 2003 at least.
Sending request:
GET /links/link.php?id=622 HTTP/1.0
Host: www.example.net
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Total bytes received = 659
Elapsed time so far: 0 seconds
Header (Length = 555):
HTTP/1.1 302 Found
Date: Wed, 16 Mar 2005 21:57:19 GMT
Server: Apache/1.3.33 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.10 FrontPage/5.0.2.2634a mod_ssl/2.8.22 OpenSSL/0.9.7a
X-Powered-By: PHP/4.3.10
P3P: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE" policyref="www.somesite.com/w3c/p3p.xml"
Set-Cookie: hits=++622+; expires=Wed, 16-Mar-05 21:58:19 GMT
Location: http://www.example.com/index.cfm/action/cat/tp/cost
Connection: close
Content-Type: text/html

Content (Length = 104):
<meta http-equiv="refresh" content="0;url=http://www.example.com/index.cfm/action/cat/tp/cost">
Done
Elapsed time so far: 0 seconds
[edited by: lawman at 11:46 pm (utc) on Mar. 16, 2005]
[edited by: ciml at 10:15 am (utc) on Mar. 17, 2005]
[edit reason] Examplified [/edit]
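The dump above shows both mechanisms the first post describes: a 302 Location header and, as a fallback, a meta refresh in the body. Here is a minimal Python sketch of the "header interrogation" step that lists every redirect such a response carries. It assumes the raw response has already been captured as text (pasted from a header checker, for instance) and that the meta tag puts http-equiv before content, as in this dump; repeated headers such as multiple Set-Cookie lines simply overwrite each other. If you capture responses yourself in Python, http.client is a reasonable choice because, unlike urllib.request, it does not follow redirects.

```python
import re

# Matches <meta http-equiv="refresh" content="N;url=..."> (http-equiv before content).
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\s*(\d+)\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def inspect_response(raw):
    """Split a raw HTTP response into status code, headers and body; list the redirects found."""
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    status_line, *header_lines = head.split("\n")
    status_code = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    findings = []
    if 300 <= status_code < 400 and "location" in headers:
        findings.append((f"HTTP {status_code}", headers["location"]))
    match = META_REFRESH.search(body)
    if match:
        findings.append((f"meta refresh ({match.group(1)}s)", match.group(2)))
    return status_code, headers, findings
```

A response that reports both an HTTP 3xx and a meta refresh, like the one above, is the belt-and-braces pattern worth noting when you trace which URLs are pointing at your site.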
What gets me is why these unrelated pages in site:mysite are not updated. The cache shows an old, outdated copy of my homepage from last November, but my real homepage was updated in the cache last week.
Yes! I don't get this either. There are three urls that once pointed to my homepage via 302 that are STILL showing up under a site:mysite.com. These urls are NOT mine, and they have NOT redirected to my home page in 5 months! They were last cached on November 2. Until Google revisits the site, the bot will never know they no longer redirect to me. Google IS aware of this because I have emailed them 10 or 12 times with the same information. The fact that they let their cache go this stale is just pathetic. It has an influence on all other aspects of the search engine.
I see that something is going on. I'm not sure I like the new stuff better than Allegra. I'm doing well in both except that the new stuff has the DMOZ description instead of my own meta description. Also most of my competitors are missing. Good for me but bad for the users.