Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which hold your website's information in their databases, feed your page snippets, without your permission, to vast numbers of skyscraper sites. A carefully adjusted PHP-based redirection script then goes to work: it issues a 302 redirect to your site and includes an affiliate click checker. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
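For anyone without a header interrogation tool, the check it performs can be sketched in a few lines of Python. This is a minimal sketch, assuming you have already fetched the suspect URL with a client that does not follow redirects and have its status code, headers and body in hand; `classify_response` and the example URLs are hypothetical, not part of any existing tool. It reports either a server-side redirect (with its Location header) or a meta refresh buried in the HTML.

```python
from html.parser import HTMLParser

REDIRECT_CODES = {301, 302, 303, 307, 308}


class _MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() == "refresh":
            self.refresh = attr.get("content", "")


def classify_response(status, headers, body=""):
    """Return (kind, target) for an already-fetched response (hypothetical helper)."""
    headers = {k.lower(): v for k, v in headers.items()}
    if status in REDIRECT_CODES:
        # Server-side redirect: the target is in the Location header.
        return f"http-{status}", headers.get("location")
    finder = _MetaRefreshFinder()
    finder.feed(body)
    finder.close()
    if finder.refresh is not None:
        # content looks like "0; url=http://..."; the target follows "url=".
        _, _, rest = finder.refresh.partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            return "meta-refresh", rest[4:].strip().strip("'\"")
        return "meta-refresh", None
    return "none", None
```

A 200 response is not automatically clean: the sketch still parses the body, because a meta refresh sends the visitor (and the bot) onwards even though the headers look normal.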
Googlebot and MSNBOT follow these PHP scripts either to an internal sub-domain containing the 302 redirect or to the server side, and “BANG”, down goes your site if its PageRank is below that of the offending site. Your index page is crippled because Googlebot and msnbot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be of much help in tracing it for a long time. Note that these scripts mostly apply your URL stripped, i.e. without the www, which makes detection harder. This also causes Googlebot to generate another URL listing for your site that can be seen as duplicate content. A 301 redirect at least resolves the short-URL problem, relieving Google of having to decide which of your site's two URLs to rank higher (a choice more often decided by the higher linked PageRank).
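The 301 fix for the stripped (non-www) URL is normally done in your server configuration; the decision it makes can be sketched like this. A minimal Python sketch, assuming www.example.com stands in for your preferred hostname and that `canonical_redirect` is a hypothetical helper, not a server API: requests on the bare domain get a permanent redirect to the www version with the same path and query, and everything else is left alone.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # placeholder: your preferred hostname


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target) if url uses the bare (non-www) host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == bare and host != canonical_host:
        # Keep scheme, path and query; an empty path becomes "/".
        # Note: any explicit port is dropped in this simplified sketch.
        target = urlunsplit((parts.scheme, canonical_host, parts.path or "/", parts.query, ""))
        return 301, target
    return None  # already canonical, or not our site at all
```

Using 301 rather than 302 matters here: a permanent redirect tells the engine to drop the short URL and keep one listing, which is exactly the duplicate you want to get rid of.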
Your only hope is that your PageRank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of them. This is reinforced by hundreds of other hidden 301 permanent redirects to sites of PageRank 7 or above, again in the hope of stripping a high-PageRank site. That would then let their scripts hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main revenue target. Though I am sure Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, a hidden WHOIS and no telephone number. Even if you do contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.
There is no point in contacting GOOGLE or MSN, because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites of PageRank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach PageRank 4 or 5, without needing to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts can change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google's index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that timeline.
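To spot URLs of this kind in an inurl: listing, you can check whether a URL lives on someone else's host but carries your domain in its query string (the %2F is just an encoded slash). A rough Python sketch, assuming `embeds_domain` is a hypothetical helper and that a plain substring match is good enough; it can give false positives (notyoursite.com contains yoursite.com), so treat a hit as something to inspect by hand.

```python
from urllib.parse import urlsplit, parse_qsl


def embeds_domain(url, domain):
    """True if url is on another host but carries `domain` in a query value."""
    # inurl: listings often omit the scheme, so add one before parsing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    domain = domain.lower()
    if host == domain or host.endswith("." + domain):
        return False  # a URL on your own site, not a third-party one
    # parse_qsl percent-decodes values, so "yoursite.com%2F" becomes "yoursite.com/".
    for _, value in parse_qsl(parts.query, keep_blank_values=True):
        if domain in value.lower():
            return True
    return False
```

For example, `embeds_domain("new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F", "yoursite.com")` flags the URL quoted above, while links on your own host are ignored.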
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages within an offending site, including sub-domains, and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So in essence it is a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully keep creating requests and bombarding a site that generates dynamic pages through PHP, ASP or CGI redirecting scripts. A fed SKYSCRAPER site can have its server totally occupied by a single efficient spider that requests pages every split second, throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of a 404 or 403 from the offending site for Google's URL console to detect what it needs: either the site or the damaging link.
I hope I have been informative and can help anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the offending links are removed. Talking to offending site owners often results in them denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links... Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
IF G is TRULY in the process of fixing the problem, after 18 months, there really is no further need to ponder the 'problem'.
>The direct consequence is that ...and no longer get spidered.
Have you TRIED resubmitting them recently?
However, it is my supposition that I had a domain name server glitch back in January, and that this failure gave the sites redirecting to mine added weight, so Google saw those sites as my sites. The direct consequence is that my top four sites are now in Google's supplemental index in a state of perpetual dormancy and no longer get spidered.
Sounds plausible. The site of mine that lost its home page does go down every now and then.
My other site, which has had 1 subpage 302'd in the index, hasn't gone down as far as I know, though at heavy traffic times it may have problems.
Compliance = (Adwords.budget.profit – company.size/competitor.Adwords.budget)
[google.com...]
The backbone is the webmasters who affect the compliance, AdWords or not. If you do evil, everyone should make you go broke; you should not be able to make your competitor go broke. A 302 should only affect the pages inside your own domain structure. Otherwise I could borrow $500,000 to invest in 5 “nuts” to bring down all of my competition, and officially be sued for a monopoly on selling widgets on the internet thanks to the software compliance of the major SEs.
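The rule proposed above (a 302 should only carry weight inside your own domain) can be sketched as a simple same-site check. This is a hypothetical illustration of that proposal, not how Google or MSN actually treat 302s; `_site` and `keep_source_url_for_302` are made-up names, and stripping a leading "www." is a simplification (a real check would need the Public Suffix List to find the registrable domain).

```python
from urllib.parse import urlsplit


def _site(url):
    """Lowercased hostname with a leading 'www.' stripped (simplified)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def keep_source_url_for_302(source, target):
    """Proposed rule: only list the target's content under the redirecting
    URL when both URLs are on the same site; a cross-site 302 would instead
    be credited to the target URL."""
    return _site(source) == _site(target)
```

Under this rule a goto.php-style script on someone else's domain could never have your content indexed under its own URL, while your own internal temporary redirects would keep working as they do now.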
Duplicate content should only count if... they make the rule...
To the people at Google: I want to see a listing of all URLs that point to my site, not just the most important ones as “you” think. I also want to see the exact number of pages indexed on my site (all 4714 of them), with the exact URL indexed, the cached copy and the correct time when you were here (not 1969). I want to see all web pages that have my domain name in them and all sites that advertise on my domain name, regardless of GEO content delivery. (I just got off the phone with my ex-advertiser: she is in LA, I am in NJ. I see her AdWords ad, she does not; she calls her NY office, and they don't see it either. I say go to Google's software to check, she says “Ohh.”) I want to know at all times when someone advertises on my domain name in their keywords. I AM DA OWNER; YOU MAKE MONEY.
Do I depend on Google? NO.
Then why would I comply? You come to my site, get content, then say it's good for nothing. Last year we spent more than 2 cents per URL in G*Index.
My 2 cents again.
But I would really like to see evidence for the "the URL with higher PR wins" theory.
I want to say that some posts here use really confusing terminology, like 'syntax'. Syntax is the set of rules describing how to form sentences in a language. I also try to avoid the term 'page', because nowadays, with all the dynamically generated content and redirects, it can become very fuzzy. I prefer the terms URL and content.
Now, if there were something you could do about this as a webmaster, I would know it, and I would have posted it. Not in this thread, but more than a year ago.
In this thread, the best bids so far seem to be:
I personally very much doubt that any of these will fix the problem for a page that has already been hit, but OTOH they certainly will not damage you, so if you feel like trying them you will lose nothing by doing so.
This can only be fixed at the source of the problem, which is neither you nor the page doing the hijack - the source of the problem is Google, MSN and whatever other engine that mixes this up.
Although Google is no good on the thread topic, it still performs well in other respects
- On most DCs we're back on page one for our site name. I even saw it at #1 in a live google.com search. (A pleasant surprise after a month on pages 3 to 11.) The surrounding results are much more relevant than the junk that had been showing up post-Allegra.
- 302s previously ranking for our site name seem to have been beaten down in the SERPs, but still in the index. (But we're still being topped for inurl:domain.com by a site that's framing our home page...that may be a whole other problem.)
- A number of old URLs that I had 301'd and successfully deleted with the remove URL tool have found their way back into the index (hmm...rollback?), though mostly as URL-only or supplemental results.
Fingers crossed...
I am in dispute with Alexa about 2 of their links in Google that contain my URL.
xsltcache.alexa.com/traffic_graph/js/g/a/3m?amzn_id=typicallyspan-20&url=http://www.mysite.com
www.alexa.com/data/details?amzn_id=typicallyspan-20&url=http://www.mysite.com
One is a download and the other goes to the popularity page for my site, with no description, just a long address. Both are displayed in a Google inurl: search, but my actual URL is missing from the results. This is simply not on, and I am sure the existence of the Alexa links above is damaging to my site.
As yet there has been no reply from them. I am sure my URL was picked up because their 302 is somehow involved in these links; I just cannot work out exactly how, but I am working on it. I will give Alexa one last chance to deactivate those links, or I will retaliate with my spare server.
My site should appear in the results for inurl:; it does not, and I cannot believe that all is OK. Its name is unique, and I do not expect anything other than my website to appear in an inurl: result.
They are completely useless as Google results, but there they are nevertheless. This only happened because Googlebot deemed they should be.
These are the last 2 links I am trying to remove from Google for one of my sites, which disappeared after being number 1 for many keywords for 2 years, when the go-php hijackers killed it. 3 meta refreshes were pointing at the same site that contained my index page cache. One scraper site closed down pending a fraud investigation: anything you clicked on that site made the webmaster money. It had no e-mail and its WHOIS was private. But as you know, it's easy to bypass private WHOIS.