It seems to me that one of my subdomains has been penalized by Google. The evidence:
1. When I search for "subdomain.example", it is no longer at the top like it used to be, even though it has gained more links.
2. It is not at the top for its own title in quotes: "Subdomain keyword keyword keyword"
3. It is now at position 350 (+350 compared to last week) for a very competitive keyword, where it used to rank in the top 20.
My guesses at the reason for the ban:
- It contains links to my affiliate programs, but I did use the "nofollow" attribute on those links. Many other sites use this affiliate program, and they are still okay.
- Keyword density: it contains many repetitive keywords: A Hotel, B hotel, C hotel, D hotel... N hotel. So the word "hotel" is repeated a lot. I do not mean to spam Google, but I feel that those keywords are hurting me. Is that right?
I am not here to blame anyone. I just want to raise a question for everyone to discuss, so that we can learn something from my case and find the best solution to this problem.
Can anyone suggest what I should do? All the pages on my site link to this subdomain, and their rankings are still okay.
See Ted's comment in this thread, which otherwise doesn't apply to your problem...
Index page near bottom on site:example.com results
When you see this, sometimes it's a temporary oddity as Google moves data around. If someone is seeing this right now for the first time, or even seeing the complete absence of the domain root on a site: query - especially when it's never been an issue before and traffic is still the same - then I'd just wait. Google is really shuffling data at the moment and in that kind of a period, stuff often happens that gets quickly straightened out....
Others are noting missing home pages in the October SERPs thread. It's not far-fetched to imagine that the default address of a subdomain might exhibit the same problem right now.
Or, as you suggest, you may have a different problem. I'd look for problems, but I'd sit tight for a while before I made any changes.
I seem to have found the reason for the penalty: I recently found that a website copied 100% of the content on my subdomain (what I read here is proxy hack). My site is pure hand-written HTML, and I do not know how it could have been hacked.
Its ranking is now extremely bad, where it once was great.
Please tell me how to deal with this problem.
Thank you very much. I really need your help because I make my living from this site, and all of my income is generated from it. So please.
[edited by: tedster at 9:35 pm (utc) on Nov. 22, 2007]
[edit reason] spliced two threads together [/edit]
what I read here is proxy hack
A proxy hijack is not really a flaw with your server's security. It is a flaw in Google's current methods of indexing. There are extra steps you can take to short-circuit a proxy hijack. We had a long discussion about this recently and it's archived in the Hot Topics area [webmasterworld.com], which is always pinned to the top of this forum's index page.
See [webmasterworld.com...] for help on how to defend against proxy server urls hijacking your rankings.
I know which website proxy-hacked my site. It also created an account on Digg. How evil they are.
Has Google done anything to prevent this?
ANY public proxy server can be used by anyone to request content from any public webserver at all. The proxy server is transparent to this process and it does not host or steal your content. Your server is still hosting and supplying the content, and simply replying to the proxy server's request. The proxy server just passes your information back to the user agent that made the original request.
The problem is that the REAL GOOGLEBOT can crawl and index any public site through any proxy server.
The thread I linked to above discusses this problem. The first line of defense is to verify that googlebot user-agent requests are coming from an official googlebot IP address, as discussed here: How to verify Googlebot is Googlebot [webmasterworld.com]
Here's the information that Matt Cutts provided from Google's crawl team:
Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; eg:
> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.
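For anyone who wants to script the check described in that quote, here is a minimal sketch in Python. It is only an illustration under some assumptions: the function name `is_real_googlebot` and the two injectable lookup parameters are hypothetical, and it accepts only names ending in `.googlebot.com`, as the quote describes. It does the reverse lookup, checks the domain, then does the forward lookup and confirms the name maps back to the same IP.

```python
import socket

# Assumption: only names under googlebot.com are accepted, per the quote above.
GOOGLEBOT_SUFFIX = ".googlebot.com"


def _reverse(ip):
    # Reverse DNS (PTR) lookup: IP -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # Forward DNS lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(hostname)[2]


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if ip passes both the reverse and the forward check.

    reverse_lookup / forward_lookup are hypothetical hooks so the logic can
    be exercised without touching real DNS; they default to the socket module.
    """
    reverse_lookup = reverse_lookup or _reverse
    forward_lookup = forward_lookup or _forward

    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, or the lookup failed

    # Normalize: drop a trailing dot and compare case-insensitively.
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLEBOT_SUFFIX):
        return False  # reverse name is not in the googlebot.com domain

    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False  # the claimed name does not resolve

    # A spoofed PTR record fails here: the real name will not map back to ip.
    return ip in addresses
```

Calling `is_real_googlebot(request_ip)` with no extra arguments uses live DNS. Since DNS lookups are slow, a real server would probably want to cache the result per IP rather than look it up on every request.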
Getting proxy-crawled by the REAL googlebot is the situation that causes you ranking hijack problems at Google. But if the REAL googlebot requests your content through a proxy server, the IP address of the request will not match the googlebot user agent. Your server can detect this mismatch of IP address and the googlebot user agent and refuse the proxied request. That fixes the issue.
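The refusal logic itself is small. Below is a rough sketch in Python, not tied to any particular web server: `should_refuse` and `status_for` are hypothetical names, the `ip_is_verified_googlebot` callable stands in for whatever double DNS check you use, and answering with 403 is my own assumption about what "refuse" should mean.

```python
def should_refuse(user_agent, remote_ip, ip_is_verified_googlebot):
    """Return True when the request claims to be googlebot but the
    connecting IP does not verify as googlebot (the proxied case).

    ip_is_verified_googlebot is a hypothetical callable: IP string -> bool.
    """
    claims_googlebot = "googlebot" in user_agent.lower()
    if not claims_googlebot:
        return False  # ordinary visitors and other bots are left alone
    return not ip_is_verified_googlebot(remote_ip)


def status_for(user_agent, remote_ip, ip_is_verified_googlebot):
    # Assumption: refuse with 403 Forbidden; otherwise serve normally.
    if should_refuse(user_agent, remote_ip, ip_is_verified_googlebot):
        return 403
    return 200
```

When googlebot fetches your page through a proxy, your server sees the googlebot user agent arriving from the proxy's IP address, so the check fails and the proxied copy never gets your content.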
This problematic hijack process gets started because, somehow, googlebot got a proxy server url for your content and then crawled your site through the proxy. How did Google get those proxied urls? It can happen either accidentally (publicly open server logs, for example) or maliciously. But discovering how those proxy urls were provided to Google is not a fruitful defense. It also doesn't matter WHICH proxy server is involved.
If your server uses the googlebot verification technique outlined above (forward and reverse DNS look-ups), you will break the chain of events that causes confusion in Google's index.
There is a separate situation where someone scrapes your content and then actually hosts it on their own pages, often after some slicing and dicing. This is not a proxy hijack, although the impact on your rankings may be similar. The double IP verification of googlebot will not stop scraping, except in the case where the scraper request spoofs the googlebot user agent. There are many other ways to scrape content, however.
The best defense against scraped content outranking your original version is to build a strong and trusted website of your own. Scrapers have trouble outranking you, or getting your domain filtered out, unless your domain is already weakened in Google's view.