There must be something within the site that Google really doesn't like. I would check and double-check all your content carefully, then ask Google to do another manual review.
My gut feeling is either you still have some nasties left over from your last hack, or someone complained when it was hacked, followed by someone at Google deciding to block your sites.
(how do we get Google to review it? did it get removed from webmaster tools? as I can no longer remember/find it)
I keep tripping over references (here's one) [google.com] regarding Panda being tasked with de-indexing sites as well as ranking them. Can this be true? Does anyone have any idea what they're talking about?
This could be true, but then Google is really lacking in the follow-up department! Six months passed between the break-in/fix and the de-indexing. Still, I have sites that have been de-indexed yet were never hacked into (located on a different server).
|My gut feeling is either you still have some nasties left over from your last hack, or someone complained when it was hacked, followed by someone at Google deciding to block your sites. |
A couple of things come to mind and are usually the case. Don't immediately assume a penalty. Check these things first:
Double-check your robots.txt and make sure you aren't blocking them.
Check the settings in WP; make sure you didn't accidentally check the box that makes the site invisible to search engines.
Check your server settings to make sure you aren't blocking the bots in some way.
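For the first item on that checklist, Python's standard library can parse a robots.txt and tell you exactly which bots a given rule blocks. A minimal sketch, using a made-up robots.txt body for illustration (substitute your own file's text):

```python
from urllib import robotparser

# Hypothetical robots.txt content -- paste in your real file to test it.
sample_robots = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(sample_robots.splitlines())

# A block like the one above silently bans Googlebot from the entire site,
# while every other crawler is only kept out of /admin/.
print(rp.can_fetch("Googlebot", "/some-page.html"))  # False: blocked
print(rp.can_fetch("Bingbot", "/some-page.html"))    # True: allowed
```

A stray `Disallow: /` under a `User-agent: Googlebot` section is exactly the kind of thing that looks fine at a glance and still wipes a site out of the index.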
Other than that, I would need a little more info:
Did you change domains? IPs? Domain registration?
Did you make any recent changes to the site? URL changes?
Have you been building links?
How many visits do you normally get from Google?
How did you close the hole that led to the hack? My worry here is that a back door is still open and perhaps the hack is not so obvious this time. Try accessing your site from a different IP than you typically do. Does it all look the same?
My point is, maybe what you see is not what everyone else sees.
@ponyboy96: Thanks for your suggestions! I do hear what you're saying, but I'm pretty darn sure it is a penalty (ban) this time around. Not only am I not blocking them, but Googlebot has been the single most bandwidth-wasting visitor on my sites these last two days: they send no visitors and yet download 50,000+ pages per day. I used WMT to "Fetch as Googlebot" and it was able to get there just fine.
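Claims like "Googlebot is my most bandwidth-wasting visitor" are easy to verify from the access logs. A rough sketch that tallies requests and bytes by user agent; the log lines here are hypothetical Apache combined-format examples, and the regex is a simplified parse, not a full log grammar:

```python
import re
from collections import Counter

# Hypothetical Apache combined-log lines for illustration only.
log_lines = [
    '66.249.66.1 - - [21/Aug/2011:10:00:01 +0000] "GET /t/123 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '93.184.216.34 - - [21/Aug/2011:10:00:02 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [21/Aug/2011:10:00:03 +0000] "GET /t/124 HTTP/1.1" 200 4800 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Matches: "REQUEST" status size "referer" "user-agent" at end of line.
pattern = re.compile(r'"[A-Z]+ \S+ [^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')

hits, bytes_sent = Counter(), Counter()
for line in log_lines:
    m = pattern.search(line)
    if not m:
        continue
    status, size, agent = m.groups()
    label = "Googlebot" if "Googlebot" in agent else "other"
    hits[label] += 1
    if size != "-":
        bytes_sent[label] += int(size)

print(hits["Googlebot"], bytes_sent["Googlebot"])  # 2 requests, 9920 bytes
```

Run over a real day's log, this quickly shows whether the 50,000+ pages/day figure holds up and how much bandwidth the crawler is actually eating.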
@maximillianos: you make all great points. In fact, I cannot be 100% sure there is no backdoor. I will keep looking, of course, including coming in from a different IP, which is a great idea. But if the hacker is still there, (s)he is quietly sapping resources other than HTTP - I am constantly checking HTML code, logs, and WMT crawl errors, and nothing suggests that the HTML of my pages has been modified in any way. Additionally, none of the sites hit with the ban and hosted on this server are listed in any spam database. Besides, I have banned sites on another server that has never been hacked into (to the best of my knowledge, of course).
I still harbor some hope that the issue is technical but I'm not getting any closer to finding it.
|regarding Panda being tasked with de-indexing of sites |
That is user language you linked to, not a Google spokesperson. People struggle with Panda-related language, because Google does not consider it a "penalty." It's not easy to come up with exact language when Google doesn't hand it to us.
The best I've been able to come up with is something like "demote" or "devalue"... something like that.
Has anyone with a Panda devalued site noticed previously indexed pages being completely de-indexed?
In my case I can see a small but noticeable 15-20% drop in the number of search queries shown in WMT for each of the banned sites approx. two weeks before the ban. The date does not jibe with the official Panda schedule, though - the drop happened on Aug 4th, the ban on Aug 21st.
|Has anyone with a Panda devalued site noticed previously indexed pages being completely de-indexed? |
"WMT account shows some 250,000"
Just one thing: 250,000 pages! Is that all well-written content? Tell us a bit more - what niche?
It's a forum. Some content is good, some is rubbish. Utter spam and obscenities are normally deleted. Some tiny amount may still be there. Nothing different from any other forum you've seen out there. Maybe not as tightly moderated as WebmasterWorld but cleaner than many that are still indexed.
|is that all well written content? |
Since you have UGC, check to make sure all the user-generated outbound links are nofollowed. If they used to be nofollow, and then something went wrong, and they suddenly all changed to followed links, that might have triggered problems (suddenly linking to tons of bad neighborhoods, or giving the impression of paid links).
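That nofollow audit can be automated with the standard-library HTML parser. A minimal sketch that flags outbound links missing `rel="nofollow"`; the host name and the UGC snippet are hypothetical stand-ins:

```python
from html.parser import HTMLParser

SITE_HOST = "forum.example.com"  # hypothetical -- your own domain here

class OutboundLinkAudit(HTMLParser):
    """Collect outbound <a href> links that lack rel="nofollow"."""
    def __init__(self):
        super().__init__()
        self.followed_outbound = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href", "") or ""
        rel = (attrs.get("rel", "") or "").lower().split()
        external = href.startswith("http") and SITE_HOST not in href
        if external and "nofollow" not in rel:
            self.followed_outbound.append(href)

# Hypothetical user-generated snippet for illustration.
html = '''<p>See <a href="http://spam.example/buy" rel="nofollow">this</a>
and <a href="http://other.example/page">this one</a>.</p>'''

audit = OutboundLinkAudit()
audit.feed(html)
print(audit.followed_outbound)  # ['http://other.example/page']
```

Feeding it a crawl of your own forum pages would quickly show whether a template change silently turned thousands of UGC links into followed ones.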
I agree with dazzlindonna. And another question: what is the niche of the forum?
@dazzlindonna: it could be disastrous, indeed. But I don't convert URLs into proper HTML links at all. Maybe (just maybe) Google started to count URLs in text format as another form of referral (a sort of Link Lite)? That would be crazy indeed. Maybe I should start obfuscating URLs rather than leaving them as plain text.
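If it came to obfuscating plain-text URLs in posts, a one-pass regex substitution would do it. A sketch using the common "hxxp" defanging convention; the regex and sample post are illustrative, not a robust URL grammar:

```python
import re

URL_RE = re.compile(r'https?://\S+')

def defang(text):
    """Break plain-text URLs so they no longer read as URLs
    (hxxp:// is a common defanging convention)."""
    return URL_RE.sub(lambda m: m.group(0).replace("http", "hxxp", 1), text)

post = "Check out http://spam.example/offer for details."
print(defang(post))  # "Check out hxxp://spam.example/offer for details."
```

Applied at render time rather than in the database, this would leave the stored posts untouched and be easy to switch off again.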
@map1: it does not matter. I have sites banned that span several completely unrelated niches. Some are competitive, some are absolutely empty. It's not about the niche, I don't think...
You have done a 'Fetch as Googlebot' in WMT to check they are getting the sites correctly, right?
It's a bit of a stretch, as you lost multiple sites on multiple hosts, but Gbot not getting a full load on a page request would explain removal, although not overnight.
Worth checking though :)
Do you mean as in incomplete HTML - i.e., Gbot gets no closing body and html tags, but actual users get the full page with some sneaky JS at the bottom? I guess I can see how this could have been abused.
|but Gbot not getting a full load on a page request would explain removal |
In my case, when I do "Fetch as Googlebot" I get a complete page; all the HTML code is there and not altered. I was looking for a sign of a break-in, so it's word for word what was expected. I tried the homepage and one random content page.
Thanks for your input! I'm still looking for ideas ...
Any chance these sites would fit into Google's definition of scraper sites? Since they are currently chasing that problem, perhaps they are testing their scraper-fixing algo on your sites.
I meant that the site's pages could be returning a pagesize of 0 and no content to G. :)
I have a site that is doing that atm, having shifted hosts - normal visits get the code fine, and other sites on the host are fine.
The symptom was the number of pages indexed dropping day by day.
No freaking idea what's happening, but the code is really old, so I did a wget of the site, overwrote it with static copies for the interim (which are loading to G fine), and am rewriting the site. It was due anyway; it's been a maintenance problem with its crappy old code.
It occurred to me there was a small possibility that had happened to you, although unlikely from your description.
(Please don't derail the thread for my situation - I have it under control and don't need advice! :))
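The zero-pagesize symptom described above is easy to test for directly: request the page with a Googlebot User-Agent and look at the status code and byte count. A self-contained sketch that spins up a throwaway local server deliberately returning an empty body, so the check can be demonstrated without touching a real site (everything here - host, handler - is illustrative):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class EmptyBodyHandler(BaseHTTPRequestHandler):
    """Stand-in for a broken host: responds 200 but sends no content."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()  # body intentionally empty

    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), EmptyBodyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch_size(url, user_agent="Mozilla/5.0 (compatible; Googlebot/2.1)"):
    """Return (status, body length) for a URL fetched with the given UA."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status, len(resp.read())

status, size = fetch_size(f"http://127.0.0.1:{server.server_port}/")
print(status, size)  # 200 0 -> the page "loads" but delivers no content
server.shutdown()
```

Pointed at a real page (and compared against a normal browser User-Agent), a 200 with zero bytes would confirm exactly the failure mode described above.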
Have you thought about moving your banned domains to different hosts? It's possible that there are black hat webmasters on your host, and your sites were banned as being located (ip address-wise) near a number of banned sites.