But the average site owner may not have the resources or understanding to investigate thoroughly. All they know is that their Google traffic went away.
But if you can discover that you've been hacked, the fix is straightforward:
One thing that hackers do is find sites to help distribute malware. This one should be easy to detect, because Google will post a warning notice in the SERPs "This site may harm your computer." This discussion [webmasterworld.com] covers the details of how to handle a malware hack.
One common footprint for a malware hack is an iframe that doesn't belong in your code - especially one with a lot of hex coding.
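As a rough illustration of that footprint, here is a minimal Python sketch that lists iframes pointing at hosts you don't recognize and long runs of hex-encoded characters in a page's source. The function name `scan_page`, the `trusted_hosts` parameter and the threshold of eight encoded bytes in a row are my own assumptions, not a standard. Treat it as a first-pass filter, not a malware scanner.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class IframeFinder(HTMLParser):
    """Collect the src attribute of every iframe tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.sources.append(dict(attrs).get("src") or "")


# Assumption: eight or more %XX or \xXX bytes in a row is worth a human look.
HEX_RUN = re.compile(r"(?:%[0-9a-fA-F]{2}|\\x[0-9a-fA-F]{2}){8,}")


def scan_page(html, trusted_hosts=()):
    """Return (iframe sources on unknown hosts, hex-encoded runs in the source)."""
    finder = IframeFinder()
    finder.feed(html)
    unknown = []
    for src in finder.sources:
        host = urlparse(src).hostname
        # Relative sources have no host and stay on your own site, so skip them.
        if host and host not in trusted_hosts:
            unknown.append(src)
    return unknown, HEX_RUN.findall(html)
```

Run it over the HTML your server actually sends (not your local copy of the template), and list in `trusted_hosts` only the third-party hosts you deliberately embed.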
These are really "old school" - they're more like online graffiti than anything else. The hacker usually just wants to brag that they got you, and they put up a message on your pages for all to see. Well, that's easily detected because you just go to your pages and there it is!
But as I said, this is old school and many hackers are looking for something with some financial value these days.
This one is either done for sheer malicious delight, or perhaps for competitive disruption. How often do you check your robots.txt file? If someone replaced the first line and disallowed all indexing, how fast could you catch that?
In addition to visually inspecting your robots.txt file on a regular basis (and especially if your urls start disappearing from the Google index) you can also set up a Webmaster Tools account and check it regularly. Google will report to you when urls get blocked by robots.txt.
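The visual inspection can also be automated. The sketch below uses Python's standard `urllib.robotparser` to report which of your important paths a given robots.txt would block for googlebot. The function name `blocked_paths` and the sample paths are made up for illustration. It takes the file's text as a string, so you would fetch your live robots.txt yourself and pass it in.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that this robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Example: a robots.txt that a hacker has changed to disallow everything.
hacked = "User-agent: *\nDisallow: /\n"
print(blocked_paths(hacked, ["/", "/products/"]))
```

Run from a daily scheduled job, with an alert whenever the returned list contains a path you expect to be indexed, this should catch the change well before your URLs start dropping out.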
This one is sneakier and depends on the value of backlinks, either for PageRank or for the traffic itself. The hacker places links on your pages (they may be hidden through various means) and you may not be inspecting your content closely enough to see those links.
The tool you need is a link checker, such as Xenu LinkSleuth, that can give you a report on all your external links. You are careful about who you link out to, right? So anything really bogus is going to jump out at you from that list. Running a link checker on a regular basis has many other benefits as well, such as keeping those accidental 404s out of your site. So I consider it to be something like getting a regular physical (but I recommend doing it more often.)
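If you want to see what such a report boils down to, here is a small Python sketch that pulls every link out of one page's HTML and lists the outside hosts it points to, hidden or not. It is not a replacement for a real link checker (it does no crawling and no status checks), and the names `LinkCollector` and `external_hosts` are just illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, whether or not it is visible."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_hosts(html, page_url):
    """Return the sorted list of hosts, other than your own, that the page links to."""
    own_host = urlparse(page_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    hosts = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(page_url, href)).hostname
        if host and host != own_host:
            hosts.add(host)
    return sorted(hosts)
```

Because the parser reads the markup rather than the rendered page, links tucked inside a `display:none` block show up in the list just like visible ones.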
Now we're really getting devious. Over the past year or more, hacks have been showing up that cloak their parasite content so that only googlebot sees it. If you visit with a regular browser (user agent) you only see what you expected to see.
Your main tool here is a user-agent spoofer of your own, such as the User Agent Switcher extension for Firefox. Just fire it up with a googlebot user agent string and see if your page content changes.
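The same check can be scripted. This is a minimal sketch, assuming plain HTTP fetches with Python's `urllib.request`: it requests a page once with a browser user agent and once with a googlebot user agent string, then lists the lines that only the googlebot version contains. The helper names (`fetch_as`, `lines_only_bot_sees`, `check_url`) and the browser string are my own choices.

```python
import urllib.request

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_as(url, user_agent):
    """Download a page while announcing the given user agent string."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def lines_only_bot_sees(browser_html, bot_html):
    """Return the non-empty lines that appear in bot_html but not in browser_html."""
    browser_lines = {line.strip() for line in browser_html.splitlines()}
    extra = []
    for line in bot_html.splitlines():
        stripped = line.strip()
        if stripped and stripped not in browser_lines:
            extra.append(stripped)
    return extra


def check_url(url, fetch=fetch_as):
    """Fetch url with both user agents and report what only googlebot is shown."""
    return lines_only_bot_sees(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))
```

Pages with rotating ads or timestamps will show some harmless differences, so read the output rather than just counting lines. Also note this only catches cloaking keyed on the user agent string.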
Complex Cloaking - using IP and cookies
This is getting deep - and it's also not so common, but it is out there "in the wild." The hacker in this case places complex scripting on your site so that not only do they cloak for googlebot by user agent, they also cloak by IP address. In some cases the script also places a cookie so you get only one chance to see what they're doing.
And your tools here are 1) learning how to browse your site with cookies turned off and 2) studying your server logs for what your server replies to googlebot.
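For the log study, something like the following Python sketch can do the first pass. It assumes your server writes the common "combined" access log format (request, status, size, referrer, user agent) and counts which status codes googlebot received for each path. The regular expression and the name `googlebot_responses` are assumptions you will probably need to adapt to your own log layout.

```python
import re
from collections import Counter

# Assumes the "combined" log format; adjust the pattern if your server logs differently.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_responses(log_lines):
    """Count (path, status) pairs for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[(match.group("path"), match.group("status"))] += 1
    return counts
```

Redirect statuses (301, 302) served to googlebot for pages that return 200 in your own browser, or response sizes that differ sharply between the two, are the kind of thing to look for. Keep in mind the user agent in a log line can be faked by anyone, so this tells you what your server sent to requests claiming to be googlebot.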
Cloaked Redirects - .htaccess hacks
Google's John Mueller (JohnMu) has just made an excellent blog post about this. I'll refer you to him:
The first symptom that you would see is hard to interpret: URLs from the website are just not indexed anymore...
When you submit a Sitemap file, Google will show warnings for URLs that redirect. By design, you should be listing the final URL in your Sitemap file, so if the URL is redirecting for our crawlers (as in this case), we’ll show a warning in your account.
I urge you to read JohnMu's entire article [johnmu.com]. He's offering a lot of help here.
Some of the sneakiest hackers have used various kinds of DNS tricks. Over two years ago we discussed this rare but still possible problem in this thread [webmasterworld.com].
If your traffic totally dries up, you would hit the panic button pretty quickly - so these hackers have been more clever than that. With DNS tricks they might syphon off only 20% of your traffic. One thing you would see is a traffic drop with no corresponding drop in rankings.
There's been some good effort here on the part of the DNS servers to get more secure from this type of thing, but it's still worth mentioning as a potential. The moral is to check your DNS settings and fix any warnings you get. It might seem like a foreign language to you if you never waded into these waters before, but it's worth climbing the learning curve - especially if your traffic is evaporating. However, it's something that I wouldn't suspect until I ruled out all the rest of the hacks I listed above.
It might be an employee, too
Sorry to say, it's not always an external hacker. Sometimes a person you trusted with server access gets greedy and places parasite links to earn some cash on the side. We've had such reports here, and it even happened at Google a few years back.
Don't get crazy about this possibility, but if you do find junk on your server and there's no real sign of an external hack - then consider who you might have given server access to. This is one solid reason always to change passwords (to strong ones) when anyone leaves the company, or when your contract is over with anyone who had access. Even great companies sometimes hire a bad apple.
I'm hoping I've been clear enough about the hacks I learned about over the past couple years. Has anyone got any details to add or clarify, or maybe a type of hack problem that I missed?
Fortunately I have a reseller account so I was able to totally remove the entire site, the site's hosting and recreate the hosting for the site and rebuild the entire site over a weekend. Luckily the site was small. After the new version was up, written only in HTML & CSS, I kept constant watch as these guys and/or their friends were still trying to view their work or perhaps even try to do more. They eventually gave up.
I'm no longer using any CMS package for my sites other than WordPress which I keep updated.
How are you supposed to know? The code is well hidden and launches in the background with IE. It does NOT launch with firefox but if it has been launched already it will replace ads on firefox too.
The best way to test for this, because there really is no other trace of it (yes, it's that fast and smooth and kills cookies etc) is to visit your yahoo email and hover over an ad momentarily. If your computer is infected the ad will show in the nav bar as being served by a third party server, not by yahoo. The ads are high quality from various sources (except for the "get free icons" ad that Yahoo would never show).
This type of infection steals website owners' money without touching their sites and without doing anything nasty to your computer.
Long story short, I LUCKILY was reviewing my command history file one morning and happened to see some strange commands in there from JUST THE NIGHT BEFORE!
After reviewing some log files, I found they had obtained root access a few hours before... They were logged in for a total of 10 minutes. During that time they installed a script on my website that gave them file browsing capability via the browser (and edit), along with some other tools for accessing my database, etc. They then poked around my website scripts, viewing various files that seemed interesting (like login scripts, etc).
Then they attempted to clean up behind themselves and log off. It was all a bit terrifying! I immediately changed every password on the server, and went into full lockdown mode on my SSH access. Previously I had not locked it down to be accessible via only my IP since I am on the move so much. Well, this convinced me that I had to lock it down. So now I use a dynamic IP service (dyndns.org) to help me still SSH to my site when I'm moving IP addresses.
I spent weeks combing my site, logs, scripts, files, cron jobs, htaccess, config files, startup scripts, etc looking for something they may have left behind. So far I have not found anything. I'm hoping they were just being curious and planning to come back another day to do damage... since they didn't seem to do any visible damage this time.
BUT YOU NEVER KNOW FOR SURE since they did obtain ROOT access, they could have done anything and simply led me to believe they were just browsing around. I still don't know for sure to this day. My ISP said I should buy a new server and migrate over and have the existing one wiped clean. I am not in a position to do that since I have years invested in setting this one up just the way I need it...
Unfortunately that may be a decision that comes back to haunt me... let's hope not.
But the moral of this story... getting your server hacked is a very real issue. Do not laugh at hackers that are trying to brute force their way in... they can (and will) eventually get in. Take steps to block them as soon as you see it happening... which was my biggest mistake. I figured my random password was unbreakable. Well they broke it.
It seems you're aware already that once they've got root they can modify logs to look like whatever they want, change what commands do (ssh may not ssh anymore. It may send your password to them, then ssh). And a host of other nasty things.
I wouldn't be able to sleep knowing someone had root at one time - it's just unknowable what they've left behind. And as your ISP noted, the only real fix is backup, wipe, and reinstall.
I am currently trying to compile a list of hacker access methods, to understand better what to avoid/check/secure, i.e. how the hacker manages to gain access in the first place, to do the stuff in Tedster's list. Here is my list so far (any additions or amendments welcome):
- XSS - 'Cross site scripting'
Solution: Escaping and filtering in the form script.
- 3rd party apps not updated / patched, e.g. WordPress, AWStats
Solution: don't use 3rd party apps if you can avoid it. Keep them updated.
- Can you view a password folder or private content in the browser?
Solution: protect with htaccess
- Hosts using 3rd party apps with vulnerabilities (e.g. cPanel, VDeck)
Solution: don't use such hosts.
- FTP with default passwords
Solution: Strong password + Change it every few months.
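For the XSS item in the list above, "escaping" means turning the characters that HTML treats as markup into harmless entities before user input is written back into a page. Here is a minimal sketch in Python using the standard library's `html.escape`. The function name `render_comment` is hypothetical, and a real form script would also need validation suited to where the value ends up (attribute, URL, script context and so on).

```python
import html


def render_comment(user_text):
    """Wrap untrusted text in a paragraph, escaping &, <, > and both quote characters."""
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


# A script tag submitted through a form comes out as inert text.
print(render_comment("<script>alert(1)</script>"))
```

The point of escaping at output time is that whatever got stored, the browser only ever receives it as text and not as markup it will execute.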
Many site owners completely forget about that. For an attacker this approach is superior because:
1. Site owner trusts his site or his accounts. Active content filtering can be completely off.
2. With his hijacked browser he now accesses his admin, cPanel, database, external accounts, mail, you name it.
Ex: when you log in to your Google accounts, don't you have active content/scripts running? Otherwise how are you going to see all those nice calendars, maps and analytics results?
And you may see a warning about it from your anti-virus program a month later. But by then it's too late.
So in essence the site owner may give full control to the attacker. It's ironic but common.
Sent three reinclusion requests to G through Webmaster Tools since this summer, no response and still virtually nonexistent in Google. Years of hard work flushed down the drain.
I've gotten a lot of stickies and mails throughout the past few years claiming this or that "incident" with some "offending site" or with Google; and a very large percentage of the observed ranking problems could be traced back to bad configuration of the site in question.
I'm not using the word "hacking" as my first choice though, as a lot of these symptoms can be caused by ignorant or clumsy webmasters without any interference from third parties at all. Or, even webmasters trying to "cheat a little" and blame somebody else. No names, no examples.
1. Ability to crawl with any user-agent (googlebot is of a special interest, of course)
2. Respects robots.txt (otherwise it crawls too many pages that I absolutely do not care about)
3. Respects the "nofollow" tag, and reports the "nofollow" tag when found
Xenu lacks these features, so it does not let you discover many of the potential security problems that you mentioned above, e.g. cloaking.
I read somewhere that 304 is our friend, but I am not sure about this. Seeking help.
I found out that one of my blog posts indexed by Google has many pages with the same URL but with query strings. All my blog posts have an extension of .htm, but the rest have .htm?reyf=140 and a series like .htm?reyf=150, .htm?reyf=152 and .htm?reyf=67, and otherwise the URLs are identical. What is troubling is that those pages have different titles, mostly with words like c-a-s-i-n-o and l-o-t-t-e-r-y, and when I click the link from the Google search, it redirects to another URL.
I think this caused a drop in keyword rankings and a sudden dive in my traffic. Has anyone experienced this?
[edited by: tedster at 2:41 pm (utc) on May 3, 2009]
[edit reason] moved from another location [/edit]
I ran an experiment. I removed wp-config.php to test whether the URL with .htm?reyf=140 would show an error, because without that file no URL on my blog should work. My whole blog stopped working, but that URL still works and redirects to an anti-virus site.
I'm in the process of reinstalling my WordPress and I discovered something. It was actually hacked because of a WordPress vulnerability (it was the new feature of WordPress I'm talking about). I'll come back to post the details after I fix it.
So to cut the story short, the hacker uploaded about 3,900+ files with strange filenames to my wp-includes>js>tinymce>plugins>inlinepopups>skins>clearlooks2>img folder. Deleting those two directories, wp-admin and wp-includes, and uploading them again from a fresh copy solved the problem.
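One way to spot uploads like that without clicking through every folder is to compare the live directory tree against a fresh copy of the same release. The sketch below is only an illustration, assuming you have both trees on one machine (for example a downloaded backup of the live site next to an unpacked fresh install). The function names and example paths are made up.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            full_path = os.path.join(folder, name)
            found.add(os.path.relpath(full_path, root))
    return found


def unexpected_files(live_root, clean_root):
    """List files that exist in the live tree but not in the clean copy."""
    return sorted(relative_files(live_root) - relative_files(clean_root))


# Hypothetical usage: compare a backup of the live site with a fresh download.
# for path in unexpected_files("backup/wp-includes", "fresh/wp-includes"):
#     print(path)
```

This only finds added files. A file the hacker modified in place keeps its name, so a fuller check would also compare contents (for example by hashing each file), and replacing the directories from a fresh copy is still the safer fix.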