Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

99% traffic drop from Google after site hack

         

Logitheque

2:20 pm on Dec 24, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hello,

This Christmas season, Google has made our website <snip> disappear from the Internet, after 20 years of existence.

Our site has significant volume: several million visitors per year, with peaks of several tens of thousands of visitors per day.

Unfortunately, we were hacked 2 months ago, a victim of the well-known "Japanese keyword hack".

We did everything necessary to fix it: installed a security solution, switched to a secure CDN, locked down permissions, removed the injected content, cleaned the database, removed some rotten links, did a complete update, etc.

By November 13th the site was operational, cleaned, and relaunched; we were confident, and everything seemed to work perfectly. The only downside: some of the pages still indexed by Google were old pages from the hack. We knew we had to wait for Google to update its cache step by step. So be it.

December 18th, 6 AM: our visits collapsed. The traffic was not divided by 2 or 3 or 5; it was much worse. Google disappeared completely as a traffic source.

We have checked everything, tested everything, and we don't understand:
- no manual penalty in Search Console
- no security issues
- a crawl with Screaming Frog shows the site is reachable without problems: all the 200s, 301s, and 4xx responses are exactly where we expect them, and the pages are indexed
- a crawl with Semrush turns up nothing
- the site works perfectly and is reachable from all browsers and devices, including when tested through a VPN
- the log analysis shows that both Googlebots (desktop and mobile) are crawling the site at a good rate
- we posted two messages on the Google webmaster forum, without any relevant answer
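For readers in a similar situation, the log check above can be sketched as a quick tally of the status codes served to requests claiming to be Googlebot. A minimal Python sketch, assuming an Apache/Nginx combined log format (the sample lines below are illustrative, not real log data):

```python
import re
from collections import Counter

# Matches the status code and the user agent in a combined-format access log line.
LINE_RE = re.compile(r'" (\d{3}) \d+ "[^"]*" "([^"]*)"')

def googlebot_status_counts(lines):
    """Tally HTTP status codes for requests whose UA claims to be Googlebot."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and "Googlebot" in m.group(2):
            counts[m.group(1)] += 1
    return counts

# Illustrative sample lines (not real log data).
sample = [
    '66.249.66.1 - - [18/Dec/2020:06:00:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [18/Dec/2020:06:00:02 +0000] "GET /old-spam-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.5 - - [18/Dec/2020:06:00:03 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(googlebot_status_counts(sample))  # Counter({'200': 1, '404': 1})
```

If the tally shows mostly 5xx or unexpected 200s on known-bad URLs for Googlebot specifically, that points at cloaked or conditional serving rather than a crawl problem.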

It's been a week now that we have been in this situation, and we have run out of ideas:
- it is impossible for us to submit our site to Google for a review
- we have no information from anyone about what's going on
- in Search Console, Google gives us a 5xx error when we submit a URL for inspection: it doesn't make sense, since the site is accessible, the bots are not blocked (we see them passing), and the page is visible to everyone and every tool

We think the volume of spam pages left over from the hack must be the problem, but there doesn't seem to be any way out of this limbo, this digital desert.

If you too have been the victim of a similar experience, or if you, or one of your customers, have faced this kind of problem, we would welcome your feedback and advice with the utmost gratitude.

Otherwise, our site seems doomed, with serious consequences for our team.

Please be safe, and have a wonderful holiday.

[edited by: goodroi at 2:50 pm (utc) on Dec 24, 2020]
[edit reason] Welcome to WebmasterWorld, now please go read the forum rules :) [/edit]

goodroi

3:06 pm on Dec 24, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld! In this subforum we do not perform site reviews so please do not post your site name.

As for your situation, I'd start with a few things & in no particular order...
- Make sure all CDNs have clean content & there is no cache of the poison content

- Ensure the site is 100% clean. Some hacks use tricks like cookies to show repeat visitors a clean version while continuing to deliver the poison content to all first-time visitors.

- Ensure googlebot is receiving good content & good status codes. Just because googlebot is visiting does not mean it is finding good things. It's nice that Screaming Frog & Semrush think all is OK, but they aren't googlebot, so you need to dig deeper.

- Build up new links. When a site is hacked, other sites will often delete backlinks because they don't want to send users to a hacked site. After the site is cleaned, it is still missing those backlinks & thus won't rank as well. Besides, the more traffic-generating links you develop, the quicker you will succeed without Google. Plus, more links tend to encourage Google to reindex faster.

- Get more active on social media. This will help to boost up ranking signals and fill in your traffic gap while the rankings recover.

- Figure out whether the pages aren't indexed or are indexed but not ranking. Two very different situations requiring different solutions.

- Double-check the htaccess. It is probably ok, but a single typo could ruin everything.

- Check for duplicate content on other sites that could be making your content filtered out of the SERPs. Unlikely for the situation you have described so far but easy to check.
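One check worth adding to the list above: a visit claiming to be Googlebot isn't necessarily Google. Google's documented verification method is a reverse DNS lookup followed by a forward confirmation. A minimal sketch of that forward-confirmed reverse DNS check, with the lookups injectable so the logic can be exercised offline (the stubbed data below is illustrative):

```python
import socket

def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check, per Google's documented method.

    The lookup functions default to the socket module but can be injected
    (e.g. for testing or caching).
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        host = reverse_lookup(ip)
        # The PTR record must be in a Google crawl domain...
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # ...and the forward lookup must resolve back to the same IP.
        return forward_lookup(host) == ip
    except OSError:
        return False

# Illustrative check with stubbed lookups (no network needed):
fake_reverse = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com"}
fake_forward = {"crawl-66-249-66-1.googlebot.com": "66.249.66.1"}
print(is_verified_googlebot("66.249.66.1",
                            reverse_lookup=fake_reverse.__getitem__,
                            forward_lookup=fake_forward.__getitem__))  # True
print(is_verified_googlebot("203.0.113.9",
                            reverse_lookup=lambda a: "spoofed.example.com",
                            forward_lookup=lambda h: "203.0.113.9"))  # False
```

Running this against the IPs in the access logs separates real Googlebot hits from scrapers spoofing the user agent, which matters when deciding whether "Googlebot is visiting at a good rate" is actually true.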

Logitheque

3:17 pm on Dec 24, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hello goodroi,

Thanks for your reply.

As far as I know, everything is clean now, tested and tested again. The only place we see those spam URLs is in Search Console.

googlebot is visiting, but somehow the inspection tool (back up again since yesterday) refuses to crawl our pages.

We have practically disappeared from the Google index. It's not something we thought possible for a 20-year-old website with no bad history.

I trust the technical team when they tell me they double-checked, but what is strange is that a month separates the end of the hack from the "penalty".

As for the duplicate content, we have to endure multiple cloned websites, but they don't seem to benefit from the situation.

Anyway, thanks for your quick reply and have a good day!

not2easy

3:54 pm on Dec 24, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



First this needs some attention:
in Search Console, Google gives us a 5xx error when we submit a url for inspection
A 500 error indicates a server error. What do your error logs show? Are these 500 errors or some more specific 5xx error? Do you see Googlebot in your access logs with 200 server responses or are they getting 500 for every visit?

Have you gone through the steps Google requires for a hacked site to be eligible for crawling? Google offers tools to verify the site is or is not hacked and forms to resubmit a site that had been hacked and then cleaned. If they found your site hacked, you should follow their steps to undo the restrictions: [developers.google.com...]

JesterMagic

4:21 pm on Dec 24, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry to hear about your hack issues.

The only way to be 100% sure no lingering files are left is to restore from a clean backup. You obviously also have to make sure whatever backdoor they used is closed down.

As not2easy suggests, concentrate on why those 500 errors are happening.

Some of these hacks are pretty smart and only inject content into pages for certain types of visitors.

Maybe this is what is happening in your case: you see a clean page but Googlebot doesn't. I would test your site pretending to be Googlebot (use a Googlebot user agent).

My theory on why you are getting a 500 error, based on the info you provided, is that some of your core files were modified. These modified files now reference extra files that the hacker added to your site. You have since removed the hacker's extra files, but have yet to restore the modified core files, which in turn fail when they can't find the files the hacker uploaded.
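If that theory holds, one way to hunt for leftovers is to scan the PHP tree for includes that point at files which no longer exist, along with the usual obfuscation markers. A rough sketch under that assumption (the patterns are illustrative, not an exhaustive malware signature list):

```python
import os
import re

# Common markers of injected PHP; illustrative, not exhaustive.
SUSPICIOUS = re.compile(r"eval\s*\(\s*base64_decode|gzinflate\s*\(|str_rot13\s*\(")
# Static include/require targets; dynamic (variable) includes won't be caught.
INCLUDE_RE = re.compile(r"""(?:include|require)(?:_once)?\s*\(?\s*['"]([^'"]+\.php)['"]""")

def scan_php_tree(root):
    """Return (suspicious_files, broken_includes) found under root."""
    suspicious, broken = [], []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                code = f.read()
            if SUSPICIOUS.search(code):
                suspicious.append(path)
            for inc in INCLUDE_RE.findall(code):
                target = inc if os.path.isabs(inc) else os.path.join(dirpath, inc)
                if not os.path.exists(target):
                    broken.append((path, inc))
    return suspicious, broken
```

Run against a copy of the docroot and compare any hits against a pristine WordPress download of the same version; a core file that includes something missing is exactly the "modified core file referencing a removed hacker file" failure mode described above.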

JorgeV

5:46 pm on Dec 24, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

Google gives us a 5xx error when we submit a url for inspection


When you try the inspection tool, do you see the "hit" in your log?

As said, what you see is not necessarily what others (like Googlebot) see.

This hack modifies the core WordPress files. Did you reinstall WordPress from scratch?

Check your PHP logs; the explanation for the 5xx error messages is almost certainly there.

You can upload a "hello world" PHP test script and fetch it with the Google inspection tool, just to be sure. Who knows, it's possible the Apache/Nginx/PHP config files have also been modified and scripts are being pulled in from "elsewhere".

aristotle

7:55 pm on Dec 24, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I agree that the 5xx errors are probably the key to the problem.

Is the site on a stand-alone or shared server? This might sound like a drastic step, but you might consider moving to a different server.

Logitheque

10:19 am on Dec 26, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hello!

WordPress has been reinstalled from scratch and we duplicated the pages. There is no trace of the hacked website.
As for the 5XX errors, we think that's the problem here, as we cannot render a mobile page.
Some very kind people are helping us with that, but we still can't figure out why the bot hits our pages but cannot render them.
Maybe it has something to do with Sucuri, or the robots.txt, or the server... We don't know yet, but the logs show the bot hits, and it's really bugging us.

Thanks!

JorgeV

11:32 am on Dec 26, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

we cannot render a mobile page.

For some time now, Googlebot has crawled as a mobile device, so that might be the reason Googlebot is not able to index your pages.

Logitheque

8:49 am on Jan 5, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hello,

We can now render a mobile page, to no avail.

JesterMagic

12:05 pm on Jan 5, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So no 5xx errors? What was the issue?

In Search Console I assume you have a pile of errors for all of your pages since you said

Google gives us a 5xx error when we submit a url for inspection


Have you told Google to Validate the fix?

Since the error has been happening for a while, it may take a week or two for Google to start sending you traffic again.

Logitheque

1:17 pm on Jan 5, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



The URLs we submit are now crawled and rendered. We can't tell Google to validate the fix, since that option is not available right now.

If this was the reason for the deindexing, we expected an improvement after the repair. Maybe we have to wait longer, but the signals are simply non-existent.

Logitheque

1:25 pm on Jan 12, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



Well, things are getting worse: Search Console now indicates that 8 million URLs respond with 404. We only have a few hundred thousand pages overall, yet GSC shows us a ridiculous number of pages.

We are starting to think that nothing is going to change soon, as Google still "thinks" we are hacked. If anyone has a solution, or has faced a similar issue, I'd really be happy to hear about it.

[edited by: not2easy at 1:55 pm (utc) on Jan 12, 2021]
[edit reason] Please do not link to your site here. Read ToS [/edit]

not2easy

1:57 pm on Jan 12, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you have followed Google's procedures (linked above on Dec. 24) then the doors may begin to open. They may assume you have not attended to the hack if you have not followed the steps in that link.

Logitheque

2:19 pm on Jan 12, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



This procedure has been followed thoroughly, but we can't use the tool to "unpirate" our website since Google no longer considers our website hacked. It's a conundrum.

not2easy

2:31 pm on Jan 12, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A pirated site is not the same as a hacked site. If some other site is showing your content you should be able to make that stop as soon as you can determine the method in use. Did someone scrape your site? Contact them/their host over intellectual property rights OR if they are displaying your content via iframe, simply block their access to the content.

Examine your raw access logs and you can see if there are remote requests that align with another host. Block the IP and you're done - though you may need to continue vigilance.

Logitheque

2:45 pm on Jan 12, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



To make it clear, it was a "Japanese keyword hack". Our content has not been duplicated or stolen; our pages were replaced with pages from a Japanese store, and millions of spam pages were created on bogus URLs.

not2easy

3:59 pm on Jan 12, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google has a page for that: [developers.google.com...]

When you said "pirated" that leads to a different issue.

Logitheque

4:07 pm on Jan 12, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



We read that already. Thank you.

swright

4:17 am on Jan 18, 2021 (gmt 0)

5+ Year Member



You mentioned Sucuri. Did you have the Sucuri firewall or scanner protecting your site when this happened?

Logitheque

9:52 am on Jan 18, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yes it was running when this happened.

lucy24

7:16 pm on Jan 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looking back at these:
in Search Console, Google gives us a 5xx error when we submit a url for inspection:

Search Console indicates now that 8 Millions URLs respond in 404

I’d want to see a closer check between what Search Console reports for some specific request, and what access logs show for that same request.

Since it’s a WP site, it is very unlikely that 404s will show up in the logs; the requests are logged as 200, and then WP generates the 404 page. If the logs do show the 404, that’s a pointer to the source of the problem. 500-class errors, on the other hand, can be expected to show up in the logs; again, if those requests are logged as 200, that’s a pointer.

I don’t see anything posted about what you see in htaccess. (Assuming Apache, which is most likely.) Look both at the content of the file and at its timestamp.* It should correspond to the date and time of your most recent WP (re)install, or the last time you personally edited it, whichever is more recent. If anyone has tampered with htaccess, it would be trivial to serve different content to any given visitor, such as Googlebot.

You could try using a UA switcher and visit the site yourself pretending to be Googlebot. But, since you're not coming from Google's IP, results aren't necessarily reliable. On many sites, you'd get a flat 403 slammed in your face.


* We won’t even consider the possibility of a fake timestamp, because that would mean someone has compromised the entire server.
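The timestamp check above generalizes: listing every file modified after a known-good reference date (for example, the November 13 reinstall) is a quick forensics pass for tampering. A minimal sketch, with the cutoff date taken from this thread purely as an example:

```python
import os
from datetime import datetime, timezone

def files_modified_after(root, cutoff):
    """Yield (path, mtime) for files under root modified after the cutoff datetime."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            if mtime > cutoff:
                yield path, mtime

# Example: anything touched after the Nov 13 reinstall deserves a second look.
cutoff = datetime(2020, 11, 13, tzinfo=timezone.utc)
for path, mtime in sorted(files_modified_after(".", cutoff)):
    print(f"{mtime:%Y-%m-%d %H:%M}  {path}")
```

As the footnote says, mtimes can be faked by an attacker with sufficient access, so treat the output as a lead, not proof.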

swright

10:24 pm on Jan 18, 2021 (gmt 0)

5+ Year Member



@Logitheque

Did the Sucuri server-side scanner at least catch the problem after it occurred? I'm also using Sucuri, so I'm concerned that it might miss some problems. I also use Wordfence. What other security solutions do you wish you had implemented to stop the hack from occurring in the first place?

Logitheque

8:41 am on Jan 19, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



@lucy24 The problem seems to be over now. Since Sunday, the Google index has been showing our former positions, exactly one month after we were removed. Traffic is not optimal yet, but we saw the first signals last Friday when Search Console started to consider the URLs from the sitemap again. When we ran the log file analyzer, it showed 75% 404s. Maybe I will run it again to see how many of these remain.

@swright The scanner spotted an error in the logs, but nothing related, it seems. We wish we had installed Sucuri when we installed WP, along with a malware file scanner. The traffic now seems to be "normal" again, as it was before December 18th. We think that fixing the 5XX error played a huge part, and we suspect a plug-in with major security and coding flaws that ended up interfering with Sucuri, the cache engine, and our PHP code.