Forum Moderators: Robert Charlton & goodroi


Google's 2020 Spam Report, 40 billion Spammy Pages Discovered Per Day

         

engine

10:30 am on Apr 30, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google has just published its latest spam report, covering 2020, and it says it's using AI a great deal more, reducing auto-generated and scraped content by over 80%.

Hacked spam is still a problem, and the report says the volume of vulnerable websites is still very large. Google says it improved its detection capability by more than 50% and also removed the hacked spam from the search results.

Google makes the point that it cannot solve the hacked site problem alone, and this is true. It encourages site owners to practice good security and "hygiene."

Google says, "every day, we discover 40 billion spammy pages. Here’s how we work to keep that spam from getting in the way of your search for helpful, useful information."
https://developers.google.com/search/blog/images/webspamreport2020/WebspamReport2020_EverySteps.png
We observed spammers hacking into vulnerable sites, pretending to be the owners of these sites, verifying themselves in the Search Console and using the tool to ask Google to crawl and index the many spammy pages they created. Using AI, we were able to pinpoint suspicious verifications and prevented spam URLs from getting into our index this way.


Google goes on to say it's working on expanding efforts against online scams and fraud.

It also asks users to report spam via this form [google.com...]

[developers.google.com...]

NickMNS

2:31 pm on Apr 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No comment required:
every day, we discover 40 billion spammy pages

rustybrick

2:53 pm on Apr 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Seems like an insane number - 40 billion daily.

engine

3:01 pm on Apr 30, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I really wondered when I read that figure of 40-billion, daily, but that's what it says. Perhaps it's the same 40 billion each day. ;)

The other question is, what does Google class as spam? It may not be what we think of as spam.

iamlost

4:27 pm on Apr 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



An interesting side-note comparison comes from some stats in an ahrefs blog post from just over a year ago:

Their main crawler crawls some 5-billion (yes, billion with a b) pages per day.

Their content explorer tool discovers less than 2-million (yes, million with an m) new ‘quality’ (by their definition) pages per day.

So... ahrefs’ discovered page signal to noise is 2 in 5000.
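For anyone who wants to check that ratio, here is the arithmetic as a quick Python sketch. The variable names are made up for illustration, and the inputs are only the rounded figures quoted above (5 billion pages crawled and 2 million 'quality' pages discovered per day), so treat the result as a ballpark.

```python
from fractions import Fraction

# Rounded figures from the post above; both are upper/approximate values.
crawled_per_day = 5_000_000_000   # pages crawled per day ("5-billion")
quality_per_day = 2_000_000       # new 'quality' pages per day ("less than 2-million")

# Signal-to-noise as an exact fraction, reduced to lowest terms.
ratio = Fraction(quality_per_day, crawled_per_day)

print(ratio)                   # 2 in 5000 reduces to 1 in 2500
print(f"{float(ratio):.2%}")   # the same ratio as a percentage
```

In other words, 2 in 5000 is the same as 1 in 2500, or roughly 0.04% of what gets crawled.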

An additional interesting side note comes from the 1-billion+ pages that were in their content explorer db at the time point used (2017):
* over 90.63% of those pages did not get any direct Google search referred traffic. Nada. None. Zilch.
* 5.29% more got 10 or fewer direct referrals per month from Google search.
* 66.31% of those pages had no back links. Nada. None. Zilch.
* 26.29% more had only 1 to 3 backlinks.
And these were from ahrefs’ list of ‘quality’ signal from noise pages.

While I too wonder at the 40 billion per day spam page number, Google does work at a different scale than anyone else.

And I remember a WebmasterWorld member who, some 15+ years ago, uploaded 10 thousand auto generated pages (100 pages x 100 sites) each and every day (an hour or two after breakfast) expecting G to nuke 90% within a week, 99+% within a month but sufficient surviving that they got their UPS delivered cheque each month...

There are a lot of spammers and scammers with a lot of automation at their fingertips...

One final thought: if G feels it needs to publish a defensive spiel on this topic it must be causing significant pain...

thecoalman

6:45 am on May 1, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The irony about comment spam is about 90% of the email addresses I encounter used to register are gmail accounts.


....verifying themselves in the Search Console and using the tool to ask Google to crawl and index the many spammy pages they created.


I had this happen to me, but with a very interesting twist. A Drupal installation on one site had a major exploit a few years back that allowed remote file upload, and it was hacked before I even read the email telling me to patch it. They uploaded a verification .html file, but here is the twist: instead of spam pages they uploaded crypto mining scripts. They also created a sitemap in Search Console, effectively using Google's bot like a cron job.... that was slick.
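One way to catch this kind of thing is to periodically look for verification files you did not put there yourself. Below is a minimal Python sketch of that idea, not a complete security check. It assumes the Search Console HTML verification files are named like google<hex string>.html; the function name, the allowlist parameter, and the example path in the usage note are all hypothetical.

```python
import re
from pathlib import Path

# Assumption: verification files are named "google" + hex characters + ".html".
VERIFICATION_NAME = re.compile(r"^google[0-9a-f]+\.html$")


def find_unexpected_verification_files(web_root, known_files):
    """Return verification-style files under web_root whose names are not in known_files.

    web_root:    directory served by the web server (searched recursively)
    known_files: set of file names the site owner uploaded deliberately
    """
    unexpected = []
    for path in Path(web_root).rglob("*.html"):
        if not path.is_file():
            continue
        if VERIFICATION_NAME.match(path.name) and path.name not in known_files:
            unexpected.append(path)
    return sorted(unexpected)
```

Usage would be something like find_unexpected_verification_files("/var/www/example", {"google1234abcd.html"}), run from a scheduled job, with any hits followed up by checking the list of verified owners in Search Console. It only flags file names, so it will not notice other verification methods such as meta tags or DNS records.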

JorgeV

10:39 am on May 1, 2021 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

Seems like an insane number - 40 billion daily.


This is the power of the AI and automatic page creation.

RedBar

2:08 pm on May 1, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Without reading the report, how many of these are FB, Insta, Twit, etc.?

NickMNS

2:46 pm on May 1, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is the power of the AI and automatic page creation.

You don't need AI to automatically create pages.

JorgeV

12:58 pm on May 2, 2021 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

I wonder how much it costs in resources (CPU, bandwidth, electricity, etc.) to crawl and parse 40 billion pages a day, and then not index them (or remove them).

No5needinput

2:04 pm on May 2, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



every day, we discover 40 billion spammy pages.


Maybe they are spidering their own SERPs...

SweetPotato

10:10 pm on May 2, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



I find far more hacked spam sites on page one of the SERPs today than I did in Matt Cutts's era of manual policing.

I don't think this AI approach is working at all. And it shows.

JS_Harris

11:50 pm on May 2, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The better Google gets at not indexing spam sites, the more the desire to get spam onto other people's sites increases.

Lock the doors!

Also, I'm finding a lot of sites that present a warning for not having https, even if they are in my bookmarks or don't allow user input. This is leading me to click the "visit anyway" button, and if I do that too much I'm bound to hit an actually dangerous site through muscle memory and laziness.

universenet

12:04 am on May 3, 2021 (gmt 0)

Top Contributors Of The Month



Google is working so hard to make the internet a "better" place for us.
A better internet is one where only Google and the websites Google owns exist.
All other websites will be lost in the GOOGLE LOOP
(people also ask)
It is easy to understand...

goodoldweb

5:46 am on May 3, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Google must continue to report +40% increase in earnings each quarter, this can not be achieved without labeling and penalising most of the internet as "spam" (business and shopping sites in particular).

Disgusting really.

GoneRogue

10:49 am on May 4, 2021 (gmt 0)



And I remember a WebmasterWorld member who, some 15+ years ago, uploaded 10 thousand auto generated pages (100 pages x 100 sites) each and every day (an hour or two after breakfast) expecting G to nuke 90% within a week, 99+% within a month but sufficient surviving that they got their UPS delivered cheque each month...

One might say the spammy problem was created by Google with a well thought out business model.
The irony about comment spam is about 90% of the email addresses I encounter used to register are gmail accounts.

For one of my customers, over 60% of probing for site vulnerabilities originates from the Google Cloud. That particular site, while very very small, was a blog that had an article about Google. Shortly after that Google article was posted, traffic referred by Google searches (for the entire site) dropped to zero.

samwest

1:51 pm on May 5, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



When they remove those pages, their profitable traffic share increases. Problem is, what is considered spam? Knowing how this company operates, probably anything that contravenes their ability to dominate the market and rake in cash. In other words, "the competition".
There's a name for that, but I just can't recall...

EditorialGuy

8:53 pm on May 5, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One might say the spammy problem was created by Google with a well thought out business model.

Right. And banks are to blame for the existence of bank robbers.

goodoldweb

2:57 am on May 6, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Right. And banks are to blame for the existence of bank robbers.

Banks must exercise due diligence, ensure their security measures are up to scratch and not openly engage in promoting robbery. It is a crime when a bank pays robbers (well) to break in, then later collects large sums of insurance money.

Now you have a better understanding of the AdSense business model.

thelostagency

8:13 am on May 6, 2021 (gmt 0)

10+ Year Member



How's GPT-3 going to make this an even bigger problem?

heisje

6:07 am on May 8, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



@goodoldweb
Google must continue to report +40% increase in earnings each quarter, this can not be achieved without labeling and penalising most of the internet as "spam" (business and shopping sites in particular).

Exactly! >> the essence of the matter, already a long time too.
.

heisje

11:18 am on May 8, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Disgraceful misinformation & false news as part of their defense against authorities investigating their abusive practices. "Spam" and "Security" their eternal pet excuses. And why not? They have worked pretty well to date. Mighty surprise, criminals lying.
.

goodoldweb

1:40 pm on May 12, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



@heisje

As far as I am concerned Google are now the internet's number one enemy, a disgusting uncompetitive predatory company really. They make Microsoft look great.

I would rather shut my business than ever spend even one cent with them on advertising. It's a matter of principle. I found ways to get my traffic via social networks and don't really need them anymore.

To me, the way they bombard viewers with ads on YouTube simply confirms they've totally lost the plot.