CommonCrawl

User-agent is just "CommonCrawl"

SumGuy

12:16 am on Oct 30, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



I got a couple of hits (HTTPS), a second apart, on Oct 28 to my landing page (all they fetched was index.html) from 92.154.44.151. All header fields were blank except User-Agent, which was just "CommonCrawl".

The IP belongs to wanadoo.fr (France Telecom, aka Orange).

As far as I know, there is only a single thread here on WebmasterWorld, circa 2009, that mentions CommonCrawl.
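
For anyone who wants to flag the same pattern in their own logs, here is a minimal sketch in Python. It assumes you already have each request's headers as a name-to-value mapping; the function names and the choice to ignore `Host` (which HTTP/1.1 clients must send) are my own assumptions, not anything a particular server provides.

```python
from typing import Iterable, Mapping


def only_user_agent_set(headers: Mapping[str, str],
                        ignore: Iterable[str] = ("host",)) -> bool:
    """True when User-Agent is the only header with a non-empty value.

    Headers named in `ignore` (assumption: just Host) are not counted.
    """
    ignored = {name.lower() for name in ignore}
    populated = {
        name.lower()
        for name, value in headers.items()
        if value.strip() and name.lower() not in ignored
    }
    return populated == {"user-agent"}


def matches_reported_request(headers: Mapping[str, str]) -> bool:
    """True for the pattern described above: UA is exactly "CommonCrawl"
    and every other header field is blank or missing."""
    ua = next((v for k, v in headers.items() if k.lower() == "user-agent"), "")
    return ua.strip() == "CommonCrawl" and only_user_agent_set(headers)
```

Calling `matches_reported_request({"User-Agent": "CommonCrawl", "Accept": ""})` returns `True`, while a request that sends a populated `Accept` header does not match.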

lucy24

3:16 am on Oct 30, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Huh. That sounded awfully familiar, but it turns out I was thinking of
CCBot/2.0 (https://commoncrawl.org/faq/)
which I take it is unrelated. (Variety of IPs, but only one header deficit requiring hole-poking.)
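
To keep the two apart in log analysis, a rough sketch in Python follows. The label strings and the function name are hypothetical, and matching on the User-Agent string says nothing about who actually sent the request, since anyone can send either value.

```python
def classify_user_agent(ua: str) -> str:
    """Sort a User-Agent string into one of three hypothetical labels."""
    ua = ua.strip()
    if ua == "CommonCrawl":
        # The bare string reported at the top of this thread.
        return "bare-commoncrawl"
    if ua.startswith("CCBot/"):
        # e.g. "CCBot/2.0 (https://commoncrawl.org/faq/)"
        return "ccbot"
    return "other"
```

With this, `classify_user_agent("CCBot/2.0 (https://commoncrawl.org/faq/)")` gives `"ccbot"` and `classify_user_agent("CommonCrawl")` gives `"bare-commoncrawl"`, so the two can be counted or blocked separately.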

Sgt_Kickaxe

5:31 pm on Oct 30, 2022 (gmt 0)



CommonCrawl is a bot run by a non-profit data-gathering organization; it is used to help others create web tools. Entities using the bot to create web tools may or may not set proper headers. Probably harmless, but not entirely helpful.