New Cloudflare feature Crawler Hints

Provides data to search engines when sites change their content

         

NickMNS

3:45 pm on Nov 3, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just discovered this new beta feature offered by Cloudflare, although based on the link below it seems to have been launched at the end of July. I'm not sure how effective it is yet, since I only just enabled it, but it will be interesting to see whether it has any impact.

Crawler Hints provide high quality data to search engines and other crawlers when sites using Cloudflare change their content. This allows crawlers to precisely time crawling, avoid wasteful crawls, and generally reduce resource consumption on origins and other Internet infrastructure.


More information available here:
[blog.cloudflare.com...]

It is funny how this press release is framed as "reducing environmental impact".

This also seems to be related to Bing's IndexNow announcement:
[webmasterworld.com...]

aristotle

7:41 pm on Nov 3, 2021 (gmt 0)


Well, I don't use Cloudflare, but doesn't the usual 304 server response code (= not modified) do basically the same thing?

Or does Cloudflare use some kind of "ping"?

It seems to me that the only time you would want the search bots to come is right after you make a content change, and you can use GSC for that. If you haven't changed anything, you can just depend on the server 304 response.

Or perhaps I haven't looked closely enough to have a good understanding.
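
On the "ping" question: the IndexNow protocol mentioned in the first post lets a site notify a participating search engine with a single GET request that carries the changed URL and a site key. The sketch below only builds that request URL and does not send anything. The key and the example.com page are made up for illustration, and nothing in this thread confirms that Crawler Hints sends exactly this request.

```python
from urllib.parse import urlencode


def build_indexnow_ping(endpoint: str, changed_url: str, key: str) -> str:
    """Return the GET URL that would tell an IndexNow endpoint one page changed."""
    # urlencode percent-encodes the page URL so it can travel as a query value.
    query = urlencode({"url": changed_url, "key": key})
    return f"{endpoint}?{query}"


# Hypothetical values: the page and the key are placeholders, not real credentials.
ping = build_indexnow_ping(
    "https://api.indexnow.org/indexnow",
    "https://www.example.com/updated-page.html",
    "hypothetical-key-123",
)
print(ping)
```

The difference from a 304 is the direction: here the site speaks first when something changes, instead of waiting for the bot to ask.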

lucy24

8:00 pm on Nov 3, 2021 (gmt 0)


"doesn't the usual 304 server response code (= not modified) do basically the same thing?"

These days it is very unlikely that response code 304 would be returned with a text file, unless your pages are pure HTML with no included content (such as a php navigation header). I think the same would apply to server responses to requests with the If-Modified-Since header, which currently seems to be sent only by Googlebot (sporadically) and Seznambot.

So the question is whether Cloudflare, unlike an ordinary server, can tell that the HTML itself hasn’t changed, even when some of the included content is dynamic.
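
One way a server or proxy could tell that assembled output has not changed is to derive a validator from the finished page and compare it with the `If-None-Match` header the client sends back. The sketch below is only an illustration of that idea: `make_etag` and `respond` are hypothetical helper names, and nothing here is taken from how Cloudflare actually works.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the fully assembled page bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    """If-None-Match uses weak comparison, so a leading W/ is ignored."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a page built from includes."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # Same bytes as the client's copy: send headers only.
            return 304, headers, b""
    return 200, headers, body


page = b"<html><nav>menu</nav><p>article</p></html>"
first_status, first_headers, _ = respond(page, None)
repeat_status, _, _ = respond(page, first_headers["ETag"])
changed_status, _, _ = respond(page + b"<!-- edited -->", first_headers["ETag"])
print(first_status, repeat_status, changed_status)
```

The catch is that the page still has to be generated before it can be hashed, so this saves bandwidth rather than server work, and any include that changes on every request (a timestamp, for example) defeats it.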

aristotle

9:08 pm on Nov 3, 2021 (gmt 0)


"These days it is very unlikely that response code 304 would be returned with a text file, unless your pages are pure HTML with no included content (such as a php navigation header)."

I guess I didn't know that because all of my sites are static HTML, which would explain why I see 304s in my logs all the time. I also don't use separate CSS files, but instead put all of the CSS into the head section.

When I get time, I'll check to see what happens for image requests.

lucy24

11:43 pm on Nov 3, 2021 (gmt 0)


Image requests definitely should return a 304 most of the time, unless you've got one of those arcane responsive-CMS setups. (Something tells me that you, aristotle, don’t.) The same goes for any static supporting file: fonts, scripts, sounds and so on. Oh, and actual .txt files such as robots.txt. I only know the 304 thing from direct personal observation: the moment I started using PHP includes, 304 disappeared from the logs for the affected pages.

A quick check of my raw logs suggests that only Google and Seznam get the 304 response. Does this mean the server only uses it when there was an If-Modified-Since header?
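
Under the HTTP specification, 304 is defined only as an answer to a conditional request (one carrying `If-Modified-Since` or `If-None-Match`), so a client that never sends those headers should never see one. A minimal sketch of the date-based decision follows; `choose_status` is a hypothetical name, and real servers also handle `If-None-Match`, which is left out here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def choose_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Pick 200 or 304 for a GET, looking only at If-Modified-Since."""
    if if_modified_since is None:
        return 200  # unconditional request: always send the full response
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


file_time = datetime(2021, 11, 3, 15, 45, tzinfo=timezone.utc)
header = format_datetime(file_time, usegmt=True)

print(choose_status(file_time, None))    # bot sent no conditional header
print(choose_status(file_time, header))  # bot already has this version
```

That matches the log observation: bots that do not send the header get a 200 every time, however static the file is.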

aristotle

4:36 pm on Nov 4, 2021 (gmt 0)


I read the Cloudflare article and, if I understand it, the method it describes does make sense. Essentially, Cloudflare provides search bots with a special sitemap of a site's URLs along with info about which ones have changed.

lucy24

5:39 pm on Nov 4, 2021 (gmt 0)


But does any major search engine actually use, or plan to use, this feature?

aristotle

2:11 pm on Nov 5, 2021 (gmt 0)


Maybe I overlooked some others, but the only specific companies I saw mentioned as being involved at this point are Yandex, DuckDuckGo, and the Internet Archive.

Also, there are apparently some details of the protocol that are still not finalized:
"There are a few details to work out, such as how a search engine should identify itself to get its personalized list of URLs ..."
"We plan to continue to work with the search and hosting communities to refine the protocol in order to make it more efficient."