Forum Moderators: Robert Charlton & goodroi

Strange requests from googlebot with random parameters

haramamba

10:04 am on Jan 19, 2025 (gmt 0)

Top Contributors Of The Month



I'm seeing multiple strange requests from googlebot with random parameters from IPs 66.249.66.xx:
somearticle.html?Dyjdjd=GFGHFHGF
somearticle2.html?FFGH6fh=Cghj46fly
Does anybody else see this?
Is it some kind of "black hat SEO", or has G just gone crazy?
My webpages do not require any parameters.
And GSC does not show any 404 in the crawl stats.

haramamba

7:01 am on Jan 20, 2025 (gmt 0)




UPD:
Now I see these requests in GSC under Settings > Crawl stats > Crawl requests: Not found (404).

not2easy

11:14 am on Jan 20, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Googlebot is known to make 'fake' requests for impossible, non-existent URLs just to see what happens. A 404 is a good response.

lucy24

4:58 pm on Jan 20, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's also possible--though it happens far more often with Bing--that they’ve got their shopping list garbled, attaching one site's URL paths to another site's hostname. Another issue that crops up now and then is that they're following a link from another site, and said site is coded to attach parameters to external links.

Either way, you have done nothing wrong and can safely ignore the whole thing.

mitmat

3:50 am on Jan 24, 2025 (gmt 0)



Where did you find it? cPanel?

haramamba

6:06 am on Jan 24, 2025 (gmt 0)




@mitmat
Nginx logs.
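For anyone who wants to pull these out of their own nginx logs, a filter along these lines works against the default "combined" log format. The two sample log lines below are made up for the demo (modeled on the URLs quoted earlier in the thread); in real use, point LOG at your actual access log, e.g. /var/log/nginx/access.log.

```shell
#!/bin/sh
# Sketch: list Googlebot requests that carry a query string, printing
# the client IP and the requested path. Sample data is illustrative.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
66.249.66.12 - - [19/Jan/2025:10:01:02 +0000] "GET /somearticle.html?Dyjdjd=GFGHFHGF HTTP/1.1" 404 153 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.9 - - [19/Jan/2025:10:02:03 +0000] "GET /somearticle.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
EOF
# keep only Googlebot lines whose request path contains a "?"
matches=$(grep -i 'Googlebot' "$LOG" | grep -E '"GET [^" ]+\?[^" ]+ HTTP' | awk '{print $1, $7}')
echo "$matches"
rm -f "$LOG"
```

This only matches on the user-agent string, so for real forensics you would also want to verify the 66.249.66.x IPs via reverse DNS.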

haramamba

7:11 am on Jan 24, 2025 (gmt 0)




The first request is dated 17/Jan.
These requests still have not appeared under Pages > Not found in GSC, which to me means that g-bot is torturing the server with fake requests.
I'm curious about the real purpose of this #%^& ...

lucy24

6:01 pm on Jan 24, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm curious about the real purpose
Unless you derive personal pleasure from speculating about G###'s motivations, don’t trouble your head about it.

Or, hey, as long as you're studying GSC, pay attention and see how long it takes a given URL to drop off their “not found” list. (Or how long the list can be. I don’t know if it goes by time or number.)

Come to think of it, I could do this myself. I recently restructured one directory, changing it from 58 chapters to 4 parts (corresponding to the book’s original four-volume publication). Since I very much doubt any human has bookmarked any individual chapter, I simply returned a 410 for those 58 chapters. And now I can check GSC periodically to see how long it takes their complaints to die away.

:: detour ::

Today I Learned ... that G’s "not indexed because 404" list is fully a month and a half behind--i.e. the last date shown is something like December 8, when I know for a fact they’ve requested some of those 410 pages as recently as yesterday.

not2easy

6:30 pm on Jan 24, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the last date shown is something like December 8, when I know for a fact they’ve requested some of those 410 pages as recently as yesterday.
looks like updating the index is not a big priority?

haramamba

8:19 pm on Jan 24, 2025 (gmt 0)




pay attention and see how long it takes a given URL to drop off their “not found” list.

3-4 days.
I simply returned a 410 for those 58 chapters. And now I can check GSC periodically to see how long it takes their complaints to die away.

g-bot still requests some of the pages that I set to 410 2-3 years ago.

lucy24

8:52 pm on Jan 24, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



g-bot still requests some of the pages that I set to 410 2-3 years ago
Yes, they’ll do it periodically forever. (On my personal site, they still sometimes request /directory/index.html even though I canonicalized everything to /directory/ in, I think, 2012.) But if you’re returning a 410 rather than a 404, the frequency should drop off significantly.
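Since the OP is on nginx, returning a 410 for a retired set of URLs takes only a small location rule. This is a hypothetical sketch--the /book/chapter path pattern is a stand-in for whatever URLs you retired, not anything from the thread:

```nginx
# Hypothetical sketch: answer 410 Gone for retired chapter pages.
# Adjust the regex to match your own retired URLs.
location ~ ^/book/chapter[0-9]+\.html$ {
    return 410;
}
```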

looks like updating the index is not a big priority
The twist is that these specific pages--the 58 old ones, not the 4 new ones--were noindexed all along, so I was surprised to see how often G crawls them. Maybe hoping for the meta noindex to disappear, so they can transfer them all from the “no-indexed because you asked us not to” bin to the “no-indexed because we didn’t feel like it” bin?

haramamba

7:59 am on Jan 25, 2025 (gmt 0)




But if you’re returning a 410 rather than a 404, the frequency should drop off significantly.

Some of them get hit 11 times per day, every day.
I even tried putting them into robots.txt, then removed the rule after 2-3 months. The pinging resumed immediately.
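For reference, a robots.txt block along these lines is what's being described (the path prefix is a placeholder). Note that while the rule is in place, Googlebot never gets to see the 410 for those URLs:

```text
User-agent: Googlebot
Disallow: /book/chapter
```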

lucy24

6:15 pm on Jan 25, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you’re annoyed by the crawling, you can certainly robot them out. But if they can’t crawl, they can’t receive the 410.

Request a few URLs yourself and confirm that you’re getting a 410 rather than a 404. (And if you don’t already have a nice human-friendly 410 page, this is the time to make one. The Apache default is scary.)
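One self-contained way to run that check: the sketch below starts a throwaway local Python server that answers 410 and inspects the status code with curl. In real use you would point curl at your own retired URL instead of the local server; the port and path here are arbitrary.

```shell
#!/bin/sh
# Demo: confirm which status code a URL actually returns.
# A one-shot local server stands in for your real site.
python3 - <<'EOF' &
from http.server import BaseHTTPRequestHandler, HTTPServer

class Gone(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(410)  # 410 Gone, not 404
        self.end_headers()
    do_GET = do_HEAD
    def log_message(self, *args):  # keep output clean
        pass

# serve exactly one request, then exit
HTTPServer(("127.0.0.1", 8419), Gone).handle_request()
EOF
SERVER_PID=$!
sleep 1
# -I sends a HEAD request; -w prints just the status code
code=$(curl -s -o /dev/null -w '%{http_code}' -I http://127.0.0.1:8419/old-chapter.html)
echo "status: $code"
wait $SERVER_PID 2>/dev/null
```

In practice, `curl -sI https://example.com/your-retired-page | head -1` against your live site answers the same question at a glance.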

But, again: If you’ve verified that things are behaving as intended, you are perfectly free to ignore GSC “errors”.