Screaming Frog SEO Spider/14.1

What is the use-case here?

         

SumGuy

10:39 pm on Mar 14, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



When you see hits to, say, robots.txt and also your landing page, where the user-agent is:

Screaming Frog SEO Spider/14.1

From a residential ISP (the IP geolocates to my city), what exactly is going on?

If these are hits from someone's home device (PC? phone? TV? Refrigerator?), what is the nature of these hits? What use are they being put to in a residential or perhaps small-business setting?

phranque

11:47 pm on Mar 14, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



typically Screaming Frog is used by someone who is doing some form of technical SEO or has other reasons to crawl a web site or list of urls.
by default, Screaming Frog SEO Spider will respect robots.txt directives.
however if you have a licensed version, it is a simple matter to change the configuration settings so that robots.txt is ignored.
it is easy to run the spider from a home computer.
it is also not difficult to run it in the cloud, so you may also see hits from AWS or other cloud provider IPs.
it could be a local marketer or competitor doing some research.
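
For anyone who wants to see what "respecting robots.txt" amounts to on the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `allowed` helper are made up for illustration; this is not how Screaming Frog itself is implemented, just the kind of check a compliant crawler is expected to make before requesting a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only (not from any site in this thread).
ROBOTS_TXT = """\
User-agent: Screaming Frog SEO Spider
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler with this UA may fetch the path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(allowed("Screaming Frog SEO Spider/14.1", "/default.html"))
    print(allowed("Mozilla/5.0", "/default.html"))
```

Note that this only describes what a well-behaved client decides for itself. The server still answers whatever is requested unless it blocks the request by other means.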

phranque

11:54 pm on Mar 14, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



also SF released v18.0 three months ago [twitter.com] so that (/14.1) might actually be Croaking Frog

SumGuy

11:57 pm on Mar 14, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



I'm not blocking that user-agent in the robots file. There's no crawling going on, just asking for the robots file and my default.html landing page file.

Have only seen this maybe 3 times in the past month or 2, from the same IP.

From what I read about it, the vast, vast, vast, practically entire use it has is for someone to technically analyze their own website (or a site they manage). If you're pointing it at someone else's web site, and you're not crawling it, then what's the point?

Also, if you're interested in someone else's site (from what, a traffic or search POV?) then aren't you going to go to other sites that try to gather site popularity metrics?

[edited by: SumGuy at 12:08 am (utc) on Mar 15, 2023]

phranque

12:03 am on Mar 15, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



perhaps they have a list of sites of which they're checking the home pages occasionally for whatever reason.

Brett_Tabke

12:56 am on Mar 15, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



they can be validating backlinks. if the other site links to you, they are checking to make sure you are still there.

Just used screaming frog yesterday and found an error I did not know I had created ... here.

lucy24

1:39 am on Mar 15, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to logs ::

OK, then, why do they keep requesting /adminer.php or, for variety's sake, /db/ neither of which exists on the site? I even found a /sql%20imported%20files/ --which just screams “malign robot”, doesn’t it. Everything is 403, suggesting header deficits for which no hole has been poked. (Supplementary visit to logged headers confirms this.)

Nary a request for robots.txt, though.

phranque

3:36 am on Mar 15, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> Everything is 403, suggesting header deficits for which no hole has been poked.

with a paid license, it is a simple matter to customize HTTP Request headers sent by SF to look more human-like.
a well-informed user would do so.
when using SF i often spoof the UA to get past the simplest server block attempts.

> Nary a request for robots.txt, though.

if you have a licensed version, it is a simple matter to change the configuration settings so that robots.txt is ignored.

lucy24

6:32 pm on Mar 15, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Huh. So 100% of my SF hits are from licensed users who have changed the settings to skip robots.txt, but have not changed the settings to humanize their headers? (No point in licensing just so you can change the UA, since any current browser will let you do that.)

Nah. They're probably just fakers.

phranque

10:20 pm on Mar 15, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> humanize their headers

i think you are giving most SF users too much credit.
it is a simple matter to notice that your SF request was blocked by robots.txt.
it takes quite a bit more to fill the gaps for 403 responses due to non-human headers.

> They're probably just fakers.

i'm trying to imagine why a faker would go out of their way to select this UA instead of a more human-like UA.

lucy24

12:00 am on Mar 16, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: deeper delve into archived logs, followed by look at robots.txt and header access file, which I forgot to do earlier ::

Screaming Frog has been disallowed in robots.txt since some time in 2019, as the first step in deciding whether to poke a hole. In their case actually the second step, because that “sometime in 2019” is when I put them on a separate line after the more concise shared-block didn't work.

all SF requests: 133
SF requests for robots.txt: 16, or about 1/8 of all requests (a few of them in current non-archived logs, so I don't know how I overlooked them before)
SF robots.txt requests followed by request for one page: 15, i.e. all but one of them, including several that definitely encountered a Disallow
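
A tally like the one above could be produced with a short script along these lines. This is only a sketch under assumptions: it expects Apache/nginx "combined" log lines, the names `LOG_LINE` and `tally_sf_requests` are invented here, and it treats a robots.txt request as "followed by a page" when the same IP's next SF request is for anything other than robots.txt. It does not check that exactly one page was requested afterwards.

```python
import re
from collections import Counter

# Assumed format: Apache/nginx "combined" access log.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def tally_sf_requests(lines, ua_marker="Screaming Frog SEO Spider"):
    """Count requests whose user-agent contains ua_marker, split three ways."""
    counts = Counter()
    last_was_robots = {}  # ip -> True if that IP's previous SF request was robots.txt
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or ua_marker not in m["ua"]:
            continue  # unparseable line or a different user-agent
        counts["all"] += 1
        ip = m["ip"]
        is_robots = m["path"].split("?")[0] == "/robots.txt"
        if is_robots:
            counts["robots"] += 1
        elif last_was_robots.get(ip):
            counts["robots_then_page"] += 1
        last_was_robots[ip] = is_robots
    return counts


if __name__ == "__main__":
    # Made-up sample lines using documentation-range IP addresses.
    sample = [
        '203.0.113.7 - - [14/Mar/2023:22:39:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Screaming Frog SEO Spider/14.1"',
        '203.0.113.7 - - [14/Mar/2023:22:39:02 +0000] "GET /default.html HTTP/1.1" 200 5120 "-" "Screaming Frog SEO Spider/14.1"',
        '198.51.100.9 - - [15/Mar/2023:01:39:00 +0000] "GET /adminer.php HTTP/1.1" 403 199 "-" "Screaming Frog SEO Spider/14.1"',
        '192.0.2.1 - - [15/Mar/2023:01:40:00 +0000] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0"',
    ]
    print(dict(tally_sf_requests(sample)))
```

Keying on the IP keeps one visitor's robots.txt request from being paired with another visitor's page request, which matters if several unrelated clients claim the same UA.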

Why bother to ask for robots.txt if they have already decided to ignore it? Are “ask” and “honor” separate settings?

Unlike the once-popular fake Googlebot, which is easily flagged by IP, they might find SF authorized by name, so that's a reason for claiming to be SF. I think it is safe to say that a legitimate Screaming Frog would not be asking for, say, /admin/ on a site that does not have such a file (and would certainly not have external links to it in any case).

tangor

1:21 am on Mar 21, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is the first month I've seen Screaming Frog in the logs ... 4 hits, mixed bag. Could this be a new thing for bad actors?

SumGuy

11:02 pm on Mar 22, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



> Could this be a new thing for bad actors?

If you're seeing them from residential IPs (like I did) then I'd have to wonder how the bad actors are doing it. And I don't mean residential IPs in the third world or Asia (which are heavily compromised by hacked routers).