Why Do Bots Generally Only Visit My Navigation?

RedBar

10:27 am on Jun 24, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Apologies if this has been asked before, but over the years I have noticed that most bots only visit specific pages such as index, contact, legal, privacy etc. Mostly it is only my header navigation, and sometimes both header and footer. They very rarely visit the real meat.

I have always assumed that this was to:

Prove the site still existed, to take the bare details and sell on that info.
To harvest as many email addresses as possible to sell on.

Please enlighten me as to what else they are doing apart from gobbling-up bandwidth.

not2easy

12:35 pm on Jun 24, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Your assumption may be correct. Could be those widget directory bots? They usually offer contact data via info pagelets and harvest as many email addresses as possible to sell on.

RedBar

1:35 pm on Jun 24, 2022 (gmt 0)

So ... these bots actually serve no purpose whatsoever except their own possible sales strategy?

I did know of a trade widget site that sent out a monthly bot to all its bona fide trade links to check that their directory links were still valid, and at that time they would actually contact the site owner if something was broken.

Any other genuine purposes?

not2easy

1:59 pm on Jun 24, 2022 (gmt 0)

I see some sites like that when I search for local doctors, dentists and medical offices. They often have no website, but their location and phone numbers are listed on this type of directory site. It appears they get data from medical registries (?), but without Yellow Pages (or phone books) they serve a purpose.

I'm guessing medical registries like state licensing offices because it has all kinds of data about when and where they studied and graduated. I'm sure they do better in rural areas. Yes, it is possible to get the information without giving up an email but I'd guess the average user wouldn't figure it out. These are not high tech sites.

It is likely that other search engines use their data because that's where I find them.

lucy24

3:39 pm on Jun 24, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For years I've been visited by an ever-evolving series of what I call the /contact botnet. The exact pattern varies, but it's always {some page or pages}--often blocked--followed by the contact page giving that first page as referer. This, in and of itself, is a plausible pattern for humans, since the contact page is linked from the 403 page. But humans also request supporting files, while robots don't.
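A rough sketch of that last distinction, for anyone who wants to check their own logs. It assumes Apache/nginx-style combined log lines, and the list of extensions that count as "supporting files" is only a guess to adjust per site. It returns the IPs that asked for pages and never for a stylesheet, script, or image. Treat a hit as a hint rather than proof, since a human with a warm cache can look the same.

```python
import re
from collections import defaultdict

# Combined Log Format (assumed):
# ip - - [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"'
)

# Hypothetical definition of "supporting files"; adjust for your own site.
SUPPORT_EXT = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".svg", ".woff2")


def pages_only_visitors(lines):
    """Return the set of IPs that requested pages but never a supporting file."""
    pages = defaultdict(int)
    support = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        path = m.group("path").split("?", 1)[0].lower()
        if path.endswith(SUPPORT_EXT):
            support[m.group("ip")] += 1
        else:
            pages[m.group("ip")] += 1
    return {ip for ip in pages if support[ip] == 0}
```

Usage would be something like `pages_only_visitors(open("access.log"))`, where the file name is a placeholder.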

For a couple of years they seem to have made some attempt to use the /contact page, since the GET was followed by a POST. But it must have been ineptly done, since it never resulted in any actual message reaching my inbox. Oddly, they seem to have given up this aspect long before I got rid of the contact form, replacing it with email. And no, I don't get junk mail this way either.

I think robots are taught to recognize certain basic names like "contact" or "legal", and will home in on those pages regardless of the full URL.

:: detour to check something in logs ::

Since /contact is in a roboted-out directory, requests for /contact preceded by robots.txt are vanishingly rare. (They should, of course, be nonexistent, but there are always glitches and bad actors.) While looking this up, I do find the occasional robots that request the root followed by all pages linked therefrom. And robots.txt, whether for verisimilitude or in search of information.
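For anyone wanting to run the same check, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the /private/ directory name in the usage note are made up for illustration; the thread does not give the real directory. The second function flags a visitor whose request sequence shows robots.txt first and a disallowed path afterwards.

```python
from urllib.robotparser import RobotFileParser


def is_disallowed(robots_txt, path, agent="*"):
    """True if robots_txt tells `agent` not to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


def robots_then_disallowed(paths, robots_txt):
    """True if this visitor fetched robots.txt and afterwards a disallowed path.

    `paths` is the visitor's requested paths in chronological order.
    """
    seen_robots = False
    for path in paths:
        if path == "/robots.txt":
            seen_robots = True
        elif seen_robots and is_disallowed(robots_txt, path):
            return True
    return False
```

With a hypothetical robots.txt of "User-agent: *" followed by "Disallow: /private/", a visitor requesting /robots.txt and then /private/contact would be flagged, while one requesting only /private/contact would not (it never read the rules, which is a different kind of bad behavior).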

RedBar

9:59 am on Jun 25, 2022 (gmt 0)

So lucy24, we have all these bots chasing around doing what apart from distorting real site visitor numbers?

There must be some achieving something that we probably do not see? Matrix reveal yourself :-)

lucy24

3:35 pm on Jun 25, 2022 (gmt 0)

"distorting real site visitor numbers"
Now, that's a whole nother issue. In order to affect your numbers, a robot would have to either request the analytics file (much easier if you use something like GA that isn't subject to your own access controls) OR request all supporting files (if you're basing your numbers on raw logs). Both of those are pretty uncommon. Though I do wonder about one category of requests: the ones that ask for the root without referer at HTTP, get redirected to HTTPS, and there request all supporting files except analytics and favicon. I'm increasingly suspicious that many of these are in fact convincingly humanoid robots, but I just can't be bothered to look closer if all they're getting is the root.
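That suspicious category can be expressed as a small rule, sketched here under stated assumptions: the asset list for the page is something you supply yourself, and "/analytics.js" and "/favicon.ico" are placeholder paths, not anything named in this thread. The labels are guesses for sorting log entries, not a verdict on any visitor.

```python
def classify_visit(requested, page_assets,
                   tracker="/analytics.js", favicon="/favicon.ico"):
    """Label one visit to the root page from the paths it requested.

    requested   -- paths this visitor asked for after the redirect to HTTPS
    page_assets -- supporting files the root page links to (excluding
                   the tracker and favicon), supplied by the site owner
    """
    requested = set(requested)
    if not set(page_assets) <= requested:
        # Page only, or only some assets: the usual robot pattern
        # (or a human with a warm cache, so this stays a guess).
        return "likely robot"
    if tracker in requested or favicon in requested:
        return "likely human"
    # Every supporting file fetched, but neither analytics nor favicon:
    # the "convincingly humanoid" pattern described above.
    return "suspect"
```

If visitor numbers come from raw logs, only the "likely human" and "suspect" groups would inflate them, which is why the suspect group is the one worth a closer look.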