

VsuSearchSpider/1.0 seen today

SumGuy

11:03 pm on Mar 22, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



I saw this today:

GET /robots.txt
GET /index.html

User Agent was VsuSearchSpider/1.0

Nothing else was showing up in the header fields.

IP was 79.247.146.x (p4ff79298.dip0.t-ipconnect.de)

This is a Deutsche Telekom IP address. Many IPs in the vicinity map to "myfritz.net" host names, which correspond to the MyFritz residential network appliance, so this is definitely a residential customer rather than a commercial or business one.
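For anyone who wants to pull this bot's requests out of their own logs, here is a minimal Python sketch. It assumes the Apache/nginx "combined" log format (IIS and Abyss logs may be laid out differently); the function name `hits_for_agent` and the sample lines are hypothetical, and the sample uses a documentation-range IP rather than the real one.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    method: str
    path: str
    status: int
    referer: str
    agent: str


def hits_for_agent(lines: Iterable[str], agent_prefix: str) -> Iterator[Hit]:
    """Yield parsed requests whose User-Agent starts with agent_prefix."""
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        if m["agent"].startswith(agent_prefix):
            yield Hit(m["host"], m["method"], m["path"], int(m["status"]),
                      m["referer"], m["agent"])


# Hypothetical sample lines (192.0.2.x / 198.51.100.x are documentation IPs).
sample = [
    '192.0.2.10 - - [22/Mar/2023:23:03:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "VsuSearchSpider/1.0"',
    '192.0.2.10 - - [22/Mar/2023:23:03:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "VsuSearchSpider/1.0"',
    '198.51.100.7 - - [22/Mar/2023:23:05:12 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for hit in hits_for_agent(sample, "VsuSearchSpider/"):
    print(hit.host, hit.method, hit.path, hit.status, hit.referer)
```

A referer of "-" on every hit is what "nothing else was showing up in the header fields" looks like in this format.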

lucy24

5:40 am on Mar 23, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you actually have “index.html” as a visible URL? Robots asking for URLs they have heard of somewhere is one thing. Robots asking for wholly imaginary URLs is another thing.

Dunno about anyone else, but I get a fair number of requests for “index.php”--an URL I have never had on any site, so nobody but a malign robot would ask for it. Requests for “index.html” are much rarer, except on one site that predates the canonicalization redirect.

No VsuSearch yet, but I will keep my eyes peeled.
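The distinction between URLs a robot has heard of and wholly imaginary ones is easy to check mechanically if you keep a list of every URL the site has ever had. A rough Python sketch; `imaginary_requests` and the `ever_existed` set are hypothetical, and query strings are simply ignored.

```python
from typing import Iterable, List, Set


def imaginary_requests(paths: Iterable[str], ever_existed: Set[str]) -> List[str]:
    """Return requested paths (deduplicated, in first-seen order) that the
    site has never published.

    `ever_existed` is a hypothetical hand-maintained set of every URL the
    site has ever had, current and retired.
    """
    flagged: List[str] = []
    for p in paths:
        path = p.split("?", 1)[0]  # compare without the query string
        if path not in ever_existed and path not in flagged:
            flagged.append(path)
    return flagged


requested = ["/index.html", "/index.php", "/robots.txt",
             "/index.php?id=1", "/wp-login.php"]
known = {"/", "/index.html", "/robots.txt"}
print(imaginary_requests(requested, known))
```

On a site where index.html has been a visible URL at some point, it belongs in the set, so a request for it counts as "heard of somewhere" rather than imaginary.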

SumGuy

1:27 pm on Mar 23, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



> Do you actually have “index.html” as a visible URL?

When my site was first developed (http) and served by IIS4, the "default" landing page file name was default.html. Much, much later, when I installed Abyss to serve https, I believe its "default" page was index.html, so all I did was copy default.html to index.html. So both are active and can be requested directly by name if you ask for them.

If you don't request a specific file, (a) a blind http request to my IP gives you a 404 Object not found, and (b) a blind request to my domain gives a 301 to https ://mydomain.tld/default.html. I'm seeing these details by using wget from a (Windows) command prompt.

A blind request to https ://mydomain.tld results in wget getting a 200 OK and saving a (text/html) file as "index.html". Wget doesn't show how it chose that file name, though by default it names any download whose URL path ends in a slash "index.html", so the saved name alone may not say which file Abyss actually served. Apparently Abyss will also serve index.html from a blind https request to my IP.

In my recollection, requests for index.php are very rare. One might turn up as part of a bot's laundry list of php requests.

lucy24

4:32 pm on Mar 23, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sounds as if you are long overdue for an index redirect :) I instituted mine in
:: shuffling papers ::
September 2012. (Looking this up, I see it was about half a year after I started using dynamic navigation headers/footers. This seems backward, but logs don't lie.)

Exact syntax varies of course by server type, but there's always something analogous to Apache's
DirectoryIndex index.html
meaning “when someone requests a directory (ending in /), find a physical file with the specified name--the first in the list, if more than one--and serve that”. (One obscure directory on my test site contains no fewer than five index files, because I experimented on this very point.) So then, to avoid the dreaded Duplicate Content, you have to redirect requests that ask for the index file by name.
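The DirectoryIndex lookup plus the by-name redirect can be modelled in a few lines. This is a toy Python sketch, not server code: `resolve`, `INDEX_NAMES` and the file set are hypothetical, and the index list mirrors the two files in SumGuy's setup in an assumed priority order.

```python
from typing import Optional, Sequence, Set, Tuple

# Hypothetical index list in priority order, analogous to
# "DirectoryIndex index.html default.html".
INDEX_NAMES = ("index.html", "default.html")


def resolve(path: str, files: Set[str],
            index_names: Sequence[str] = INDEX_NAMES) -> Tuple[int, Optional[str]]:
    """Return (status, target) for a request path.

    200 -> target is the file to serve
    301 -> target is the canonical directory URL to redirect to
    404 -> target is None
    """
    directory, _, name = path.rpartition("/")
    if name in index_names:
        # Index file asked for by name: redirect to the bare directory so
        # only one URL serves this content (no duplicate content).
        return 301, directory + "/"
    if path.endswith("/"):
        # Directory request: serve the first index file that exists.
        for candidate in index_names:
            if path + candidate in files:
                return 200, path + candidate
        return 404, None
    return (200, path) if path in files else (404, None)


site = {"/index.html", "/default.html", "/about.html"}
for p in ["/", "/index.html", "/default.html", "/about.html", "/index.php"]:
    print(p, resolve(p, site))
```

On a real Apache server the redirect is usually a mod_rewrite rule, and it generally has to test the original request line (`%{THE_REQUEST}`) rather than the current URI; otherwise the internal DirectoryIndex lookup of index.html gets redirected too and loops.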