Forum Moderators: Ocean10000 & phranque

Google testing for index types?

3:59 pm on Mar 11, 2019 (gmt 0)

Junior Member from CA 

10+ Year Member Top Contributors Of The Month

joined:Oct 1, 2002
posts: 142
votes: 11


From error log:

(index.php,index.php5,index.php4,index.php3,index.perl,index.pl,index.plx,index.ppl,index.cgi,index.jsp,index.jp,index.phtml,index.shtml,index.xhtml,index.html,index.htm,index.wml,Default.html,Default.htm,default.html,default.htm,home.html,home.htm,index.js) found, and server-generated directory index forbidden by Options directive, referer: https://www.google.com/webmasters/tools/crawl-errors?hl=en&siteUrl=https://www.example.com/


Just found it interesting, probably already been discussed - I assume it's just Google testing for index types?
10:48 pm on Mar 11, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3446
votes: 67


i would have thought most sites would block access to the actual 'page' as they would want access only to '/'
so i can't see the purpose of this.

in the same way a large number of sites are extensionless these days (perhaps the majority) so access to filename.php/.asp/.htm/.etc would also be blocked.
11:46 pm on Mar 11, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11608
votes: 193


From error log

did you use that timestamp to check the server access log for the equivalent request?

to me that looks like someone with access to GSC clicked on the crawl errors report (in the "old" search console) and then navigated to the home page url for that site.
you'll probably find that request originated from your IP using your user agent to request the resource.
(or hopefully someone else within your organization)

the message is a normal "error condition" level message for mod_autoindex.
i.e. somewhere in your config you probably have a "LogLevel autoindex:error" specified.
if you don't want these errors logged you must raise the level to something more severe than "error", e.g.:
LogLevel autoindex:crit

(index.php,index.php5,index.php4,index.php3,index.perl,index.pl,index.plx,index.ppl,index.cgi,index.jsp,index.jp,index.phtml,index.shtml,index.xhtml,index.html,index.htm,index.wml,Default.html,Default.htm,default.html,default.htm,home.html,home.htm,index.js)

i'm guessing this is extracted from the list of index files provided in the DirectoryIndex directive.
11:51 pm on Mar 11, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11608
votes: 193


i would have thought most sites would block access to the actual 'page' as they would want access only to '/'
so i can't see the purpose of this.

sites should 301 redirect requests for directory index documents to the (trailing slash) directory path.
e.g. https://www.example.com/index.php should be 301 redirected to https://www.example.com/

mod_autoindex doesn't come into play until a (trailing slash) directory url path is requested.
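
A minimal sketch of that kind of redirect, assuming mod_rewrite is enabled and the rules sit in the .htaccess of the document root. The extensions listed are only examples, and the pattern would need adjusting for a real site:

```apache
RewriteEngine On

# Look at the original request line only, so the internal lookup that
# mod_dir performs for "/" is not redirected again (which would loop).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/(?:[^/\s?]+/)*index\.(?:php|html?)[\s?]

# Send /index.php to / and /dir/index.php to /dir/ with a 301.
RewriteRule ^(.*/)?index\.(?:php|html?)$ /$1 [R=301,L]
```

The condition on THE_REQUEST is the important design choice here: without it, the server's own DirectoryIndex lookup for index.php would match the rule as well.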
12:29 am on Mar 12, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9233
votes: 780


Google tests everything. Period. As phranque suggests much of this can be controlled on your side with a few redirects ... but that won't change g's behavior. Bing is not quite as aggressive, but does similar.

Meanwhile, your error log is working just fine. :)
12:47 am on Mar 12, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15494
votes: 744


if you want to not show these errors
But only if it's your own server; you can't set it in htaccess. Anyway, this type of request doesn't require consulting the Error Log, since they should each show up in access logs as a 404 as well.

OP, you left out the single most important piece of information: were these files requested by the googlebot? Otherwise it's just another malign robot sending a bogus referer. (And since when does Google itself give GSC as a referer?)

You can have multiple index files in the same directory, and set more than one to be the DirectoryIndex--but once the server has found one on the list, it stops looking. Others then have to be requested by name. I remember once playing with this on my test site; in fact I've still got the directory, with a slew of index.htm and index.php and so on. Unlike LogLevel, DirectoryIndex can easily be changed in htaccess. You just have to remember it's there, if you make different settings for one directory.
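
For illustration, a per-directory override might look something like this. It is a sketch only: the file names are hypothetical, and it assumes the server's AllowOverride setting permits both directives in htaccess:

```apache
# .htaccess for one directory (hypothetical file names).
# Apache tries these left to right and serves the first one that exists.
DirectoryIndex index.php index.html

# With automatic listings switched off, a directory that contains none
# of the names above is refused instead of being listed.
Options -Indexes
```

That second case, no listed index file present and listings disabled, is the situation in which mod_autoindex writes the "server-generated directory index forbidden by Options directive" message.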

I get heaps of requests for index.php, but that's just malign robots doing their thing--looking for WP vulnerabilities and the like. Never seen the weird extensions.

:: detour to recent logs ::

Nope, nothing but the occasional index.html. In fact, G must have been doing a periodic spot-check pretty exactly a year ago; on one date in March 2018 I find requests for /index.html of almost every directory I own--on a site where these have never been visible URLs.
1:58 pm on Mar 12, 2019 (gmt 0)

Junior Member from CA 

10+ Year Member Top Contributors Of The Month

joined:Oct 1, 2002
posts: 142
votes: 11


Thanks for the answers

Directory and index pages all redirect to trailing slash. File names have .php extensions - without trailing slash

I found that particular error in cPanel's last errors page, or whatever it's called - not somewhere I usually look for problems as I have a dedi server, but I was just browsing around and saw it. I did go to GSC that morning, the old one, so I probably triggered it.

Here is the error again as it was and I included the line both above and below it:

[Mon Mar 11 11:33:26.040422 2019] [core:info] [pid NUMBER:tid NUMBER] [client BINGBOT IP] AH00128: File does not exist: /home/example/public_html/dir/dir/non-existent-page.php
(index.php,index.php5,index.php4,index.php3,index.perl,index.pl,index.plx,index.ppl,index.cgi,index.jsp,index.jp,index.phtml,index.shtml,index.xhtml,index.html,index.htm,index.wml,Default.html,Default.htm,default.html,default.htm,home.html,home.htm,index.js) found, and server-generated directory index forbidden by Options directive, referer: https://www.google.com/webmasters/tools/crawl-errors?hl=en&siteUrl=https://www.example.com/
[Mon Mar 11 11:29:07.358159 2019] [:error] [pid NUMBER:tid NUMBER] [client BINGBOT IP:0] File does not exist: /home/example/public_html/dir/dir/non-existent-page.php


No major drama, I just thought the possible number of index. files was interesting :-)
6:26 pm on Mar 12, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15494
votes: 744


Do the parentheses mean that (one or more items in this long list) was found? The error would make more sense if (none-of-the-above) were found, which is the only reason a server would need to check for auto-generated index permission.

client BINGBOT IP
For a couple of years now, the bingbot has had an irritating habit of asking for lowercase.html when the actual filename (which it also asks for) is CamelCase.html. This leads to a fair number of bingbot 404s.
11:34 pm on Mar 12, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11608
votes: 193


bingbot has had an irritating habit of asking for lowercase.html when the actual filename (which it also asks for) is CamelCase.html

i would call this corporate guilt for past misdeeds.
afaik the windows server os default setting was and still is for case-insensitive file names:
Configure Case Sensitivity for File and Folder Names [docs.microsoft.com]