
Returning 404s to 'bots, instead of 403s

rbots,404,403

     
12:20 am on Mar 14, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 81
votes: 4


I'd like to ask Lucy24 this question, but cannot on the Search Engine Spider and UA forum. [webmasterworld.com...]

Most robots are missing headers; this one's sending too many. The rest get a 404-- not because those WP files don't exist, although they don't, but because I return a manual 404 to almost all .php requests. (It's the same amount of work for the server as a 403, and conveys no information--or, better yet, conveys false information--to the visitor.)

From what I understand, you return a 404 to 'bots instead of a 403. This might throw them off, and I like the idea. How do you do this? Can you give me some htaccess code? Thanks in advance, and as always I'm grateful for all your help.

[edited by: TorontoBoy at 12:37 am (utc) on Mar 14, 2017]

12:24 am on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


I started using 404s for: wp-login.php, admin.php and other files I don't have on my server. There's no need to block the request if you don't have the file, and a 404 is the proper server response. I found the requests for these files decreased.
12:39 am on Mar 14, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 81
votes: 4


My WP and Drupal sites return a pretty polite (and really large) message stating the obvious: "file not found". I'd rather not send that polite message. I'd prefer to trap the 404 before it hits my sites and return nothing.
12:48 am on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


You can't use custom, light-weight error pages?

ErrorDocument 403 /forbidden.html
ErrorDocument 404 /error.html
2:45 am on Mar 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13674
votes: 439


If your manual 404 is located before the WP section of htaccess, you should be able to use your own ErrorDocument (a document which really exists, so it will never get involved with the WP code). Then again, if you're only using the manual 404 for malign robots, why give them any information at all? Say something like
ErrorDocument 404 default
which offers only a minimalist error message.

This will not affect anything handled by WP, since those 404s don't come from the server.

The code will say things like
RewriteCond %{THE_REQUEST} /includes
RewriteRule ^includes/ - [R=404]

RewriteCond %{THE_REQUEST} \.php
RewriteRule \.php$ - [R=404]
The second rule may require more detailed conditions if you've really got some URLs in .php. The purpose of the Condition is to pass internal requests for "index.php" and the like, or pages that are rewritten from .html URL to .php content.
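
A minimal sketch of how the pieces above might sit together in one htaccess file, assuming mod_rewrite is enabled and that this block is placed above any WordPress section. The `includes/` path and the blanket `.php` rule come from the post; whether they suit a given site is an assumption to check first.

```apache
# Serve Apache's built-in minimal 404 body for server-generated 404s.
ErrorDocument 404 default

RewriteEngine On

# THE_REQUEST holds the original request line (e.g. "GET /includes/x HTTP/1.1"),
# so these conditions match only what the client actually asked for,
# not internally rewritten requests.
RewriteCond %{THE_REQUEST} /includes
RewriteRule ^includes/ - [R=404]

# Assumption: no public URL on this site ends in .php.
RewriteCond %{THE_REQUEST} \.php
RewriteRule \.php$ - [R=404]
```

If some public URLs really do end in .php, the second rule would need extra conditions to let those through.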

:: standing back to admire how neatly keyplyr has impersonated me ;) ::
3:08 am on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


:: standing back to admire how neatly keyplyr has impersonated me ;) ::
I'm holding a cat.
3:08 am on Mar 14, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10846
votes: 61


welcome to WebmasterWorld, TorontoBoy! [webmasterworld.com]

From what I understand, you return a 404 to 'bots instead of a 403. This might throw them off, and I like the idea. How do you do this? Can you give me some htaccess code?


we can't write your code for you but we can help you write it.
this philosophy is explained in the Apache Web Server forum Charter:
https://www.webmasterworld.com/apache/charter.htm [webmasterworld.com]

what are you actually trying to block?
all requests for a set of url paths?
all requests that provide a specific request header or header value?

you can create a mod_rewrite ruleset that checks for provided request headers/values and/or requested url paths and returns a 410 (Gone) response status code, which is equivalent to a 404 (Not Found) and doesn't expose the intention of a 403 (Forbidden).
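
As a rough illustration of the kind of ruleset described here, the sketch below returns a 410 through mod_rewrite's `[G]` flag. Both triggers (an empty User-Agent header and a `wp-login.php` path) are hypothetical examples chosen for illustration, not rules taken from anyone's live configuration.

```apache
RewriteEngine On

# Hypothetical trigger 1: the request carries no User-Agent header at all.
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^ - [G]

# Hypothetical trigger 2: a path this site is assumed not to serve.
RewriteRule (^|/)wp-login\.php$ - [G]
```

Swapping `[G]` for `[R=404]` would send a 404 instead, with the same conditions.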
3:46 am on Mar 14, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3227
votes: 146


If the site has WordPress installed, it will use the Wordpress 404.php file even if you have a custom 404 page and have specified that custom error page before the WP section in the htaccess file. The only way my custom 404 page gets called is when a static page is requested and not found. Though I have custom 404 pages in various folders that work just fine. If a WP URL is requested, it gets the WP 404.php page by default (ignoring htaccess).

IF the files requested are WP files (such as wp-login.php for example) it will only return the WP default 404.php file. That file can be customized if you want a nice 404 page, but it is not a "lightweight" file. It calls all the theme resources and headers and logos and scripts.

IF the request is being served a 404 in htaccess (and never allowed anything but a 404) it would likely work, but it would work for every request, unless you can exempt yourself.

Just a thought - if 404.php were to be deleted or renamed, then it might return the server default (?) or at least the custom 404 which is easily a smaller file. I'm not interested enough to test that, sorry. For WP files that actually exist (such as wp-login.php for example), I just put a captcha on it and they quit bothering. At least I don't see those scripted bots with dozens or hundreds of scripted attempts any more.
5:40 am on Mar 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13674
votes: 439


If the site has WordPress installed, it will use the Wordpress 404.php file even if you have a custom 404 page and have specified that custom error page before the WP section in the htaccess file.

But that's because a WP 404 is not generated by the server. In your logs it will show as a 200, meaning that the request was successfully passed on to "index.php" or similar. From there, WP sends back the error document along with the 404 header.

There's a finite number of WP files that physically exist (and which are therefore not subject to the !-f condition). Here, again, you could do something like
RewriteCond %{REMOTE_ADDR} !^11\.22\.33\.44
RewriteRule (admin|user) - [R=404]
(no anchors, to allow for all possible options) or, if that's too fuzzy for your taste,
RewriteCond %{REMOTE_ADDR} !^11\.22\.33\.44
RewriteRule ^(wp-login|bitrix|admin|user) - [R=404]
with no closing anchor. The REMOTE_ADDR is, of course, your own IP. That's assuming you are not on AOL dialup or something that similarly changes on every request. (Even then, try using just the first three numbers; how many botrunners live at your own ISP?) This, too, has to come before the WP section of htaccess--not because it won't work otherwise, but so the server doesn't have to do the resource-greedy -f test. After all, the whole point is that it doesn't matter if the file exists or not; all that matters is that the malign robot goes away thinking it doesn't exist.

Edit: I double-checked in the Apache docs. Emphasis mine:
Any valid HTTP response status code may be specified, using the syntax [R=305], with a 302 status code being used by default if none is specified. The status code specified need not necessarily be a redirect (3xx) status code. However, if a status code is outside the redirect range (300-399) then the substitution string is dropped entirely, and rewriting is stopped as if the L were used.

Right, so you don't need the [L] flag (though it does no harm, if its absence makes you uneasy).
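
To make the ordering concrete, here is a sketch that puts the IP-exempted 404 rule ahead of a WordPress rewrite section. The WordPress block shown is the stock one WP writes into htaccess for a root install, and `11.22.33.44` is the placeholder address from the post, standing in for your own IP.

```apache
RewriteEngine On

# Everyone except the exempted address gets a server-level 404 for these paths.
# Opening anchor only, so wp-login.php, admin/..., user/... all match.
RewriteCond %{REMOTE_ADDR} !^11\.22\.33\.44
RewriteRule ^(wp-login|bitrix|admin|user) - [R=404]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

Because the first rule matches before the WordPress section is reached, the `-f` and `-d` filesystem checks are never run for those requests.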
6:05 am on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


RewriteCond %{REMOTE_ADDR} !^11\.22\.33\.44
RewriteRule ^(wp-login|bitrix|admin|user) - [R=404]
Problem is, these vulnerability attempts come from thousands of compromised accounts that keep changing.
12:00 pm on Mar 14, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 81
votes: 4


I am protecting 4 WP and a Drupal site, all in separate subdirectories under public_html. I've tried RewriteCond but due to inheritance they do not trickle down. Thus I have moved to SetEnvIf commands in an htaccess in public_html, which does inherit down. Here is what has worked for me:

SetEnvIf User-Agent "yoozBot" keep_out

order allow,deny
allow from all
deny from env=keep_out

What I'd like to do is Lucy's strategy that if I find a bot repeatedly looking for security holes, instead of returning 403s I'd like to return 404s with no info. Why give the bots any more return info than necessary?

If I cannot do this with SetEnvIfs I can do RewriteCond but must replicate this to all my WP installs, which is more maintenance.
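
One possible middle ground, offered as an untested sketch: keep the shared `SetEnvIf` list in public_html and let a short mod_rewrite rule turn the `keep_out` flag into a 404. It assumes the `deny from env=keep_out` line is dropped (otherwise the 403 is issued first), and that the rewrite lines are copied into each subdirectory htaccess above the WP or Drupal section. The bot list itself would still live in one place.

```apache
# public_html/.htaccess: shared list of flagged user agents.
# SetEnvIf with a bare variable name sets that variable to "1".
SetEnvIf User-Agent "yoozBot" keep_out

# Each site's own .htaccess, above the WordPress or Drupal rules:
RewriteEngine On
RewriteCond %{ENV:keep_out} =1
RewriteRule ^ - [R=404]
```

The per-site part never changes when a new bot is added, so the extra maintenance is limited to pasting those lines once per install.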
12:10 pm on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


I've tried RewriteCond but due to inheritance they do not trickle down.
You'll need to add a couple lines of code to the sub-directories to "trickle down."
RewriteEngine on
RewriteOptions inherit
12:26 pm on Mar 14, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 81
votes: 4


I have asked my host provider, who said that RewriteOptions inherit does not work. I tried this as well a couple of months ago. It did not work, because some of my RewriteConds in the parent are not applicable to the child and cause errors. This is why I moved to SetEnvIf. I will look into this again. All my subdirs have individual htaccess files.

Maybe using SetEnvIf in my public_html's htaccess I cannot select an error message, and must make do with error 403. Otherwise I would need to edit all subdir htaccess and add RewriteOptionsCond?

Thanks.
12:38 pm on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


I tried this as well a couple of months ago. It did not work, because some of my RewriteConds in the parent are not applicable to the child and cause errors.
Those are the rules that should be in the unique htaccess for each site. The global (general) rules should be at base level (htaccess in public_html.)
6:51 pm on Mar 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13674
votes: 439


these vulnerability attempts come from thousands of compromised accounts that keep changing.

Well, yeah, that's the point of the negative condition. If the request comes from anyone other than yourself...

Even with RewriteOptions inherit (or the wider range of options in 2.4), inheritance doesn't work the same as it does with other mods. In particular, the same request will never be subject to more than one RewriteRule. I'd say: don't bother with RewriteRules in a shared htaccess, whether userspace or primary domain.

Incidentally, what have you got against the YoozBot? My impression is that it's compliant. (I don't read Farsi, but it looks like a search engine.)
9:24 pm on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


that's the point of the negative condition. If the request comes from anyone other than yourself
TorontoBoy - wouldn't everyone be requesting WP files, even the regular site visitor, since these are WP sites?
10:04 pm on Mar 14, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3227
votes: 146


@keyplyr - I think that's the point of listing the specific requests that would trigger the 404:
(wp-login|bitrix|admin|user)


Visitors have no reason to request those files, WP pages are standard URLs that would not normally contain "wp-login" or "admin". SOME sites have members, in which case "user" might be part of a legitimate request. I'm thinking that the end user would need to study and edit to suit their purposes.
10:22 pm on Mar 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


Visitors have no reason to request those files
How does one log in to a WP site if they are blocked from sending their credentials via wp-login?

I haven't set up a WP in a couple years, but as I remember unless it was customized, wp-login was one of the standard plugins.
10:34 pm on Mar 14, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3287
votes: 23


if you've got multiple sites on the same server and can edit httpd.conf then consider using ALIAS
i think you can also use an htaccess file with ALIAS ... put the .htaccess in your main public folder (i've never done this with htaccess, so am not entirely sure)

here's a snippet from my set up


Alias /wp-activate.php "/var/www/path/to/custom/404.php"
Alias /wp-admin "/var/www/path/to/custom/404.php"
Alias /wp-blog-header.php "/var/www/path/to/custom/404.php"
Alias /wp-check.php "/var/www/path/to/custom/404.php"
Alias /wp-checking.php "/var/www/path/to/custom/404.php"
Alias /wp-config.php "/var/www/path/to/custom/404.php"


in my case 404.php logs the request to a special log file and then serves a 404
you can add as many aliases as you like,
it's astonishing how many of these files are requested.
i review my regular logs and add new Alias rules when new files are requested a lot

you can also do stuff like
Alias /wp-

anyway, it works for me.
12:14 am on Mar 15, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10846
votes: 61


i think you can also use an htaccess file with ALIAS


https://httpd.apache.org/docs/current/mod/mod_alias.html#alias
Alias Directive
...
Context: server config, virtual host, directory


Context:
https://httpd.apache.org/docs/current/mod/directive-dict.html#Context

i.e. no Alias directive in .htaccess context.

this makes sense since being able to alias outside of the document root means you should probably also have server access to set file/directory ownership/permissions.
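
For anyone limited to htaccess, a rough per-directory analogue of the Alias list can be sketched with mod_rewrite instead. This is an illustration only: it assumes the site does not run WordPress (so none of these paths are needed), and it returns a plain server 404 rather than routing to a logging script as in topr8's setup.

```apache
RewriteEngine On

# Probed WordPress file names taken from the Alias list above.
RewriteRule ^wp-(activate|blog-header|check|checking|config)\.php$ - [R=404]

# The wp-admin directory and anything beneath it.
RewriteRule ^wp-admin(/|$) - [R=404]
```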
12:19 am on Mar 15, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13674
votes: 439


as I remember unless it was customized, wp-login was one of the standard plugins
Oh, my bad if so. I thought wp-login was one of those files that only the site administrator uses.

If it's for everyone, then make sure your users' login credentials are solid. ("password" is not a strong password.) A bit of a bother, though, because even if the robot gets hit by a 401 (or whatever WP returns), they still come away knowing that this is a WP site.
12:21 am on Mar 15, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3287
votes: 23


ah, thanks, phranque - i don't use htaccess only the config file directly.

however the OP is using multiple sites and may well have a VPS ...in which case they could probably edit the config file (which is altogether better than using htaccess anyway, for everything)
12:56 am on Mar 15, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3227
votes: 146


The wp-login file is only for the Admin. OR only for the Admin and registered users. Which is what this was about:
SOME sites have members, in which case "user" might be part of a legitimate request. I'm thinking that the end user would need to study and edit to suit their purposes.
1:20 am on Mar 15, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 81
votes: 4


As for WordPress, admins and any contributors use wp-login.php to authenticate. To get to the admin area of the site, where authenticated users can add documents, they would need to access files in the wp-admin subdir. The theme and content are held in the wp-content subdir. JavaScript is held in the wp-includes subdir. These would all need to be accessible for viewing.

I don't recognize YoozBot and am not targeting their language, so I banned them. They visit me, I don't know them, tried to research them, didn't get enough info, so ban.

I can't get inheritance to work with RewriteOptions inherit on my Apache 2.2.9. I tried, researched and failed to get it to work. SetEnvIf works well, so I'll stick with it and send all bad bots 403s instead of a combo of 404s and 403s.

Thanks anyway, and a good thread, though it seemed to run away from me!
2:57 am on Mar 15, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13674
votes: 439


Oh, OK, so wp-admin is still concerned with a small, limited number of users, not with everyone who visits the site. If you're letting other people have this kind of access, I think you're entitled to know their IP address. (Well, presumably you know it already; it's in logs.) So you'll need a longer list of conditions. I'd put it in the form

RewriteCond %{REMOTE_ADDR} !^your-own-IP-here
RewriteCond %{REMOTE_ADDR} !^(11\.22\.33\.44|55\.66\.77\.88)
et cetera, grouping your other members together.

This is on the assumption that you access the page more often than everyone else, making the first condition more likely to fail than the second one.

Reminder about robots: Even if you physically block them, always disallow them in robots.txt too. (Belt and suspenders.) The only thing better than a blocked request is a request that isn't made in the first place--and some robots really are compliant, even among the ones you don't personally have any use for.
3:02 am on Mar 15, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8289
votes: 331


so wp-admin is still concerned with a small, limited number of users, not with everyone who visits the site.
That would depend on the site. Many WP sites allow posting & uploads (images, audio) for members.
 
