Determine source of a 403/Forbidden

     
2:51 am on Jun 28, 2013 (gmt 0)

Junior Member

5+ Year Member

joined:July 13, 2010
posts:170
votes: 0


A while back, I explicitly blocked two bots. I now want to remove the ban, but I don't recall how I set it up.

I checked the site's .htaccess and found nothing there that blocks them (no full or partial user agent, no IP or IP range). I also checked mod_security (the way I block at server level) and found nothing there either.

Is there an Apache log that can tell me what triggered the 403?
4:39 am on June 28, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12717
votes: 244


Is it your own server or isn't it? htaccess implies no; "mod_security at server level" implies yes. mod_security seems to add pretty detailed comments to the error log-- but I think they all come through as 500-class errors, so don't spend too much time there.

I took a quick detour to MAMP and tried locking myself out. Even at LogLevel "debug" it still says nothing more than "client denied by server configuration". Grrr.

There are lots of ways to lock people out, but most of them wouldn't apply to a potentially desirable robot. A referer block, for example-- but surely your robots don't come with their own referer?

If there's any chance you blocked them via mod_rewrite, you could try running a RewriteLog and see what turns up. Just to confuse you, there's no on/off setting or LogLevel, you just have to specify a file. But it can't be done in htaccess, so you're looking at restarting your server :(
4:44 am on June 28, 2013 (gmt 0)

Junior Member


Yes it is a dedicated server. Neither bot has a referrer field. I've checked high and low for references to their IPs/User agents and even ran grep on public_html.

If there's any chance you blocked them via mod_rewrite

No, I do the rewrites via .htaccess (checked there too), and I checked the server's root folder (the page that displays if you type the IP in the address bar) and there wasn't an .htaccess present.

you could try running a RewriteLog and see what turns up. Just to confuse you, there's no on/off setting or LogLevel, you just have to specify a file. But it can't be done in htaccess, so you're looking at restarting your server

Officially confused. How do I run a RewriteLog?
6:25 am on June 28, 2013 (gmt 0)

lucy24 (Senior Member from US)


Make up a file name and tell your server about it :)

But first:

You can test the UA question pretty easily by using any browser that will let you fake a user-agent. Give the exact text that your robots use, and see if you can get into your site. If no, there is a UA block somewhere. If yes, keep looking for IP.

I really doubt you want to take the RewriteLog approach. If you do, the format is

RewriteLog file-path

where-- says Apache--
If the name does not begin with a slash ('/') then it is assumed to be relative to the Server Root. The directive should occur only once per server config.

And once you've done that, you then have to specify a LogLevel. (If you set a log level without naming a file, logs simply vanish into the ether. If you name a file without setting a non-zero log level, no logging gets done. mod_rewrite always has to do things differently from all other mods.)

RewriteLogLevel some-number-from-1-through-9

Apache also says-- with exclamation marks--
Using a high value for Level will slow down your Apache server dramatically! Use the rewriting logfile at a Level greater than 2 only for debugging!
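
For reference, a minimal sketch of the two directives together. It assumes Apache 2.2 and a made-up log file name; both lines go in the main server config or a virtual host, not in .htaccess, and need a restart to take effect.

```apache
# Apache 2.2 only. Place in httpd.conf or inside a <VirtualHost> block.
# The file name is hypothetical; a path without a leading slash is
# taken as relative to ServerRoot.
RewriteLog "logs/rewrite-debug.log"

# 0 means no logging; the docs reserve levels above 2 for debugging.
RewriteLogLevel 2
```

Setting the level back to 0 (or removing both lines) once the answer turns up keeps the file from growing any further.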


All of this strikes me as a last-resort solution if all you're trying to find out is how the ### you blocked those robots. In fact, this whole section of the docs gives the impression that Apache just isn't all that happy about the RewriteLog idea at all :)
6:47 am on June 28, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Look in the CP and Security Section for Deny IP.
7:31 am on June 28, 2013 (gmt 0)

Junior Member


Interesting..

I used the Firefox User Agent Switcher and copied one of the UAs from my 403 log.

I tried to access one of the pages the bots were looking for.
In the log, my visit showed up as broken images and I saw only the text/links of the 403 page. (Error document /403.php, for example.)

When the actual bots visit, it's just 1 click and only the page they tried to reach shows up in the logs. The css/images, etc don't show up in the 403 logs.

The 403 log comes from a script I wrote myself that logs every hit to "403.php".

That makes me wonder if this is IP based, then.

I really doubt you want to take the RewriteLog approach.

This seems a little scary. If I make a mistake, fill up the logs and crash the server, it'll take 3+ hours for the data center to reboot it, if my last accidental crash from filling up logs is any indication.

Look in the CP and Security Section for Deny IP.

I've checked the firewall and didn't see their IPs. Also, when an IP is firewalled, it doesn't show up in my 403 logs.
8:34 am on June 28, 2013 (gmt 0)

lucy24 (Senior Member from US)


When the actual bots visit, it's just 1 click and only the page they tried to reach shows up in the logs. The css/images, etc don't show up in the 403 logs.

Uhm.... They're robots. Except in the rarest, most exceptional cases, robots never take anything but the html itself.

I tried to access one of the pages the bots were looking for.
In the log, my visit showed up as broken images and I saw only the text/links of the 403 page.

This is a little obscure. Do you mean that you, yourself, saw broken-image icons onscreen? And there are supposed to be images on the 403 page? If so, you have learned something very useful and potentially embarrassing in a "been there, done that" kind of way.

You have to make sure that everything needed by your 403 page is accessible to those who have been locked out. Numerically most 403s go to robots, who don't even look at your 403 page. But the page exists for the benefit of humans who took a wrong turn-- most often, by asking for a directory that doesn't have an index. So you have to poke a hole for them.

If the 403 comes from mod_rewrite, make a preliminary RewriteRule that says "if the request is for anything used by the 403 page, let them through". If the 403 comes from mod_authz-whatsit, make a <FilesMatch> envelope that says Allow from all.
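
As a rough sketch of both approaches: this assumes Apache 2.2 syntax in the site's .htaccess, an error page at /403.php (the example path mentioned earlier in the thread), and a hypothetical errors/ directory plus made-up file names for the page's CSS and images. Adjust the names to whatever the 403 page really loads.

```apache
# --- If the 403 comes from mod_rewrite ---
RewriteEngine On
# Must appear before any blocking rules: leave the request unchanged and
# stop processing for the error page and anything under the
# (hypothetical) errors/ directory it pulls from.
RewriteRule ^(403\.php$|errors/) - [L]

# --- If the 403 comes from Allow/Deny (mod_authz_host) ---
# Re-open only the files the error page needs; names are illustrative.
<FilesMatch "^(403\.php|403\.css|403-logo\.png)$">
    Order Allow,Deny
    Allow from all
</FilesMatch>
```

Only one of the two parts is needed, depending on which module issues the block.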

And so on.
8:37 am on June 28, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


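If you are using Apache 2.4, the directives for logging mod_rewrite changed: RewriteLog and RewriteLogLevel are gone, and rewrite logging is configured through the regular LogLevel directive instead.

In 2.4 you can set the log level for any individual module, so that could be useful.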
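
A minimal sketch of what that looks like, assuming Apache 2.4 and access to the server or virtual host config. The output goes to the regular error log, not a separate file, and the trace level chosen here is arbitrary.

```apache
# Apache 2.4: keep the global level at warn and raise it only for
# mod_rewrite. trace1 through trace8 give increasing amounts of detail.
LogLevel warn rewrite:trace3

# Untested assumption: raising the authorization module's level may show
# which access rule produced a denial. Uncomment to try it.
# LogLevel warn authz_core:debug
```

As with the older directives, it is worth dropping back to the plain level once the cause of the 403 has been found.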