Forum Moderators: phranque

Strange GET requests from bots in my Apache server log

Strange GET requests are appearing in my server log. Is this a problem?

         

jghomestead

2:21 pm on Oct 20, 2011 (gmt 0)

10+ Year Member



I was looking at my Apache server logs and noticed there are hundreds of very strange GET requests from all sorts of bots including Googlebot, Yahoo Slurp, Baidu spider, Bingbot, and others.

They appear to be randomly generated URLs for .html pages that do not exist and are totally unrelated to my website. (My site is for doors.)

Here are a few examples:

67.195.111.184 - - [18/Oct/2011:04:28:36 -0700] "GET www.mysite.com/ofsR-free-printable-probability-worksheets.html HTTP/1.0" 404 24074 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

157.55.18.23 - - [18/Oct/2011:02:15:01 -0700] "GET mysite.com/ofsR-map-us-printable-free.html HTTP/1.1" 404 24074 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

180.76.5.26 - - [18/Oct/2011:05:37:04 -0700] "GET www.mysite.com/ofsR-free-printable-birthday-templates.html HTTP/1.1" 404 24074 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

208.115.111.67 - - [18/Oct/2011:06:12:25 -0700] "GET www.mysite.com/ofsR-printable-coloring-disney-pages.html HTTP/1.1" 404 24074 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"



Here are 2 very odd request examples where neither the requested page nor the referring page exists (the first IP traces to Amsterdam; the second traces to Beijing):

91.224.247.82 - - [18/Oct/2011:05:28:56 -0700] "GET www.mysite.com/images/indext.php HTTP/1.0" 404 24074 "http://www.mysite.com/images/indext.php?u=freelance-hairstylist-powered-by-phpbb" "Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en]"

222.33.62.9 - - [18/Oct/2011:08:52:42 -0700] "GET www.mysite.com/images/indext.php HTTP/1.0" 404 24074 "http://www.mysite.com/images/indext.php?u=freelance-hairstylist-powered-by-phpbb" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 3.1)"

Does anyone know why such strange requests are being made?

Does any of this appear to be potentially threatening?

The ones from Amsterdam and China concern me, but I don't know if it is something to worry about or not. If someone could add some insight I would appreciate it.
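For anyone who wants to pick entries like these apart programmatically, here is a minimal sketch in Python. It assumes the log uses Apache's "combined" format, as the samples above appear to. The names LOG_PATTERN and parse_line are made up for illustration, and the regex is only as strict as the sample lines require, so treat it as a starting point rather than a complete parser.

```python
import re

# Hypothetical pattern for one Apache "combined" log entry.
# Assumption: fields appear in the same order as the samples in this thread.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        '157.55.18.23 - - [18/Oct/2011:02:15:01 -0700] '
        '"GET mysite.com/ofsR-map-us-printable-free.html HTTP/1.1" '
        '404 24074 "-" '
        '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
    )
    record = parse_line(sample)
    # Show the fields most useful when judging a suspicious request.
    for key in ("ip", "status", "target", "referer", "agent"):
        print(key, "=", record[key])
```

Having the status, requested target, referer, and user agent as separate fields makes it easier to see at a glance which requests share a pattern (here, the "ofsR-" prefix) and which clients sent them.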

lucy24

10:50 pm on Oct 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Skim through the last page or so of threads in this forum. Requests for nonexistent urls-- generally from search engines, sometimes even from humans-- are one of the most commonly reported problems.

The dual questions are:

Can I stop them, and if so, how?
How do I deal with the ones coming in?

Replies given in those other threads will give you some ideas about what approach to take.

tangor

10:58 pm on Oct 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Over the years, requests for nonexistent URLs like these have often turned out to be test attempts to see how a server or site handles a 404. Since the URIs are generated and EXPECTED TO FAIL, attempting to block them is problematic.

lucy24

1:36 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some are genuine mistakes-- for a given definition of "genuine". (The alternative word "stupid" comes to mind.) G### alone is probably responsible for half of them.

Pfui

5:43 pm on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hundreds of matching requests from multiple engines don't strike me as 404 testing.

1.) Have you had your IP address for a long time?

The engines may be traversing links to a site that existed elsewhere and exposed its raw address. In any event, the solution is what your server's doing: 404

2.) Staggering numbers of apparent 'visitors' are really exploit probes coming from compromised machines/IPs/ISPs.

For example, BOTH the Amsterdam IP address [projecthoneypot.org...] AND the Beijing (not Baidu) IP address [projecthoneypot.org...] are actively compromised. 404 is again correct. I'd also deny IPs based on how much of a cesspool Project Honey Pot shows their ISPs to be.

3.) In the case of Google, you can get an idea of where bad links are coming from via Google Webmaster Tools (GWT).

After signing up, look around the crawler error report(s) and you'll find the site(s) where G found the links. Note: Hover your cursor over any link before clicking and make sure it's on G; some of the links may go off-site to some very bad areas. Proper response? 404.

Bottom Line:

You can get paranoid about this stuff but it's not worth it because simply being connected to the Internet means your machines -- your site server, your home computer(s), etc. -- are actively targeted by bad guys and buffoons 24/7. That said, vigilance is important.

Eyeballing your logs, checking GWT for large numbers of anomalies, running "ps awx" (if you use the command line), and staying up to date with software patches (like PHP): these are among the basic steps to take and keep taking at least weekly, more often if you suspect trouble (which will never end, btw). Then get back to work making great content :)
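If eyeballing the raw log gets tedious, a small tally can help. The following is a rough sketch in Python, assuming Apache "combined" format entries like the ones quoted at the top of this thread. The names FIELDS and count_404s, and the "access.log" path in the usage note, are hypothetical. It only counts 404 responses per client IP and user agent, and it does not decide whether any client is malicious.

```python
import re
from collections import Counter

# Hypothetical minimal pattern: client IP, status code, and user agent only.
# Assumption: each entry ends with a quoted referer and a quoted user agent.
FIELDS = re.compile(
    r'^(?P<ip>\S+) .*?" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def count_404s(lines):
    """Count 404 responses per (ip, agent) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = FIELDS.match(line.strip())
        if match and match.group("status") == "404":
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


def report(counts, top=10):
    """Format the busiest 404 sources, one per line, for a quick look."""
    return [
        f"{hits:6d}  {ip}  {agent}"
        for (ip, agent), hits in counts.most_common(top)
    ]
```

One possible way to use it, assuming the log lives at a path such as access.log: open the file, pass the file object to count_404s, and print each line returned by report. Clients that sit at the top of the list week after week are the ones worth looking up before deciding whether to deny them.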