Forum Moderators: phranque

Please check whether my .htaccess and robots.txt are correct


classicsads

4:07 pm on Jul 18, 2010 (gmt 0)

10+ Year Member



I'm having trouble with my website: some unknown robots are eating my bandwidth heavily (about 43,000 MB consumed in one week), and I am not familiar with writing .htaccess and robots.txt files.

Reports given below:

Unknown robot (identified by empty user agent string)
Unknown robot (identified by 'spider')
Unknown robot (identified by 'robot')
Unknown robot (identified by 'bot*')
Unknown robot (identified by 'crawl')
Unknown robot (identified by hit on 'robots.txt')


So I added the code below to my .htaccess file. Will this stop these bots and spiders from eating my bandwidth?

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Yeti/0.01 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu's [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteCond %{HTTP_USER_AGENT} ^BadBot
RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
RewriteCond %{HTTP_USER_AGENT} ^FakeUser
RewriteCond %{HTTP_USER_AGENT} ^LinksManager.com_bot
RewriteCond %{HTTP_USER_AGENT} ""
RewriteCond %{HTTP_USER_AGENT} ^Java
RewriteCond %{HTTP_USER_AGENT} ^Jakarta
RewriteCond %{HTTP_USER_AGENT} User-Agent
RewriteCond %{HTTP_USER_AGENT} compatible
RewriteCond %{HTTP_USER_AGENT} "Mozilla"
RewriteCond %{HTTP_USER_AGENT} panscient.com
RewriteCond %{HTTP_USER_AGENT} libwww
RewriteCond %{HTTP_USER_AGENT} lwp-trivial
RewriteCond %{HTTP_USER_AGENT} curl
RewriteCond %{HTTP_USER_AGENT} PHP/
RewriteCond %{HTTP_USER_AGENT} urllib
RewriteCond %{HTTP_USER_AGENT} GT::WWW
RewriteCond %{HTTP_USER_AGENT} Snoopy
RewriteCond %{HTTP_USER_AGENT} MFC_Tear_Sample
RewriteCond %{HTTP_USER_AGENT} HTTP::Lite
RewriteCond %{HTTP_USER_AGENT} PHPCrawl
RewriteCond %{HTTP_USER_AGENT} URI::Fetch
RewriteCond %{HTTP_USER_AGENT} Zend_Http_Client
RewriteCond %{HTTP_USER_AGENT} PECL::HTTP
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://.*somebadforum\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*lastexample\.com [NC]
RewriteCond %{HTTP_REFERER} ^http://.*LinksManager.com\.com [NC]
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0$ [OR]
RewriteCOnd %{QUERY_STRING} Esystem [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$ [OR]
RewriteCond %{HTTP_REFERER} ^-$
RewriteRule .* - [F]

And currently my robots.txt file details are given below:

User-agent: *
Disallow: /cgi-bin/
Disallow: /cron/
Disallow: /admin/
Disallow: /backup/
Disallow: /includes/
Disallow: /images/
Disallow: /lib/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /blog/guestbook/
Disallow: /blog/conf/
Disallow: /blog/getdetail/
Disallow: /blog/getpic/
Disallow: /blog/getthumb/
Disallow: /blog/htsrv/
Disallow: /blog/image/
Disallow: /blog/inc/
Disallow: /blog/locales/
Disallow: /blog/media/
Disallow: /blog/plugins/
Disallow: /blog/rsc/
Disallow: /blog/xmlsrv/
Disallow: /blog/feeds/
Disallow: /blog/a/
Disallow: /forum/admin/
Disallow: /forum/images/
Disallow: /forum/inc/
Disallow: /forum/uploads/
Disallow: /forum/jscripts
Disallow: /forum/cache
Disallow: /forum/captcha.php
Disallow: /forum/editpost.php
Disallow: /forum/misc.php
Disallow: /forum/modcp.php
Disallow: /forum/moderation.php
Disallow: /forum/newreply.php
Disallow: /forum/newthread.php
Disallow: /forum/online.php
Disallow: /forum/printthread.php
Disallow: /forum/private.php
Disallow: /forum/ratethread.php
Disallow: /forum/report.php
Disallow: /forum/reputation.php
Disallow: /forum/search.php
Disallow: /forum/sendthread.php
Disallow: /forum/task.php
Disallow: /forum/usercp.php
Disallow: /forum/usercp2.php
Disallow: /forum/calendar.php
Disallow: /forum/*action=emailuser*
Disallow: /forum/*action=nextnewest*
Disallow: /forum/*action=nextoldest*
Disallow: /forum/*year=*
Disallow: /forum/*action=weekview*
Disallow: /forum/*action=nextnewest*
Disallow: /forum/*action=nextoldest*
Disallow: /forum/*sort=*
Disallow: /forum/*order=*
Disallow: /forum/*mode=*
Disallow: /forum/*datecut=*

This code is now in place in my robots.txt and .htaccess files, but even after adding it I still see the same results in my AWStats report:

Unknown robot (identified by empty user agent string)
Unknown robot (identified by 'spider')
Unknown robot (identified by 'robot')
Unknown robot (identified by 'bot*')
Unknown robot (identified by 'crawl')
Unknown robot (identified by hit on 'robots.txt')


Please, somebody, help me fix this issue. Thanks in advance.

my website address is [classicsads.com...]

wilderness

6:40 pm on Jul 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Look at this old thread [webmasterworld.com] (similar to yours), review your "raw logs" and review the forum library.

jdMorgan

6:58 pm on Jul 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The reason that your new code does not block unwelcome robots is that you have omitted the [OR] flag on most of your RewriteConds. As a result, only a user-agent that contains *all* of those user-agent strings would be blocked.

Add the [OR] flag to all RewriteConds except for the last one just above the RewriteRule.
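For illustration, a corrected fragment might look like this (a minimal sketch using just a few strings from your list, not the full corrected file):

```apache
RewriteEngine on
# Every condition carries [OR] except the last one before the RewriteRule
RewriteCond %{HTTP_USER_AGENT} ^Java    [OR]
RewriteCond %{HTTP_USER_AGENT} libwww   [OR]
RewriteCond %{HTTP_USER_AGENT} curl     [OR]
RewriteCond %{HTTP_USER_AGENT} PHPCrawl
RewriteRule .* - [F]
```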

I strongly suggest that you go through that overly-long list, and delete the ones that never visit your site -- Many of those user-agent strings are long-obsolete, and there is no use testing for them any more.

Be very sure that you understand the meaning of the regular-expressions tokens that you are using. The caret character "^" is a "start anchor" and means "starts with," and the "$" token means "ends with." Therefore, if you use "^" then the user-agent must start with the string that you specify. If you use the end-anchor token "$" then the user-agent string must end with the string that you specify. If you use both anchors, then an exact match is required. If you use neither, then any user-agent that contains the specified string will be matched.

You can combine some or all of these lines using the 'local OR' operator "|". For example, the two lines

RewriteCond %{HTTP_USER_AGENT} ^abc [OR]
RewriteCond %{HTTP_USER_AGENT} ^def [OR]

can be combined into one line as

RewriteCond %{HTTP_USER_AGENT} ^(abc|def) [OR]

Be sure that you take the start and end anchors as described above into account when combining these lines!
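A concrete illustration of that warning (a sketch; "Teleport" and "libwww" are just sample strings from lists like the one above): the start anchor applies to the whole group only when it is written outside the parentheses.

```apache
# Both alternatives anchored: matches UAs *starting with* "Wget" or "curl"
RewriteCond %{HTTP_USER_AGENT} ^(Wget|curl) [OR]
# Mixed: matches a UA starting with "Teleport", or merely *containing* "libwww"
RewriteCond %{HTTP_USER_AGENT} (^Teleport|libwww)
RewriteRule .* - [F]
```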

See the resources cited in our Apache Forum Charter for more information. Do not use any code on your site that you do not completely understand!

robots.txt: Be aware that only the major search engine robots will support the "something*something" syntax. Others may treat this string as a literal, and therefore not recognize your meaning.

Again, you can get rid of many of those obsolete user-agent strings, and you can likely combine many of the ones that remain.

Jim

classicsads

8:57 pm on Jul 18, 2010 (gmt 0)

10+ Year Member



jdMorgan

First, thanks for your reply. I have now deleted all of that code, and I need to know two things.

1. What is the exact code to block bots or spiders that have no user-agent string?

An actual line from my log (it's not an EXAMPLE) is given here:

69.93.204.33 - - [18/Jul/2010:00:00:01 -0500] "GET /forum/syndication.php?limit=15 HTTP/1.0" 200 37962 "-" "-"

2. How do I find the agent name in the log? I downloaded the raw log from my cPanel and opened it in an MS Excel sheet, and there are nearly 75,000 URLs. How do I find the agent strings, or work out who is spamming?

Actual lines from my log (they are not EXAMPLES); thousands of entries look like these:

115.118.215.166 - - [18/Jul/2010:00:18:46 -0500] "GET /files/category_icon__8343651311278098738.jpg HTTP/1.1" 200 998 "http://classicsads.com/?pgid=select_city&sl_cn=50%20&key=Russia" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.9 Safari/533.4"

117.197.255.171 - - [18/Jul/2010:01:49:29 -0500] "GET /plugins/comment/style.css HTTP/1.1" 200 758 "http://classicsads.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 Firefox/3.5.11"

27.54.147.191 - - [18/Jul/2010:00:11:54 -0500] "GET /plugins/uploadify/scripts/uploadify.swf HTTP/1.1" 200 23118 "http://classicsads.com/add-listing/automotive-vehicles/auto-dealers/cars-amp-other-four-wheelers.html" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SIMBAR={5B742ACE-8F83-411B-ADBE-D4282B6E2ECF}; AskTB5.6)"

67.195.115.49 - - [18/Jul/2010:00:03:46 -0500] "GET /?pgid=home&sl_cy=3011&key=Jihlava&sl_cn=88 HTTP/1.0" 200 181268 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]

66.249.71.228 - - [18/Jul/2010:01:56:12 -0500] "GET /index.php?pgid=classifieds-search-result&main_category=Contacts&category=Girlfriends%20&%20Boyfriends%20%20&sl_cn=58&sl_cy=812 HTTP/1.1" 302 181352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

They come from different IP addresses. Please help me fix this issue, jdMorgan; I am not an expert like you.

Thank you

jdMorgan

10:00 pm on Jul 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We can help you fix your issue, but you must do the work. Our goal is to educate, not provide a free "repair service." Useful resources are available in our Apache Forum Charter -- see the link at the top of this page.

Taking one log entry, here are the names/meanings of the fields:

115.118.***.166                  Requestor IP address
- -                              Unknown / Unknown
[18/Jul/2010:00:18:46 -0500]     Request timestamp

"GET /files/category_icon__88.jpg HTTP/1.1"
HTTP request line: HTTP method, /URL-path, and protocol

200 998
Server response code (200-OK) and byte count

"http://example.com/?pgid=select_city&sl_cn=50%20&key=Russia"
HTTP referer (This is the URL of the page where the link to your site was clicked. However, this field can easily be faked.)

"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.9 Safari/533.4"
User-agent string (This can also be easily faked.)
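As for finding the agent names, a spreadsheet is not needed; the user-agent can be pulled straight out of the raw log with standard Unix tools. A sketch (the two sample lines stand in for your real cPanel raw log, and sample.log is a placeholder file name):

```shell
# Demo: count requests per user-agent string in a combined-format access log.
# The two sample lines below stand in for a real cPanel raw log file.
cat > sample.log <<'EOF'
69.93.204.33 - - [18/Jul/2010:00:00:01 -0500] "GET /forum/syndication.php?limit=15 HTTP/1.0" 200 37962 "-" "-"
115.118.215.166 - - [18/Jul/2010:00:18:46 -0500] "GET /files/icon.jpg HTTP/1.1" 200 998 "http://classicsads.com/" "Mozilla/5.0 (Windows NT 6.1) Chrome/5.0"
EOF
# In combined log format the user-agent is the 6th double-quote-delimited field.
awk -F'"' '{ua[$6]++} END {for (u in ua) printf "%7d  %s\n", ua[u], u}' sample.log |
  sort -rn
```

Run against the full log, the top of this list shows which user-agents (including the blank "-") account for the most requests.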

To deny access to a blank user-agent, you can use either of the following two RewriteConds, and the RewriteRule as above:

RewriteCond %{HTTP_USER_AGENT} =""
RewriteCond %{HTTP_USER_AGENT} ^$

The first is an exact-text match, while the second is a regular-expressions match. They are equivalent. Use only one or the other. The first is a tiny bit faster.

You may also want to look out for some malicious agents that send a literal hyphen as the UA string. In that case, use

RewriteCond %{HTTP_USER_AGENT} ^-?$

to catch both literal hyphens and blanks.

Jim

classicsads

2:03 pm on Jul 20, 2010 (gmt 0)

10+ Year Member



I added the code you suggested to block blank and malicious agents to my .htaccess file, but even after adding it I can still see the blank user agent (identified by empty user agent string) in my log. How is this possible?

This is the code I have now added to .htaccess:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F,L]

I'm very confused; even now I am still losing heavy bandwidth :(

jdMorgan

2:41 pm on Jul 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What is the server response status code, as described in my post above?

If it is a 403-Forbidden, then your server is refusing to service the blank user-agent's request, and the size of your "bandwidth loss" will depend on the size of your 403 ErrorDocument. If you use a custom 403 ErrorDocument, make sure that it is very small.
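One way to keep the 403 response tiny (a sketch; Apache's built-in 403 page is already fairly small) is to define the error text inline rather than pointing at a full HTML page:

```apache
# Inline text keeps the 403 response body to a few bytes.
# A leading double-quote marks the argument as literal text, not a URL.
ErrorDocument 403 "Forbidden"
```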

The only way to block a user-agent completely, so that its requests never reach the server and don't get logged, is with a firewall. If you don't have or don't have control of the firewall, then returning a 403 is the best you can do, and those 403 responses will appear in your logs.

Note that [L] used with [F] is redundant and unnecessary -- You can just use "RewriteRule .* - [F]"

Jim

classicsads

8:39 pm on Jul 20, 2010 (gmt 0)

10+ Year Member



69.93.204.33 - - [20/Jul/2010:05:48:57 -0500] "GET /forum/syndication.php?limit=5 HTTP/1.0" 500 - "-" "-"

69.93.204.33 - - [20/Jul/2010:06:12:16 -0500] "GET /blog/index.php?tempskin=_rss2 HTTP/1.0" 200 106784 "-" "-"

These are the log entries I got in the reports: one is a 500 and the other is a 200.

classicsads

8:36 am on Jul 21, 2010 (gmt 0)

10+ Year Member



This is the last-visit report for the robots and spiders (hits, bandwidth, last visit):

Unknown robot (identified by 'robot'): 7362+827 hits, 1.02 GB, last visit 21 Jul 2010 - 03:11

Unknown robot (identified by 'spider'): 15977+166 hits, 1.27 GB, last visit 21 Jul 2010 - 03:06

Unknown robot (identified by 'crawl'): 1883+133 hits, 301.12 MB, last visit 21 Jul 2010 - 01:15

classicsads

10:18 am on Jul 21, 2010 (gmt 0)

10+ Year Member



Additionally, I found these agent names:

"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6"

"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; Tablet PC 2.0)"

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.7) Gecko/20100713 Firefox/3.6.7"

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "Googlebot-Image/1.0"

"Mediapartners-Google" "reefLESS bot"

69.93.204.33 - - [21/Jul/2010:04:17:48 -0500] "GET /forum/syndication.php?limit=5 HTTP/1.1" 200 19424 "-" "reefLESS bot"

The timestamps of these robots and spiders match the last-visit times reported above.

If I block a particular IP address, the crawling just resumes from another IP address. So I'm confused about blocking by agent, because the user-agent strings look valid, and if I block them I myself would not be able to open my own website.

jdMorgan

6:47 am on Jul 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You got a 500-Server Error on one request, and a 200-OK on another.

What was in your server error log for that 500 error? This is important.

The 200-OK response indicates that the new code you added was not executed for the request to /blog/index.php?tempskin=_rss2

This could be the result of putting your code into an .htaccess file in the wrong directory (such as putting it in /forum/.htaccess, where it would apply only to /forum URLs). Or it could be the result of your new code being added to your root .htaccess file but placed *after* rewrites which are used to pass requests to your forum and blog scripts.
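In other words (a sketch, assuming a typical root .htaccess; the final rewrite is only a hypothetical example of an existing routing rule), the access-control rules should come before anything that hands the request off to a script:

```apache
RewriteEngine on
# 1) Access-control rules first, so they run for every request:
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
# 2) ...followed by whatever rewrites already route requests to your
#    blog and forum scripts, for example:
#    RewriteRule ^blog/(.*)$ /blog/index.php [L]
```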

Forget your 'stats' reports; they are useless for this kind of troubleshooting. Also, for now, let's just deal with the two main issues and ignore the other user-agents. The two main problems, in order of importance, are: why are you getting a 500-Server Error, and why did the /blog request succeed? Those need to be addressed first.

Please check the potential problems I described above, and tell us where your code was added and what was in your server error log.

Jim