Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
What is bot[+:,\.\;\/\\-]?
ichthyous

10+ Year Member



 
Msg#: 4440133 posted 6:08 pm on Apr 12, 2012 (gmt 0)

I haven't seen any reference to this on WebmasterWorld or in Google. It also appears as [+:,\.\;\/\\-]bot in my logs. It's requesting hundreds of MB of pages from my site. I am also seeing many other unidentified bots at the same time:

    Unknown robot (identified by 'spider') 5,891+302 192.14 MB 12 Apr 2012 - 13:47
    bot[+:,\.\;\/\\-] 4,800+352 196.00 MB 10 Apr 2012 - 15:42
    Unknown robot (identified by 'crawl') 3,556+109 149.11 MB 12 Apr 2012 - 13:30
    Unknown robot (identified by empty user agent string) 1,655+7 54.83 MB 12 Apr 2012 - 12:52
    Unknown robot (identified by 'bot*') 1,272+106 48.68 MB 12 Apr 2012 - 13:49
    [+:,\.\;\/\\-]bot 1,215+96 34.98 MB 10 Apr 2012 - 15:21
    Unknown robot (identified by hit on 'robots.txt') 0+809 1.60 MB 12 Apr 2012 - 12:53
    Unknown robot (identified by 'robot') 731+37 30.19 MB 12 Apr 2012 - 10:16
    Unknown robot (identified by '*bot') 109+16 4.25 MB 12 Apr 2012 - 10:58
    Unknown robot (identified by 'discovery') 37 634.57 KB 12

Are these spambots or valid bots, and how can I stop them? Thanks for any help.

 

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4440133 posted 12:23 am on Apr 13, 2012 (gmt 0)

I've not seen it before, but what follows the word "bot" is a regular-expression character class made up of escaped special characters. To what end, though?

What's the IP of this thing, China?

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4440133 posted 12:35 am on Apr 13, 2012 (gmt 0)

FWIW, if you had provided one actual raw visitor log line, you'd have received much quicker help than from the drivel you provided from your stats software.

IP please, and/or actual raw log line!

Mokita

5+ Year Member



 
Msg#: 4440133 posted 5:59 am on Apr 16, 2012 (gmt 0)

@wilderness, some people hosted on shared servers don't have access to raw logs. Others are simply unaware of their existence, or how to download them.

@incrediBILL, There was a thread here a couple of years ago about this that I replied to, but even though I've looked, I can't seem to find it.

@ichthyous, What you are seeing is an artefact of AWStats. If you are able to read your raw log file, you would most probably find that whichever bots visited on 10 Apr 2012 - 15:42 and 10 Apr 2012 - 15:21 had a normal user-agent, not the regex puzzle quoted by AWStats.

In my case, all the ones I have bothered to follow up have been legitimate bots, and even include Google's AdsBot. It isn't one single bot, it's a whole bunch.

Bottom line, ignore it! ;)
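
To illustrate the point about the label being a pattern rather than a bot name, here is a minimal Python sketch. It assumes the stats package falls back to generic patterns like the two quoted in the first post and matches them case-insensitively; the helper name `generic_label` and the sample user-agent strings are made up for illustration and are not taken from anyone's raw logs.

```python
import re

# The two generic patterns exactly as they show up in the stats report.
PATTERNS = [r"bot[+:,\.\;\/\\-]", r"[+:,\.\;\/\\-]bot"]

# Assumption: the match is case-insensitive, so "Bot-" counts as "bot-".
COMPILED = [(p, re.compile(p, re.IGNORECASE)) for p in PATTERNS]


def generic_label(user_agent):
    """Return the first generic pattern that matches the user-agent, or None."""
    for source, regex in COMPILED:
        if regex.search(user_agent):
            return source
    return None


# Illustrative user-agent strings (hypothetical except for the AdsBot format).
SAMPLES = [
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "SomeSpider/1.0 (+http://example.com/botinfo)",
    "Mozilla/5.0 (Windows NT 6.1) Firefox/11.0",
]

for ua in SAMPLES:
    # Prints the matching pattern (or None) next to the user-agent it came from.
    print(generic_label(ua), "<-", ua)
```

Under these assumptions, several unrelated but perfectly ordinary user-agents end up under the same pattern-shaped label, which would explain why it looks like one mystery bot in the report.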

Andy Langton

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4440133 posted 9:36 am on Apr 16, 2012 (gmt 0)

There's an older thread here with a slightly more detailed explanation of the AWStats regex:

\wbot[\/\-] in AWStats Robots/Spiders visitors
But does not appear on raw log
[webmasterworld.com]

Mokita

5+ Year Member



 
Msg#: 4440133 posted 5:23 pm on Apr 16, 2012 (gmt 0)

Found the earlier thread here:

[webmasterworld.com...]

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4440133 posted 10:17 pm on Apr 16, 2012 (gmt 0)

Can't help but feel that
bot\W
would cover the same territory in fewer bytes. Or possibly
bot[^\w\s].
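
For anyone who wants to check how close the shorter forms come, here is a rough Python sketch comparing the original character class with the two alternatives above. The test strings are invented, and Python's `re` is used only as a stand-in; whatever engine the stats package uses may differ in details.

```python
import re

ORIGINAL = re.compile(r"bot[+:,\.\;\/\\-]")   # pattern from the stats report
SHORT_A = re.compile(r"bot\W")                # any non-word character after "bot"
SHORT_B = re.compile(r"bot[^\w\s]")           # non-word AND non-whitespace after "bot"


def matches(text):
    """Return (original, bot\\W, bot[^\\w\\s]) match results for text."""
    return tuple(bool(p.search(text)) for p in (ORIGINAL, SHORT_A, SHORT_B))


# Invented fragments showing where the three patterns agree and differ.
for text in ["bot/2.1", "bot-google", "bot (crawler)", "bot)", "bot_x", "robots"]:
    print(repr(text), matches(text))
```

In this sketch both shorter forms match everything the original does, but they also accept a little more: `bot\W` matches "bot" followed by a space or a parenthesis, and `bot[^\w\s]` matches "bot" followed by a parenthesis but not a space. So they cover the same territory plus some extra.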
