
Forum Moderators: Ocean10000 & keyplyr


MauiBot

     
9:37 pm on Mar 30, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891



UA: MauiBot (crawler.feedback+wc@gmail.com)
Protocol: HTTP/1.1
Robots.txt: Yes
Host: AWS
54.160.0.0 - 54.175.255.255
54.160.0.0/12
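
For anyone who wants to double-check the range arithmetic, here is a minimal Python sketch using the standard `ipaddress` module. It only confirms that the CIDR block above spans the listed start and end addresses. The helper name `in_listed_range` and the two sample addresses are made up for illustration and do not come from any log.

```python
import ipaddress

# The CIDR block exactly as posted above.
LISTED = ipaddress.ip_network("54.160.0.0/12")


def in_listed_range(addr: str) -> bool:
    """Return True if addr falls inside the listed /12 (hypothetical helper)."""
    return ipaddress.ip_address(addr) in LISTED


# First and last addresses covered by the block.
print(LISTED[0], "-", LISTED[-1])

# Sample addresses, invented for illustration only.
print(in_listed_range("54.161.10.20"))  # inside the block
print(in_listed_range("54.176.0.1"))    # just past the end of the block
```

Requests from other AWS ranges would not be caught by this one block, so a real blocklist would likely need more entries.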
10:09 am on Mar 31, 2018 (gmt 0)

New User

joined:Mar 27, 2018
posts: 12
votes: 1


Hi keyplyr, I also picked this up in my logs this morning. This one may soon warrant blocking their IP addresses/ranges too, but my blocker kicked them off anyway. I will keep monitoring these guys.

35.153.*.* - - [31/Mar/2018:09:26:51 +0200] "GET /robots.txt HTTP/1.1" 444 0 "-" "MauiBot (crawler.feedback+wc@gmail.com)" "-"PORT:80 0.000 - . "GZIP:-"
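
A log line like the one above can be picked apart with a regular expression. This is only a sketch: the pattern and field names are a guess at the layout (a combined-style log with extra trailing fields), not the poster's actual log format definition. The 444 status is, as far as I know, nginx's non-standard code for closing the connection without sending a response, which would fit "my blocker kicked them off".

```python
import re

# The sample line from the post above, with the masked IP kept as-is.
LINE = (
    '35.153.*.* - - [31/Mar/2018:09:26:51 +0200] "GET /robots.txt HTTP/1.1" '
    '444 0 "-" "MauiBot (crawler.feedback+wc@gmail.com)" "-"PORT:80 0.000 - . "GZIP:-"'
)

# Assumed layout: ip, two unused fields, [time], "request", status, size,
# "referer", "user agent". Anything after the user agent is ignored.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

match = PATTERN.match(LINE)
fields = match.groupdict() if match else {}
print(fields.get("status"), fields.get("request"), fields.get("ua"))
```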
5:45 pm on Mar 31, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15313
votes: 707


The name seemed naggingly familiar, but it took a case-insensitive search before I remembered “MAUI WAP browser”. No relation, I suppose.

And that’s why NoCase or [NC] needs to be used with extreme caution.
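
To make the case-sensitivity point concrete, here is a small Python sketch. It is not Apache configuration; `re.IGNORECASE` just stands in for what an [NC]-style match does. The second user-agent string is an assumed form of the "MAUI WAP browser" name, not a verified real-world header.

```python
import re

USER_AGENTS = [
    "MauiBot (crawler.feedback+wc@gmail.com)",
    "MAUI WAP Browser",  # assumed form of the older browser's name
]

# Loose, case-insensitive pattern: roughly what an [NC] match on "maui" does.
loose = re.compile(r"maui", re.IGNORECASE)

# Case-sensitive pattern on the full token.
strict = re.compile(r"MauiBot")

loose_hits = [ua for ua in USER_AGENTS if loose.search(ua)]
strict_hits = [ua for ua in USER_AGENTS if strict.search(ua)]

print(len(loose_hits))   # both strings match the loose pattern
print(len(strict_hits))  # only the bot matches the strict pattern
```

The loose pattern would block the unrelated browser along with the bot, which is the kind of collateral damage being warned about.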
9:15 pm on Apr 1, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15313
votes: 707


Follow-up: Based on its attested behavior in the last few days' logs, this may intend to be a compliant robot. (Loads of requests, but nothing in a roboted-out directory.) I'll see what happens after I Disallow.
9:25 pm on Apr 1, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


It requested robots.txt 60 times yesterday at one of my sites (the only site to see it). IMO this means nothing regarding compliance. We don't know what they plan to do with our files, and actually I don't care if an agent respects robots.txt or not.

My criteria for allowing remote actors to use my property is benefit. If they are not benefitting my interests in some way, they can't have access. There is too much activity on the net not to be idiocentric.

If they're not benefiting you, they're benefiting themselves or someone else.
3:59 am on Apr 4, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891



[Update]
Since I disallowed MauiBot in robots.txt, it hasn't requested other files.
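
The actual robots.txt rules were not posted, so the following is only a sketch of what such a disallow might look like and how a compliant crawler would read it, using Python's standard `urllib.robotparser`. It assumes the bot matches on the token `MauiBot`; the `/cgi-bin/` rule and the name `SomeOtherBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real rules were not posted.
ROBOTS_TXT = """\
User-agent: MauiBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

MAUIBOT_UA = "MauiBot (crawler.feedback+wc@gmail.com)"

# A compliant MauiBot would be told to stay away from every page.
maui_allowed = parser.can_fetch(MAUIBOT_UA, "/index.html")

# Other agents fall through to the catch-all entry.
other_allowed = parser.can_fetch("SomeOtherBot", "/index.html")

print(maui_allowed, other_allowed)
```

A robot that keeps fetching robots.txt but nothing else after a rule like this is behaving the way a compliant one should.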
12:39 am on Apr 6, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15313
votes: 707


My criteria for allowing remote actors to use my property is benefit.
If I come home to find that someone has been in my house, and I know this because they have vacuumed the rug, done the laundry, washed the dishes and cooked me a gourmet dinner ... they’re still housebreakers. (One of the Discworld books has a great riff on this theme. I can’t remember the nice technical term they came up with.)

I, too, have seen a whole lot of MauiBot requests for robots.txt, and nothing else since they were disallowed.

Do you suppose the MauiBot is “crawler” by a new name?
12:49 am on Apr 6, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


Say what?
5:16 am on Apr 6, 2018 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 15, 2004
posts: 81
votes: 0


Most of the requests for the past few days have been from MauiBot (crawler.feedback+wc@gmail.com). Not sure whether to ban the IP or not.
5:40 am on Apr 6, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


jehoshua - well that's the problem when the bot owner does not include a link to an info page describing who they are and what they do with our data.

Personally, I block all Amazon (AWS) IP ranges, but allow beneficial agents through. So if they don't provide info showing they are beneficial, I don't allow them.
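
The "block the range, then poke holes for beneficial agents" policy can be sketched in a few lines of Python. This is not the poster's actual setup, which was not shared. The single network comes from earlier in this thread, and `ExampleGoodBot` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Assumption: one AWS block from earlier in this thread; a real list
# would be much longer.
BLOCKED_NETWORKS = [ipaddress.ip_network("54.160.0.0/12")]

# Hypothetical allowlist of user-agent substrings judged beneficial.
ALLOWED_AGENT_TOKENS = ("ExampleGoodBot",)


def is_allowed(remote_ip: str, user_agent: str) -> bool:
    """Allow everything outside the blocked ranges; inside them, allow
    only agents whose user-agent contains an allowlisted token."""
    ip = ipaddress.ip_address(remote_ip)
    from_blocked_range = any(ip in net for net in BLOCKED_NETWORKS)
    if not from_blocked_range:
        return True
    return any(token in user_agent for token in ALLOWED_AGENT_TOKENS)


print(is_allowed("54.161.1.1", "MauiBot (crawler.feedback+wc@gmail.com)"))
print(is_allowed("54.161.1.1", "ExampleGoodBot/1.0"))
```

User-agent strings are easy to spoof, so in practice an allowlist like this is usually paired with a check on the requesting IP as well.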
6:06 am on Apr 6, 2018 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 15, 2004
posts: 81
votes: 0


jehoshua - well that's the problem when the bot owner does not include a link to an info page describing who they are and what they do with our data.


Thanks, I have disallowed that one. :)
7:51 am on Apr 6, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


Amazon (AWS) IP ranges [webmasterworld.com]
10:49 pm on Apr 17, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15313
votes: 707


More about this robot's behavior.

I disallowed them in robots.txt at the beginning of the month. Normally I reassess access controls once a month; this time I felt so sorry for them, I removed the disallow and poked the appropriate holes after 10 days. Well, they were just so polite ...

Site 1:
IP: exactly 54.234.aa.bb for all requests, beginning about 2 days before I authorized them.

Requests: top-to-bottom spidering, although earlier requests (when they were blocked but not disallowed) suggest they knew about certain interior pages already.

Crawl frequency: clumps of 3-6 requests in a single second, followed by a gap of pretty exactly 30 seconds.
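
A burst-and-gap pattern like the one described for Site 1 could be measured from log timestamps roughly as follows. This is a sketch only: the timestamps are invented to resemble the description, and `group_bursts` and `gaps_between` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def group_bursts(timestamps, max_gap=timedelta(seconds=1)):
    """Group timestamps into bursts; a new burst starts whenever the gap
    to the previous request exceeds max_gap."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


def gaps_between(bursts):
    """Seconds from the last request of one burst to the first of the next."""
    return [(b[0] - a[-1]).total_seconds() for a, b in zip(bursts, bursts[1:])]


# Made-up timestamps shaped like the pattern described above.
base = datetime(2018, 4, 15, 12, 0, 0)
sample = (
    [base] * 4
    + [base + timedelta(seconds=30)] * 3
    + [base + timedelta(seconds=60)] * 5
)

bursts = group_bursts(sample)
print([len(b) for b in bursts])  # size of each clump
print(gaps_between(bursts))      # pause between clumps, in seconds
```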

Site 2:
IP: different from Site 1, but same pattern: about 2 days before I authorized them, they settled on a single IP all the time. (I've noticed the same thing in some European search engines: for any given site, they always crawl from the identical IP.)

Requests: proceeded directly to selected interior pages, although earlier requests were only for top-level directories (linked from front page and 403 page). In other words, the exact opposite of their Site 1 behavior.

Final quirk: Site 2 is HTTPS. Requests came through on HTTP and were redirected to HTTPS, so nothing except robots.txt got a 200. To date they have not followed up the redirects; I waited before posting to see if they'd be back, but it's been almost a week.
 
