meanpath

lucy24

12:38 am on Sep 19, 2013 (gmt 0)

Anyone know anything about meanpath dot com and/or the meanpathbot? Forums search is entirely silent :(

Its most recent crawl was from an OVH range, meaning it was a priori blocked except for robots.txt. It did ask -- but it went on to ask for the front page of my test site, which is 100% roboted-out. As in:

"Which part of

User-Agent: *
Disallow: /

did you not understand?"

I don't know if it would try to dig deeper, if permitted. Equally important, I can't tell if it's serving any good and worthy purpose. This is assuming for the sake of discussion that a legitimate new search engine is a "good and worthy" thing. ymmv on this point. But here I'm not sure of its legitimacy in the first place.
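For reference, the blanket disallow quoted above can be checked with Python's standard-library parser. This is only a sketch of what a compliant crawler should conclude; it says nothing about how meanpathbot itself parses the file, and example.com is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the post.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "meanpathbot" is the agent name from the thread; the URLs are placeholders.
front_page_allowed = parser.can_fetch("meanpathbot", "http://example.com/")
deep_page_allowed = parser.can_fetch("meanpathbot", "http://example.com/some/page.html")

print(front_page_allowed, deep_page_allowed)  # False False
```

Under this file, a request for anything other than robots.txt itself, including the front page, is off limits to every agent.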

bhukkel

5:26 am on Sep 19, 2013 (gmt 0)

On their blog I see postings like 'Twitter Bootstrap Now Powering 1% of The Web'. So it looks like a service similar to builtwith.

adamseabrook

12:24 am on Sep 20, 2013 (gmt 0)

Hi Lucy24,

I am the CEO of meanpath, Inc. meanpathbot should respect your robots.txt, so if you can email me the domain of the site it crawled without respecting your robots.txt, I can get the team to look into it: adam@meanpath.com or support@meanpath.com

One common error we see with robots.txt is that webmasters are not aware that robots.txt is read from top to bottom, with the first applicable rule followed. So if you had say a permission for all crawlers to crawl at the top and a specific one for meanpathbot saying no below it would follow the allow at the top not the disallow at the bottom.

This may not be the case here but we can easily figure out what the issue is once we do a test on your site.

not2easy

4:00 am on Sep 20, 2013 (gmt 0)

It has been crawling around a few of my sites since late June or so. Not to my benefit, they have been uninvited.

JD_Toims

5:05 pm on Sep 20, 2013 (gmt 0)

So if you had say a permission for all crawlers to crawl at the top and a specific one for meanpathbot saying no below it would follow the allow at the top not the disallow at the bottom.

Why would you not follow the most specific directive for your bot [like at least Google and Bing do] rather than simply using the first found since a "generic" directive is likely the first directive in the file?
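The "most specific group wins" behavior asked about here can be illustrated with Python's standard-library parser, which checks named user-agent groups before falling back to the * group regardless of where * appears in the file. This is a sketch using a made-up robots.txt and the placeholder host example.com; it is not meanpath's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a generic allow at the top, a bot-specific disallow below it.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: meanpathbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named group is consulted first, so the generic allow does not apply to it.
meanpath_allowed = parser.can_fetch("meanpathbot", "http://example.com/")

# "SomeOtherBot" is an invented agent with no group of its own; it falls back to *.
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/")

print(meanpath_allowed, other_allowed)  # False True
```

A parser that simply took the first group it encountered would instead let meanpathbot in, which is the behavior the question objects to.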

keyplyr

8:11 pm on Sep 20, 2013 (gmt 0)

So if you had say a permission for all crawlers to crawl at the top and a specific one for meanpathbot saying no below it would follow the allow at the top not the disallow at the bottom.

That statement just got that UA and its respective IP range blocked across the sites I manage.

JD_Toims

8:24 pm on Sep 20, 2013 (gmt 0)

Oh, yeah, I didn't mention I snap-blocked them too after I read that.

adamseabrook

9:17 pm on Sep 20, 2013 (gmt 0)

Sorry, my response had a typo in it and was not well worded. What I was trying to say was that if you have two directives specific to meanpathbot with conflicting disallow and allow statements, we take the first one found. We often find multiple conflicting directives, especially on sites which have dynamic bot blockers that add things to robots.txt on the fly. Meanpathbot should act the same as Googlebot, so if you see it acting differently let us know so we can work out why.

A "User-agent: *" allow is actually redundant, as all bots will assume they are allowed unless there is a specific rule for them.

[github.com...] Our robots code is actually open source and in use by a few crawlers.
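The two claims in this post (first conflicting rule wins, and no rule means allowed) can be sketched with Python's standard-library parser, which happens to use first-match ordering within a group. This is not the open-source code linked above; the helper name `allowed`, the sample files, and the example.com URL are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent="meanpathbot", url="http://example.com/page"):
    """Hypothetical helper: parse robots_txt and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Two conflicting rules in one meanpathbot group, in both orders.
allow_first = "User-agent: meanpathbot\nAllow: /\nDisallow: /\n"
disallow_first = "User-agent: meanpathbot\nDisallow: /\nAllow: /\n"

# An empty robots.txt: no rule applies to anyone.
no_rules = ""

results = (allowed(allow_first), allowed(disallow_first), allowed(no_rules))
print(results)  # (True, False, True)
```

With a first-match parser the outcome depends purely on line order, so a dynamic bot blocker that appends a Disallow after an existing Allow would have no effect. Other crawlers may resolve such conflicts differently, for example by preferring the longest matching path.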

lucy24

9:37 pm on Sep 20, 2013 (gmt 0)

Overlapping again! In response to keyplyr and jd:

I read yesterday's response, said WHAT THE ###, composed a long reply... and deleted it on the grounds of non-responsiveness.

My robots.txt files do happen to list * as the very last record. But that's coding style, not the robots.txt standard.

Well, it's all academic since we're talking about an OVH range.

keyplyr

6:42 am on Sep 21, 2013 (gmt 0)

Well, it's all academic since we're talking about an OVH range.

True... but I couldn't resist being dramatic :)
