
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
MojeekBot
wilderness




 
Msg#: 4444489 posted 2:45 am on Apr 24, 2012 (gmt 0)

Just a heads up.

Three 2005 references in the archives.
IP a UK server farm.

195.74.55.164 - - [24/Apr/2012:02:21:24 +0100] "GET /robots.txt HTTP/1.1" 200 2627 "-" "Mozilla/5.0 (compatible; MojeekBot/0.2; http://www.mojeek.com/bot.html)"

[edited by: incrediBILL at 4:18 am (utc) on Apr 24, 2012]
[edit reason] de-linked user agent [/edit]

 

keyplyr




 
Msg#: 4444489 posted 4:41 am on Apr 24, 2012 (gmt 0)



Thanks Don

lucy24




 
Msg#: 4444489 posted 7:10 am on Apr 24, 2012 (gmt 0)

:: shuffling papers ::

195.74.55.164, yup. That silly name must have made a huge impression on my memory-- I use the term loosely-- because I only find one visit in the past year. (Logs on HD where Spotlight can paw through them.) robots.txt, front page, yawn. Did they do anything nasty to you? Er, to your site.

keyplyr




 
Msg#: 4444489 posted 8:56 am on Apr 24, 2012 (gmt 0)


IP a UK server farm

and a colo - 'nuff said

wilderness




 
Msg#: 4444489 posted 12:19 pm on Apr 24, 2012 (gmt 0)

lucy,
RewriteCond %{REMOTE_ADDR} ^19[013-6]\. [OR]
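
For anyone unfamiliar with the pattern: the character class `[013-6]` covers 0, 1, 3, 4, 5 and 6, so the condition matches any address whose first octet is 190, 191, 193, 194, 195 or 196 (192 is not in the class). That takes in 195.74.55.164. The trailing `[OR]` suggests the line is one condition out of a longer list. A minimal stand-alone sketch follows, assuming mod_rewrite in an .htaccess file and a blanket 403 as the action; the `RewriteRule` is an assumption, not part of the post above.

```apache
RewriteEngine On
# First octet 190, 191, 193, 194, 195 or 196 (192 is not matched)
RewriteCond %{REMOTE_ADDR} ^19[013-6]\.
# Assumed action: leave the URL untouched and answer 403 Forbidden
RewriteRule ^ - [F]
```

When several alternative conditions are chained, every line except the last carries `[OR]`; without the flag, consecutive conditions are ANDed.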

mojeek



 
Msg#: 4444489 posted 5:11 pm on Apr 30, 2012 (gmt 0)

Hi, I'm the developer of Mojeek. Can I ask what's wrong with colocation? Also, did our bot disobey your robots.txt or do anything that would suggest it is anything other than a genuine SE bot?

Also, as we already provide a fairly comprehensive bot page, is there more we could add to it that would have persuaded you to give us a chance?

Thanks.

Marc

wilderness




 
Msg#: 4444489 posted 5:41 pm on Apr 30, 2012 (gmt 0)

can I ask what's wrong with colocation


Neither co-location nor server farms (shared hosting, VPNs and other similar website hosting) offer valid visitors; rather, they provide a web host and/or its server harvesting pages.

is there more we could add to it that would have persuaded you to give us a chance


Not from me.
I block all of Europe from my sites, except by special request/custom from widget contacts/references.

I'm sure somebody else will come along and provide an explanation more beneficial to you.

mojeek



 
Msg#: 4444489 posted 6:01 pm on Apr 30, 2012 (gmt 0)

OK, thanks for the reply. It's not shared hosting etc., though: we have our own racks, and apart from having our own datacentre I'm not sure what other options there are.

wilderness




 
Msg#: 4444489 posted 6:19 pm on Apr 30, 2012 (gmt 0)

There are some very, very long threads here on Amazon, which you may find insightful.

IMO, a server is a server. It doesn't matter to me if it's a colo, shared, VPN or even a commercial internet provider who offers subnet ranges for hosting customers.
NONE of them offer valid visitors to my websites.

dstiles




 
Msg#: 4444489 posted 7:17 pm on Apr 30, 2012 (gmt 0)

Mojeek - In general I'll go along with wilderness on that. Only known genuine bots are allowed access to our sites from any kind of server.

It makes no difference whether the server range is rated good or bad. I even block all of the server IPs at the server farm my own servers are leased from. :)

The simplest criterion is: does this bot benefit my site? Very few do, and I have to say I hadn't been aware of Mojeek until this thread, although I may have blocked the IP range because of an unknown (e.g. your) bot.

In the spirit of looking for alternative UK SEs (I reside in UK) I have unblocked the IP 195.74.55.164 and added MojeekBot to my Allowed list. Let's see how it goes. :)
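
One way to express "allow this bot, but only from its known address" in mod_rewrite is sketched below. It assumes the bot keeps crawling from the single IP reported in this thread and that the rules sit in an .htaccess file. Neither assumption is confirmed by Mojeek, and the address would need updating if the crawler moves.

```apache
RewriteEngine On
# Request claims to be MojeekBot ...
RewriteCond %{HTTP_USER_AGENT} MojeekBot
# ... but does not come from the address seen in the log line earlier in the thread
RewriteCond %{REMOTE_ADDR} !^195\.74\.55\.164$
# Assumed action: treat it as an impostor and answer 403 Forbidden
RewriteRule ^ - [F]
```

The two conditions are ANDed, so a request from 195.74.55.164 with the MojeekBot user agent passes through. Any broader range block elsewhere in the file would still need its own exception for that address.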

And incidentally, you may want to check the WebmasterWorld forum UK & Ireland Search Engines at [webmasterworld.com...] - a thread on a new UK SE has just begun there.

mojeek



 
Msg#: 4444489 posted 8:14 pm on Apr 30, 2012 (gmt 0)

I fully understand the problem as I have my fair share of rogue bots and scrapers, including so-called "reputable search engines" trying to avoid using proper api access or creating their own technology. I don't allow automated queries or the results to be crawled, so with nearly limitless pages it can be a big problem.

I just find it a shame when a genuine new or smaller engine can so easily be publicly associated with rogue bots. Such engines are usually the first to be banned, as they're also the easiest to identify, without at least being given a chance or checked out. There's a thread on here talking about Mojeek in 2006 - [webmasterworld.com...] so we're not new and obviously not some page harvester.

With regards to genuine visitors, I suppose we'll never be able to send any if we're not allowed to index your site, or provide some results to our users that we would have otherwise liked to.

dstiles - Thanks, although we do provide thorough info on our bot including how to test it's ours. I commented on the UK se thread earlier, always interested in any engine coming out of the UK, a rare thing!

frontpage




 
Msg#: 4444489 posted 5:58 pm on May 1, 2012 (gmt 0)

SecRule HTTP_User-Agent "MojeekBot" "deny,log,status:403"
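
The `HTTP_User-Agent` variable name in the rule above appears to be a legacy form. In ModSecurity 2.x the same idea is usually written against `REQUEST_HEADERS:User-Agent`. A sketch, assuming version 2.7 or later (where every rule needs an `id`); the id value is an arbitrary placeholder, not something from the post:

```apache
# Deny any request whose User-Agent header contains "MojeekBot".
# id 1000001 is a placeholder; pick one that is unused in your ruleset.
SecRule REQUEST_HEADERS:User-Agent "@contains MojeekBot" \
    "id:1000001,phase:1,deny,log,status:403"
```

Running the check in phase 1 rejects the request as soon as the headers are read, before any body is processed.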

lucy24




 
Msg#: 4444489 posted 7:33 pm on May 1, 2012 (gmt 0)

I just find it a shame when a genuine new or smaller engine can so easily be publicly associated with rogue bots. Such engines are usually the first to be banned, as they're also the easiest to identify, without at least being given a chance or checked out.

That's a whole nother thread. If all the Big Sites routinely block all robots except the Privileged Few-- some of whom, ahem, behave almost as badly as your average Ukrainian-- then all that's left for an up-and-coming search engine is the Not So Big Sites. So your search results become something like "The best of the rest". Which in some cases could be quite interesting :)

mojeek



 
Msg#: 4444489 posted 2:16 pm on May 2, 2012 (gmt 0)

Interesting thought, but it would probably leave even fewer alternatives than there are now, unless they simply backfilled with a major. Anyway, definitely going off topic now so I'll shut up, sorry about that.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved