Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
MojeekBot
wilderness




msg:4444491
 2:45 am on Apr 24, 2012 (gmt 0)

Just a heads up.

Three 2005 references in the archives.
IP a UK server farm.

195.74.55.164 - - [24/Apr/2012:02:21:24 +0100] "GET /robots.txt HTTP/1.1" 200 2627 "-" "Mozilla/5.0 (compatible; MojeekBot/0.2; http://www.mojeek.com/bot.html)"
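For anyone searching their own logs for the same visitor, here is a minimal sketch of pulling the fields out of a line like the one above. It assumes the Apache combined log format; the names `LOG_PATTERN` and `parse_log_line` are illustrative, not from any particular tool.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

line = ('195.74.55.164 - - [24/Apr/2012:02:21:24 +0100] '
        '"GET /robots.txt HTTP/1.1" 200 2627 "-" '
        '"Mozilla/5.0 (compatible; MojeekBot/0.2; http://www.mojeek.com/bot.html)"')

entry = parse_log_line(line)
print(entry["ip"], entry["path"], entry["agent"])
```

Filtering on `"MojeekBot" in entry["agent"]` would then pick out this bot's requests, bearing in mind that a user-agent string on its own is easy to fake.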

[edited by: incrediBILL at 4:18 am (utc) on Apr 24, 2012]
[edit reason] de-linked user agent [/edit]

 

keyplyr




msg:4444510
 4:41 am on Apr 24, 2012 (gmt 0)



Thanks Don

lucy24




msg:4444549
 7:10 am on Apr 24, 2012 (gmt 0)

:: shuffling papers ::

195.74.55.164, yup. That silly name must have made a huge impression on my memory-- I use the term loosely-- because I only find one visit in the past year. (Logs on HD where Spotlight can paw through them.) robots.txt, front page, yawn. Did they do anything nasty to you? Er, to your site.

keyplyr




msg:4444584
 8:56 am on Apr 24, 2012 (gmt 0)


IP a UK server farm

and a colo - 'nuff said

wilderness




msg:4444627
 12:19 pm on Apr 24, 2012 (gmt 0)

lucy,
RewriteCond %{REMOTE_ADDR} ^19[013-6]\. [OR]
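To see what that condition covers, here is a small sketch that applies the same pattern with Python's `re` module (an assumption: for a pattern this simple, Python and Apache's regex engine behave the same). The helper name `first_octets_matched` is illustrative only.

```python
import re

# Same pattern as the RewriteCond above, anchored at the start of the address.
PATTERN = re.compile(r'^19[013-6]\.')

def first_octets_matched():
    """List every first octet (0-255) whose addresses the pattern would match."""
    return [n for n in range(256) if PATTERN.match(f"{n}.0.0.1")]

matched = first_octets_matched()
print(matched)  # [190, 191, 193, 194, 195, 196]
print(bool(PATTERN.match("195.74.55.164")))  # True
```

So the line matches six whole /8 blocks, including the one the MojeekBot request came from, while leaving 192.x and 197.x to 199.x alone. The trailing `[OR]` means further conditions follow in the original ruleset; those are not shown in the post.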

mojeek




msg:4447587
 5:11 pm on Apr 30, 2012 (gmt 0)

Hi, I'm the developer of Mojeek. Can I ask what's wrong with colocation? Also, did our bot disobey your robots.txt or do anything that would suggest it is anything other than a genuine SE bot?

Also, as we already provide a fairly comprehensive bot page, is there more we could add to it that would have persuaded you to give us a chance?

Thanks.

Marc

wilderness




msg:4447603
 5:41 pm on Apr 30, 2012 (gmt 0)

can I ask what's wrong with colocation


Neither co-location nor server farms (shared hosting, VPNs and other similar website hosting) offer valid visitors; rather, they provide a web host and/or its servers harvesting pages.

is there more we could add to it that would have persuaded you to give us a chance


Not from me.
I don't allow all of Europe into my sites. Except by special request/custom from widget contacts/references.

I'm sure somebody else will come along and provide an explanation more beneficial to you.

mojeek




msg:4447606
 6:01 pm on Apr 30, 2012 (gmt 0)

OK, thanks for the reply. It's not shared hosting etc., though: we have our own racks, and apart from having our own datacentre I'm not sure what other options there are.

wilderness




msg:4447613
 6:19 pm on Apr 30, 2012 (gmt 0)

There are some very, very long threads here on Amazon, which you may find insightful.

IMO, a server is a server. It doesn't matter to me if it's a colo, shared, VPN or even a commercial internet provider who offers subnet ranges for hosting customers.
NONE of them offer valid visitors to my websites.

dstiles




msg:4447642
 7:17 pm on Apr 30, 2012 (gmt 0)

Mojeek - In general I'll go along with wilderness on that. Only known genuine bots are allowed access to our sites from any kind of server.

It makes no difference whether the server range is rated good or bad. I even block all of the server IPs at the server farm where my own servers are leased from. :)

The simplest criterion is: does this bot benefit my site? Very few do, and I have to say I hadn't been aware of mojeek until this thread, although I may have blocked the IP range based on an unknown (eg your) bot.

In the spirit of looking for alternative UK SEs (I reside in UK) I have unblocked the IP 195.74.55.164 and added MojeekBot to my Allowed list. Let's see how it goes. :)

And incidentally, you may want to check the WebmasterWorld forum UK & Ireland Search Engines at [webmasterworld.com...] - a thread on a new UK SE has just begun there.

mojeek




msg:4447665
 8:14 pm on Apr 30, 2012 (gmt 0)

I fully understand the problem as I have my fair share of rogue bots and scrapers, including so-called "reputable search engines" trying to avoid using proper api access or creating their own technology. I don't allow automated queries or the results to be crawled, so with nearly limitless pages it can be a big problem.

I just find it a shame when a genuine new or smaller engine can so easily be publicly associated with rogue bots and is usually the first to be banned, as it's also the easiest to identify, without at least being given a chance or checked out. There's a thread on here talking about Mojeek in 2006 - [webmasterworld.com...] so we're not new and obviously not some page harvester.

With regard to genuine visitors, I suppose we'll never be able to send any if we're not allowed to index your site, or to provide some results to our users that we would otherwise have liked to.

dstiles - Thanks, although we do provide thorough info on our bot including how to test it's ours. I commented on the UK se thread earlier, always interested in any engine coming out of the UK, a rare thing!
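The bot page's actual test isn't quoted in this thread. One common way to check that a crawler is who it claims to be is forward-confirmed reverse DNS: look up the hostname for the IP, check it sits under the operator's domain, then resolve that hostname back and confirm it returns the same IP. The sketch below assumes that approach; `is_verified_bot` and the `expected_suffix` value are hypothetical, and the lookup functions are passed in so the logic can be exercised without touching the network.

```python
import socket

def is_verified_bot(ip, expected_suffix,
                    reverse_lookup=socket.gethostbyaddr,
                    forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    A generic technique, not necessarily the procedure any particular
    search engine documents. expected_suffix is an assumed value such
    as ".example.com".
    """
    try:
        hostname = reverse_lookup(ip)[0]         # PTR lookup: IP -> hostname
    except OSError:
        return False                             # no reverse record or lookup failed
    if not hostname.lower().rstrip(".").endswith(expected_suffix):
        return False                             # hostname is outside the claimed domain
    try:
        addresses = forward_lookup(hostname)[2]  # hostname -> list of IPv4 addresses
    except OSError:
        return False
    return ip in addresses                       # forward lookup must confirm the IP

# Hypothetical usage (performs real DNS queries):
# is_verified_bot("195.74.55.164", ".mojeek.com")
```

Matching on the user-agent string alone, by contrast, only tells you what the client says it is.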

frontpage




msg:4448080
 5:58 pm on May 1, 2012 (gmt 0)

SecRule HTTP_User-Agent "MojeekBot" "deny,log,status:403"

lucy24




msg:4448120
 7:33 pm on May 1, 2012 (gmt 0)

I just find it a shame when a genuine new or smaller engine can so easily be publicly associated with rogue bots and is usually the first to be banned, as it's also the easiest to identify, without at least being given a chance or checked out.

That's a whole nother thread. If all the Big Sites routinely block all robots except the Privileged Few-- some of whom, ahem, behave almost as badly as your average Ukrainian-- then all that's left for an up-and-coming search engine is the Not So Big Sites. So your search results become something like "The best of the rest". Which in some cases could be quite interesting :)

mojeek




msg:4448458
 2:16 pm on May 2, 2012 (gmt 0)

Interesting thought, but it would probably mean even fewer alternatives than there are now, unless they simply backfilled with a major. Anyway, definitely going off topic now so I'll shut up, sorry about that.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved