
Forum Moderators: Ocean10000 & incrediBILL


MojeekBot

     

wilderness

2:45 am on Apr 24, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Just a heads up.

Three 2005 references in the archives.
IP a UK server farm.

195.74.55.164 - - [24/Apr/2012:02:21:24 +0100] "GET /robots.txt HTTP/1.1" 200 2627 "-" "Mozilla/5.0 (compatible; MojeekBot/0.2; http://www.mojeek.com/bot.html)"
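For anyone wanting to pull the same fields out of their own logs, here is a minimal Python sketch. It assumes the Apache combined log format seen in the line above; the names LOG_PATTERN and parse_line are made up for illustration and do not come from any library.

```python
import re

# Combined log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

# The log line quoted in the post above.
line = ('195.74.55.164 - - [24/Apr/2012:02:21:24 +0100] '
        '"GET /robots.txt HTTP/1.1" 200 2627 "-" '
        '"Mozilla/5.0 (compatible; MojeekBot/0.2; http://www.mojeek.com/bot.html)"')

entry = parse_line(line)
print(entry["ip"], entry["request"], entry["agent"])
```

Filtering on "MojeekBot" in entry["agent"] would then list every visit from this user agent, whatever IP it came from.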

[edited by: incrediBILL at 4:18 am (utc) on Apr 24, 2012]
[edit reason] de-linked user agent [/edit]

keyplyr

4:41 am on Apr 24, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





Thanks Don

lucy24

7:10 am on Apr 24, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



:: shuffling papers ::

195.74.55.164, yup. That silly name must have made a huge impression on my memory-- I use the term loosely-- because I only find one visit in the past year. (Logs on HD where Spotlight can paw through them.) robots.txt, front page, yawn. Did they do anything nasty to you? Er, to your site.

keyplyr

8:56 am on Apr 24, 2012 (gmt 0)





IP a UK server farm

and a colo - 'nuff said

wilderness

12:19 pm on Apr 24, 2012 (gmt 0)




lucy,
RewriteCond %{REMOTE_ADDR} ^19[013-6]\. [OR]
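The line above is only a fragment: the trailing [OR] expects a further RewriteCond and a RewriteRule, neither of which is shown. To see which addresses the pattern itself catches, here is a small Python sketch. It assumes Python's re module treats this particular pattern the same way Apache's regex engine does, and is_blocked is a hypothetical helper name.

```python
import re

# Same pattern as in the RewriteCond: "19", then one of 0, 1, 3, 4, 5, 6, then a literal dot.
PATTERN = re.compile(r"^19[013-6]\.")

def is_blocked(ip):
    """True if the dotted-quad string would match the RewriteCond pattern."""
    return PATTERN.search(ip) is not None

# List every first octet the pattern matches.
matched_octets = [n for n in range(256) if is_blocked(f"{n}.0.0.1")]
print(matched_octets)  # [190, 191, 193, 194, 195, 196]

print(is_blocked("195.74.55.164"))  # True
print(is_blocked("192.168.0.1"))    # False
```

So the condition covers six whole first-octet blocks, skips 192, and takes in the MojeekBot IP along with everything else in 195.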

mojeek

5:11 pm on Apr 30, 2012 (gmt 0)



Hi, I'm the developer of Mojeek. Can I ask what's wrong with colocation? Also, did our bot disobey your robots.txt or do anything to suggest it is anything other than a genuine SE bot?

Also, as we already provide a fairly comprehensive bot page, is there more we could add to it that would have persuaded you to give us a chance?

Thanks.

Marc

wilderness

5:41 pm on Apr 30, 2012 (gmt 0)




can I ask what's wrong with colocation


Neither co-location nor server farms (shared hosting, VPNs and other similar website hosting) offer valid visitors; rather, they provide a web host and/or its server harvesting pages.

is there more we could add to it that would have persuaded you to give us a chance


Not from me.
I don't allow any of Europe into my sites, except by special request/custom from widget contacts/references.

I'm sure somebody else will come along and provide an explanation more beneficial to you.

mojeek

6:01 pm on Apr 30, 2012 (gmt 0)



OK, thanks for the reply. It's not shared hosting etc., though: we have our own racks, and apart from having our own datacentre I'm not sure what other options there are.

wilderness

6:19 pm on Apr 30, 2012 (gmt 0)




There are some very, very long threads here on Amazon, which you may find insightful.

IMO, a server is a server. It doesn't matter to me if it's a colo, shared, VPN or even a commercial internet provider who offers subnet ranges for hosting customers.
NONE of them offer valid visitors to my websites.

dstiles

7:17 pm on Apr 30, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Mojeek - In general I'll go along with wilderness on that. Only known genuine bots are allowed access to our sites from any kind of server.

It makes no difference whether the server range is rated good or bad. I even block all of the server IPs at the server farm where my own servers are leased from. :)

The simplest criterion is: does this bot benefit my site? Very few do, and I have to say I hadn't been aware of mojeek until this thread, although I may have blocked the IP range based on an unknown (eg your) bot.

In the spirit of looking for alternative UK SEs (I reside in UK) I have unblocked the IP 195.74.55.164 and added MojeekBot to my Allowed list. Let's see how it goes. :)

And incidentally, you may want to check the WebmasterWorld forum UK & Ireland Search Engines at [webmasterworld.com...] - a thread on a new UK SE has just begun there.

mojeek

8:14 pm on Apr 30, 2012 (gmt 0)



I fully understand the problem as I have my fair share of rogue bots and scrapers, including so-called "reputable search engines" trying to avoid using proper API access or creating their own technology. I don't allow automated queries or the results to be crawled, so with nearly limitless pages it can be a big problem.

I just find it a shame when genuine new or smaller engines can so easily be publicly associated with rogue bots and are usually the first to be banned, as they're also the easiest to identify, without at least being given a chance or checked out. There's a thread on here talking about Mojeek in 2006 - [webmasterworld.com...] so we're not new and obviously not some page harvester.

With regards to genuine visitors, I suppose we'll never be able to send any if we're not allowed to index your site, or provide some results to our users that we would otherwise have liked to.

dstiles - Thanks, although we do provide thorough info on our bot including how to test it's ours. I commented on the UK se thread earlier, always interested in any engine coming out of the UK, a rare thing!

frontpage

5:58 pm on May 1, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



SecRule HTTP_User-Agent "MojeekBot" "deny,log,status:403" 
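For readers who don't run ModSecurity, the effect of the rule above can be sketched in a few lines of Python. This is an illustration only, not ModSecurity's implementation: it assumes the rule falls back to the default regex operator, so "MojeekBot" is an unanchored, case-sensitive pattern, and response_status is a hypothetical name.

```python
import re

# Unanchored, case-sensitive pattern, mirroring the string in the SecRule.
UA_PATTERN = re.compile(r"MojeekBot")

def response_status(user_agent):
    """Return 403 if the User-Agent header contains MojeekBot, otherwise 200."""
    if UA_PATTERN.search(user_agent or ""):
        return 403
    return 200

print(response_status(
    "Mozilla/5.0 (compatible; MojeekBot/0.2; http://www.mojeek.com/bot.html)"))
print(response_status("Mozilla/5.0 (Windows NT 6.1)"))
```

Unlike the REMOTE_ADDR condition earlier in the thread, this keys on the user-agent string alone, so it follows the bot across IP changes but does nothing about other traffic from the same range.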

lucy24

7:33 pm on May 1, 2012 (gmt 0)




I just find it a shame when genuine new or smaller engines can so easily be publicly associated with rogue bots and are usually the first to be banned, as they're also the easiest to identify, without at least being given a chance or checked out.

That's a whole nother thread. If all the Big Sites routinely block all robots except the Privileged Few-- some of whom, ahem, behave almost as badly as your average Ukrainian-- then all that's left for an up-and-coming search engine is the Not So Big Sites. So your search results become something like "The best of the rest". Which in some cases could be quite interesting :)

mojeek

2:16 pm on May 2, 2012 (gmt 0)



Interesting thought, but it would probably cause there to be even fewer alternatives than there are now, unless they simply backfilled with a major. Anyway, definitely going off topic now so I'll shut up, sorry about that.
 
