Forum Moderators: bakedjake

Mojeek gets investment, 100 new servers deployed

         

brotherhood of LAN

5:17 pm on Dec 2, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mojeek, a UK-based search engine with privacy at its heart, announces new investment and plans to grow its index to 8 billion pages while improving its algorithm. Alternatives are a good thing. The world has been a bit too Googly the past 15 years; seeing the world's information from one perspective is not as good as seeing it from more.

[blog.mojeek.com...]

topr8

5:53 pm on Dec 2, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



very commendable and SE competition can only be good for us (webmasters) - especially a british one.

however my quick thoughts would be ...

SEs seem to be obsessed with size - i understand why, it is the same reason as people judge themselves by the number of likes or whatever on social media.

however as the old saying goes it isn't all about size ... i would have thought if you could develop an algorithm, or even a manual system to spot the scraper type sites that publish millions of pages ... once found you could then just put them on a do not crawl list and drop them from your index - yes your index would be millions of pages less (per site blocked), but it would save time and resources - additionally it is a good way of weeding out at least some of the rubbish from the SERPs. maybe if you are charitable you could check back after a year to see if the site has changed its ways - but i would say a permanent ban would make more sense, after all if scrape and republish is your business model you are not going to change it anytime soon; you can't, it would cost you too much to actually write real, original content.
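
A minimal sketch of the "do not crawl list" idea, assuming the scraper hosts have already been identified by some other means (the detection step is not shown). The class name `CrawlBlocklist`, the helper `filter_frontier`, and the example hostnames are all hypothetical; this is not how any particular search engine does it.

```python
from urllib.parse import urlsplit


class CrawlBlocklist:
    """Hypothetical do-not-crawl list keyed by hostname."""

    def __init__(self):
        self._blocked_hosts = set()

    def block(self, host):
        # Store hosts in lower case so lookups are case-insensitive.
        self._blocked_hosts.add(host.lower())

    def is_allowed(self, url):
        # URLs without a hostname are treated as the empty host.
        host = (urlsplit(url).hostname or "").lower()
        return host not in self._blocked_hosts


def filter_frontier(urls, blocklist):
    """Drop queued URLs whose host is on the blocklist."""
    return [url for url in urls if blocklist.is_allowed(url)]


blocklist = CrawlBlocklist()
blocklist.block("scraper.example")
queue = [
    "https://scraper.example/page/1",
    "https://original.example/article",
]
crawlable = filter_frontier(queue, blocklist)
```

Blocking at the host level is what makes the saving large: one entry removes every page the site publishes from the crawl queue.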

brotherhood of LAN

9:12 pm on Dec 2, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



size


This is always a potential issue with scrapers and original content. Unfortunately the pure case for original content is whoever publishes it first, or in the case of crawlers, whichever version is seen first, from a particular frame of reference (in this case a search engine).

Me personally, I'd hugely advocate some kind of repository of checksums of unique content, say a 64-bit hash of a unique content span, just for a publisher to say "I made this". I'm surprised such a thing isn't already a web standard. I'm not a fan of centralisation, but having some kind of authoritative reference of unique content would be a brilliant thing. I've heard of some 'push' methods on this front to authenticate new content, but nothing that's substantially taken off.
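
As a rough illustration of the checksum idea, here is a sketch that derives a 64-bit hash from a span of text and records the first publisher to claim it. Everything here is an assumption for illustration: the normalisation rule, the choice of BLAKE2b truncated to 8 bytes, and the names `content_fingerprint` and `FingerprintRegistry`. No such web standard exists as far as this thread knows.

```python
import hashlib
import re


def content_fingerprint(text):
    """Return a 64-bit integer hash of a content span.

    Whitespace is collapsed and case is folded first (an assumed
    normalisation) so trivial reformatting does not change the hash.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    digest = hashlib.blake2b(normalized.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class FingerprintRegistry:
    """Hypothetical 'I made this' registry: first claim wins."""

    def __init__(self):
        self._claims = {}

    def claim(self, publisher, text):
        # setdefault keeps the earliest publisher for a given fingerprint.
        fingerprint = content_fingerprint(text)
        return self._claims.setdefault(fingerprint, publisher)


registry = FingerprintRegistry()
first = registry.claim("original.example", "Hello   World")
second = registry.claim("scraper.example", "hello world")
```

Both calls return `original.example`, because the second text normalises to the same span. An exact hash like this only catches verbatim copies; lightly reworded scrapes would need a similarity-based fingerprint instead.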

lucy24

10:56 pm on Dec 2, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They’re a search engine?! Who knew. I’ve been seeing their (well-behaved) robot for years, but honestly never had any idea what they’re about :)

engine

5:22 pm on Dec 4, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Good to know they are alive and well.

It's time mojeek raised its profile, and that's the biggest battle, imho.

Mark_A

1:17 pm on Dec 5, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmm, we only rank at #62 .. not impressed !

treeline

11:35 pm on Dec 5, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On some specialized searches they surface a top link, but when I search for the type of stuff I might look for normally, like widget store, city, state it's pretty ugly. Many unrelated things like "directories" of a different type of store, NSFW in another country, and completely different subjects. On one search I did turn up the very normal type of store sought at #19, but none of the similar stores in town.

Hopefully more links help.

brotherhood of LAN

10:45 am on Dec 6, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Cheers for the feedback @treeline. Would really appreciate if you (and others) could use the feedback form in the search results to point out deficiencies you think are there.

Also feel free to PM me specific searches and I'll see what's going on.

Mojeek is expanding its index and also improving the algo so we expect the general order of results to change quite a bit over the coming months. Obviously we can only rank what we can crawl but also need to rank better what we already have.

engine

11:52 am on Dec 6, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It's refreshing there's not the usual highly-optimized corporate sites in there.
I actually found new stuff!

jmccormac

4:08 am on Dec 7, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How does it deal with sites that have no inbound links?

Regards...jmcc

brotherhood of LAN

7:00 am on Dec 7, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How does it deal with sites that have no inbound links?

Mojeekbot discovers pages entirely through links, there is no seed list. A link would be needed to get crawled and indexed. So in a site's case, you'd want at least one link from an external site and your site structure to allow for the crawling of further pages.
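
To show why an inbound link matters under link-only discovery, here is a toy sketch that walks a small in-memory link graph. It is not Mojeekbot's code; the function `discover`, the graph, and the hostnames are made up, and the walk is assumed to start from pages the crawler already knows about.

```python
from collections import deque


def discover(link_graph, known_pages):
    """Breadth-first walk: return every page reachable by following links."""
    seen = set(known_pages)
    queue = deque(known_pages)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Each key links out to the pages in its list.
graph = {
    "a.example/": ["b.example/"],
    "b.example/": ["b.example/about"],
    "orphan.example/": ["a.example/"],
}
found = discover(graph, ["a.example/"])
```

Here `orphan.example/` links out but nothing links to it, so it never enters `found`, while `b.example/about` is reached through the site's own internal link once `b.example/` has one external link pointing at it.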

tangor

7:31 am on Dec 7, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's always that First Steps Stage ... as they say Maturity Takes Time ... and Experience. I'll be watching! Can never have too many choices!

superclown2

7:37 am on Dec 8, 2019 (gmt 0)



Can I humbly suggest a different name? Mojeek isn't memorable.

I do think that webmasters, particularly those of us in the UK, could do a lot to help spread the word about a new search engine. Just a statement like 'Try this <link>new UK search engine </link>that doesn't track you' on our web pages would help spread the word, and it would help all of us too if it was successful. After all it was help from the webmaster community that gave Google a foothold in the first place.

tangor

9:05 am on Dec 8, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I dunno @Superclown2 ... mojeek seems clever enough to me.

It is MORE GEEK condensed and pronounced as "spelled" without that other stuff in the way:

"mojeek"

A name is what you make it ... and any value that can be attached. Time will tell.

blend27

11:21 am on Dec 8, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@superclown2 >>>>> After all it was help from the webmaster community that gave Google a foothold in the first place

Yep, aha and bingo! >>> ads by Google. Plus that FREE log analyzer (lucy24 please help with the name) they snatched and then killed, which was on almost every site for ages.
---------------------------------------------
As far as their current index-shmindex goes... I did a search for "top keyword phrase" in my niche. You know that B&M 2 blocks down the road if you look outside the back window of your house/flat/skyscraper, the one that still has W3C and CSS1 Certified logos as a main attraction(as their webmaster might still think) on their web sites that one would look at as 'Nice Jellow Color Buttons and let me click on it' type with tables(html term) outlined in magenta.... that is all I got for the first 30.

I found that most of the site names I had forgotten about at some point were there as my competitors, with a small hint that they had only been replaced by blogs with affiliate links to ..... wait for it.... Big Fat AMAZON (love me some PRIME though).

But the index-shmindex is currently clean. Oh, and only 6 out of 30 were HTTPS, some of which were with mixed content.

[washingtonpost.com...]

mcneely

5:54 pm on Dec 8, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can't help but notice how much more HTTP's there are as opposed to how much less there are of the HTTPS's in the index ... plus, a great deal of the results/listings were for dated pages from 2007, 2009, and 2011 that supplied outdated or otherwise obsolete technical info ...

Mojeekbot discovers pages entirely through links


Our index does the same, but we come up with frightfully more HTTPS's in our index. Our similar search terms provide more up-to-date and current results/information as well.

Mojeek isn't memorable.


LOL ... much more memorable than what we have ...

brotherhood of LAN

11:19 am on Dec 9, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



out of 30 were HTTPS

Can't help but notice how much more HTTP's there are as opposed to how much less there are of the HTTPS's in the index ... plus, a great deal of the results/listings were for dated pages from 2007, 2009, and 2011 that supplied outdated or otherwise obsolete technical info ...

We'd recently identified a bug that explained why some pages were not being crawled and/or not updated. It'll take a few weeks for the index to catch up.

mack

3:31 pm on Dec 9, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Been following the Mojeek story pretty much since it was set up and it's come a long way and done way better than many would have suspected. Their search quality is very good. If only they could be discovered more by the masses.

Mack.

blend27

5:41 pm on Dec 9, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@^ All I say is that some start a with a good intent of breaking few rules, a bit of which is OK, and then 'Gowdy/Houdi/Shmoudy' happens. We want and are to learn from what had been already learned.

I once witnessed a situation where a wise old man mentioned that he would get all wet if he walked in the pouring rain with no umbrella on the way to a subway stop that afternoon... 40 yards of that. The man he was speaking to simply replied: Walk Faster.

RedBar

6:32 pm on Dec 9, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm glad this has had a mention but I can only mark it as 50/50.

Why?

Some of my specialist keyword searches were excellent, spot on and with up-to-date information from my sites, however, they also delivered results from one of my sites I closed at least two years ago.

It's clean and it's quick but not yet ready for me to leave DDG; however, I shall keep checking it out and have included it in my search bar engines.