|Which of the smaller search engines are OK?|
Looking to broaden, rather than deepen, exposure
The websites I am looking after are well visited by the spiders of Google, MSN/Bing, Yandex, and Duck (Baidu's spiders come way too often, and half the time in sneaky ways, so they have been excluded for bad behaviour - that was an easy decision, since China is not a target area). Oh, some bad behaviour on the part of Google and MSN had to be contained as well, but no big deal...
All other boths are presently asked to stay out.
I am interested in learning more about "small" search engines that are known to deliver decent quality even though their market share may be slim. It seems easy enough to explicitly allow the spiders of such outfits on the sites and in some cases perhaps reach a small (but perhaps more discerning?) audience.
Which search engines could you recommend? Which would you advise against and why?
I am already considering Teoma, and have heard a bit about Scrubby and Sistrix that seemed OK, but don't know yet whether they are useful and "safe".
Advice of any sort would be most welcome :)
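A minimal sketch of the "named spiders in, everyone else out" setup described above, checked with Python's standard urllib.robotparser. The robots.txt content and the user-agent tokens in it are assumptions for illustration only; each engine documents its own token, and robots.txt merely asks spiders to stay out, it does not enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the listed spiders may crawl everything,
# all other spiders are asked to stay out. Tokens are illustrative.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: bingbot
User-agent: Yandex
User-agent: DuckDuckBot
User-agent: SeznamBot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent_token: str, path: str = "/") -> bool:
    """Return True if the rules above let this spider fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent_token, path)


if __name__ == "__main__":
    for token in ("SeznamBot", "DuckDuckBot", "Baiduspider", "SomeOtherBot"):
        print(token, "allowed" if allowed(token) else "asked to stay out")
```

The check is queried with the bare spider token rather than a full UA string. Opening the door to one more engine is then a single extra User-agent line in the first group.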
Has to be DuckDuckGo.
Sorry, I wasn't clear enough there: that's what I meant by "Duck" - it's already visiting the sites.
Also, "all other boths" is supposed to read "all other spiders".
(sorry about the brain fart)
There are two different questions: Alternative (i.e. not G###) search engines, and non-US search engines. Is your audience global or regional?
The one that comes to mind is Seznam (Czech, I think). Crawls all the time, and behaves itself. Yeti/Naver is the big Korean name. There's a Chinese Baidu and a Japanese one. I block by IP; the Japanese one seems to behave itself. They may really share results, but it's the principle of the thing. Somewhere along the line I put Exabot on the Ignore list. I think they're French.
Really, if you just look at the robots that ask for and obey robots.txt, you've already cut out 90% of them.
Edit: If you're looking at robots in general, rather than specifically search engines, pay attention every time someone recommends a service such as link analysis, or anything with "SEO" or "profile" in its description. At the other end of that service is a robot that crawls other people's sites. The service is only as good as its robot.
B, G, and Y are what... 99% of all search? After that is all the rest. Know your intended audience and go from there. Unless these smaller SE have some kind of advertising that can be tapped into, there might not be much reason (from a money point of view).
In all cases it is up to the individual webmaster to make those decisions.
[edited by: iomfan at 11:14 pm (utc) on Jan 31, 2014]
Lucy24, thanks for the comments!
|Is your audience global or regional? |
For the sites in question global of sorts, but specific to the 4 languages English, German, Japanese, and - to a limited extent - Chinese as used outside of China.
A situation report:
The coverage for English and German so far: Google, Yahoo, Bing, Yandex, Duck; thus also metacrawlers like Ixquick (coverage details in the next post).
I have all the independent Japanese spiders covered - therefore omitted mentioning them. :)
From what I've read, net users in Germany use predominantly Google, Yahoo, Bing.
People in Taiwan, Hong Kong, Macau, Singapore are said to do the same, in addition to Chinese search engines (the latter are at present all excluded / Baidu will definitely remain excluded since it does not behave; even sends attack probes).
|The one that comes to mind is Seznam (Czech, I think). Crawls all the time, and behaves itself. |
Will look into that... German and English should go well with that one...
|Yeti/Naver is the big Korean name. |
Right - they have always had access permission and have been visiting the Japanese version of the site regularly until they suspended service in Japan last fall. Korea is not really a target, although there are potential customers that can use Japanese, English, German, or Chinese. ;)
|There's a Chinese Baidu and a Japanese one. |
Baidu competes with the NSA for first place on my list of "no thanks" entities... ;) And they carry information about my site anyway, even though they have no access. :)
Thanks also to tangor!
|B, G, and Y are what... 99% of all search? |
Yes, that's where we all start... :)
|Unless these smaller SE have some kind of advertising that can be tapped into, there might not be much reason (from a money point of view). |
Not investing much here: I am just working on positioning myself for the time when G is just one of several players, and all I am going to do is open robots.txt to those smaller ones that are reported to be well-behaved and of decent quality of their results.
Here is an illustration of the currently achievable result quality (using English search engines for the English language version of the site in question): looking for a service in a strongly contested market that can be described succinctly by two keywords that usually occur in very close proximity to each other, namely "guesthouse" and "[location name]", we get the following results:
Bing 10th place - main page (one competitor on 1st place, another competitor successively on places 2-9, so in human terms we come in on place 3)
DuckDuck 17th place - main page
Yahoo (US) 26th place - main page
Ixquick 28th place - main page
Google seems "out to lunch" for the time being - search results are frequent repetitions of the same sites plus a lot of derived contents. Right now:
Google (US) 46th place (sort of, since this is only a minor page of the wanted site that is no longer accessible to the spider and carries the keywords not closely placed together; also on 121st place some other site that makes reference to the wanted site)
|Not investing much here: I am just working on positioning myself for the time when G is just one of several players, and all I am going to do is open robots.txt to those smaller ones that are reported to be well-behaved and of decent quality of their results. |
Good luck... B, G, and Y will be in the top percent for a decade or more.
That does not mean one should not court the smaller SE! Not by a long shot. Just depends on the desired audience... and the SE's in use there.
|B, G, and Y will be in the top percent for a decade or more. |
I wouldn't be surprised... although 10 years is a long, long time in terms of "the internet" :)
|That does not mean one should not court the smaller SE! Not by a long shot. Just depends on the desired audience... and the SE's in use there. |
Am thinking that there are people (perhaps a growing number) who are looking for alternatives to G - I figure why not appeal to that audience, if all it takes is opening some doors via robots.txt.
Regarding the Czech search engine Seznam: they are said to have close to 2/3 of the Czech market, so definitely worth thinking about if that country is part of one's market. Interestingly, they appear to charge 20 Euro for a URL submission - maybe that's their way of keeping their database reasonably clean...(?)
I think you need to look at it from a usage point of view. If you take the total number of daily searches on Google and compare them to the total number of other search engines combined, Google is still miles ahead.
Because of this you need to work out how much time it is worth spending trying to get other search engines to index you and possibly send you a few users.
For every hour you spend pleasing Google spend a few mins for the rest.
20 Euros? Really? I wonder who's been paying them to crawl my site all this time :)
|If you take the total number of daily searches on Google and compare them to the total number of other search engines combined, Google is still miles ahead. |
No question about that!
I am not sure, however, how useful that information is for me in relation to the site I have been talking about, since nobody has ever arrived at it via Google (the site is indexed properly in G and is visited often enough by their spider, and it can be found promptly if one uses the name of the establishment to search for it, since that name is unique). Most visitors are coming via links from other sites, a few via Duck or Bing...
|For every hour you spend pleasing Google spend a few mins for the rest. |
After reading the article "Disavow & Link Removal: Understanding Google" that martinibuster refers to in the thread at [webmasterworld.com...] I have the feeling that I won't bother trying to please Google, because I am not in their league. I will, however, continue to work on getting links from sites that make sense to me. ;)
(For the fun of it, I tried to check the backlinks for the site in question and noticed that some links I know of did not show up in the result list, but of those that did show up, the three most productive links (wikitravel, facebook, lonelyplanet) are set to "nofollow", while all except one of the other links are from personal websites, without "nofollow".)
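For anyone who wants to repeat that nofollow check by hand, here is a rough sketch using only Python's standard html.parser. It assumes you already have the HTML of the linking page saved locally; the domain "example.info" and the sample markup are placeholders, not the real site.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect links pointing at a target domain and note their rel value."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        if self.target in href:
            # rel may hold several space-separated tokens, e.g. "nofollow noopener"
            rel_tokens = (attr.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel_tokens))


def find_links(html: str, target: str):
    """Return (href, is_nofollow) for every link to the target found in html."""
    parser = LinkRelParser(target)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    # Placeholder markup standing in for a saved copy of a linking page.
    page = '<a rel="nofollow" href="http://example.info/">guesthouse</a>'
    for href, nofollow in find_links(page, "example.info"):
        print(href, "nofollow" if nofollow else "followed")
```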
Today's search engine check (same keywords as yesterday) shows only one significant change (on Bing the site moved from 9th to 26th place):
DuckDuck 17th place - main page
Yahoo (US) 19th place - main page
Ixquick 28th place - main page
Bing 26th place - main page
Google 48th place (minor page) / 112th place (mention) / 127th place (scraper)
Will have another look at Seznam - my Czech is limited, so it's quite likely I misunderstood something about the fee there. ;)
Follow-up: the 20Euro thingy turned out to be some Arabic web service with a Czech language page that was peddling a submission to Seznam (thanks but no thanks).
A more fundamental problem: I can find neither a link for URL submission nor any hint in that direction, whether searching with Seznam, Google, or others. Considering that the English meaning of "seznam" is "list", pinpointing the search is not an easy task. ;) Guess I should look for a helpful Czech...
I'm puzzled. How on earth have you prevented Seznam from crawling your pages? I certainly never took any action; they just showed up. All this is assuming you've got an ARIN domain. Apparently RIPE domain names are less public, so people may not know you exist unless someone links to you.
:: detour to own logs, including archives ::
It looks as if they discovered my site in October of 2011. (My archives go back to spring 2011, and the name crops up in WebmasterWorld posts going all the way back to 2000.)
They seem to crawl by fits and starts. In the month of January for example I see them on 23-26, 28 and 30. They look like this:
18.104.22.168 - - [30/Jan/2014:09:08:10 -0800] "GET /robots.txt HTTP/1.1" 200 623 "-" "SeznamBot/3.0 (+http://fulltext.sblog.cz/)"
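A rough Python sketch for pulling those crawl days out of an access log, assuming the log is in the combined format shown in the line above. The function name crawl_days and the sample lines (with documentation-range addresses 192.0.2.x) are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined log format: host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def crawl_days(lines, bot_token):
    """Count requests per calendar day for agents containing bot_token."""
    days = defaultdict(int)
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or bot_token.lower() not in m.group("agent").lower():
            continue
        # Timestamps look like 30/Jan/2014:09:08:10 -0800 (English month names)
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        days[ts.date()] += 1
    return dict(sorted(days.items()))


if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [30/Jan/2014:09:08:10 -0800] "GET /robots.txt HTTP/1.1" '
        '200 623 "-" "SeznamBot/3.0 (+http://fulltext.sblog.cz/)"',
    ]
    for day, hits in crawl_days(sample, "SeznamBot").items():
        print(day.isoformat(), hits)
```

Matching on the UA string alone trusts whatever the visitor claims to be, so treat the counts as a first look rather than proof.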
My own Czech is nonexistent, but they seem to be rolling out a modified UA:
|Nový User-Agent string, který se bude používat od února 2014: |
User-Agent: Mozilla/5.0 (compatible; SeznamBot/3.2; +http://fulltext.sblog.cz/)
(Uh... "nový" can't really mean anything but "new" can it?)
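One practical consequence, sketched in Python: any log filter or rule that expects the UA to start with "SeznamBot" would miss the new string, since the token moves inside the parentheses. A pattern that searches anywhere in the string catches both forms. The helper name seznam_version is hypothetical.

```python
import re

# Search anywhere in the UA string, not just at the start.
SEZNAM_RE = re.compile(r"SeznamBot/(\d+)\.(\d+)")


def seznam_version(user_agent: str):
    """Return (major, minor) if the UA names SeznamBot, else None."""
    m = SEZNAM_RE.search(user_agent)
    return (int(m.group(1)), int(m.group(2))) if m else None


if __name__ == "__main__":
    old_ua = "SeznamBot/3.0 (+http://fulltext.sblog.cz/)"
    new_ua = "Mozilla/5.0 (compatible; SeznamBot/3.2; +http://fulltext.sblog.cz/)"
    print(seznam_version(old_ua), seznam_version(new_ua))
```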
|How on earth have you prevented Seznam from crawling your pages? |
I don't think I have prevented it from visiting. ;) The domain in question is rather new (from last summer), and I have all the logs since the beginning to know that it just hasn't come by yet.
|All this is assuming you've got an ARIN domain. |
The TLD is INFO, and most of the domains linking to it are JP and TW (Google, Yandex, Duck, Bing, and search engines in China, Korea, and Japan apparently had no trouble finding it quickly, so I thought other spiders, especially European ones, would appear sooner or later, too, but instead we got UA-based log spammers (LOL)).
Talking about links: the non-Asian links that I consider most useful in the context of this discussion (Wikitravel, Facebook, Lonely Planet, and a few thematically related forums in English and German) have set external links to "nofollow". (Those links do bring in human visitors, so I am not complaining. :))
And thanks for the added info!