77.91.224.7 - - [22/Jan/2008:17:14:54 -0500] "GET /robots.txt HTTP/1.1" 200 5955 "-" "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html)
Always requests robots.txt, but I caught it taking disallowed files so I disallowed it. Funny thing is, once I disallowed it via robots.txt, now it obeys and goes away.
Anyone know of a reason I should allow it again?
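For anyone who wants to check a disallow rule like this before relying on it, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URLs are made up for illustration, and the "WebAlta" user-agent token is an assumption about what the crawler matches on, not something taken from its documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut WebAlta out completely, and keep
# everyone else out of one (made-up) private directory.
ROBOTS_TXT = """\
User-agent: WebAlta
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    ua = "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html)"
    print(is_allowed(ua, "http://www.example.com/page.html"))
```

This only tells you what a well-behaved crawler should do with the file. Whether a given bot actually honours it still has to be confirmed from the access log.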
Seems all they want to do is follow site scrapers around.
It's too bad really, because WebAlta is supposed to be the Russian equivalent of Google with regard to its popularity over there.
We've several sites with weblogs, and that WebAlta huckster comes in within the hour to parse and/or collect the "exact" page the scraper couldn't get the first time around.
The requests are perfectly aligned, in that it asks for the same thing the scraper did, right down to the anchor links. Weblogs are its favourite target, with movies, MP3s, and software following. If a scraper puts in a request for a file and gets denied, I can almost count, to within the minute, when WebAlta will show.
I've been watching these guys for a good long while now, and patterns are patterns, so it's asta-le-way-you-go to WebAlta.
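The pattern described above (a denied scraper request, then WebAlta asking for the same thing within the hour) can be checked against a log with something like the sketch below. It assumes the access log has already been parsed into records. The Hit type, the one-hour window and the use of 403 as the "denied" status are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed access-log record (hypothetical structure)."""
    when: datetime
    ip: str
    path: str
    status: int
    user_agent: str


def webalta_followups(hits, window=timedelta(hours=1)):
    """Pair each denied (403) non-WebAlta request with every later
    WebAlta request for the same path that arrives within `window`."""
    def is_webalta(hit):
        return "webalta" in hit.user_agent.lower()

    denied = [h for h in hits if h.status == 403 and not is_webalta(h)]
    crawler = [h for h in hits if is_webalta(h)]
    pairs = []
    for d in denied:
        for c in crawler:
            gap = c.when - d.when
            if c.path == d.path and timedelta(0) < gap <= window:
                pairs.append((d, c))
    return pairs
```

A long list of pairs with short gaps would back up the observation. A handful could just as easily be coincidence.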
Seems all they want to do is follow site scrapers around.
Have a friend who moved a widget website that had been online through three free hosts and a domain.
The website was her college thesis some ten years ago.
When the friend didn't renew their hosting, I offered a sub-folder.
The result is that the sub-folder has much, much less restrictive access than the rest of my websites.
Almost every visitor that hits the subfolder triggers multiple requests for the root (403'd), followed immediately by 2-5 entirely different IPs making identical requests.
I don't accumulate these request/denies because most are non-North American ranges which don't get into my sites anyway (with only a few exceptions).
The coincidence of these repeated patterns is simply too frequent to overlook.
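One way to put a number on "too frequent to overlook" is to flag any path that several different IPs request within a few minutes of each other. A rough sketch follows. The (when, ip, path) tuple layout, the five-minute window and the three-IP threshold are assumptions chosen for illustration, not values measured on these sites.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def echoed_requests(hits, window=timedelta(minutes=5), min_ips=3):
    """hits: iterable of (when, ip, path) tuples.

    Return {path: set of IPs} for every path that at least `min_ips`
    distinct IPs requested within `window` of one another."""
    by_path = defaultdict(list)
    for when, ip, path in hits:
        by_path[path].append((when, ip))

    flagged = {}
    for path, entries in by_path.items():
        entries.sort()  # chronological order
        for i, (start, _) in enumerate(entries):
            # Distinct IPs seen from this request up to `window` later.
            ips = {ip for when, ip in entries[i:] if when - start <= window}
            if len(ips) >= min_ips:
                flagged.setdefault(path, set()).update(ips)
    return flagged
```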
Accept: text/html;q=1.0, text/plain;q=1.0, text/*;q=0.5, */*;q=0.1
Accept-Charset: utf-8;q=1.0, windows-1251;q=0.8, cp1251;q=0.8, koi8-r;q=0.8, *;q=0.5
Accept-Encoding: gzip;q=1.0, deflate;q=1.0, identity;q=0.5, *;q=0
Host: www.some.com
User-Agent: WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)
77.91.224.16
He visited our web page [abx.de ]
on 14th May 2008 at 08:56 CEST
His IP address: 85.17.173.8 ( LeaseWeb, AMSTERDAM, Netherlands )
We showed him a new email address generated only for him.
We received the first spam for this email address on
30th May.
Andreas.
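For readers who want to try a per-visitor address trap like this, here is a minimal sketch of one way to do it. It is not necessarily how the page above generates its addresses. SECRET, DOMAIN, the "trap-" prefix and both function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical secret key, keep it private
DOMAIN = "example.com"  # hypothetical domain that accepts the trap mail


def trap_address(ip: str, visited_at: str) -> str:
    """Derive an address unique to one visit (IP plus timestamp)."""
    message = f"{ip}|{visited_at}".encode()
    digest = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"trap-{digest[:12]}@{DOMAIN}"


def record_visit(log: dict, ip: str, visited_at: str, user_agent: str) -> str:
    """Store who was shown which address, so that spam arriving later
    can be traced back to the visit that harvested it."""
    address = trap_address(ip, visited_at)
    log[address] = (ip, visited_at, user_agent)
    return address
```

When spam turns up, looking the recipient address up in the log gives the IP, time and user-agent of the visit that collected it.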
WebAlta visited our web page several times
from IP 77.91.224... (WEBALTA-NET, Russia)
He identified himself as webalta crawler/2.0.
We received no spam.
This time, he came from 85.17.173.8.
He identified himself as WebAlta Crawler/1.3.34.
We received spam for the shown email address.
This is the list of the last visits:
27th jan 2008 13:09 cet
IP: 77.91.224.5 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
04th feb 2008 13:51 cet
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
10th feb 2008 04:52 cet
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
17th feb 2008 13:55 cet
IP: 77.91.224.5 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
27th mar 2008 11:09 cet
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
07th may 2008 03:39 cest
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
14th may 2008 08:56 cest
IP: 85.17.173.8 (LeaseWeb, AMSTERDAM, Netherlands )
Browser:
webalta crawler/1.3.34
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
21st may 2008 12:01 cest
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)
Andreas
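Since the harvesting visit claimed to be WebAlta but came from a different network, it may be worth separating the two cases automatically. The sketch below treats 77.91.224.0/24 as WebAlta's own range. That is an assumption based only on the addresses reported in this thread, not a published list, and the label strings are made up.

```python
import ipaddress

# Assumption: every WEBALTA-NET address seen in this thread falls in
# this /24. The crawler's real ranges may be wider or may change.
WEBALTA_NET = ipaddress.ip_network("77.91.224.0/24")


def classify(ip: str, user_agent: str) -> str:
    """Label a request by whether it claims to be WebAlta and whether
    it comes from the assumed WebAlta range."""
    if "webalta" not in user_agent.lower():
        return "other"
    if ipaddress.ip_address(ip) in WEBALTA_NET:
        return "webalta-net"
    return "claims-webalta-elsewhere"


if __name__ == "__main__":
    print(classify("77.91.224.15", "webalta crawler/2.0"))
    print(classify("85.17.173.8", "webalta crawler/1.3.34"))
```

Requests labelled "claims-webalta-elsewhere" are the ones to look at first, since a user-agent string is trivial to forge.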
But here's some info and links from the WebAlta.ru website, which seems to be a Russian portal offering email, dating and search, and is (maybe) where the WebAlta.net bot comes from.
Main Site: [webalta.ru...]
Top ranked sites in its index: [top.webalta.ru...]
Run a search query: [top.webalta.ru...]
Their marketing site?: [altastat.com...]
If you want to see the site in English, run it through Google's translator: translate.google.com
A 2006 WebmasterWorld discussion had it coming from 87.224.173.*
Based on its excessive speed and the email harvesting reported above, I deep-sixed it.