
Search Engine Spider and User Agent Identification Forum

    
WebAlta Crawler
keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3555426 posted 6:16 am on Jan 23, 2008 (gmt 0)

Been around for a year or so, but I've yet to find any documentation on this bot. Their info page doesn't resolve.

77.91.224.7 - - [22/Jan/2008:17:14:54 -0500] "GET /robots.txt HTTP/1.1" 200 5955 "-" "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html)

Always requests robots.txt, but I caught it taking disallowed files so I disallowed it. Funny thing is, once I disallowed it via robots.txt, now it obeys and goes away.

Anyone know of a reason I should allow it again?
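For anyone wanting to check their own logs for the same behaviour, here is a minimal Python sketch that flags requests for disallowed paths. It assumes Combined Log Format (as in the line quoted above) and uses the standard library's `urllib.robotparser`. The robots.txt rule, the `/private/` path, the example.com host, and the second and third log lines are invented for illustration; only the first log line comes from this post.

```python
import re
from urllib.robotparser import RobotFileParser

# Combined Log Format. The closing quote of the user-agent field is optional
# because the log line quoted in the post does not have one.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"?'
)

UA = "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html)"

# Hypothetical rules; the real robots.txt is not shown in the thread.
SAMPLE_ROBOTS = ["User-agent: *", "Disallow: /private/"]

# The first line is from the post; the other two are made up.
SAMPLE_LOG = [
    '77.91.224.7 - - [22/Jan/2008:17:14:54 -0500] '
    '"GET /robots.txt HTTP/1.1" 200 5955 "-" "' + UA,
    '77.91.224.7 - - [22/Jan/2008:17:15:02 -0500] '
    '"GET /private/report.html HTTP/1.1" 200 1234 "-" "' + UA + '"',
    '77.91.224.7 - - [22/Jan/2008:17:15:05 -0500] '
    '"GET /index.html HTTP/1.1" 200 4321 "-" "' + UA + '"',
]


def disallowed_hits(log_lines, robots_lines, site="http://www.example.com"):
    """Return (ip, path, user_agent) for every request robots.txt disallows."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line.strip())
        if m is None or m["path"] == "/robots.txt":
            continue  # unparseable line, or the robots.txt fetch itself
        if not rp.can_fetch(m["ua"], site + m["path"]):
            hits.append((m["ip"], m["path"], m["ua"]))
    return hits


for ip, path, ua in disallowed_hits(SAMPLE_LOG, SAMPLE_ROBOTS):
    print(ip, path, ua)
```

Note that this only reports what Python's parser considers disallowed; a crawler may interpret edge cases in robots.txt differently.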

 

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3555426 posted 4:13 pm on Jan 23, 2008 (gmt 0)

banned until URL provided in UA resolves.

kamikaze Optimizer

10+ Year Member



 
Msg#: 3555426 posted 10:51 am on Jan 25, 2008 (gmt 0)

I just blocked it. I block almost all bots that are not from sites that send me traffic (G, Y & M).

Lord Majestic

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3555426 posted 2:51 pm on Jan 25, 2008 (gmt 0)

In all probability this is the bot of [webalta.ru...], a primarily Russian-language search site. They used to have a different user-agent though, so it could be a fake/pretender.

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3555426 posted 11:33 pm on Jan 25, 2008 (gmt 0)

The IP range the bot is coming from is in the same inetnum that whois reports for webalta.net, so I have taken down the blocks and am allowing it to crawl through my accounts. However, they do need to get their act together and get their info visible.
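The inetnum comparison can be done programmatically. Below is a rough Python sketch using the standard `ipaddress` module. The range string is an ASSUMPTION: the thread only shows individual 77.91.224.x addresses and never quotes the actual whois record, so substitute whatever your own whois lookup returns.

```python
import ipaddress

# ASSUMPTION: hypothetical inetnum in whois "first - last" form. The real
# record for webalta.net is not quoted anywhere in this thread.
ASSUMED_INETNUM = "77.91.224.0 - 77.91.224.255"


def inetnum_to_networks(inetnum):
    """Convert a whois-style 'first - last' range into a list of CIDR networks."""
    first, last = (ipaddress.ip_address(part.strip()) for part in inetnum.split("-"))
    return list(ipaddress.summarize_address_range(first, last))


def ip_in_inetnum(ip, inetnum=ASSUMED_INETNUM):
    """Return True if the address falls inside the given inetnum range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in inetnum_to_networks(inetnum))


print(ip_in_inetnum("77.91.224.7"))  # the address from the first post
```

A user-agent string is trivial to fake, so a range check like this says more about who is really behind a request than the UA does.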

mcneely

10+ Year Member



 
Msg#: 3555426 posted 6:55 pm on Feb 4, 2008 (gmt 0)

These guys will forever remain banned on our end.

Seems all they want to do is follow site scrapers around.

It's too bad really, because WebAlta is supposed to be the Russian equivalent of Google with regard to its popularity over there.

We've several sites with weblogs, and that WebAlta huckster comes in within the hour to parse and/or collect the "exact" page the scraper couldn't get the first time around.
The requests are perfectly aligned, in that it asks for the same thing the scraper did, right down to the anchor links. Weblogs are its favourite target, with movies, MP3s, and software following. If a scraper puts in a request for a file and gets denied, I can almost count, to within the minute, when WebAlta will show.

I've been watching these guys for a good long while now, and patterns are patterns, so it's asta-le-way-you-go to WebAlta.
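The pattern described here (a denied request followed within the hour by the same request from a different IP) can be looked for mechanically. This is only a sketch in Python: the `Hit` record, the one-hour window, and all sample data are assumptions made for illustration, not anything taken from the poster's logs.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Hit(NamedTuple):
    """One already-parsed access-log entry (hypothetical record layout)."""
    when: datetime
    ip: str
    path: str
    status: int


def find_followers(hits, window=timedelta(hours=1)):
    """Pair each denied (403) request with later requests for the same path
    that arrive from a different IP within the time window."""
    ordered = sorted(hits, key=lambda h: h.when)
    pairs = []
    for i, denied in enumerate(ordered):
        if denied.status != 403:
            continue
        for later in ordered[i + 1:]:
            if later.when - denied.when > window:
                break  # sorted by time, so nothing further can match
            if later.path == denied.path and later.ip != denied.ip:
                pairs.append((denied, later))
    return pairs


# Invented sample data: 203.0.113.9 stands in for a scraper.
SAMPLE_HITS = [
    Hit(datetime(2008, 2, 4, 10, 0), "203.0.113.9", "/blog/post-1.html", 403),
    Hit(datetime(2008, 2, 4, 10, 40), "77.91.224.7", "/blog/post-1.html", 200),
    Hit(datetime(2008, 2, 4, 12, 30), "198.51.100.4", "/blog/post-1.html", 200),
]

for denied, follower in find_followers(SAMPLE_HITS):
    print(f"{denied.ip} denied {denied.path}; {follower.ip} followed at {follower.when}")
```

A match from this kind of check is a correlation, not proof; popular pages will produce coincidental pairs, so it is the repetition over time that matters.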

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3555426 posted 9:01 pm on Feb 4, 2008 (gmt 0)

Seems all they want to do is follow site scrapers around.

Have a friend who moved a widget website that had been online through three free hosts and a domain.
The website was her college thesis some ten years ago.

When the friend didn't renew their hosting, I offered a sub-folder.
The result is that the sub-folder has much, much less restrictive access than the rest of my websites.

Almost every visitor that hits the subfolder generates multiple requests for the root (403'd), followed immediately by 2-5 entirely different IPs making identical requests.
I don't accumulate these request/denies because most are non-North American ranges which don't get into my sites anyway (with only a few exceptions).

The coincidence of these repeated patterns is simply too frequent to overlook.

Eric

5+ Year Member



 
Msg#: 3555426 posted 9:48 am on Mar 16, 2008 (gmt 0)

At least this one obeys robots.txt

Accept: text/html;q=1.0, text/plain;q=1.0, text/*;q=0.5, */*;q=0.1
Accept-Charset: utf-8;q=1.0, windows-1251;q=0.8, cp1251;q=0.8, koi8-r;q=0.8, *;q=0.5
Accept-Encoding: gzip;q=1.0, deflate;q=1.0, identity;q=0.5, *;q=0
Host: www.some.com
User-Agent: WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)
77.91.224.16
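The `q=` values in those headers are preference weights. As a small illustration, the Python sketch below ranks the entries of such a header by weight; `parse_qvalues` is a made-up helper name and the parsing is deliberately simplified (it ignores parameters other than `q`).

```python
def parse_qvalues(header_value):
    """Split an Accept-style header into (name, q) pairs, highest q first.
    Entries with equal q keep their original order."""
    prefs = []
    for item in header_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0]
        if not name:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # malformed weight: keep the default
        prefs.append((name, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Header value copied from the request above.
ACCEPT_CHARSET = "utf-8;q=1.0, windows-1251;q=0.8, cp1251;q=0.8, koi8-r;q=0.8, *;q=0.5"

for name, q in parse_qvalues(ACCEPT_CHARSET):
    print(name, q)
```

The Cyrillic charsets (windows-1251, koi8-r) ranking just below utf-8 fit a crawler aimed at Russian-language pages.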

newbie6

5+ Year Member



 
Msg#: 3555426 posted 1:55 pm on Apr 2, 2008 (gmt 0)

Just to check, is it
User-agent: WebAlta
?
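One way to sanity-check a token like that is to run it through a robots.txt parser. The Python sketch below uses the standard library's `urllib.robotparser`, which matches the token case-insensitively as a substring of the part of the user-agent before the first "/". This shows only how Python's parser treats the rule; whether the crawler itself matches on "WebAlta" is not documented anywhere in this thread. The example.com URL and the Googlebot string are just test inputs.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt: block WebAlta entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: WebAlta
Disallow: /

User-agent: *
Disallow:
"""

# Full user-agent string as reported earlier in the thread.
WEBALTA_UA = (
    "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) "
    "(Windows; U; Windows NT 5.1; ru-RU)"
)

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# False means the parser considers the URL off limits for that user-agent.
print(rp.can_fetch(WEBALTA_UA, "http://www.example.com/page.html"))
print(rp.can_fetch("Googlebot/2.1", "http://www.example.com/page.html"))
```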

Loeffler

5+ Year Member



 
Msg#: 3555426 posted 9:22 am on May 30, 2008 (gmt 0)

WebAlta Crawler is a harvester. He collects email addresses
from web sites and adds them to spam mailing lists.

He visited our web page [abx.de ]
on 14th May 2008 at 08:56 CEST
His IP address: 85.17.173.8 ( LeaseWeb, AMSTERDAM, Netherlands )

We showed him a new email address generated only for him.
We received the first spam for this email address on
30th May.

Andreas.
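For readers who want to try the same kind of trap, here is one possible way to generate a per-visitor address, sketched in Python. This is NOT necessarily how Andreas does it; the HMAC scheme, the `SECRET_KEY`, the `trap-` prefix, and the example.com domain are all assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical settings; pick your own secret and a domain you control.
SECRET_KEY = b"change-me"
TRAP_DOMAIN = "example.com"


def trap_address(ip, timestamp, user_agent):
    """Derive a unique, reproducible e-mail address for one visit."""
    msg = f"{ip}|{timestamp}|{user_agent}".encode("utf-8")
    tag = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:12]
    return f"trap-{tag}@{TRAP_DOMAIN}"


def build_trap_index(visits):
    """Map each generated address back to the visit that was shown it."""
    return {trap_address(*visit): visit for visit in visits}


# Visit details taken from the post above.
visit = ("85.17.173.8", "2008-05-14 08:56 CEST", "WebAlta Crawler/1.3.34")
index = build_trap_index([visit])
address = trap_address(*visit)
print(address, "->", index[address])
```

When spam later arrives at one of these addresses, looking it up in the index identifies the IP, time, and user-agent of the visit that harvested it.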

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3555426 posted 11:43 am on May 30, 2008 (gmt 0)

I've never seen WebAlta Crawler come from LeaseWeb ranges. But I've seen plenty of harvesters/scrapers come from the range that IP is in, and they spoof UAs like there is no tomorrow.

Loeffler

5+ Year Member



 
Msg#: 3555426 posted 12:43 pm on May 30, 2008 (gmt 0)

You are right.

WebAlta visited our web page several times
from IP 77.91.224... (WEBALTA-NET, Russia)
He identified himself as webalta crawler/2.0.
We received no spam.

This time, he came from 85.17.173.8.
He identified himself as WebAlta Crawler/1.3.34.
We received spam for the shown email address.

This is the list of the last visits:

27th jan 2008 13:09 cet
IP: 77.91.224.5 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

04th feb 2008 13:51 cet
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

10th feb 2008 04:52 cet
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

17th feb 2008 13:55 cet
IP: 77.91.224.5 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

27th mar 2008 11:09 cet
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

07th may 2008 03:39 cest
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

14th may 2008 08:56 cest
IP: 85.17.173.8 (LeaseWeb, AMSTERDAM, Netherlands )
Browser:
webalta crawler/1.3.34
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

21st may 2008 12:01 cest
IP: 77.91.224.15 (WEBALTA-NET Moscow, Russia )
Browser:
webalta crawler/2.0
(http://www.webalta.net/ru/about_webmaster.html)
(windows; u; windows nt 5.1; ru-ru)

Andreas

reddragons

5+ Year Member



 
Msg#: 3555426 posted 6:49 am on Jul 10, 2008 (gmt 0)

The WebAlta bot visits my sites on a regular basis and, as stated above, the "http://www.webalta.net/ru/about_webmaster.html" URL doesn't work.

But here's some info and links from the WebAlta.ru website, which seems to be a Russian portal offering email, dating and search, and which is (maybe) where the WebAlta.net bot comes from.

Main Site: [webalta.ru...]
Top ranked sites in its index: [top.webalta.ru...]
Run a search query: [top.webalta.ru...]

Their marketing site?: [altastat.com...]

If you want to see the site in English, run it through Google's translator: translate.google.com

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3555426 posted 4:46 am on Jul 20, 2008 (gmt 0)

I decided to let WebAlta Crawler have access to a few sites on one server cluster. The very next day it came, requested robots.txt then disobeyed it. Took disallowed files and crawled through disallowed directories.

Now permanently banned.

Lord Majestic

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3555426 posted 9:36 am on Jul 20, 2008 (gmt 0)

WebAlta stopped showing a search engine interface some months ago; the search query at top.webalta.ru is a directory search, not a web search.

Megaclinium

5+ Year Member



 
Msg#: 3555426 posted 4:50 am on Aug 27, 2008 (gmt 0)

A nasty thing started scraping parts of my site, going way too fast at a couple of pages a second and taking ka-jillions of pages per session (that's metric for 'billions and billions').
It was coming from 77.91.224.*.

A 2006 WebmasterWorld discussion had it coming from 87.224.173.*.
Based on the excessive speed and the email harvesting reported above, I deep-sixed it.
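Crawl speed like that can be caught automatically with a sliding-window counter. The Python sketch below is one rough way to do it; the threshold (more than 10 requests in any 5 seconds) and the sample traffic are assumptions to tune against your own logs, and `fast_clients` is a made-up helper name.

```python
from collections import defaultdict, deque


def fast_clients(requests, max_requests=10, window_seconds=5.0):
    """Return the set of IPs that ever exceed max_requests within the window.

    `requests` is an iterable of (timestamp_in_seconds, ip), sorted by time.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in requests:
        window = recent[ip]
        window.append(ts)
        while ts - window[0] > window_seconds:
            window.popleft()  # drop requests that have aged out
        if len(window) > max_requests:
            flagged.add(ip)
    return flagged


# Invented traffic: one client at two pages a second, one at a page every 10 s.
crawler = [(i * 0.5, "77.91.224.7") for i in range(20)]
human = [(i * 10.0, "198.51.100.4") for i in range(3)]
SAMPLE_REQUESTS = sorted(crawler + human)

print(fast_clients(SAMPLE_REQUESTS))
```

At two pages a second the sample crawler crosses the threshold after about five seconds, while the slower client is never flagged.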

thetrasher

5+ Year Member



 
Msg#: 3555426 posted 12:08 am on Sep 12, 2008 (gmt 0)

Formerly known as "WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)":

77.91.224.##
User-Agent: Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)

GET /robots.txt HTTP/1.1

User-agent: *
Disallow: /

GET / HTTP/1.1

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3555426 posted 12:29 am on Sep 12, 2008 (gmt 0)

Yanga WorldSearch Bot requested robots.txt then disobeyed it, requesting disallowed files and scraping through hundreds of webpages and image files. Now banned.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved