Webcollage Barrage

bizarre consecutive distributed hits

incrediBILL

7:30 pm on Dec 3, 2008 (gmt 0)

These requests were consecutive within a very short time.

Each IP only requested 1 or 2 pages max and all of the requests had a different version # in the user agent.

80.101.85.nnn "webcollage/1.140"
81.174.16.nnn "webcollage/1.129"
130.160.61.nnn "webcollage/1.117"
24.40.151.nnn "webcollage/1.143"
207.22.18.nnn "webcollage/1.135"
202.149.196.nnn "webcollage/1.117"

I'm wondering if someone is using this:
[wcollage.sourceforge.net...]

So what do you think: proxy IPs? A bizarre botnet? I'm baffled.

enigma1

8:16 pm on Dec 3, 2008 (gmt 0)

I see this UA sometimes. They come in and attempt RFIs (remote file inclusions), injecting HTTP URLs. That's from my logs.

I see different IPs, though, and those IPs are reported as proxies.

Are you seeing normal requests to valid pages? It may be different in that case.

Webcollage is supposed to be an image retrieval tool. It gets images from the web and then displays them in a screen saver. That's the original concept, anyway.

dstiles

8:50 pm on Dec 3, 2008 (gmt 0)

We've been blocking webcollage for years. They (always?) get their URLs from Altavista. Always different IPs but a variety of version numbers. Generally sporadic but in the past week or so it's been increasing. As has a lot of other scrape-type traffic.

incrediBILL

9:56 pm on Dec 3, 2008 (gmt 0)

I check and record proxy information if present; none of these identify themselves as a proxy.

These are asking for data they could be getting from my RSS feed, nothing malicious.

Last year there were only about 375 of these hits; this year to date it's 978.

Very odd stuff but I think I've identified some sort of network out there.

dstiles

11:24 pm on Dec 3, 2008 (gmt 0)

I now think these are coming in from compromised machines. Whilst reading this posting I received two webcollage hits within a couple of seconds, one from Germany, the other from the UK. They were the only hits from the IPs (but see below).

Referer was [altavista.com...]

Both were to the same site, requesting the same page/querystring: a URL which has only ever been presented to SEs (and is apparently not on Google or AltaVista now) and which now redirects to the home page. The redirect was followed, which suggests browser-like behaviour, since the 404 handler was triggered to serve up the home page (at this point webcollage was not specifically trapped for).

I saw hits on an hourly basis to similar URLs during the big SQL Injection splurge a couple of months ago - URLs with nice juicy querystrings to attach their evil to. I wonder if, in this particular case, a botnet is using webcollage (either the real thing or a forged UA) to find sites, but it seems unlikely - there are far better ways of getting lists of sites with querystrings.

incrediBILL

11:50 pm on Dec 3, 2008 (gmt 0)

I dug deeper and found both referers:

[altavista.com...]
[random.yahoo.com...]

Receptional Andy

11:56 pm on Dec 3, 2008 (gmt 0)

I wonder if those referrers were mis-coded attempts to actually follow the random link and use the result as the referrer - i.e. an easy way to get a random (likely valid) referrer.

incrediBILL

12:09 am on Dec 4, 2008 (gmt 0)

I'm compiling more research on this mess: 457 unique IPs this year, mostly one-shot hits. Some had multiple hits, with the max being 21.

Most of the IPs are very random, with a couple of instances of clustering around a single D block.

[edited by: incrediBILL at 12:09 am (utc) on Dec. 4, 2008]
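
A tally like the one above can be sketched in a few lines of Python. This is only an illustration, not the script behind the numbers in this post: the `(ip, user_agent)` input format, the sample addresses (taken from documentation ranges) and the name `summarize_hits` are all assumptions, and the "block" grouping simply takes the first three octets of each address as one possible reading of the clustering described.

```python
from collections import Counter


def summarize_hits(hits):
    """Summarize (ip, user_agent) pairs whose UA starts with "webcollage".

    `hits` is a hypothetical, already-parsed view of an access log.
    """
    per_ip = Counter()
    for ip, user_agent in hits:
        if user_agent.lower().startswith("webcollage"):
            per_ip[ip] += 1

    # Count distinct IPs per block, where a block is the first three octets.
    per_block = Counter(".".join(ip.split(".")[:3]) for ip in per_ip)

    return {
        "unique_ips": len(per_ip),
        "max_hits_per_ip": max(per_ip.values(), default=0),
        "one_shot_ips": sum(1 for n in per_ip.values() if n == 1),
        "clustered_blocks": {b: n for b, n in per_block.items() if n > 1},
    }


# Made-up sample data, for illustration only.
sample_hits = [
    ("192.0.2.10", "webcollage/1.140"),
    ("192.0.2.77", "webcollage/1.129"),
    ("198.51.100.5", "webcollage/1.117"),
    ("198.51.100.5", "webcollage/1.117"),
    ("203.0.113.9", "Mozilla/5.0"),
]
summary = summarize_hits(sample_hits)
print(summary)
```

With real logs, the list of pairs would come from whatever log parser is already in use; only the counting logic is shown here.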

dstiles

4:14 am on Dec 4, 2008 (gmt 0)

NOTE - clicking on either altavista or yahoo random URLs above goes straight to a random site. It could be any site from a valuable resource to a trojan-trap. In two attempts this evening I hit a Spanish language site and a Cyrillic site. I NEVER intend to do this again. Windows is dangerous enough without provocation!

When I checked out the webcollage site a few years ago it was a mosaic of images linked back to their originating web sites, creating a sort of freebie back-link which people could click on at random. Probably that was a taster for the application. From a simple google cache view that is no longer the home page (but I didn't view the Flash image). It appears to be more commercial.

From a Linux Man Page returned by google on the obvious search...

"The webcollage program pulls random image off of the World Wide Web and scatters them on the root window. One satisfied customer described it as "a nonstop pop culture brainbath." This program finds its images by doing random web searches, and extracting images from the returned pages."

Possibly a form of screen saver?

It goes on to say it runs faster if you install software on your machine. I'm unsure if this is only linux-based but probably not. It does explain the variation in version numbers, though. The company claims (on the cached page) to be growing rapidly.

Not sure why it's hitting web pages. I assume it still links back to them and I suppose people just click on one of the images from time to time. Or maybe the "screen saver" just throws up random pages as well as images. Man Page also suggests it can grab images from local cache instead of search engines, so maybe that's one of the sources. Only guesses and I don't have time nor inclination to experiment.

I still don't trust it, especially when hits come together from geographically different IPs as though through a botnet.

incrediBILL

4:54 am on Dec 4, 2008 (gmt 0)

I'd definitely feel better if I knew it was just some silly screen saver or something.

enigma1

10:33 am on Dec 4, 2008 (gmt 0)

Yes, the screen saver thing is what webcollage is documented to do. Not sure if you should feel better about it. In my opinion they're using the principles of this package to hijack sites and systems.

The way I see it in my logs is this:
GET http://example.com/logo.gif

where example.com is another site, not mine.

The referrer fields you posted match the ones in my logs.

It also seems that if you send redirect headers in response to the request, the other end never follows through to the Location URL.

At some point I deployed some detection for RFIs that runs before the scripts initialize and loads some totally irrelevant code with a 200 response, and I haven't seen the webcollage thing since. I can only assume they rely on the 200 status and read the content, in which case they weren't too happy about it. Time will tell.
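
The request pattern described above (a GET whose target is a full URL on someone else's host) can be flagged with a check along these lines. This is a minimal sketch and not the detection actually deployed: the host names in `MY_HOSTS` and the function name `is_foreign_absolute_request` are made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical set of host names this server legitimately answers for.
MY_HOSTS = {"mysite.example", "www.mysite.example"}


def is_foreign_absolute_request(request_line, my_hosts=MY_HOSTS):
    """Return True if the request target is an absolute http(s) URL
    pointing at a host that is not one of ours."""
    parts = request_line.split()
    if len(parts) < 2:
        return False  # not a recognizable request line
    try:
        url = urlsplit(parts[1])
    except ValueError:
        return True  # malformed targets are treated as suspicious here
    if url.scheme not in ("http", "https"):
        return False  # ordinary origin-form target such as /index.html
    host = (url.hostname or "").lower()
    return host not in my_hosts


print(is_foreign_absolute_request("GET http://example.com/logo.gif HTTP/1.1"))
print(is_foreign_absolute_request("GET /index.html HTTP/1.1"))
```

What to do with a flagged request (403, dummy 200 content, logging only) is a separate decision and is left out of the sketch.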

incrediBILL

6:45 pm on Dec 4, 2008 (gmt 0)

OK, the screen saver hits aside, I found this one stray entry:

71.141.116.* "webcollage.perl/1.107"

Wonder if it's a different port of the screen saver or something entirely new?

incrediBILL

7:42 pm on Dec 4, 2008 (gmt 0)

Apache rules for the people that need it:

RewriteCond %{HTTP_USER_AGENT} ^webcollage
RewriteRule .* - [L,F]

wilderness

8:35 pm on Dec 4, 2008 (gmt 0)

Apache rules for the people that need it:
RewriteCond %{HTTP_USER_AGENT} ^webcollage
RewriteRule .* - [L,F]

# Line one modified to catch many pests (anything beginning with "web", case-insensitive); line two modified to drop the redundant L flag, since F already implies it.
RewriteCond %{HTTP_USER_AGENT} ^web [NC]
RewriteRule .* - [F]

blend27

3:26 am on Dec 5, 2008 (gmt 0)

--begins with "web"):

"Express\ WebPictures"
"JOC\ Web\ Spider"
"Web\ Image\ Collector"
"Web\ Sucker"
"WebAuto"
"WebCopier"
"WebFetch"
"WebReaper"
"WebSauger"
"Website\ eXtractor"
"WebStripper"
"WebWhacker"
"WebZIP"
"Xaldon\ WebSpider"
"WebEnhancer"
"libWeb"
"WebVac"
"webcollage"
"WebVulnCrawl"
"WebarooBot"
"authoritativeweb"

added:
oh yeah, and my favorite one of all: FunWebProducts

wilderness

3:45 am on Dec 5, 2008 (gmt 0)

None of the following "begin with" web:

Express\ WebPictures
JOC\ Web\ Spider
Xaldon\ WebSpider
libWeb
authoritativeweb

The previously provided lines would however catch all the others.
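
For anyone who wants to check this without touching a live server, the anchored, case-insensitive match can be approximated in Python. This is just a sketch: Python's `re` module stands in for mod_rewrite's regex engine (for a pattern as simple as `^web` the behaviour should be the same), and the names are written with plain spaces rather than the backslash-escaped form used in the list above.

```python
import re

# Equivalent in spirit to: RewriteCond %{HTTP_USER_AGENT} ^web [NC]
STARTS_WITH_WEB = re.compile(r"^web", re.IGNORECASE)

USER_AGENTS = [
    "Express WebPictures", "JOC Web Spider", "Web Image Collector",
    "Web Sucker", "WebAuto", "WebCopier", "WebFetch", "WebReaper",
    "WebSauger", "Website eXtractor", "WebStripper", "WebWhacker",
    "WebZIP", "Xaldon WebSpider", "WebEnhancer", "libWeb", "WebVac",
    "webcollage", "WebVulnCrawl", "WebarooBot", "authoritativeweb",
]

caught = [ua for ua in USER_AGENTS if STARTS_WITH_WEB.search(ua)]
missed = [ua for ua in USER_AGENTS if not STARTS_WITH_WEB.search(ua)]

print("caught:", len(caught))
print("missed:", missed)
```

Dropping the `^` anchor would catch the remaining five as well, at the cost of matching "web" anywhere in any user agent string.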

incrediBILL

6:39 am on Dec 5, 2008 (gmt 0)

Whitelisting nails them all, it's the only way to fly.
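
As a rough illustration of the idea (not the actual setup in use here), a whitelist flips the logic: instead of listing bad user agents, everything is refused unless it matches a short list of accepted ones. The prefixes in `ALLOWED_PREFIXES` and the function name `is_allowed` are assumptions made up for this sketch.

```python
# Hypothetical whitelist: only user agents starting with these prefixes pass.
ALLOWED_PREFIXES = ("Mozilla/", "Opera/")


def is_allowed(user_agent, allowed_prefixes=ALLOWED_PREFIXES):
    """Return True only for user agents that start with a whitelisted prefix.

    Empty or missing user agents are refused.
    """
    return bool(user_agent) and user_agent.startswith(allowed_prefixes)


for ua in ("Mozilla/5.0 (X11; Linux)", "webcollage/1.140", "libWeb", ""):
    print(repr(ua), is_allowed(ua))
```

A prefix check on its own will still pass anything that fakes a browser user agent, so in practice it would be one layer among several.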

dstiles

6:27 pm on Dec 11, 2008 (gmt 0)

Just had six webcollage hits from 5 IPs within 60 seconds, first three within 6 seconds, ditto the second three. A mix of 4 IPs with apache servers (one on a "static" IP) and one possibly not with a server (difficult to tell if they have a good firewall that doesn't respond to [IP)....] Again the hits were with the same querystring as before. Versions were 1.130, 1.140 and 1.147.

One of the servers responded with the title "Test Page for the Apache HTTP Server on Fedora Core", another said only "Hi there, I see you". The "static" IP came back with another Fedora page but far more extensive than the other.

The fourth server seems to be a proper public web site, although the IP had already been banned (coming from a known server farm). The home page has links to social network sites and order carts and looks genuine. I suspect but don't know that they may be patching webcollage stuff into the site somewhere (no trace of it on the home page).

I have now upgraded webcollage from "block page but don't block IP" to "block IP" on the grounds that if it's compromised computers I don't want them coming back and if they're not, why should I play their game when they are extremely unlikely to ever be interested in my sites.

And if they are not compromised they could well be soon, since they are apparently randomly downloading pages from sites that may well be trojan traps.

incrediBILL

6:43 pm on Dec 11, 2008 (gmt 0)

I have now upgraded webcollage from "block page but don't block IP" to "block IP"

I wouldn't do that, as these primarily appear to be screen savers on mostly Linux machines, which in some parts of the world are rapidly gaining popularity as opposed to being trapped in the clutches of expensive Microsoft software.

Downloading a web page, which is pure text, holds ZERO threat.

For a machine to become compromised there must be something on that page that takes action to download the malware executable, such as javascript or even a meta redirect to the malware file (cute trick, I saw it in use once).

Therefore, just using CURL, WGET or some similar tool like webcollage to download and parse an HTML file looking for images poses ZERO threat.

The page must be loaded in an actual browser with javascript enabled, which must then accept the downloaded file, for a threat to exist, assuming that file can get past the AV software.

I'm using Avast AV which has an internal proxy and it monitors the data stream itself so the mere presence of a javascript malware injector script in the page triggers Avast before the entire page is even downloaded into my browser.

Not that the web is a safe place, and I warn people about the malware all the time, but it requires a certain level of technology enabled before it actually becomes a threat.

[edited by: incrediBILL at 6:44 pm (utc) on Dec. 11, 2008]

jdMorgan

6:58 pm on Dec 11, 2008 (gmt 0)

To be clear: The legitimate WebCollage is just a program and a Web page (you can use either) that makes a collage of images from random sites and displays that collage. It was a cute "Gee-whiz, look at that!" thing back in the mid-90's.

As observed above, clicking through to the image-linked sites it displays can be dangerous, as clicking on any link on the Web can be.

But WebCollage was/is basically a harmless application based on a URL at AltaVista that can be queried to return a random URL from their index. Now that Yahoo owns the remains of AltaVista, they've added their own 'branded URL' but the function is the same.

I just don't want readers of this thread to think that WebCollage is in itself an evil thing. It's not. But that's not to say that either the WebCollage user-agent or the get-random-URL pages at AV and Yahoo are not being spoofed and/or exploited here.

I suppose a scraper/harvester botnet could fetch one random URL from AV/Yahoo for use as a 'target' and then fetch another as a spoofed referrer to use when fetching that target.

You can safely block it by generic user-agent, and nothing bad will happen.

Ask ten grey-headed advanced Webmasters or server admins about WebCollage, and it's likely that at least nine know what it is... When the Web was much, much smaller and not nearly so commercial or potentially-dangerous, WebCollage was one of the early "entertainments" and "time-wasters" on the Web. I guess I need to go look at it at least one more time now... :)

Jim

incrediBILL

7:22 pm on Dec 11, 2008 (gmt 0)

Jim, use of webcollage is escalating as it appears to be distributed with some versions of Linux these days, which makes it a potential threat to us if it gets too widespread.

I suppose a scraper/harvester botnet could fetch one random URL from AV/Yahoo for use as a 'target' and then fetch another as a spoofed referrer to use when fetching that target.

Exactly. Not only can they do it, they have a perfect cover story: use the AV or Yahoo referrer and set their UA to webcollage.

Since the knowledge is out there, and real webcollage adoption is escalating, it's probably best to just block that UA and avoid future problems.

WebCollage was one of the early "entertainments" and "time-wasters" on the Web.

It's amazing that what was cool back then sets off alarms today.

The times they are changing.

dstiles

9:57 pm on Dec 11, 2008 (gmt 0)

As I remember the original, it got image links from altavista and didn't read pages, but I may be wrong on that.

Six hits on 5 IPs within a minute on a bad url that in any case redirects to another page (on which a fetch is attempted) suggests it's reading a page in something approaching a browser. It has always had a rejection page from my server but it still comes back, though rarely until last month and even then only a few a month.

In particular, why is webcollage coming in on a URL that is at best very obscure on all the engines, including altavista? It only seems to appear as a link within one other web site and not in its own right at all. I'm still paranoid that it's the same URL as hackers try to use.

I can't believe my web site is anything special to trigger 6 hits at once so I assume the hits are programmed by Webcollage Central. In which case some sites may well get hundreds before long if the webcollage installation rate really is increasing.

In fact, there are no images other than plain-text logos on both the page it wants and the page it would have got if I hadn't sent it a 403 with a page saying "Site access denied" so it would be a very boring display even if it worked for them.

enigma1

10:58 am on Dec 12, 2008 (gmt 0)

Downloading a web page, which is pure text, holds ZERO threat.
For a machine to become compromised there must be something on that page that takes action to download the malware executable, such as javascript or even a meta redirect to the malware file (cute trick, I saw it in use once).

Yes, true, but you cannot tell in advance what the server is going to send you. I have seen browsers download something right away and store it when I opened a "web page". It all depends on what plugins the browser hooks into and what sites you trust. These are configuration details a human visitor is usually unaware of, trusting the browser defaults. As for AV protection, I've seen AV products generate false positives and miss all kinds of details. They rely on signatures for malware (as far as I can tell they don't examine the real code, by default at least, because it would take ages to scan something) and the code can change at any time. I saw Avast raising red flags for simple javascripts, and then you just add an extra dummy command and the red flag is gone. Honestly, I cannot rely on it alone. I block active content in the browser to be sure.

Webcollage can be a manipulation tool for an evil server. A server can give out regular images most of the time, but at some point it may also send out some malware. Whoever is on the other end could download it and open it. So perhaps the s/w was useful years back, but with the kind of use we see now I don't know. I think the responsibility lies with the browser engines. They should use a default configuration that protects users and does not expose them to possible web threats.

blend27

3:58 pm on Dec 13, 2008 (gmt 0)

cgi.server_protocol = HTTP/1.0 = boot, unless you know what it is.

Something tells me it's linkbait, but then again: PHPProxy -> line:30 of http.php -> var $protocol_version="1.0";