Each IP only requested 1 or 2 pages max and all of the requests had a different version # in the user agent.
80.101.85.nnn "webcollage/1.140"
81.174.16.nnn "webcollage/1.129"
130.160.61.nnn "webcollage/1.117"
24.40.151.nnn "webcollage/1.143"
207.22.18.nnn "webcollage/1.135"
202.149.196.nnn "webcollage/1.117"
I'm wondering if someone is using this:
[wcollage.sourceforge.net...]
So what do you think, proxy IPs? bizarre botnet? I'm baffled.
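For anyone wanting to check their own logs for the same pattern, here is a minimal sketch that groups the sample lines above by reported version. The line layout (IP, a space, then the quoted user agent) is an assumption based on the excerpt, and `ips_by_version` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import re
from collections import defaultdict

# Sample lines copied from the post above (last octet masked as "nnn").
LOG_LINES = [
    '80.101.85.nnn "webcollage/1.140"',
    '81.174.16.nnn "webcollage/1.129"',
    '130.160.61.nnn "webcollage/1.117"',
    '24.40.151.nnn "webcollage/1.143"',
    '207.22.18.nnn "webcollage/1.135"',
    '202.149.196.nnn "webcollage/1.117"',
]

# Assumed layout: IP, whitespace, then the quoted user agent string.
LINE_RE = re.compile(r'^(\S+)\s+"webcollage/([\d.]+)"$')


def ips_by_version(lines):
    """Map each webcollage version string to the IPs that reported it."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            ip, version = match.groups()
            groups[version].append(ip)
    return dict(groups)


if __name__ == "__main__":
    for version, ips in sorted(ips_by_version(LOG_LINES).items()):
        print(version, ips)
```

Run over the six lines above, this puts 130.160.61.nnn and 202.149.196.nnn together under 1.117 and leaves every other version with a single address.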
I see different IPs though but the IPs are reported as proxies.
Are you seeing normal requests to valid pages? It may be different in that case
webcollage is supposed to be some image retrieval thing. It gets images from the web, then displays them in a screen saver. That's the original concept anyway.
These are asking for data they could be getting from my RSS feed, nothing malicious.
Last year there were only about 375 of these hits and this year-to-date it's 978.
Very odd stuff but I think I've identified some sort of network out there.
Referer was [altavista.com...]
Both were to the same site, with the same page/querystring, for a URL which has only ever been presented to SEs (and is apparently not on Google or AltaVista now) and which now redirects to the home page. The redirect was followed, which suggests browser behaviour, since the 404 was triggered to serve up the home page (at this point webcollage was not specifically trapped for).
I saw hits on an hourly basis to similar URLs during the big SQL Injection splurge a couple of months ago - URLs with nice juicy querystrings to attach their evil to. I wonder if, in this particular case, a botnet is using webcollage, either actual or forged, to research sites, but it seems unlikely - there are far better ways of getting lists of sites with querystrings.
[altavista.com...]
[random.yahoo.com...]
When I checked out the webcollage site a few years ago it was a mosaic of images linked back to their originating web sites, creating a sort of freebie back-link which people could click on at random. Probably that was a taster for the application. From a simple google cache view that is no longer the home page (but I didn't view the Flash image). It appears to be more commercial.
From a Linux Man Page returned by google on the obvious search...
"The webcollage program pulls random image off of the World Wide Web and scatters them on the root window. One satisfied customer described it as "a nonstop pop culture brainbath." This program finds its images by doing random web searches, and extracting images from the returned pages."
Possibly a form of screen saver?
It goes on to say it runs faster if you install software on your machine. I'm unsure if this is only linux-based but probably not. It does explain the variation in version numbers, though. The company claims (on the cached page) to be growing rapidly.
Not sure why it's hitting web pages. I assume it still links back to them and I suppose people just click on one of the images from time to time. Or maybe the "screen saver" just throws up random pages as well as images. Man Page also suggests it can grab images from local cache instead of search engines, so maybe that's one of the sources. Only guesses and I don't have time nor inclination to experiment.
I still don't trust it, especially when hits come together from geographically different IPs as though through a botnet.
The way I see it in my logs is this:
GET http://example.com/logo.gif
where example.com is another site not mine.
The referrer fields you posted match the ones in my logs.
It also seems that if you send redirect headers for the request, the other end never follows through to the Location URL.
At some point I deployed some sort of detection for RFI (remote file inclusion) attempts that runs before the scripts initialize and returns some totally irrelevant code with a 200 response, and I haven't seen the webcollage thing since. I can only assume they rely on the 200 status and read the content, in which case they weren't too happy about it. Time will tell.
"Express\ WebPictures"
"JOC\ Web\ Spider"
"Web\ Image\ Collector"
"Web\ Sucker"
"WebAuto"
"WebCopier"
"WebFetch"
"WebReaper"
"WebSauger"
"Website\ eXtractor"
"WebStripper"
"WebWhacker"
"WebZIP"
"Xaldon\ WebSpider"
"WebEnhancer"
"libWeb"
"WebVac"
"webcollage"
"WebVulnCrawl"
"WebarooBot"
"authoritativeweb"
added:
oh yeah, and my favorite one of all: FunWebProducts
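As a rough illustration of how a list like this gets applied, here is a sketch that does a case-insensitive substring match of a request's User-Agent against the entries above. The backslash-escaped spaces in the list look like server config syntax, which is an assumption on my part, so they are written as plain spaces here. `is_blocked` is a hypothetical helper name.

```python
# Entries copied from the list above, with "\ " written as a plain space.
BLOCKED_AGENTS = [
    "Express WebPictures", "JOC Web Spider", "Web Image Collector",
    "Web Sucker", "WebAuto", "WebCopier", "WebFetch", "WebReaper",
    "WebSauger", "Website eXtractor", "WebStripper", "WebWhacker",
    "WebZIP", "Xaldon WebSpider", "WebEnhancer", "libWeb", "WebVac",
    "webcollage", "WebVulnCrawl", "WebarooBot", "authoritativeweb",
    "FunWebProducts",
]


def is_blocked(user_agent, patterns=BLOCKED_AGENTS):
    """Return True if any listed token appears anywhere in the User-Agent."""
    ua = user_agent.lower()
    return any(pattern.lower() in ua for pattern in patterns)


if __name__ == "__main__":
    print(is_blocked("webcollage/1.140"))
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64)"))
```

Substring matching is deliberately loose, so short tokens such as "libWeb" or "WebAuto" are worth checking against real traffic before relying on them.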
One of the servers responded with the title "Test Page for the Apache HTTP Server on Fedora Core", another said only "Hi there, I see you". The "static" IP came back with another Fedora page but far more extensive than the other.
The fourth server seems to be a proper public web site, although the IP had already been banned (coming from a known server farm). The home page has links to social network sites and order carts and looks genuine. I suspect but don't know that they may be patching webcollage stuff into the site somewhere (no trace of it on the home page).
I have now upgraded webcollage from "block page but don't block IP" to "block IP" on the grounds that if it's compromised computers I don't want them coming back and if they're not, why should I play their game when they are extremely unlikely to ever be interested in my sites.
And if they are not compromised they could well be soon, since they are apparently randomly downloading pages from sites that may well be trojan traps.
I have now upgraded webcollage from "block page but don't block IP" to "block IP"
I wouldn't do that, as these primarily appear to be screen savers on mostly Linux machines, which in some parts of the world are rapidly gaining popularity as opposed to being trapped in the clutches of expensive Microsoft software.
Downloading a web page, which is pure text, holds ZERO threat.
For a machine to become compromised there must be something on that page that takes action to download the malware executable, such as javascript or even a meta redirect to the malware file (cute trick, I saw it in use once).
Therefore, just using CURL, WGET or some similar tool like webcollage to download and parse an HTML file looking for images poses ZERO threat.
The page must be loaded in an actual browser with javascript enabled, and the user must then accept the downloaded file, for a threat to exist, assuming that file can get past the AV software.
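To make the point above concrete, here is a small sketch of the "download and parse for images" step using Python's standard `html.parser`. It is not how webcollage itself is written (I haven't checked that); `ImageCollector`, `extract_images` and the sample markup are made up for illustration. The script and meta-refresh in the sample are only ever seen as text by the parser and are never run or followed.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag; nothing is executed."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)


def extract_images(html_text):
    """Return the list of image URLs found in a block of HTML text."""
    parser = ImageCollector()
    parser.feed(html_text)
    parser.close()
    return parser.images


# Hypothetical page: a script and a meta redirect next to one real image.
SAMPLE = """<html><head>
<script>window.location = "http://example.com/payload.exe";</script>
<meta http-equiv="refresh" content="0; url=http://example.com/payload.exe">
</head><body>
<img src="http://example.com/logo.gif"><img alt="no source">
</body></html>"""

if __name__ == "__main__":
    print(extract_images(SAMPLE))
```

What the tool does with the image URLs afterwards is a separate question that this sketch does not cover.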
I'm using Avast AV which has an internal proxy and it monitors the data stream itself so the mere presence of a javascript malware injector script in the page triggers Avast before the entire page is even downloaded into my browser.
Not that the web is a safe place, and I warn people about the malware all the time, but it requires a certain level of technology enabled before it actually becomes a threat.
[edited by: incrediBILL at 6:44 pm (utc) on Dec. 11, 2008]
As observed above, clicking through to the image-linked sites it displays can be dangerous, as clicking on any link on the Web can be.
But WebCollage was/is basically a harmless application based on a URL at AltaVista that can be queried to return a random URL from their index. Now that Yahoo owns the remains of AltaVista, they've added their own 'branded URL' but the function is the same.
I just don't want readers of this thread to think that WebCollage is in itself an evil thing. It's not. But that's not to say that either the WebCollage user-agent or the get-random-URL pages at AV and Yahoo are not being spoofed and/or exploited here.
I suppose a scraper/harvester botnet could fetch one random URL from AV/Yahoo for use as a 'target' and then fetch another as a spoofed referrer to use when fetching that target.
You can safely block it by generic user-agent, and nothing bad will happen.
Ask ten grey-headed advanced Webmasters or server admins about WebCollage, and it's likely that at least nine know what it is... When the Web was much, much smaller and not nearly so commercial or potentially-dangerous, WebCollage was one of the early "entertainments" and "time-wasters" on the Web. I guess I need to go look at it at least one more time now... :)
Jim
I suppose a scraper/harvester botnet could fetch one random URL from AV/Yahoo for use as a 'target' and then fetch another as a spoofed referrer to use when fetching that target.
Exactly. Not only can they do it, they have a perfect cover story: use the AV or Yahoo referrer and set their UA to webcollage.
Since the knowledge is out there, and real webcollage adoption is escalating, it's probably best to just block that UA and avoid future problems.
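Blocking by UA can be done at whatever layer you have access to. As one hedged example, here is a minimal WSGI wrapper that answers 403 when the User-Agent contains "webcollage". It assumes a Python WSGI application, which nobody in this thread has said they run; `block_user_agent`, `demo_app`, `status_for` and the response text are placeholders.

```python
def block_user_agent(app, blocked=("webcollage",)):
    """Wrap a WSGI app and refuse requests whose User-Agent has a blocked token."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in blocked):
            body = b"Site access denied"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application that always answers 200."""
    body = b"Hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(user_agent):
    """Call the wrapped demo app directly and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    wrapped = block_user_agent(demo_app)
    body = b"".join(wrapped({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(status_for("webcollage/1.135"))
    print(status_for("Mozilla/5.0"))
```

This blocks the request by UA only and leaves the IP alone, which is the milder of the two options discussed earlier in the thread.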
WebCollage was one of the early "entertainments" and "time-wasters" on the Web.
It's amazing that what was cool back then sets off alarms today.
The times they are changing.
Six hits on 5 IPs within a minute on a bad url that in any case redirects to another page (on which a fetch is attempted) suggests it's reading a page in something approaching a browser. It has always had a rejection page from my server but it still comes back, though rarely until last month and even then only a few a month.
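A pattern like "six hits from five IPs inside a minute on one URL" is easy to look for mechanically. The sketch below uses a sliding time window per URL. The input format (timestamp in seconds, IP, URL, already sorted by time), the thresholds, and the sample data are all assumptions for illustration. The IPs are documentation-range addresses, not the ones from my logs.

```python
from collections import defaultdict, deque


def find_bursts(hits, window=60, min_hits=6):
    """Report (url, hit_count, distinct_ip_count) for each burst on a URL.

    hits: iterable of (timestamp_seconds, ip, url), sorted by timestamp.
    """
    recent = defaultdict(deque)  # url -> (timestamp, ip) pairs inside the window
    bursts = []
    for ts, ip, url in hits:
        queue = recent[url]
        queue.append((ts, ip))
        # Drop hits that have fallen out of the time window.
        while queue and ts - queue[0][0] > window:
            queue.popleft()
        if len(queue) >= min_hits:
            distinct_ips = {entry_ip for _, entry_ip in queue}
            bursts.append((url, len(queue), len(distinct_ips)))
            queue.clear()  # start fresh so the same burst is reported once
    return bursts


# Hypothetical data shaped like the incident described above.
HITS = [
    (0, "192.0.2.1", "/old?x=1"),
    (8, "192.0.2.2", "/old?x=1"),
    (15, "198.51.100.3", "/old?x=1"),
    (27, "198.51.100.4", "/old?x=1"),
    (41, "203.0.113.5", "/old?x=1"),
    (55, "192.0.2.1", "/old?x=1"),
    (4000, "192.0.2.9", "/old?x=1"),
]

if __name__ == "__main__":
    print(find_bursts(HITS))
```

Lowering `min_hits` or widening `window` will catch slower bursts at the cost of more false alarms on busy pages.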
In particular, why is webcollage coming in on a URL that is at best very obscure on all the engines, including altavista? It only seems to appear as a link within one other web site and not in its own right at all. I'm still paranoid that it's the same URL as hackers try to use.
I can't believe my web site is anything special to trigger 6 hits at once so I assume the hits are programmed by Webcollage Central. In which case some sites may well get hundreds before long if the webcollage installation rate really is increasing.
In fact, there are no images other than plain-text logos on both the page it wants and the page it would have got if I hadn't sent it a 403 with a page saying "Site access denied" so it would be a very boring display even if it worked for them.
Downloading a web page, which is pure text, holds ZERO threat. For a machine to become compromised there must be something on that page that takes action to download the malware executable, such as javascript or even a meta redirect to the malware file (cute trick, I saw it in use once).
Yes, true, but you cannot tell in advance what the server is going to send you. I have seen browsers download something right away and store it when I opened a "web page". It all depends on what plugins the browser hooks into and what sites you trust. These are configuration details a human visitor is usually unaware of; they just trust the browser defaults. As for AV protection, I've seen AV products generate false positives or miss all kinds of details. They rely on signatures for malware (as far as I can tell they don't examine the real code, by default at least, because it would take ages to scan something) and the code can change at any time. I saw Avast raise red flags for simple javascripts, and then after adding just one extra dummy command the red flag went away. Honestly, I cannot rely on it alone. I block active content in the browser to be sure.
webcollage can be a manipulation tool for an evil server. A server can give out regular images most of the time, but at some point it may also send out some malware. Whoever is on the other end could download it and open it. So perhaps the s/w was useful years back, but with the kind of use we see now I don't know. I think the responsibility lies with the browser engines. They should ship a default configuration that protects users and does not expose them to possible web threats.