
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 39 message thread spans 2 pages: 39 ( [1] 2 > >     
AVG Stops Real-Time Scanning
Webmasters Complain AVG Debilitating Traffic Analytics
Cyclob




msg:3691943
 5:44 am on Jul 7, 2008 (gmt 0)


System: The following message was cut out of thread at: http://www.webmasterworld.com/search_engine_spiders/3674410.htm [webmasterworld.com] by incredibill - 11:46 pm on July 6, 2008 (PST -8)


Good news, folks! It seems AVG is stopping this real-time scanning, according to an announcement dated July 5, 2008 and cited on Wikipedia, as follows:

[en.wikipedia.org...]

LinkScanner concerns:

The LinkScanner feature has been shown to cause an up to tenfold increase in traffic on web sites that appear in a search engine's search results. Since LinkScanner disguises itself as a click by a human being when it prescans each site listed in the results, web site usage logs will show incorrect and overinflated site visitor statistics. The prescanning of links also causes web sites to transfer more data than usual, resulting in higher bandwidth usage for users and web site operators.[8] AVG says site administrators will be able to filter the LinkScanner traffic out of their site statistics, leaving the problem of excess bandwidth usage still to be solved.[9] Pay-per-click advertising is not affected by the increase in traffic.[10]

On 2008-07-05, AVG announced that as of 2008-07-09 "Search-Shield will no longer scan each search result online for new exploits, which was causing the spikes that webmasters addressed with us."[11]

[edited by: incrediBILL at 7:44 am (utc) on July 7, 2008]
[edit reason] formatting cleanup [/edit]

 

Samizdata




msg:3692093
 12:18 pm on Jul 7, 2008 (gmt 0)

Meanwhile over at The Register:

LinkScanner will continue to scan links after users click on them. That means a small portion of the rogue traffic will continue, but the bulk of the problem should be solved.

[theregister.co.uk...]

Make of it what you will.

...

incrediBILL




msg:3692287
 3:29 pm on Jul 7, 2008 (gmt 0)

Here's an official announcement from AVG:
[avg.com.au...]

We have modified the Search-Shield component of the product to only notify users of malicious sites. Search-Shield no longer scans each search result online for new exploits, which was causing the spikes that web masters addressed with us.

Samizdata




msg:3692294
 3:38 pm on Jul 7, 2008 (gmt 0)

The most notable aspect of the whole LinkScanner debacle has been confusion.

AVG seem to have understood at last and done the right thing, though for obvious reasons they are putting as much spin on it as they can. But the result is all that matters and the facts are contained in earlier WebmasterWorld threads for anyone who wants to read them.

The Register remains a hopeless case. They have published five articles on LinkScanner without ever getting the story right, and in some cases have added to the confusion. But they say they intend to write more on the subject so there is still hope for redemption.

Other websites belatedly jumped on the bandwagon and added to the mess. Some have done better than others, and the principled stand taken by Whirlpool was admirable - though claims by some of their readers that they forced AVG to change the software in 24 hours are obvious nonsense.

A large number of comments and posts almost everywhere have displayed an astonishing lack of understanding of the issues by people who apparently work in IT and really should know better, though as the material they are responding to is mostly deficient some confusion is to be expected.

Here at WebmasterWorld there has also been some confusion, though it has always been corrected fairly quickly (if not immediately), and as things stand you will not find the full facts about LinkScanner anywhere else. So for anyone still confused I will spell it out:

Many webmasters have been openly fooling LinkScanner for two months or more.

It is very easy and the "Bad Guys" who serve drive-by downloads can do it too.

The bandwidth and analytics issues, though important, were always secondary.

LinkScanner was withdrawn because it was a security risk for AVG users.

...

goodroi




msg:3692296
 3:43 pm on Jul 7, 2008 (gmt 0)

That is good news. This is a good lesson for all of us - as the internet matures it is becoming more difficult to make an improvement in one area without causing complications in another.

albo




msg:3692395
 4:41 pm on Jul 7, 2008 (gmt 0)

(I don't know whether I'm alone in this, but...) I had turned off LinkScanner on one box, leaving it in a permanent "error state", and done a "Custom Install" omitting LinkScanner altogether on another box... Seemed wasteful from the get-go.

netmeg




msg:3692403
 4:47 pm on Jul 7, 2008 (gmt 0)

I'm glad they turned it off. But they still lost me as a customer, I've moved on to other products, and moved my company and my clients as well.

httpwebwitch




msg:3692467
 5:51 pm on Jul 7, 2008 (gmt 0)

This is a triumph for all those webmasters who actively petitioned (and used other channels of communication) to convince AVG to cease and desist. You know who you are... Way to go!

Scarecrow




msg:3692552
 7:45 pm on Jul 7, 2008 (gmt 0)

The Register remains a hopeless case. They have published five articles on LinkScanner without ever getting the story right, and in some cases have added to the confusion. But they say they intend to write more on the subject so there is still hope for redemption.

Without The Register this story would not have grown legs in the mainstream tech media, and AVG would not have backed down by now. Many of WebmasterWorld's users are good at eking out the details of esoteric webmastering issues, but as a whole WebmasterWorld is not very good at solving a problem by applying media pressure.

Samizdata




msg:3692565
 7:53 pm on Jul 7, 2008 (gmt 0)

WebmasterWorld is not very good at solving a problem by applying media pressure

I can't speak for Brett but I don't think "applying media pressure" is what WebmasterWorld is for.

As for The Register, they made it clear in their first article where they got their information.

...

incrediBILL




msg:3692649
 9:16 pm on Jul 7, 2008 (gmt 0)

Sorry, but I removed a few posts to keep this on topic about AVG's LinkScanner and not a discussion about virus software in general or the merits of virus software.

Let's keep it on track.

Scarecrow




msg:3693121
 11:56 am on Jul 8, 2008 (gmt 0)

If anyone has the commercial version of AVG 8, I'd appreciate some feedback on how they changed the LinkScanner interface. The updated commercial version is available sometime within the next day or two. I'm not about to purchase AVG just to test this interface.

I'm referring specifically to the interface where you do a Google (or Yahoo or Live) search and a gray thing appears next to each link on the search results page, and quickly turns green. Then if you mouseover the green check, it says "Safe: This page contains no active threats" and shows how many seconds it took to scan that link, a date and time stamp, and the IP address of the URL for that link.

I did testing on the free AVG 8 two days ago, on the new build 138 that came out on July 4. This version was initially available only on the grisoft.com site for the first couple of days, but now it is also the version offered by CNET's download.com. This build has not changed the LinkScanner interface in terms of the gray and green checks, and the little mouseover window for a green check. What changed is that it no longer goes to the website at all. All it does is a DNS lookup for each link.

This DNS lookup uses the DNS service attached to your local ISP through your browser's configuration. There is no "phoning home" to AVG during any portion of this process. I checked this carefully with Wireshark. It went through every link on the results page and got an IP address from your local provider. There's not even any caching. If you keep clicking to refresh the page, your DNS provider keeps getting hammered.
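For anyone who wants to picture the "no caching" point, here is a rough Python sketch of doing one forward lookup per result link while remembering answers already seen. It is not AVG's code; the function names and the injectable resolver are my own assumptions, and a real cache would also need to honour DNS TTLs, which this one ignores.

```python
import socket
from functools import lru_cache
from urllib.parse import urlsplit


def make_cached_resolver(resolve=socket.gethostbyname, maxsize=1024):
    """Wrap a hostname resolver so repeated lookups are served from memory.

    `resolve` defaults to the system resolver; it is injectable so the
    sketch can be exercised without touching the network.
    """
    @lru_cache(maxsize=maxsize)
    def cached(hostname):
        return resolve(hostname)
    return cached


def resolve_links(urls, resolver):
    """Map each result URL to the IP address its hostname resolves to."""
    results = {}
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # not an absolute URL, nothing to look up
        try:
            results[url] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[url] = None  # lookup failed
    return results
```

With a wrapper like this, refreshing the same results page would hit the name server once per hostname instead of once per link per refresh.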

Of course, it's much faster now that the page fetch isn't done. It only takes a few packets (maybe even just one?) to get an IP address. But it's deceptive, because the only thing that AVG can do now with an IP address on a link is to throw it away. If you aren't using it to go to the site, it's useless. I've seen at least one blog where the blogger did tests and praised the way AVG "fixed" the LinkScanner so that it's much, much faster. This blogger wasn't aware of what was going on.

I'm sure some will suggest that there's a database of IP addresses that gets regularly updated by AVG and sent to the users, and this is why the DNS lookup is legitimate -- it's checking against this internal AVG "bad site" list on your hard disk. I contend that this is impossible. You have to use the actual URL for any site lookups. If you used only an IP address, you would end up identifying a lot of innocent sites as malware, due to the fact that many sites use name-based hosting instead of a dedicated IP address. Name-based hosting means that many diverse websites are all using the same IP address. The URL is already in the link shown by Google, so there is no conceivable reason why a DNS lookup, done with your local DNS provider, can add to any information that AVG needs to do any sort of legitimate check. The only reason it ever did a local DNS lookup in previous versions of LinkScanner was because it actually went to that site and fetched the page. Now it doesn't do that, thankfully, so why is it still pretending that the gray-to-green check process is legitimate and useful?
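The name-based hosting argument above can be illustrated with a toy example. The hostnames and addresses below are made up (documentation ranges and .example names), and the two functions are hypothetical; the only point is to show how an IP-only blacklist over-blocks on shared hosting while a name-based one does not.

```python
# Hypothetical data: three unrelated sites on one shared-hosting IP,
# plus one site with a dedicated address.
HOST_TO_IP = {
    "goodshop.example": "192.0.2.10",
    "familyblog.example": "192.0.2.10",
    "malware.example": "192.0.2.10",
    "dedicated.example": "192.0.2.20",
}


def flagged_by_ip(bad_hosts, host_to_ip):
    """Hosts flagged if the blacklist stores only the IPs of bad hosts."""
    bad_ips = {host_to_ip[h] for h in bad_hosts}
    return sorted(h for h, ip in host_to_ip.items() if ip in bad_ips)


def flagged_by_name(bad_hosts, host_to_ip):
    """Hosts flagged if the blacklist stores hostnames (or URLs)."""
    return sorted(h for h in host_to_ip if h in bad_hosts)
```

Calling `flagged_by_ip({"malware.example"}, HOST_TO_IP)` also returns the two innocent neighbours on 192.0.2.10, whereas `flagged_by_name` returns only the bad host.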

I suspect that AVG has no intention of changing this interface in the free version, because it's kind of alluring in a "blinking light" sort of way, and it makes mom and pop feel good about using the web. The free version drives a lot of sales of the paid version.

However, I'm interested in the paid version interface. I believe that if you have a paid version, you have better legal standing to expect that there is no deception in AVG's interface. That's why I'm curious about how AVG handles this particular interface in the paid version.

Samizdata




msg:3693128
 12:05 pm on Jul 8, 2008 (gmt 0)

I'm not about to purchase AVG just to test this interface

Like many other software companies AVG offer free 30-day trials for download.

The updated versions should be available today or tomorrow.

But none of them will be "final" - that is the nature of software.

...

appi2




msg:3693151
 12:25 pm on Jul 8, 2008 (gmt 0)

The paid version updated this morning; from an end-user point of view nothing looks any different.
The date/time on the info tab always shows the current time, even for searches I have just done now, repeated from this morning.
Only weird thing I noticed was searching on yahoo, you get the grey ? and then they all disappear. Think that may be to do with yahoo/McAfee search scan.

rise2it




msg:3693154
 12:28 pm on Jul 8, 2008 (gmt 0)

"I'm glad they turned it off. But they still lost me as a customer, I've moved on to other products, and moved my company and my clients as well."

I'm in that same camp...I can forgive them for making a mistake, but not the arrogance and letting this go on this long.

Oh, and for any of you intentionally running certain software to log usage (say, for employees) - installing AVG 8.0 kills it - even if you go in and tweak the AVG settings to allow it (and keep it from removing some of the files during scans), it still disables the software. Nothing up to 7.5 had this effect.

Scarecrow




msg:3693317
 3:39 pm on Jul 8, 2008 (gmt 0)

Only weird thing I noticed was searching on yahoo, you get the grey ? and then they all disappear. Think that may be to do with yahoo/McAfee search scan.

The same thing happened to Yahoo searches on the free version, build 138. My immediate wild guess when I saw this is that Yahoo inadvertently threw them a curve ball by changing their coding, and it broke AVG's parsing that was extracting the URLs from the results page. Yahoo is difficult to parse - their page is bloated, there are one or two levels of redirects before you get to the URL you want, and they change the coding structure every few months. I scraped Yahoo for three years, and had to fix my scraper about eight times because Yahoo inadvertently broke it with their coding changes. I finally gave up on Yahoo - it was too much trouble.

I cannot imagine putting out a commercial product that depends on parsing search results pages a particular way. I thought that maybe AVG had some sort of understanding with Yahoo, Google, and Live so that their extraction of the URLs wouldn't be subject to breakdowns (something like a secret way to get the URL out that will be more dependable across various coding changes by the search engine).

Now I'm thinking that those who pushed LinkScanner into AVG 8 had very little idea of what they were getting into.

Are you getting the IP address in that green check mouseover window for all of the links, just like in the free version?

appi2




msg:3693345
 3:59 pm on Jul 8, 2008 (gmt 0)

Yeah, you get the IP, although it doesn't always match the IP of the site. That may be redirects or something?

Another little thing: after the site:www.example.com search I did earlier, I've just been looking through the log files. By no means do you get the DDoS that you would have got from the old LinkScanner, but I noticed ten or so requests like these:
HEAD /www.example.com.rar
HEAD /www.example.com.zip
HEAD /www.example.com.tar.gz
All from a GoDaddy IP. It could be coincidence, but this site is nearly dead and the log files are easy to view; the closest visitor was 15 minutes either side.
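If anyone wants to check their own logs for the same pattern, a minimal Python sketch follows. It assumes the request appears in each log line as a quoted `"METHOD /path HTTP/x.y"` field, as in the common Apache log formats; the function name and the suffix list are my own choices based on the three requests shown above.

```python
import re

# Matches the quoted request field, e.g. "HEAD /file.zip HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

# Archive suffixes seen in the requests quoted above.
ARCHIVE_SUFFIXES = (".rar", ".zip", ".tar.gz")


def archive_probes(log_lines, hostname):
    """Return the paths of HEAD requests for /<hostname><archive suffix>."""
    wanted = {"/" + hostname + suffix for suffix in ARCHIVE_SUFFIXES}
    hits = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group("method") == "HEAD" and match.group("path") in wanted:
            hits.append(match.group("path"))
    return hits
```

Feeding it the lines of an access log along with the site's hostname gives the list of matching probe paths; whether those requests have anything to do with AVG is a separate question this cannot answer.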

Samizdata




msg:3693416
 4:47 pm on Jul 8, 2008 (gmt 0)

Only weird thing I noticed was searching on yahoo

When AVG does its "first run" you get sent to an "Installation successful!" webpage which states:

"Users of Yahoo! search will notice that only dangerous, suspicious or unknown links will be marked. All unmarked links have been checked and found to be safe."

I doubt that anyone needs further comment from me on that one.

One point that wasn't mentioned about the previous LinkScanner incarnation is that, due to Yahoo's idiotic habit of stripping the trailing slash from directory links, the number of LinkScanner hits in the logs was often doubled because of the 301 it caused.

Fortunately those days appear to be gone, but continued vigilance would seem sensible.

I'm thinking that those who pushed LinkScanner into AVG 8 had very little idea of what they were getting into

That is my candidate for understatement of the year.

...

Scarecrow




msg:3694715
 7:24 pm on Jul 9, 2008 (gmt 0)

[news.cnet.com...]

"However, Thompson disputed a claim by AVG-watch.org that the updated AVG version now only "pretends to prefetch," and does little more than a DNS lookup of the site. Thompson said "it doesn't pretend to pre-scan. It just works off the local blacklist. That involves a DNS lookup, so that we can compare both ips and urls."

Can someone explain to me why they'd need both a DNS lookup and a URL to check the local blacklist?

I don't buy it. I think they need the DNS lookup for the "blinking lights effect."

incrediBILL




msg:3694782
 8:14 pm on Jul 9, 2008 (gmt 0)

Can someone explain to me why they'd need both a DNS lookup and a URL to check the local blacklist?

I think you misunderstand how they're using DNS as this isn't your normal DNS lookup. I think they're using something like Spamhaus does with a DNS server that responds whether something is GOOD or BAD when you ask just that specific DNS server.

Scarecrow




msg:3694806
 8:26 pm on Jul 9, 2008 (gmt 0)

No, what you're thinking about is when you have your server's email daemon, like sendmail, set to the "paranoid" anti-spam function. It does an in-addr.arpa reverse lookup to make sure the IP address on the incoming email also reverse-resolves correctly. This operation is twice as expensive as a forward lookup. The usual DNS server attached to your Internet service provider, which is what AVG is using, does not do reverse lookups.

Moreover, many folks with dedicated servers don't even run two-way email on them, and have no need to register with in-addr.arpa to catch spam. My sshd daemon also tries to check the reverse lookup when I log in, but the test always fails and it merely throws a message into a log file and lets me in anyway. Reverse lookups, to say the least, are not very reliable because many of them, even if they succeed, are out of date.

When I had AVG's latest version hooked up to Wireshark, I didn't see any calls to anything other than my local name servers attached to my local service provider. As far as I know, there's no "paranoid" option for requesting both a forward and reverse lookup with your usual name server.
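For reference, a reverse lookup is a PTR query for a name under in-addr.arpa, and the "paranoid" check amounts to a reverse lookup followed by a forward lookup of the result. Below is a rough Python sketch of that check; the function names are hypothetical, the resolvers are injectable so it can be tried offline, and nothing here is taken from sendmail, sshd, or AVG.

```python
import ipaddress
import socket


def reverse_name(ip):
    """Return the in-addr.arpa name that a reverse (PTR) lookup queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """True if the IP reverse-resolves to a name that resolves back to it.

    Both resolvers return (hostname, aliases, addresses) tuples, matching
    the standard library functions used as defaults.
    """
    try:
        hostname = reverse(ip)[0]
        addresses = forward(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    return ip in addresses
```

The two lookups per address are where the "twice as expensive" remark comes from.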

[edited by: Scarecrow at 8:38 pm (utc) on July 9, 2008]

Samizdata




msg:3694823
 8:46 pm on Jul 9, 2008 (gmt 0)

From the article:

Last Thursday, Web masters around the world noticed unusual spikes in traffic

That one had me in stitches. But then so have many of the other attempts at "journalism" that are now plastered all over the web and getting the story hopelessly wrong - I have even seen one saying that LinkScanner affected your site's ranking in the SERPs.

Thompson said "it doesn't pretend to pre-scan. It just works off the local blacklist.

I think "pretends to prefetch" is a reasonable assessment given that the user experience has not changed and that LinkScanner does not actually "scan" links as it did before, and I don't think changing the phraseology of the question really gives Mr Thompson any wriggle-room.

But such nonsense is to be expected from that source - as in "we still enable those webmasters who want to filter our requests out of their results to do so" (but we won't tell them how).

"The real issue is that, like it or not, we're at war on the Web," said Thompson.

The real issue is that he went to war against the wrong people and lost.

He should count himself lucky that the real story of how his software was a security risk to his customers was completely bungled by The Register, and that everyone else just copied their reports.

As Jim said elsewhere, our concern should be the effect (if any) that the castrated software has on our servers. AVG Technologies seem to have done the right thing, albeit belatedly, and Mr Thompson is no longer a problem for anyone except those who pay his salary.

...

incrediBILL




msg:3694839
 9:05 pm on Jul 9, 2008 (gmt 0)

No, what you're thinking about is when you have your server's email daemon, like sendmail, set to the "paranoid" anti-spam function.

No, what I'm thinking of is a DNSBL [en.wikipedia.org].

If you query 127.0.0.2 to see if it's good, you do an nslookup for 2.0.0.127.dnsbl.example.com. If it returns NXDOMAIN then the site is OK; any positive result (anything else) and it's bad.

If they're not doing this then the new incarnation is even lamer than I expected with horribly outdated information.

This can be used with email systems and other things.

Don't tell anyone (shhhhhh) because this is a unique application, but I actually use some of these DNSBL services to detect bad IPs and block visitors to my site if the results are positive.
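The DNSBL query described above can be sketched in a few lines of Python. This is only an illustration: dnsbl.example.com is a placeholder zone as in the post, the function names are my own, and the code treats any resolver failure as "not listed", which lumps NXDOMAIN together with timeouts and other errors.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets plus the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL zone answers for this IP.

    `resolve` is injectable for testing. Assumption: any lookup failure
    (NXDOMAIN included) is treated as "not listed".
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

For example, `dnsbl_query_name("127.0.0.2", "dnsbl.example.com")` gives `2.0.0.127.dnsbl.example.com`, the name used in the post. Blocking visitors would then be a matter of calling `is_listed` on the remote address, ideally with some caching in front of it.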

Seb7




msg:3695224
 8:05 am on Jul 10, 2008 (gmt 0)

[computerworld.com.au...]

"AVG caves to Webmasters" - well done webmasters!

AVG wrote in a statement. "AVG will issue a product modification to address the spikes that a few individuals have seen with their Web traffic."

A few individuals uh!

nick2007




msg:3698028
 3:27 pm on Jul 14, 2008 (gmt 0)

Finally!

Just noticed my catching script stopped working yesterday evening. I was worried at first, but it seems all good now.

Last 30 days resulted in 2 million bad hits!

They took their time!

blend27




msg:3698373
 10:51 pm on Jul 14, 2008 (gmt 0)

.........

and there it goes again, down-played version of importance of STATS.

I am so glad this episode is somewhat over...

incrediBILL




msg:3698536
 4:25 am on Jul 15, 2008 (gmt 0)

It's nowhere near over as this LinkScanner is still being sold online as a stand alone product and there's no mention whether or not it no longer hits every site.

tangor




msg:3698755
 11:31 am on Jul 15, 2008 (gmt 0)

I'm still getting AVG LinkScanner hits... not as many, but still several hundred a day. And whether it is a HEAD or a GET, I throw them all away (stats-wise) since I can't really tell if a human ever came by. My site is pretty narrow niche and every hit is important... and it will be a while before I can get good numbers out of my logs. Irritating....

incrediBILL




msg:3699177
 8:30 pm on Jul 15, 2008 (gmt 0)

it will be a while before I can get good numbers out of my logs

Basing analytics on raw server logs will probably not be very accurate in your lifetime.

If you use Google Analytics, which is entirely JavaScript-based, you'll get the best numbers of actual humans.

System
redhat



msg:3699438
 1:16 am on Jul 16, 2008 (gmt 0)

The following message was cut out to new thread by incredibill. New thread at: search_engine_spiders/3699436.htm [webmasterworld.com]
6:03 pm on July 15, 2008 (PST -8)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved