| 3:04 pm on May 24, 2008 (gmt 0)|
"serps are dynamic - creepy those are"
In any event, many won't know - that is the scary part.
| 3:26 pm on May 24, 2008 (gmt 0)|
I have noticed with my test so far that sending them a 403, or a 200 with no content, will cause these two to continue to retry the URL repeatedly, anywhere from 15 to 30 times.
I am testing sending a 403 with a short error message, and will later test sending a status 200 with the same error message to see if anything different comes up - and to see if the click-throughs from those IPs actually increase or decrease.
| 3:48 pm on May 24, 2008 (gmt 0)|
> sending them a 403 or 200 with no content will cause these two to continue to retry
| 3:51 pm on May 24, 2008 (gmt 0)|
One thing I should mention is that the phrase "This user-agent is used by" was carefully chosen, and does not mean that scrapers, comment spammers and other nasties don't use it as well (which in the case of SV1 they clearly do).
What it does mean is that the described behaviour can be replicated by downloading the software.
| 6:19 pm on May 24, 2008 (gmt 0)|
|I have noticed with my test so far that sending them a 403, or a 200 with no content, will cause these two to continue to retry the URL repeatedly, anywhere from 15 to 30 times |
What happens if an error 500 is returned?
| 6:36 pm on May 24, 2008 (gmt 0)|
|What happens if an error 500 is returned? |
I haven't tested it, but the point with all of these "tools" is that if they have no information (for whatever reason) then they will not flag your site as "clean" and users will naturally be discouraged from visiting it.
If you want to know the definitive answer, download one of them and try it.
| 7:06 pm on May 24, 2008 (gmt 0)|
|I haven't tested it, but the point with all of these "tools" is that if they have no information (for whatever reason) then they will not flag your site as "clean" and users will naturally be discouraged from visiting it. |
I do recognize that every webmaster has different priorities; some websites are always on "paranoid mode" and other websites have their gates wide open to every nutch and libwww-perl out there. That said, when Google Web Accelerator came out, it created (rightly or wrongly) a firestorm of controversy, and various websites, blogs, etc. posted cookie-cutter solutions to blocking GWA. Correct me if I'm wrong, but despite Google's efforts to market GWA, it is now used by a tiny minority of users - perhaps because so many webmasters fought back.
So if enough webmasters are fed up with the unwanted noise generated by all these different scanners, then maybe these tools will also go away in time. The Internet IS a dynamic market, new products are tested (and fail) all the time, and I don't see any reason why we must automatically concede to every superfluous scanner.
At least the developers at Grisoft et al. could take a moment to discuss these issues with the webmaster community. Ask for our feedback, create something like a robots.txt standard, etc. So far, it seems they have been pointedly ignoring this thread.
| 7:45 pm on May 24, 2008 (gmt 0)|
So far I have sent status codes of 403 and 200 with no content, which has caused them to repeatedly retry the page.
As for scrapers I have sorted most of those out long before I check for these two cases.
I am installing AVG 8 on one PC to test it directly, and will post my results later.
| 7:58 pm on May 24, 2008 (gmt 0)|
|So if enough webmasters are fed up with the unwanted noise generated by all these different scanners, then maybe these tools will also go away in time. The Internet IS a dynamic market, new products are tested (and fail) all the time, and I don't see any reason why we must automatically concede to every superfluous scanner. |
As an aside, the subsequent continuation of this thread in Forum 11 may replace the "Close to Perfect .htaccess" thread as the longest ever.
| 10:02 pm on May 24, 2008 (gmt 0)|
|maybe these tools will also go away in time |
I wish they would, but no matter how much opposition we put up I fear they are here to stay, and the best we can hope for is getting them to modify their behaviour - or getting someone else to do the job properly.
My own primary objection is not about bandwidth (though I appreciate that is also a serious issue) but about the hi-jacking of the SERPs by companies who have a vested interest in promoting fear, uncertainty and doubt. While Google has long flagged pages that are known to be dangerous, they are otherwise neutral - and they inspect a vast number of URLs daily.
The anti-virus companies take the opposite view - everything is suspect until they have proved it innocent - but those such as McAfee and Trend Micro who rely on a "rating server" seem oblivious to the fact that they would have to check every page on the web every day (at least) if their assessment is to be any use at all, while branding sites they haven't checked as "Suspicious" is as absurd as it is offensive.
Grisoft's approach, which at least checks the evidence before the verdict, seems more reliable on one level, but we know how easy it is to fool their LinkScanner, and the software is clearly deficient in other respects. Unfortunately they have introduced it as a free feature and most of their users will probably see it as a good thing, so the pressure will be on other AV vendors to do something similar to keep up.
Bandwidth, of course, is something that webmasters pay for, and statistics are something they rely on. The Grisoft approach wastes a colossal amount of bandwidth and skews statistics, and the McAfee/Trend approach will do the same if they ever get serious about crawling the web.
Then there is the issue of honesty - like many here I take a dim view of robots crawling my sites while masquerading as something else, and that is something all these services have in common. They may argue that they need to conceal their identity to do their job, but if I can identify them then so can every teenage scammer on the planet.
It seems to me that the only people in a position to accurately evaluate webpages are the search engines. Yahoo already have a tie-in with McAfee and how they exchange information is unclear, but if flagging google.com as a drive-by site is any indication then they are not doing it very well.
A move from Google in this area may well be imminent, if only because the practice of second-guessing their results will surely have a negative effect on their image if it becomes widespread - I seriously doubt that they want to become the "web police", but they may have no option.
| 11:51 pm on May 24, 2008 (gmt 0)|
|I am installing AVG 8 on one pc to test it out directly. And will post my results later. |
After doing a little testing with AVG: it will happily accept a 403 status code as long as it has HTML content sent with it, and show the nice green check mark next to the URL. It will only go bonkers when it gets no content, no matter the status code returned, from my simple tests.
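The finding above (a 403 is accepted so long as it carries an HTML body, while an empty response provokes the 15-30 retries) suggests one way to deny the scanner without triggering the retry storm. A minimal .htaccess sketch, assuming Apache 2.x with mod_rewrite; note that the user-agent pattern below is purely a placeholder - a bare SV1 string also matches genuine IE6 visitors, so substitute whatever detection criteria your own logs support:

```apache
# PLACEHOLDER detection rule - a bare SV1 match also catches real
# IE6 users; replace the condition with your own tested criteria.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1; SV1\)$"
RewriteRule .* - [F]

# Serve a short HTML page as the body of the 403 so the client
# receives content; an empty 403 is what appeared to cause retries.
# /errors/403.html is an assumed path - create the file yourself.
ErrorDocument 403 /errors/403.html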
| 12:19 am on May 25, 2008 (gmt 0)|
|It will only go bonkers when it gets no content, no matter the status code returned, from my simple tests |
Any clue what kind of mark it provides when the response is a redirect back to their own website ;)
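For what it's worth, the tongue-in-cheek idea above would look something like this in .htaccess - a sketch only, not a recommendation, and both the IP range and the destination are placeholders (check your own logs for the scanner's actual address before doing anything like this):

```apache
# Bounce requests from the scanner's IP range back where they came
# from. 192.0.2.x and example.com are documentation placeholders.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule .* http://www.example.com/ [R=302,L]
```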
| 2:47 am on May 25, 2008 (gmt 0)|
|Any clue what kind of mark it provides when the resulting request is a redirect back to their own website... |
Oh, you are cruel.
I like it though! Kind of a DDOS attack by karma...
| 7:39 am on Jun 3, 2008 (gmt 0)|
Oh dear, I seem to be getting into the habit of jumping to conclusions and having to apologise afterwards. A couple of days ago I wrote:
"I installed the latest AVG free trial on one of my computers, and searched for some terms that my site ranks highly for, without clicking on any of the search results. Each time, the pages that appeared in the serps came up in my logs, and each time the UA was Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1). Therefore, yes, at least some of these spurious log entries we are getting are down to AVG."
I've now removed the bloatware from the computer in question and re-installed Norton. Just to check, I googled some of my search terms and clicked on my site; in the subsequent log entry the UA was just the same:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
The indication then is that AVG 8 is not leaving any signature.
The slightly mitigating fact is that when the site is pre-fetched it is the page only, without the accompanying graphics, CSS etc., so the bandwidth hit is a fraction of what a human visitor would cause. I am going to get round the problem by giving every page a small, unique CSS file with the same name as the page (i.e. a page called blue-widgets will refer to blue-widgets.css), and I will stop my stats programme from showing hits on .html pages. Not an ideal situation, and a pain in the proverbial to set up, but at least by checking the number of hits on the .css files I will be able to see instantly how many human visitors I am getting, since they will be the only ones tripping them.
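The same idea - counting only requests that fetch CSS, on the assumption that the pre-fetcher never does - can also be approximated entirely in Apache config, without a per-page CSS file, by logging .css requests to their own file. A sketch, assuming Apache with mod_setenvif and mod_log_config loaded; the log path is an assumption:

```apache
# Flag any request for a stylesheet, then log only those requests
# to a separate file. Assumption: only full browsers (i.e. humans)
# fetch the CSS, so this log approximates human page views.
SetEnvIf Request_URI "\.css$" human_hit
CustomLog /var/log/apache2/humans.log combined env=human_hit
```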
| 9:16 am on Jun 4, 2008 (gmt 0)|
I have posted this on another thread which seemed relevant, please forgive the duplication.
I have spoken to a person called Adam at AVG Technologies in the UK, who tells me he feels the company's product is the lesser of two evils: the disruption to millions of webmasters' stats is justified by the extra safety the product gives to surfers. I have pointed out to him that, with earlier versions at least, it is possible to spoof the pre-fetch search, but he commented that the product was still making the web a safer place to visit.
I have brought this thread to his notice so I look forward to hearing his comments here!
[edited by: incrediBILL at 10:07 am (utc) on June 4, 2008]
[edit reason] call to action removed - see tos #26 [/edit]
| 10:09 am on Jun 4, 2008 (gmt 0)|
|he feels that the company's product is the lesser of two evils since he feels that the disruption to millions of webmasters' stats is justified by the extra safety the product gives to surfers |
He's wrong because:
a) They created a DDOS attack on popular sites with lots of bookmarks and high rankings.
b) It's less secure because everyone and his brother now knows how to spoof it, where's the safety now?
| 11:43 am on Jun 4, 2008 (gmt 0)|
I don't know what position this person holds at AVG but I assume it is in public relations.
He should know that Grisoft bought a useless product and made it substantially worse.
His customers may feel safer, but they are being deluded - every script kiddie on the planet can fool this fabulous new "security tool" and get their payload pages marked as safe by AVG.
Meanwhile ordinary webmasters are seeing their statistics rendered useless and their bandwidth charges rocketing as this useless pre-fetcher rampages through their sites.
Grisoft may know all about Windows but they appear to know nothing about the web.
| 9:52 pm on Jun 7, 2008 (gmt 0)|
Addendum: this user-agent is used by Finjan Secure Browsing (from Finjan Ltd in Israel):
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 1.1.4322)
This pre-fetcher works the same as AVG and came from the 82.166.163.xx range.
I would not recommend blocking or cloaking this one by user-agent.
| 12:54 pm on Jun 8, 2008 (gmt 0)|
Apologies; upon further testing it appears that Finjan Secure Browsing cunningly uses a wide range of user-agents when pre-fetching your pages, and is obviously a highly sophisticated security tool.
Too bad they always use the same IP address...
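Since the user-agent varies but the IP apparently doesn't, blocking by address is the more robust approach here. A sketch in Apache 2.2-era .htaccess syntax, using the range reported above (verify against your own logs before deploying - ranges change):

```apache
# Block by IP instead of user-agent, since the UA is unreliable.
# 82.166.163 is the range reported in this thread, circa June 2008.
Order Allow,Deny
Allow from all
Deny from 82.166.163
```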
| 2:49 pm on Jun 10, 2008 (gmt 0)|
I have had to query my AdWords account because I'm regularly being charged for more clicks than show in my logs. Today there are four unexplained clicks (it's a niche product, so click numbers aren't high); and one of the visitors via AdWords had also 'pre-fetched' my site four times via the AVG link checker, for a key phrase for which my site ranks 27th in the SERPs. It is possible that the visitor had preferences set to show more than that number of results, but I have asked G to tell me just what these missing clicks are. I hope I get more help this time than the usual library answer...