Forum Moderators: open
There's only data from Awstats and Google Analytics to go by.
Awstats is showing lots of 404 errors generated by something asking for /google-analytics.com/ga.js' or for /foldername/google-analytics.com/ga.js' on the site.
It is very obvious that this is a bot that is reading through the GA Javascript and incorrectly extracting a URL from it:
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"))
The closing single quote pretty much nails it. The site uses a base tag on every page.
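To illustrate the theory (a hypothetical sketch of what such a bot might do, not the actual crawler's code): a parser that scans the raw script text for a src attribute, without ever executing the string concatenation, would pull out the bare host/path fragment, and the base tag then resolves it relative to the folder URL:

```javascript
// Hypothetical sketch: how a naive crawler could mis-parse the GA snippet.
// It scans the raw JavaScript text for a src attribute without executing it,
// so the concatenation around gaJsHost never happens and the trailing
// fragment is treated as a relative URL.
const snippet =
  `document.write(unescape("%3Cscript src='" + gaJsHost + ` +
  `"google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"))`;

// Naive extraction: take the quoted string that ends at the stray single
// quote. A real bot's pattern is unknown; this regex is purely illustrative.
const match = snippet.match(/"(google-analytics\.com\/ga\.js)'/);
const extracted = match ? match[1] : null;

// Resolved against a base tag such as <base href="http://example.com/folder/">,
// the fragment becomes a request for /folder/google-analytics.com/ga.js,
// which is exactly the 404 showing up in Awstats.
const resolved = new URL(extracted, "http://example.com/folder/").pathname;
console.log(resolved); // "/folder/google-analytics.com/ga.js"
```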
Bots on the site in that 24 hour period include:
Gigabot/3.0_(http://www.gigablast.com/spider.html) as well as bots from Yahoo, Inktomi, and Live/MSN.
However, I note the newer ^User user-agent now also being used, which I had not been handling specially until this morning.
Short answer: I don't know.
Longer answer: if this is a common bot, then many other people should be seeing the same thing in their log files: anyone who has the GA JavaScript code on their site, whose section URLs are folders ending in a trailing "/", and whose pages include a base tag in the header defining a base URL that also ends in "/".
So, who else is seeing this?
g1smd,
You ever see a dog chasing its own tail?
I have that same eerie feeling as these issues develop ;)
I've been seeing the following for some days now:
"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
however, I've long had a denial in effect for this "begins with" pattern and I'm not about to make any changes, regardless of who uses the term (the same applies for "crawler" or "spider" in other UAs).
Don
Is anyone else seeing lots of 404 errors generated by something asking for /google-analytics.com/ga.js' or for /foldername/google-analytics.com/ga.js' on their site?
If so, what is it that is requesting those URLs?
You ever see a dog chasing its own tail?
Nice analogy.
Purely a guess, but it looks like AVG is trying to fix one of the many LinkScanner problems and succeeding only in breaking something else - pretty much par for the course, I'd say.
who else is seeing this?
I can only say that I am seeing far fewer hits from LinkScanner lately - apparently because AVG users dislike it even more than we do and are uninstalling it as fast as they can find out how.
...
Is anyone else seeing lots of 404 errors generated by something asking for /google-analytics.com/ga.js' or for /foldername/google-analytics.com/ga.js' on their site?
I've been making mention of this in various topics here and there. It just goes in one ear and out the other. :)
If so, what is it that is requesting those URLs?
I've had my lead programmer on the horn multiple times investigating these failed GA calls. I think they only occur with the new GA code, but I can't verify that, as I use only the new code now.
There are two JS calls for the GA scripts. The second one fails at times and causes a 404. When that happens, the GA script URL gets appended to the URI of the page where the call failed. If I remember correctly, before the new GA code this would cause a page-load delay, so with the new GA code came some sort of timeout feature that causes the second script to fail if it takes too long to load. Does that make sense?
During our research, we came across a Chinese translation service that had their "translated" pages indexed. One of our sites was strategically framed, and they were doing some real funky stuff with URI appendage behind the scenes. Lo and behold, our GA script is sitting there too. I have to wonder what that does to analytics. I don't dig that deep, yet!
But, back to the topic at hand. Here are just a few of the UAs on these failed GA calls...
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MSDigitalLocker; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; InfoPath.1)
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Opera/9.50 (Windows NT 5.1; U; en)
There is no consistency in the UAs. Is that the AVG Scanner doing all of that?
OT, we are getting ready to do what a few others around here are doing. For our US-based clients who ship only within the US, most everything outside is going to get blocked from accessing the site. We're freakin' sick of it. We're tired of finding our long hours of work regurgitated on an MFA website somewhere. That's it! If you are outside the US in one of our blocked countries, send us an email and we'll put you on the whitelist!
I was assuming that it was a bot. I have only seen these errors start in the last 48 hours, and they are on a site that has had full content for several months and a "coming soon" page for many months before that.
The site has both Awstats and Google Analytics, and no access to the raw log files. The error is showing only in Awstats. The error is caused by the agent parsing the Google Analytics Javascript code (parsing it, not running it) on the page, and incorrectly extracting a duff URL out of it.
OK on the variety of user-agents. Nothing concrete there to go on.
I have catered for it. The requests get redirected to somewhere else, with a message in an appended query string that spells out what has happened.
As for copied pages skewing the data, I use domain filtering in Google Analytics to show direct accesses to the site in one profile, and accesses via all other sites in another profile. That alternative profile therefore shows people viewing the site in the Google, Yahoo, Ask, or Live cache, or via any other site that has copied the content.
What turned people on to this being AVG LinkScanner?
I merely suggested you might be having the same problem as the post I linked to above:
I have been having a problem with what I call search bots not finding a file on my website
The post was about repeated 404s for a JavaScript file and it identified a LinkScanner UA.
As you have no access to the logs it's hard to be certain, but pageoneresults seems to have a better handle on your problem, and as you have catered for it all is presumably well again.
...
Interesting, but I am wondering why this has (so far) only happened for index pages ("/" or "/folder/") on this site. There is one such error for each folder and sub-folder on the site, including the root.
However, if you use the normal site navigation you will never hit a redirect when clicking anything internal to the site.
There are only a very few rewrites, and none of those are involved with any of the URLs mentioned here in this thread.
Is this a fail-safe feature to prevent page-load delay? If it is, what is causing the timeouts? If I look at the small percentage of these, I'd chalk it up to a "given" with third-party calls. It's not always going to connect, but it does most of the time. On one high-traffic site, those 404s represent a small percentage.
Is it also possible that these particular visitors have some sort of security setting (AVG) that is causing this? It's confusing for me because I don't see anything remotely related to AVG with our challenges. At first I thought we had a rewrite issue, but we don't. It's that second script appending itself to the URI of the page where the script failed...
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
Is there something wrong with that GA script that would cause these types of 404 appendages? If part of it fails, why does google-analytics.com/ga.js get appended to the page URL? I'm wondering if that is some sort of indicator that the script failed and Google is letting us know? I'm just guessing now. I'd like to solve it. That is one more statistic I'd like to have my hands on too!
Again, there is nothing wrong with that code as far as I can see. If you are still talking about the ga.js 404 error, it's either a bot that parses it incorrectly and appends the result to your current path, or just the browser executing the second script before the first has finished loading.
Not the ideal solution, but I don't get 404s anymore.
Welll I'lll beee! < Remember Gomer Pyle?
I wonder why Google went to all the trouble of splitting that URI into sections like it does with the new GA script. Any JS gurus around who can explain the pros and cons of what Timmay has done? I'd like to know, because I'd like to eliminate those 404s, and if that one-liner does it, I'm off to make some changes this afternoon.
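For what it's worth, one pattern that avoids the split URL entirely is to assemble the ga.js address as a single string and inject the script element directly. This is only a sketch of the kind of one-liner replacement being discussed (an assumption on my part, not the actual code that was posted):

```javascript
// Sketch (assumption: this approximates the kind of replacement being
// discussed; it is not the actual code posted in the thread). The URL is
// built as one complete string, so there is no quoted fragment left for a
// parser to mis-extract as a relative path.
function gaScriptUrl(protocol) {
  return (protocol === "https:" ? "https://ssl." : "http://www.")
       + "google-analytics.com/ga.js";
}

var gaUrl = gaScriptUrl("http:"); // "http://www.google-analytics.com/ga.js"

// In the page itself, the include would then be injected without
// document.write (browser-only, shown for context):
//   var ga = document.createElement("script");
//   ga.type = "text/javascript";
//   ga.src = gaScriptUrl(document.location.protocol);
//   document.getElementsByTagName("head")[0].appendChild(ga);
```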