Forum Moderators: DixonJones


Short URLs prefetched and expanded.

Does this prefetch lead to fake pageview inflation?

         

g1smd

11:26 pm on Apr 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do URL AutoExpander services (expanding previously shortened URLs) prefetch the page, and hence register as a pageview in the site logs/analytics?

I hope not.

Remember the AVG LinkScanner debacle? I would guess that there is a similar problem here if you have a lot of followers on Twitter and they all use this prefetch system to expand short URLs.

This isn't restricted to Twitter users. It applies to any site that has a redirect pointing at it via a URL shortening service, whenever the short URLs are posted to multiple other sites or to some sort of feed/stream that is read by a lot of people.

Your site stats could show that you got a lot more visitors than you actually received.

A service that only shows you what the expanded URL is, without visiting it, is likely to be safe.
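A minimal sketch of why a "show the destination only" expander need not touch your site, assuming it asks the shortener with a HEAD request and reads the `Location` header instead of following the redirect. The function name `peek_redirect` is hypothetical; some shorteners may not answer HEAD properly, and the shortener itself still logs the request.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def peek_redirect(short_url: str, timeout: float = 5.0) -> Optional[str]:
    """Ask the shortener where a short URL points, without following it.

    Only the shortener's host is contacted; the destination site never
    receives a request, so nothing shows up in its logs or analytics.
    """
    parts = urlsplit(short_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for headers only; http.client never follows redirects itself.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if resp.status in REDIRECT_CODES:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()
```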

Recently, however, I have been seeing services that also tell you the title of the page. In that case I would guess that they have to fetch the page to find out.
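To show the title, an expander has to download the destination HTML. A minimal sketch, assuming the tool uses a plain GET (the names `TitleParser`, `extract_title` and `fetch_title` are hypothetical): `urllib.request.urlopen` follows the shortener's redirect automatically, so the target server logs what looks like an ordinary pageview.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and newlines inside the title.
    return " ".join(parser.title.split())

def fetch_title(short_url: str, timeout: float = 5.0) -> str:
    # urlopen follows the redirect and GETs the whole destination page:
    # this request is the "visit" that shows up in the site's logs.
    with urlopen(short_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_title(html)
```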

I'd like to think I was wrong.

Anyone done any testing with different services?

youfoundjake

12:38 am on Apr 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a site that I can test on. Any particular shorteners you want tested?

g1smd

1:15 am on Apr 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's not so much the shortening service itself as the user's script/application that 'auto-expands' short URLs so they can see where they would go if they clicked the short link.

There's a bucket load of those out there: many are browser extensions, Greasemonkey scripts, or JavaScript bookmarklets, and others are stand-alone AIR, Java, or other-platform apps. There's no distinctive UA in the request, so this isn't as easy to detect as the AVG LinkScanner mess was.

It has the power to be just as devastating as the AVG problem, though.

incrediBILL

3:26 am on Apr 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not only do the tiny URL services prefetch the page, but their analytics are nonsense: you get about 10-20 hits instantly from all these Twitter power tools, and the more people adopt the power tools, the more hits you get from non-visitors.

For instance, Power Twitter follows all the tiny links on your page and displays the actual page title, so if you have 100 Power Twitter users following you, you'll always get 100 hits from the get-go. You actually need to work out an average baseline for the automated tools and deduct it from your stats for that link.
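The baseline deduction can be sketched like this, assuming you have recorded the instant hit counts that earlier links received from the automated tools before anyone could plausibly have clicked. The function name and the idea of keeping a list of such samples are hypothetical.

```python
from statistics import mean
from typing import Sequence

def estimate_real_clicks(observed_hits: int, baseline_samples: Sequence[int]) -> int:
    """Subtract the average automated-fetch baseline from a link's hit count.

    baseline_samples: instant hit counts from earlier links (hypothetical
    measurement of expander/power-tool traffic).
    """
    if not baseline_samples:
        return observed_hits
    baseline = mean(baseline_samples)
    # Never report a negative number of real clicks.
    return max(0, round(observed_hits - baseline))
```

For example, if your last three links drew 98, 102 and 100 instant hits, the baseline is 100, so a link showing 130 hits suggests roughly 30 real clicks. The baseline grows as your follower count (and tool adoption) grows, so it needs re-measuring.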

It's a total mess, as the stats are useless for the most part.

g1smd

7:00 pm on Apr 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, how is this different to the mess caused by AVG LinkScanner?

And, why no widespread outrage with this? Surprised this thread has had so few replies...

Thanks to those that have replied so far though.

youfoundjake

7:53 pm on Apr 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Would a workaround be to set up a filter to prevent the artificial inflation? Sure, you lose track of real visitors vs. prefetches, but might it help keep your numbers right-sized?
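One possible filter, sketched under an assumption the thread doesn't make: a tool that fetches a page just to read its title usually downloads only the HTML, while a real browser also requests stylesheets and images. The function `likely_automated_ips` and the asset suffix list are hypothetical, and the input is assumed to be (IP, path) pairs already pulled from your access log.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Hypothetical list of "page asset" extensions a browser would normally fetch.
ASSET_SUFFIXES = (".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico")

def likely_automated_ips(hits: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return IPs that fetched pages but never fetched any page asset."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for ip, path in hits:
        # Ignore any query string when checking the extension.
        bare = path.split("?", 1)[0].lower()
        if bare.endswith(ASSET_SUFFIXES):
            assets[ip] += 1
        else:
            pages[ip] += 1
    return {ip for ip in pages if assets.get(ip, 0) == 0}
```

Treat the result as an estimate: returning visitors with cached assets will also look "HTML-only", so this errs toward flagging some real people as automated.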

incrediBILL

1:25 am on Apr 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, how is this different to the mess caused by AVG LinkScanner?

And, why no widespread outrage with this? Surprised this thread has had so few replies...

This differs because:

a) AVG was scanning every link in the search results, generating traffic on tons of sites in anticipation of a visit.

b) Most often the TinyURLs are used at the request of the site owner, or by someone trying to drive traffic to the page; it's not random whatsoever.

Sure, your stats are still off, but they're bogus anyway, even if you use Google Analytics, which these tools don't affect but which screenshot tools do fool.

Truly clean analytics is a pipe dream on the current internet; you can get close, but no cigar.

g1smd

8:45 am on Apr 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Re: b)

Yes, LinkScanner scanned all the results so you got a 'visit' even when the person didn't visit.

In the same way, if someone puts a short URL in a message sent on Twitter, all of the Twitter clients receiving that message would expand the short URL, thereby registering a 'visit' even though the person didn't click the link and actually visit.

Obviously there aren't many people using URL expanders at the moment, but the minute these are added as core functionality, this problem will become extreme.

Imagine someone like StephenFry sending such a link: within minutes you might register more than a quarter of a million visits. Could most servers cope with that load, and just how many of those were 'real' visits anyway?
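To put a rough number on "could most servers cope", here is the arithmetic as a small sketch. The 5-minute window and 50 KB page size are assumptions picked for illustration, not figures from the thread.

```python
from typing import Tuple

def burst_load(fetches: int, window_s: float, page_kb: float) -> Tuple[float, float]:
    """Return (requests per second, MiB per second) for a burst of fetches."""
    rps = fetches / window_s
    mib_per_s = rps * page_kb / 1024
    return rps, mib_per_s

# Assumed: 250,000 expander fetches spread over 5 minutes, 50 KB of HTML each.
rps, mib = burst_load(250_000, 5 * 60, 50)
print(f"{rps:.0f} requests/s, {mib:.1f} MiB/s")
```

Under those assumptions that works out to about 833 dynamic page requests per second and roughly 40.7 MiB/s of outbound HTML, sustained for five minutes, which is well beyond what a typical shared or single-box host is provisioned for.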

cgrantski

11:16 am on Apr 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see that Power Twitter was mentioned as a specific instance. Any others? We have to have specifics in order to do any testing.

Another problem is Google's prefetch for the top search result's listing. Anyone analyzing Apache logs will be affected; it doesn't look like IIS logs or page tags are. To see which hits in your logs are prefetched, you have to modify Apache's logging; otherwise they are indistinguishable from human hits.
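Firefox marks link-prefetch requests with an `X-moz: prefetch` request header, and Apache's mod_log_config can record any request header with `%{Header}i`. So one way to modify the logging is to append `"%{X-moz}i"` to the combined format. The sketch below assumes exactly that layout (the format nickname `combined_xmoz` and the function name `split_prefetch` are hypothetical) and counts prefetch versus other hits. Its regex does not handle escaped quotes inside the User-Agent field.

```python
import re
from typing import Iterable, Tuple

# Assumed Apache config (hypothetical nickname):
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{X-moz}i\"" combined_xmoz
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "[^"]*" "(?P<xmoz>[^"]*)"$'
)

def split_prefetch(lines: Iterable[str]) -> Tuple[int, int]:
    """Count (prefetch_hits, other_hits); unparseable lines are skipped."""
    prefetch = other = 0
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        # Apache writes "-" when the header is absent.
        if m.group("xmoz").lower() == "prefetch":
            prefetch += 1
        else:
            other += 1
    return prefetch, other
```

Dividing the first count by the total gives the prefetch share of all hits, the kind of figure behind an estimate like "about 3.5%".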

We analyze stats for some sites that consistently appear in the top slot AND have PPC ads running next to them. Anybody clicking on one of the PPC ads will show in the logs as having clicked on both listings. Anybody clicking on the prefetched natural listing will show as two hits to the same destination URL. Anybody not clicking at all will show as a hit and a visit.

Early indications are that about 3.5% of all our hits come from this Google prefetch. Not sure yet what the effects are on visits.

Anyway, as I said, it looks like Apache logs are affected and nothing else. I wonder if Google knew this when they started the prefetch feature.