Zombie Traffic from Google - Analysis part 2
frankleeceo
msg:4528546 · 12:18 am on Dec 16, 2012 (gmt 0)

< continued from [webmasterworld.com...] >

Based on your observations, does the zombie traffic hit the same pages over and over, or does it rotate within the site?

Google can and will toss different kinds of foreign traffic at sites for testing purposes. And I think that, based on user engagement and social (or other) signals, sites develop a stronger affiliation with certain countries, regions, or even language settings over time.

My informational niche site has seen regional traffic shifts in recent months, where I lose traffic in one country but gain it in another. Regional rebalancing may be at play, although it could simply be the search trends of the underlying countries. In a nutshell, though, all the traffic I lost in Asia I gained back, in almost the exact amount, from the Americas.

[edited by: tedster at 3:47 pm (utc) on Dec 17, 2012]

 

xcoder
msg:4528574 · 1:32 am on Dec 16, 2012 (gmt 0)

^ So I think we can safely draw one definite conclusion: Google is actively managing the amount of free organic traffic sent to websites. This issue keeps coming up in many recent posts, and I think most would agree it is a given.

From what I see (and I manage approximately 15 websites, for myself and for some clients), it is as if a site now has a finite number of "free" hits it can get per day. A quota.

Google may switch the free traffic streams around (foreign/mobile/buy-intent/informational, etc.), BUT the maximum total daily numbers remain constant (+/-).

This alone demands a serious explanation from Google and should be added immediately to any ongoing antitrust case... it is a clear case of non-disclosure and abuse of market position...

"testing" = laughable excuse

TheMadScientist
msg:4528579 · 2:05 am on Dec 16, 2012 (gmt 0)

Google is actively managing the amount of free organic traffic sent to websites. This issue keeps coming up in many recent posts, and I think most would agree it is a given.

It's too bad if you're one of the people who think that, because you'll completely miss what it really is ... You all have fun with your conspiracy theories, because it's something else entirely, and it would be so easy for them to refute a traffic-cap accusation that it's not even funny.

/ThreadForMe

frankleeceo
msg:4528596 · 2:32 am on Dec 16, 2012 (gmt 0)

What do you think it really is, then?

I do think Google is actively managing the traffic, but not to any individual site as a conspiracy; it is not in their best interest to do so. It is more about managing the internet as a whole. It's not really a cap per se; rather, internet traffic is finite. One site has to lose traffic for another to gain. A more worthy site gains visibility and "steals" traffic from another site, or loses traffic to one.

Isn't that the active role of an index? To ensure that visitors are happy with the sites in its index, it has to monitor visitor flow before and after searches. With that, it can make active adjustments based on groups of sites.

xcoder
msg:4528598 · 2:35 am on Dec 16, 2012 (gmt 0)

It's too bad if you're one of the people who think that, because you'll completely miss what it really is ...


It is very hard to miss what it really is... the proof of the cap is right there in the daily logs of too many websites.


I do think Google is actively managing the traffic, but not to any individual site as a conspiracy; it is not in their best interest to do so. It is more about managing the internet as a whole. It's not really a cap per se; rather, internet traffic is finite. One site has to lose traffic for another to gain. A more worthy site gains visibility and "steals" traffic from another site, or loses traffic to one.


Just look at your Webmaster Tools control panel and all the information it presents. I don't know about you, but to me it looks like a natural AdWords control-panel extension. What was that again about them not managing the traffic to any *individual* site?

backdraft7
msg:4528600 · 2:48 am on Dec 16, 2012 (gmt 0)

My thought is this. Has Google outsourced quality testers?


God save us if they did, but I wouldn't doubt it; remember, it's all about the money.
It's bad enough that Google is such a poor judge of quality sites; now we get to have our US sites judged by a completely foreign society and its associated mores.

BTW - Has anybody noticed the web itself acting strangely today? Usually I get a downpour of spam every day, but hardly a drop today. Traffic is very NON-interactive.

taberstruths
msg:4528602 · 2:59 am on Dec 16, 2012 (gmt 0)

@frankleeceo From what I have observed, they hit a group of pages for a couple of days, then it switches. That is why I thought it might be quality raters from desperate third-world countries who work for pennies.

diberry
msg:4528636 · 6:48 am on Dec 16, 2012 (gmt 0)

I do think Google is actively managing the traffic, but not to any individual site as a conspiracy; it is not in their best interest to do so. It is more about managing the internet as a whole. It's not really a cap per se; rather, internet traffic is finite. One site has to lose traffic for another to gain. A more worthy site gains visibility and "steals" traffic from another site, or loses traffic to one.


This is something I've been feeling intuitively but failing to put into words. For the first time in human experience, there is actually more information available than our brains could hope to handle. Google is the #1 gatekeeper of that info for the world. They've got to be keenly aware of this massive power/responsibility. Years ago, they could index probably the whole darn web, but now? What do you think Panda was about? Clamping down on repetition of information, which is overwhelming them. I think it's literally just not going to be possible in a few years to present the internet to people the way search has always done. It just won't be sufficient. A lot of changes they're making and things they're testing make sense in this light - they're trying to present the BEST sites, not the whole internet. They're shifting in that direction.

So yeah, they're managing traffic, but not necessarily by choice at all. It's just the only way to deal with this huge avalanche of information that's growing every minute.

MikeNoLastName
msg:4528650 · 10:12 am on Dec 16, 2012 (gmt 0)

> For the first time in human experience, there is actually more information available than our brains could hope to handle.<

Ooh, I agree. Are we actually hitting the "technological singularity" (wiki it) sooner than expected? I can't wait; I've been waiting for this for a long time. Maybe we'll hit it on 12/21/12 ;).

To me the big question is... what would happen if we just deleted the pages that get lots of impressions and clicks but make relatively no money, in the hope that more impressions/clicks would go to the pages that get fewer but make more money?

claaarky
msg:4528676 · 1:45 pm on Dec 16, 2012 (gmt 0)

A few people have touched on things that strike a chord with me.

Several sections of my ecommerce site don't convert well; however, they rank really well and in fact generate the majority of my traffic. I've noticed that when traffic goes up, it's mostly increasing on these pages, so the conversion rate for the whole site appears worse, but it's just because more traffic is going to pages that don't convert well.

Then there's mobile. My site converts much worse for smartphones, and at weekends in particular (and during some periods in the week, such as lunchtimes and evenings) my traffic is mostly smartphone users. It's the same traffic, but people are out and about instead of at their desktops.

If traffic rises at a weekend it's a double whammy for my conversion rate: more traffic going to areas that don't convert well, AND more of that traffic coming from smartphones, which convert worse.

These low-converting, high-traffic pages have amazing click-through rates from Google, much better than the rest of the site. So I think that when more people are searching, traffic goes up more on these pages, because they are much better at enticing visitors in than my better-converting pages are.

This combination of factors is what I believe creates the zombie effect, for my site at least.

I also think this is something sites demoted by Panda (like mine) are likely to notice more (possibly exclusively). I think the two issues are related.

Awarn
msg:4528687 · 2:32 pm on Dec 16, 2012 (gmt 0)

^
OK, somewhat similar to my thoughts. Now consider this: a high percentage of sites have NOT addressed the mobile side, and they appear not to have been hit by Panda or Penguin, at least in my niche. A user on a site that has not been updated for mobile will spend a lot of time on the page trying to find the menu, zooming in to see anything, etc. That equals interest in Google's eyes, and interest equals relevance, so look what happens: the crappy sites rise because they did nothing, since the mobile technology Google is using is flawed. Meanwhile, the sites with a mobile version keep getting hit by Google's bots trying to perfect the system, while the crappy sites are ignored.

iamlost
msg:4528738 · 7:53 pm on Dec 16, 2012 (gmt 0)

I've been following this topic not so much because I believe there is site-specific targeting by Google, but rather because I'm looking to see whether the described effects are symptomatic of something else, a la the 'sandbox effect'. I am seeing interesting points mentioned:
* switching from mostly desktop traffic to mostly mobile and back again.
* switching from mostly desired target traffic, e.g. US, to mostly undesired, e.g. Asian, and back again.
* switching from highly purchasing traffic to highly informational traffic and back again.
* switching from 'money' query terms traffic to 'non-money' and back again.
* switching from traffic mostly landing on product sales pages to landing on info pages and back again.
I don't have an ecomm site so I am primarily relying on parsing what others have shared.

There does appear to be overlap in the above list, yet enough distinct elements for the items to remain separate. And the pattern/frequency of such switches is intriguing.

If a site is strictly a sales outlet, then traffic in any pre-buying mode is certainly a miss on Google's part. However, if the site also provides pre-buying informational pages, then lumping the two types of traffic on the two types of pages into one conversion metric is a miss on the webdev's part. Similarly, if there are no micro-conversions on the info pages, i.e. the only conversion is the 'big' sale on the sales pages, then that too is a miss on the webdev's part. I'd be interested to know whether, if your site is simply come, buy, and go, there is any reason for Google to believe that traffic in research mode would find value there.

Similarly, if a site is quite specific in its area of business, i.e. selling only within the continental US, and Google sends significant traffic from well outside that market area, then again that is certainly a miss on Google's part. I would be interested to know whether there is any signal (other than off-shored Google quality testers :)) that Google might pick up that would indicate otherwise.

When the traffic switches... how do the associated query terms change? What connections or disconnections do you see between them? Are both sets of terms appropriate for the pages involved?

Etc. Far more questions than answers I'm afraid.

One thing I have been wondering about for a number of years may well be part of this puzzle:
* we know that when one inputs a query into Google, if there are few or even no full matches, the results returned are often for some 'similar' (in spelling or meaning) term rather than the one asked, or for various partials of the query. Google appears to prefer padding results to having to say "oops, can't help".

* does Google treat traffic in a similar fashion? Is what has been described partly due to Google simply running out of quality 'A' matching traffic and substituting quality 'B'?

* we also know that Google has its 'main' site list and some number of supplemental site lists. Is it possible that when Google gets a query that doesn't match well with its main index list, it still returns results from that list rather than dip into some 'lower' list?

Must say that it has been a fascinating read to date, and I want to thank everyone, especially those who have shared data.

tedster
msg:4528761 · 9:52 pm on Dec 16, 2012 (gmt 0)

  • switching from mostly desktop traffic to mostly mobile and back again.
  • switching from mostly desired target traffic, e.g. US, to mostly undesired, e.g. Asian, and back again.
  • switching from highly purchasing traffic to highly informational traffic and back again.
  • switching from 'money' query terms traffic to 'non-money' and back again.
  • switching from traffic mostly landing on product sales pages to landing on info pages and back again.


A great summary of the phenomenon, iamlost. I'd say that's exactly what we're looking for, especially if the cycles are frequent and the total traffic seems to stay constant even though there are shifts in "type of traffic".

I do think that server log analysis might be a better tool than out-of-the-box analytics to do this type of work. JavaScript could work too, but you'd probably need to customize what data the script actually collects.

xcoder
msg:4528764 · 10:10 pm on Dec 16, 2012 (gmt 0)

By the way, back to my message #4526727 in this thread about catching an MSN bot masquerading as (and executing JavaScript like) a normal browser: I've seen it appear again on a number of occasions.

Note how the word "bot" is not mentioned in the user-agent string. This bot is probably affecting many websites' logs. Look for it. Your tracking software will most likely report it as a normal visitor; reverse DNS is the only way to detect it... so much for "old-fashioned" server logs. You only notice these things in real-time visitor analysis using customized scripts.

65.52.108.222 [msnbot-65-52-108-222.search.msn.com]--[Microsoft Internet Explorer]--[Screen Size: 800x600]--[Color Depth: 16 colors]
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607)
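
A minimal PHP sketch of the kind of forward-confirmed reverse-DNS check being described here; the function name and the domain list are illustrative assumptions, not any poster's actual script:
-------------------------------------------------------
<?php
// Reverse-DNS bot check: PTR lookup on the connecting IP, then a
// forward (A) lookup to confirm the hostname isn't forged.
function isVerifiedSearchBot(string $ip): bool
{
    $host = gethostbyaddr($ip);        // reverse DNS
    if ($host === false || $host === $ip) {
        return false;                  // no usable PTR record
    }

    // Only trust hostnames under known crawler domains
    // (illustrative list; extend it for your own logs).
    foreach (['.search.msn.com', '.googlebot.com', '.crawl.yahoo.net'] as $domain) {
        if (substr($host, -strlen($domain)) === $domain) {
            // Forward-confirm: the hostname must resolve back to the same IP.
            return gethostbyname($host) === $ip;
        }
    }
    return false;
}

// The IP from the log line above:
var_dump(isVerifiedSearchBot('65.52.108.222'));
-------------------------------------------------------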

xcoder
msg:4528778 · 11:04 pm on Dec 16, 2012 (gmt 0)

@tedster
JavaScript could work too

Are you serious?... StatCounter and GA are JavaScript-based and collect most of their information via JS code. Not only "could it work too", it is by far the best way of collecting user-agent information (including installed plug-ins etc.) and interacting with the visitor.

I am really surprised to hear such statements from people "managing 100's of websites".

TheMadScientist
msg:4528781 · 11:33 pm on Dec 16, 2012 (gmt 0)

65.52.108.222 [msnbot-65-52-108-222.search.msn.com]--[Microsoft Internet Explorer]--[Screen Size: 800x600]--[Color Depth: 16 colors]
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607)

Hack the PHP (or whatever server-side script you're running) to capture the X-Forwarded-For sent when the JS submits the other info... I'd guess you'll find it's not the MSN bot at all, but rather the cache or preview fetcher, and that you'll get the real IP of the visitor in the X-Forwarded-For.
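
A minimal sketch of what that server-side capture could look like, assuming a PHP endpoint that the JS beacon posts to; the log file name and format are hypothetical:
-------------------------------------------------------
<?php
// Hypothetical beacon endpoint: record both the connecting IP and
// any X-Forwarded-For header, so a downstream fetcher (cache or
// preview generator) can't hide the real visitor's IP.
$remote = $_SERVER['REMOTE_ADDR'];
$xff    = $_SERVER['HTTP_X_FORWARDED_FOR'] ?? '';

// X-Forwarded-For may be a comma-separated chain; the left-most
// entry is the original client.
$client = ($xff !== '') ? trim(explode(',', $xff)[0]) : $remote;

// Log both so a remote/XFF mismatch stands out later.
file_put_contents('beacon.log', sprintf("%s remote=%s client=%s ua=%s\n",
    date('c'), $remote, $client, $_SERVER['HTTP_USER_AGENT'] ?? '-'), FILE_APPEND);
-------------------------------------------------------
Logging the two values side by side is what makes the preview-fetch pattern visible later: the connecting IP belongs to the search engine's fetcher, while the X-Forwarded-For belongs to the person who triggered the fetch.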

xcoder
msg:4528789 · 11:49 pm on Dec 16, 2012 (gmt 0)

but rather the cache or preview

I know a "cache preview" when I see one. 65.52.108.222 was browsing a number of pages (I get the page URLs and page titles sent from the user's browser in real time). Also check out the screen size: must still be using an old CRT monitor, the poor guy (with 16 colors). NOT.

It's an MSN bot masquerading as Joe Public.

tedster
msg:4528798 · 12:43 am on Dec 17, 2012 (gmt 0)

@xcoder, I almost always work with both a JavaScript analytics tool AND direct server log analysis. For working with raw logs, I use a form of grep plus MACH5's FastStats, which is kind of like grep on steroids. A great text editor that can work with gigabyte-sized files is also essential. Different kinds of tools often show wide disagreement, but if a user-agent request hits the server logs, then you know you got that traffic for sure, whatever it is.

With complex questions, especially time-based questions, I often prefer to dig into the raw logs, because I often come up with questions I can't answer using JavaScript tools. Maybe I'm old-fashioned because I've been using raw logs since 1995 or so. And maybe it's because the sites I work with use such a variety of analytics packages that it's hard to master all of them. I wouldn't know how to dig into these questions very well using Adobe SiteCatalyst, for example.

No two analytics tools give the same results, from what I've seen when my clients make a transition. Whatever tools people use, I wanted to focus readers on those key questions, and then mention that the kind of analytics tools you use can be a factor in the answers you find, too.
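
For the raw-log side, a grep-style pass needs only a few lines. A rough PHP sketch, with the log path assumed and the target IP taken from earlier in the thread:
-------------------------------------------------------
<?php
// Grep-style filter over a raw access log: print every request
// line containing the IP in question.
$needle = '65.52.108.222';
$log = fopen('/var/log/apache2/access.log', 'rb'); // path is an assumption
if ($log === false) {
    exit("cannot open log\n");
}
while (($line = fgets($log)) !== false) {
    if (strpos($line, $needle) !== false) {
        echo $line; // the raw request, whatever the user agent claims
    }
}
fclose($log);
-------------------------------------------------------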

xcoder
msg:4528824 · 2:10 am on Dec 17, 2012 (gmt 0)

@tedster

Today's bots act like browsers (Chrome included) and often send spoofed headers. In my opinion this renders raw log files useless (except for providing quantified daily "hits").

Raw logs are a thing of the past if accurate, up-to-date information is what you are after...

Check your raw logs for 65.52.108.222... with hundreds of websites you must have seen this thing more than once, but I bet you probably missed it.

[edited by: xcoder at 3:12 am (utc) on Dec 17, 2012]

TheMadScientist
msg:4528830 · 2:22 am on Dec 17, 2012 (gmt 0)

[webmasterworld.com...]

I thought it was a bot too, until I got the X-Forwarded-For and overrode the storage of the downstream IP address from the jQuery. That's why you won't find it in your server logs either: the server logs the X-Forwarded-For IP rather than the downstream IP making the request on behalf of the [non]visitor.

xcoder
msg:4528843 · 2:58 am on Dec 17, 2012 (gmt 0)

@TheMadScientist

I don't use jQuery to get the IPs. I get the IP address directly from the browser using a PHP script, which then does the reverse DNS on the spot.

However... I will follow your advice and add an X-Forwarded-For check to my script, just for the fun of it all, and hope to catch this critter in action again soon. Will report as soon as I do.

Side note:
I strongly doubt there are still many viewers out there using 800x600 monitors set to 16 colors, but we must be scientific and double-check to get to the bottom of this.

[edited by: xcoder at 3:17 am (utc) on Dec 17, 2012]

xcoder
msg:4528844 · 3:03 am on Dec 17, 2012 (gmt 0)

That's why you won't find it in your server logs either

Exactly my point about old-fashioned server log files: they no longer provide the whole picture. As far as I know, none of them check the X-Forwarded-For (but I might be wrong). What you would normally see in the logs (for my example) is a normal browser, not necessarily *marked* as a bot. One more zombie, if you like...

TheMadScientist
msg:4528845 · 3:14 am on Dec 17, 2012 (gmt 0)

However... I will follow your advice and add an X-Forwarded-For check to my script, just for the fun of it all, and hope to catch this critter in action again soon. Will report as soon as I do.

Cool, thanks... I want to know what's going on too, because without knowing, there's no way of finding a solution; you can't even identify the issue.

Side note:
I strongly doubt there are still many viewers out there using 800x600 monitors set to 16 colors, but we must be scientific and double-check to get to the bottom of this.

I know exactly what you mean about the monitor, and if I need to (because people are actually trying to solve this), I'll 'dig through history' and find exactly what I saw. I don't have 'a couple' of instances, I have a bunch in a database (I think), and they're detailed (like everything you can capture from JS), because the script is actually a browser sniff ;)

I almost want to say the monitor data you're seeing is a default used when that info isn't found for some reason, but I don't remember for sure off the top of my head, so I'd have to dig through 'ancient history' to find the info I have (if I still have it, lol... I'm pretty sure I do, somewhere, but not 100% on that).

tedster
msg:4528854 · 4:00 am on Dec 17, 2012 (gmt 0)

Hoping to clarify things a bit. Is the idea here that:

1. some bots are first doing an automated Google search,
2. then requesting the links that Google returns,
3. which means they include query terms in their referrer,
4. and they are doing it for hours at a time?

If that's the case, how might the total Google Search traffic end up being almost constant? Even if they store the search result links just once (so Google doesn't block them), it seems to me that on-and-off bot traffic would also make for up-and-down totals.

TheMadScientist
msg:4528862 · 4:29 am on Dec 17, 2012 (gmt 0)

That could be one explanation, but I'm not convinced they're bots. They could instead be traffic that gets labeled as bots without detailed investigation. For the previews, the page would most likely be fetched 'on load': when a user types in a query and visits the results page, the page on the site 'visited by the zombie' is requested downstream by Google (or Bing), but the info sent/recorded via X-Forwarded-For is from the visitor's browser. So in the server logs (and even in JS data without IP address info present) it can look like a 'zombie visitor', while with live JS it looks like a bot from Google (or Bing). Without a really detailed comparison of real-time JS data against server logging, it is very confusing and difficult to identify.

The way I 'caught' what I thought was a bot from Bing/M$ was by tracking page views via JS (jQuery) and actual page access via PHP simultaneously. In one instance (the PHP page-open recording) I recorded the X-Forwarded-For; in the other (jQuery) I didn't force an X-Forwarded-For override on the server-side storage script. So it wasn't until I really dug into both that I realized there was a difference between the X-Forwarded-For and the jQuery IP address. Then I realized the requests weren't made by a bot at all, but rather by the preview generation that was triggered whenever someone searched on the term(s), with Bing's previews on, and visited a results page containing a link to the site...

It wasn't simple to detect or identify what was going on, by any stretch, and it may actually be beyond what most people would look for, or are capable of self-coding and identifying via script. You have to not only know what to look for, but 'store and compare' information via multiple methods to even see it, and the comparison I'm talking about is not available in any 'off the shelf' scripts I've seen.
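
To make the 'store and compare' idea concrete, here is a rough PHP sketch of the cross-check; the file name and the timestamp|url|remote_ip|xff_ip record layout are invented for illustration:
-------------------------------------------------------
<?php
// Walk the page-open log and flag rows where the connecting IP and
// the X-Forwarded-For disagree. Those requests were fetched
// downstream (e.g. preview generation) on behalf of the IP in the
// XFF header; they are not true "zombie" visitors.
$rows = file('pageopen.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($rows === false) {
    exit("cannot open pageopen.log\n");
}
foreach ($rows as $row) {
    $parts = explode('|', $row);
    if (count($parts) !== 4) {
        continue; // skip malformed rows
    }
    [$ts, $url, $remote, $xff] = $parts;
    if ($xff !== '' && $xff !== $remote) {
        echo "$ts $url fetched by $remote on behalf of $xff\n";
    }
}
-------------------------------------------------------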

tedster
msg:4528878 · 5:43 am on Dec 17, 2012 (gmt 0)

Thanks much for that. I would not have thought of it except for your posts, and it sounds like a very solid thing to look for.

TheMadScientist
msg:4528882 · 6:03 am on Dec 17, 2012 (gmt 0)

Not a problem, tedster... I understand exactly why you went where you did and how you drew your conclusions, because I was there at one time and it's Very Easy to miss (I did, completely, for a while), unless you have multiple custom datasets and review them carefully to find out why one dataset indicates one event yet another indicates something totally different.

It took me a while to find the time and make the effort to really dig into what was actually going on, but when I did, I drew a totally different conclusion than I initially expected. It was a real eye-opener when I figured out what it was.

bluntforce
msg:4528886 · 6:36 am on Dec 17, 2012 (gmt 0)

Today's bots act like browsers (Chrome included) and often send spoofed headers. In my opinion this renders raw log files useless (except for providing quantified daily "hits").

I'd guess that many people who use log files assume that various things may be spoofed. That's why a person might look at user interaction and how that interaction transpires.

Your "hits" comment I find interesting. Do you really want to stand behind the opinion that people looking at server logs are just counting hits?

All of it is really moot at this point: there is no serious analysis to be done, because the general public has other concerns and is following seasonal patterns. By the second or third week in January other patterns could be discussed; doing it now is just watching your words on a screen. Analysis doesn't make sense right now unless your business is holiday-specific.

xcoder
msg:4528890 · 7:50 am on Dec 17, 2012 (gmt 0)

    Your "hits" comment I find interesting. Do you really want to stand behind the opinion that people looking at server logs are just counting hits?


    I think I've made it crystal clear throughout this thread. Raw log files are useless due to spoofed headers. Individual user behavior can be easily missed while going over mountains of quantified daily/weekly/monthly data.

    Real time tracking (in addition to server logs) is the only way to go nowadays... good luck pin pointing the zombie problem using your server logs alone...

    It is only via real time tracking that i discovered that i often get bursts of smartphones users throughout certain times of the day or the fact that some bots masquerade as browsers.

xcoder
msg:4528893 · 8:18 am on Dec 17, 2012 (gmt 0)

Here's the Yahoo bot fully executing JavaScript and coughing up referrer, screen size, color depth, current URLs and page titles (omitted for obvious reasons).
-------------------------------------------------------

98.137.207.209 [h081.hlfs.bf1.yahoo.com]----[Netscape]----[Screen Size: 1024x768]----[Color Depth: 16 colors]
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]
