Forum Moderators: Robert Charlton & goodroi

Mountain View stats oddity - Any idea why this happens?

     
1:55 pm on Oct 7, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


Until 2 months ago, I had a site, widgetsuk.net, which used to do OK. But since the turn of the year it gradually slipped out of Google's rankings and started receiving a consistent number of (multiple) visits from Google (Mountain View) every day.

The odd thing was that these were visiting decorated URLs that never existed, e.g. www.widgetsuk.com/catalogue.php?shop=other&product=shoes. That is simply an example: every URL was different and the parameters were often illogical. I never sold shoes and my site doesn't use decorated URLs: it did about 2 or 3 years ago for a while - but nothing like the ones that Google is using!

I constantly checked WMT for clues: inbound links, 404 and Soft 404 errors, error messages etc. I only ever saw 404s and I tidied these up. I did backlink checks and never found any links to my site remotely resembling what Google came looking for, nor indeed any scrapers or dodgy links at all for that matter. I never saw any negative messages from Google in WMT either.

I put canonical tags in every page with the simple URL (i.e. www.widgetsuk.net/catalogue.php) so the bots would know decorated URLs were not relevant. 3 months later... no change. Daily visits had if anything increased, and rankings had got worse to the point where Google was now my only visitor: 14 times a day (very consistently - it was always 13, 14 or 15 visits, every day).

So I gave in: I developed a new site from the ground up, moved it to widgetsuk.com and redirected the 6 year old .net site to the homepage of new site. I removed the old .net site totally. That was two months ago.

The new site is now in Google (as is the old one still) although its rankings remain very low. But still, they come: every day I still get the 14 visits from Google Mountain View (all to the .com index page now obviously). WMT 404s are clean & tidy but Mountain View remains pretty much my only customer.

Does anyone have any clue as to what might be wrong or, failing that, how I can get them to buy something ;)

Many thanks,

Simsi.
3:15 pm on Oct 7, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 12, 2004
posts:660
votes: 14


I think they're checking if your website generates pages/content based on URL.
7:01 pm on Oct 7, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


Are these visits coming from exactly the same IPs that Googlebot uses? If not, you might try blocking them in .htaccess.

It's possible that some hacker was playing around with a program to probe your site for vulnerabilities, but went on to other things and forgot to turn it off.
7:33 pm on Oct 7, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


Hmm that's an interesting thought... I do get those idiots stopping by from time to time. A search for one of the regular IPs results in:

" Location: Mountain View, United States - 64.233.172.34 is a static assigned Corporate IP address allocated to Google."

Is that conclusive?
7:50 pm on Oct 7, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


Simsi --
There have been cases where some incompetent or novice hacker messed up and created something that went awry.

There are some real experts here at WebmasterWorld who know a lot more about this type of thing than I do. You can find them at [webmasterworld.com]. So you might want to ask there.
7:51 pm on Oct 7, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


Thanks aristotle...will have a look.
4:33 pm on Oct 9, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


OK, I've been told this is a genuine Googlebot IP, so I am back to square one. Does anyone have any other ideas?
8:03 pm on Oct 10, 2014 (gmt 0)

Full Member

10+ Year Member

joined:Feb 1, 2006
posts:271
votes: 2


Monsieur, any news on this?
8:50 pm on Oct 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


Simsi
Do the requests have the normal googlebot user agent string?

Why don't you show us the full log entry for one of the visits. Maybe someone here can spot something.
9:24 pm on Oct 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


Here's an example of what I see Aristotle:


Search Referral: www.google.co.uk/#4 (Keywords Unavailable)
Host Name: google-proxy-66-102-6-117.google.com
Browser: IE 8.0
IP Address: 66.102.6.117
Operating System: Win7
Location: Mountain View, California, United States
Resolution: 1024x768
Returning Visits: 0
Javascript: Enabled
Visit Length: Multiple visits spread over more than one day
ISP:Google


To add, if this helps in any way:

When you search for the site itself ("widgetsuk.com") it appears with sitelinks as the first result. When you search for the exact text of the <TITLE> tag, it is nowhere. To reiterate, WMT seems as clean as a whistle (no messages, no 404s, no obvious dirty backlinks).
9:29 pm on Oct 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


I just had another idea. Maybe somebody created a sitemap for another website, but made a typo that changed everything to your domain. That would explain where googlebot is finding the URLs.
9:42 pm on Oct 10, 2014 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


What aristotle says could possibly be one reason.

Another could be an issue similar to what is being discussed in this thread: WMT reporting 404s from another site [webmasterworld.com]
9:47 pm on Oct 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


Search Referral: www.google.co.uk/#4 (Keywords Unavailable)
Host Name: google-proxy-66-102-6-117.google.com
Browser: IE 8.0
IP Address: 66.102.6.117
Operating System: Win7
Location: Mountain View, California, United States
Resolution: 1024x768
Returning Visits: 0
Javascript: Enabled
Visit Length: Multiple visits spread over more than one day
ISP:Google

I must have misunderstood the problem. I thought we were talking about googlebot requests.
10:11 pm on Oct 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:July 19, 2013
posts:1097
votes: 0


...how I can get them to buy something ;)

Put up a splash screen with an ad for free Googleade with any purchase :)
10:39 pm on Oct 10, 2014 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


OK boys and girls, it's time to learn about spiders 101.

google-proxy-66-102-6-117.google.com


Many people allow anything that comes from a GOOGLE IP or whose reverse DNS contains google.com.

That's why someone is using this Google proxy, which by the way is open for ANYONE TO USE, not just Google, to scrape the crap out of your site.

They'll also scrape via Google Translate, the cache pages, which you can't see them scrape (use NOARCHIVE on ALL pages, see [noarchive.net]...), and many other services that provide indirect access to your site.

This is probably some Google app someone made, or a web proxy. Either way, I'd block it if it were my site unless it claims to be Googlebot and properly validates via full round-trip rDNS.

FYI - People use the Google proxy as shown above to join WebmasterWorld and attempt to hide their true origin when they spam our forums which is mostly why I'm familiar with those IPs.

Browser: IE 8.0


If that's the actual user agent string it's bogus as all hell but I suspect you're copying that from some stats program, not raw log data which you really need to figure out what's going on.
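For anyone who wants to try the full-trip rDNS check described above, here is a minimal sketch in Python. It resolves the IP to a hostname, checks the hostname, then resolves the hostname back and confirms it returns the same IP. The function name, the injectable lookup parameters and the decision to accept only hostnames under googlebot.com are assumptions made for this illustration; the suffix list in particular is stricter than it may need to be and should be adjusted to whatever you decide to trust.

```python
import socket

# Hostname suffixes accepted as the crawler (an assumption for this sketch).
CRAWLER_SUFFIXES = (".googlebot.com",)


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Round-trip rDNS: IP -> hostname -> IPs, which must include the original IP."""
    # Default to real DNS; the lookups are injectable so the logic can be tested offline.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record at all

    # A google-proxy-...google.com hostname fails here: Google network, not the crawler.
    if not host.endswith(CRAWLER_SUFFIXES):
        return False

    try:
        # Anyone can fake a PTR record, so the forward lookup must lead back to the IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling is_verified_googlebot() with just an IP from the raw logs performs live DNS lookups. A hostname like the google-proxy one quoted above is rejected at the suffix check, which is the point: a Google-owned IP is not the same thing as Googlebot.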
9:29 am on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


Ok this is interesting. Thanks Aristotle and incrediBILL.

So I looked at my raw logs. A couple of oddities. Firstly, the IP being reported in my stats program isn't present but one very similar is, coming in 2 minutes later and from the same referrer (in the case of this example, google.ie)

I can't go back far enough for the example I used above so here is one from today:


Here is the Raw Log:

66.249.83.42 - - [11/Oct/2014:01:23:41 +0000] "GET / HTTP/1.1" 200 32069 "http://www.google.ie/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CCMQFjAF&url=http%3A%2F%2Fwww.widgetsuk.net%2Fsomedir%2Fsomescript.php%3Fa%3Dmonp%2520gme&ei=n4Y4VISoAY6GsgeGew&usg=AFQjCNHQbyx_EB0vLOynrAyxcQW1C5ultQ" "Mozilla/5.0 (Linux; Android 4.3; en-us; SAMSUNG SCH-I545 Build/JSS15J) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36"


Does that shed any light?

I have been trying to find an IP that my stats program is reporting in the raw logs but none of them appear in both: there are IPs from the same block 66.249.***.*** in both but that's as far as it goes.

I have also checked Google for scraped content but none of the phrases from my site that I search for appear on other websites that I can see except for *a small number* of description META tags which crop up on maybe 5 or 6 other sites...does that matter?
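In case it helps anyone reading along, the raw log entry above can be taken apart with a short script rather than by eye. This is only a sketch: it assumes the server writes the common Apache "combined" log format (which the entry above appears to follow), and the names LOG_PATTERN, parse_log_line and clicked_url are made up for the example. It also pulls the "url" parameter out of the Google referrer, which shows which search result was clicked.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined log format: IP, identity, user, [time], "request", status, bytes,
# "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The referrer's query string carries the clicked result in its "url" parameter.
    query = parse_qs(urlsplit(entry["referrer"]).query)
    entry["clicked_url"] = query.get("url", [None])[0]
    return entry
```

Run against the entry above, this gives a status of 200 for a request to "/", and a clicked_url on the old .net domain (the somedir/somescript.php address), i.e. the visitor arrived via a search result for an old URL and was served the homepage.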
10:58 am on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


I don't know enough to say much about that log entry, but let me ask you:

1. From your original post, I thought we were talking about pages that don't exist. But that log entry shows that a page was served.

2. Are there any images on that page, and if so, were they fetched?

3. Did it ask for a favicon?

4. Are there any scripts to be executed?
2:14 pm on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


In answer Aristotle:

1) The page served in this instance was a page that existed several moons ago. These bots are also requesting old deleted pages with parameters that don't make any sense.

2) Images: Yes (and css + js)

3) Favicon: Yes

4) Not quite sure what you mean but if you mean is there any JS executed onload then because the page URL is very old and long since deleted I no longer have a copy to check.
2:45 pm on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


Simsi --
I don't understand what's going on here and am stumbling in the dark. But it certainly looks like that page is still on the server and was successfully fetched. The only clue I see to look for it is that its size is 32069 bytes.

The fetching of images, favicons etc is usually an indication of a real human visitor, but there are also some suspicious things like the use of a proxy.
5:51 pm on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


Hi aristotle

I think the 200 response might be because I took the old dead/weird URLs and redirected them to the new homepage (via htaccess) and put in a canonical.

I think maybe I should 410 the old pages instead - traffic has died so it can't be any worse than it is already. Would you agree?
6:33 pm on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


You're probably right about the redirect in this case. But what about all those URLs that you mentioned in the first post which never existed at all? And what about all these requests that are coming through Google Mountain View proxies? I think you should try to get a better understanding of what's happening before you make any changes. As I mentioned before, there are some real experts here at WebmasterWorld who might be able to help you figure things out and suggest solutions.
7:16 pm on Oct 11, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4504
votes: 347


Redirecting old/gone pages to the homepage is not a good idea. It is what Google calls a 'Soft 404', and it is worse for your site than real 404s (which cause no damage at all). To benefit from your real 404s, create a custom 404 page with navigation links to parts of your site that you think might be useful. That way, instead of losing a visitor and making Google cry, you might help someone find your newer content.
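To make the difference concrete, here is a rough sketch of the behaviour described above: live pages return 200, deliberately removed pages return 410, and anything else gets a real 404 status with a helpful page body instead of a redirect to the homepage. It is written as a tiny Python WSGI app purely for illustration; the site in this thread uses PHP and htaccess, and the paths, page contents and names (LIVE_PAGES, RETIRED_PATHS, app) are all hypothetical.

```python
from http import HTTPStatus

# Hypothetical paths and content, for illustration only.
LIVE_PAGES = {"/": "<h1>Home</h1>"}
RETIRED_PATHS = {"/somedir/somescript.php"}

NOT_FOUND_PAGE = (
    "<h1>Page not found</h1>"
    '<p>Try the <a href="/">home page</a> or the '
    '<a href="/catalogue.php">catalogue</a>.</p>'
)


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in LIVE_PAGES:
        status, body = HTTPStatus.OK, LIVE_PAGES[path]
    elif path in RETIRED_PATHS:
        # Deliberately removed: say so with 410 rather than redirecting home.
        status, body = HTTPStatus.GONE, "<h1>This page has been removed</h1>"
    else:
        # Unknown URL: a real 404 status, with navigation links in the body.
        status, body = HTTPStatus.NOT_FOUND, NOT_FOUND_PAGE
    payload = body.encode("utf-8")
    start_response(
        f"{status.value} {status.phrase}",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ],
    )
    return [payload]
```

The part that matters is the status line, not the framework: the visitor still sees useful links, but the crawler is told plainly that the URL is missing or gone.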
7:35 pm on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3660
votes: 373


not2easy -- Thanks for that explanation. I hadn't realized that Google might penalize a site for doing that. What about webmasters who don't know anything about that possibility (such as Simsi) and did a lot of redirects to the home page during a site re-design? It seems unfair to me that they would be penalized.
7:57 pm on Oct 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2005
posts: 716
votes: 0


But what about all those URLs that you mentioned in the first post which never existed at all. And what about all these requests that are coming through Google Mountain View proxies?


Yes - both still sources of confusion and sorry - I have sidetracked this a little with the rankings issue which may or may not be connected.


@not2easy - thanks for the input there. More confusion lol...never quite been sure what the best course of action was. Those are actually being redirected from the now defunct .net site as per my opening post. The whole site is redirected in fact.

That said, I analysed the Soft 404 errors via WMT for the .com - there were only 3 logged so I 410'ed the URLs in htaccess. Only 3, despite lots of these odd Mountain View referrals.