An example line:
66.249.65.236 - - [28/Sep/2004:15:48:24 +0100] "GET /tps_page.html HTTP/1.1" 404 335 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
The IP does appear to belong to Google, but whereas Googlebot normally spreads its load across several bot machines, these requests are all coming from one IP address.
Also they are requesting docs which do not and have never existed on my site. They do appear to be docs which exist on sites I link to. It's as though the bot has followed links but has not changed the server part of the link.
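For anyone who wants to check their own logs for the same pattern, here is a minimal sketch. It assumes the log is in the Apache combined format shown in the example line; the function names are made up for illustration, and it only counts 404s per IP for requests whose user agent claims to be Googlebot.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "method path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def googlebot_404s_by_ip(lines):
    """Count 404 responses per client IP for requests claiming to be Googlebot."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and "Googlebot" in entry["agent"] and entry["status"] == "404":
            counts[entry["ip"]] += 1
    return counts

sample = (
    '66.249.65.236 - - [28/Sep/2004:15:48:24 +0100] '
    '"GET /tps_page.html HTTP/1.1" 404 335 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
print(googlebot_404s_by_ip([sample]))
```

Feeding it every line of an access log instead of the single sample would show whether the 404s really are concentrated on one address. Note that the user agent string alone does not prove the request came from Google.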
or
It could be something which could take control over the Big 2's network.
I will keep watching it....
If the webservers cannot handle it, I suggest blocking the IP ranges we detect here.
Once there is a mass block, there will be a solution.
17 94 ms 94 ms 92 ms reserved.above.net [209.249.73.70]
18 93 ms 92 ms 92 ms 66.249.65.236
Trace complete.
Almost all pages, across all of our websites, are indexed by Google on a daily basis.
Up until the 14th/15th of this month, this had been happening from the IP range 64.68.*
We are still being indexed, but not from the above IP range!
15 so-3-0-0.mpr2.iad2.us.above.net (64.125.28.214) 89.643 ms * 88.532 ms
16 reserved.above.net (209.249.73.70) 94.271 ms 94.594 ms 93.38 ms
17 66.249.65.236 (66.249.65.236) 107.996 ms 90.158 ms 90.918 ms
However, whois clearly shows Google. Is this a spoofed whois entry?
GOOGLEBOT (MOZILLA)
66.249.66.0
GOOGLEBOT (MOZILLA)
66.249.65.0
MSNBOT
207.46.98.0
After blocking the above 3 ranges, Google still crawls with its regular user agent from the IP range below:
66.249.64.xx Googlebot/2.1 (+http://www.google.com/bot.html)
MSN crawls with
65.54.188.xx
msnbot/0.3 (+http://search.msn.com/msnbot.htm)
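To check which addresses such a block would actually catch, something like the following sketch may help. The posts list only the base addresses, so treating each one as a /24 block is an assumption, and `is_blocked` is a made-up helper name.

```python
import ipaddress

# Assumption: each listed base address stands for a /24 block;
# the thread does not state the prefix length.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("66.249.66.0/24"),  # Googlebot (Mozilla)
    ipaddress.ip_network("66.249.65.0/24"),  # Googlebot (Mozilla)
    ipaddress.ip_network("207.46.98.0/24"),  # MSNBot
]

def is_blocked(ip):
    """Return True if the address falls inside any of the blocked networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)

for ip in ("66.249.65.236", "66.249.64.10", "65.54.188.10"):
    print(ip, is_blocked(ip))
```

Under that /24 assumption, the regular Googlebot range 66.249.64.xx and the MSN range 65.54.188.xx fall outside the block, which matches what is reported above.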
[webmasterworld.com...]
Reading through some information on this robot, at least one person states that they have confirmed with Google that it is their agent.
It's just not the bot you want to see if you do aff marketing.... just a feeling and a few tests ;)
Speculation is (and I think there might be something to this) that Google may be stepping up the fight against cloaking.
So when I have these kind of links:
[mysite.com...]
gbot is trying to request from my site:
[mysite.com...]
is this a bug or is there something weird going on?
There are better ways to identify cloaking. Personally, I rule out that possibility.
Apart from that, what does it have to do with MSNBOT acting up at the same time?
One more person in this thread has confirmed that he also sees mass requests from MSNBOT.
aah it seems that googlebot is trying to find pages from my server where I am (out) linking. So when I have these kind of links:
[mysite.com...]
gbot is trying to request from my site:
[mysite.com...]
is this a bug or is there something weird going on?
That's precisely what's happening to me. Methinks there's a bug.
So when I have these kind of links:
[mysite.com...]
gbot is trying to request from my site:
[mysite.com...]
Google may be probing to find out how various redirects are being used. They may be trying to distinguish, for example, redirects intended to canonicalize a domain name (where the same page would exist on both domains) from redirects used to count clicks (as in your example) and from cloaking and hijacking redirects.
I manage a few search engine web sites and two months ago we were getting hit hard by the new MSNbot. Their bot was actually posting search terms to our search results pages, and a lot of the 'search terms' that it was using were just a bunch of gibberish. Some of it set off some SQL injection attack alarms, so I decided I had to do something about it. I contacted MS about the issue, and they said that their bot would only crawl URLs that they knew had been viewed by humans, which I had to question. Somewhere their data got corrupted, because the garbage they were searching for on my sites was barely text. I figured that in their zeal for adding as many pages to their SERPs as quickly as possible, they were probably taking info from their toolbar users and crawling those URLs.
For the past week or so I've noticed an approximate 20% increase in search traffic on my search engine sites, so I decided I needed to investigate. Sure enough, I'm getting searched by the new Googlebot. I don't recall ever having the Googlebot (or any other bot besides MSNbot) crawl my SERPs.
I also see in this thread that others are reporting that the new Googlebot is crawling pages that no longer exist, or perhaps never existed. Getting URLs to crawl from the Google toolbar could be an explanation for this, since users often type in incorrect URLs before they type in the correct page.
This is just a theory, but with my small sampling, it makes sense. Can anyone add anything that helps prove or disprove this theory?
1. Adsense stats not available for over 1 day
2. Adwords stats not showing up (on the same day Adsense had problems)
3. Gbot Running hard
4. PR not showing up on the toolbar
I have never seen the PR bar blank before - maybe it is just my connection.
Coincidence? I think not.
So when I have these kind of links:
[mysite.com...]
It's off topic, but you should be URL encoding the second URL like this, since it's a GET variable....
http://www.mysite.com/index.php?out=http%3A%2F%2Fwww.othersite.com%2Fpage1.html
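If the out-links are generated by a script, the encoding can be done in code rather than by hand. A minimal sketch using Python's standard library, with the `index.php?out=` link format taken from the example above (the variable names are just for illustration):

```python
from urllib.parse import quote, urlsplit, parse_qs

target = "http://www.othersite.com/page1.html"

# safe="" makes quote() percent-encode ":" and "/" as well,
# so the whole target URL travels as a single GET value.
encoded_target = quote(target, safe="")
link = "http://www.mysite.com/index.php?out=" + encoded_target
print(link)

# On the receiving side, decoding the query string returns the original URL.
decoded_target = parse_qs(urlsplit(link).query)["out"][0]
print(decoded_target)
```

The first line printed is the encoded link, and the second is the original target URL recovered from it.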
It's Gbot on Viagra,... or Levitra,... or Cialis,... or Vardenafil,... or SuperViagra or ... anyway, you get the idea. Gbot is running hard for a long time. ;)
I posted msg #7 of this topic. At the time, GBot was going nuts, to the point of putting quite noticeable and prolonged spikes on my bandwidth graph on 3 days last week. However, no such spike has occurred since Thursday.
I'm going to go ahead and say GBot is running normally now. Or perhaps slightly elevated.
-Nuttzy
Page Views 154
Percent 5.98 %
Spider Google: 66.249.65.111 Mozilla/5.0 (compatible; Googlebot/2.1;+http://www.google.com/bot.html)
After that ... there were another 68 additional hits at a more reasonable rate. Not a very polite spider!
Come on Google, pull the reins in on this bot. Spidering is one thing. Slamming the site is another.
Good point - I quit my full time job to do just that (among other things)
Nice :). I have little doubt it's not a computer or SEO book either.
Lessons learned here brought my site to page 1, and all the better for those of us who do care when others shun the tidbits and morsels to be found on WebmasterWorld.
In all the time I've been a member here, I've heard rants and raves, and conspiracy theories abound.
All I can tell you is Google has been very good to me and no, "the sky is not falling".
My site isn't very large (about 100 pages) and Gbot has spidered each page 4x today.
Page Views 154
After that ... there were another 68 additional hits at a more reasonable rate. Not a very polite spider!
No. You people are talking about a couple hundred hits. Give me a break! Look at the first post of this topic...
googlebot requesting between 2 - 5 pages a second, not seen this type of spidering for a long time...
We're talking magnitudes higher. I had a few hundred visits this month before last week. Last week I had 20,000 visits. Now it is back to being much more reasonable.
I'm pretty sure that's the magnitude we are talking about, people... not an extra 100 here and there.
-Nuttzy
It seems that googlebot is requesting LOTS of pages that are not existing on my server.
One possibility is this:
If you have a page that does an HTTP 301 Location: redirect to an HTML page on another domain, Googlebot for the last few days (at least on my site) has started using the following new behavior: it will assume the base URI is the URL of the page that issued the redirect.
Thus, if www.A.com/misc/links?dipsy_doodle issues a 301 redirect to www.dipsydoodle.com/links.html, and that HTML contains an anchor with an href of "./fred.html", the Googlebot will issue an HTTP GET for "www.A.com/misc/fred.html". As another example, if the foreign page contains an href like "/sammy.html", then Googlebot will issue a GET for "www.A.com/sammy.html".
This is a change in behavior, and it is generating a lot of 404s in a lot of webmasters' logs, since this kind of redirect is a common way to get reliable click tracking.
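The two cases described above can be reproduced with a small sketch. This only illustrates relative URL resolution; it is not a claim about how Googlebot is implemented. The `http://` scheme is added as an assumption, since the example URLs are written without one.

```python
from urllib.parse import urljoin

# URL on my site that answers with a 301, and the page it redirects to.
redirecting_url = "http://www.A.com/misc/links?dipsy_doodle"
final_url = "http://www.dipsydoodle.com/links.html"

results = {}
for href in ("./fred.html", "/sammy.html"):
    # Observed behavior: href resolved against the URL that issued the redirect.
    observed = urljoin(redirecting_url, href)
    # Expected behavior: href resolved against the page that contains it.
    expected = urljoin(final_url, href)
    results[href] = (observed, expected)
    print(href, "->", observed, "instead of", expected)
```

Resolving against the redirecting URL gives exactly the www.A.com/misc/fred.html and www.A.com/sammy.html requests described, which is why they end up as 404s on the linking site.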