Googlebot log entries, plus 301's for bots

     
5:12 pm on Mar 6, 2013 (gmt 0)

Junior Member

joined:Feb 20, 2013
posts: 77
votes: 0


Hi, I've been checking my logs recently. I want to make sure I understand these entries.

Throughout the day, I see several single-page requests by Googlebot for various pages on the site. I would expect a regular "scheduled crawl" to enter through the home page and then request additional pages. Is it safe to assume that most single-page requests, like the one below, are Googlebot following backlinks from external pages?

66.249.75.172 - - [28/Feb/2013:08:58:34 -0500] "GET /page1.htm HTTP/1.1" 200 12968 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Here, it looks like Googlebot is accessing the page as a mobile device. I wonder why it would do that? To test my site for mobile compatibility?

66.249.75.172 - - [28/Feb/2013:14:07:41 -0500] "GET /page2.htm HTTP/1.1" 200 20997 "-" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

Also, I am seeing a lot of 301 responses on live pages that shouldn't be redirected. The 301s only happen with Googlebot and various other bots. Even my robots.txt file is returning a 301 to bingbot (second example). It doesn't look like human users are encountering any 301s. Here are some examples:

66.249.75.113 - - [28/Feb/2013:08:32:09 -0500] "GET /page3.htm HTTP/1.1" 301 513 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

157.55.34.171 - - [28/Feb/2013:07:56:41 -0500] "GET /robots.txt HTTP/1.1" 301 511 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.34.171 - - [28/Feb/2013:07:56:41 -0500] "GET /robots.txt HTTP/1.1" 200 515 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.34.171 - - [28/Feb/2013:07:58:26 -0500] "GET / HTTP/1.1" 301 491 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

100.43.83.155 - - [05/Mar/2013:11:51:49 -0500] "GET / HTTP/1.1" 301 491 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)"
100.43.83.155 - - [05/Mar/2013:11:51:54 -0500] "GET / HTTP/1.1" 200 21060 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)"
100.43.83.155 - - [05/Mar/2013:11:52:02 -0500] "GET / HTTP/1.1" 200 21060 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)"

Maybe the "MirrorDetector" in that last one is a clue. Is it normal for Googlebot and other bots to artificially trigger a 301 response? Or is there some other issue I need to look at? My error logs are nearly empty.

Thanks!
MarkO

5:20 pm on Mar 6, 2013 (gmt 0)

Moderator from GB 

andy_langton (WebmasterWorld Administrator)

joined:Jan 27, 2003
posts:3332
votes: 140


Google doesn't crawl URLs by directly following links, and thus doesn't "enter" the site in any traditional fashion - rather, it finds links and then adds them to the crawl queue (which has its own priorities for when to fetch each URL). Of course, more external links are likely to translate into a higher crawl priority.
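
To make the crawl-queue idea concrete, here is a toy sketch in Python. It is not how Google's scheduler actually works; the `CrawlQueue` class, its method names, and the use of an external-link count as the only priority signal are all assumptions for illustration. It only shows that discovered URLs are queued and then fetched by priority rather than in the order the links were found.

```python
import heapq
import itertools


class CrawlQueue:
    """Toy crawl frontier (hypothetical): highest-priority URL is fetched first."""

    def __init__(self):
        self._heap = []
        self._queued = set()
        # Monotonic counter breaks ties so equal priorities pop in insertion order.
        self._counter = itertools.count()

    def add(self, url, external_links=0):
        """Queue a discovered URL once; external_links is an assumed priority signal."""
        if url in self._queued:
            return
        self._queued.add(url)
        # heapq is a min-heap, so negate the score to pop the largest first.
        heapq.heappush(self._heap, (-external_links, next(self._counter), url))

    def next_url(self):
        """Return the next URL to fetch, or None when the queue is empty."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        return url


queue = CrawlQueue()
queue.add("/", external_links=0)
queue.add("/page1.htm", external_links=1)
queue.add("/page2.htm", external_links=5)
first = queue.next_url()  # the URL with the most external links
```

In a model like this, a deep page with many external links can be requested on its own, without any request for the home page around it, which is consistent with the single-page requests in the logs above.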

As for the 301s, the most common reason is a crawler requesting a non-canonical version of the hostname, e.g. example.com/page rather than www.example.com/page. Unless you're recording the requested host in the log file, the two requests will look the same.
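
One way to check this against your own logs: a 301 followed by a 200 for the same path from the same IP is consistent with the bot being redirected to the canonical hostname and then following that redirect. Below is a rough Python sketch, assuming the combined log format shown in the first post. The function names `parse_line` and `redirects_then_ok` are made up for this example, and the script can't confirm the hostname theory on its own, because the host isn't present in these log lines.

```python
import re

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it doesn't match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def redirects_then_ok(lines):
    """Return (ip, path) pairs that received a 301 and later a 200."""
    pending = set()   # (ip, path) pairs with a 301 not yet followed by a 200
    followed = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        key = (entry["ip"], entry["path"])
        if entry["status"] == 301:
            pending.add(key)
        elif entry["status"] == 200 and key in pending:
            followed.append(key)
            pending.discard(key)
    return followed
```

Run against the three bingbot lines from the first post, this flags /robots.txt (301, then 200) but not the lone 301 for /. Adding the requested host to the log format is what would actually settle the question.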

6:07 pm on Mar 6, 2013 (gmt 0)

Junior Member

joined:Feb 20, 2013
posts: 77
votes: 0


Thanks Andy. That makes me feel better.

Google doesn't crawl URLs by directly following links, and thus doesn't "enter" the site in any traditional fashion - rather, it finds links and then adds them to the crawl queue (which has its own priorities for when to fetch each URL). Of course, more external links are likely to translate into a higher crawl priority.

So I need to focus on these pages Googlebot is hitting. I can't believe I haven't been looking at this!