Forum Library, Charter, Moderators: Receptional & mademetop

Website Analytics - Tracking and Logging Forum

    
Googlebot log entries, plus 301's for bots
MarkOly




msg:4551801
 5:12 pm on Mar 6, 2013 (gmt 0)

Hi, I've been checking my logs recently. I want to make sure I understand these entries.

Throughout the day, I see several single-page requests by Googlebot for various pages on the site. I would expect a regular "scheduled crawl" to enter through the home page and request additional pages. Is it safe to assume that most single-page requests, like below, are Googlebot following backlinks from external pages?

66.249.75.172 - - [28/Feb/2013:08:58:34 -0500] "GET /page1.htm HTTP/1.1" 200 12968 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Here, it looks like Googlebot is accessing the page as a mobile device. I wonder why it would do that? To test my site for mobile compatibility?

66.249.75.172 - - [28/Feb/2013:14:07:41 -0500] "GET /page2.htm HTTP/1.1" 200 20997 "-" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

Also, I am seeing a lot of 301 responses on live pages that shouldn't be redirected. The 301's are only happening with Googlebot and various other bots. Even my robots.txt file is getting a 301 from bingbot (second example). It doesn't look like human users are encountering any 301's. Here are some examples:

66.249.75.113 - - [28/Feb/2013:08:32:09 -0500] "GET /page3.htm HTTP/1.1" 301 513 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

157.55.34.171 - - [28/Feb/2013:07:56:41 -0500] "GET /robots.txt HTTP/1.1" 301 511 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.34.171 - - [28/Feb/2013:07:56:41 -0500] "GET /robots.txt HTTP/1.1" 200 515 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.34.171 - - [28/Feb/2013:07:58:26 -0500] "GET / HTTP/1.1" 301 491 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

100.43.83.155 - - [05/Mar/2013:11:51:49 -0500] "GET / HTTP/1.1" 301 491 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)"
100.43.83.155 - - [05/Mar/2013:11:51:54 -0500] "GET / HTTP/1.1" 200 21060 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)"
100.43.83.155 - - [05/Mar/2013:11:52:02 -0500] "GET / HTTP/1.1" 200 21060 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)"

Maybe the "MirrorDetector" on that last one is a clue. Is it normal for Googlebot and other bots to artificially trigger a 301 response? Or is there some other issue I need to look at? My error logs are nearly empty.

Thanks!
MarkO

 

Andy Langton




msg:4551806
 5:20 pm on Mar 6, 2013 (gmt 0)

Google doesn't crawl URLs by directly following links, and thus doesn't "enter" the site in any traditional fashion. Rather, it finds links and then adds them to the crawl queue (which includes its own priorities for when to grab URLs). Of course, more external links are likely to translate into a higher priority for crawling.
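To make the "crawl queue" idea concrete, here is a toy sketch of a queue that collects discovered URLs and hands them out by priority rather than in the order links were found. This is purely illustrative and is not how Google's scheduler works internally; the `CrawlQueue` class and the use of an external-link count as the only priority signal are assumptions made for the example.

```python
import heapq


class CrawlQueue:
    """Toy crawl queue: URLs are discovered first, fetched later by priority."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal priorities come out in discovery order

    def add(self, url, external_links=0):
        """Queue a discovered URL once. Returns False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate the count to pop the highest count first.
        # Using external_links as the priority is a hypothetical simplification.
        heapq.heappush(self._heap, (-external_links, self._counter, url))
        self._counter += 1
        return True

    def next_url(self):
        """Return the highest-priority URL, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

In a model like this, each page is pulled from the queue on its own schedule, which is why the log shows isolated single-page requests instead of a visit that starts at the home page.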

In terms of 301s, the most common reason is a crawler asking for a non-canonical version of the hostname, e.g. example.com/page rather than www.example.com/page. Unless you're recording hosts in the logfile, the two requests will look the same.
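Since the log lines quoted in this thread don't record the requested hostname, one indirect check is to look for a 301 that is later followed by a 200 for the same path from the same client, which is consistent with a crawler following a hostname redirect. The sketch below assumes the combined log format shown in the thread; the function names `parse_line` and `redirects_then_ok` are made up for this example, and a match does not prove the hostname was the cause.

```python
import re

# Combined log format, matching the entries quoted in this thread.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def redirects_then_ok(lines):
    """Return (ip, path) pairs where a 301 was later followed by a 200
    for the same client and path."""
    pending = set()
    followed = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        key = (entry["ip"], entry["path"])
        if entry["status"] == 301:
            pending.add(key)
        elif entry["status"] == 200 and key in pending:
            pending.discard(key)
            followed.append(key)
    return followed
```

To see the hostname directly, the log format itself has to include it. On Apache, for instance, `%{Host}i` in a `LogFormat` directive records the Host request header; whether that applies here depends on the server in use, which the thread doesn't state.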

MarkOly




msg:4551814
 6:07 pm on Mar 6, 2013 (gmt 0)

Thanks Andy. That makes me feel better.

"Google doesn't crawl URLs based on directly following links and thus doesn't "enter" the site in any traditional fashion - rather, it finds links and then adds them to the crawl queue (which includes its own priorities for when to grab URLs). Of course, more external links is likely to translate into a higher priority for crawling."

So I need to focus on these pages Googlebot is hitting. I can't believe I haven't been looking at this!
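For anyone wanting to do the same, a minimal sketch for listing which pages Googlebot requested, and with what status, might look like the following. It assumes the combined log format quoted earlier in the thread and matches only on the user-agent string, which can be spoofed, so treat the result as a rough guide. The name `googlebot_hits` is made up for this example.

```python
import re
from collections import Counter

# Pulls the path, status, and user agent out of a combined-format log line.
REQUEST_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per (path, status) whose user agent mentions Googlebot.

    This also counts Googlebot-Mobile, since the match is a plain substring test.
    """
    hits = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and "Googlebot" in m.group("agent"):
            hits[(m.group("path"), int(m.group("status")))] += 1
    return hits
```

Calling `googlebot_hits(open("access.log"))` (the file name is a placeholder) and then `.most_common()` on the result gives the most frequently requested pages first.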


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved