
Bi-weekly full crawl for PR6 sites?

Googlebot's been busy crawling this month


luma

12:29 am on Jul 22, 2002 (gmt 0)

10+ Year Member



rtsit was mentioning daily Googlebot visits. I checked the stats for a small PR6 site [webmasterworld.com]. Besides almost daily visits, it looks like there was a full crawl exactly two weeks after the previous full crawl. Can anyone confirm this?

dvduval

12:40 am on Jul 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No full crawl on my PR6 site. The only way I have of checking the last crawl, however, is [search2.cometsystems.com...] The server is not at my location, and I only get a Webtrends report once a month. Is this the best I can do? Is there a better way to see what has been crawled? Does a crawl equal a cache update?

luma

12:53 am on Jul 22, 2002 (gmt 0)

10+ Year Member



If you own a domain, you should be able to download the raw HTTP server log files. Then search for "Googlebot" and you can be pretty sure.

A crawl only means that Googlebot requested the page from your server. It can take days or weeks before the regular Google database gets updated.

dvduval

12:56 am on Jul 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd love to know the easiest way to request the raw files so that I can better participate in discussions like this. I own 4 domains.

luma

1:51 am on Jul 22, 2002 (gmt 0)

10+ Year Member



Ask the company that is hosting your domains whether you can get the raw log files. How do you upload files? I have access to two domains: on one, I can either view some stats at www.mydomain.com/logs/ or download the log files via FTP; on the other, I can view stats online and download the raw files after specifying a time span. It really depends on your hosting company. Check your hosting agreement, or write them an email/call them.

ryan19

6:59 am on Jul 22, 2002 (gmt 0)

10+ Year Member



What tools do you use, or do you just manually go through your raw server log to see when, how often, and where Googlebot has crawled?

bcc1234

9:39 am on Jul 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What tools do you use, or do you just manually go through your raw server log to see when, how often, and where Googlebot has crawled?

Use grep on your log file.
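For instance, something like the following (a sketch assuming an Apache-style combined log format; the sample log file is fabricated here just so the commands are runnable as-is, substitute your real access log):

```shell
# Create a tiny sample access log in combined format; in practice
# you would point these commands at your real log file instead.
cat > access.log <<'EOF'
66.249.66.1 - - [22/Jul/2002:00:12:01 +0000] "GET /index.html HTTP/1.0" 200 1043 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
10.0.0.5 - - [22/Jul/2002:00:15:42 +0000] "GET /about.html HTTP/1.0" 200 512 "-" "Mozilla/4.0"
66.249.66.1 - - [23/Jul/2002:03:01:10 +0000] "GET /news.html HTTP/1.0" 404 289 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
EOF

# Every request Googlebot made:
grep Googlebot access.log

# Googlebot fetches per day: the date sits between '[' and the
# first ':' of the timestamp in combined log format.
grep Googlebot access.log | cut -d'[' -f2 | cut -d: -f1 | sort | uniq -c
```

The per-day count is a quick way to spot a full crawl: on a deep-crawl day the number jumps to roughly your page count.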

chiyo

10:11 am on Jul 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ryan, if you are not a command-line UNIX type, you can load the raw file into a spreadsheet like Excel and filter/sort. That's quite effective. A good text editor will also let you sort lines or search for text strings like "Googlebot".

These are simple solutions that don't require downloading and learning Analog, Webtrends, etc., or learning how to use grep!

luma

10:46 am on Jul 22, 2002 (gmt 0)

10+ Year Member



We are really getting off-topic here. The Tracking and Logging forum [webmasterworld.com] would be more appropriate...

But to answer your question: since I am on Linux, I use grep, e.g., grep Googlebot access.log.29 > gb to "copy" all Googlebot records to the file gb. But you should also be able to import the log files into Excel or use some other logfile analysis tool, e.g., webalizer or http-analyze. You can then sort/group by the user-agent column and search for Googlebot.

I prefer a mix of both: the graphical tools give me an overview, and the GNU tools let me dig in deep and dirty (e.g., to see all the referrers, check what people were searching for, who caused 404s, and what browsers people (as opposed to bots) were really using). BTW: don't trust any browser (user-agent) statistics unless they show you the raw log files. At least 50% of Opera users cloak, and many bots identify themselves as Netscape or IE...
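A couple of one-liners in that deep-and-dirty spirit (a sketch assuming the common Apache combined log format; field positions shift if your host logs differently, and the sample log below is fabricated just so the pipelines run as-is):

```shell
# Fabricated combined-format sample; use your real access.log.
cat > access.log <<'EOF'
1.2.3.4 - - [22/Jul/2002:10:00:00 +0000] "GET /old.html HTTP/1.0" 404 180 "http://example.com/ref" "Mozilla/4.0"
5.6.7.8 - - [22/Jul/2002:10:05:00 +0000] "GET /index.html HTTP/1.0" 200 900 "-" "Opera/6.0"
1.2.3.4 - - [22/Jul/2002:10:06:00 +0000] "GET /old.html HTTP/1.0" 404 180 "-" "Mozilla/4.0"
EOF

# Which URLs caused 404s, most frequent first: in combined log
# format the status code is field 9 and the requested path field 7.
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn

# Rough user-agent breakdown: splitting on double quotes puts the
# user-agent string in field 6.
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn
```

The same user-agent pipeline is also how you'd sanity-check those browser statistics yourself, cloaked agents and bot aliases included.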