
Google News Archive Forum

    
How to tell which pages Google spidered and when?
nuevojefe




msg:61000
 2:19 am on Oct 23, 2003 (gmt 0)

I have cPanel as my hosting control panel. Is there something in it, or in some other stats program, that can tell me which pages Google's spider has crawled and when? What's the IP?

Is there a way of knowing who referred the bot, i.e. which link it crawled in from?

I had multiple links from a PR7 site that gets updated daily, and my SERPs went up the next day. Two days later, with the same links still in place, I've disappeared. Just wondering what could be the cause; there were no changes on the PR7 site. Can I tell somehow whether Google crawled that link?

Also, my site is a PR4, but Google indexes the home page and a few others every day. My content changes only slightly, once every 7-10 days. Is that normal?

 

Marcia




msg:61001
 6:15 am on Oct 23, 2003 (gmt 0)

It's all normal, and it doesn't matter whether it's cPanel. It depends on whether your host makes logs available to you, or what kind of stats program you've got. That's where you'd see it.

Mark_A




msg:61002
 9:17 pm on Oct 23, 2003 (gmt 0)

nuevojefe, if you can get your raw logs: Google's crawler asks for robots.txt regularly (if you don't have one, this log entry may be in your error log).

Plus it almost always leaves a user-agent string (not a referrer) saying something like

"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

or similar. Once you have some of these, it's easy to isolate what it has requested.
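
To make the log-filtering idea concrete, here is a minimal sketch in Python. It assumes the raw log uses the Apache "combined" format (your host's format may differ), and the sample lines, IP addresses and the `googlebot_hits` name are made up for illustration. It pulls out the time, path, status and referrer of every request whose user-agent field mentions Googlebot.

```python
import re

# Apache "combined" log format (an assumption; check your host's actual format):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Return (time, path, status, referrer) for requests whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "googlebot" in m.group("agent").lower():
            hits.append(
                (m.group("time"), m.group("path"), m.group("status"), m.group("referrer"))
            )
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [23/Oct/2003:02:19:00 +0000] "GET /robots.txt HTTP/1.0" 404 209 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '198.51.100.7 - - [23/Oct/2003:02:20:11 +0000] "GET /index.html HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [23/Oct/2003:02:21:45 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

for when, path, status, referrer in googlebot_hits(sample):
    print(when, path, status, referrer)
```

Keep in mind that anyone can send a Googlebot user-agent string, so a match on its own does not prove the request came from Google. A referrer field of "-" means the client sent no referrer with that request.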

nuevojefe




msg:61003
 3:16 pm on Oct 24, 2003 (gmt 0)

Thanks for the pointers. I think my raw log was turned off; I changed it yesterday to auto-archive, so I'll see in a minute whether it's still empty.

Thanks!

martingj




msg:61004
 8:13 pm on Oct 24, 2003 (gmt 0)

The most helpful thing I found was using a custom 404 page that sends an e-mail.
That way I see any stupid mistakes I make and, as I don't have a robots.txt, all the crawlers looking for it.
See if your cPanel allows you to define a custom error page, and if it does, plug in a nice error message that also sends an e-mail (search Google for custom error pages).
M
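
As a rough sketch of the e-mail half of that idea, the Python below builds the notification a 404 handler might send. The function name, the addresses and the message layout are all hypothetical; how the handler gets the requested path, referrer and user agent depends on your server setup and is not shown.

```python
from email.message import EmailMessage


def build_404_report(path, referrer, user_agent, to_addr, from_addr):
    """Build (but do not send) an e-mail describing a request that hit the 404 page."""
    msg = EmailMessage()
    msg["Subject"] = f"404 Not Found: {path}"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(
        f"Requested path: {path}\n"
        f"Referrer: {referrer or '-'}\n"
        f"User agent: {user_agent or '-'}\n"
    )
    return msg


# Hypothetical values, for illustration only.
report = build_404_report(
    "/robots.txt",
    "",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "webmaster@example.com",
    "site@example.com",
)
print(report)

# Sending is left out on purpose. With a mail server on the same machine
# (an assumption), it could look like:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(report)
```

On a busy site a mail per 404 can add up quickly, so it may be worth batching or rate-limiting the reports.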

Arnett




msg:61005
 4:23 pm on Oct 27, 2003 (gmt 0)

I put <!--#echo var="DATE_GMT" --> in the footer of my pages. When Google caches the page, the cached copy shows the exact date and time that the page was crawled.

u4eas




msg:61006
 8:03 pm on Oct 27, 2003 (gmt 0)

Great idea about the date...

Anyone know how to code this same thing in ASP?

<!--#echo var="DATE_GMT" -->

u4eas




msg:61007
 8:16 pm on Oct 27, 2003 (gmt 0)

Never mind the post I made above. I used:

<% Response.Write FormatDateTime(Time, vbLongTime) %>
<% Response.Write FormatDateTime(Date, vbLongDate) %>

Arnett




msg:61008
 4:04 pm on Oct 29, 2003 (gmt 0)

For those who are interested: in PHP files, I use the following in the footer area to capture the date and time of the spider's visit in the cached copy of the page:

<?php echo date('l dS \o\f F Y h:i:s A T'); ?>

On another topic, using the line below as the second line of a PHP file (before any output is sent) sets the Last-Modified header to the current time, so the page shows as "recently modified", which can help with fresh listings:

header('Last-Modified: '.gmdate('D, d M Y H:i:s \G\M\T', time()));
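
For anyone not on PHP, the same header value can be sketched in Python. This only shows how the date string is formatted; `last_modified_header` is a made-up helper name, and how you attach the header to the response depends on your server or framework.

```python
import time
from email.utils import formatdate


def last_modified_header(timestamp=None):
    """Return a Last-Modified header line with the date in HTTP format (GMT)."""
    if timestamp is None:
        timestamp = time.time()  # current time, like time() in the PHP line
    return "Last-Modified: " + formatdate(timestamp, usegmt=True)


print(last_modified_header())   # header for "now"
print(last_modified_header(0))  # Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT
```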

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved