Forum Moderators: open
Just a quick question about Google's spidering. Google spiders/crawls me daily, but only about 3 pages (out of about 300), and always the same ones. Every once in a while, however, it will spider more pages (40+), but never all of them.
The other funny thing is that one of the pages it crawls daily is default.asp. I don't have a page called default.asp and never have. Why would it spider a page I don't have? Also, the pages it DOES spider every day are very irrelevant (to me, anyway): things like "contact us" and some other minor pages. I would like it to spider/crawl my PRODUCT pages, which are surely more relevant than "contact us".
Any ideas? As far as I can see, my directory structure/layout should enable any spider to easily go through the whole site: no broken links, etc.
Thanks,
MBF
Changing your link structure on your site is a great way to suggest to Googlebot which pages you think are the most important. If every one of your pages points to the "contact us" page, you can see where we'd crawl that earlier instead of later.
So I'd be thinking about how to take the visits you do get and try to guide them in the direction you want. Something to think about...
Thanks for this. I was thinking about exactly that, but I was/am scared that Google somehow frowns upon it.
To tell you the truth, all the things you read about Google banning you and "negative effects" on PR and position scare me so much that I never know what I can and cannot do.
I will however take your advice and give it a go.
Thank you very much.
MBF
I venture to say Google has NEVER banned a site for linking to itself. dmoz.org has approximately 6 MILLION links to itself: you'd think if this were a bannable offense, we'd be the first to go. Yahoo! probably has 2 to 4 million. I have a quite nicely spidered personal subsite with 4 times as many internal links as pages, and I think I need more internal links.
That said, someone has experienced a penalty for a site whose navigation system/site map links to itself (apparently treated as keyword stuffing!), and this may be part of the way Google has cracked down on directory sites that link to their own SERPs. (These sites typically have hundreds of site maps to get their thousands of pages indexed.)
I know this question of mine is very lame, but I want to ask something.
How would I know which pages Googlebot has crawled?
If I'm on the web server (Linux), how would I know that Googlebot is crawling my website(s)?
The logs?
thanks
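Yes, the logs. Googlebot identifies itself in the User-Agent string of each request, so on a Linux server you can grep the web server's access log for it. A minimal sketch, assuming an Apache-style combined log format; the log path below is an assumption and will differ depending on your distribution and virtual-host setup:

```shell
# Show which URLs Googlebot has requested, most-crawled first.
# Assumed default Apache log path; yours may be e.g. /var/log/httpd/access_log
# or a per-vhost file.
LOG=/var/log/apache2/access.log

grep -i 'googlebot' "$LOG" |      # keep only lines whose User-Agent mentions Googlebot
  awk '{print $7}' |              # field 7 is the request path in the combined log format
  sort | uniq -c | sort -rn |     # count requests per URL, highest first
  head -20                        # top 20 crawled pages
```

Note that anyone can fake the Googlebot User-Agent, so treat this as a quick overview rather than proof; it will, however, immediately show you whether those "contact us" pages really dominate the crawl.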