googlebot, robots.txt and spidering

Forum Moderators: goodroi

Perplexed by lack of spidering of the site

partnermine

12:47 pm on Apr 21, 2005 (gmt 0)

Our site is two weeks old. And we are getting traffic. That's the good part :)

If it makes a difference to any answers here, we are also using Google AdWords.

Googlebot visits us daily and inspects robots.txt and also /. We are indexed on Google, with the meta tag content appearing in the Google results when I search on our domain (oddly similar to my user name here).

robots.txt contains:

# basically allow everything
User-agent: *
Disallow: /somedirectoryorother/

It validates, and I am content that it ought to allow full spidering of everything outside that one directory.
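For anyone wanting to double-check a robots.txt like the one above, here is a minimal sketch using Python's standard urllib.robotparser. The helper name `allowed` and the test paths are made up for illustration, and this only shows how Python's parser reads the rules, not how Googlebot itself interprets them.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above, copied as-is.
ROBOTS_TXT = """\
# basically allow everything
User-agent: *
Disallow: /somedirectoryorother/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules.

    Hypothetical helper: it parses the text directly instead of
    downloading robots.txt from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The paths below are illustrative, not taken from the real site.
    for path in ("/", "/links.html", "/somedirectoryorother/page.html"):
        print(path, allowed(ROBOTS_TXT, "Googlebot", path))
```

Under these rules only the one disallowed directory is blocked, so robots.txt on its own would not explain a bot stopping at the home page.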

Belt and braces I also have

< meta name="robots" content="index,follow" > (without the spaces near the < and > )

on every page I want indexed (though this was a recent addition and seems not to help).
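A quick way to confirm what a page actually serves in that tag is to parse the HTML and list the robots directives. This is a rough sketch using Python's standard html.parser; the class and function names are invented for the example, and the sample markup is just the tag quoted above wrapped in a bare page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "index,follow" into separate lowercase directives.
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> list:
    """Hypothetical helper: return the robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="index,follow"></head></html>'
    print(robots_directives(page))
```

A page with no robots meta tag gives an empty list here, which crawlers generally treat the same as index,follow, so adding the tag would not be expected to change much.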

Is it that googlebot only visits home pages for a while before it gets around to deep spidering? Or am I missing something fundamental?

ThomasB

3:26 pm on Apr 21, 2005 (gmt 0)

You need more links, because that's the key to make Googlebot "hungry" and spider more pages of your site.

partnermine

3:56 pm on Apr 21, 2005 (gmt 0)

Now that makes perfect sense. We then get into an issue of ergonomic design versus robot friendly. We do have quite a fearsome and growing links page behind the home page, but this page never gets touched by googlebot. But I take the meaning from your post that the home page itself requires those links.

May I be specific in my questions over this, please:

Do you mean "internal links" within the site?
Or do you mean "external links" from the site?
Or do you mean "Internal links on the homepage"?
Or do you mean "External links" on the home page?
Or do you mean "links from other sites to ours?"

Deab

6:23 am on Apr 22, 2005 (gmt 0)

I think ThomasB is referring to incoming links.

partnermine

6:46 am on Apr 22, 2005 (gmt 0)

I think ThomasB is referring to incoming links.

And yet I have an obscure personal site on my ISP's obscure server that is fully spidered though I think no-one links to it ever. Well 2 things do according to google. One is dreambook (yup, vanity guestbook, but it seems they have a use) and the other is a personal site of some other nut, too.

Both of those link only to the homepage of the very specialist and uninteresting personal site.

My commercial site is linked to. Google sees the sites that link to it and the links. And yet googlebot only hits the / page and does not burrow deeper.

At present we're throwing content at it, along with links within the site and links from the site outwards.

I imagine Googlebot ignores inbound Google AdWords links, but if you search on my user name here there are currently 144 hits, of which I claim 80% as "mine". Among that lot are several genuine permanent links.

partnermine

9:31 am on Apr 23, 2005 (gmt 0)

Interesting. In case it makes any difference at all, which some experts say anecdotally it does (but how can you prove it either way ;-)?), I have been surfing away on a Google Toolbar equipped IE browser. How boring surfing your own site is! I have read my terms and conditions of trading about 40 times now!

Anyway that was a side issue. Last night google sent two different bots. One was quite bright and could read gzipped files. It hit / and nothing else.

Regular Googlebot arrived and hit three or four more pages. While they are not yet indexed in the SERPs, I am starting to think that just maybe Googlebot visits several times to make sure it isn't going to make a fool of itself indexing a site that then changes character immediately or goes away, and then schedules a deep spidering event for the future once it has done a sample spidering. OK, that is a pure guess based on what I would design.
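Rather than eyeballing the logs, the pages each bot visit touches can be tallied with a short script. The sketch below assumes an Apache-style combined log format; the sample lines, IP addresses and page names are invented for illustration, and matching on the user-agent string is only a rough filter (it also catches other agents with "Googlebot" in the name and does nothing to verify the visitor really is Google).

```python
import re
from collections import Counter

# Assumed combined log format: ... "GET /path HTTP/1.1" 200 1234 "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample data; the addresses are from a documentation-only range.
SAMPLE_LINES = [
    '192.0.2.10 - - [23/Apr/2005:02:10:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [23/Apr/2005:02:10:05 +0000] "GET /terms.html HTTP/1.1" 200 2048 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.55 - - [23/Apr/2005:02:11:00 +0000] "GET /links.html HTTP/1.1" 200 4096 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.10 - - [23/Apr/2005:03:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]


def googlebot_paths(lines):
    """Count requests per path for lines whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    for path, count in googlebot_paths(SAMPLE_LINES).most_common():
        print(count, path)
```

Run daily against the real access log, a tally like this would show whether the bot is still stopping at / or has started to go deeper.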

We have had the google imagebot visit as well. No images yet. Though we only expect the basic site images to appear and not the images of the dating site members. Those are protected from spidering as are their profiles.

We submitted the site to google on the 7th of April. Maybe we are just expecting miracles :)

partnermine

1:51 pm on Apr 28, 2005 (gmt 0)

Well, I promised you an update. And last night it happened. Google put us happily in its directory, pretty much fully spidered.

Now I am an 80% happy bunny. I can work on my content and specialised meta tags to my heart's content.

One oddity, though. A search on link:http://somedomainorother produces no "links in" under Google's regime but produces a small but perfectly formed list under Yahoo (whose robot, by coincidence, also paid a good visit last night).

Google finds sites that link to us if I search for my domain (well, actually my user id here). It simply has not "registered" those links. So my next question is: "Is link registration a separate process from the main spidering and listing?"

Reid

5:12 am on May 6, 2005 (gmt 0)

Google is pretty secretive about who is linking to you.
Using link: in Google is pretty much a waste of time; everyone uses Yahoo for that.