
Googlebot not following links

   
12:36 pm on Feb 8, 2004 (gmt 0)

I have a problem with Google indexing my site:
Google only indexes the index.htm page, and Googlebot only visits that one page.
My log file has shown this for weeks now:

64.68.82.143 - - [08/Feb/2004:08:22:08 +0100] "GET /robots.txt HTTP/1.0" 200 98 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.143 - - [08/Feb/2004:08:22:09 +0100] "GET / HTTP/1.0" 200 12306 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

Every day Googlebot requests the robots.txt file, and after that it gets the index file and that's it.
Other bots index my entire site, the Froogle predictor looks OK, search engine spider simulators look fine...
Does anybody know what the problem is?

[edited by: Marcia at 4:15 am (utc) on Feb. 9, 2004]
[edit reason] URL not necessary. [/edit]
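
If it helps anyone compare, here is a rough Python sketch I'd use to tally which URLs Googlebot has requested from an access log. The file name access.log and the combined log format are assumptions; adjust them to your own server setup.

# Rough sketch (assumptions: combined log format, log file named access.log).
# Tallies the URLs requested by anything identifying itself as Googlebot.
import re
from collections import Counter

LOG_PATH = "access.log"      # assumption: path to your access log
UA_MARKER = "Googlebot"      # matches "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"'
)

counts = Counter()
with open(LOG_PATH) as fh:
    for line in fh:
        m = LINE_RE.match(line)
        if m and UA_MARKER in m.group(2):
            counts[m.group(1)] += 1      # tally the requested path

for path, hits in counts.most_common():
    print(f"{hits:6d}  {path}")

If it only ever prints /robots.txt and /, that matches what the raw log lines above show.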

8:44 am on Feb 9, 2004 (gmt 0)

marcia

You might want to double-check that the robots.txt validates:

[searchengineworld.com...]

8:52 am on Feb 9, 2004 (gmt 0)

Just done that...

No errors detected! This Robots.txt validates to the robots exclusion standard!

My robots.txt file is just this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

My concern is that only Googlebot seems to be having problems indexing my site.
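
For what it's worth, here is a quick sketch using Python's standard robotparser module that asks what a robots.txt like the one above allows Googlebot to fetch. The domain www.example.com and the sample paths are placeholders, not the real site.

# Quick check (sketch): does this robots.txt block Googlebot from anything
# beyond /cgi-bin/ and /images/? www.example.com is a placeholder domain.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()                                  # fetch and parse the live robots.txt

googlebot = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
for path in ("/", "/index.htm", "/cgi-bin/test.cgi", "/images/logo.gif"):
    print(path, "->", "allowed" if rp.can_fetch(googlebot, path) else "blocked")

With only the two Disallow lines above, everything outside /cgi-bin/ and /images/ should come back as allowed.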

9:08 am on Feb 9, 2004 (gmt 0)

I've got a couple of sites that get the robots.txt and index page crawled daily, and then about every three weeks to a month the bot does a deep crawl.

I think if you get more links pointing at your site (and at deep pages, rather than just the index page), more of your pages should get crawled, and more often.

10:58 am on Feb 9, 2004 (gmt 0)

Yep, I noticed that too. I put a new site online over a week ago and gave it two links: one PR5 and one PR7.

That site has only had its index.html spidered. Another site, launched about a month earlier, was expanded with about 50 pages and got completely indexed over the weekend.

I think Google has tightened the rules for getting into the index - maybe storage/analysis capacity is starting to play a role? Maybe it won't even fully index sites unless they have more than, say, about 10 backlinks from different sites.

1:40 pm on Feb 9, 2004 (gmt 0)

I've had the same thing. I put a site up before the Austin update and it got deep-crawled almost immediately with just a few PR4 links. I put a site up just after Austin and Google only takes robots.txt and index.htm. At the moment, though, all my sites are only having these two pages spidered by Google, even my high-PR ones.

1:50 pm on Feb 9, 2004 (gmt 0)

I have exactly the same problem. Most other sites in our industry sector are getting spidered and are showing fresh dates, but none of my sites are.

Some of my sites do not have a robots.txt file, but some do, and they've all validated OK.

This has been going on for approximately the past 4-5 weeks.

3:25 pm on Feb 9, 2004 (gmt 0)

Last night I saw a similar visit from Googlebot; then it left without indexing anything further. This is the log entry.

64.68.82.168 - - [09/Feb/2004:01:45:49 -0500] "GET / HTTP/1.0" 200 26432 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

I did notice that msnbot, openbot, and Turnitinbot were all on the site last night and all indexed some pages. Could they have had anything to do with Googlebot leaving?

Can anyone interpret the above log line? It appeared twice during the night.

4:49 pm on Feb 9, 2004 (gmt 0)

Other bots can't really interfere with each other. That line simply means Gbot came by and requested the default document of your website (index.html?).

The same thing happens here on a new domain, but it *should* request more files (links from within the document). I checked the error log, and Gbot did request the file correctly, so the server/site shouldn't be the problem.
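
To spell out the fields, here is a small Python sketch that splits the quoted line into its combined-log pieces; the comments are just my reading of each one.

# Sketch: break the log line quoted above into its combined-log-format fields.
import re

line = ('64.68.82.168 - - [09/Feb/2004:01:45:49 -0500] "GET / HTTP/1.0" 200 26432 '
        '"-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')

m = re.match(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d+) (\S+) "([^"]*)" "([^"]*)"', line
)
host, ident, user, when, request, status, size, referer, agent = m.groups()

print("client IP :", host)      # 64.68.82.168, one of Googlebot's addresses
print("time      :", when)      # 09/Feb/2004:01:45:49 -0500
print("request   :", request)   # GET / HTTP/1.0 -> the default document
print("status    :", status)    # 200 = the page was served fine
print("bytes     :", size)      # 26432 bytes sent
print("referer   :", referer)   # "-" = none; bots don't send one
print("user-agent:", agent)     # identifies Googlebot/2.1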

6:48 pm on Feb 9, 2004 (gmt 0)

Had the same problem last night. Googlebot came by and grabbed my index page only.
I lost 2/3 of my other pages that were indexed.
Something is very wrong with the Googlebot crawls.

7:17 pm on Feb 9, 2004 (gmt 0)

Could this be a preliminary visit before a deep crawl? The last one I had was around January 12.

8:51 pm on Feb 9, 2004 (gmt 0)

To keep sites nicely indexed, have well-managed internal and external links, and get links pointing to several pages of your website, not just the homepage; that works well for getting a site fully indexed. I got my website with over 5000 pages indexed within 4 days... still NO PR though, but it has started getting me traffic and sales.

9:16 pm on Feb 9, 2004 (gmt 0)

Same thing here, everything checks out... robots.txt valid, all nice links... and now I've even gone as far as offering a stripped version to Googlebot (removed the layout code for it)... still no luck.

Just two requests: robots.txt and index.
inktomisearch.com does the same as well. :(

9:28 pm on Feb 9, 2004 (gmt 0)

"Had the same problem last night. Googlebot came by and grabbed my index page only.
I lost 2/3 of my other pages that were indexed.
Something is very wrong with the Googlebot crawls."

Correction:
It grabbed my index page and Sitemap only.
Here's hoping for a deep crawl sooner rather than later.

10:02 pm on Feb 9, 2004 (gmt 0)

Mmmm, interesting.

Gbot comes to my index page once a day and deep crawls once a month, usually a week before any major update - that just changed recently. This has been happening since October. However, I've also noticed the index page is returning a 304 status (Not Modified) and assumed this was the reason for the lack of Gbot activity.

Anyway, I've just changed my index page and added 100 or so pages, so I'm hoping freshbot will be along in the next few days.

Can only wait and see. ;)
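
If anyone wants to test that 304 behaviour on their own index page, here is a minimal sketch using only Python's standard library; www.example.com stands in for your own host.

# Minimal sketch: does the index page answer a conditional GET with 304 Not Modified?
# www.example.com is a placeholder host.
import http.client

conn = http.client.HTTPConnection("www.example.com")

# Plain GET first, to read the Last-Modified header the server advertises.
conn.request("GET", "/")
resp = conn.getresponse()
last_modified = resp.getheader("Last-Modified")
resp.read()                                   # drain the body so the connection can be reused
print("plain GET      :", resp.status, "Last-Modified:", last_modified)

# Repeat the request conditionally: 304 means "unchanged since that date".
if last_modified:
    conn.request("GET", "/", headers={"If-Modified-Since": last_modified})
    resp = conn.getresponse()
    resp.read()
    print("conditional GET:", resp.status)    # 304 = Not Modified, 200 = full page again

conn.close()

If it keeps coming back 304, the page really hasn't changed since the date Googlebot sent, which would fit the theory above.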

11:12 pm on Feb 9, 2004 (gmt 0)

What works for me is FTP'ing everything down and back up again, to keep Last-Modified changing daily (along with small text changes to 50-60% of the pages).

I'd think the call for your site map is a good sign. :-)

Mebbe Gbot will be back now that it shows a site map in its doc barrels for the site, and one that hasn't been followed at that...
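
A lighter-weight way to get the same effect, assuming your server takes Last-Modified from the file's modification time (true for static files on most servers), is just to bump the timestamps in place. A rough Python sketch; the document root path is an assumption.

# Rough sketch: bump the mtime of every .htm/.html file under the docroot so the
# server advertises a fresh Last-Modified. /var/www/html is an assumed path.
import os
import time

DOCROOT = "/var/www/html"          # assumption: adjust to your own document root

now = time.time()
for dirpath, dirnames, filenames in os.walk(DOCROOT):
    for name in filenames:
        if name.endswith((".htm", ".html")):
            path = os.path.join(dirpath, name)
            os.utime(path, (now, now))   # set access and modification time to now
            print("touched", path)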

9:03 am on Feb 10, 2004 (gmt 0)

I've changed several pages over the past few days, and I've had a sitemap from the start (always good to have a sitemap for SEs!). Our server/site shouldn't be the problem, as far as I can see.

Adding a PR7 link to the sitemap led to it being indexed, but it doesn't show up in the SERPs yet. Still, Googlebot should have spidered more links from the sitemap by now.

Older sites are getting deep-crawled all right, so it's either a problem with new sites, or some adjustment in the bot NOT to deep-spider new sites unless they meet certain criteria (e.g. mucho links :) ).

7:12 pm on Mar 4, 2004 (gmt 0)



It is a new algorithm:

[cs.toronto.edu...]

Check this out.

8:17 pm on Mar 5, 2004 (gmt 0)

Hello,

I am getting the same thing on my site. I submitted to Google about 6-8 weeks ago. It does the same thing every time it comes to my site: it goes to robots.txt, then to the home page, and then leaves. It does this, sometimes multiple times per day.

ed