
Google News Archive Forum

    
Googlebot not following links
DJAxion




msg:42293
 12:36 pm on Feb 8, 2004 (gmt 0)

I have a problem with Google indexing my site:
Google only indexes the index.htm page, and Googlebot only visits that one.
My logfile has shown this for weeks now:

64.68.82.143 - - [08/Feb/2004:08:22:08 +0100] "GET /robots.txt HTTP/1.0" 200 98 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.143 - - [08/Feb/2004:08:22:09 +0100] "GET / HTTP/1.0" 200 12306 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

Every day Googlebot requests the robots.txt file, then it gets the index file, and that's it.
Other bots index my entire site, the Froogle predictor looks OK, search engine spider simulators look fine...
Does anybody know what the problem is?

[edited by: Marcia at 4:15 am (utc) on Feb. 9, 2004]
[edit reason] URL not necessary. [/edit]

 

Marcia




msg:42294
 8:44 am on Feb 9, 2004 (gmt 0)

You might want to double-check that the robots.txt validates:

[searchengineworld.com...]

DJAxion




msg:42295
 8:52 am on Feb 9, 2004 (gmt 0)

Just done that...

No errors detected! This Robots.txt validates to the robots exclusion standard!

my robots.txt file is just this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

My concern is that only Googlebot seems to be having problems indexing my site.
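A validator only checks syntax, so it can also help to test the rules against actual paths. Below is a minimal sketch using Python's standard `urllib.robotparser`, fed with the robots.txt quoted in this post. The page paths are hypothetical, picked just for illustration, and this only shows how the standard parser reads the rules, not how Googlebot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules exactly as quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""


def allowed_paths(robots_txt, user_agent, paths):
    """Return {path: bool} telling whether user_agent may fetch each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Hypothetical paths, chosen only for illustration.
PATHS = ["/", "/index.htm", "/products/page1.htm",
         "/cgi-bin/form.pl", "/images/logo.gif"]

result = allowed_paths(ROBOTS_TXT, "Googlebot", PATHS)
for path, ok in result.items():
    print(path, "allowed" if ok else "blocked")
```

If ordinary pages come back as allowed, as they should with these rules, the robots.txt is unlikely to be what keeps the bot on the index page.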

mcavill




msg:42296
 9:08 am on Feb 9, 2004 (gmt 0)

I've got a couple of sites that get the robots.txt and index page crawled daily and then about every 3 weeks to a month the bot does a deep crawl.

I think if you get more links pointing at your site (and at deep pages, rather than just the index page) your site should get more pages crawled more frequently.

tribal




msg:42297
 10:58 am on Feb 9, 2004 (gmt 0)

Yep, noticed that too. I put a new site online over a week ago and gave it two links: a PR5 and a PR7.

This site got only the index.html spidered. Another site, launched about a month before, was expanded with about 50 pages and got completely indexed over the weekend.

I think Google has tightened the rules for getting into the index - maybe storage/analysis capacity is starting to play a role? Maybe it won't even fully index sites if they don't have more than, say, about 10 backlinks from different sites?

Silent_Bob




msg:42298
 1:40 pm on Feb 9, 2004 (gmt 0)

I've had the same thing. I put a site up before the Austin update and it got deep crawled almost immediately with just a few PR4 links. I put a site up just after Austin and Google only takes robots.txt and index.htm. At the moment though, all my sites are only having these two pages spidered by Google, even my high PR ones.

needinfo




msg:42299
 1:50 pm on Feb 9, 2004 (gmt 0)

I have exactly the same problem. Most other sites in our industry sector are getting spidered and are showing fresh dates, but none of my sites are.

Some of my sites do not have a robots.txt file, but some do, and they've all been validated OK.

This has been going on for approximately the past 4-5 weeks.

metrostang




msg:42300
 3:25 pm on Feb 9, 2004 (gmt 0)

Last night I saw a similar visit from Googlebot, then it left without indexing. This is the code.

64.68.82.168 - - [09/Feb/2004:01:45:49 -0500] "GET / HTTP/1.0" 200 26432 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

I did notice that msnbot, openbot and Turnitinbot were all on the site last night and all indexed some pages. Could they have had anything to do with Googlebot leaving?

Can anyone interpret the above code? It appeared twice during the night.

tribal




msg:42301
 4:49 pm on Feb 9, 2004 (gmt 0)

Other bots can't really interfere with each other. That line simply means Gbot came by and requested the default document of your website (index.html?).

Same thing happens here on a new domain, but it *should* request more files (links from within the document). I checked the error log, but Gbot did request the file correctly, so the server/site shouldn't be the problem.
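To make the reading of those log lines concrete, here is a rough sketch that splits a line into its fields and counts what Googlebot asked for. It assumes the common Apache "combined" log format, which is what the lines quoted in this thread look like; the regex and the function names are my own, not part of any server tool.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format, as in the lines quoted above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def googlebot_requests(lines):
    """Count (path, status) pairs for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and "Googlebot" in entry["agent"]:
            counts[(entry["path"], entry["status"])] += 1
    return counts


# The three log lines posted in this thread.
SAMPLE = [
    '64.68.82.143 - - [08/Feb/2004:08:22:08 +0100] "GET /robots.txt HTTP/1.0" 200 98 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.82.143 - - [08/Feb/2004:08:22:09 +0100] "GET / HTTP/1.0" 200 12306 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.82.168 - - [09/Feb/2004:01:45:49 -0500] "GET / HTTP/1.0" 200 26432 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

for (path, status), hits in googlebot_requests(SAMPLE).items():
    print(path, status, hits)
```

Run over a full logfile, a count like this shows quickly whether the bot ever asks for anything beyond robots.txt and the default document.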

Jakpot




msg:42302
 6:48 pm on Feb 9, 2004 (gmt 0)

Had the same problem last night. Googlebot came by and grabbed my index page only.
I lost 2/3 of my other pages that were indexed.
Something is very wrong with the Googlebot crawls.

metrostang




msg:42303
 7:17 pm on Feb 9, 2004 (gmt 0)

Could this be a preliminary visit before a deep crawl? The last one I had was around January 12.

KeywordROI




msg:42304
 8:51 pm on Feb 9, 2004 (gmt 0)

Keeping internal and external links well managed, and getting links to several pages of your website rather than just the homepage, works well for getting a site fully indexed. I got my website with over 5000 pages indexed within 4 days... still NO PR though, but it has started getting me traffic and sales.

Helza




msg:42305
 9:16 pm on Feb 9, 2004 (gmt 0)

Same thing here, everything checks out.. robots.txt valid, all nice links.. and now I've even gone as far as offering a stripped version to Googlebot (removed the layout code for it) .. still no luck..

Just 2 requests: robots.txt and index..
inktomisearch.com does the same as well.. :(

Jakpot




msg:42306
 9:28 pm on Feb 9, 2004 (gmt 0)

"Had the same problem last night. Googlebot came by and grabbed my index page only.
I lost 2/3 of my other pages that were indexed.
Something is very wrong with the Googlebot crawls."

Correction:
It grabbed my index page and Sitemap only.
Here's hoping for a deep crawl sooner rather than later.

tantalus




msg:42307
 10:02 pm on Feb 9, 2004 (gmt 0)

Mmmm interesting.

Gbot comes to my index page once a day and deep crawls once a month, usually a week before any major update - just changed recently. This has been happening since October. However, I've also noticed the index page is returning a 304 status (not modified since) and assumed this was the reason for the lack of Gbot activity.

Anyway just changed my index and added 100 or so pages so I'm hoping freshbot will be along in the next few days.

Can only wait and see. ;)
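For anyone wondering where a 304 comes from: in plain HTTP, a client can send an If-Modified-Since header, and the server answers 304 when the page has not changed since that date. A minimal sketch of that decision follows, assuming standard HTTP date headers. The function name is made up for illustration, and nothing here is specific to how Googlebot schedules its crawls.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(if_modified_since, last_modified):
    """Return True when a server could answer 304 Not Modified.

    if_modified_since: raw header value sent by the client, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return False  # no conditional header, so send the full page (200)
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date, treat the request as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return last_modified.replace(microsecond=0) <= since


header = "Sun, 08 Feb 2004 08:22:09 GMT"
unchanged = datetime(2004, 2, 8, 7, 0, 0, tzinfo=timezone.utc)
changed = datetime(2004, 2, 9, 12, 0, 0, tzinfo=timezone.utc)

print(is_not_modified(header, unchanged))
print(is_not_modified(header, changed))
```

So a page edited after the date in the header gets a normal 200 with the full content, which is why changing the index page should end the run of 304s.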

a_chameleon




msg:42308
 11:12 pm on Feb 9, 2004 (gmt 0)


What works for me is FTP'ing everything down and back up again, to keep Last-Modified changing daily (and small text changes to 50 - 60% of the pages).

I'd think the call for your site map's a good sign.. :-)

Mebbe Gbot will be back now that it shows a site map in its doc. barrels for the site, and one that hasn't been followed at that...

tribal




msg:42309
 9:03 am on Feb 10, 2004 (gmt 0)

I've changed several pages the past days, and I've had a sitemap from the start (always good to have a sitemap for SEs!). Our server/site shouldn't be the problem, as far as I can see.

Adding a PR7 link to the sitemap led to it being indexed, but it doesn't show up in the SERPs yet. Still, it should have spidered more links from the sitemap by now.

Older sites are getting deep crawled all right, so it's either a problem with new sites, or some adjustment in the bot NOT to deep spider new sites unless they meet certain terms (e.g. mucho links :) ).

Nuri




msg:42310
 7:12 pm on Mar 4, 2004 (gmt 0)

It is a new algorithm.

[cs.toronto.edu...]

Check this out.

bnmwebmaster




msg:42311
 8:17 pm on Mar 5, 2004 (gmt 0)

Hello,

I am getting the same thing on my site. I submitted to Google about 6-8 weeks ago. It does the same thing every time it comes to my site: it goes to robots.txt, then to the home page, and then leaves. It does this sometimes multiple times per day.

ed

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved