Googlebot not crawling

Seeks index page, then leaves

feeder

6:12 am on Feb 12, 2004 (gmt 0)

Googlebot visits often. It requests the index page, but doesn't crawl any deeper. This happens two or three times a day.

The MediaBot crawls deeper into the site without issue. The site runs AdSense.

Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is lo-fi and xenu crawls it fine, as does the searchengineworld sim spider.

Any ideas?
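One way to double-check that robots.txt really isn't the culprit is to run its contents through Python's standard `urllib.robotparser`. This is only a rough sketch: the two robots.txt samples and the example.com URL are made up for illustration, and it says nothing about how Googlebot itself decides what to crawl.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt contents, for illustration only
permissive = "User-agent: *\nDisallow:\n"
blocking = "User-agent: Googlebot\nDisallow: /\n"

# Hypothetical deep page on a placeholder domain
deep_page = "http://www.example.com/gallery/page1.html"

print(googlebot_allowed(permissive, deep_page))
print(googlebot_allowed(blocking, deep_page))
```

If the check passes for a deep page, robots.txt can probably be ruled out and the cause lies elsewhere.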

Powdork

5:48 pm on Mar 18, 2004 (gmt 0)

You can read in the Tracking and Logging Forum [webmasterworld.com] and get some great ideas. There is usually a 'Which stats program do you use' thread going if you look hard enough. Also, I have found much better tech support there than with the products' producers.

Oops, I forgot why I came here. She just grabbed all the gallery pages she skipped before. I guess she just had to go back for more PR. I'm pretty stoked as my site just grew by a factor of 4 since yesterday in Google's eyes.

mayor

3:51 pm on Mar 19, 2004 (gmt 0)

Hey Googleguy, here's my recommendation to Google, as an answer to Yahoo's pay-for-inclusion program:

Index new pages that contain AdSense immediately. That's right, send mediabot out right away and index the page right away. BINGO! Puts those pages to work right away. Google begins earning revenues and the website begins earning revenues too, RIGHT AWAY.

To get their new pages indexed right away, webmasters would have a choice of paying Yahoo or putting AdSense on their new pages and receiving revenue from Google. Which do you think they will choose?

Dolemite

4:03 pm on Mar 19, 2004 (gmt 0)

Index new pages that contain AdSense immediately. That's right, send mediabot out right away and index the page right away. BINGO! Puts those pages to work right away. Google begins earning revenues and the website begins earning revenues too, RIGHT AWAY.

Do you think they haven't considered this?

Google has consistently tried to maintain a barrier between their revenue streams and SERPs. It's an admirable goal and tends to squelch accusations of impropriety. Why would they change that policy now?

BallochBD

4:22 pm on Mar 19, 2004 (gmt 0)

My site must be a revenue stream because they have put a barrier up to stop the Googlebot crawling it ;o)

Powdork

5:50 am on Mar 20, 2004 (gmt 0)

<snip>oops, I read the quotes without reading the original post before replying</snip>

MrSpeed

12:37 pm on Mar 20, 2004 (gmt 0)

I was getting ready to ask whether the common thread might be affiliate sites.

I then remembered I launched a hobby site about a month ago. It's the same story with googlebot, lots of homepage visits.

As a matter of fact I'm getting more traffic from dmoz and AV to the site than google. It's pretty sad that AV is outperforming google.

HarryM

2:31 pm on Mar 20, 2004 (gmt 0)

I launched a hobby site about a month ago. It's the same story with googlebot, lots of homepage visits

You don't say what PR Google has given the site, if any, or what the value of the inbound links is.

IMHO Google appears to have triggers that determine whether a site is worth crawling, and to what extent. I think that if Google has followed a link from a high PR page to get to a new site, it awards a temporary highish PR to the page it found on the new site (typically the home page). This triggers deeper crawls, and eventually Google awards a more permanent PR based on pages and inbound links found, which could be greater or less.

If it has followed a low PR link the temporary PR awarded may not be sufficient to trigger deeper crawling. Increasing the PR depends upon Google following other inbound links, but this depends on how often Google crawls the linking pages.

This also applies in reverse. If the high value inbound links start disappearing, the PR reduces, crawling reduces, and if the PR becomes low enough, the site starts to die.

I'm not sure if the above scenario is absolutely accurate, but common sense says that Google must have some mechanism in place to determine what sites are important enough to be worth crawling immediately, and those that are of less priority. It's a big web out there, and getting bigger.

Patricio

12:18 am on Mar 21, 2004 (gmt 0)

Sounds reasonable, but this was not the googlebot behavior some weeks ago... normally it wouldn't stop every day at the homepage, even on new sites with no PR or PR zero. Perhaps the bot didn't go very deep, but surely it did go further than the homepage.

Patricio

11:35 am on Mar 23, 2004 (gmt 0)

Finally, after six days of no googlebot activity, it now seems googlebot is doing a deep crawl. I don't know yet how deep, but it is visiting the second and third levels of the site.

BallochBD

1:16 pm on Mar 23, 2004 (gmt 0)

You are lucky Patricio! I still see no signs of recovery. My page rank remains at zero (from 5) and I only have four cached pages from 80 plus. I get absolutely no Google traffic and apparently no action from them despite assurances that they have passed my emails on to their engineers. It's now about two months since they targeted my business and for what?

MrSpeed

1:54 pm on Mar 23, 2004 (gmt 0)

I still see nothing on my new sites from as far back as early February.

It appears that Google has removed a number of crawlers from service and has had to prioritize the crawls due to fewer resources. It seems like new sites are getting ignored except for the home page. Existing sites are still getting crawled based on their PR.

I suspect that Google is getting ready to pull the trigger on a revamped crawler. There are a few threads about a test crawler crawling js files.

And who knows what else is in store? Maybe they will now be able to follow cgi/php redirects.

It should be interesting.

Patricio

2:47 pm on Mar 23, 2004 (gmt 0)

Well, it seems to be a really deep crawl! Googlebot is still coming and is now going to the fourth level. I didn't do a site map because, since googlebot was stopping at the homepage, it seemed to be a waste of time. But I'll do it now because to reach some articles you must click 10 times and I don't think Google will go so deep in a PR2 site (the home page is PR2). Now that I know it is "googleable" I'll prepare it for the next crawl.

It would be interesting to know if other sites with the same problem are being crawled now.

HarryM

2:02 am on Mar 24, 2004 (gmt 0)

to reach some articles you must click 10 times

I would suggest that could be a real problem for users, who typically have the attention span of a gnat. The golden rule is not more than 2 or 3 clicks to get anywhere. If the average user hasn't found what he wants by then, he's gone.

Your idea of adding a sitemap is a good one. Then as far as Google is concerned each page is only 2 clicks deep. This should help with crawling and pushing PR deeper into the site.
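To put a number on the "2 clicks deep" point, here is a small sketch that measures click depth with a breadth-first search over a site's internal links. The page names and link structure are invented for the example, and how deep Googlebot is actually willing to go is not something this can tell you.

```python
from collections import deque


def click_depths(links: dict, start: str = "/") -> dict:
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site where each article is only reachable from the previous one
chain = {"/": ["/a1"], "/a1": ["/a2"], "/a2": ["/a3"], "/a3": ["/a4"]}

# Same site with a sitemap page linked from the home page
with_sitemap = dict(chain)
with_sitemap["/"] = ["/a1", "/sitemap"]
with_sitemap["/sitemap"] = ["/a1", "/a2", "/a3", "/a4"]

print(max(click_depths(chain).values()))         # deepest page without a sitemap
print(max(click_depths(with_sitemap).values()))  # deepest page with a sitemap
```

In this toy site the deepest article drops from 4 clicks to 2 once the sitemap is linked from the home page.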

steveb

3:24 am on Mar 24, 2004 (gmt 0)

BallochBD, your index page is duplicated by your home.html page, and that seems to have all the internal links pointing at it. If Google chooses the home.html as the canonical page, then it has no external links to attract the crawler. In any case you gut your internal pr with that duplicate home.html page. Maybe it isn't pointless but it seems to be, so why does it exist? It certainly invites trouble.

BallochBD

4:10 pm on Mar 24, 2004 (gmt 0)

WRT the above, I am no expert in this and I am not sure that I understand what you are saying. The way my site is configured and hosted, home.htm is recognised as the index file, i.e. the file that is referenced when someone links to the domain name. What do you mean by a duplicate home.html page?

irishaff

10:16 pm on Mar 24, 2004 (gmt 0)

all my deep pages that were changed a week ago now have fresh tags on them, some of them are deep and are pr2/3

steveb

11:10 pm on Mar 24, 2004 (gmt 0)

Balloch, all your internal links go to site.com/home.html rather than simply site.com/. Even if home.html defaults to the root domain it is unnecessary clutter to have it in the links, but this seems like you could easily be confusing googlebot.
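As a rough illustration of the cleanup, the sketch below rewrites links that end in a default document so they point at the bare directory URL. The list of default document names is an assumption (use whatever the server really serves as its index file), and it only handles absolute or root-relative links.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default-document names; adjust to match the server configuration
DEFAULT_DOCUMENTS = ("home.html", "index.html")


def canonical_link(url: str) -> str:
    """Strip a trailing default document so the link targets the directory URL."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename in DEFAULT_DOCUMENTS:
        parts = parts._replace(path=directory + "/")
    return urlunsplit(parts)


print(canonical_link("http://site.com/home.html"))
print(canonical_link("http://site.com/widgets/blue.html"))
```

The first link comes out as http://site.com/ and the second is left alone, so every internal link to the home page ends up pointing at one URL.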

MrSpeed

11:16 pm on Mar 24, 2004 (gmt 0)

Somebody refresh my memory. Deepbot and freshbot were two separate animals at one time and then they sort of became freshdeepbot.

Could it be that we're seeing more of a freshbot style crawl?

AdamG

3:07 pm on Mar 25, 2004 (gmt 0)

Hi,

This is my first post and I would like to say hi to everyone.

I have built a small static site. Submitted it to google about 3 months ago. Took a couple of weeks to visit the first time and has then returned on the same day every month since - no deep crawl - just gets the robots.txt file and index page and leaves.

I have added a sitemap today hoping it might make a difference.

Hoping google comes 'a calling' soon for the real deal.

Adam

MrSpeed

4:15 pm on Mar 25, 2004 (gmt 0)

AdamG - Welcome!

When I checked my logs this morning I noticed that googlebot came twice within 3 hours.

I just checked again now and I am finally getting a deeper crawl. We'll see how many pages get crawled.

These two sites have been waiting for a crawl since early February. To tell the truth I was being a little stubborn about trying some of the suggestions to encourage a deeper crawl because there was nothing different about these sites that I haven't done in the past.

nkakar

9:13 pm on Mar 25, 2004 (gmt 0)

I've done a website based on a script that has internal links to my other pages on the same site, yet no pages link to each other. They all link to a page that doesn't link back to them and is not being linked from anywhere else, e.g. test1.htm has a link for a widget that goes to test1a.htm, which has the link for the same widget going to test1b.htm, and so forth.

Is that a good idea? I did this so there's no cross linking and multiple pages of the same content with different URLs. Let me know, guys.

ronin100

9:56 pm on Mar 25, 2004 (gmt 0)

Hi!
I have posted on a different forum several times and have given up on them (no responses). My questions are: 1. Why does the googlebot come to my site several times a day when my software says it leaves without spidering my site? There is no robots disallow text or anything like that on my site. My other site is being crawled and updated daily, but the one in question is being "bumped into" and left; the last cache is over a month old. 2. I have both of my sites set up with site maps, HTML text links in the navigation bar, breadcrumbs and small text links in the footer of every page that duplicate the left navigation. Image alt tags have the product names. Is this too many links? I just removed them from the site that isn't being freshly cached as an experiment. Any suggestions?
Thanks!
Chuck

BallochBD

10:16 am on Mar 26, 2004 (gmt 0)

Hi Ronin! Welcome to Webmaster World.

Don't try too hard with this. It seems that Google has developed a major problem in crawling sites at the moment. No one from Google has actually said so but there are so many people affected it is hard to conclude otherwise.

My own site has not been crawled properly for weeks and despite numerous pleas to Google they have done nothing about it. I think my symptoms are similar to most of the others. For example, this morning Googlebot came along, looked at my robots.txt and index page, then left. Despite these random, brief visits I still have no PR, title, cache or descriptions on my pages.

No one knows what is going on and Google have not been tempted to comment so just don't bust a gut on this one.

Schneewittchen

10:39 am on Mar 26, 2004 (gmt 0)

It looks like a daily downgrade on Google. First G had over 100,000 sites in cache, then 40,000, later 20,000 and now fewer than 10,000. The index and main page are no longer listed. If I search for "mysitename" I find only spammer results (sites that are using my site name and parts of my content for redirects to their own pages). What is going on... it hurts :-(

MrSpeed

12:49 pm on Mar 26, 2004 (gmt 0)

nkakar -
Why do you think it is necessary to link pages in this fashion within the site? It's not natural at all.

BallochBD

1:20 pm on Mar 27, 2004 (gmt 0)

Can anyone help me with this? I noted today that Googlebot visited my site and I found the following two entries in the logs.

"GET /robots.txt HTTP/1.0" 200 23 www. mydomain.co.uk "-" "Googlebot/2.1

"GET / HTTP/1.0" 200 44320 www. mydomain.co.uk "-" "Googlebot/2.1

I don't know a lot about how to interpret these results. Does this second entry (with no files specified) signify a deep crawl or whatever?
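For anyone else trying to read entries like these, here is a rough sketch that pulls the requested paths out of Googlebot lines and checks whether the visit went past the home page. The regular expression is an assumption based only on the two lines quoted above, so other log formats will need a different pattern, and the function names are just made up for the example.

```python
import re

# Matches the quoted request, status code and byte count, e.g.
# "GET / HTTP/1.0" 200 44320
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3}) (\d+|-)')


def googlebot_paths(log_lines):
    """Return the paths requested on lines that mention Googlebot."""
    paths = []
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST.search(line)
        if match:
            paths.append(match.group(1))
    return paths


def is_homepage_only(paths, shallow=("/", "/robots.txt")):
    """True if the bot asked for nothing beyond robots.txt and the home page."""
    return bool(paths) and all(path in shallow for path in paths)


# The two entries quoted in the post, copied as-is
log = [
    '"GET /robots.txt HTTP/1.0" 200 23 www. mydomain.co.uk "-" "Googlebot/2.1',
    '"GET / HTTP/1.0" 200 44320 www. mydomain.co.uk "-" "Googlebot/2.1',
]

paths = googlebot_paths(log)
print(paths)
print(is_homepage_only(paths))
```

A request for "/" is just the home page, so a visit that only shows /robots.txt and / never went any deeper.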

BallochBD

1:31 pm on Mar 27, 2004 (gmt 0)

It's OK - I just figured it out myself. (Not a deep crawl unfortunately!)

Schneewittchen

1:36 pm on Mar 27, 2004 (gmt 0)

@Balloch
I see this 2 times a day in my logs... but never an updated version in the SERPs. I have lost 100% of my Google traffic and my site is no longer listed... only old HTML files like www.domain.com/anything.html. PR is still the same (6), but no index file, no description, no title, no cache, no deep crawl... I only see G-freshbot get the index (with frameset) and robots.txt, and then it leaves.

BallochBD

2:03 pm on Mar 27, 2004 (gmt 0)

Anyone else seeing this?

steve40

2:25 pm on Mar 27, 2004 (gmt 0)

Hi all,
Just thought I would share my similar experiences, with a sort of timeline, on one of my sites with similar problems.

1 launched new site (6 weeks, all pages indexed in Google), appearing in SERPs ok

2 added classified ads software (still ok, but some URLs appearing with no title or description)

3 added click tracking via PHP/MySQL (over the next 2 months no title or description and no Google referrals)

PR also went down from PR4 to PR1 (the reason, I thought, was some sort of penalty)

Like the fool I am, I decided it must be some sort of penalty and just left the domain sitting in limbo for 3 months before looking at it again.

This is what I then did when I investigated and tried some stuff:
took the classified ads software off
stopped using the PHP/MySQL click tracking

Over the next 6 weeks all pages were back in Google and appeared back in the SERPs.
My own view is that it was the PHP click tracking that stopped Google indexing, but I'm not 100% sure on that.
Don't know if the above is relevant to any of you guys, but these are just my own experiences.

steve

This 182 message thread spans 7 pages.