Forum Moderators: open
The MediaBot crawls deeper into the site without issue; the site runs AdSense. Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is lo-fi, and Xenu crawls it fine, as does the SearchEngineWorld sim spider.
Any ideas?
How do you differentiate "freshbot" and "deepbot"?
Previously, an IP in the 64.x.x.x range meant freshbot,
and an IP in the 216.x.x.x range meant deepbot.
Are these ranges still valid?
What is the behaviour of "freshbot" versus "deepbot"?
Will freshbot only fetch one or two levels of pages on a site, while deepbot crawls almost all the pages?
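For anyone eyeballing their own logs for this, here is a minimal sketch of sorting hits by IP prefix. It assumes the 64.x / 216.x ranges mentioned above, which are historical folklore and may well no longer be accurate:

```python
# Rough classifier using the IP prefixes discussed above.
# These ranges (64.x = freshbot, 216.x = deepbot) are an assumption
# from the thread, not anything Google has documented.
def classify_googlebot(ip: str) -> str:
    if ip.startswith("64."):
        return "freshbot"
    if ip.startswith("216."):
        return "deepbot"
    return "unknown"

# Example log IPs (made up for illustration).
for ip in ["64.68.82.12", "216.239.46.5", "10.0.0.1"]:
    print(ip, "->", classify_googlebot(ip))
```

In practice you would also want to confirm the user-agent string says Googlebot before trusting the IP prefix at all.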
I launched a site around 6th Feb (1280 static pages), then sat down to keep an eye on my logs for visits from Googlebot. Time and time again it looked for a robots.txt (404ing, as there isn't one), then would pick up index.htm, index it, and go no further into the site. This has happened nearly every day since 6 Feb. These visits by Googlebot are thanks to a small number of inbound links I set up to the site, from a few random locations on other sites I have.
Finally, today, after weeks of waiting, the bot did a deep crawl of not just the home page, but levels 1 & 2 below, and had a really good look around. I've not made any changes to this site for a while, as I thought I'd hang fire to see if the thing ever did get crawled.
I'm mighty relieved to see it calling by; after all, it's over a calendar month now since launch. Quite when the results of its crawl will make it into the live index is anyone's guess, but really I just wanted to reassure anyone who is worried that, assuming you have a few inbound links set up and easily navigable pages in place on your new site, you *should* see a deep crawl, just not straightaway.
Now it just needs to get my more recent sites on its radar! Time for a beer.
DoU
I just do not understand what Google is doing.
Well, my 2 cents worth is this, Google deep crawls my site every day or so - so 'I'm alright Jack.'
But I still think their SERPs in certain areas are rubbish. And I think their crawlers are experiencing technical problems. Why crawl an index page and not follow / index the links? It's just incompetence, isn't it?
The index page is updated all the time, new pages are added to the site all the time, and nothing is being crawled. This has been going on for a month now.
If Googlebot isn't following links or indexing new pages, why use Google anymore?
I could search through archive.org if I wanted to find old information with a lack of new content.
Maybe it's a problem, but it's more likely to be a feature. What about all those dynamically generated SEO pages full of spammy words? What if lots of those sites had a completely different link structure every time you would reload the site?
Maybe that's what the bot is supposed to do, see if a site really has a genuine link structure (I know of no legit sites in which ALL URLs change daily) by requesting the index file frequently during the first weeks. It could check links to other pages, and if more than 50% of the links have changed (just an example), it will flag the site as "SEO/spam" or something and not deepspider it. If it finds the site legit, then it will deepspider it after all.
Besides the fact I have no real proof for this theory, I'm pretty sure the people at Google are becoming more selective about what to index and what not to. They must be able to handle the expansion of the web, and at the current rate it's growing pretty fast. What if they run out of storage or processing power and are unable to keep upgrading at the fast pace of the web's growth? Even if my theory proves wrong, I know they are aware of this and will be taking action soon.
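Purely to illustrate that theory (it is speculation in this thread, not a documented Google mechanism), a crawler could compare the outbound link sets from two fetches of the same index page and skip the deep crawl if too many links changed:

```python
# Hypothetical sketch of the "changing link structure" check described
# above. The 50% threshold and the URLs are made up for illustration;
# this is not a documented Google algorithm.
def changed_fraction(links_then: set, links_now: set) -> float:
    """Fraction of current links that weren't present on the last visit."""
    if not links_now:
        return 0.0
    return len(links_now - links_then) / len(links_now)

first_visit = {"/about.html", "/products.html", "/contact.html", "/news.html"}
second_visit = {"/about.html", "/x9f2.html", "/q7b1.html", "/z3k8.html"}

# 3 of 4 current links are new (0.75 > 0.5), so under this theory the
# site would be flagged as suspicious and not deep-spidered.
suspicious = changed_fraction(first_visit, second_visit) > 0.5
print(suspicious)  # True
```

A legit site re-fetched over a few weeks would score near zero here, while a site regenerating random URLs on every load would score near one.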
Obviously I'm very new to all this, but today I think Google has deep crawled in a way. It went to every link on my homepage but no deeper. I just hope that at some point it actually looks at my sitemap, something I didn't have on my site until 4 weeks ago (I never knew how Google worked and relied on a Java menu for navigation).
Perhaps one day my whole site will be spidered.
I think your theory is plausible... but the only thing I have changed in the past month is the position of the link to the other pages in index.php, and that can't be a problem(?). The next pages are permanently the same: they are generated from a MySQL database, always the same. My link structure ends after 3 clicks in a static HTML file. G spidered all these HTML pages on March 7... I don't know why G now shows only a cached version from January... my site structure is like millions of other pages.
Yes, that's the right word. "reluctantly".
Googlebot is reluctantly visiting my sites. Googlebot has become lethargic. Googlebot is, well, dragging its butt!
What's the problem? Have we insulted the bot? Has someone abused the critter? Passed out too many tricks and not enough treats? Has Googlebot not been awarded the "crawler of the year" award at the Plex?
Let's all give Googlebot a morale boost. Tell Googlebot how he/she's been kicking Slurp's butt all over the web. Let's hear it for our favorite bot! YEAH GOOGLEBOT! GO GOOGLEBOT, GO!
Just in case it was my own problem, I took a look at the links to my new pages. As the pages were in different encodings, I had used gifs so that the links would display correctly to all users. But I found I had inadvertently left the alt tags empty. I added some alt text, and bingo! Googlebot came along and indexed the lot. Perhaps Googlebot ignores image links which do not have an alt or title text.
However, this could be just a coincidence. IMHO there has been a lull in normal Googlebot activity, possibly connected to the test spiders Google has been running, and this now seems to be over.
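If you want to check your own pages for the situation described above (image links with empty alt text), here is a quick sketch using Python's standard-library HTML parser. The sample markup is invented for illustration:

```python
# Find <a> links whose image has no alt text -- a self-check for the
# empty-alt situation described above. The sample HTML is made up.
from html.parser import HTMLParser

class ImageLinkChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_link = None       # href of the <a> we're currently inside
        self.missing_alt = []     # hrefs whose <img> lacks alt text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.in_link = attrs.get("href")
        elif tag == "img" and self.in_link:
            if not attrs.get("alt"):
                self.missing_alt.append(self.in_link)

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = None

page = ('<a href="/en/"><img src="en.gif"></a> '
        '<a href="/fr/"><img src="fr.gif" alt="French"></a>')
checker = ImageLinkChecker()
checker.feed(page)
print(checker.missing_alt)  # ['/en/']
```

Whether Googlebot actually skips such links is unproven, as the post says, but empty alt text hurts accessibility regardless, so it's worth fixing either way.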
Googlebot comes in once or twice a day.
It hits robots.txt, then index.html.
On the index.html page, however, it gets stuck in a "noscript" tag that redirects the robot to a "you need JavaScript" landing page.
I can't get rid of the JavaScript validation because my checkout procedure and shopping cart rely on it, but I'd like to have Google crawl my site!
What should I do here?
What to do:
A lot of e-commerce software uses JavaScript, and Google doesn't always follow these links (I know this to my cost: my site does OK, but many of the 'product detail' pages are only linked to by JavaScript, and Google doesn't even know about them). I could easily outrank my competitors if these detailed pages were included in the G index.
So, what to do is this:
Change your e-comm software: perhaps something to consider in the medium to long term.
But what you can do right now is 'hand edit' your index page after uploading, and make all those JS links nice simple HTML links that point to your product sections / individual product pages.
It's a bit of a pain to do, but if you save these links in a .txt file you can always cut and paste them in future to speed things up.
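If the hand-editing gets tedious, the rewrite above can even be scripted. This is a hypothetical sketch: the `onclick="window.location='...'"` markup pattern is an assumption, and you would need to adapt the regex to whatever your cart software actually emits:

```python
# Hypothetical sketch of the 'hand edit' step above: rewrite anchors
# that navigate via JavaScript into plain HTML links a crawler can
# follow. The onclick pattern matched here is an assumed example,
# not what any particular e-comm package produces.
import re

JS_LINK = re.compile(
    r"""<a\s+href="#"\s+onclick="window\.location='(?P<url>[^']+)'[^"]*">""")

def to_plain_links(html: str) -> str:
    """Replace JS-navigation anchors with simple href links."""
    return JS_LINK.sub(lambda m: '<a href="%s">' % m.group("url"), html)

page = ('<a href="#" onclick="window.location='
        "'/products/widget.html'; return false\">Widget</a>")
print(to_plain_links(page))
# <a href="/products/widget.html">Widget</a>
```

Run it over a saved copy of the page before uploading, and keep the original in case the substitution catches something it shouldn't.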
Best of luck ;)
Googlebot is not crawling my index page, which has very good PageRank. No freshbot indication for 3 weeks. Googlebot HAS been crawling my home page and more every day, but for the last two weeks I have not had a fresh tag, although I always had one before. This started when the new crawler started coming around.