
Google News Archive Forum

This 182 message thread spans 7 pages.
Googlebot not crawling
Seeks index page, then leaves
feeder

10+ Year Member



 
Msg#: 21899 posted 6:12 am on Feb 12, 2004 (gmt 0)

Googlebot visits often. It requests the index page, but doesn't crawl any deeper. This happens two or three times a day.

The MediaBot crawls deeper into the site without issue. The site runs AdSense.

Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is lo-fi and xenu crawls it fine, as does the searchengineworld sim spider.

Any ideas?
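One way to pin down the pattern described above is to split the access log by crawler and compare which paths each one requests. The sketch below assumes the server writes Combined Log Format and that the two crawlers identify themselves with "Googlebot" and "Mediapartners-Google" in the user-agent string. The sample lines, IP addresses, and paths are invented for illustration and are not taken from anyone's logs in this thread.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

def crawler_paths(lines):
    """Count requested paths for each crawler found in the log lines."""
    counts = {"Googlebot": Counter(), "Mediapartners-Google": Counter()}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in Combined Log Format
        for bot in counts:
            if bot in match.group("agent"):
                counts[bot][match.group("path")] += 1
    return counts

# Hypothetical sample lines, invented for illustration only.
SAMPLE = [
    '192.0.2.10 - - [12/Feb/2004:06:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 24 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [12/Feb/2004:06:01:02 +0000] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [12/Feb/2004:14:30:10 +0000] "GET / HTTP/1.0" 304 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [12/Feb/2004:07:15:00 +0000] "GET / HTTP/1.0" 200 5120 "-" "Mediapartners-Google/2.1"',
    '192.0.2.20 - - [12/Feb/2004:07:15:05 +0000] "GET /widgets/a.html HTTP/1.0" 200 3100 "-" "Mediapartners-Google/2.1"',
    '192.0.2.20 - - [12/Feb/2004:07:15:09 +0000] "GET /widgets/b.html HTTP/1.0" 200 2900 "-" "Mediapartners-Google/2.1"',
]

if __name__ == "__main__":
    for bot, paths in crawler_paths(SAMPLE).items():
        print(bot, dict(paths))
```

If Googlebot's counter only ever contains the robots.txt and index paths while the AdSense crawler's counter lists inner pages, the logs confirm the behaviour rather than a broken link structure.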

 

ThomasB

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 11:48 am on Feb 12, 2004 (gmt 0)

feeder, since when have you been seeing that? I have a site that has been public for 10 days where I see the same thing. Some other domains get about 100 Googlebot hits/day, which is also just a fraction of the normal crawling.

vrtlw

10+ Year Member



 
Msg#: 21899 posted 11:55 am on Feb 12, 2004 (gmt 0)

It may be worth trying to navigate the site using a browser like lynx.
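A scripted version of the same check is to list the links a client sees when it reads only the plain HTML, the way a text browser does. This is a minimal sketch using Python's standard `html.parser`. The sample markup and the `example.com` URLs are made up, and it is only an assumption that a crawler of this era would ignore script-only links in the same way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from plain <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Fragment-only and javascript: hrefs lead nowhere without a script engine.
            if value.startswith(("#", "javascript:")):
                continue
            self.links.append(urljoin(self.base_url, value))

def plain_links(html, base_url):
    """Return the links reachable from the page without running any script."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

# Hypothetical index page, invented for illustration only.
SAMPLE_HTML = """
<html><body>
<a href="/about.html">About</a>
<a href="products/list.html">Products</a>
<a href="#" onclick="go('hidden.html')">Script-only link</a>
</body></html>
"""

if __name__ == "__main__":
    for link in plain_links(SAMPLE_HTML, "http://www.example.com/"):
        print(link)
```

Any page that is reachable only through the script-only link would not show up in this list, which is the kind of problem a lynx walk-through is meant to expose.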

tantalus

10+ Year Member



 
Msg#: 21899 posted 1:00 pm on Feb 12, 2004 (gmt 0)

Is your index page returning a 304 status (Not Modified)? I don't think Google will go any deeper if it receives this status.
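For anyone unsure what triggers a 304: the client sends an `If-Modified-Since` header with the date of its stored copy, and the server answers 304 when the page is no older than that date. The sketch below shows only that decision, with a hypothetical helper name (`conditional_status`) and made-up dates. Real servers also consider `ETag` headers and their own configuration, so treat this as an illustration, not a description of any particular server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the page is unchanged since the client's copy, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the page
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if client_copy.tzinfo is None:
        client_copy = client_copy.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= client_copy:
        return 304
    return 200

# Hypothetical dates, invented for illustration only.
crawler_copy = format_datetime(
    datetime(2004, 2, 12, 6, 12, tzinfo=timezone.utc), usegmt=True
)
unchanged_page = datetime(2004, 2, 10, 9, 0, tzinfo=timezone.utc)
edited_page = datetime(2004, 2, 13, 9, 0, tzinfo=timezone.utc)

if __name__ == "__main__":
    print(conditional_status(unchanged_page, crawler_copy))
    print(conditional_status(edited_page, crawler_copy))
```

Checking the status column of the access log for Googlebot's index-page requests shows which of the two cases the server is actually returning.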

pardo

10+ Year Member



 
Msg#: 21899 posted 1:28 pm on Feb 12, 2004 (gmt 0)

Is your site new, or did you recently move to another host?

tantalus

10+ Year Member



 
Msg#: 21899 posted 1:34 pm on Feb 12, 2004 (gmt 0)

No to both.

Gbot would come on a regular basis to my index page, receive a 304 status, and then go away again like clockwork.

Added a new link and some minor changes to my index and gbot crawled but only one level deep.

Waiting for G to crawl the rest of the pages the next level down.

walrus

10+ Year Member



 
Msg#: 21899 posted 4:20 pm on Feb 12, 2004 (gmt 0)

It's normal for it to grab the index one day and then usually return a week or two later and spider everything.
If it's been going on longer than a month, then you can be sure there's something wrong.
I'm no expert, but this has been par for the course in my logs for a year.

johnlim

10+ Year Member



 
Msg#: 21899 posted 2:07 am on Feb 13, 2004 (gmt 0)

I also have a site where Googlebot visited and fetched about 80 pages, then stopped crawling any deeper.

Now every day Googlebot just goes to the homepage and then leaves.

I cannot understand why Googlebot doesn't continue to crawl the remaining hundreds of pages......

wkitty42

10+ Year Member



 
Msg#: 21899 posted 2:16 am on Feb 13, 2004 (gmt 0)

do you have a sitemap page linked to your main page? if not, i'd do it... and include a link to it from every page, too...

feeder

10+ Year Member



 
Msg#: 21899 posted 2:25 am on Feb 13, 2004 (gmt 0)

Thanks.

As I say, other spiders crawl it fine. It isn't a link issue. Googlebot visits two or three times a day, grabs the first page, leaves.

Yes, the site is new.

HarryM

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 2:30 am on Feb 13, 2004 (gmt 0)

You don't say what PR your index page has. If it's not very much that will discourage Google from going any deeper.

As tantalus said, the other problem could be if your index page isn't modified. My solution is to add a link from the index page to any new pages. The next time Google visits it sees the index page has been modified and usually within a few hours has come back to take the new pages. I remove the links when the toolbar PR for the new pages shows white.

I usually get about 70% of my pages indexed every week, but I don't know whether this is because of my PR or because Google perceives the site as active.

Harry

feeder

10+ Year Member



 
Msg#: 21899 posted 2:37 am on Feb 13, 2004 (gmt 0)

I have many sites. I've never had any problems getting them deep crawled.

This is the first time I've seen this type of behavior. I wondered if it is something other people are seeing, or if it's just me.

Stefan

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 3:38 am on Feb 13, 2004 (gmt 0)

Just speculating...

Googlebot crawls higher PR pages with a greater frequency than lower PR pages. If your new site has an index page PR of 4, and inner pages that are less than that, then the bot won't hit the inner pages very often.

On a personal note: I got back to the internet several days ago after having been in places where digital, at best, means counting on your fingers. I missed Austin entirely... it didn't seem to make a lot of difference for us, but we have serps for some minor kw combos bouncing in and out of the serps with every other search. I can either dig into the WW archives for Austin, to figure this out, or just assume that Google has become slightly schizophrenic, (not that there's anything wrong with that). Anyone who feels like giving me a quick run-down on what happened gets a free underground tour in Jamaica.

blakekr

10+ Year Member



 
Msg#: 21899 posted 3:53 am on Feb 13, 2004 (gmt 0)

I launched two sites before Austin that were deep crawled and ranked. I launched one that was "due" to be listed/ranked right around Austin, but Google won't touch it.

The previous sites are not being recrawled or updated either. It's driving me crazy, but you're not alone.

UK_Web_Guy

10+ Year Member



 
Msg#: 21899 posted 6:28 pm on Feb 13, 2004 (gmt 0)

feeder

What is your site's PR?

Need3lives

10+ Year Member



 
Msg#: 21899 posted 7:19 pm on Feb 21, 2004 (gmt 0)

You are not alone - a new site I launched about 8 days ago is seeing the same thing - index page spidered and indexed for over a week, but no additional pages spidered or added... My site includes a site map, linked to from all pages, and uses only basic text links to link to most pages off the index as well. It is a new site, about 60 pages so far, with just index.htm listed in Google. And I am getting a fresh date for the index as well - last visit according to Google was the 19th.

barronbali

10+ Year Member



 
Msg#: 21899 posted 8:18 pm on Feb 21, 2004 (gmt 0)

Mine has been crawled by Google every day or two, but only the index page. Even when I changed other pages, Google still crawled only my index page. Also, in the SERPs I had 1st and 5th positions, but over the last 4 updates they have gone. The site doesn't even appear in the first 6 pages.

Hissingsid

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 8:43 pm on Feb 21, 2004 (gmt 0)

Hi,

I'm seeing this also. Googlebot visits once a day, picks up robots.txt and the index page, and then leaves. This is a relatively new thing (2 or 3 weeks; I haven't had time to plough back through my logs) and coincides with the sites in question being dropped from the SERPs.

I worried for a while that this may be caused by a poison word. I have a folder called redirects with pages that do a meta refresh to an outside page. Perhaps redirects is a poison word. I disallowed this for a while in robots.txt. I've just gone and stripped this down to just one line.

User-agent: *

And I've changed the home page and that directory name to something less obvious. I then spent much of yesterday submitting pages that have a link to this domain to Google's submit page in the vain hope that Googlebot might follow the backlinks and think "this site's worth crawling". PR's low, only 3, but pages from this site were previously #1 for what I would call secondary three-word terms.

Best wishes

Sid
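To see what a given robots.txt permits before and after an edit like the one described above, the rules can be fed to Python's standard `urllib.robotparser`. This is a rough sketch: the `/redirects/` rule is a guess at what the earlier disallow line might have looked like, the `example.com` URLs are placeholders, and Python's parser is its own reading of the format, so its answers are not guaranteed to match Googlebot's.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    """Ask Python's robots.txt parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Assumed form of the earlier file that blocked the redirects folder.
WITH_DISALLOW = """User-agent: *
Disallow: /redirects/
"""

# The stripped-down, one-line file quoted in the post.
SINGLE_LINE = "User-agent: *"

# Placeholder URLs, invented for illustration only.
REDIRECT_URL = "http://www.example.com/redirects/out.html"
HOME_URL = "http://www.example.com/index.html"

if __name__ == "__main__":
    print("with Disallow, redirects page:", allowed(WITH_DISALLOW, "Googlebot", REDIRECT_URL))
    print("with Disallow, home page:", allowed(WITH_DISALLOW, "Googlebot", HOME_URL))
    print("single line, redirects page:", allowed(SINGLE_LINE, "Googlebot", REDIRECT_URL))
```

Running it against the real file contents, rather than these stand-ins, is the quickest way to rule robots.txt in or out as the cause.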

MrSpeed

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 3:44 pm on Mar 3, 2004 (gmt 0)

Has anybody had any luck getting crawled?

I created a new site almost a month ago and linked to it from a few PR4 sites. Same story: the index page gets visited every day and appears in the index. Google hasn't crawled any deeper though.

feeder

10+ Year Member



 
Msg#: 21899 posted 8:41 pm on Mar 3, 2004 (gmt 0)

feeder - What is your site's PR?

I'm not sure how that's relevant. New sites don't show PR, but that doesn't stop them getting crawled.

The site has strong inbound linking.

Update: the site has been crawled, and pages included in the index. Googlebot's behaviour hasn't changed, however. It arrives, grabs the index page, leaves. Once in a blue moon it will crawl half the site.

Odd.

subway

10+ Year Member



 
Msg#: 21899 posted 10:09 pm on Mar 3, 2004 (gmt 0)

I've noticed the very same thing with sites from PR4-PR5, new and old - but mostly new. Google grabs the robots and index page and then leaves. This goes on for weeks.

I've noticed this ever since the Florida update. GB seems to be much much slower at crawling whole sites these days.

metagod

10+ Year Member



 
Msg#: 21899 posted 10:56 pm on Mar 3, 2004 (gmt 0)

i have a theory that google won't deepcrawl your website unless your sub pages have incoming links from external sources, like a different ip, different domain... only then will your page be worthwhile crawling..

do you concur?

MrSpeed

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 11:58 pm on Mar 3, 2004 (gmt 0)

do you concur?

No.

steveb

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 21899 posted 12:15 am on Mar 4, 2004 (gmt 0)

"do you concur?"

Nope.

HarryM

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 12:26 am on Mar 4, 2004 (gmt 0)

I'm not sure how that's relevant.
re site PR.

i have a theory that google won't deepcrawl your website unless your sub pages have incoming links from external sources

My take on this question is that these are two aspects of the same issue. PR is very relevant.

The number of pages Google crawls is probably proportional to the PR value of the index page. If the index page is PR0 because Google had not yet determined its appropriate value, then there is little chance of being crawled.

Even when the index page is more than PR0, as Google penetrates deeper the PR decreases and at a certain point Google stops crawling. The presence of deep links boosts the PR for those pages and so Google continues crawling. However once the index page becomes a reasonable value the presence of deep links is not so necessary, at least as far as crawling goes.

I suspect another factor is that if Google does not discover any changed or new pages part way through its crawl then it abandons the crawl.

All IMHO, and I disclaim any responsibility for being wrong. :)

Harry
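The two claims above, that PR falls off as you go deeper and that a deep link lifts the pages it points at, can be tried out on a toy graph with the published PageRank recurrence. This is only a sketch under stated assumptions: the six-page site, the page names, the 0.85 damping factor, and the single external page are all invented, and nothing here says how Google actually schedules its crawl.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simple power-iteration PageRank over a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source in pages:
            targets = links.get(source, [])
            if not targets:
                targets = pages  # dangling page: spread its rank evenly
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def site(external_targets):
    """Hypothetical site: index -> a, b -> a1, b1, every page linking back to index."""
    return {
        "external": external_targets,
        "index": ["a", "b"],
        "a": ["index", "a1"],
        "b": ["index", "b1"],
        "a1": ["index"],
        "b1": ["index"],
    }

base = pagerank(site(["index"]))        # external page links to the index only
deep = pagerank(site(["index", "a1"]))  # external page also deep-links to a1

if __name__ == "__main__":
    for page in ("index", "a", "a1"):
        print(page, round(base[page], 4), round(deep[page], 4))
```

In this toy graph the index page ranks above the first-level pages, which rank above the second-level pages, and adding the deep link raises the score of `a1`. Whether crawl depth really follows those scores is exactly the point under dispute in this thread.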

feeder

10+ Year Member



 
Msg#: 21899 posted 12:34 am on Mar 4, 2004 (gmt 0)

You're wrong :)

One link can get a new site crawled, no problem.

HarryM

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 12:42 am on Mar 4, 2004 (gmt 0)

One link can get a new site crawled, no problem

How many pages are we talking about here? 10? 100? 1000? And does Google return?

HarryM

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 21899 posted 12:46 am on Mar 4, 2004 (gmt 0)

Hey, who cares about Google! I've just made Preferred Member. So who's going to offer me a beer?

metagod

10+ Year Member



 
Msg#: 21899 posted 2:13 am on Mar 4, 2004 (gmt 0)

feeder, if you guys even cared to read through my post properly you would see that I said an EXTERNAL link... without the external link your page is not worth viewing because no-one else is voting for your page...

do you concur now?

oh and feeder, just how rude are you? I'm WRONG? have some respect for other people...

feeder

10+ Year Member



 
Msg#: 21899 posted 2:23 am on Mar 4, 2004 (gmt 0)

feeder, if you guys even cared to read through my post properly you would see that I said an EXTERNAL link... without the external link your page is not worth viewing because no-one else is voting for your page...do you concur now?

No :)

I've been a full-time SEM since 2001, I know what a link is and how Google crawls. My question relates to a recent CHANGE in crawl activity. As I said in my posts, the linking structures of the site in question, both internal and external, are strong.

oh and feeder, just how rude are you? I'm WRONG? have some respect for other people...

I wasn't talking to you, I was talking to Harry :)
I was pointing out that I thought he was wrong in this instance. I know this because I have access to countless clients' site logs that demonstrate otherwise. I can, and do, get sites crawled with one inbound link to an index page.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved