
Google News Archive Forum

This 182-message thread spans 7 pages.
Googlebot not crawling
Seeks index page, then leaves
feeder




msg:185700
 6:12 am on Feb 12, 2004 (gmt 0)

Googlebot visits often. It requests the index page, but doesn't crawl any deeper. This happens two or three times a day.

The MediaBot crawls deeper into the site without issue. The site runs AdSense.

Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is lo-fi, and Xenu crawls it fine, as does the SearchEngineWorld sim spider.

Any ideas?
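One way to check this pattern against the raw logs is to tally, per user agent, which paths were requested and with what status. A minimal Python sketch follows, assuming an Apache "combined" format access log. The sample lines, IP addresses and paths are hypothetical, and user-agent strings are matched as plain substrings (they can be spoofed, so this only reports what the log claims).

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def paths_by_agent(lines, agent_token):
    """Count (path, status) pairs for requests whose user agent contains agent_token."""
    hits = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and agent_token in m.group("agent"):
            hits[(m.group("path"), m.group("status"))] += 1
    return hits

# Hypothetical log lines, for illustration only
SAMPLE = [
    '66.249.64.1 - - [12/Feb/2004:06:12:00 +0000] "GET /robots.txt HTTP/1.0" 200 24 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.64.1 - - [12/Feb/2004:06:12:01 +0000] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '64.68.80.10 - - [12/Feb/2004:07:40:00 +0000] "GET /articles/page1.html HTTP/1.1" 200 5120 "-" "Mediapartners-Google/2.1"',
]

print(paths_by_agent(SAMPLE, "Googlebot"))
print(paths_by_agent(SAMPLE, "Mediapartners-Google"))
```

If the Googlebot tally contains only `/robots.txt` and `/` while the Mediapartners-Google tally reaches inner pages, the log matches the behaviour described here.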

 

ThomasB




msg:185701
 11:48 am on Feb 12, 2004 (gmt 0)

feeder, since when have you been seeing that? I have a site that has been public for 10 days, and I see the same thing. Some other domains get about 100 Googlebot hits/day, which is also just a fraction of normal crawling.

vrtlw




msg:185702
 11:55 am on Feb 12, 2004 (gmt 0)

It may be worth trying to navigate the site using a browser like lynx.
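A text browser follows only ordinary `<a href>` links, so it shows roughly what a simple crawler can reach. If lynx isn't handy, a rough stand-in is to list the `<a href>` targets on a page with Python's standard `html.parser`. The page below is hypothetical, and this is a sketch of the idea, not a model of how Googlebot parses pages.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a name="..."> anchors and empty hrefs
                self.links.append(href)

def plain_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Hypothetical page: "Widgets" is only reachable via a script handler,
# so neither a text browser nor this parser sees a link for it.
PAGE = """<html><body>
<a href="/about.html">About</a>
<span onclick="location.href='/widgets.html'">Widgets</span>
<a name="top"></a>
<a href="/sitemap.html">Site map</a>
</body></html>"""

print(plain_links(PAGE))  # ['/about.html', '/sitemap.html']
```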

tantalus




msg:185703
 1:00 pm on Feb 12, 2004 (gmt 0)

Is your index page returning a 304 status (Not Modified)? I don't think Google will go any deeper if it receives this status.
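For reference, the server side of that exchange is HTTP's conditional GET: the crawler sends `If-Modified-Since` with the date of its cached copy, and the server answers `304 Not Modified` (no body) if the page hasn't changed since then, otherwise `200` with the full page. A minimal Python sketch of that decision, assuming the page's last-change time is known as a UTC `datetime`; `conditional_status` is a hypothetical helper name. Whether Googlebot then declines to crawl deeper is the hypothesis raised here, not something this code shows.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(last_modified, if_modified_since):
    """Return 304 if the page hasn't changed since the client's copy, else 200.

    last_modified: timezone-aware datetime of the page's last change (assumed UTC).
    if_modified_since: raw If-Modified-Since header value, or None if absent.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: just send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    # HTTP dates have one-second resolution, so compare at whole seconds
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

header = "Thu, 12 Feb 2004 06:12:00 GMT"
print(conditional_status(datetime(2004, 2, 10, 9, 30, tzinfo=timezone.utc), header))  # 304
print(conditional_status(datetime(2004, 2, 13, tzinfo=timezone.utc), header))         # 200
```

Changing the index page (a new link, new text) moves its last-modified time forward, so the next conditional request gets a full `200` response instead.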

pardo




msg:185704
 1:28 pm on Feb 12, 2004 (gmt 0)

Is your site new, or did you recently move to another host?

tantalus




msg:185705
 1:34 pm on Feb 12, 2004 (gmt 0)

No to both.

Gbot would come to my index page on a regular basis, receive a 304 status, and then go away again like clockwork.

Added a new link and some minor changes to my index and gbot crawled but only one level deep.

Waiting for G to crawl the rest of the pages the next level down.

walrus




msg:185706
 4:20 pm on Feb 12, 2004 (gmt 0)

It's normal for it to grab the index one day and then usually return a week or two later and spider everything. If it's been going on longer than a month, then you can be sure there's something wrong. I'm no expert, but this has been par for the course in my logs for a year.

johnlim




msg:185707
 2:07 am on Feb 13, 2004 (gmt 0)

I also have a site. Googlebot visited it and fetched about 80 pages, then stopped deep crawling.

Now every day Googlebot just goes to the homepage and then leaves.

I cannot understand why Googlebot doesn't continue to crawl the remaining hundreds of pages...

wkitty42




msg:185708
 2:16 am on Feb 13, 2004 (gmt 0)

Do you have a sitemap page linked from your main page? If not, I'd add one... and include a link to it from every page, too.
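A site map in this sense is just an ordinary HTML page of plain text links to every page, so a crawler that reaches it is one hop from everything. A minimal Python sketch that builds such a page from a list of (URL, link text) pairs; `sitemap_page` and the sample pages are hypothetical, and in practice the list would come from however the site's pages are already tracked.

```python
from html import escape

def sitemap_page(pages, title="Site map"):
    """Build a plain HTML site map with one text link per page.

    pages: list of (url, link_text) pairs.
    """
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(text)}</a></li>' for url, text in pages
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>"
    )

print(sitemap_page([("/widgets.html", "Widgets & Gadgets"), ("/contact.html", "Contact")]))
```

Linking this page from the template shared by every page gives each page a two-hop path to the rest of the site.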

feeder




msg:185709
 2:25 am on Feb 13, 2004 (gmt 0)

Thanks.

As I say, other spiders crawl it fine. It isn't a link issue. Googlebot visits two or three times a day, grabs the first page, leaves.

Yes, the site is new.

HarryM




msg:185710
 2:30 am on Feb 13, 2004 (gmt 0)

You don't say what PR your index page has. If it's not very much that will discourage Google from going any deeper.

As tantalus said, the other problem could be that your index page isn't modified. My solution is to add a link from the index page to any new pages. The next time Google visits, it sees the index page has been modified and usually comes back within a few hours to take the new pages. I remove the links once the toolbar PR for the new pages shows white.

I usually get about 70% of my pages indexed every week, but I don't know whether this is because of my PR or because Google perceives the site as active.

Harry

feeder




msg:185711
 2:37 am on Feb 13, 2004 (gmt 0)

I have many sites. I've never had any problems getting them deep crawled.

This is the first time I've seen this type of behavior. I wondered whether it is something other people are seeing, or if it's just me.

Stefan




msg:185712
 3:38 am on Feb 13, 2004 (gmt 0)

Just speculating...

Googlebot crawls higher PR pages with a greater frequency than lower PR pages. If your new site has an index page PR of 4, and inner pages that are less than that, then the bot won't hit the inner pages very often.

On a personal note: I got back to the internet several days ago after having been in places where digital, at best, means counting on your fingers. I missed Austin entirely... it didn't seem to make a lot of difference for us, but we have some minor keyword combos bouncing in and out of the SERPs with every other search. I can either dig into the WW archives for Austin to figure this out, or just assume that Google has become slightly schizophrenic (not that there's anything wrong with that). Anyone who feels like giving me a quick run-down on what happened gets a free underground tour in Jamaica.

blakekr




msg:185713
 3:53 am on Feb 13, 2004 (gmt 0)

I launched two sites before Austin that were deep crawled and ranked. I launched one that was "due" to be listed/ranked right around Austin, but Google won't touch it.

The previous sites are not being recrawled or updated either. It's driving me crazy, but you're not alone.

UK_Web_Guy




msg:185714
 6:28 pm on Feb 13, 2004 (gmt 0)

feeder

What is your site's PR?

Need3lives




msg:185715
 7:19 pm on Feb 21, 2004 (gmt 0)

You are not alone. A new site I launched about 8 days ago is seeing the same thing: the index page has been spidered and indexed for over a week, but no additional pages have been spidered or added. My site includes a site map, linked to from all pages, and uses only basic text links for most pages off the index. It is a new site, about 60 pages so far, with just index.htm listed in Google. I am getting a fresh date for the index as well; the last visit, according to Google, was the 19th.

barronbali




msg:185716
 8:18 pm on Feb 21, 2004 (gmt 0)

Mine has been crawled by Google every one or two days, but only the index page; even when I changed other pages, Google still crawled only my index page. Also, in the SERPs I held 1st and 5th positions, but for the last 4 updates they've been gone, not appearing until page 6.

Hissingsid




msg:185717
 8:43 pm on Feb 21, 2004 (gmt 0)

Hi,

I'm seeing this also. Googlebot visits once a day, picks up robots.txt and the index page, and then leaves. This is a relatively new thing (2 or 3 weeks; I haven't had time to plough back through my logs) and coincides with the sites in question being dropped from the SERPs.

I worried for a while that this may be caused by a poison word. I have a folder called redirects with pages that do a meta refresh to an outside page. Perhaps redirects is a poison word. I disallowed this for a while in robots.txt. I've just gone and stripped this down to just one line.

User-agent: *

And I've changed the home page and renamed that directory to something less obvious. I then spent much of yesterday submitting pages that link to this domain to Google's URL submission page, in the vain hope that Googlebot might follow the backlinks and think "this site's worth crawling". PR is low, only 3, but pages from this site were previously #1 for what I would call secondary three-word terms.

Best wishes

Sid
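Before and after a robots.txt change like this, it's worth confirming which URLs the file actually allows. Python's standard `urllib.robotparser` can parse file contents directly. The earlier `Disallow: /redirects/` rule and the example.com URLs below are assumptions for illustration, since the thread doesn't show the earlier file, and Python's parser is one implementation of the robots.txt convention that may differ in edge cases from how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if robots_txt (file contents as a string) lets agent fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

OLD = "User-agent: *\nDisallow: /redirects/\n"  # hypothetical earlier file
NEW = "User-agent: *\n"                          # the one-line file described above

print(allowed(OLD, "http://www.example.com/redirects/out1.html"))  # False
print(allowed(OLD, "http://www.example.com/widgets/page2.html"))   # True
print(allowed(NEW, "http://www.example.com/redirects/out1.html"))  # True
```

This parser treats the one-line file as allowing everything. The conventional allow-all form also carries an empty `Disallow:` line, which avoids relying on how a given robot treats a record with no rules.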

MrSpeed




msg:185718
 3:44 pm on Mar 3, 2004 (gmt 0)

Has anybody had any luck getting crawled?

I created a new site almost a month ago and linked to it from a few PR4 sites. Same story: the index page gets visited every day and appears in the index. Google hasn't crawled any deeper, though.

feeder




msg:185719
 8:41 pm on Mar 3, 2004 (gmt 0)

feeder - What is your site's PR?

I'm not sure how that's relevant. New sites don't show PR, but that doesn't stop them getting crawled.

The site has strong inbound linking.

Update: the site has been crawled, and pages are included in the index. Googlebot's behaviour hasn't changed, however. It arrives, grabs the index page, and leaves. Once in a blue moon it will crawl half the site.

Odd.

subway




msg:185720
 10:09 pm on Mar 3, 2004 (gmt 0)

I've noticed the very same thing with sites from PR4-PR5, new and old - but mostly new. Google grabs the robots and index page and then leaves. This goes on for weeks.

I've noticed this ever since the Florida update. GB seems to be much, much slower at crawling whole sites these days.

metagod




msg:185721
 10:56 pm on Mar 3, 2004 (gmt 0)

I have a theory that Google won't deep-crawl your website unless your subpages have incoming links from external sources, like a different IP or a different domain... only then will your pages be worth crawling.

do you concur?

MrSpeed




msg:185722
 11:58 pm on Mar 3, 2004 (gmt 0)

do you concur?

No.

steveb




msg:185723
 12:15 am on Mar 4, 2004 (gmt 0)

"do you concur?"

Nope.

HarryM




msg:185724
 12:26 am on Mar 4, 2004 (gmt 0)

I'm not sure how that's relevant.
(re: site PR)

i have a theory that google won't deepcrawl your website unless your sub pages have incoming links from external sources

My take on this question is that these are two aspects of the same issue. PR is very relevant.

The number of pages Google crawls is probably proportional to the PR value of the index page. If the index page is PR0 because Google had not yet determined its appropriate value, then there is little chance of being crawled.

Even when the index page is more than PR0, as Google penetrates deeper the PR decreases and at a certain point Google stops crawling. The presence of deep links boosts the PR for those pages and so Google continues crawling. However once the index page becomes a reasonable value the presence of deep links is not so necessary, at least as far as crawling goes.

I suspect another factor is that if Google does not discover any changed or new pages part way through its crawl then it abandons the crawl.

All IMHO, and I disclaim any responsibility for being wrong. :)

Harry
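The argument above (PR thins out as you go deeper, and deep links top it back up) can be illustrated with the textbook PageRank power iteration. This is a sketch of that published formula on a hypothetical six-page site, with an assumed damping factor of 0.85 and pages without outlinks sharing their rank evenly; it says nothing about how Google actually scheduled crawls, which is exactly what this thread is debating.

```python
def pagerank(links, d=0.85, iterations=50):
    """Textbook power-iteration PageRank on a dict {page: [pages it links to]}.

    Every link target must also be a key. A page with no outlinks
    spreads its rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages
            share = d * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank

# Hypothetical site: index -> sections a, b -> deeper pages, all linking home
SITE = {
    "index": ["a", "b"],
    "a": ["index", "a1", "a2"],
    "b": ["index", "b1"],
    "a1": ["index"], "a2": ["index"], "b1": ["index"],
}

for page, score in sorted(pagerank(SITE).items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")

# A hypothetical external page "ext" deep-linking to a1
boosted = pagerank({**SITE, "ext": ["a1"]})
print(boosted["a1"] > boosted["a2"])  # True
```

On this toy graph the home page ends up with the highest rank, the section pages less, and the deepest pages least; the external deep link lifts `a1` above its otherwise identical sibling `a2`.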

feeder




msg:185725
 12:34 am on Mar 4, 2004 (gmt 0)

You're wrong :)

One link can get a new site crawled, no problem.

HarryM




msg:185726
 12:42 am on Mar 4, 2004 (gmt 0)

One link can get a new site crawled, no problem

How many pages are we talking about here? 10? 100? 1000? And does Google return?

HarryM




msg:185727
 12:46 am on Mar 4, 2004 (gmt 0)

Hey, who cares about Google! I've just made Preferred Member. So who's going to offer me a beer?

metagod




msg:185728
 2:13 am on Mar 4, 2004 (gmt 0)

feeder, if you guys even cared to read through my post properly you would see that I said an EXTERNAL link... without the external link your page is not worth viewing because no-one else is voting for your page...

do you concur now?

oh and feeder, just how rude are you? I'm WRONG? have some respect for other people...

feeder




msg:185729
 2:23 am on Mar 4, 2004 (gmt 0)

feeder, if you guys even cared to read through my post properly you would see that I said an EXTERNAL link... without the external link your page is not worth viewing because no-one else is voting for your page...do you concur now?

No :)

I've been a full-time SEM since 2001, I know what a link is and how Google crawls. My question relates to a recent CHANGE in crawl activity. As I said in my posts, the linking structures of the site in question, both internal and external, are strong.

oh and feeder, just how rude are you? I'm WRONG? have some respect for other people...

I wasn't talking to you, I was talking to Harry :)
I was pointing out that I thought he was wrong in this instance. I know this because I have access to countless clients' site logs that demonstrate otherwise. I can, and do, get sites crawled with one inbound link to an index page.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved