Forum Moderators: Robert Charlton & goodroi


Google not finding internal links from a simple home page

         

fish_eye

3:33 am on Mar 22, 2006 (gmt 0)

10+ Year Member



I created a site for a client before Christmas. It basically has a nice background, a logo and six buttons - no other text (other than meta etc).

Google has had the home page in its index since inception but has not indexed any of the site's other pages.

I'm using simple CSS buttons - no Flash or anything like that... is it simply that it looks to them like I have no content or something?

Y! etc etc have all pages indexed.

PS. Someone was told about the URL and linked to it before the site went live.

roseplant

4:02 am on Mar 22, 2006 (gmt 0)

10+ Year Member



Try creating a simple HTML sitemap and linking to it from the home page. No more than 2 minutes' work, and practically guaranteed to work.
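For example, a bare-bones sitemap page might look like this (a sketch only - the filenames here are placeholders for your real pages):

<!-- sitemap.html: plain text links to every page on the site -->
<h1>Site map</h1>
<ul>
  <li><a href="site/page-one.php">Page one</a></li>
  <li><a href="site/page-two.php">Page two</a></li>
  <!-- ...one entry per remaining page -->
</ul>

Then link to sitemap.html from the home page with an ordinary text link.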

fish_eye

5:20 am on Mar 22, 2006 (gmt 0)

10+ Year Member



... but, going on my documented experience, G would not find the simple HTML sitemap either ...

Thanks anyway.

PS. Perhaps I should have also mentioned that my simple CSS buttons are just that: simple HTML divs with classes (with background images).

PPS. The home page is effectively a sitemap. The site only has six pages (+ home). The home page has 6 buttons; each button looks like this:

<div id="index-main">
<div id="index-body">
<div id="buttonLeft">
<div id="button1">
<div id="logo1"></div>
<div class="button">
<div class="button-text"><a href="site/about-us.php" title="Find out more about us, our people, our philosphy">about us</a></div>
</div>
</div>
<div id="button2">
<div id="logo2"></div>
<div class="button">
<div class="button-text"><a href="site/location.php" title="Where you can get our services">locations</a></div>
</div>
</div>
etc etc

Is this too complicated?

Maybe they don't like my spelling ;)

ronburk

5:31 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



More info might be helpful.

What is the home page's PR? Was the website without internal links for a long period of time? When did Googlebot last fetch the home page and get a 200 response? Finally, did Googlebot ever get an error fetching the home page and, if so, when?

is it simply that it looks to them like I have no content or something?

Posting the exact form of the code you use to link to internal pages from the home page might be helpful.

ronburk

5:34 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah, you slipped the code in when I wasn't looking. I don't see a problem there.

Have you cut and pasted the URLs directly out of the code into your browser (with the requisite prefix) to make sure there's no misspelling?

Have you looked at the raw logs to confirm that Googlebot has not attempted to fetch these other URLs?

ronburk

5:41 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And have you run the home page through the W3C validator (if you have no better validator on your desktop) to look for HTML errors so stupefying that they might derail Googlebot's parser?

fish_eye

5:50 am on Mar 22, 2006 (gmt 0)

10+ Year Member



It only has a PR1 - but I have other sites where that's not been a problem (in fact I have fully indexed PR0s).

As for cutting and pasting... surely if the buttons work there should be no problem? Still, I tried it as suggested and there's no problem there.

I'll go back and check logs (when I get to a machine with less rigid firewalls!).

I may add a "deep" link from another site and see if that makes a difference.

This is the first time I've made a "splash"-type / minimalist-content home page with CSS, and I thought maybe this was the problem.

fish_eye

5:53 am on Mar 22, 2006 (gmt 0)

10+ Year Member



have you run the home page through the W3C validator

Yes indeed I have (I used the validators to help teach myself CSS! :)) - the page fully passes as XHTML 1.0 Transitional, and the CSS validates too. I was tempted to proudly display the badges, but they rather wrecked the impact of the graphics.

ronburk

6:05 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(in fact I have fully indexed PR0s).

Me too. I also have a PR0 that Googlebot can go months without checking, presumably because it sat unchanged for a long time, and Google kept adjusting the frequency downward.

In fact, just to refresh my memory, I went to look. Googlebot visited this 1-page site on:

2005/05/04
2005/07/01
2005/07/15
2005/09/10
2005/09/29
2006/01/04
2006/03/13
2006/03/18

I expect Googlebot to return by July :-)

If it had internal pages that also had sat there unchanged for a long time, I would expect their Googlebot frequency to be even lower.

colin_h

6:10 am on Mar 22, 2006 (gmt 0)



On splash-page introductions I always used to offer an all-text alternative. In the UK it is now law that your website must be equally accessible to blind visitors, and a text page can both cover this requirement and give you the plain HTML links to your back pages that you need for search engine direction. So as not to make the front page confusing, make the first graphic that is downloaded the link to your DDA page, using alt text like "A text version of this widget information website is available here". If you work your main keywords into that description somewhere, your page relevancy will go up a bit as well.

The sites that I have built this way have had no difficulty getting picked up and usually get a PR3 within 6 months.

I hope this makes sense, I'm bleary eyed at this time of morning.

Best of luck

Col :-)

ronburk

6:13 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(when I get to a machine with less rigid firewalls!)

I place much more faith in the raw logs when debugging a problem, but perhaps your reporting software will simply tell you:

a) have there been any failed page fetches at all and by whom?

b) what IP addresses have fetched your internal pages?

In both cases, look for IPs matching 66.249.6?.* and presume that to be Googlebot.
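If you do get at the raw logs, here's a quick sketch of the sort of thing to run (a sketch only - it assumes a standard Apache combined-format log named access.log; adjust the filename and patterns to suit):

# every hit from the Googlebot IP range: client IP, requested path, status code
grep "66\.249\.6" access.log | awk '{print $1, $7, $9}'

# narrow it to the internal pages (the /site/... URLs posted above)
grep "66\.249\.6" access.log | grep "/site/"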

Another sanity check is to Google for:

site:www.yerdomain.com the

or some other common word known to be on the internal pages.

Which brings me to:

has not indexed any of the site's other pages.

What did you base that assertion on?

fish_eye

6:22 am on Mar 22, 2006 (gmt 0)

10+ Year Member



has not indexed any of the site's other pages.

What did you base that assertion on?

site:mysite.com -jkhsklj

and also searches on some pretty weird and unique combinations.

charliecon

8:15 am on Mar 22, 2006 (gmt 0)

10+ Year Member



Hi

My site is exactly the same: 6 months old, and a site:www.mydomain.com only shows the home page indexed.

All I have is a simple home page with some links to other pages.

For a while I could see the other pages indexed on BigD, but that is no longer the case.
My server logs show Googlebot hitting robots.txt and Sitemap.xml daily, and sometimes some of the other pages.

I rank on page 1 of the SERPs for certain keyword combinations.
Yesterday I put in a 301 redirect from non-www to www, and I'm hoping that this might correct it.

whatcartridge

9:07 am on Mar 22, 2006 (gmt 0)

10+ Year Member



I have started many sites since last year, and I have to say that Google has been crap at indexing nearly all of them. Keep in mind there are no dupe pages or anything like that, and I always strive for W3C validation on the pages I put up.

The old Googlebot used to rip through and index really well, but the combination of Bigdaddy and Mozillabot has made Google's indexing grind to a halt; they are getting worse than Yahoo. Even my old PR6 site has heaps of new pages that haven't been indexed (new pages on that site used to get crawled and indexed within 48 hours).

I have tried lots of things - Google Sitemaps, submitting URLs to Google's submit-URL page, putting more links to individual pages out there - just about anything to help Google 'find' new sites and pages. Nothing gets indexed though, even though the crawler is a regular fixture on the site.

Maybe things will improve once Bigdaddy has settled in? MSN has no problems crawling and indexing new pages, usually within a week.

fish_eye

11:15 am on Mar 22, 2006 (gmt 0)

10+ Year Member



There's been a bit of a mix-up at my virtual host, and I don't think I'll get logs for a few days (and no history).

I can tell you (from web stats) that there have been 17 hits and 7 robots.txt requests this month from G; the most recent was on index.php about 4 hours ago.

I have no Google ads, so I can assume these are real Googlebot hits.

Curiously, though, they are asking for "/index.php" not "/". I shall have to look at my external links...

g1smd

8:09 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Crawl your site using Xenu LinkSleuth, but do it twice: once starting at domain.com and again starting at www.domain.com, and see whether you can reach the whole site both ways (if you used relative links then you will be able to).

The fix is to either:

- add the <base href="http://www.domain.com/"> tag to every page, and set up a 301 redirect from non-www to www (all internal links should then START with a / and count from the root), OR

- hard code every link on every page with the full domain-and-page URL (this latter option will increase bandwidth quite a lot, and should be avoided).
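For the 301 itself, a minimal .htaccess sketch (assuming Apache with mod_rewrite enabled; domain.com is a placeholder for the real domain):

# send any request for the non-www hostname to the same path on www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]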

roseplant

6:11 pm on Mar 24, 2006 (gmt 0)

10+ Year Member



"- to hard code every link on every page with the full domain-and-page URL (this latter option will increase bandwith quite a lot, and should be avoided). "

g1smd I was just about to suggest that before I read your post. But I wasn't aware that it would increase bandwidth (besides that obvious fact that all the pages will be crawled rather than 1). Could you explain why please?

g1smd

8:42 pm on Mar 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bandwidth increases because every link on your site (to every page, every image, every CSS file, and every JavaScript file) will carry the extra characters http://www.your-domain-name.com/ within it.

Adding the <base href="http://www.your-domain-name.com/"> tag just once to every page does the same job with far fewer characters.

steveb

8:52 pm on Mar 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's an incredibly trivial increase in bandwidth to hard code links. It's truly horrible to suggest that it is any sort of meaningful bandwidth burden, or that it should be the slightest concern, when in fact the opposite is true.

Links to pages should always be full-path for any small to middling site. Links to images can be relative, but there is no reason to create indistinct structure, and there's a pile of reasons not to use relative links.

fish_eye

1:59 pm on Mar 25, 2006 (gmt 0)

10+ Year Member



a pile of reasons not to use relative links

Can you please point me to a post about this, or let me know the top five reasons?

Thanks, Sam.

PS. I have the logs (at last) but have not gone through them yet. I have also checked external and internal links and my .htaccess etc (in case I had accidentally put it in an ErrorDocument directive) and have no idea why G would be hitting my /index.php directly (rather than just the root).

I should add that I like relative links as they make testing easier - for me anyway...

...and my IP gets blocked by my site if I run Xenu (just kidding - sort of), as it does not obey robots.txt... but seriously, it's a 7-page site (and I use .htaccess to add the www).

fish_eye

2:22 pm on Mar 25, 2006 (gmt 0)

10+ Year Member



The bot has visited 6 times this month.

1) Once it looked just at /robots.txt.

2) Three times it looked at /robots.txt and /index.php.

3) Twice it looked at /robots.txt, the root (just /), and the sub-pages: the five /site/somepage.php pages plus /contact/.

The IPs all start with 66., and one has occurred twice: once in the 2nd category and once in the 3rd. The order of visits was 1, 2, 2, 3, 3, 2.

I'm not a real bot watcher but it looks a little odd to me - as if there are two separate strands for different reasons.

I added a deep link from another site but have not checked to see if that page has been spidered yet.

steveb

5:26 am on Mar 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The simple reason of deterring content thieves is way more than enough: scrapers who copy your pages wholesale end up with links that point back to your site rather than a working copy of your navigation.

glengara

12:08 pm on Mar 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Can you please point me to a post about this..."

There was a thread in which GG expressed a preference for absolute links, for maximum simplicity for the spiders, though I can't find it now.

On the CSS thing, check the text-only version of the Google cache to see whether those links show up; there was a strange thread a while ago claiming some CSS dropdowns weren't showing up in the text-only cache...

charliecon

11:35 am on Apr 2, 2006 (gmt 0)

10+ Year Member



From March 22nd:

Hi
My site is exactly the same: 6 months old, and a site:www.mydomain.com only shows the home page indexed.

All I have is a simple home page with some links to other pages.

For a while I could see the other pages indexed on BigD, but that is no longer the case.
My server logs show Googlebot hitting robots.txt and Sitemap.xml daily, and sometimes some of the other pages.

I rank on page 1 of the SERPs for certain keyword combinations.
Yesterday I put in a 301 redirect from non-www to www, and I'm hoping that this might correct it.

---------------------------
Update, 2nd Apr: default Google now shows 4 pages of my site. I did the 301 on March 22nd. Yesterday I ran LinkSleuth and fixed one outbound link, and I also had a broken link in Sitemap.html (not my Google sitemap.xml). Hopefully this will help me in the SERPs.

g1smd

2:28 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The combination of <base href="http://www.domain.com/"> and <a href="/some.folder/the.page.html"> makes the linking absolute.

It is impossible to navigate to the non-www version of any of the pages from within such a site.
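A minimal sketch of that combination, reusing one of the links posted earlier in the thread:

<head>
  <base href="http://www.domain.com/">
</head>
<body>
  <!-- resolves to http://www.domain.com/site/about-us.php
       no matter which hostname the page was served from -->
  <a href="/site/about-us.php">about us</a>
</body>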

arikgub

3:04 pm on Apr 2, 2006 (gmt 0)

10+ Year Member



I don't think the problem is with internal linking structure.

I have a site which is a few months old, and although Googlebot has crawled many of its pages with a 200 response, only a few of them (4 out of ~200) are indexed.

Until I got an incoming link from a PR6 site, I had only the home page in the index. Today, the 4 pages that are indexed are the only pages that have inbound links from external sites.

It may indicate that G gives higher priority to pages that have external links, while those that don't just have to wait...

g1smd

3:25 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Run Xenu LinkSleuth over your site and study the report it generates very closely.