Forum Moderators: open


getting google to index whole site

can't get google to see other than one page


favedave

10:35 am on Nov 3, 2003 (gmt 0)

10+ Year Member



Google has crawled me for a month now, several times a week.

I'm listed in the searches, but it has not indexed any page other than my index page.

How do I get it to follow the links on my index page to the rest of my site?

Do I need robot text (I have none - I don't know how) or some other headers?

When I go to W3C to validate my html I get "Fatal Error: No DOCTYPE specified!"

I use Adobe GoLive and have no idea how to add this doctype thing. It's a freaking web page - that's what document type it is!

Help please.

Thanks

Brett_Tabke

11:29 am on Nov 3, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> follow the links

Double-check your links to see that they are standard and fairly bug-free. Run the page through the W3C validator and fix up any major problems. It could be a case of the spider not being able to read the links on the page.
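To see what "the spider not being able to read the links" means in practice, here is a quick Python sketch of the kind of tag-level parsing a simple spider does. The page markup below is hypothetical, not favedave's actual site - the point is only that if a plain HTML parser can't pull the hrefs out of your page, a crawler may not either:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, roughly the way a simple spider would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical index-page markup for illustration only.
page = '<html><body><a href="about.html">About</a> <a href="cast.html">Cast</a></body></html>'

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # ['about.html', 'cast.html']
```

If a link only works because a particular browser is forgiving about broken markup, a stricter parser like this one will simply never see it.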

> doctype thing

There are about 20 different document types :-) (just pick one as a default from the drop-down there at the W3C).
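For example, a common choice at the time was HTML 4.01 Transitional; pasting this declaration as the very first line of each page is all it takes to clear the validator's "No DOCTYPE specified" error:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
```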

>Do I need robot text

No. A robots.txt is used to BLOCK parts of your site from being indexed.
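For completeness, a robots.txt that blocks nothing at all looks like this - an empty Disallow line means no part of the site is off-limits to any spider:

```
User-agent: *
Disallow:
```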

Lastly, build more content on your site and make sure it is all well interlinked with itself. Then go get more inbound links. Google will not index a site that does not have inbound links.

tigger

11:56 am on Nov 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could also try linking to a site map from your index page - that should help get the site fully crawled.
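A site map in this sense is just an ordinary page of plain links, one per page, something along these lines (the page names here are hypothetical):

```html
<!-- sitemap.html: plain <a> links to every page, linked from the index page -->
<ul>
  <li><a href="index.html">Home</a></li>
  <li><a href="about.html">About</a></li>
  <li><a href="contact.html">Contact</a></li>
</ul>
```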

favedave

9:54 pm on Nov 3, 2003 (gmt 0)

10+ Year Member



I use Lynx to verify it and it sees all the links okay. I also used the link checker at W3C, and all the links are good. I checked down 3 levels recursively.

It's only a 7 page site.

I also picked a doc type (html transitional) and got 45 errors. That's a useless thing. I mean, my site reads fine on all browsers on PC and Mac that are fairly recent. I don't care about cell phones, etc. The W3C validator looks like a useless geek toy to me. For people like me, we just use Adobe GoLive and will NEVER EVER go through and tweak a site to make the W3C happy. As long as my customers are happy that's all I care about.

anyway, thanks for the advice - but I still don't see why google won't index my measly 7 pages when it's updated the index page quite a few times.

athinktank

10:02 pm on Nov 3, 2003 (gmt 0)

10+ Year Member



Personally, I have noticed that the strength of the site (one could say that PR is a gauge of this) will determine how much and how fast googlebot looks at pages. I have a PR2 site that Google almost never goes into deeply. I have been waiting for almost 3 months for Google to index the site. However, I have other sites that are *stronger* that get read and indexed at a much quicker clip.

How old is your site? Sometimes this takes months. I would do as Brett suggests and 1) get inbound links, 2) make sure that your navigation is clean.

jaylark

10:39 pm on Nov 3, 2003 (gmt 0)

10+ Year Member



What does the search engine spider simulator say? Can search engines see your links? [searchengineworld.com...]

Maybe some of your errors are preventing the search engine from getting to your other pages.

For me, Google crawled my entire site within two weeks of me putting it up. After the initial crawl, I made some changes to the underlying pages and Google didn't pick this up for three months even though it checked my index page (PR 4) almost every day.

Stefan

10:41 pm on Nov 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> The W3C validator looks like a useless geek toy to me.

It isn't. Your code might render in IE but be so messed up that a bot can't crawl right through it. You have to validate the important pages, at least.

Using W3C validators and tag checkers will ensure that your site works for every bot, on every computer, and in every browser (and you might eventually feel embarrassed about your quoted statement above).

aus_dave

11:23 pm on Nov 3, 2003 (gmt 0)

10+ Year Member



Stefan is spot on, the W3C validator taught me as much about writing HTML code as WebmasterWorld :).

Out of interest, a site I set up recently had its 100 or so pages indexed in Google within 17 days of it going live. It had inbound links from a few sites of PR 5-7.

In comparison, another site I haven't found any links for yet that was submitted a few weeks earlier has its index page dropping in and out of Google at the moment.

Stretch

11:42 pm on Nov 3, 2003 (gmt 0)

10+ Year Member



A couple of weeks back I launched 2 new sites, both similar size and similar construct.

Both got a link from the same PR6 site. One got spidered in its entirety and was in the SERPS in 2 days; the other is getting its index page read by Google every day (sometimes twice per day) but no other pages.

Both sites have been successfully spidered by other bots and both validate.

I'm hoping G will choose to take a look around the second site soon. I'm pretty sure there's nothing wrong with it so I figure it's just a matter of time.

Maybe I'm wrong...

Stretch.

Stefan

11:53 pm on Nov 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, Stretch, Google works in mysterious ways...

I didn't mean to say that I thought favedave's problem was necessarily bad code... I just couldn't let the comment about W3C validation slip by. Like aus_dave, it's been a great help to me. You have to figure that the first thing to do, when you're concerned about googlebot not following internal links past the index, is validate the code.

I'm, of course, not inviting people to start validating my entire site on a voluntary basis... :-)

Stretch

12:13 am on Nov 4, 2003 (gmt 0)

10+ Year Member



>> I didn't mean to say that I thought favedave's problem was necessarily bad code

And I didn't mean to imply you were saying that :-)

I absolutely agree that validating is a solid starting point. I guess the point of my post was that sometimes everything can be right and G just won't bite (at least, that's true in my experience).

And I can assure you I won't be sending out validation invites either!

Stefan

12:35 am on Nov 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If favedave checks back to read this thread, then Brett's advice is the plan.

Stretch, cool opportunity to watch Google at work... the two sites online at the same time, same backlinks. Maybe Google liked the one site more than the other, maybe it's digital laziness... maybe they just do things like that to keep us confused :-)

favedave

5:36 am on Nov 4, 2003 (gmt 0)

10+ Year Member



Thanks for all the help.

The links in the pages are fine.

Every page links to every other page on my site (only 7 pages).

There are about 20 inbound links, with more coming.

The site is only 6 weeks old.

The DMOZ listing isn't up yet, but I'm listed in Yahoo and other search engines as well (it's a site for an independent movie). The IMDB listing will be up in a few weeks, too. That will link to me.

The site works fine with latest versions of:
IE
Netscape
Opera
Safari

on Windows and Mac (except Safari, which is Mac-only, obviously).

I've tried various robot emulators, and I hope the real robots don't do what the emulators do, which is list the images' alt text first on the page.

I'm not a web programmer, I simply maintain the site and tweak it. So I'm not messing around with code that works fine in order to make W3C happy. And I can't afford to pay someone to do it.

Why should I when it works fine on all the browsers and platforms I mentioned above?

And we show up in the first page when you search on google for our keyword, so that's cool.

Stretch

10:04 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



Thought I'd post an update to message # 9.

After posting, I launched a third site with a link from the same PR6 site. The new site got spidered by G and listed in the SERPS in two days.

This got me worried about the site that wasn't being spidered so I began to think about how I could change the site layout to encourage spidering. But, before doing anything drastic, I checked the logs and surprise, surprise - today G has spidered the entire site.

I have a few ideas about why this site took longer but in the end it was just a matter of time. Thankfully :-)

Stretch