Forum Moderators: Robert Charlton & goodroi

Is there a minimum size of content for googlebot to index?

         

kjmt

2:35 pm on Aug 19, 2006 (gmt 0)

10+ Year Member



I'm developing a name pronunciation site and am having an issue with googlebot indexing my pages.

A name pronunciation page for a name consists of a relatively small amount of content-- something like:

Name: Stuart
Phonetic Pronunciation: STOO-urt
Audio Pronunciation: <Javascript button>

Googlebot looks at these pages but does not index them. Could the size of the content (number of bytes) be the issue, or should I be looking elsewhere for the cause? I've reviewed Google's webmaster guidelines and think I've covered everything they mention.

Thanks,
kjmt

tedster

5:47 pm on Aug 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see all-Flash pages indexed where the HTML is next to nothing, so I doubt that file size is the problem, although with Google I will never say "never". How about unique, page-specific title tags and meta descriptions? Those two factors often seem to be critical these days.
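A minimal sketch of tedster's check follows. It assumes you have already extracted each page's title and meta description into a dictionary keyed by URL. The `find_duplicates` helper and the sample URLs are hypothetical. The function reports every title or description that more than one page shares:

```python
from collections import defaultdict


def find_duplicates(pages: dict[str, dict[str, str]], field: str) -> dict[str, list[str]]:
    """Group URLs by the value of `field` (e.g. "title" or "description")
    and return only the values shared by more than one URL."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, meta in pages.items():
        # Collapse whitespace and lowercase so trivial differences don't hide
        # duplicates. A missing field normalises to "", so pages that all lack
        # a description are reported together as well.
        value = " ".join(meta.get(field, "").split()).lower()
        groups[value].append(url)
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


# Hypothetical usage:
# pages = {"/names/stuart.html": {"title": "Name Pronunciation"},
#          "/names/siobhan.html": {"title": "Name Pronunciation"}}
# find_duplicates(pages, "title")
```

An empty result for both fields at least rules out the most common duplicate-metadata problem.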

ronburk

11:18 pm on Aug 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot looks at these pages but does not index them

Just to clarify, you're saying that your weblogs show, for example, Googlebot fetched www.yerdomain.com/something.html, and then several weeks later you perform a Google search of "site:www.yerdomain.com", and the URL "/something.html" appears nowhere in the results?
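One way to answer the first half of ronburk's question is to pull Googlebot fetches out of the access log. This sketch assumes the Apache "combined" log format. The regex, the `googlebot_fetches` helper, and the sample paths are illustrative assumptions. Matching on the user-agent string is only a first pass, because any client can put "Googlebot" in its agent header:

```python
import re

# Assumed Apache "combined" format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_fetches(lines) -> set[str]:
    """Return the set of paths fetched successfully by a client whose
    user-agent string contains 'Googlebot' (the agent is not verified)."""
    fetched: set[str] = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue
        status = m.group("status")
        # Count 2xx responses and 304 Not Modified as successful fetches.
        if status.startswith("2") or status == "304":
            fetched.add(m.group("path"))
    return fetched


# Hypothetical usage (log file name is an assumption):
# with open("access.log") as f:
#     for path in sorted(googlebot_fetches(f)):
#         print(path)
```

You can then compare the resulting paths by hand against what a `site:` search returns.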

walkman

11:46 pm on Aug 19, 2006 (gmt 0)



I think G is looking at 1000+ pages with essentially identical info, and that may be considered spam to them. If your main template is 10 KB and only 4-5 words differ from page to page, the pages are pretty much 100% identical.

Try cutting down on the template, and add some random information to each page.
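walkman's point can be put into numbers with w-shingling and Jaccard similarity, a standard near-duplicate measure. This is a stand-in technique, because Google has not published how it detects duplicates. The 5-word shingle size and the sample pages are assumptions. With a 300-word template and 5 words of per-name content, two name pages share 297 of 305 distinct shingles, which gives a similarity of about 0.97:

```python
def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """All runs of k consecutive lowercased words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two texts' k-word shingle sets
    (1.0 = identical shingle sets, 0.0 = nothing in common)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Hypothetical 300-word shared template (navigation, footer, etc.).
    template = " ".join(f"nav{i}" for i in range(300))
    page_a = template + " Name: Stuart Phonetic Pronunciation: STOO-urt"
    page_b = template + " Name: Siobhan Phonetic Pronunciation: shiv-AWN"
    print(round(jaccard(page_a, page_b), 3))
```

Shrinking the shared template, or adding more text that is unique to each name, lowers the score. That is the effect walkman (and, below, Quadrille) describe.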

kjmt

4:13 am on Aug 20, 2006 (gmt 0)

10+ Year Member



Yes-- my weblog shows that googlebot looked at a page, but it does not get indexed at all when I search with site:mydomain.com

The current format of the site is many pages nearly identical with a few words changing per page as walkman pointed out.

I'm having the same problem with Yahoo/slurp and the MSN search bot that have indexed the site.

It sounds like I may need to explore a different format/template.

Quadrille

10:29 am on Aug 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remember that Google looks at the code, not the visible page.

"Name: Stuart
Phonetic Pronunciation: STOO-urt
Audio Pronunciation: <Javascript button>"

will be a very small fraction of the code on any page, once site navigation, etc, are taken into account. To Google, it is unlikely the pages will appear unique - even with unique page titles and meta descriptions, which I am sure you have ;)

Also ask yourself how much use your style is to readers - it strikes me that a lot of clicking is required.

Why not use an alphabetical system, subdividing pages (Ab, Ac, etc.) as required, or some other way of clustering your information?
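Quadrille's alphabetical clustering could look something like the minimal sketch below. Names are grouped by first letter, and any letter with more than `max_per_page` names is split into two-letter pages (Ab, Ac, ...). The helper name and the 50-name default are assumptions. The split goes only one level deep, so a very common two-letter prefix could still produce a long page:

```python
from collections import defaultdict


def group_by_prefix(names: list[str], max_per_page: int = 50) -> dict[str, list[str]]:
    """Map page keys ("A", "Sa", "St", ...) to the names listed on that page."""
    by_letter: dict[str, list[str]] = defaultdict(list)
    # Sort case-insensitively so each page lists its names alphabetically;
    # empty strings are skipped.
    for name in sorted((n for n in names if n), key=str.lower):
        by_letter[name[0].upper()].append(name)

    pages: dict[str, list[str]] = {}
    for letter, bucket in by_letter.items():
        if len(bucket) <= max_per_page:
            pages[letter] = bucket
        else:
            # Too many names for one page: split by the first two letters.
            for name in bucket:
                pages.setdefault(name[:2].capitalize(), []).append(name)
    return pages
```

Each key becomes one page with a list of names, so every page carries far more unique text than a single-name page does.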

You'll need to tinker with your database, but the results will be impressive, for visitors AND for SEs.

Halfdeck

7:39 pm on Aug 20, 2006 (gmt 0)

10+ Year Member



I've also created pages of 10 words or fewer that Google indexed, but I still don't think having very little content on a page helps.

I believe Google's algorithm takes several factors into account before deciding whether to index a page. Low word count, for example, may not be a death knell, but it could be the straw that broke the camel's back. Imagine a point system, with each "problem" (e.g., identical description snippets, low word count, low PR) jacking up your score until you reach a tipping point. I see that as a reason why some sites can get away with identical description snippets while another site finds itself waist-deep in supplementals.
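Halfdeck's point-system analogy can be written as a toy model. Everything in it is hypothetical: the signal names, weights, and threshold are invented for illustration and say nothing about how Google actually scores pages. The model shows only how low word count on its own stays under the threshold but can be the last straw on top of other problems:

```python
# HYPOTHETICAL weights and threshold, invented to illustrate the analogy only.
HYPOTHETICAL_WEIGHTS = {
    "duplicate_description": 2,
    "low_pagerank": 1,
    "low_word_count": 1,
}
HYPOTHETICAL_THRESHOLD = 4


def problem_score(problems: set[str]) -> int:
    """Sum the (made-up) weight of each problem present on a page."""
    return sum(HYPOTHETICAL_WEIGHTS[p] for p in problems)


def tips_over(problems: set[str]) -> bool:
    """True once the accumulated score reaches the (made-up) tipping point."""
    return problem_score(problems) >= HYPOTHETICAL_THRESHOLD
```

In this toy model, duplicate descriptions alone (score 2) or together with low PR (score 3) stay under the line. Adding low word count brings the score to 4 and tips the page over.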

When working on a site, if something makes me wonder "maybe this is a problem" I put it on a todo list (and hopefully eventually get around to fixing it). That way, I can stop wondering.

[edited by: Halfdeck at 7:46 pm (utc) on Aug. 20, 2006]

kjmt

10:59 pm on Aug 20, 2006 (gmt 0)

10+ Year Member



Thanks for all the helpful suggestions!