How to get Google to crawl thousands of raw data pages?

     

ildarius

12:34 am on May 9, 2009 (gmt 0)

5+ Year Member



I have a large site (approximately 35k pages) which is filled with raw scientific data.

I've already submitted several large gzip sitemaps, but the Googlebot is very slow to crawl the pages.

Can you suggest anything else? A linking strategy of some kind? Or maybe a better linking structure for the site?

Swanny007

2:08 am on May 9, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



What do you mean by very slow? How long have the pages been live? If you have a text link to each of the pages in a hierarchy that makes sense, be patient :-) That's a lot of pages!

Another thing that comes to mind: how different are the pages from each other? Is much of the content the same, and is it full of numbers? If so, Google might decide that the content is too similar and could treat it as bordering on duplicate content.
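One rough way to gauge how different two pages are is to compare their overlapping word sequences ("shingles"). The sketch below is only an illustration: the function names and the 3-word shingle size are assumptions, and it says nothing about how Google itself judges duplicate content.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (word tuples) in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

Running `jaccard` over the extracted text of a sample of page pairs gives a feel for how much of each page is shared template versus unique data. Scores close to 1.0 across most pairs would suggest the pages look nearly identical as text.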

Robert Charlton

2:13 am on May 9, 2009 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Without getting into specifics...

It's possible that unless Google sees a recognizable content form or structure of some sort, they might not want to index the data.

Is this in fact data that's in a standardized enough form that it is intelligible to others and is of enough interest that others might search for it?

Beyond that, are your page titles differentiated, and are the pages organized into some kind of structure? Is the data in any way prioritizable?

Are there sufficient inbound links to the site that Google might feel the data is of sufficient interest to be indexed?

Or maybe a better linking structure of the site?

Can you indicate to us what your navigation structure is now? I.e., you can't simply have 35,000 links from your home page. How is the data organized?
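To check how many links each hub page would need to carry, one could count the direct children of every level in the URL hierarchy. This is a minimal sketch that assumes the navigation mirrors the URL paths; the function name and the example paths are hypothetical.

```python
from collections import defaultdict


def links_per_hub(paths):
    """Map each hub path to the number of distinct direct children under it."""
    children = defaultdict(set)
    for path in paths:
        parts = [p for p in path.strip("/").split("/") if p]
        parent = "/"
        for part in parts:
            child = parent + part + "/"
            children[parent].add(child)
            parent = child
    return {hub: len(kids) for hub, kids in children.items()}


# Hypothetical paths, for illustration only.
example_paths = [
    "/us/2008/north/1",
    "/us/2008/north/2",
    "/us/2009/south/1",
    "/ca/2008/east/1",
]
counts = links_per_hub(example_paths)
```

A hub whose count runs into the thousands is a sign that the level should be split further, so that no single page has to link to everything beneath it.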

ildarius

10:06 pm on May 10, 2009 (gmt 0)

5+ Year Member



Another thing that comes to mind: how different are the pages from each other? Is much of the content the same, and is it full of numbers?

The content is pretty similar; it's mostly numbers.

Is this in fact data that's in a standardized enough form that it is intelligible to others and is of enough interest that others might search for it?

Beyond that, are your page titles differentiated, and are the pages organized into some kind of structure? Is the data in any way prioritizable?

Are there sufficient inbound links to the site that Google might feel the data is of sufficient interest to be indexed?

The data is structured the following way: home page > country > season > region > page 1, 2, 3, 4, 5, etc.

I believe that certain people would search for it in Google. I'm building some links to the site, but they're hard to come by; this is a very specific field.

Do you think I should put greater emphasis on Sitemaps?

If we were to consider deep link building, where do you think I should point most of the links?

Home page, country, season, or region?

Thank you
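On the sitemap question, one option is to split the sitemaps along the same hierarchy, for example one gzip file per country. The sketch below assumes the country is the first path segment of each URL; the file naming and the example.com URLs are hypothetical. The namespace and the 50,000-URL-per-file limit come from the sitemap protocol. Splitting the files this way mainly makes it easier to see which sections are being picked up, and is not a guarantee of faster crawling.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file URL limit in the sitemap protocol


def build_sitemap(urls):
    """Serialize a list of URLs as sitemap XML bytes."""
    if len(urls) > MAX_URLS:
        raise ValueError("too many URLs for one sitemap file")
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def sitemaps_by_country(urls):
    """Group URLs by first path segment; return {filename: gzipped XML bytes}."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        country = path.split("/")[0] if path else "root"
        groups[country].append(url)
    return {
        f"sitemap-{country}.xml.gz": gzip.compress(build_sitemap(group))
        for country, group in groups.items()
    }


# Hypothetical URLs, for illustration only.
files = sitemaps_by_country([
    "http://example.com/us/2008/north/1",
    "http://example.com/us/2008/north/2",
    "http://example.com/ca/2008/east/1",
])
```

Each value in `files` can be written to disk under its key and then listed in a sitemap index file.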

 
