How to get Google to deepcrawl and index a brand new large site that has a lot of pages

PFOnline

7:58 pm on Jun 25, 2004 (gmt 0)

10+ Year Member



I managed to get Googlebot to come to my site, but only the index page is showing up in the Google results so far, and it's only been to about 500 pages of about 20,000...

So my questions are:

How long before at least those 500 or so get into the index?

and

Is there anything else I can do besides try to get more links to entice google to deepcrawl the rest of the site?

Thanks

Abdelrhman Fahmy

4:41 pm on Jun 26, 2004 (gmt 0)

10+ Year Member



- Try getting external links from PR5 or PR6 pages pointing to your deep pages (second- and third-level pages)

- Add a static site map linked from the home page

walkman

5:58 pm on Jun 26, 2004 (gmt 0)



1. several nice sitemaps since it's a large site (no larger than 100K since Google stops indexing after that)
2. Link the sitemaps from your front page
3. Get plenty of inbound links ideally to the front page and a few inside categories or sitemaps, since Google might not index that deep on new domains.
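The split-sitemap idea above can be sketched in a few lines of Python. This is only an illustration, not anyone's actual implementation: the helper names are made up, the URLs are hypothetical example.com addresses, and the cap of 100 links per page follows the guideline discussed later in this thread.

```python
# Split a large URL list into plain HTML sitemap pages of at most
# 100 links each, so no single sitemap page grows unwieldy.
def chunk_urls(urls, per_page=100):
    """Yield successive slices of at most `per_page` URLs."""
    for start in range(0, len(urls), per_page):
        yield urls[start:start + per_page]

def sitemap_page(links, page_num):
    """Render one simple HTML sitemap page (illustrative markup only)."""
    items = "\n".join(f'<li><a href="{u}">{u}</a></li>' for u in links)
    return (f"<html><body><h1>Site map {page_num}</h1>"
            f"<ul>\n{items}\n</ul></body></html>")

urls = [f"https://example.com/page{i}.html" for i in range(250)]
pages = [sitemap_page(chunk, n + 1)
         for n, chunk in enumerate(chunk_urls(urls))]
print(len(pages))  # 250 URLs at 100 per page -> 3 sitemap pages
```

Each generated page would then be linked from the front page, per point 2 above.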

communitynews

7:02 pm on Jun 26, 2004 (gmt 0)

10+ Year Member



Walkman, are you saying that google will only index 100K files on a given domain? This appears to be true. I've been trying to get 250K files in google since January but never get over about 100K (as measured by site:domain.com). Do you know if subdomains can be used to overcome this?

sun818

7:17 pm on Jun 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Do you know if subdomains can be used to overcome this?

No, the limit is on web page file size. 100 KB sounds right; a 250 KB page will only have its first 100 KB indexed.

ExpLarry

7:18 pm on Jun 26, 2004 (gmt 0)

10+ Year Member




Walkman, are you saying that google will only index 100K files on a given domain?

I think what is meant here is files up to a size of 100 kilobytes, not 100,000 files.

walkman

7:51 pm on Jun 26, 2004 (gmt 0)



"I think what is meant here is files up to a size of 100 kilobytes, not 100,000 files. "
yes, sorry for the confusion.

PFOnline

8:46 pm on Jun 26, 2004 (gmt 0)

10+ Year Member



Thanks, that's one thing I haven't done... trying to get links to the 2nd- and 3rd-level directories within the site... I've just been trying to get links for the homepage.

One last question... I recently heard something about adding some sort of index, follow code to the robots.txt...

Currently my robots.txt file looks like this:

User-agent: *
Disallow:

Is that robots.txt OK, or is there anything I should add that might help make Google crawl and index the whole site a little more?

Also: Google's been to even more pages since I posted this, and seems to come back about every day, but no files are showing in the index yet... How long before the pages it did crawl will be in the index? Will I have to wait for another update?

Cheers.

Abdelrhman Fahmy

9:36 pm on Jun 26, 2004 (gmt 0)

10+ Year Member



Maybe your robots.txt file is the problem?

User-agent: *
Disallow: /

will disallow all bots from indexing your site, but I don't know the effect of the Disallow without the slash.

Maybe this is the problem. Wait a minute and you'll get an answer from the robots.txt experts ;)
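The effect of the two directives can be checked directly with Python's standard-library robots.txt parser. This is just a sketch against a hypothetical URL; the `allowed` helper is made up for illustration:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, url, agent="Googlebot"):
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# An empty Disallow value permits everything:
print(allowed("User-agent: *\nDisallow:", "https://example.com/page.html"))    # True
# ...whereas "Disallow: /" blocks the whole site:
print(allowed("User-agent: *\nDisallow: /", "https://example.com/page.html"))  # False
```

So the poster's robots.txt (empty Disallow) allows all crawlers, which matches the answer below.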

conroy

9:43 pm on Jun 26, 2004 (gmt 0)

10+ Year Member



Your robots.txt file is fine. You just need links. Preferably with high PR.

bluermes

9:46 pm on Jun 26, 2004 (gmt 0)



If you want a report of the pages Google has crawled, use Visitors [hping.org], an open-source, very fast web log analyzer for Linux and Windows.
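If a full log analyzer is overkill, a rough equivalent can be sketched in a few lines of Python. This assumes the common combined Apache log format and identifies Googlebot naively by user-agent substring; the sample log lines are invented:

```python
import re
from collections import Counter

# Count which paths Googlebot has requested in an access log.
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

def googlebot_hits(log_lines):
    """Return a Counter of paths requested by Googlebot."""
    hits = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = LOG_LINE.search(line)
        if m:
            hits[m.group("path")] += 1
    return hits

sample = [
    '66.249.66.1 - - [26/Jun/2004:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [26/Jun/2004:10:00:05 +0000] "GET /about.html HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
]
print(googlebot_hits(sample))  # Counter({'/index.html': 1})
```

Comparing those hit counts against your full page list shows how deep the crawl has gone so far.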

PFOnline

12:16 am on Jun 27, 2004 (gmt 0)

10+ Year Member



So no one really knows how long it takes, after Google crawls some of your pages, for those pages to show up in the index?

conroy

12:30 am on Jun 27, 2004 (gmt 0)

10+ Year Member



Depends on what you mean by "showing up" in the index. Pages can show up in under 24 hours in the listings from the site:www.domain.com command.

If you mean showing up as in receiving traffic, it can be months.

SEO is in many ways just a big waiting game now, especially with Google.

PFOnline

12:34 am on Jun 27, 2004 (gmt 0)

10+ Year Member



Ya, that's what's strange... I just mean to be able to see them in the index... not good rankings or anything... It's been like 3 or 4 days since Google first came and crawled some pages, but none are showing yet, just the homepage of the site.

conroy

12:35 am on Jun 27, 2004 (gmt 0)

10+ Year Member



It can take a few days. Especially if you don't have high PR links pointing at you. Just keep gathering links and don't worry about your pages showing up right now. They will be there soon.

PFOnline

12:39 am on Jun 27, 2004 (gmt 0)

10+ Year Member



Ya, the highest link I have pointing to my new site right now is a PR4...

Good stuff, thanks... answered my question... :)

Robert Charlton

12:42 am on Jun 27, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



1. several nice sitemaps since it's a large site (no larger than 100K since Google stops indexing after that)

Actually, this should be "no more than 100 links per page, since Google may not crawl more than that."

The 100K limitation is also correct, but I don't think that's the operative limitation in this discussion.

Incidentally, I've seen Google crawl more than 100 links, but GoogleGuy has mentioned that 100 is generally the limitation. (Anyone have the thread he mentioned this on?)

The quick answer to your question is inbound links and PageRank.

It's also helpful to prioritize, and to structure your site and your site maps so that the pages that really need the PageRank get it first.

Take a look at:

Search Engine Theme Pyramids and Google
Optimising the Pyramid for PageRank
[webmasterworld.com...]

Also, on a really large site, perhaps not all pages are worth getting crawled, and you might want to think about that as you make use of what PageRank you have.

BigDave

1:16 am on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google mentions the 100-link limitation as a good design guideline for users. They will follow all the links.

Also remember that any recommendation that google gives will have a safety margin included. There has been some discussion about Google diluting PR even more than usual at somewhere around 200 links, but they still follow them even out to several hundred.

As for the 100K limit, that is a hard limit on how much Google will cache of your page (if it is HTML; they go larger for other formats).

Google absolutely *does* follow links that are after the 100k point in a file. It follows them, it passes PR and it counts them as backlinks.

Googlebot reads the whole file, not just 100K. Links are extracted at this early stage; indexing and caching happen later, and it currently appears to me that both of those stop at 100K.

Robert Charlton

4:34 am on Jul 2, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A PS on the 100 links per page. Just tripped over GoogleGuy's msg #5, posted yesterday on this thread...

How strict is the 100 links per page concept?
[webmasterworld.com...]

The limit to be more aware of is the 101K limit on pages. 100 links is just a guideline that helps encourage to keep pages < 101K. You can put 150 or 200 links on a page with no problem, but keep in mind what an earlier poster mentioned: if you have several hundred links on a page, you might want to take a step back and think if that's the best thing for users. There could be a way to rework the links so that they make more sense for both users and search engines (e.g. break down links chronologically, into alphabetical chunks, by topic, etc.).
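The two limits in play here (roughly 100 links per page as a guideline, and the ~101 KB page-size limit) can be checked with a short audit script. This is an illustrative sketch using Python's standard-library HTML parser, run against a made-up page:

```python
from html.parser import HTMLParser

# Audit one page against the two limits discussed in this thread:
# ~100 links per page (a guideline) and ~101 KB of HTML (a hard limit).
class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        # Count only anchors that actually carry an href.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

def audit_page(html):
    """Return (link count, page size in KB) for an HTML document."""
    counter = LinkCounter()
    counter.feed(html)
    size_kb = len(html.encode("utf-8")) / 1024
    return counter.links, size_kb

# A hypothetical page with 150 links:
html = ("<html><body>"
        + "".join(f'<a href="/p{i}">p{i}</a>' for i in range(150))
        + "</body></html>")
links, size_kb = audit_page(html)
print(links)          # 150 links: over the guideline, but still crawlable
print(size_kb < 101)  # True: well under the ~101 KB limit
```

Per GoogleGuy's note above, a page like this is fine for crawling; the link count only matters as a usability question.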

sit2510

12:09 pm on Jul 2, 2004 (gmt 0)

10+ Year Member



>>> 100 links is just a guideline that helps encourage to keep pages < 101K. You can put 150 or 200 links on a page with no problem.

So now the truth is revealed! It is funny how people reacted when Google's message about 100 links per page was first published in the guidelines a year ago: many started dividing their link pages into several pages, or into directory-style sets of many pages, as we still see today. At the beginning I used to argue with many link partners, only to find the wave was too strong, so it was better to get on the same boat.

I recall BigDave insisting many times in other old threads that G does follow several hundred links, and this proves you were right.

BigDave

4:41 pm on Jul 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, fairly early on, GG suggested that it was mostly a usability guideline.

I just knew for a fact that Google follows at least 1,200 links out on a page, that they follow links located around 140K into a file, and that a link at 140K will even show up as a backlink.

While they do follow all the links, it is unknown how having all those links on one page might impact your ranking.

Since Googlers seem to think that pages with over 100 links are less appealing to visitors, it is quite possible that the number of links on a page is one of the "more than 100 factors" considered when ranking a page.

My home page has over 160 links, and it does not get that much Google traffic. I have no idea whether the two are related, as the home page is not optimized for anything other than the site name. But you go against Google's good advice at your own risk.