
Could Limit on Number of Pages Indexed Impact Deep Crawling?

Does Google stop indexing pages after a certain number of pages is reached?

     
3:38 pm on Jun 10, 2003 (gmt 0)

New User

10+ Year Member

joined:June 10, 2003
posts:2
votes: 0


This is my first post to this forum, but I've been doing SEO for about 3 years. That being said....

I'm currently working on a large web site whose content is delivered dynamically through an application layer (essentially, the file name is passed via an id parameter to a jsp file, which then retrieves the page from a database). For the initial optimization effort, I did some standard on-page optimization, launched a small PPC campaign for region-specific key terms, and created and submitted a site map. Six weeks after submission, the site map itself had been indexed, but none of the pages accessible from it via the following URL syntax appeared in the index:

www.domain.com/jsp.jsp?page=/dir/name.htm
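
For context, here is a rough sketch of the delivery pattern described above. It is not the actual code, just the general idea, and the table, column, and connection names are made up:

<%@ page import="java.sql.*" %>
<%
    // jsp.jsp (sketch): the "page" parameter names a logical document,
    // which is looked up in a database and written to the response.
    String pageKey = request.getParameter("page");   // e.g. "/dir/name.htm"
    try {
        // connection string is a placeholder; assume the JDBC driver is registered
        Connection conn = DriverManager.getConnection("jdbc:hypothetical");
        PreparedStatement ps =
            conn.prepareStatement("SELECT body FROM pages WHERE path = ?");
        ps.setString(1, pageKey);
        ResultSet rs = ps.executeQuery();
        if (rs.next()) {
            out.print(rs.getString("body"));          // stored HTML goes straight out
        } else {
            response.sendError(404);                  // unknown page key
        }
        conn.close();
    } catch (SQLException e) {
        throw new ServletException(e);
    }
%>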

I've even submitted some URLs directly, waited patiently, and still no dice.

I realize there are a number of issues in play here. Research suggests the "id" variable may be problematic (although I've found examples of such URLs in the index), some of the URLs are quite long, which may signal to Google that the page in question is too deep in the site to bother indexing, and so on. But there is another issue I'm wondering about.

When I look to see what pages from the site ARE in Google's index, there are 11+ pages of results, and the vast majority of them are PDF files. Is it possible that because Google has indexed so many of these PDFs, my site has tripped some component of the algorithm that says "we've got enough pages from this domain, don't index any more"? If that's the case, a simple tweak to the robots.txt file may be all I need.
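
For illustration, the kind of robots.txt tweak I have in mind would look something like this, assuming (purely hypothetically) that the PDFs all live under a /pdfs/ directory, which may not match the real site's layout:

User-agent: Googlebot
Disallow: /pdfs/

That would keep Googlebot away from the PDFs entirely, of course, so it only makes sense if those documents aren't worth having in the index in the first place.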

An additional (related) question: considering that Google keeps cached versions of sites, is it possible that after a certain total file size has been reached for a site (as opposed to a page), Google stops indexing additional pages from that site? Google doesn't cache PDFs, to the best of my knowledge, so this probably doesn't play into my situation.

Sorry for the lengthy explanation...thanks in advance.

D

3:56 pm on June 10, 2003 (gmt 0)

Junior Member

joined:Apr 30, 2003
posts:62
votes: 0


Hi,
I can't answer the dynamic-URL problem (there are real techies here for that), but as far as PDFs and pages are concerned: if there are no problems with the code, then no matter how big your site is, once a deep crawl has been done the whole lot usually gets listed, PDFs or not. There should be no connection.

Be careful with that code, Eugene!

If you have been deep crawled (check your logs), then I would say your problem is that Google spits out some dynamic strings, while it always relishes static pages.
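
A quick way to check is to grep your web server's access log for Googlebot; the log path below is only an example, yours may differ:

grep Googlebot /var/log/apache/access.log | tail -50

If Googlebot has been requesting your jsp.jsp?page=... URLs and they still aren't listed, the dynamic string is a more likely culprit than any page limit.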

6:34 pm on June 10, 2003 (gmt 0)

New User

10+ Year Member

joined:June 10, 2003
posts:2
votes: 0


Thanks for the feedback. I must say, it seems like Google would need to set some upper bound on how much it can catalog, particularly when you consider that it keeps cached copies of indexed web sites, but perhaps that bound is so high a site would need to be immense to actually hit it.

One clarification to the thread, the URL syntax for the site in question is:

[domain.com...]

...as opposed to what I put in my initial post. A subtle difference, but I wanted to make sure I've described the situation as accurately as possible.

Thanks again,
D

8:54 pm on June 10, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 31, 2003
posts:457
votes: 0


I've heard mention that it may be a function of PR, i.e. the higher your PR, the more pages will be indexed.

Daisho