I'm currently working on a large web site whose content is delivered dynamically via an application layer (essentially, a file name is passed via an id parameter to a JSP file, which then pulls the page content from a database). For the initial optimization effort, I did some standard on-page optimization, launched a small PPC campaign for region-specific key terms, and created and submitted a site map. Six weeks after submission, the site map itself was indexed, but none of the pages reachable from the site map via the following URL syntax appeared in the index:
I've even submitted some URLs directly, waited patiently, and still no dice.
I realize there are a number of issues in play here. Research suggests the "id" parameter may be problematic (although I've found examples of such URLs in the index), and some of the URLs were quite long, possibly signaling to Google that the page in question was too deep in the site to bother indexing. But there is another issue I'm wondering about.
When I look to see what pages from the site ARE in Google's index, there are 11+ pages of results, the vast majority of which are PDF files. Is it possible that, because Google has indexed so many of these PDFs, my site has tripped some component of the algorithm that says "we've got enough pages from this domain, don't index any more"? If so, a simple tweak to the robots.txt file may be all I need.
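If that theory held, the tweak could be as small as the sketch below: block Googlebot from the PDFs so crawl attention goes to the HTML pages instead. Note the wildcard and end-of-URL anchor are Google-specific extensions to robots.txt, not part of the original standard, and this assumes the PDFs aren't pages you actually want indexed:

```
# robots.txt (site root) — keep Googlebot out of the PDF files
User-agent: Googlebot
Disallow: /*.pdf$
```

Blocking via robots.txt only stops future crawling; already-indexed PDFs would drop out gradually.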
Additional (related) question: considering that Google keeps cached versions of pages, is it possible that once a certain total cached size has been reached for a site (as opposed to a page), Google stops indexing additional pages from that site? To the best of my knowledge Google doesn't cache PDFs, so this probably doesn't play into my situation.
Sorry for the lengthy explanation...thanks in advance.
If you have been deep crawled (check your logs), then I would say your problem is that Google chokes on some dynamic query strings, while it has always favored static pages.
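One common workaround along those lines is to rewrite the dynamic URLs into static-looking ones at the web server. A minimal Apache mod_rewrite sketch, assuming (purely for illustration) that the application script is called page.jsp and takes the id parameter described above; substitute whatever names the real application uses:

```
# httpd.conf or .htaccess — requires mod_rewrite to be enabled
RewriteEngine On
# Serve /pages/123.html by internally rewriting to the dynamic URL /page.jsp?id=123
RewriteRule ^pages/([0-9]+)\.html$ /page.jsp?id=$1 [L]
```

The site map and internal links would then point at the /pages/123.html form, so the crawler never sees a query string at all.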
One clarification to the thread, the URL syntax for the site in question is:
...as opposed to what I put in my initial post. A subtle difference, but I wanted to make sure I have described the situation as accurately as possible.