Forum Moderators: Robert Charlton & goodroi
The text in the body is inside a <pre> tag. I'm not sure: could Google be refusing to index the file because it uses the <pre> HTML tag?
[edited by: tedster at 6:43 am (utc) on July 14, 2009]
No, the <pre> tag is not a problem. But if there is a crawling problem, you will often see reports about it in your Webmaster Tools account.
Google does not guarantee to index every page it spiders; in fact, it usually doesn't index them all. This is especially true when the same text is available at more than one address, as in the situation you describe, with both HTML and PDF versions of the information.
How many total pages does your site have, and how many pages does Google show they've indexed?
However, the URL is not listed among the URLs Google has successfully crawled.
< snip >
[edited by: Robert_Charlton at 7:40 am (utc) on July 14, 2009]
[edit reason] removed specifics [/edit]
crawling >> indexing >> ranking
Why any URL gets skipped at each of these three stages can be quite complex. But from what you described, I'd say the duplicate text is the major issue.
Results 1 - 10 of about 3,040 from example.com. (0.15 seconds)
When I reach the last page of results, which is page 37, Google says:
Results 361 - 370 of 370 from example.com. (0.25 seconds)
:) At first it reported about 3,040 pages, but when I browse to the last page it says 370.
About duplicate text: no, I don't have any duplicate text! But yes, another domain does have the same content, and unfortunately their page is not indexed either.
[edited by: Robert_Charlton at 8:31 am (utc) on July 14, 2009]
[edit reason] changed to example.com [/edit]
----
The site: operator usually begins by giving only a rough estimate of the total number, and sometimes it's a VERY rough estimate. That's why the word "about" is included until you drill down toward the final pages of results.
Over the years, several Google people have mentioned this: the way Google shards its data and stores those fragments across several hundred thousand servers makes an exact count very difficult to return to the user immediately.