Just recently, some PDF files on a site and its mirror were amended to replace an email address with a mailform URL, and I am tracking how long it takes Google to reindex them. Two were reindexed within a week and no longer show up in a Google search for the email address; they now show up in a search for the mailform URL instead. To find the sites that contain the email address I use a search term like name@keyword1-keyword2.domain (without the ccTLD, as I don't want to send the full email address in any referrers).
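For anyone wanting to build that kind of search term the same way each time, here is a minimal sketch. It assumes the ccTLD is simply the last dot-separated label of the host part; the function name and the ".uk" in the usage example are made up for illustration and are not the real address.

```python
def search_term_without_cctld(email: str) -> str:
    """Return the address with its final host label removed.

    Assumption: the final dot-separated label of the host is the ccTLD.
    """
    local, _, host = email.partition("@")
    labels = host.split(".")
    if len(labels) < 2:
        # No dot in the host part, so there is nothing to strip.
        return email
    return local + "@" + ".".join(labels[:-1])


if __name__ == "__main__":
    # Hypothetical address, using the same placeholders as the post.
    print(search_term_without_cctld("name@keyword1-keyword2.domain.uk"))
```

This only builds the string; it does not send anything to Google, so the full address never ends up in a referrer.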
Several PDF files have yet to be reindexed, and one shows up in searches for both the email address and the mailform URL. It is caught "inbetween", showing for both old and new content (I see this a lot - especially for supplemental results).
Google has the ability to highlight words used in your search query if they appear in the SERPs. I have noticed that for indexed HTML files, the search terms are highlighted if they appear in the title or the snippet. For PDF files I don't see this highlighting happening as consistently.
For this email-address search, the modified PDF files are on two domains. For one domain the words are highlighted in the snippet. For the other domain (the mirror) they are not (and are supplemental results too).
The mirror results never used to appear in the SERPs (they were filtered out as duplicate content) but reappeared a week or two ago (this was one of the sites that triggered my WebmasterWorld post last week about Google appearing to relax its duplicate content filtering).
In order to test the keyword highlighting of SERPs for the PDF file, I did a new query like name@keyword1-keyword2.domain name keyword1 keyword2 domain and I saw that for most entries (all except one) all of the words in the query were now highlighted in the snippet (none of the query words appear in the document title anyway).
However, for one document there was no highlighting. Looking closer, I saw to my surprise that for the file caught "inbetween", the title and snippet had changed to something completely unrelated to the content of the PDF file itself: a completely different document title and a completely unrelated snippet. There is no cache for this PDF document and no "View as HTML" option, so I could not investigate much further.
If I remove the word "keyword2" from the search query, the title and snippet for this document revert to what they should be. If I do other content searches that bring this PDF file back as a result, the title and snippet are also correct. As soon as I add "keyword2" back to the query, the title and snippet change back to the completely incorrect ones.
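The test above (drop one word, see whether the title and snippet change) can be repeated for every word in the query. A rough sketch of generating those leave-one-out queries follows; the function name is made up for illustration, and it only builds the query strings, so comparing the titles and snippets is still done by hand.

```python
def leave_one_out_queries(terms):
    """Return (dropped_term, query) pairs, each query omitting one term."""
    variants = []
    for i, dropped in enumerate(terms):
        remaining = terms[:i] + terms[i + 1:]
        variants.append((dropped, " ".join(remaining)))
    return variants


if __name__ == "__main__":
    # Placeholder terms, matching the example query in the post.
    terms = ["name@keyword1-keyword2.domain", "name", "keyword1", "keyword2", "domain"]
    for dropped, query in leave_one_out_queries(terms):
        print(f"without {dropped!r}: {query}")
```

If only the variant without "keyword2" brings back the correct title and snippet, that points at that one term rather than at the query as a whole.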
Has anyone seen this before? Looks like a corrupted DocID in the index, or a DocID that has been reused without clearing up previous associated data. Anyone got any other explanations?
[edited by: lawman at 10:31 pm (utc) on Sep. 19, 2005]
[edit reason] Speeling [/edit]