|My last two new blog pages not indexed (first time in years)|
| 8:07 pm on Nov 7, 2009 (gmt 0)|
I wrote a blog post last week and noticed yesterday that the page is not indexed.
For your information, I only allow the index page and posts (434 posts) to be indexed; other pages (category, date, tags, etc.) are excluded using robots.txt.
The blog is 2 years old (feed served through FeedBurner) and all new posts were usually indexed WITHIN MINUTES (if not seconds). So I am really puzzled.
|Sitemap is fine, although Total URLs: 438 - Indexed URLs: 371 |
Page not excluded by robots.txt
Not found in crawling errors
Metadata is fine
Page is clean, compliant, blah blah
Labs-> Fetch as Googlebot - Success
Furthermore, the page mentioned above was crawled the same day, because its trackback and tag URLs were crawled and excluded as expected:
Crawl errors >> Restricted by robots.txt
http://www.example.com/URL/trackback/ - URL restricted by robots.txt - Nov 2, 2009
http://www.example.com/tag/SPECIFIC-TAG/ - URL restricted by robots.txt - Nov 2, 2009
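For anyone wanting to double-check the "not excluded by robots.txt" step offline, here is a rough sketch using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` are hypothetical (the actual robots.txt is not shown in this thread), and `is_crawlable` is just an illustrative helper name. Note that, as far as I know, the standard-library parser only does simple prefix matching, so Google-style wildcard rules would not be evaluated the way Googlebot evaluates them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the blog's real robots.txt
# is not quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/URL/",
        "http://www.example.com/tag/SPECIFIC-TAG/",
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

If the post URL comes back as allowed while the tag URL comes back as blocked, the exclusion rules are at least behaving as intended under this simplified matching.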
Any idea why the page has been crawled but not indexed?
I know that Google doesn't have to index everything, but this is the first time since 2007 (niche topic, PR4), so I am really surprised!
| 11:14 pm on Nov 8, 2009 (gmt 0)|
My typical first guess is that you have few or no links going to that page. I doubt that is the case, since you mention this is the first time in 2 years you have had this issue.
My typical second guess would be that your website was not available when Google tried to crawl it. This is more common among websites that are not using the most reliable hosting companies. Since you mentioned the trackback page was crawled, this is also probably not the case.
My typical third guess would be a duplicate content issue. I know you wrote the blog post, so it should be unique content, but I don't know how long your post is or whether someone else copied it onto their site in that very small window of opportunity before Google crawled your page. Have you ruled out duplicate content?
| 6:23 am on Nov 9, 2009 (gmt 0)|
Thank you Goodroi for your reply. Much appreciated.
First guess: indeed, no external links point to that page but, as you said, I don't think that is the issue here.
Second guess: it could be. I am on a shared server with h*stgat*r.
Third guess: no duplicate content issue; the content is very much unique.
Fourth guess: the URL was maybe a bit too long (105 characters in total), but again, it is not the first time I have written an article with a long URL.
Anyway, I wrote another article yesterday and it got indexed. And one of the articles mentioned above has been indexed since, so everything is back to normal.
Again, thank you Goodroi