
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
My last two new blog posts are not indexed (first time in years)
tomda

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4021059 posted 8:07 pm on Nov 7, 2009 (gmt 0)

I wrote a blog post last week and noticed yesterday that the page is not indexed.

For your information, I only allow the index page and the posts (434 posts) to be indexed; other pages (category, date, tags, etc.) are excluded using robots.txt.

The blog is 2 years old (feed served through FeedBurner) and all new posts were usually indexed WITHIN MINUTES (if not seconds). So I am really puzzled.

Sitemap is fine, although Total URLs: 438 - Indexed URLs: 371
Page not excluded by robots.txt
Not found in crawling errors
Metadata are fine
Page is clean, compliant, blah blah
Labs-> Fetch as Googlebot - Success
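
The sitemap figures in the checklist above can be reproduced offline. A minimal sketch, assuming a standard sitemaps.org XML sitemap; the function names `sitemap_urls` and `index_gap` are made up for illustration, and the indexed count still has to be read from Webmaster Tools by hand:

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def index_gap(total, indexed):
    """Return (number of URLs not indexed, percentage indexed)."""
    missing = total - indexed
    return missing, round(100.0 * indexed / total, 1)


# Figures quoted in the post: 438 submitted, 371 indexed.
print(index_gap(438, 371))  # (67, 84.7)
```

With the numbers from the post, 67 of the 438 submitted URLs are not in the index, so the missing new post is not the only gap.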

Furthermore, the above-mentioned page must have been crawled the same day, because its trackback and tag URLs were crawled and excluded as expected.

Crawl errors >> Restricted by robots.txt
http://www.example.com/URL/trackback/ - URL restricted by robots.txt - Nov 2, 2009
http://www.example.com/tag/SPECIFIC-TAG/ - URL restricted by robots.txt - Nov 2, 2009
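
To double-check which URLs a set of Disallow rules catches, a small matcher can help. This is only a sketch: the thread does not show the actual robots.txt, so the patterns below are hypothetical, and the matcher only covers Googlebot-style `*` and trailing `$` in Disallow lines (no Allow rules, no longest-match precedence, no per-user-agent groups).

```python
import re
from urllib.parse import urlparse

# Hypothetical rules; the real robots.txt is not shown in the thread.
DISALLOW_PATTERNS = ["/tag/", "/category/", "/*/trackback/"]


def pattern_to_regex(pattern):
    """Translate a Disallow pattern into a regex anchored at the path start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url, patterns=DISALLOW_PATTERNS):
    """True if the URL path matches at least one Disallow pattern."""
    path = urlparse(url).path or "/"
    return any(pattern_to_regex(p).match(path) for p in patterns)


for url in (
    "http://www.example.com/URL/trackback/",
    "http://www.example.com/tag/SPECIFIC-TAG/",
    "http://www.example.com/URL/",
):
    print(url, "blocked" if is_blocked(url) else "allowed")
```

Under these assumed rules the trackback and tag URLs are blocked while the post URL itself stays crawlable, which matches the crawl report above.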

Any idea why the page has been crawled but not indexed?
I know that Google doesn't have to index everything, but it is the first time since 2007 (niche topic, PR4), so I am really surprised!

Thank you


 

goodroi

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4021059 posted 11:14 pm on Nov 8, 2009 (gmt 0)

My typical first guess is that you have little or no links going to that page. I doubt that is the case since you mention this is the first time in 2 years you have had this issue.

My typical second guess would be that your website was not available when Google tried to crawl it. This is more common among websites that are not using the most reliable hosting companies. Since you mentioned the trackback page was crawled, this is also probably not the case.

My typical third guess would be a duplicate content issue. I know you wrote the blog post so it should be unique content but I don't know how long your post is or if someone else copied it on their site in that very small window of opportunity before Google crawled your page. Have you ruled out duplicate content?
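
One rough way to rule out duplicate content is to compare the post against a suspected copy using word shingles. This is only an illustrative sketch: the function names, the shingle size `k=5`, and any threshold applied to the score are assumptions, and nothing here reflects how Google actually detects duplicates.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "all new posts were usually indexed within minutes if not seconds"
copied = "All new posts were usually indexed within minutes, if not seconds!"
print(jaccard(original, copied))  # 1.0, identical once case and punctuation are ignored
```

A score close to 1.0 means the two texts share nearly all of their word sequences; unrelated posts score at or near 0.0.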

tomda

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4021059 posted 6:23 am on Nov 9, 2009 (gmt 0)

Thank you Goodroi for your reply. Much appreciated.

First guess: indeed, there are no external links pointing to that page, but as you said, I don't think that is the issue here.

Second guess: it could be. I am on a shared server with h*stgat*r.

Third guess: no duplicate content issue; the content is very much unique.

Fourth guess: the URL was maybe a bit too long (105 characters in total), but again, it is not the first time I have written an article with a long URL.

Anyway, I wrote another article yesterday and it got indexed... And one of the articles mentioned above has been indexed since, so everything is back to normal.

Again, thank you Goodroi

