I'd be grateful for any views.
I have an x-cart site (bookshop). Its normal pages are php, i.e. "site.com/product.php?productid=xfmkk" urls.
But being x-cart, it also has an html catalogue, so there is a "site.com/catalog/sjsjs.html" page for the same product.
Most of the spidering tends to be of the php pages: I would say 75-80% of requests, as opposed to 20-25% for the html pages. But of course it's said that the html pages rank better, because they have the words in the title, unlike php.
Would you put something in the robots.txt saying "disallow /php" to force search engines to spider just the html pages? My worry is that because they usually only ask for those about 20% of the time, they may wonder why they have suddenly been banned from the 80% of what they would normally get. Would this have an effect? Would you introduce a change? Would it be beneficial or detrimental?
I hope to soon get a modification to have only search engine friendly urls, but until then I thought I would ask the question. Grateful for any views.
Regards
Dilip
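For anyone wanting to see what such a rule would actually block before putting it live, here is a minimal Python sketch using the standard library's robots.txt parser. Note that "disallow /php" as quoted above is not valid robots.txt syntax; the rule below ("Disallow: /product.php") is an assumption about what was intended, and the two test urls are simply the example urls from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption, not taken from the site):
# block the dynamic product pages and leave the html catalogue open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product.php
"""


def is_allowed(url, agent="*"):
    """Return True if the rules in ROBOTS_TXT let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# The two example urls for the same product.
php_allowed = is_allowed("http://site.com/product.php?productid=xfmkk")
html_allowed = is_allowed("http://site.com/catalog/sjsjs.html")
print("php page allowed:", php_allowed)
print("html page allowed:", html_allowed)
```

This only shows which urls a well-behaved crawler would skip under that rule. It says nothing about how rankings would respond, which is the real question in the post.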
I would make sure you are using and serving canonical urls, providing unique content on your pages, and providing a navigation path through links and/or a sitemap to all the urls you want indexed.
You should also spend some time reading the Hot Topics thread pinned to the top of the Google Search forum [webmasterworld.com].
Most of that will apply to all major search engines.
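As a rough illustration of the canonical url advice, one option is for each php product page to carry a rel="canonical" link pointing at its html catalogue twin. The Python sketch below only builds that tag. The id-to-file mapping, the function name and the choice of the html page as the canonical version are all assumptions for the example, not something x-cart is known to provide.

```python
from html import escape

# Hypothetical mapping from product ids to catalogue file names,
# using the example pair from the original post.
CATALOG_PAGES = {"xfmkk": "sjsjs.html"}


def canonical_link(product_id, base="http://site.com/catalog/"):
    """Build a <link rel="canonical"> tag for a product, or None if unknown."""
    page = CATALOG_PAGES.get(product_id)
    if page is None:
        return None
    # Escape the url so it is safe inside an HTML attribute.
    return '<link rel="canonical" href="%s">' % escape(base + page, quote=True)


print(canonical_link("xfmkk"))
```

The resulting tag would go in the head of product.php?productid=xfmkk, so that both urls stay crawlable while pointing search engines at a single preferred version.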