Forum Moderators: phranque

file robot txt

how to properly insert some disallows

Loki03

9:14 am on Sep 29, 2010 (gmt 0)

10+ Year Member



Hello, the SEO company that is auditing my site is asking me to disallow the following:

Disallow: /showthumb.php?
Disallow: /index2.php?
Disallow: /index.php?


The reason is that the crawler is finding many pages such as these:

/showthumb.php?width=280&height=175&quality=100&img=images/stories/.......

or

/index.php?stanzasuite=13&step1=ok

that should be neither indexed nor crawled.

I just want to be sure that the rule Disallow: /index.php? tells robots not to crawl only the URLs that continue past the (?), while still crawling /index.php itself... of course the index is quite an important page for me ;)

Thanks for your opinion!

g1smd

11:22 am on Sep 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A robots.txt 'Disallow' rule is a prefix match: a URL is blocked only if it begins with every character in the rule, including the trailing '?'. Any URL that is shorter than the rule, such as /index.php on its own, is therefore, by definition, 'allowed'.
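
To make the prefix rule concrete, here is a minimal Python sketch. It assumes plain prefix matching only: no wildcards, no Allow lines, no per-crawler groups. The helper name is_disallowed is made up for illustration and is not part of any library.

```python
# Simplified model of robots.txt Disallow matching: plain prefix comparison.
# Assumption: real crawlers also support Allow lines, wildcards and
# user-agent groups, none of which are modelled here.

RULES = ["/showthumb.php?", "/index2.php?", "/index.php?"]


def is_disallowed(url_path, disallow_rules):
    """Return True if url_path begins with any non-empty Disallow rule."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


checks = [
    "/",                                   # site root
    "/index.php",                          # shorter than the rule
    "/index.php?stanzasuite=13&step1=ok",  # continues past the '?'
    "/showthumb.php?width=280&height=175&quality=100",
]

for path in checks:
    verdict = "blocked" if is_disallowed(path, RULES) else "allowed"
    print(f"{path} -> {verdict}")
```

Under these assumptions, / and /index.php come out as allowed, while the two URLs carrying a query string come out as blocked.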

Additionally, the root of your site should not be indexed as example.com/index.php or similar. Instead, it should be linked to, and indexed as, www.example.com/ only.

sublime1

3:59 pm on Sep 30, 2010 (gmt 0)

10+ Year Member



Loki03 --

Google Webmaster Tools includes a feature that can help you verify the syntax of your robots.txt. There are many other good SEO-related tools there, too.

I agree with g1smd -- most likely, you'll need to create some Apache rewrite rules so that the home page is reached as www.example.com/ rather than /index.php. If you're using a CMS like WordPress, Drupal, or Joomla, it can probably be configured to do this for you.

Be absolutely sure that the crawlers can access your home page before putting in robots.txt directives. I messed up once and inadvertently instructed all bots to never index any page. Fortunately, I caught it quickly, but that would have been a rather serious disaster :-)
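
A quick sanity check along these lines can be scripted before you upload the file. This is only a rough sketch: it assumes the rules sit under a "User-agent: *" group (the original post does not show that line), it uses plain prefix matching, and the function names are made up for illustration.

```python
# Rough pre-upload check: does this robots.txt block the home page ("/")?
# Assumptions: rules live under "User-agent: *", matching is plain prefix,
# and Allow lines and wildcards are ignored.

def disallow_rules_for_all(robots_txt):
    """Collect Disallow paths from groups that apply to 'User-agent: *'."""
    rules = []
    applies = False
    prev_was_agent = False  # consecutive User-agent lines share one group
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            matches_all = value == "*"
            applies = (applies or matches_all) if prev_was_agent else matches_all
            prev_was_agent = True
        else:
            prev_was_agent = False
            if field == "disallow" and applies and value:
                rules.append(value)
    return rules


def home_page_blocked(robots_txt):
    """Return True if any collected rule is a prefix of '/'."""
    return any("/".startswith(rule) for rule in disallow_rules_for_all(robots_txt))


BROKEN = "User-agent: *\nDisallow: /\n"
INTENDED = (
    "User-agent: *\n"
    "Disallow: /showthumb.php?\n"
    "Disallow: /index2.php?\n"
    "Disallow: /index.php?\n"
)

print("broken file blocks home page:", home_page_blocked(BROKEN))
print("intended file blocks home page:", home_page_blocked(INTENDED))
```

The first file is the kind of mistake I made: a bare "Disallow: /" shuts every compliant bot out of the whole site. The second file leaves the home page reachable.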

Tom