Do Any Search Engines Spider cgi?

         

guillermo5000

6:10 am on Sep 14, 2003 (gmt 0)

10+ Year Member



I am trying to promote my forum, which uses CGI. Will search engines spider the CGI pages, or should I consider rewriting the URLs? Thanks all.

victor

7:28 am on Sep 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a site where all pages, including the index page, are generated from CGIs -- no static HTML at all.

The site's pages are all indexed in all the search engines I've ever checked.

Points to note:

  • You don't want your cgi-bin indexed -- just the stuff that the scripts in there emit.
  • You don't want your CGIs to time out (can happen often on a busy server, especially with IIS). A user may be prepared to hit reload. A spider will go away with a partial page.
  • Redirecting www.widget.com to www.widget.com/cgi-bin/index.cgi takes a special skip'n'jump in .htaccess.
guillermo5000

    7:39 am on Sep 14, 2003 (gmt 0)

    10+ Year Member



    Does that include CGI extensions that are followed by a query string such as ?s=fa172f55cfe51f16e35ca4ea131f74d1;act=SF;f=1

    And, how do I keep my cgi-bin from being indexed?

    Thank you.

    pageoneresults

    3:50 pm on Sep 14, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    >> And, how do I keep my cgi-bin from being indexed?

    By using the Robots Exclusion Protocol [searchengineworld.com].

    Upload a plain .txt file to the root of your web. Within that plain .txt file you'll have this...

    User-agent: *
    Disallow: /cgi-bin/

    Name the plain .txt file this...

    robots.txt

    Make sure that you validate your robots.txt file by utilizing the resources available at the link posted above.

    guillermo5000

    5:33 pm on Sep 14, 2003 (gmt 0)

    10+ Year Member



    But won't that prevent them from looking at mysite.com/cgi-bin/somescript.cgi?

    Thanks.

    claus

    5:37 pm on Sep 14, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Ahm... you will use an internal .htaccess rewrite, so that when someone enters www.domain.com they really see /cgi-bin/somefile.cgi. They will never know about it, as this exchange of paths takes place on the server.

    So, the spider will not be banned from indexing that file, as it's not indexing it from the /cgi-bin/ location, rather it is indexed from the domain.com/ location.

    It's just like a copy of the file at another location, only there is no copy, the original file will just get shown at another location than where it really is.

    /claus
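The point about robots.txt applying to the requested URL, not to the file that ends up running, can be checked with a minimal sketch. It uses Python's standard urllib.robotparser as a stand-in for a spider. The agent name "ExampleBot" and the helper allowed() are assumptions made up for illustration, and real spiders may interpret rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

def allowed(url, agent="ExampleBot"):
    """Return True if the robots.txt above lets `agent` request `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

# Rules are matched against the URL the spider asks for,
# not the script the server runs internally to answer it.
print(allowed("http://www.domain.com/"))                      # public URL
print(allowed("http://www.domain.com/cgi-bin/somefile.cgi"))  # script's real path
```

The first check should come back allowed and the second blocked, which is why an internally rewritten page can be indexed while /cgi-bin/ itself stays out.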

    guillermo5000

    6:21 pm on Sep 14, 2003 (gmt 0)

    10+ Year Member



    I've considered this:

    RewriteRule keyword.htm /cgi-bin/ib3/ikonboard.cgi [R=301,L]

    So I could use [mysite.com...] to promote the board with SE's.

    Are you saying I should do this, along with the disallow statement to keep search engines out of my cgi-bin? Thanks all.

    claus

    7:58 pm on Sep 14, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    >> RewriteRule keyword.htm /cgi-bin/ib3/ikonboard.cgi [R=301,L]

    No, don't do that. The R=301 part is not what you want. It makes the rule an external redirect, and that one is visible to the spider: the spider is sent on to the /cgi-bin/ URL, which your robots.txt disallows, so it will not index it. Make it an internal rewrite instead:

    RewriteRule keyword\.htm /cgi-bin/ib3/ikonboard.cgi [L]

    You might even want to speed it up by doing like this:

    RewriteRule ^keyword\.htm$ /cgi-bin/ib3/ikonboard.cgi [L]

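    Above, ^ and $ anchor the start and end of the requested filename, so the match is faster and only keyword.htm itself is rewritten. Without them the pattern also matches longer names that merely contain it, such as keyword.html.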
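The effect of the ^ and $ anchors can be tried out with a short sketch. It uses Python's re module as a stand-in for Apache's regex engine (the two agree on patterns this simple). The sample paths and the matches() helper are hypothetical, and the paths have no leading slash because that is how a root-level .htaccess sees them.

```python
import re

# Patterns taken from the two RewriteRule variants above.
unanchored = re.compile(r"keyword\.htm")
anchored = re.compile(r"^keyword\.htm$")

# Hypothetical request paths, as seen from a root-level .htaccess.
paths = ["keyword.htm", "keyword.html", "old/mykeyword.htm"]

def matches(pattern, path):
    # search() looks for the pattern anywhere in the path, which is
    # how an unanchored RewriteRule pattern behaves.
    return pattern.search(path) is not None

for p in paths:
    print(p, matches(unanchored, p), matches(anchored, p))
```

Only the exact name keyword.htm satisfies the anchored pattern, while the unanchored one also catches the other two paths.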

    All you'll ever want to know (and then some) is here: [engelschall.com...]

    /claus