Forum Moderators: coopster & jatar k & phranque

Perl and robots

How do robots deal with HTML served by Perl

11:59 am on Nov 17, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 20, 2004
votes: 0

Hello everybody,
I do not know if my question is in the right forum, but here it is.
Do robots index the HTML content of a page when the page is generated by a Perl script?
And do robots also follow links like:

[edited by: encyclo at 6:03 pm (utc) on Nov. 17, 2006]
[edit reason] switched to example.com [/edit]

6:02 pm on Nov 17, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 5, 2006
votes: 0

Robots can be programmed to work differently, so some robots might index script-generated pages and follow their links, and some might not.

4:31 pm on Nov 20, 2006 (gmt 0)


WebmasterWorld Administrator phranque (Top Contributor of All Time)

10+ Year Member

joined:Aug 10, 2004
votes: 211

i would assume that a well-behaved robot would try to do an HTTP "GET" on the href value of every anchor tag.
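
To make that concrete, here is a rough sketch in Python of what such a robot might do with a page. It is only an illustration, not how any particular search engine works. The sample markup, the `/cgi-bin/page.pl?id=3` link and the helper names are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return an absolute URL for every anchor href in the page."""
    collector = AnchorCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


def fetch(url):
    """Plain HTTP GET; all the robot gets back is the response itself."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical page: one script-style link and one directory-style link.
page = '<a href="/cgi-bin/page.pl?id=3">three</a> <a href="/blue/">blue</a>'
links = extract_links(page, "http://www.example.com/")
print(links)
```

Both links come out of the parser the same way, and a robot that calls `fetch()` on each one gets HTML back without any view of the script that produced it. Whether a given robot chooses to request URLs with query strings is up to whoever programmed it.
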
4:53 pm on Dec 7, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 19, 2005
votes: 0

Robots can indeed behave differently for different URL patterns. But you can publish your content so that the requesting agent has no way of knowing how it was generated. I've developed many sites where all content is generated by Perl scripts, but every URL is a directory URL (/blue/, /red/, etc). There is no ".html", no ".cgi", etc. This hides the underlying technology, which the agent doesn't need to be concerned about anyway. Could be Perl CGI, could be PHP, could be a flat HTML file. You don't need to know.
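
As a rough illustration of the directory-URL idea, here is a minimal sketch written in Python with the standard WSGI interface rather than Perl, purely to keep it self-contained. The `PAGES` table, the page bodies and the `request` helper are invented for the example and are not taken from any real site.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical content table: directory-style URL -> function that builds the page.
PAGES = {
    "/blue/": lambda: "<h1>Blue</h1>",
    "/red/": lambda: "<h1>Red</h1>",
}

HTML = ("Content-Type", "text/html; charset=utf-8")


def app(environ, start_response):
    """Generate the page for a directory URL; nothing in the URL names a script."""
    builder = PAGES.get(environ.get("PATH_INFO", "/"))
    if builder is None:
        start_response("404 Not Found", [HTML])
        return [b"<h1>Not found</h1>"]
    body = builder().encode("utf-8")
    start_response("200 OK", [HTML, ("Content-Length", str(len(body)))])
    return [body]


def request(path):
    """Call the app the way a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


print(request("/blue/"))
```

The agent asking for `/blue/` receives a status line, a `text/html` content type and a body. Nothing in that exchange says whether a script or a flat file produced it.
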

A simple example is if you use server-side includes for dynamic content. Your included script can display today's date, file contents, etc. But there's no way for the requesting agent to know that this happened.
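
Here is a toy sketch of that substitution step, again in Python. It only imitates the `<!--#echo var="..." -->` directive and is not how Apache implements server-side includes. The `expand_includes` name and the fixed date are assumptions made so the example is repeatable.

```python
import re
from datetime import date

# Matches directives of the form <!--#echo var="NAME" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([A-Z_]+)"\s*-->')


def expand_includes(template, variables):
    """Replace each echo directive with its value before the page is sent."""
    def replace(match):
        return variables.get(match.group(1), "(none)")
    return ECHO_DIRECTIVE.sub(replace, template)


template = '<p>Today is <!--#echo var="DATE_LOCAL" -->.</p>'
# Fixed date for the example; a real server would use the current date.
served = expand_includes(template, {"DATE_LOCAL": date(2006, 12, 7).isoformat()})
print(served)
```

The directive is gone by the time the response leaves the server, so the agent sees ordinary HTML with a date in it.
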

Here's a good article written by Tim Berners-Lee many years ago about this topic: [w3.org...]