Forum Moderators: goodroi
For example:
PHP files accessed via 'include' statements
JavaScript files accessed via <script language="JavaScript" src="/script.js"></script> tags
stylesheets linked to in the <head> tag.
Maybe the short question is: do search engines spider as though they're a typical user, only visiting files explicitly linked from the displayed pages, or do they read your code and access every file they come across?
<script language="JavaScript" src="/script.js"></script>
When a robot requests a page, the PHP includes will be handled by the server, and only the resulting output will be sent to the robot.
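To make that concrete, here's a minimal sketch. The file names are made up, and the include file is created on the fly purely so the snippet runs standalone; on a real server it would already exist:

```php
<?php
// Why a robot never sees your include statements: the server
// executes them and ships only the merged HTML.
// (header.php is a throwaway file created here so the example
// runs standalone.)
file_put_contents('/tmp/header.php', '<h1>Site header</h1>');

ob_start();
include '/tmp/header.php';   // executed server-side
echo '<p>Page body</p>';
$html = ob_get_clean();      // this merged markup is all the spider receives

echo $html, PHP_EOL;
```

The spider gets plain HTML; the fact that part of it came from an include is invisible on the wire.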
Also, what about style sheets:
<link rel="stylesheet" href="/style.css" type="text/css">
Just out of curiosity, you say robots ignore images; how do they turn up in a Google image search then? Or is that done by a special (separate) spider?
I wouldn't be worried about PHP files and such, but .doc or .pdf files are ones you may not want indexed. Even if they aren't linked from your site, the bot may still find them if someone views them with the Google Toolbar installed, or if someone else who knows the URL links to the document from their site.
If you don't want the files to end up in Google, then restrict the bot; if you don't care, then don't bother.
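For example, if the documents you want kept out of the index all live in one directory (the directory name here is made up), a robots.txt entry at the site root would do it:

```
User-agent: *
Disallow: /private-docs/
```

Googlebot also honors wildcard patterns, so `Disallow: /*.pdf$` would cover PDFs wherever they live, though wildcards are a Google extension rather than part of the original robots.txt spec.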
[edited by: Demaestro at 5:37 pm (utc) on Dec. 1, 2006]
If you don't change your pages based on the requesting User-agent, then an easy way to see what the spider sees is View->Page Source in any browser.
There is no need to Disallow these included files or their directories unless you have reason to suspect that some third party may know or find their URLs and link to them for some malicious reason. If you're in a competitive market segment, then Disallow your include files directory and be done with it. But I'd be more worried about how someone found a URL that could be successfully used to reach them in the first place. In other words, this is not a robots/search ranking problem, but rather a security problem.
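If you treat it as a security problem, you can also refuse direct HTTP requests at the server, instead of (or as well as) asking robots to stay away. A sketch for Apache 2.4, assuming your include files sit in their own directory under the web root:

```
# .htaccess inside the includes directory: deny all direct requests.
# PHP's include still works, because it reads the file from disk,
# not over HTTP. (Apache 2.2 would use "Deny from all" instead.)
Require all denied
```

Unlike a robots.txt Disallow, this blocks everyone, not just well-behaved bots.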
Jim
As jdMorgan wisely said:
There is no need to Disallow these included files or their directories unless you have reason to suspect that some third party may know or find their URLs and link to them for some malicious reason.
Since I am a paranoid person, I would not take a chance. I assume a competitor will eventually try to cause me trouble, so I would block these files. By leaving files hanging out in the open, you take a risk (albeit a very small one).
Included files are not the same as CSS files.
Unless you are blocking unreferred access to the CSS file, for example, anybody can determine the URL, enter it, and browse that file.
A server-side include file can be included from a file path that is not web accessible.
A CSS file must necessarily be web accessible, since it is directly requested by the browser in the normal course of loading a page.
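A sketch of the difference, with made-up paths:

```
/var/www/
    includes/header.php   <- outside the web root: no URL maps to it,
                             but PHP can still include it by file path
    html/                 <- the web root (DocumentRoot)
        index.php         <- include '/var/www/includes/header.php';
        style.css         <- must stay here: the browser fetches it by URL
```

So the include file can be taken off the web entirely, while the stylesheet has to remain reachable by anyone (or any bot) that asks for it.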