robots.txt and include files

Can I tell Google not to index my SSI include files?

wrgvt

8:15 pm on May 16, 2003 (gmt 0)

10+ Year Member



Maybe the answer is obvious, but after a lot of reading I'm still confused about robots.txt. I have a few web pages that exceed the 101K limit for Google indexing. Now the smart answer is to break them into smaller pages. Let's assume I'm not that smart (insert your own joke here).

Looking at the page as it loads, a lot of the first part is information I'd rather not have indexed by Google: drop-down menus for navigation, search boxes, affiliate logos, ads, etc., all loaded as included files. Most of these items are stored in the root with an extension of .htm (while the main code is index.html). So if I use robots.txt to disallow all files with an .htm extension, will Googlebot skip loading them when it reads index.html?
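For what it's worth, the robots.txt I had in mind would look something like this (the include file names here are made up; the classic robots.txt standard only matches on path prefixes, so it would be one Disallow line per file):

    User-agent: Googlebot
    Disallow: /menu.htm
    Disallow: /searchbox.htm
    Disallow: /ads.htm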

jatar_k

8:28 pm on May 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld wrgvt,

No, you can't; it's the nature of included files.

When Googlebot requests index.html (which has an .html extension, not .htm), the server responds by assembling the index.html page, includes and all, and returning the finished result.

Googlebot doesn't know, or care, which files were included; the server takes care of that and returns the merged page on request.

If you want to reduce your page size you will have to do it the old-fashioned way: reduce it. ;)
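A rough sketch of what happens (file names are just examples):

    <!-- index.html as stored on the server -->
    <!--#include virtual="/menu.htm" -->
    <p>Main page copy here.</p>

    <!-- what Googlebot actually receives -->
    ...full contents of menu.htm, inlined...
    <p>Main page copy here.</p>

The bot never requests /menu.htm itself, so a robots.txt rule blocking .htm files does nothing to the size of index.html.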

AmericanBulldog

8:31 pm on May 16, 2003 (gmt 0)

10+ Year Member



While I haven't quite mastered it myself, I understand you can position parts of your page wherever you like using CSS, so that the more relevant copy comes first in the source and gets read first.

You might want to hang out at the CSS forum for a while.
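Roughly like this, untested and with placeholder names and widths, but it shows the idea: the copy comes first in the source order, and CSS still paints the menu on the left.

    <div id="content">The copy you want read first.</div>
    <div id="nav">Menus, search box, affiliate logos, ads.</div>

    <!-- in a real page this goes in the head -->
    <style type="text/css">
    #nav { position: absolute; top: 0; left: 0; width: 150px; }
    #content { margin-left: 165px; }
    </style>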

freyman

8:37 pm on May 16, 2003 (gmt 0)



Even if you use included files, they still increase your file size, because the engine reads the file after the includes are resolved, not before.
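One quick way to see the size the engine actually gets is to fetch the assembled page and measure it (a sketch, assuming a Unix shell with wget installed; the URL is a placeholder):

    wget -O page.html http://www.example.com/index.html
    ls -l page.html

If the saved file is over 101K, the includes are already counted in that number.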

bhartzer

8:44 pm on May 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe I don't understand this correctly, but if you tell the spider which parts of the page it can and can't spider (by denying it certain parts of a page via SSI), wouldn't you be giving the normal visitor a page that you're not giving to the spider? The spider would see one version of the page and the normal visitor another.

If that's the case, wouldn't that be cloaking (which the engines don't like)?

jatar_k

8:45 pm on May 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



It is not possible to keep a spider from seeing content that is included via SSI; by the time the page goes out, the include has already been merged in.

wrgvt

8:45 pm on May 16, 2003 (gmt 0)

10+ Year Member



Actually, isn't cloaking just the opposite: the spider sees things on a web page that the user doesn't?

jatar_k

8:47 pm on May 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Let's not get into another cloaking debate.

For a description of cloaking, check here:
The Truth About Cloaking [webmasterworld.com]