Forum Moderators: phranque


Robots and IFrames

does a robot spider the content?

         

old_expat

7:09 am on Nov 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If an iframe tag calls a "complete" HTML page that resides in a subordinate directory, and that directory is off limits to spiders via the robots.txt file (or via meta tags?), does the robot spider the content?

Ex: <iframe src="dir/this-page.php" width="100%">
</iframe>

If it does, is there another *relatively* easy way to keep the robot from spidering just a portion of a page? How about building the pages using frames?

It would be a bit too bandwidth intensive with images .. and a pain as well.

Receptional Andy

11:37 pm on Nov 28, 2006 (gmt 0)



How a page is referenced (e.g. by a link, an iframe, or an image tag) is not relevant to the robots exclusion protocol - spiders that obey robots.txt should simply not visit it at all. (The exception is the noindex,follow meta directive, of course: a spider has to fetch the page in order to read it.)
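A minimal sketch of that point, using Python's standard urllib.robotparser. The robots.txt content, the www.example.com host and the "/dir/" directory are assumptions made up for illustration; they are not taken from the actual site. The check only looks at the URL being requested, so it makes no difference whether that URL was found in a link or in an iframe src.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/dir/" stands in for the blocked directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dir/
"""


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if a well-behaved spider may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The parent page is allowed; the framed page inside /dir/ is not.
parent_allowed = is_crawlable("http://www.example.com/index.php")
framed_allowed = is_crawlable("http://www.example.com/dir/this-page.php")
print(parent_allowed, framed_allowed)
```

This only models spiders that honor robots.txt; it says nothing about robots that ignore the file.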

The real question is why you need to keep spiders away from the page. Depending on the reason, you might try other methods of protecting the page (e.g. blocking common spiders, password protection, bot traps, etc.).
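As a rough sketch of the "blocking common spiders" option: the function below decides on a status code from the request path and the User-Agent header. The token list, the "/dir/" prefix and the function name are all hypothetical, and a user agent can be spoofed, so this is an illustration of the idea rather than real protection.

```python
# Hypothetical user-agent substrings; real spider names change over time.
BLOCKED_AGENT_TOKENS = ("googlebot", "slurp", "msnbot")

# Hypothetical path prefix for the directory to protect.
PROTECTED_PREFIX = "/dir/"


def status_for_request(path: str, user_agent: str) -> int:
    """Return 403 for listed spiders requesting the protected directory, else 200."""
    agent = user_agent.lower()
    is_spider = any(token in agent for token in BLOCKED_AGENT_TOKENS)
    if is_spider and path.startswith(PROTECTED_PREFIX):
        return 403
    return 200


print(status_for_request("/dir/this-page.php", "Googlebot/2.1"))
print(status_for_request("/dir/this-page.php", "Mozilla/5.0"))
```

In practice this kind of rule usually lives in the web server configuration rather than in application code.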

If you can explain the reason, I could give some more specific advice.

old_expat

2:15 am on Nov 29, 2006 (gmt 0)




>>The real question is why you need to keep spiders away from the page. Depending on the reason, you might try other methods of protecting the page (e.g. blocking common spiders, password protection, bot traps, etc.).

>>If you can explain the reason, I could give some more specific advice.


It is actually just a portion of the page .. it varies from 20-60% of the page content.

The portions of the page that I want to keep the spiders away from are duplicate / affiliate content.

I recently had 1,200 pages (all but 4 pages of the site) drop out of Google's index.

Receptional Andy

3:39 pm on Nov 29, 2006 (gmt 0)



>>The portions of the page that I want to keep the spiders away from are duplicate / affiliate content.

In which case robots exclusion (as you've already implemented) should be fine.

>>I recently had 1,200 pages (all but 4 pages of the site) drop out of Google's index.

This may well indicate a more serious problem - even sites with lots of duplicate content don't necessarily get all the pages dropped - they just perform badly.

old_expat

5:10 pm on Nov 29, 2006 (gmt 0)




>>This may well indicate a more serious problem - even sites with lots of duplicate content don't necessarily get all the pages dropped - they just perform badly.

I can't figure out what else it could be .. unless it's that every page has a link pointing to the same merchant site.

No cloaking, no invisible text .. actually, I'm not talented enough to do all that black hat stuff. :)

Everything on the page other than an <h2> tag has been duplicate / affiliate content.

old_expat

3:27 am on Nov 30, 2006 (gmt 0)




Another question about the same concept, not being sure how spiders work ..

If a PHP include is stored in a remote directory and that directory is excluded by robots.txt, does the spider 'see' what is in the include?

I'm assuming it does, since the include is parsed on the server?

Receptional Andy

4:31 pm on Dec 8, 2006 (gmt 0)



>>If a PHP include is stored in a remote directory and that directory is excluded by robots.txt, does the spider 'see' what is in the include?

Spiders only see the output - if you view source in your browser, that's what the spider will 'see'. They have no opportunity to even know the include exists (unless there are direct links to the included file).
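To make the difference concrete, here is a small sketch in Python (standing in for what PHP does on the server). The fragment text, the page markup and the helper names are invented for illustration. With an include, the content is merged into the output and no URL for the included file appears; with an iframe, the output carries a URL that a spider can request separately, which is where robots.txt comes into play.

```python
from html.parser import HTMLParser


class UrlCollector(HTMLParser):
    """Collects src/href values a spider could discover in delivered HTML."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def discoverable_urls(html: str) -> list:
    collector = UrlCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Hypothetical include content; a real server would read it from the blocked directory.
INCLUDED_FRAGMENT = "<p>Affiliate product description</p>"

# Server-side include: the fragment is merged in before the page is sent.
page_with_include = "<h2>My heading</h2>" + INCLUDED_FRAGMENT

# Iframe: the output only carries a URL pointing at the other document.
page_with_iframe = '<h2>My heading</h2><iframe src="dir/this-page.php" width="100%"></iframe>'

print(discoverable_urls(page_with_include))
print(discoverable_urls(page_with_iframe))
```

So blocking the include's directory in robots.txt does not hide the included text, because the spider reads it as part of the parent page.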

>>Everything on the page other than an <h2> tag has been duplicate / affiliate content.

I suppose you might be dropped for having such a high level of duplication, although I don't think this is very common (could be wrong though ;)).