Our site has an advanced search in the left column that is on (almost) every page of the site. The advanced search has a dropdown with a bunch of different locations, from US - Alabama - Anniston all the way to US - Wyoming - Sheridan. Google crawls this dropdown on every page, and when I look at the top words in our site's content, the first 50 are all from this dropdown list. From my reading, it seems like robots.txt can only be used to block URLs or directories. Is there any way that a robots.txt file can be used to block spiders from crawling a specific element on a page but not the whole page? Or is there some other way to make it clear to search engines that these lists are not to be indexed? I would really appreciate any help on this issue; I am perplexed.
P.S. Don't worry about being too technical with your answers, luckily, I only have to find out the answer, not actually implement it.
unfortunately you can't stop a spider from reading a segment of your HTML. you can only stop it from crawling the page itself, or from crawling your links.
the only way you could achieve what you're asking is to remove the words from the HTML completely, either by putting them in an iframe, or by writing them in with javascript.
or maybe you could change the code into an HTML form, and redirect them that way.
I would appreciate it if you wouldn't make fun of me.
regarding your question, what are you trying to accomplish:
- prevent the indexing of the pages linked?
- control the page rank flow through your internal linking?
- reduce the importance of the anchor text in your document?
- something i didn't consider?
the only thing i can think of that specifically addresses your approach is that yahoo search supports a class attribute value of "robots-nocontent" as described here:
How do I mark web page content that is extraneous to the main unique content on the page? - Yahoo! Search [help.yahoo.com]
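As a rough illustration of the class value mentioned above, here is a minimal sketch of a server-side helper that wraps the search dropdown in a container marked "robots-nocontent". The function name `renderSearchBox`, the `name="location"` attribute, and the surrounding markup are hypothetical. Going by the post above, this attribute is a Yahoo-specific hint, so other engines should not be assumed to honor it.

```javascript
// Hypothetical helper: wraps the dropdown in a container carrying the
// class value quoted above. Names and markup are assumptions for this sketch.
function renderSearchBox(optionsHtml) {
  return [
    '<div class="robots-nocontent">',
    '  <select name="location">',
    `    ${optionsHtml}`,
    "  </select>",
    "</div>",
  ].join("\n");
}

// Example call with a single hand-written option.
const searchBoxHtml = renderSearchBox(
  '<option value="al-anniston">US - Alabama - Anniston</option>'
);
```

The class has to be present in the HTML the server sends. Adding it later with a script would not help a crawler that only reads the raw markup.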
1) Use an <iframe>
2) Use javascript to create the content.
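Option 2 might look roughly like the sketch below: the page ships an empty select element and a script fills it in, so the location names never appear in the served HTML. The element id `location-select`, the function names, and the two-item sample list are assumptions made for illustration. A crawler that executes JavaScript could still see the generated options.

```javascript
// Hypothetical sample data; the real list would run from
// "US - Alabama - Anniston" to "US - Wyoming - Sheridan".
const LOCATIONS = [
  "US - Alabama - Anniston",
  "US - Wyoming - Sheridan",
];

// Escape text so a location name cannot break the markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the <option> markup as a string (a pure function, easy to check).
function buildOptionsHtml(locations) {
  return locations
    .map((name) => `<option value="${escapeHtml(name)}">${escapeHtml(name)}</option>`)
    .join("");
}

// In a browser, fill an initially empty <select id="location-select">.
// The element id is an assumption for this sketch.
if (typeof document !== "undefined") {
  const select = document.getElementById("location-select");
  if (select) {
    select.innerHTML = buildOptionsHtml(LOCATIONS);
  }
}
```

Visitors with scripting disabled would see an empty dropdown, so this trades some accessibility for keeping the list out of the page source.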
The overall goal may also be achieved by moving the required content to the bottom of the HTML (so that search engines give it less prominence) and using CSS to place it higher on the page. Depending on exact requirements, this is probably the best solution.
Kaled.