

Robots.txt and Robots behavior with many tables

         

tomda

10:45 am on Feb 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Two simple questions:

1/ I am working on a beta version of my website and will upload it to a subdomain (beta.example.com) just to test it.
I do not want robots to crawl the site. If I put a robots.txt file in the root of the subdomain (with User-agent: * and Disallow: /), will robots ignore the site even though every file has a meta tag with CONTENT="all"?

2/ Second question: my web pages have many tables nested within tables. Is that safe for robots when they strip out all the HTML tags?

Thank you

[edited by: tedster at 6:43 pm (utc) on Feb. 3, 2004]

tedster

2:09 am on Feb 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1. Well-behaved bots will ignore the site. They won't even ask for disallowed pages, so they get no chance to see the meta tags. Not all bots are well-behaved, but the majors usually are, with the occasional accident.

2. If your table markup is valid, the content will get indexed by most engines. Your biggest concern would be a skipped or mangled tag, whether table, row or cell.
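Point 1 can be illustrated with a minimal sketch that uses Python's standard-library `urllib.robotparser` as a stand-in for a well-behaved crawler. Real search-engine bots run their own code, and the bot name `ExampleBot` and the page URLs here are hypothetical. The crawler checks robots.txt before requesting a URL, so with `Disallow: /` it skips every page and never downloads the meta tags at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the beta subdomain, as described in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True only if robots.txt permits user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://beta.example.com/", "http://beta.example.com/page.html"):
        allowed = polite_fetch_allowed(ROBOTS_TXT, "ExampleBot", url)
        # A well-behaved bot stops here when allowed is False, so it never
        # requests the page and never sees its robots meta tag.
        print(url, "fetch" if allowed else "skip")
```

A badly behaved bot simply skips the `can_fetch` check, which is why robots.txt is a request rather than an access control.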

Deeply nested important text used to be devalued, but it was still indexed. I haven't taken a hard look at this issue in a few years, because I avoid nesting layout tables more than two levels deep. But I'd guess that deep nesting is not the problem it once was. I'd still keep it minimal, because you just never know.
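To check how deep your own layout tables go, a small script can count `<table>` nesting with Python's standard `html.parser`. This is a hypothetical helper, not anything the engines publish. The limit of two comes from the rule of thumb above, and unclosed tables in invalid markup will skew the count.

```python
from html.parser import HTMLParser


class TableDepth(HTMLParser):
    """Track the deepest level of <table> nesting seen in a document."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def max_table_depth(html: str) -> int:
    parser = TableDepth()
    parser.feed(html)
    parser.close()
    return parser.max_depth


TWO_DEEP = "<table><tr><td><table><tr><td>Text</td></tr></table></td></tr></table>"
THREE_DEEP = ("<table><tr><td><table><tr><td><table><tr><td>Text</td></tr></table>"
              "</td></tr></table></td></tr></table>")

for name, page in (("TWO_DEEP", TWO_DEEP), ("THREE_DEEP", THREE_DEEP)):
    depth = max_table_depth(page)
    # Flag anything deeper than the two-level rule of thumb.
    print(name, depth, "too deep" if depth > 2 else "ok")
```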

One of the issues with complex table layouts is that text which looks connected when you view the page is sometimes not connected at all in the HTML, and this can hurt proximity factors on multiple-word searches.
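The proximity problem shows up when you strip the tags and read the words in source order, roughly as a text-only indexer would. Below is a sketch in Python that assumes a simple word-distance measure (actual engine proximity scoring is not public). It uses a hypothetical two-column layout where "Blue" and "widgets" are stacked in the left column. They look adjacent on screen, but the right-hand cell of the first row sits between them in the HTML.

```python
from html.parser import HTMLParser


class WordCollector(HTMLParser):
    """Collect lowercase words from text nodes in source order, dropping tags."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def word_distance(html: str, first: str, second: str):
    """Smallest gap, in words, between first and second; None if either is missing."""
    collector = WordCollector()
    collector.feed(html)
    collector.close()
    words = collector.words
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    if not first_positions or not second_positions:
        return None
    return min(abs(i - j) for i in first_positions for j in second_positions)


# Left column reads "Blue / widgets" on screen; the nav cell comes between them in source.
LAYOUT = """<table>
<tr><td>Blue</td><td>Home Products Contact About</td></tr>
<tr><td>widgets</td><td>Call now</td></tr>
</table>"""
FLAT = "<p>Blue widgets for sale</p>"

print(word_distance(LAYOUT, "blue", "widgets"))  # 5
print(word_distance(FLAT, "blue", "widgets"))    # 1
```

The text is still all there after stripping, which matches point 2 above; only the word order, and therefore the apparent distance between the words, changes.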