Forum Moderators: goodroi
I'm an SEM newbie and wanted to know: what is the absolute best robots.txt coding one should use to ensure Google's and the major SEs' spiders index all of my site's pages?
I've seen the following used:
<meta content="index,follow" name="robots">
<meta name="robots" content="all">
<meta content="all" name="robots">
Which one is best? Or are they all equally good and does it even matter which one I use?
Next, is there an even better one than these three? If so, what is it?
Lastly, are there any other key coding/files I need to ensure spiders index all of my site's pages? If so, please list them with enough detail on how to implement for us newbies!
Welcome to WebmasterWorld!
Robots are paid to index all your files; that is their task in life. So they will index everything unless you forbid them.
The various tags you mention have only one useful task: to keep robots out. Since you want them in, these tags are useless.
The main value of having a totally empty robots.txt file, rather than simply not having one, is to prevent error messages in your logs. Every well-behaved robot begins each session by requesting the robots.txt file, and if it is not there you get an error message.
> Lastly, are there any other key coding/files I need to ensure spiders index all of my site's pages?
Clear navigation helps. If the user has to click more than a few times (around three) to get from the index page to a given internal page, that page has a lesser chance of being indexed. In their Webmaster Guidelines [google.com] Google suggests:
Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages.
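Such a site map need not be anything fancier than an ordinary HTML page of plain links, e.g. (the page and directory names below are made up):

```html
<!-- sitemap.html: one hop from the home page to every important section -->
<ul>
  <li><a href="/products/">Products</a></li>
  <li><a href="/articles/archive.html">Article archive</a></li>
  <li><a href="/about/contact.html">Contact</a></li>
</ul>
```

Link to this page from your index page, so spiders can reach deep pages within the suggested three clicks.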
If you want to have robots index everything, you do not need any <meta name="robots" ...> lines at all.
Regarding the robots.txt file (which is not part of your web pages but a separate file at your domain root), if you do not want to disallow spiders from anything, you could, e.g.:
a) put an empty robots.txt file (http://www.example.com/robots.txt) at your domain root
b) create a robots.txt file referencing a directory that does not exist, e.g.
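Such a file might look like this (the directory name is only an illustration; any path that does not exist on your site will do):

```
User-agent: *
Disallow: /no-such-directory/
```

This blocks nothing real, so everything still gets indexed, but the User-agent/Disallow skeleton is already in place.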
Advantage of this: you can build on this syntax if you later do want to keep spiders from some parts of your site.
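As a sanity check on how a well-behaved spider reads a robots.txt that disallows only a nonexistent directory, here is a small sketch using Python's standard urllib.robotparser (the directory name and URLs are illustrative):

```python
# Parse a robots.txt that disallows only a directory that does not
# exist on the site, and check what a spider may fetch.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /no-such-directory/",
])

# Every real page stays fetchable:
print(rp.can_fetch("Googlebot", "https://www.example.com/page.html"))  # True
# Only the nonexistent path is blocked:
print(rp.can_fetch("Googlebot", "https://www.example.com/no-such-directory/x.html"))  # False
```

The same parse/can_fetch check works if you later add real Disallow lines, which is exactly the "build on this syntax" advantage mentioned above.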
The worst alternative is not to have a robots.txt file because
1) this swamps the real errors in your error log
2) some webspaces are set up to serve the default page (rather than an error page) in answer to a request for a file that does not exist. I have seen quite a few sites where a request for robots.txt returns the site's home page - a search engine spider programmed in a non-robust way might choke on that.