to index the page and to follow the links on your page to other pages on your website.
<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">
Would tell the bot not to index this page or follow the links.
I'm guessing that in most cases the bot would index the page and follow links by default, BUT it probably doesn't hurt to have the tag there anyway.
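Here is a minimal Python sketch of how a crawler might read the robots meta tag, assuming it treats a missing tag (or a missing directive) as INDEX,FOLLOW. The names RobotsMetaParser and robots_meta_policy are hypothetical; real search engines do their own parsing and may honor other directives too.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        # Compare the name value case-insensitively: "ROBOTS" == "robots".
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            self.directives.add(token.strip().lower())


def robots_meta_policy(html):
    """Return (may_index, may_follow); both stay True unless a tag says otherwise."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


print(robots_meta_policy('<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">'))  # (False, False)
print(robots_meta_policy("<html><head><title>No tag</title></head></html>"))  # (True, True)
```

Because both flags start out True, a page with no tag and a page with INDEX,FOLLOW come out the same in this sketch.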
I think the solution given before, leaving off the trailing '/', will probably do the trick, though.
disallow: /directory/
will disallow all files in that directory.
disallow: /example
will only disallow the index of that directory?
Does it mean disallow /example/, or example.htm?
(It depends on the bot, or on whether there is a directory and a file with the same name.)
disallow: /click.php
means disallow click.php?id=2, id=3, etc. (or you could use /click.* )
And while we're at it, we might as well cover:
disallow: /
means disallow everything
disallow:
means disallow nothing
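The last two cases can be checked with Python's standard urllib.robotparser module, which is a generic robots.txt parser (not Googlebot's own). The helper name can_fetch_path and the agent name ExampleBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch_path(robots_txt, path, agent="ExampleBot"):
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


block_everything = "User-agent: *\nDisallow: /\n"
block_nothing = "User-agent: *\nDisallow:\n"   # empty value: nothing is disallowed

print(can_fetch_path(block_everything, "/index.html"))  # False
print(can_fetch_path(block_nothing, "/index.html"))     # True
```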
If a page is disallowed in robots.txt, then no robots.txt-compliant robot will ever fetch that page. Therefore, the Robots meta-tag on that page is irrelevant.
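A rough sketch of why the meta tag never comes into play: a compliant crawler consults robots.txt before requesting the URL, and only a page that has actually been fetched can have its meta tags read. The crawl_decision function and the fake_fetch callback are hypothetical stand-ins, with urllib.robotparser playing the role of the robots.txt check.

```python
from urllib.robotparser import RobotFileParser


def crawl_decision(robots_txt, path, fetch_page, agent="ExampleBot"):
    """Return what a robots.txt-compliant crawler does with `path`.

    `fetch_page(path)` stands in for the HTTP request; it is only
    called when robots.txt allows the URL.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, path):
        # Never requested, so any <meta name="robots"> tag on it is never seen.
        return "not fetched"
    fetch_page(path)  # only now could the page's meta tags be inspected
    return "fetched"


requested = []


def fake_fetch(path):
    """Hypothetical fetcher that records every URL it is asked for."""
    requested.append(path)
    return '<meta name="robots" content="noindex,nofollow">'


robots_txt = "User-agent: *\nDisallow: /private/\n"
print(crawl_decision(robots_txt, "/private/page.html", fake_fetch))  # not fetched
print(crawl_decision(robots_txt, "/public.html", fake_fetch))        # fetched
print(requested)                                                     # ['/public.html']
```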
<meta name="robots" content="Index,Follow"> is not needed. Index,Follow is the default if the tag is not present, and using it only wastes bandwidth and pushes your content down in your file.
The Robots standard is based on prefix-matching. Therefore:
Disallow: /directory/
will disallow anything that starts with "/directory/". It will disallow all files in a subdirectory named "/directory".
Disallow: /example
will disallow anything that starts with "/example". It will disallow all files in a directory named "/example", and it will disallow any file in the root of the site whose name starts with "example", e.g. /example.php or /example.gif.
Disallow: /click.php
means disallow anything that starts with "/click.php" such as /click.php?id=2
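To see the prefix matching in action, the three rules above can be fed to Python's urllib.robotparser, which follows the original prefix-matching behavior. The sample paths and the ExampleBot agent name are invented for the demonstration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
Disallow: /example
Disallow: /click.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each path is compared against the rules as a plain string prefix.
for path in [
    "/directory/page.html",  # starts with "/directory/" -> disallowed
    "/directory",            # lacks the trailing "/"     -> allowed
    "/example/index.html",   # starts with "/example"     -> disallowed
    "/example.php",          # starts with "/example"     -> disallowed
    "/examples.gif",         # starts with "/example"     -> disallowed
    "/click.php?id=2",       # starts with "/click.php"   -> disallowed
    "/index.html",           # matches no rule            -> allowed
]:
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "disallowed"
    print(f"{path:22} {verdict}")
```

Note the bare "/directory" path: "Disallow: /directory/" does not cover it, while a rule written without the trailing slash would.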
You can use "Disallow: /click.*" only in robots.txt records specific to Google or another search engine that supports this non-standard extension to the robots.txt "standard". (It's not really a standard: it was never formally adopted, there is no sanctioning body for it, and compliance is purely voluntary.) Do not use this wildcard construct in a catch-all record that starts with "User-agent: *"; it is invalid for most robots.
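The difference can be sketched in Python. prefix_match follows the original prefix semantics, so in this sketch a robot that doesn't know the extension looks for the literal characters "/click.*" and blocks nothing useful. wildcard_match is a simplified, assumed version of the extension as Google describes it ("*" matches any run of characters, a trailing "$" anchors the end of the URL). Both function names are hypothetical, and a real robot might instead reject the line outright.

```python
import re


def prefix_match(rule, path):
    """Original robots.txt semantics: the rule is a literal path prefix."""
    return path.startswith(rule)


def wildcard_match(rule, path):
    """Simplified wildcard extension: '*' = any characters, trailing '$' = end of URL."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then join the pieces with '.*' where '*' was.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start, so this is still a prefix test.
    return re.match(pattern, path) is not None


print(prefix_match("/click.*", "/click.php?id=2"))        # False: '*' taken literally
print(wildcard_match("/click.*", "/click.php?id=2"))      # True
print(wildcard_match("/*.gif$", "/images/logo.gif"))      # True
print(wildcard_match("/*.gif$", "/images/logo.gif?v=2"))  # False
```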
Jim
[edited by: jdMorgan at 3:26 am (utc) on April 29, 2005]
Your robots.txt looks valid to me. Check it here [searchengineworld.com] just to be sure.
Make sure it is a plain-text file using LF or CR,LF line endings. Edit it in Notepad or some other simple editor, not a fancy word-processing program or HTML editor. The file should contain only those lines you posted above.
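If you'd rather check the file than trust the editor, a small Python sketch like this one reads the raw bytes and reports which line endings are in use; anything other than consistent LF or CR,LF is worth fixing. The function name describe_line_endings is hypothetical.

```python
def describe_line_endings(data: bytes) -> str:
    """Classify line endings in raw file bytes.

    Returns "LF", "CRLF", "none" (no line breaks at all), or "other"
    for bare CRs or a mix of styles.
    """
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf   # LFs not preceded by CR
    cr_only = data.count(b"\r") - crlf   # CRs not followed by LF
    if crlf == 0 and lf_only == 0 and cr_only == 0:
        return "none"
    if lf_only == 0 and cr_only == 0:
        return "CRLF"
    if crlf == 0 and cr_only == 0:
        return "LF"
    return "other"


print(describe_line_endings(b"User-agent: *\nDisallow:\n"))      # LF
print(describe_line_endings(b"User-agent: *\r\nDisallow:\r\n"))  # CRLF
print(describe_line_endings(b"User-agent: *\rDisallow:\r"))      # other
```

To check a real file, open it in binary mode ("rb") and pass the bytes from read() to the function.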
You might also want to request robots.txt and a few of your pages manually with a server headers checker [webmasterworld.com], and make sure they return a 200-OK server response code.
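A bare-bones version of that header check in Python, using the standard http.client module so that redirects are not followed silently (a 301 or 302 comes back as-is rather than being replaced by the final page's status). The function name status_line is hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def status_line(url, timeout=10):
    """Request `url` once, without following redirects; return (status, reason)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()
```

For your own site (www.example.com here is a placeholder), status_line("http://www.example.com/robots.txt") should return (200, 'OK') if robots.txt is being served correctly.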
If your site is new and you have only a few incoming links from moderate-to-high PageRank sites, it may just take a while for Googlebot to get interested in it. Incoming links and patience are required.
Jim
mbatta, here's a cool tool to check out.
It gives you a 'Googlebot view' of your site and includes some diagnostics, such as the headers returned. (There's no guarantee it behaves exactly like Googlebot, but if this tool gets hung up, so will Googlebot.)
I found a lot of 'little errors' with it.
example:
a href="file.htm"
is different than
a href= "file.htm"
Just type a URL into the box and it will spider the page and the pages it links to.
Search Google for 'poodle predictor'.
It will show up in your logs as a bot named 'poodle predictor'.
I always knew it pretty much seemed to ignore the "keywords" and "description" meta tags.
But when I ran Poodle, it seemed to go ahead and spider any pages I had marked "NOINDEX" or "NOFOLLOW".
It may just act that way in Poodle, but I may remove the tags anyway and rewrite my robots.txt accordingly.
Thanks, guys (even if you did mean it for mbatta). It will be very helpful for me.
Under Construction
...
2002 examplehost. All Rights Reserved.
I don't know what examplehost is, and my home page in no way says any of this. I realize this is now getting off topic, so I'm going to move it to the general G board.
[edited by: ThomasB at 1:28 pm (utc) on May 4, 2005]
[edit reason] examplified [/edit]
Again, my home page in no way says anything about being under construction. Very strange.