ABC of robots.txt

Now that will be the best tutorial!

Monzilla

9:35 am on Nov 18, 2005 (gmt 0)

10+ Year Member



Hello,

To be honest, I am really new to robots.txt.

I want to know:

1. What exactly is robots.txt?

2. Does it apply only to HTML files, or to PHP files too?

My whole site is built on PHP and I'm on Linux hosting.

How can I make use of it there?

Does it work with PHP, and can it be used on Linux hosting?

Also, I use Google Analytics.
The problem is that my site is in PHP, so I can't find where in index.php to add the script that Google Analytics has given me.

Can you please help me out with this?
From A to Z.

Staffa

6:04 pm on Nov 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



robotstxt.org

Dijkgraaf

6:05 am on Nov 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1) It is a mechanism for telling compliant bots/spiders which URLs not to request from your web site.
2) It doesn't even have to be at the file level; you can disallow directories or even the entire site. Anything starting with what you are disallowing will not be requested by a compliant bot/spider.
3) Yes, you can use it on Linux or with any web server. The scripting or markup language doesn't matter at all.
4) Google Analytics has nothing to do with robots.txt that I'm aware of. But you will probably want to put the script in some sort of common include file so that it appears in all your PHP pages.

As Staffa said, [robotstxt.org...] is a good resource to read up a bit about it.
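
To illustrate the "anything starting with what you are disallowing" rule, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the paths, and the "ExampleBot" name are all made up for illustration; real crawlers may apply extra rules of their own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if a compliant bot named `agent` may request `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Rules are matched as URL prefixes, so file type (.html, .php) is irrelevant.
    for path in ("/index.php", "/private/page.php", "/tmpfile.php"):
        print(path, is_allowed(path))
```

Note that "Disallow: /tmp" also blocks "/tmpfile.php", because the match is a plain prefix match on the URL path, not a directory match.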

Monzilla

12:09 pm on Dec 1, 2005 (gmt 0)

10+ Year Member



Thanks

Also, can you tell me how to make a sitemap that can be submitted to Google Sitemaps for my site?

topsites

9:45 am on Dec 7, 2005 (gmt 0)



For creating a sitemap, I use Xenu's Link Sleuth... It's a program that checks all my links for me, heh... Once it is done link-checking, I save the results to a file. Then I open that file in a text editor and, with a bit of searching and cut and paste, add a header and footer, and I'm left with a nice site map as a bonus.

Check it:
[home.snafu.de...]

p.s.: make sure you have 'sitemap' clicked under options.

As far as submitting to Google goes, I don't think it's needed. I link it from my main page, and if Google wants it, there it is.
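
The manual cut-and-paste step described above could also be scripted. This is only a rough sketch, assuming you already have a plain list of URLs (for example, pulled out of a link checker's saved report); the function name build_site_map and the example URLs are hypothetical, and this produces a simple HTML site map page, not the XML format used for submissions.

```python
from html import escape


def build_site_map(urls, title="Site Map"):
    """Wrap a list of URLs in a header and footer to form a basic HTML page."""
    header = f"<html><head><title>{escape(title)}</title></head><body><ul>"
    footer = "</ul></body></html>"
    seen = set()
    items = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicate links
        seen.add(url)
        safe = escape(url, quote=True)
        items.append(f'<li><a href="{safe}">{safe}</a></li>')
    return "\n".join([header, *items, footer])


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    print(build_site_map([
        "http://www.example.com/",
        "http://www.example.com/about.php",
    ]))
```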

Eljaybe

11:00 pm on Jan 3, 2006 (gmt 0)

10+ Year Member



Has anyone ever seen this on a page?

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

Is this the same thing as no index/follow?

I found this code on an indexed Web page, ranked in the top 20 on Google. So, why is the page indexed if it wasn't coded to be indexed? Is it using an incorrect code?

jdMorgan

11:18 pm on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Has anyone ever seen this on a page?
> <META NAME="ROBOTS" CONTENT="NOARCHIVE">

Yes, many.

The page is coded to be indexed, but not archived: Google will not keep a cached copy of it or show the 'Cached' link for it in the search results. See the Google Webmaster Info [google.com] pages for more information.

Jim
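
To make the difference concrete, here is a small sketch that reads the robots meta tag from a page and lists the directives it contains, using Python's standard html.parser module. The class and function names are made up for this example, and it only shows what the tag says, not how any particular search engine acts on it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html_text):
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


if __name__ == "__main__":
    found = robots_directives('<META NAME="ROBOTS" CONTENT="NOARCHIVE">')
    print(found)               # the only directive present is noarchive
    print("noindex" in found)  # noindex was never requested
```

Since the tag in question contains only NOARCHIVE, nothing in it asks for the page to be left out of the index.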

Eljaybe

2:19 pm on Jan 4, 2006 (gmt 0)

10+ Year Member



Thanks for clearing that up, Jim.
But what would be the reason for not wanting Google to cache your page? Could the reason be that the page is updated often and you don't want Google to cache old content?