
Forum Moderators: goodroi


New to robots.txt and how spiders find it

     
6:51 am on Apr 8, 2004 (gmt 0)

New User

10+ Year Member

joined:Apr 8, 2004
posts:5
votes: 0


Hi everyone,
I am totally new to robots.txt.
Can anyone show me the syntax for the robots meta tag, and how to create a small, quick robots.txt, please?
Also, I don't think my website has ever been submitted to any search engine. If I put a robots.txt in my root directory, will it help search engines find my website so that their spiders can crawl my pages next time?
Thanks a lot for any information you can give me.

Jazzvn

7:04 am on Apr 8, 2004 (gmt 0)

New User

10+ Year Member

joined:Dec 15, 2003
posts:30
votes: 0


Welcome to WebmasterWorld, jazzvn.

A robots.txt file is simply a text file created in a text editor that is stored on the server. Here is a simple robots.txt:

User-agent: *
Disallow: /cgi-bin/
Disallow: /members/
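
One way to see what the sample above does is to feed it to a parser and ask which paths a compliant robot may request. The following is a minimal sketch using Python's standard `urllib.robotparser`; the bot name `ExampleBot`, the host `www.example.com`, and the test paths are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /members/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise each rule.
for path in ("/index.html", "/cgi-bin/search.cgi", "/members/profile.html"):
    allowed = parser.can_fetch("ExampleBot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Only the two disallowed directories are reported as blocked; everything else on the site stays open to robots.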

Here is a site that explains what the file is and what it does: [robotstxt.org]

Hope this helps to get you started.

7:15 am on Apr 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


jazzvn,

Welcome to WebmasterWorld [webmasterworld.com]!

Here's the Robots.txt standard [robotstxt.org].

Putting the robots.txt file in your root directory will not help to get your site indexed. The purpose of robots.txt is to tell "good" robots not to fetch certain pages or subdirectories of your site. Bad robots, such as e-mail address harvesters, will ignore robots.txt.

Typical uses are to keep robots from requesting your scripts and/or shopping cart, to stop them from indexing or copying your images, and to keep them from listing your "semi-private" pages - although this is no guarantee that those pages will be kept private!

Another use is to keep robots from requesting pages that you don't need indexed and consuming large amounts of bandwidth.

In order to get your pages indexed in search engines, what you need to do is to get incoming links -- get other sites which are already indexed to link to your pages. Best results will be had if the sites that link to your pages share the topic of your pages. Also, links from reputable directories such as the Open Directory Project (DMOZ) are good to have.

In addition to robots.txt, which is a text file placed in the root directory of your site, you can also use the on-page HTML robots meta-tag of the form:

 <meta name="robots" content="noindex,nofollow"> 

Note that if a page is disallowed in robots.txt, then the page won't be read by robots, and the above tag will have no effect.
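
The note above can be sketched in code. This is an illustrative model of a well-behaved crawler, not how any particular search engine works: it checks robots.txt first and only parses the page for a robots meta tag if it is allowed to fetch it. The class `RobotsMetaFinder`, the function `crawler_sees_noindex`, the bot name, and the host are all hypothetical names for this example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def crawler_sees_noindex(robots_txt, path, page_html, agent="ExampleBot"):
    """Return True only if a compliant crawler would fetch the page AND find noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, "http://www.example.com" + path):
        return False  # the page is never fetched, so the tag is never read
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    return "noindex" in finder.directives


PAGE = ('<html><head><meta name="robots" content="noindex,nofollow"></head>'
        '<body>Family album</body></html>')
BLOCKED = "User-agent: *\nDisallow: /album/\n"   # page disallowed in robots.txt
OPEN = "User-agent: *\nDisallow:\n"              # empty Disallow: nothing blocked

print(crawler_sees_noindex(BLOCKED, "/album/index.html", PAGE))
print(crawler_sees_noindex(OPEN, "/album/index.html", PAGE))
```

With the page disallowed, the tag is never seen; with robots.txt left open, the crawler reads the page and finds `noindex`.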

Also note that Ask Jeeves and Google will index your page (list it in search results) if they find any link to your page, regardless of whether that page is disallowed in robots.txt. The only way to stop them from listing your page is to allow them to fetch it (don't disallow the page in robots.txt) and use the on-page HTML robots meta-tag shown above.

Use a simple text editor such as Notepad to create your robots.txt. Once you have written your robots.txt file, validate it here [searchengineworld.com].

Jim

4:52 pm on Apr 8, 2004 (gmt 0)

New User

10+ Year Member

joined:Apr 8, 2004
posts:5
votes: 0


Hi Moderator,
Does this mean that my page doesn't allow spiders?
<meta name="robots" content="noindex,nofollow">

It says noindex and nofollow.
What I really want is this:

User-agent:*
Disallow:/album/
# "album" is the folder that holds my whole family web album, and I don't want people to see it. Is that code right? It will let any spider in, but not into www.domain.com/album, right?
Thanks
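
The two rules in this post can be checked the same way. Below is a small sketch, again with Python's standard `urllib.robotparser`; the helper `is_allowed`, the bot name, the host, and the test paths are invented for the example. It compares the rules as posted (no space after the colons) with the more conventional spaced form, and probes a few paths around the album folder. Other parsers may treat the bare `/album` path differently, so take the result as one parser's reading.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, path, agent="ExampleBot"):
    """Ask Python's parser whether `agent` may request `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)


COMPACT = "User-agent:*\nDisallow:/album/\n"    # as posted, no spaces
SPACED = "User-agent: *\nDisallow: /album/\n"   # conventional spacing

# Hypothetical paths around the album folder.
for path in ("/", "/album", "/album/", "/album/2004/party.jpg", "/album.html"):
    print(path, is_allowed(COMPACT, path), is_allowed(SPACED, path))
```

Both spellings give the same answers here. The rule is a plain prefix match on `/album/`, so everything inside the folder is blocked, while `/album` without the trailing slash and `/album.html` are not.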

4:55 pm on Apr 8, 2004 (gmt 0)

Administrator

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
posts:9686
votes: 0


Welcome, Jazzvn. Those code examples should work. You can test your code with some tools found here: [searchengineworld.com...]

1:19 am on Apr 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 11, 2003
posts:955
votes: 0


I don't want people to see it.

People or robots?
robots.txt can only keep out robots that obey the exclusion standard. It cannot keep out browsers or people; for that you need something like .htaccess.

If you use robots.txt, there's no need to use META tags. If you use META tags, there's no need to use robots.txt.

So I'd suggest going with the robots.txt code you posted in your last message.

Good luck!
Sid

PS: welcome to WebmasterWorld!

11:09 pm on Apr 10, 2004 (gmt 0)

Full Member

10+ Year Member

joined:May 24, 2003
posts:242
votes: 0


<Also note that Ask Jeeves and Google will index your page (list it in search results) if they find any link to your page, regardless of whether that page is disallowed in robots.txt.>

Jim
A little help please
In your opinion would this also apply to Yahoo! Slurp?

Thanks & regards
Ray

11:41 pm on Apr 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 11, 2003
posts:955
votes: 0


Jim
A little help please
In your opinion would this also apply to Yahoo! Slurp?

No, Yahoo! stops requesting pages on your site once it sees that they are disallowed, and it doesn't index them.

However, Google does this. If you search for "Overture", result #3 is content.overture.com [overture.de]: even though that site has banned all robots via robots.txt, Google still lists its URL.

So there's a chance of this happening in Yahoo! results, as Yahoo! still seems to use some of Google's.

Sid

11:49 pm on Apr 10, 2004 (gmt 0)

Full Member

10+ Year Member

joined:May 24, 2003
posts:242
votes: 0


Hi sidyadav

Thank you for the feedback and info

I asked the question because Yahoo! recently listed a whole site of ours that does contain a robots.txt forbidding indexing.
The strange thing is that it lists all the pages, but with no descriptions.
Just the URL and the company name, which contains the link.

Been trying to find out why without success so far.

Any ideas?
Ray