
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots code
hollyhats

10+ Year Member



 
Msg#: 140 posted 5:31 pm on Oct 3, 2002 (gmt 0)

Can someone please help me to understand the following that appears in my raw access files?

209.249.67.146 - "GET /robots.txt HTTP/1.0" 404
64.68.82.47 - "GET /robots.txt HTTP/1.0" 404
64.68.82.47 - "GET / HTTP/1.0" 200 4872

Also, how do you know who the codes belong to? e.g. (209.249.67.146)

 

korkus2000

WebmasterWorld Senior Member korkus2000 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 5:36 pm on Oct 3, 2002 (gmt 0)

The first 2 are robots looking for your robots.txt file. You need to have one in your root directory. Here is a tutorial so you can get one up there.
[searchengineworld.com...]

IP of requester: 209.249.67.146
What they are asking for: "GET /robots.txt HTTP/1.0"
Status: 404
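
The breakdown above can be sketched in code. This is a minimal sketch assuming the simplified log format quoted in this thread; real server logs usually carry extra fields (date, referrer, user-agent) as well.

```python
import re

# Minimal sketch: split a simplified access-log entry (as quoted above)
# into requester IP, request line, and status code.
line = '64.68.82.47 - "GET /robots.txt HTTP/1.0" 404'
match = re.match(r'(\S+) - "([^"]+)" (\d{3})', line)
ip, request, status = match.group(1), match.group(2), int(match.group(3))
```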

hollyhats

10+ Year Member



 
Msg#: 140 posted 5:54 pm on Oct 3, 2002 (gmt 0)

I read over the link you gave me. Question: what files would I not want to be indexed? And if you don't add the robots.txt file, isn't it crawling all files anyway?

korkus2000

WebmasterWorld Senior Member korkus2000 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 6:04 pm on Oct 3, 2002 (gmt 0)

You want to create a file called robots.txt and add

User-agent: *
Disallow:

This will allow robots to spider all your files. Reasons for exclusion would be sensitive information that isn't meant for the public, or executable files that would fire on a page request.
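
As a sketch of the exclusion case korkus2000 describes, a robots.txt can allow everything except a few areas. The two paths below are placeholders for illustration, not from this thread:

```
# Hypothetical example: allow everything except two private areas
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```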

hollyhats

10+ Year Member



 
Msg#: 140 posted 6:06 pm on Oct 3, 2002 (gmt 0)

Thank you and the tutorial was very helpful. I saved it for future reference.

hollyhats

10+ Year Member



 
Msg#: 140 posted 6:48 pm on Oct 3, 2002 (gmt 0)

Okay, I am trying to set this up and feel really stupid...
But
Is this what it should look like?
<meta name="Robots.txt" content="user-agent:*">
Disallow:

What is supposed to go in front of Disallow?

Quinn

10+ Year Member



 
Msg#: 140 posted 6:54 pm on Oct 3, 2002 (gmt 0)

You'll want to open a text file (with notepad, word, pico...) and type

User-agent: *
Disallow:

then save it as robots.txt and upload it to your root directory.
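
Quinn's steps, minus the upload, look like this as a quick sketch (the filename and contents are exactly as given above):

```python
# Sketch: create the allow-all robots.txt locally, ready to upload.
content = "User-agent: *\nDisallow:\n"
with open("robots.txt", "w", newline="\n") as f:
    f.write(content)
```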

JamesR

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 140 posted 8:12 pm on Oct 3, 2002 (gmt 0)

You can deny robots to a page in two ways: with the robots.txt file, or with a meta tag on each page.

The meta tag format is:

<meta name="robots" content="noindex">
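
For context, the tag goes in the page's <head>. Here is a minimal sketch of a page excluded this way; "noindex,nofollow" is a common variant that also tells obeying robots not to follow the page's links:

```html
<html>
<head>
<title>Example page</title>
<!-- keep obeying robots from indexing this page or following its links -->
<meta name="robots" content="noindex,nofollow">
</head>
<body>...</body>
</html>
```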

ratman

10+ Year Member



 
Msg#: 140 posted 9:06 pm on Oct 3, 2002 (gmt 0)

If you want to use a robots.txt file, I'm sure Brett won't mind if you use the Webmasterworld one as an example.

[webmasterworld.com ]

Also, how do you know who the codes belong to? e.g. (209.249.67.146)

That bit of code is known as the IP address. You cannot really trace who the individual person is, but you can trace the company/ISP. A good site for tracing IPs is:

InetCheck [dataphone.se]

Hope this helps
ratman

hollyhats

10+ Year Member



 
Msg#: 140 posted 12:13 pm on Oct 4, 2002 (gmt 0)

Yesterday, I added the robots.txt file to my site as described in the previous messages. Still, today in my raw access log I see this:
193.7.255.244 robots.txt 404

This means error, right?

korkus2000

WebmasterWorld Senior Member korkus2000 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 12:27 pm on Oct 4, 2002 (gmt 0)

Hollyhats, you need to create a file called robots.txt. This is going to be a text file you can view in Notepad. Start with a completely blank file, then add:

User-agent: *
Disallow:

Do this in Notepad or another text editor, then upload it to your root directory. Make sure it's at www.yoursite.com/robots.txt

Make sure the name is exactly robots.txt, too.

404 is an error code meaning the document cannot be found.
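
A short reference sketch of the codes seen in the log snippets in this thread (a non-exhaustive mapping):

```python
# Common status codes seen next to robots.txt requests in raw access logs
status_meanings = {
    200: "OK - found and served",
    301: "Moved Permanently - redirects elsewhere",
    403: "Forbidden - server refused the request",
    404: "Not Found - no file at that URL",
}
```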

hollyhats

10+ Year Member



 
Msg#: 140 posted 12:32 pm on Oct 4, 2002 (gmt 0)

Hmmmmm, I did that. At least I thought I did. I guess I need to check it out again.

hollyhats

10+ Year Member



 
Msg#: 140 posted 1:04 pm on Oct 4, 2002 (gmt 0)

Does it make any difference if you leave the space out:

This:
user-agent: *

Or this:
user-agent:*

hollyhats

10+ Year Member



 
Msg#: 140 posted 1:09 pm on Oct 4, 2002 (gmt 0)

Or does it matter if it says domain/robots.txt.doc?
Meaning, does the .doc matter?

korkus2000

WebmasterWorld Senior Member korkus2000 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 1:10 pm on Oct 4, 2002 (gmt 0)

Yes, the .doc makes it a Microsoft Word document. It has to be a .txt. Create the document using Notepad, not Word.

hollyhats

10+ Year Member



 
Msg#: 140 posted 1:27 pm on Oct 4, 2002 (gmt 0)

Okay, that must be the problem. Thanks Korkus you are always so helpful. You for sale?

jdMorgan

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 3:32 pm on Oct 4, 2002 (gmt 0)

HollyHats,

When dealing with robots.txt and similar "special files" like .htaccess, the devil is in the details. If someone says "open it in Notepad" and you can't find Notepad, then it's time for another question - MS Word won't do. After you get the file created and uploaded, use Brett's robots.txt validator [searchengineworld.com] to test it. One typo (including a missing space) in that file can cause you major problems, including being dropped from all search engines!

eBay has the best prices on the korkus2000 model, last time I checked.

But I've heard that Service Pack 1 for KorkusXP makes it much more stable, and so it might be worth the upgrade if your system is relatively recent. :)

Jim

hollyhats

10+ Year Member



 
Msg#: 140 posted 3:44 pm on Oct 4, 2002 (gmt 0)

Jd-
Thank you for that link. But, HOLY COW! Every line says Invalid or error. You can tell that I am new to web design. I don't even know where to start to fix things. I guess I am darn lucky that Google indexed me in the first 2 months of submitting. Geez.
One more question, can you refer to question #13 and answer that for me?

korkus2000

WebmasterWorld Senior Member korkus2000 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 3:50 pm on Oct 4, 2002 (gmt 0)

It's still a .doc file just renamed as .txt. You need to open a text editor like Notepad or WordPad and copy the text in there. Then save it.

Notepad is in Accessories, from the Start menu.

<added>
can you refer to question #13 and answer that for me

use a space
</added>

jdMorgan

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 4:24 pm on Oct 4, 2002 (gmt 0)

One typo (including a missing space) in that file can cause you major problems, including being dropped from all search engines!

You may also want to use a different filename like "robotsx.txt", while working on this. After you get it working, rename it to robots.txt. This will prevent a real robot from coming by your site and reading your robots.txt while it is invalid. The robots.txt validator allows you to check any filename, and this is the reason why.
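
Jim's stage-then-rename workflow, as a sketch (the filenames are as given above; the validation step is the manual check against Brett's validator):

```python
import os

# Stage the file under a scratch name so robots never fetch a half-done copy
with open("robotsx.txt", "w", newline="\n") as f:
    f.write("User-agent: *\nDisallow:\n")

# ...upload robotsx.txt and run it through the validator here...

# Once it validates, give it the real name that robots request
os.replace("robotsx.txt", "robots.txt")
```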

Jim

hollyhats

10+ Year Member



 
Msg#: 140 posted 4:54 pm on Oct 4, 2002 (gmt 0)

Good point, Jim
Thanks

jacon4



 
Msg#: 140 posted 12:45 pm on Oct 5, 2002 (gmt 0)

Is a robots.txt file really required? Is there a downside to not having one? My programmer does not want to put one on my site because of security issues. I have DBs with a lot of sensitive info.

lazerzubb

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 140 posted 12:49 pm on Oct 5, 2002 (gmt 0)

>> is there a downside to not having one?

You can get pages or info spidered which you don't want spidered.

If you have unlimited bandwidth, then no problem! Otherwise you might want to add a robots.txt.

jdMorgan

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 140 posted 3:14 pm on Oct 5, 2002 (gmt 0)

jacon4,

The point being that you should at least consider a blank robots.txt or one which contains only:

User-agent: *
Disallow:

This will prevent filling up your server logs with a whole bunch of 404-Not Found errors as robots try to fetch robots.txt while they spider your site. And since it contains no filenames in Disallow directives, I doubt it poses a security risk.

You could also build the robots.txt without regard to security issues, and then use second-tier techniques to secure your site, such as using .htaccess or scripting to "trap" access attempts which should not have been made by any User-agent which obeys robots.txt. I use a mixture of these techniques, to good effect.
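
As a sketch of the second-tier .htaccess technique Jim mentions, one common Apache pattern denies requests whose User-Agent matches a robot you've caught ignoring robots.txt ("BadBot" is a placeholder name, not from this thread):

```
# Tag requests from a known-disobedient robot, then deny them
SetEnvIfNoCase User-Agent "BadBot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```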

Jim

hollyhats

10+ Year Member



 
Msg#: 140 posted 3:17 pm on Oct 5, 2002 (gmt 0)

Jd-
Good point. I added the robots.txt with User-agent: * and Disallow: to my site and it has reduced the number of 404s.


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved