
Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt
Kenny B
msg:1527188
6:13 am on Apr 14, 2000 (gmt 0)

I am a little confused about how to address spiders in my robots.txt file. Some are straightforward, like ArchitextSpider, but some are not. The Spider Spotting Chart on this site lists the AltaVista spider as:
Scooter/2.0 G.R.A.B. X2.0
Scooter/1.0
scooter@pa.dec.com

Can I address this spider as Scooter only? Do I need the version (2.0, 1.0)? Should I make a robots.txt file for each of these?

Also, I read advice from a professional on the Web Position Gold forum saying to make a doorway page for each keyword, but I don't see this done on any page when I look at the source code. In fact, pages have come up in searches for my keywords that don't even list them as keywords. Would doorway pages be beneficial if I have only half a dozen or so keywords?

Any advice would be appreciated. I have been researching for months and I am ready to dig in and start submitting.

Thanks.

 

Brett_Tabke
msg:1527189
4:35 am on Apr 22, 2000 (gmt 0)

Hi Kenny, sorry I missed this post the first time around. Yes, you should be able to use just "Scooter" alone, without the version number. It isn't perfect, but it does work.
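A minimal record along those lines might look something like this (the /private/ path is just a placeholder for whatever you actually want to keep Scooter out of):

User-agent: Scooter
Disallow: /private/

User-agent: *
Disallow:

Most crawlers treat the User-agent line as a simple, case-insensitive match against their own name, which is why the version numbers can be left off.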

I wouldn't do traditional doorway pages with 'spam' keywords. I've never cared for them, and they are easier to spot these days than ever before.

Phantamage
msg:1527190
1:38 pm on Apr 22, 2000 (gmt 0)

Hey ya,

Since this thread is already started, may I please follow it up here with a similar question? I'm a bit confused about robots.txt.

I've read just about everything I can find on it. I understand how to make the file. I know where it must reside on the server. I understand about spider agent names and spider IP addresses. However, what I can't find out is how I'm supposed to monitor the robots.txt file once it's up there.

I'm already running WebTrends and HitBox simultaneously, so I do get information about spiders visiting my site. But it sure would be nice to have one place (i.e. robots.txt) where I could get a read on spider activity only.

If anyone has any advice they can give, it would be appreciated very much. I know I'm close to having this down pat, but not quite there yet.

Thank you.

p.s. Wonderful site.

Brett_Tabke
msg:1527191
9:20 pm on Apr 22, 2000 (gmt 0)

This is rather bizarre, but I've been sitting here trying to get Apache to execute an SSI on a TXT file (very tricky, but doable). Once that is done, I can run a logger from the robots.txt and monitor pulls.

The only other way to monitor robots.txt is via your server logs. If you don't have a robots.txt then look in the site error log for "file not found".
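If you just want to pull those requests out of an existing access log, something along these lines does it. The log path and the combined log format are assumptions, so adjust them for your own setup:

#!/usr/bin/env python
# Sketch: print every request for /robots.txt found in a combined-format
# Apache access log, with the timestamp, client IP and user-agent.
# The log path is an assumption - point it at your own access log.
import re

LOG_PATH = "/var/log/apache/access_log"

pattern = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) /robots\.txt[^"]*" '
    r'\d+ \S+ "[^"]*" "([^"]*)"'
)

with open(LOG_PATH) as log:
    for line in log:
        match = pattern.match(line)
        if match:
            ip, when, agent = match.groups()
            print("%s  %s  %s" % (when, ip, agent))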

Hitbox cannot track a spider. Webtrends can, of course, since it uses your server log files, but Hitbox is a graphic counter only (it records about 75% of your hits).

fantomaster
msg:1527192
8:44 pm on Apr 23, 2000 (gmt 0)

Would you care to publish that Apache trick to execute SSIs from txt-files here, please? This sounds most interesting.

idiotgirl
msg:1527193
5:04 am on Sep 25, 2001 (gmt 0)

I found this older post while I was looking for... oh heck, I forgot what I was looking for.

Anyway, Brett, I'm wondering how you configured Apache to use SSI on a text file. Did you do it with an AddType directive via .htaccess, or was it... more involved, tiresome, lengthy?

I, too, would like to view all the hits to robots.txt separately from my other logfiles, and to be able to ban bots using wildcards. It seems like I have to get too specific in my plain robots.txt file, and I keep getting hit by subtle variations of the same pesky bots.

Rather than make my robots.txt look like Webster's Dictionary, I'd just like some little exec file that logged them, welcomed them, or banned them. It would also be slick to reuse the same little script for multiple clients instead of configuring and reconfiguring robots.txt for each one. Update one script - and let it fly.
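Something like this little sketch is what I have in mind - assuming the server can run CGI and that /robots.txt gets aliased or rewritten to the script (both paths below are just placeholders):

#!/usr/bin/env python
# Sketch: serve robots.txt from a script, logging every pull first.
# RULES_FILE holds the actual robots.txt rules for this client, so only
# that file changes from site to site. Both paths are placeholders.
import os
import time

RULES_FILE = "/home/site/robots.rules"
LOG_FILE = "/home/site/robots-hits.log"

ip = os.environ.get("REMOTE_ADDR", "-")
agent = os.environ.get("HTTP_USER_AGENT", "-")

# One log line per pull: time, IP, user-agent.
with open(LOG_FILE, "a") as log:
    log.write("%s %s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), ip, agent))

# Crawlers expect plain text back.
print("Content-Type: text/plain")
print("")
try:
    with open(RULES_FILE) as rules:
        print(rules.read())
except IOError:
    # Fall back to allow-all if the rules file is missing.
    print("User-agent: *")
    print("Disallow:")

Banning a bot could then happen in the script (check the user-agent and print a Disallow record for it), though anything rude enough to ignore robots.txt would have to be blocked at the server instead.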

Thanks,

Idiotgirl

ciml
msg:1527194
1:02 pm on Sep 25, 2001 (gmt 0)

In Apache you only need AddType if you want to define .txt files to be something else (such as HTML). IE is broken in all the versions I've tried, though, so don't expect it to treat the HTTP Content-Type correctly. :(

If you have access to your HTTP daemon configuration files, then use "AddHandler server-parsed .txt" in (usually, but not always) /etc/httpd/conf/httpd.conf.

If you want to use an .htaccess file (or whatever it's called in the AccessFileName directive) then you can add "AllowOverride FileInfo" (or "AllowOverride All") to httpd.conf (or ask your server admin).

Performance can suffer considerably if overrides are enabled, so don't be surprised if your admin won't let you use .htaccess.
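To tie that back to Brett's robots.txt logger idea, a sketch of how the pieces might fit together (mod_include has to be available, and the cgi-bin path and logger script are just placeholders, not anything Brett has posted):

# In httpd.conf, or in .htaccess where the overrides above are allowed:
AddHandler server-parsed .txt
Options +Includes

# robots.txt itself can then start with an SSI call to a logger.
# The logger should print nothing (or only "#" comment lines) so the
# served file still parses cleanly as robots.txt:
<!--#include virtual="/cgi-bin/log-robots.cgi" -->
User-agent: *
Disallow:

Whatever the include prints is what crawlers actually see in place of that line, so an empty response keeps the served file identical to a plain robots.txt.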

Warning: www dot apache dot org is likely to be far more accurate and reliable than me.
