Can I address this spider as Scooter only? Do I need the version (2.0, 1.0)? Should I make a robots.txt file for each of these?
Also, I read some advice from a professional at the Web Position Gold forum that said to make a doorway page for each keyword, but I don't observe this in any page when looking at the source code. In fact, pages have come up on a search for my keywords that don't even list them as keywords. Would this be beneficial if I have only a half dozen or so keywords?
Any advice would be appreciated. I have been researching for months and I am ready to dig in and start submitting.
Thanks.
As this thread was already started may I please follow it up here with a similar question? I'm a bit confused about robots.txt.
I've read just about everything I can find on it. I understand how to make the file. I know where it must reside on the server. I understand about spider agent names and spider IP addresses. However, what I can't find out is how I'm supposed to monitor the robots.txt file once it's up there.
I'm already running WebTrends and HitBox simultaneously, so I do get information about spiders visiting my site. But it sure would be nice to be able to access one place (i.e. robots.txt) where I could get a read on spider activity only.
If anyone has any advice they can give, it would be appreciated very much. I know I'm close to having this down pat, but not quite there yet.
Thank you.
p.s. Wonderful site.
The only other way to monitor robots.txt is via your server logs. If you don't have a robots.txt then look in the site error log for "file not found".
HitBox cannot track a spider. WebTrends can, of course, since it reads your server log files, but HitBox is a graphic counter only (it records about 75% of your hits).
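For anyone who wants the "robots.txt hits only" view straight from the server log, here is a rough Python sketch. It assumes the log is in Apache's common or combined format. The sample lines, addresses, and the function name robots_hits are made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re

# Common/Combined Log Format:
# host ident user [time] "request" status bytes ["referer" "user-agent"]
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def robots_hits(lines):
    """Yield (host, time, status, agent) for each request to /robots.txt."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        # Ignore any query string when comparing the path.
        if m.group("path").split("?", 1)[0] == "/robots.txt":
            yield (m.group("host"), m.group("time"),
                   m.group("status"), m.group("agent") or "-")

# Invented sample lines (documentation-range addresses), not real log data.
sample = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 404 209 "-" "Scooter/2.0"',
    '192.0.2.77 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]

hits = list(robots_hits(sample))
for host, when, status, agent in hits:
    print(host, when, status, agent)
```

A 404 status in that output is the "file not found" case mentioned above: the spider asked for robots.txt and there was none. To run it against a real log, pass an open file object instead of the sample list.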
Anyway, Brett, I'm wondering how you configured Apache to use SSI on a text file. Did you add an AddType via .htaccess, or was it... more involved, tiresome, lengthy?
I, too, would like to view all the hits to robots.txt separately from my other logfiles, and to ban bots using wildcards. It seems I have to get too specific in my plain robots.txt file, and I keep getting hit with subtle variations of the same pesky bots.
Rather than make my robots.txt look like Webster's Dictionary, I'd just like some little exec file that logged 'em, welcomed them, or banned them. It'd also be slick to re-use the same little script for multiple clients as a separate deal instead of configuring and reconfiguring robots.txt. Update one script - and let it fly.
Thanks,
Idiotgirl
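The "log 'em, welcome them, or ban them" idea could look something like the Python sketch below. This is only an illustration of wildcard matching on the User-Agent string, not the SSI setup asked about above. The rule table, the "badbot" pattern, and the function names are all hypothetical, and a spider is still free to ignore whatever robots.txt body it is served.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: the first matching pattern wins.
# Patterns are matched against the lower-cased User-Agent string.
RULES = [
    ("scooter*", "welcome"),   # any Scooter version or variant
    ("*badbot*", "ban"),       # made-up name standing in for a pesky bot
    ("*", "log"),              # everything else is just logged
]

def classify(user_agent, rules=RULES):
    """Return the action for a User-Agent string using shell-style wildcards."""
    ua = user_agent.lower()
    for pattern, action in rules:
        if fnmatchcase(ua, pattern):
            return action
    return "log"

def robots_body(action):
    """Return the robots.txt text to serve for a given action."""
    if action == "ban":
        return "User-agent: *\nDisallow: /\n"   # ask the bot to stay out
    return "User-agent: *\nDisallow:\n"         # nothing disallowed

for agent in ("Scooter/2.0", "BadBot v1.1", "Mozilla/4.0"):
    print(agent, "->", classify(agent))
```

Because the rules live in one list, the same script could be reused across sites by swapping the list rather than rewriting each robots.txt by hand.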
If you have access to your HTTP daemon configuration files, then use "AddHandler server-parsed .txt" in httpd.conf (usually, but not always, found at /etc/httpd/conf/httpd.conf).
If you want to put that line in an .htaccess file instead (or whatever name the AccessFileName directive sets), it only takes effect if httpd.conf has "AllowOverride FileInfo" (or "AllowOverride All") for that directory, so add that or ask your server admin.
Performance can suffer considerably if overrides are enabled, so don't be surprised if your admin won't let you use .htaccess.
Warning: www dot apache dot org is likely to be far more accurate and reliable than me.