Forum Moderators: open
got this thru google - [freeware.sgi.com...]
"GNU Wget (wget) is a freely available network utility to retrieve files from the World Wide Web, using HTTP (Hyper Text Transfer Protocol) and FTP (File Transfer Protocol), the two most widely used Internet protocols."
Onya
Woz
You can turn robots.txt handling on or off (this is typical; most personal bots that even bother with robots.txt seem to have a switch that lets you ignore the file).
The only thing wget DOESN'T have, as far as I can tell, is an optional timer to avoid hammering a site with rapid successive GETs. There's a timeout for a hung connection, but nothing for being polite on someone's site.
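For what it's worth, the wget builds I've seen do include a politeness delay: the -w/--wait command-line option (or a `wait` line in .wgetrc) pauses between successive retrievals, and robots.txt handling can be toggled the same way. A hypothetical ~/.wgetrc sketch (the 2-second value is just an example):

```
# ~/.wgetrc -- example settings (values are illustrative)
wait = 2        # pause 2 seconds between successive retrievals
robots = on     # honor robots.txt (set to off only on sites you own)
```

The command-line equivalent would be wget --wait=2 -e robots=on.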
You can download wget for free from various places on the Internet. The Windows version I have runs in DOS32 mode and lacks some of the wildcard versatility of the Linux version.
As far as I can tell, they did not fetch robots.txt; no idea if they honor the meta tag.
Since it is a distributed system, blocking one IP also won't do much.
All I can hope is that they improve their code to at least honor robots.txt, and that they change the wget/1.6 UA to grub-wget/1.6 or something.
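The meta tag referred to above is presumably the standard robots exclusion meta tag, which goes in the page's <head> and asks well-behaved crawlers not to index the page or follow its links:

```
<!-- Robots exclusion meta tag (noindex,nofollow is the most restrictive setting) -->
<meta name="robots" content="noindex,nofollow">
```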
How can I enable robots such as wget to crawl my HTML pages?
My html header is:
<html><head><title>My Store Here</title>
<META NAME="description" CONTENT="A lot of content here">
<META content="text/html; charset=windows-1252" http-equiv=Content-Type></head>
And the page is located at the second level of our web site, i.e. it is linked via an <a href> from my homepage.
But when I try wget with -r, it does find this page but does not search any deeper from it.
What is wrong with my html header?
Should I add something else?
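Probably nothing; the meta tags shown above don't affect wget at all. With -r, wget only follows <a href> links in the page body, and by default it will not leave the starting host. A hedged example of a deeper recursive fetch (the URL is a placeholder, and the depth and wait values are just illustrations):

```shell
# Recursive retrieval starting from the homepage (URL is a placeholder).
# -r    recurse through <a href> links
# -l 2  limit recursion depth to two levels
# -np   never ascend to the parent directory
# -w 1  wait one second between requests, to be polite
wget -r -l 2 -np -w 1 http://www.example.com/
```

If the links on that second-level page are absolute URLs pointing at a different hostname, recursion stops there unless you add -H (span hosts), ideally combined with -D to restrict which domains are followed.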