wget question

I don't want it to follow our redirects to other sites

sublime1

6:00 pm on Sep 20, 2004 (gmt 0)

Hi all --

I am using wget to check for bad links on my site. We have a lot of links on our site that our servers redirect to other sites (e.g. a link to an affiliate that we want to track in our logs). They are dynamic links that look static; that is, they end with a special suffix such as *redir.html.

I am having a hard time preventing wget from following these links. Here's an example:

wget --recursive --delete-after --no-directories --no-host-directories --reject="*redir.html" [mysite.com...]

There are two problems. First, wget doesn't seem to be honoring the --reject= option (which should prevent it from following all links with this pattern, or so I believe). Second, regardless of whether host spanning is on or off, or what I have in --exclude-domains, it still follows this redirect.

I know this is not the greatest place to post this, but I appreciate any help I can get :-)

Thanks --

Tom

quesera

4:53 am on Sep 22, 2004 (gmt 0)

--reject doesn't get applied to .html files.

The argument is that --recursive doesn't make any sense if you are excluding some .html files. I don't entirely agree, but I see the logic.

I didn't know that wget would follow a redirect recursively into a new site (without specifying --span-hosts). I'd call that a bug.

I've never seen that problem... Looking at a bunch of wget scripts, I notice that I always use --no-parent. It doesn't make much sense, but perhaps --no-parent causes wget to skip the link because it's not below the starting point?

It's worth a try.
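
For reference, here is a rough sketch of what the original command might look like with --no-parent added. It uses only the options already mentioned in this thread. The start URL is a placeholder (the real one is elided in the original post), and link_check_cmd is just a hypothetical helper name for this example. Whether --no-parent actually stops wget from following the redirect is untested, so the script only prints the command for review rather than running it.

```shell
#!/bin/sh
# Sketch only: the URL below is a placeholder, not the site from the thread.
START_URL="${START_URL:-http://www.example.com/}"

# Print the wget command for a link check that stays below the start URL.
# link_check_cmd is a hypothetical helper name used only for this example.
link_check_cmd() {
    printf "wget --recursive --no-parent --delete-after --no-directories --no-host-directories --reject='*redir.html' %s\n" "$1"
}

# Dry run: show the command so it can be reviewed before running it for real.
link_check_cmd "$START_URL"
```

Copy the printed line into a terminal to run it, and watch the output for requests to foreign hosts to see whether the redirect is still being followed.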

sublime1

9:16 pm on Sep 24, 2004 (gmt 0)

Thanks -- I'll give that a try and let you know how it works. In subsequent research, I found I was using version 1.8 and there was a version 1.9 available that was supposed to fix a problem with redirects going to foreign sites. After downloading, configuring, compiling and so on, it still did the same thing. Maybe I'll dust off my C programming and fix it myself :-)

nalin

11:03 pm on Sep 24, 2004 (gmt 0)

Linklint might be more suitable for your purposes...
