| 5:21 am on Jul 2, 2012 (gmt 0)|
If Xenu behaves the same as w3's link checker, the IP belongs to the person checking the link* while the UA belongs to the link checker itself. If the referer slot is blank, it means that somebody, somewhere, has fed your home page URL directly into the Link Checker. That is as it should be; you don't want people running around saying "Click here to check the links on this page that has nothing to do with you".
Yes, anyone can check anyone's links. But w3 is a scrupulous follower of robots.txt and I hope Xenu is too. (Can't check, as I believe it's strictly for Windows and I don't have an emulator.)
You can easily test it yourself by running Xenu on any page of your own that links to another of your sites. See if it sends a referer. For comparison purposes I used w3-checklink on one of my own pages that links to another site I also have access to. The referer comes through loud and clear.
The /example link means that either someone typed it in directly (note that the two visits are a full minute apart) or clicked on someone else's link and had to be redirected to the correct name, or ::cough-cough:: that you've got the wrong link on your own home page. This too is easy for you to check ;)
D'oh! While continuing to play with w3's Link Checker, I see that one of its options is "Don't send the referer header". It's off by default, meaning that it will send a referer unless you tell it not to. See if Xenu has the same option.
* That is: it's my own IP if I run it from my local copy, and the University of Whatever-It-Is if I run it from the browser.
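The referer test suggested above can be made concrete with a throwaway server that records what each visitor sends. This is a minimal sketch using only Python's standard http.server, assuming you can expose a port that the link checker can reach; `REQUEST_LOG`, `LoggingHandler` and `start_server` are hypothetical names. Point the checker at a page that links to this server, then look at the last field of each logged tuple: `None` means no referer was sent.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory log: one (method, path, user_agent, referer) tuple per request.
REQUEST_LOG = []

class LoggingHandler(BaseHTTPRequestHandler):
    def _record(self):
        REQUEST_LOG.append((
            self.command,
            self.path,
            self.headers.get("User-Agent", ""),
            self.headers.get("Referer"),  # None when the client sends no referer
        ))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", "2")
        self.end_headers()

    def do_HEAD(self):
        # Link checkers often need only the status line and headers.
        self._record()

    def do_GET(self):
        self._record()
        self.wfile.write(b"ok")

    def log_message(self, fmt, *args):
        pass  # silence stderr; REQUEST_LOG is what we inspect

def start_server(port=0):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), LoggingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example: `server = start_server(8080)`, run the checker against your test page, then `print(REQUEST_LOG)` and `server.shutdown()`. Binding to 127.0.0.1 only works for a checker on the same machine; use your public interface otherwise.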
| 6:49 am on Jul 2, 2012 (gmt 0)|
When I am working on a site I usually untick the 'check external links' option in Xenu.
Only once all the internal linking and navigation is fully working do I do a final run with 'check external links' ticked.
| 6:58 am on Jul 2, 2012 (gmt 0)|
I block Xenu.
| 1:32 pm on Jul 2, 2012 (gmt 0)|
I do as well; however, the block is commented out when I use Xenu on my own sites.
The tool is very useful to a webmaster for verifying links on their own site(s).
The software is also highly configurable, letting the user choose how many directory levels deep to go into the site structure.
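To make the depth setting concrete, here is a sketch of a breadth-first link crawl that stops descending past a chosen number of directory levels. It is not Xenu's code, and how Xenu itself counts levels is an assumption; `fetch`, `directory_depth` and `crawl` are hypothetical names. The caller supplies `fetch(url)`, so the sketch does no network I/O of its own, and external links are only collected (never crawled) when `check_external` is on, mirroring the "check external links" checkbox mentioned above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def directory_depth(url):
    """Number of directories in the path: /a/b/page.html -> 2, / -> 0."""
    return len([seg for seg in urlparse(url).path.split("/")[1:-1] if seg])

def crawl(start_url, fetch, max_dir_depth=2, check_external=False):
    """Breadth-first crawl of one site.

    fetch(url) must return the page's HTML, or None if it could not be fetched.
    Returns (internal URLs reached, external URLs found).
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    external = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # resolve relative, drop #fragment
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript: and similar
            if parsed.netloc != site:
                if check_external:
                    external.add(link)
                continue
            if link not in seen and directory_depth(link) <= max_dir_depth:
                seen.add(link)
                queue.append(link)
    return seen, external
```

With a dict of page HTML you can pass `pages.get` as `fetch` to try it offline.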
| 7:23 pm on Jul 2, 2012 (gmt 0)|
Check the source IP. Today I blocked a server that was using Xenu.
Xenu is a very useful tool. I get a small number of hits from various versions of it but, like wilderness, I block it unless it's me, which it seldom is nowadays since I use Linux for most things.
| 7:35 pm on Jul 2, 2012 (gmt 0)|
If Xenu is crawling your entire site or a large portion of it then someone is probably analysing your site. You might want to block that.
If you get a Xenu hit on just one or several URLs, someone is merely checking that the outgoing links from their site are still working. Blocking this could mean the other party stops linking to you.
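That rule of thumb (many distinct URLs suggests analysis, one or a few suggests someone checking their outgoing links) can be applied to parsed access-log entries. A sketch only: the `(ip, user_agent, path)` tuple format, the `"xenu"` marker and the threshold of 20 distinct paths are assumptions, not values from this thread. The total hit count also answers "how many times did this IP come back today?" when you feed it one day's log.

```python
from collections import defaultdict

def classify_checker_hits(hits, ua_marker="xenu", crawl_threshold=20):
    """Group link-checker hits by IP.

    hits: iterable of (ip, user_agent, path) tuples parsed from an access log.
    Returns {ip: (label, distinct_paths, total_hits)} for requests whose
    User-Agent contains ua_marker (case-insensitive).
    """
    paths = defaultdict(set)
    counts = defaultdict(int)
    for ip, user_agent, path in hits:
        if ua_marker in user_agent.lower():
            paths[ip].add(path)
            counts[ip] += 1
    result = {}
    for ip, distinct in paths.items():
        # Hypothetical cut-off: many distinct URLs looks like a site crawl.
        label = "site analysis" if len(distinct) >= crawl_threshold else "outgoing-link check"
        result[ip] = (label, len(distinct), counts[ip])
    return result
```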
| 8:02 pm on Jul 2, 2012 (gmt 0)|
If someone is linking to you, I would assume they have a website. If they don't give you a referer, it seems they are hiding, and if they are hiding, they are up to no good. You all use Xenu as a tool for your own sites, but why would someone want to check MY site unless they are linking to it? I don't mind having people link to my site, but I'd like to know who they are. I don't want unscrupulous snakes linking to my site for any reason.
| 8:12 pm on Jul 2, 2012 (gmt 0)|
Xenu doesn't send a referrer. As well as checking all of the internal links within a site, it can also check all the outgoing links to other sites.
I use it from time to time to check all of the outgoing links from sites I work on in a matter of minutes. Links that return 301 or 302, I update or delete. Links that return 404 or 410, I delete. Links that return a 5xx code, I also delete.
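Those clean-up rules fit in a small lookup. A sketch; the 301/302, 404/410 and 5xx branches follow the post, while treating 2xx as "keep" and everything else as "check manually" is an assumption the post does not state.

```python
def link_action(status):
    """Map an HTTP status code to the clean-up rule described in the post."""
    if status in (301, 302):
        return "update or delete"
    if status in (404, 410):
        return "delete"
    if 500 <= status <= 599:
        return "delete"
    if 200 <= status <= 299:
        return "keep"            # assumption: working links stay
    return "check manually"      # assumption: e.g. 403 from a site blocking the checker
```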
| 12:28 am on Jul 3, 2012 (gmt 0)|
|If they don't give you a referer, it seems they are hiding. |
You could say the same thing about any authorized robot like Google. It learned about page #1 on Site A from a link on page #2 of Site B, but when it goes to check out page #1, it doesn't give page #2 as referer.
w3-checklink sends a referer by default. Xenu apparently doesn't.
Honestly, I haven't seen any reason to get worried. We're not looking at an IP from some notorious Ukrainian server farm running through a list of phony UAs. They're not checking out your site for some nefarious purpose ;) They're simply verifying that it's still there so they can continue linking to it. That's why all they need is a HEAD request.
If the link checker can't get in, the human in charge might check manually. But this is annoying and time-consuming, so more likely they will just delete the link.
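A HEAD request returns only the status line and headers, which is all a link checker needs to decide whether a link still works. Here is a sketch with Python's http.client; the default User-Agent string is made up, no Referer header is sent (matching what the thread reports for Xenu), and redirects are reported rather than followed, so a 301 can be spotted and the link updated.

```python
import http.client
from urllib.parse import urlsplit

def head_status(url, timeout=10, user_agent="ExampleLinkCheck/0.1"):
    """Send one HEAD request to url and return (status code, Location header or None).

    Redirects are not followed, so a 301 or 302 is reported as-is.
    No Referer header is sent.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A 403 here tells you only that the checker was refused, which is exactly the case where a human would have to look manually.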
| 4:44 am on Jul 3, 2012 (gmt 0)|
In other words, some system administrator is going over their site, finding links to my site and checking them to see if they are still viable. Nothing wrong with that. Most of the visitors are hitting the home page. Sounds OK. It was the missing referer that worried me, but if that is a default setting with Xenu, it makes sense. Thank you all for helping me understand.
| 6:58 am on Jul 3, 2012 (gmt 0)|
I don't like things that start with the letter "X". Never have, never will :)
| 7:21 pm on Jul 3, 2012 (gmt 0)|
> I don't like things that start with the letter "X"
You should read "why" on this one - it's on the web site. Give Xenu cautious approval. :)
| 7:53 pm on Jul 3, 2012 (gmt 0)|
One more question.... How often would one use Xenu to check for broken links in a day? I have the same IP coming in 6 times today, just hitting the home page. This is the second day for this. Do you have to check it that often?
| 8:39 pm on Jul 3, 2012 (gmt 0)|
I use the software about once a year to clean up and verify links.
During that annual process (my own internal links number in the thousands), I might run the software 2-3 times over a few days to verify that the links have been corrected.
I would NEVER run the software routinely on external websites. Even so, I've been using it for more than a decade.
A new user would be prone to simple trial-and-error configuration mistakes that require re-running the software.
Stop dinking around and simply add Xenu to your UA deny list.
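On Apache this is usually a User-Agent match in .htaccess. The same idea expressed as Python WSGI middleware, with an on/off switch so you can let your own Xenu run through (the "commented out" approach mentioned earlier), might look like the sketch below. The class name, the `"xenu"` substring and the WSGI setting are all assumptions for illustration; the UA string in the usage note is only an example.

```python
# Hypothetical deny list: substrings matched case-insensitively against User-Agent.
DENIED_UA_SUBSTRINGS = ("xenu",)

class UserAgentBlocker:
    """WSGI middleware that answers 403 to denied user agents while enabled."""

    def __init__(self, app, denied=DENIED_UA_SUBSTRINGS, enabled=True):
        self.app = app
        self.denied = [s.lower() for s in denied]
        self.enabled = enabled  # set to False while you run the checker yourself

    def is_denied(self, user_agent):
        ua = user_agent.lower()
        return self.enabled and any(s in ua for s in self.denied)

    def __call__(self, environ, start_response):
        if self.is_denied(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return self.app(environ, start_response)
```

Wrap your application once (`application = UserAgentBlocker(application)`) and flip `enabled` from your control panel. A request claiming "Xenu Link Sleuth/1.3.8" then gets a 403 while the block is on.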
| 9:46 pm on Jul 3, 2012 (gmt 0)|
I'd check external outgoing links on a site several times per year and run the test several times over a few days.
| 10:02 pm on Jul 3, 2012 (gmt 0)|
I'm gonna give the dude a few more days. By the way, for those of you who are in the USA, last year a friend got too close to the fireworks on the 4th and lost an eye, so please be careful. Have a great 4th.
| 6:44 am on Jul 4, 2012 (gmt 0)|
|You should read "why" on this one - it's on the web site. Give Xenu cautious approval. :) |
I was joking. I've had Xenu installed on a couple machines, but not currently. Used it a couple times; worked as advertised, but don't need those types of tools anymore. My outgoing link model self-polices (my bot) and my internal linking is built from the server hierarchy so there's really nothing to check either way.
As for giving Xenu a cautious approval, I think not. Anyone who removes my link (thus getting his own removed) solely on the report of a mindless tool, without a look-see, doesn't deserve a link exchange IMO. I don't care about the number of links, only that they come from on-topic information sources. I refuse most reciprocal requests.
Oh, and years ago I lived directly across from the Scientology Center in Hollywood so I'm familiar with the tale :)
| 12:42 pm on Jul 4, 2012 (gmt 0)|
I bet you the last part of that IP is 17.
Two HEAD requests on July 1st; they ate a nice 403 on one of my sites.
What are the chances that our sites have links on the same site that need to be checked? Maybe someone got an ODP dump file and is checking dead links so they can build the "next big thing" and plaster it with ads?
I use Xenu myself for checking URL rewrites when developing a site or making any changes to its URL rewrite rules.
| 7:30 pm on Jul 4, 2012 (gmt 0)|
Overlooked question for those who use it: Does Xenu follow robots.txt? Do this thread's mystery visitors ask for it every time? No UA is immune from spoofing-- but most spoofers aren't bright enough to mimic the exact behavior of the spoofee.
Even if you've got a huge site with multiple links to the same external page, any link checker worth its salt will say "Been there. Done that" rather than check the exact same link all over again.
But as long as it's only the home page, further investigation is probably more trouble than it's worth.
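The "Been there. Done that" behaviour amounts to caching one result per distinct URL. A sketch; the normalization (lower-cased scheme and host, empty path treated as "/", fragment dropped) is a simplifying assumption, and `check` stands for any single-URL checker you supply.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Cache key for a link: a deliberately simple, assumed notion of 'the same URL'."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

def check_unique(links, check):
    """Call check(url) once per distinct normalized URL; return {url: result}."""
    results = {}
    for link in links:
        key = normalize(link)
        if key not in results:
            results[key] = check(key)
    return results
```

So a page that links to the same external home page five times should cost that site one request, not five.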
| 7:57 pm on Jul 4, 2012 (gmt 0)|
From what I remember, it doesn't follow robots.txt, nor should it in my opinion.
| 9:35 pm on Jul 4, 2012 (gmt 0)|
No, it doesn't follow robots.txt - linkcheckers don't. They are not bots.
| 5:23 am on Jul 5, 2012 (gmt 0)|
|very useful for verifying links |
And very useful for seeing what spies can see and access. For example, are the pages in your protected/membership areas really protected?
| 6:25 am on Jul 5, 2012 (gmt 0)|
|No, it doesn't follow robots.txt - linkcheckers don't. They are not bots. |
Huh. w3's does. In fact I had to install it locally
:: insert agonizing story involving long session with Terminal ::
to check a two-volume ebook because I'd locked myself out and it got confused and refused to recognize any further changes along the lines of "Come back! All is forgiven!"
It's better now. My robots.txt lets the link checker go everywhere, and it does. Other sites aren't so welcoming. One even slaps it with a 403, which strikes me as overkill, especially on a site that doesn't appear to have a robots.txt at all. (You're supposed to be able to see it on demand, aren't you?)
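Whether a given checker is allowed somewhere can be tested against a robots.txt with Python's urllib.robotparser. A sketch; the robots.txt content is a made-up example that lets the link checker go everywhere while keeping other agents out of /members/, and "W3C-checklink" is assumed here to be the token w3's checker identifies itself with.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    """True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt: an empty Disallow means "allow everything" for that agent.
ROBOTS_TXT = """\
User-agent: W3C-checklink
Disallow:

User-agent: *
Disallow: /members/
"""
```

When RobotFileParser fetches the file itself via `read()`, it treats a 401 or 403 on robots.txt as "disallow all" and any other 4xx, including 404, as "allow all", so a checker built on it would wander freely on a site with no robots.txt.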
| 6:30 am on Jul 5, 2012 (gmt 0)|
Some sites will serve a different robots.txt depending on the requesting user-agent. You get to see 'your' permissions, but not everyone else's.
| 6:45 pm on Jul 5, 2012 (gmt 0)|
|One even slaps it with a 403, which strikes me as overkill-- especially in a site that doesn't appear to have a robots.txt at all. (You're supposed to be able to see it on demand, aren't you?) |
I make a special exception for robots.txt so it shows you're banned, and I serve up a 403 Forbidden for all other pages.
I show only the robots.txt permissions for your own user agent, to stop people from switching their user agent to something I allow and testing my validation.
Just because it says Xenu doesn't mean it's actually Xenu. I have Xenu as an option in my browser's User Agent Switcher add-on for testing. A lot of people allow Xenu because they use it themselves, so it's a good example of a user agent that might slip through someone's bot-blocking defenses.
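Serving each agent only its own robots.txt entry, with a blanket disallow for anything not on the whitelist, might look like this. A sketch; the whitelist tokens and rules are hypothetical, and matching on the User-Agent string alone is exactly what a spoofed "Xenu" would exploit, so this is only as strong as whatever validation sits behind it.

```python
# Hypothetical whitelist: UA token -> the rules that agent alone gets to see.
WHITELIST = {
    "googlebot": "Disallow: /cgi-bin/",
    "bingbot": "Disallow: /cgi-bin/\nDisallow: /search",
}

def robots_for(user_agent):
    """Return robots.txt text listing only the requesting agent's own permissions.

    Agents not on the whitelist see a blanket disallow; under the approach
    described above, every other page would then answer them with 403 Forbidden.
    """
    ua = user_agent.lower()
    for token, rules in WHITELIST.items():
        if token in ua:
            return f"User-agent: {token}\n{rules}\n"
    return "User-agent: *\nDisallow: /\n"
```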
| 8:29 pm on Jul 5, 2012 (gmt 0)|
|I do show only the robots.txt permissions for just your user agent to avoid people from attempting to change the user agent to something I allow and test my validation. |
Presumably you've got a fallback for UAs that your server has not previously met-- even if it's a flat
You can hardly fault a robot for disobeying robots.txt if it can't find one. Even Google will happily interpret a 404 as permission to go everywhere. They say so explicitly.
Besides, that's Big Boy stuff. I honestly can't imagine that more than 1% of websites go to the length of checking UAs and serving up different "robots.txt" files depending on who's asking. Seriously, what's the point? If you're prepared to camouflage your UA to go places you're not supposed to, why bother to ask permission in the first place?
| 8:37 pm on Jul 5, 2012 (gmt 0)|
> can't imagine that more than 1% of websites
You only have to browse through the regular complaints of site scraping in the local Google forum to see that. And those are supposedly competent webmasters.
It would be VERY helpful if default setups for new web hosting included a decently composed .htaccess. Plus, of course, a list of server farm IPs and an optional list of IPs for (e.g.) UA, CN, VN etc. But dream on. :(
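Checking a visitor against such a list of IP ranges is straightforward with Python's ipaddress module. A sketch; the two networks below are RFC 5737 documentation ranges used as placeholders, not real server farms or country blocks.

```python
import ipaddress

# Hypothetical block list. Placeholder documentation ranges; a real list would
# hold the server-farm or country ranges you choose to deny.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

def is_blocked(ip):
    """True if ip (an IPv4 or IPv6 string) falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network; the test simply returns False.
    return any(addr in net for net in BLOCKED_NETWORKS)
```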
| 8:48 pm on Jul 5, 2012 (gmt 0)|
|How often would one use Xenu to check for broken links in a day? I have the same IP coming in 6 times today, just hitting the home page. This is the second day for this. Do you have to check it that often? |
Actually that could have been me if it had been a month ago. Last month I finally switched an older website to PHP 5.3 and had to make major modifications. I had Xenu run at least a dozen times in two days and I don't know if I had the external link checking option on or off.
So I would assume someone is just fiddling with his website and forgot to switch external link checking off.
Xenu is not a vulnerability scanner; it does little more than check whether a website is still there. I'd just ignore it as long as it doesn't cause too much traffic or show any signs of acting like a vulnerability scanner.
| 12:55 am on Jul 6, 2012 (gmt 0)|
|Presumably you've got a fallback for UAs that your server has not previously met-- even if it's a flat |
Since I whitelist user agents, that's the default.
|If you're prepared to camouflage your UA to go places you're not supposed to, why bother to ask permission in the first place? |
Because speed traps, volume traps and other easy ways of detecting stealth UAs don't typically apply to bots that have permission. That's why using one of my allowed user agents to gain access would be a breakthrough for a bad bot: validated access disables all other tests, and you get a free pass.
When I use Xenu I temporarily enable it, run it, then disable it again.
Not a big deal when it's just a checkbox to turn it on/off from a control panel :)
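The speed trap mentioned above, with whitelisted agents skipping it, can be sketched as a sliding-window counter per IP. The limits, whitelist token and class name are assumptions. Note that the whitelist here matches the User-Agent string only, which is precisely the hole the post describes; a real setup would validate the claim (for example by IP or reverse DNS) before granting the bypass, and that part is not shown.

```python
import time
from collections import defaultdict, deque

class SpeedTrap:
    """Flag an IP that makes more than max_hits requests within `window` seconds.

    Agents whose UA contains a whitelist token skip the test entirely.
    """
    def __init__(self, max_hits=10, window=5.0, whitelist=("googlebot",), clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.whitelist = tuple(t.lower() for t in whitelist)
        self.clock = clock               # injectable so the sketch can be tested
        self.hits = defaultdict(deque)   # ip -> timestamps inside the window

    def is_trapped(self, ip, user_agent):
        if any(t in user_agent.lower() for t in self.whitelist):
            return False  # "validated" access: no further tests
        now = self.clock()
        recent = self.hits[ip]
        recent.append(now)
        while recent and now - recent[0] > self.window:
            recent.popleft()  # forget hits older than the window
        return len(recent) > self.max_hits
```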
| This 39-message thread spans 2 pages. |