This doesn't answer your question, but I've been in the same boat for a while. I had a blog-like site that linked to everything through a 302 redirect so that I could count the clicks on the links. Then I read here about "page hijacking" and learned that Google might really frown upon 302 redirect links because of all the hijackers. At the beginning of February, Google decided for some reason to kick me where it counts and now sends me only 10% of the visitors I was getting before. I've given up counting the click-throughs and just use direct outbound links instead. Waiting to see if my Google traffic picks up at all. So far, nothing.
Will the new rel="nofollow" attribute do the trick?
(Do a google search for rel="nofollow" to get to their blog website with the info, it won't let me post a link directly to it)
Unless you're running a DMOZ clone or a scraper kind of site, a more important consideration is to check links for the benefit of your users.
Use an automated link checker. InfoLink is now free; that's a great piece of software to start with.
At anybrowser.com, there is a tool called 'Link Check' that will check the links on a page for you. I think this is better than secretly redirecting, because I believe secret redirection is something your site could potentially be penalized for (more so than for just some broken links, etc.).
Having quality, direct, outbound links is a good thing!
If you do not agree, just put the link directory in a certain folder and make that folder available to robots.
Let's say the script was removed. Is 1) really worse?
1) Having 100,000+ outbound links that pass through a redirection php script (which also makes sure they aren't 404s..)
2) Having about 25,000 (at first until it is cleaned up) links being 404 errors and about 75,000 code 200 links..
25% is a high guess but let's just use that as an example for this..
If you are going to continue to use a method to hide the links, then, yes, I think #1 is worse. While you may be using the program for good, others are not. You could get lumped in with the "bad hats" or "black hats", whatever they are currently called, as opposed to "white hats". I, personally, would not take this risk. You could be dropped completely from Google (and other search engines). I have seen this happen.
Direct links are the best way to go. But 25% bad links (404 errors) is a lot! You could easily get this down to a much lower percentage by using a free program, like the one at anybrowser.com, to cut your numbers in half or to remove the bad links completely. You aren't going to be able to continue to run the link directory in future years unless you take the time to do this.
Hire someone in another country (with a lower minimum wage) to do the work for you. For 100,000 links, it would take your employee about 20 days at 8 hours a day. It would be worth the money or time. My timetable is based on how long it takes me, or my own foreign employee, to check my links, a little over 5,000. Either one of us can check them all in one 8-hour day.
Repeat the process about once per year.
Please. Hand checking, or even a website where you enter one URL of your site at a time, is not the way to go. Use a software package.
But you do have to combine this with manual spot checks, because many sites do not allow automated link checking or else do not generate a 404 error when the content is gone.
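To make the "software plus manual spot checks" idea concrete, here is a minimal Python sketch of how a checker's results could be sorted into piles. The function name `classify` and the three labels are my own invention, not something any of the tools mentioned here use, and the choice of which status codes go to the manual pile is an assumption you would want to tune.

```python
from urllib.parse import urlsplit


def classify(original_url, status, final_url):
    """Sort one link-check result into 'ok', 'broken', or 'manual'.

    status is the HTTP status code the checker saw (None if there was
    no answer at all); final_url is where the request ended up after
    any redirects.
    """
    if status is None or status in (401, 403, 405, 429):
        # No answer, or the server refuses automated requests:
        # software cannot decide, so a person has to look.
        return "manual"
    if status in (404, 410):
        # The server itself says the page is gone.
        return "broken"
    if 200 <= status < 300:
        # A 200 is not proof the content is still there. If we ended
        # up on a different host, the site may have moved or been
        # taken over, so queue it for a spot check.
        if urlsplit(final_url).hostname != urlsplit(original_url).hostname:
            return "manual"
        return "ok"
    # Anything else (server errors and the like) is ambiguous.
    return "manual"
```

Only the "manual" pile needs a human, which is usually a small fraction of the whole directory. A page that returns 200 on its original host but no longer has the content will still slip through as "ok", which is why the spot checks remain necessary.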
I tried anybrowser to check links. It did a great job of checking the top level page I gave it, but it didn't go very deep.
Do you know of any tools that will spider and check an entire site, 4 or 5 levels deep? (This is a huge directory that goes 3 sub folders deep.)
Does InfoLink do this?
Try the W3C link checker: [validator.w3.org...]
There is a box where you can enter how many layers deep to check recursively.
W3C did a fine job. It even suggested improvements to my links
like leaving out index.html or adding a trailing slash and the like.
It was something of an eye-opener. - Larry
I have used Xenu before. It's an older program, but it will get the job done for what I need.
Search on Google for it.
I should add that it will be _slow_ going for sites with thousands of outward links
and that I only tested it with honest <a href= www .. type links. - Larry
Great, between a few of these tools I should be able to compile a nice list of broken links..
How about hiring help? Any recommendations on where to find and hire cheap but efficient help in removing broken links?
Your post doesn't make much sense to me.
If you really want to check if your links are broken then use the W3C link checker.
However, I get the suspicion that that's not what you're after. You actually don't want the sites you are linking to to get any pagerank from you. If that is the case then use...
But beware. You won't get the benefits you think you will.
Thanks larryhatch, for the link. I did notice though, don't click on anything until it finishes! I clicked on something and the entire process started over!
I found help with my site through word-of-mouth. I know someone who has a site for WAHMs (work-at-home moms). Visit some sites like this and post a message that you're looking for help. There are plenty of WAHMs looking for work.
I still like to manually check the links every now and again. If a page turns into a search engine or into a different site, link checking programs won't always catch it.
I'm about to start writing an application that will crawl a site looking for outbound links and produce an HTML summary of:
Links that produce error or redirect responses and links that have changed since the last check.
From there, I can click and see any pages that have changed and remove them manually if needed.
If you're interested in a copy when I've finished it, sticky me now and I'll let you know.
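For anyone who wants to roll their own in the meantime, the first step of such an application (pulling the outbound links out of a page) could look roughly like this in Python. This is only a sketch using the standard library; the class and function names are made up for illustration, and it treats any link to a different hostname as outbound, which is an assumption (it will count www and non-www variants of your own domain as different sites).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class OutboundLinks(HTMLParser):
    """Collect absolute http(s) links that point away from base_url's host."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.site_host = urlsplit(base_url).hostname
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        parts = urlsplit(absolute)
        # Skip mailto:, javascript: and links back into our own site.
        if parts.scheme in ("http", "https") and parts.hostname != self.site_host:
            self.links.append(absolute)


def outbound_links(html, base_url):
    """Return the outbound links found in one page of HTML, in page order."""
    parser = OutboundLinks(base_url)
    parser.feed(html)
    return parser.links
```

Feeding every page of the directory through `outbound_links` gives the list to check; storing the previous run's results next to it is what would let you report "changed since the last check".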
The W3C checker will make your browser hang when used recursively across many pages. At least that's my experience with Firefox as well as IE.
You can actually download it and run it from your machine. I don't remember if it's perl or php (think it's perl though).
Yes, it froze up for me too, I downloaded it, gonna try to install it, it is perl.
However, I get the suspicion that that's not what you're after. You actually don't want the sites you are linking to to get any pagerank from you.
These are the kinds of sites I avoid like the plague when searching for sites to submit links to.
My directory has 13,000 outbound links and I want to keep a high share of them working; my target is 99% working at all times.
1. I copy my working directory to an Apache htdocs on my design machine and, using UltraEdit, remove all the redirection scripts across the 800 pages.
2. Using Apache and Xenu, I check all the links for a head response and a redirection to a new domain and make changes to the directory as needed. I call each company after 2 weeks to see if they have a new domain or what is happening.
3. As NetSol no longer returns a redirection, I then locate every Netscape server and check those links for redirections through Internet Researcher
4. As that still does not find the dumb designers using html redirection (as compared to a 301 server redirection) to a new domain, I run Internet Research every month on 1/12 of the urls to locate all the refresh lines and see if it is refreshing to a new domain. (So they are checked on an annual basis)
5. As a final check I pay my father in law to check 1/12th of the links every month about 6 months out of phase of #4.
This is a lot of work, but I have 4000 user sessions a day, 150,000 referrals per month and 480 advertisers with an average per click of 30 cents for the advertisers - and I make money and have fun skiing at Deer Valley in Utah and playing golf.
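Step 4 above (finding pages that use an HTML refresh line instead of a 301 to send visitors to a new domain) is the kind of thing a short script can flag. Here is a rough Python sketch; it is not how Internet Research works, just my guess at the logic, and the names `MetaRefreshFinder` and `refresh_to_new_domain` are invented for the example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        a = {name: (value or "") for name, value in attrs}
        if a.get("http-equiv", "").lower() != "refresh":
            return
        # content usually looks like "0; url=http://www.example.com/"
        m = re.search(r"url\s*=\s*['\"]?([^'\";]+)", a.get("content", ""), re.I)
        if m:
            self.target = m.group(1).strip()


def refresh_to_new_domain(html, page_url):
    """Return the refresh target if it is on another host, else None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    target = urljoin(page_url, finder.target)
    if urlsplit(target).hostname != urlsplit(page_url).hostname:
        return target
    return None
```

Anything this returns is a candidate for updating the listing to the new domain. It will not catch redirects done in JavaScript, so it does not replace the manual pass in step 5.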
What you need is the Xenu Link Sleuth, a robot that spiders your website and reports whatever you tell it to.
When you have a website with thousands of outbound links you really should be using a MySQL database or similar in combination with a scripting language, that way you will also be able to track clicks without thinking about SE's.
Have an application written that automatically removes the link anchors of 404 errors from the HTML code. That way the text stays, the hyperlinks disappear, and it is all done automatically on your hard drive before you upload to the server.
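A bare-bones version of that "remove the anchor, keep the text" step might look like the Python below. It is a sketch under some assumptions: the list of dead URLs comes from whatever link checker you use, the URL in the page must match the dead URL exactly, and the regular expression only copes with ordinary, well-formed `<a href=...>` tags that are not nested. The name `unlink_dead` is made up.

```python
import re

# Matches a whole <a ...href=...>text</a> element.
# Group 1 is the href value, group 2 is everything between the tags.
ANCHOR = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*["\']?([^"\'\s>]+)[^>]*>(.*?)</a>',
    re.I | re.S,
)


def unlink_dead(html, dead_urls):
    """Replace anchors whose href is in dead_urls with their inner text."""
    dead = set(dead_urls)

    def replace(match):
        # Keep the visible text for dead links, leave live links untouched.
        return match.group(2) if match.group(1) in dead else match.group(0)

    return ANCHOR.sub(replace, html)
```

Run it over local copies of the pages and review a diff before uploading; for messy HTML a real parser would be a safer choice than a regular expression.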
My problem is that my website has an extreme number of outbound links.
Note: The most important goal here is to stay in Google's good graces (following the webmaster guidelines properly so we're not penalized in any way), since they are sending a large majority of our traffic. A close second goal is to be user friendly. Without Google our site wouldn't exist, so we have to make sure we are 100% strict (or as close as possible) with their guidelines. It sounds bad that user friendliness isn't the number 1 concern, but there wouldn't be users if it wasn't for Google.
I'm curious to hear feedback about which of the following situations will be "better" in Google's eyes:
Note: I'm trying to find a way to deal with outbound links which, because of their excessive quantity, can't be checked by hand for linking to bad neighborhoods, etc.
1) PHP redirection script used for every outbound link; it can be set to return either a 301 or a 302.
Down side: Potential of linking to "bad neighborhoods"
-If a link goes through a redirection script and the script is blocked via robots.txt, does the link get crawled and open up the door for potential bad neighborhood penalization?
Down side: Google may frown upon excessive use of JavaScript.
-Is this cloaking?
3) Plain text outbound links
Down side: Potential of linking to "bad neighborhoods"
-Does rel="nofollow" make Google ignore the links, i.e. get rid of the bad neighborhood problem?
4) TEXT URLs, no outbound links. Example:
Title: Widgets for sale
URL: www[dot]widgets[dot]com (without the [dot]'s of course, in the form of a URL just not a hyperlink)
instead of <a href="http://www.widgets.com">Widgets for sale</a>
Down side: Usability
-Does google crawl text URLs?
-Will google penalize for text URLs?
-Is there anything wrong with text URLs in this situation over links, aside from usability?
Note: The links are NOT paid nor necessarily reciprocal, so there's no link partner obligation.
[edited by: Ept103 at 6:31 pm (utc) on Mar. 8, 2005]
I run a regional directory and use a popular directory script program where my links are using the following format:
Obviously, these re-direct/resolve to the listing's actual URL.
Is this hurting me in the eyes of Google, Yahoo, MSN, etc. when they spider my site?
The main reason I used this format is so that it is more difficult for folks to scrape the site for URLs. I've had this happen several times in the past. I know it won't prevent it but will make it more difficult.
Also, how about using a product that does not allow copying of links on a web page?
What type of redirect is it giving you: 301 or 302?
Why not use ASP or something?
Set HttpObj = Server.CreateObject("AspHTTP.Conn")
HTTPObj.Url = "http://www.company.com/"
check for a valid response and redirect to the site if it exists, or show some other page with related links if it doesn't exist, and mark that URL as 404 in the DB.
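The same check-then-redirect idea, sketched in Python rather than ASP since I can't vouch for the rest of the AspHTTP calls. Everything named here (`head_status`, `choose_destination`, the `/related-links` fallback page) is hypothetical; the only real APIs used are the standard `urllib` ones.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and so on.
        return None


def choose_destination(url, status, fallback="/related-links"):
    """Decide where to send the visitor and whether to mark the URL dead.

    Returns (location, mark_dead). Only a definite 404/410 marks the
    URL as dead in the database; a timeout just uses the fallback once.
    """
    if status is not None and 200 <= status < 400:
        return url, False
    if status in (404, 410):
        return fallback, True
    return fallback, False
```

In a redirect script you would call `choose_destination(url, head_status(url))` and send the visitor to the returned location. Doing a live check on every click adds a delay for the user, so checking in a nightly batch and storing the result is probably the kinder design.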
carguy84, would Googlebot or others be able to see those links when they crawled a site?
Maybe I don't understand, but the rather new rel=nofollow attribute seems to do exactly what you need.
Treated with some detail on Googleblog:
1) it says explicitly that it does not transfer page rank
2) it says that "This isn't a negative vote for the site where the comment was posted"
3) it does not say whether the Googlebot will crawl the linked-to site, but I presume not (after all, it says "nofollow!"). So even a 404 would not be hit by Googlebot
Am I wrong?
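For reference, the attribute just goes on the anchor tag itself. If the directory pages are generated by a script, a small helper like this Python sketch would add it to every outbound link; the name `outbound_anchor` is made up, and whether Googlebot still fetches the target is exactly the open question above, so this only shows the markup.

```python
from html import escape


def outbound_anchor(url, text):
    """Build an outbound link carrying rel="nofollow"."""
    # escape() keeps stray quotes or angle brackets in the data
    # from breaking the generated HTML.
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )


print(outbound_anchor("http://www.widgets.com", "Widgets for sale"))
```

The printed line is a normal clickable link for users; only the extra `rel` attribute differs from the plain version.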
Some added notes on the W3C link checker:
It worked OK on my site, but I only have around 130 pages.
No, don't click on anything while it's running, be patient.
It will NOT find all links you should fix/remove!
One odd link I had died and was eaten by a spammy portal page.
I found that by accident and fixed it in a hurry.
It pays to visit links in person. Tedious, I know.
Too many 404s will make your site look crappy and unmaintained,
both to visitors and to the SEs. -Larry
The links are re-directed as 302s.
In practice, most of the time it obeys robots.txt, but with various exceptions - if it saw the URL before it saw robots.txt, if there's an external link to the URL, if it just feels like it, and so on.
Of course, if your site has so many links that you don't know where they go, Google may not consider it to be the sort of "quality" site that it wants to rank well.
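If you do block the redirect script in robots.txt, it is worth at least confirming that the file says what you think it says. Python's standard `urllib.robotparser` can do that offline; the script name `/out.php` and the helper `is_blocked` below are assumptions for the example. Note that this only tells you what a well-behaved crawler should do with the rules, not what any particular bot will actually do, which is the caveat raised above.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules hiding a redirect script from all crawlers.
RULES = "User-agent: *\nDisallow: /out.php\n"

print(is_blocked(RULES, "Googlebot", "http://www.example.com/out.php?id=42"))
print(is_blocked(RULES, "Googlebot", "http://www.example.com/index.html"))
```

The first call checks a URL that passes through the redirect script and the second checks an ordinary page, so you can see whether the rule covers only what you meant it to.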