How to handle outbound links in a directory? - Google Search and SEO forum at WebmasterWorld - WebmasterWorld

Forum Moderators: Robert Charlton & goodroi

How to handle outbound links in a directory?

Ept103

9:58 pm on Mar 5, 2005 (gmt 0)

10+ Year Member

We have a large directory for our distributors.

The outbound links go through a file called link.php, which counts the clicks. It has a function called url which, when used in this syntax, counts the click and then redirects to the outside site.

Example:

[example.com...]

This would count a click for example2.com on our script then redirect to example2.com
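
As a rough illustration of the count-then-redirect behaviour described above, here is a minimal Python sketch. It is not the actual link.php: the function name handle_click, the in-memory click_counts counter, and the choice of a 302 status are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory counter; a real script would presumably use a database.
click_counts = Counter()


def handle_click(request_uri):
    """Return (status, headers) for a request such as /link.php?url=http://example2.com/."""
    query = parse_qs(urlparse(request_uri).query)
    targets = query.get("url", [])
    if not targets:
        return 400, {}
    target = targets[0]
    parsed = urlparse(target)
    # Only redirect to absolute http(s) URLs; reject anything else.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return 400, {}
    # Count the click against the destination host, then redirect.
    click_counts[parsed.netloc.lower()] += 1
    return 302, {"Location": target}
```

Calling handle_click("/link.php?url=http://example2.com/") would record one click for example2.com and hand back a redirect to it. Whether the status should be 301 or 302 is exactly the kind of choice debated later in this thread.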

The problem is that we have over 100,000 outbound links on our site and it is virtually impossible to check all of them manually.

What preventive measures can I take to make sure I'm not penalized for broken links, or for linking to any site that uses methods search engines frown upon? (I'm referring to the links after the url=)

I would prefer a script that redirects in such a way that it doesn't count as an outbound link. Is this possible?

I'm seriously thinking about writing a script that puts the link in a text box, since I'm so sick of seeing 404 errors every time I use a validator. I don't want to be penalized if any of these links are bad.

Any thoughts?

Since it is a "redirect" and I don't actually have a link to anything directly, could I still be penalized for my redirected links? (every outside link on my site uses the link.php?url= format)

...I was thinking, what if I add a robots.txt file like this:

User-Agent: *
Disallow: /link.php

That should solve the whole problem, right?

I'm not sure whether blocking the robots has anything to do with whether the link counts as an outbound link.
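
One way to sanity-check what the robots.txt rule above actually covers is Python's standard urllib.robotparser module. The sketch below only shows which paths a compliant crawler would be told not to fetch; it says nothing about whether a search engine still counts the link. The helper name is_blocked and the sample paths are assumptions.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt proposed in the post above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /link.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path, agent="Googlebot"):
    """True if a crawler that obeys robots.txt may not fetch this path."""
    return not parser.can_fetch(agent, path)
```

With this rule, is_blocked("/link.php?url=http://example2.com/") is True because Disallow matches by path prefix, while an ordinary directory page such as /directory/index.html stays crawlable.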

Looking for some opinions on this subject.

idonen

5:03 pm on Mar 7, 2005 (gmt 0)

10+ Year Member

This doesn't answer your question, but I've been in the same boat for a while. I had a blog-like site that linked to everything using a 302 redirect so that I could count the clicks on the links. Then I read here about "page hijacking" and learned that Google might really frown upon 302 redirect links because of all the hijackers. At the beginning of February, Google decided for some reason to kick me where it counts and now sends me only 10% of the visitors I was getting before. I've since given up counting the click-throughs and just use direct outbound links instead. I'm waiting to see if my Google traffic picks up at all. So far, nothing.

Ept103

5:17 pm on Mar 7, 2005 (gmt 0)

10+ Year Member

Will the new rel="nofollow" attribute do the trick?

Comments?

(Do a google search for rel="nofollow" to get to their blog website with the info, it won't let me post a link directly to it)

jomaxx

6:23 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Unless you're running a DMOZ clone or a scraper kind of site, a more important consideration is to check links for the benefit of your users.

Use an automated link checker. InfoLink is now free; that's a great piece of software to start with.

spaceylacie

7:22 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

At anybrowser.com, there is a tool called 'Link Check' that will check links on a page for you. I think this is better than secretly redirecting, because I believe secret redirection is something your site could potentially be penalized for (more so than for just some broken links).

Having quality, direct, outbound links is a good thing!

If you do not agree, just put the link directory in a certain folder and make that folder unavailable to robots.

Ept103

7:38 pm on Mar 7, 2005 (gmt 0)

10+ Year Member

Interesting spaceylacie...

Let's say the script was removed. Is 1) really worse?

1) Having 100,000+ outbound links that pass through a redirection php script (which also makes sure they aren't 404s..)

or

2) Having about 25,000 (at first until it is cleaned up) links being 404 errors and about 75,000 code 200 links..

25% is a high guess but let's just use that as an example for this..

Other feedback?

spaceylacie

8:58 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

If you are going to continue to use a method to hide the links, then yes, I think #1 is worse. While you may be using the program for good, others are not. You could get lumped in with the "bad hats" or "black hats", whatever they are currently called, as opposed to "white hats". I, personally, would not take this risk. You could be dropped completely from Google (and other search engines). I have seen this happen.

Direct links are the best way to go. But 25% bad links (404 errors) is a lot! You could easily get this down to a much lower percentage by using a free program, like the one at anybrowser.com, to cut your numbers in half, or by deleting the bad links completely. You aren't going to be able to continue to run the link directory in future years unless you take the time to do this.

Hire someone in another country (with a lower minimum wage) to do the work for you. For 100,000 links, it would take your employee about 20 days at 8 hours a day. It would be worth the money or time. My timetable is based on how long it takes me, or my own foreign employee, to check my links, a little over 5,000. Either one of us can check them all in one 8-hour day.

Repeat the process about once per year.

jomaxx

9:40 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Please. Hand checking, or even a website where you enter one URL of your site at a time, is not the way to go. Use a software package.

But you do have to combine this with manual spot checks, because many sites do not allow automated link checking or else do not generate a 404 error when the content is gone.

Ept103

11:08 pm on Mar 7, 2005 (gmt 0)

10+ Year Member

I tried anybrowser to check links. It did a great job of checking the top level page I gave it, but it didn't go very deep.

Do you know of any tools that will spider and check an entire site, 4 or 5 levels deep? (This is a huge directory that goes 3 subfolders deep.)

Does InfoLink do this?

larryhatch

11:22 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Try the W3C link checker: [validator.w3.org...]

There is a box where you can enter how many layers deep to check recursively.

W3C did a fine job. It even suggested improvements to my links
like leaving out index.html or adding a trailing slash and the like.

It was something of an eye-opener. - Larry

sandyeggo

11:23 pm on Mar 7, 2005 (gmt 0)

10+ Year Member

I have used Xenu before. It's an older program, but it gets the job done for what I need.
Search on Google for it.

larryhatch

11:23 pm on Mar 7, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I should add that it will be _slow_ going for sites with thousands of outward links
and that I only tested it with honest <a href= www .. type links. - Larry

Ept103

12:38 am on Mar 8, 2005 (gmt 0)

10+ Year Member

Great, between a few of these tools I should be able to compile a nice list of broken links..

How about hiring help? Any recommendations on where to find and hire cheap but efficient help in removing broken links?

mrMister

9:14 am on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Your post doesn't make much sense to me.

If you really want to check if your links are broken then use the W3C link checker.

However, I get the suspicion that that's not what you're after. You actually don't want the sites you are linking to to get any pagerank from you. If that is the case then use...

rel="nofollow"

But beware. You won't get the benefits you think you will.

spaceylacie

12:57 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Thanks larryhatch, for the link. I did notice though, don't click on anything until it finishes! I clicked on something and the entire process started over!

I found help with my site through word of mouth. I know someone who has a site for WAHMs (work-at-home moms). Visit some sites like this and post a message that you're looking for help. There are plenty of WAHMs looking for work.

I still like to manually check the links every now and again. If a page turns into a search engine or into a different site, link checking programs won't always catch it.

mrMister

1:08 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I'm about to start writing an application that will crawl a site looking for outbound links and produce an HTML summary of:

Links that produce error or redirect responses and links that have changed since the last check.

From there, I can click and see any pages that have changed and remove them manually if needed.
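
A report like the one described above needs some rule for sorting each link into a bucket. The following Python sketch shows one possible rule set; it is not mrMister's application. The function names, the bucket labels, and the use of a SHA-256 hash of the page body to detect changes are assumptions.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Hash of the page body, stored between runs to detect changes."""
    return hashlib.sha256(body).hexdigest()


def classify(status, body, previous_fingerprint=None):
    """Sort one checked link into a report bucket."""
    if status >= 400:
        return "error"
    if 300 <= status < 400:
        return "redirect"
    # A 200 response can still hide a page that turned into something else.
    if previous_fingerprint is not None and fingerprint(body) != previous_fingerprint:
        return "changed"
    return "ok"
```

A raw hash flags any change at all, including a rotated banner or a date stamp, so in practice the "changed" bucket would still need the manual review described in the post.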

If you're interested in a copy when I've finished it, sticky me now and I'll let you know.

claus

1:10 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

The W3C checker will make your browser hang when used recursively across many pages. At least that's my experience with Firefox as well as IE.

You can actually download it and run it from your machine. I don't remember if it's perl or php (think it's perl though).

spaceylacie

1:41 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Yes, it froze up for me too. I downloaded it and am going to try to install it. It is Perl.

Lorel

3:24 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

However, I get the suspicion that that's not what you're after. You actually don't want the sites you are linking to to get any pagerank from you.

These are the kinds of sites I avoid like the plague when searching for sites to submit links to.

4specs

3:50 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

My directory has 13,000 outbound links and I want to keep a high percentage of them working - my target is 99% working at all times.

1. I copy my working directory to an apache htdocs on my design machine and using Ultraedit remove all the redirection scripts across the 800 pages.

2. Using Apache and Xenu, I check all the links for a HEAD response and a redirection to a new domain and make changes to the directory as needed. I call each company after 2 weeks to see if they have a new domain or what is happening.

3. As NetSol no longer returns a redirection, I then locate every Netscape server and check those links for redirections through Internet Researcher.

4. As that still does not find the dumb designers using HTML redirection (as compared to a 301 server redirection) to a new domain, I run Internet Researcher every month on 1/12 of the URLs to locate all the refresh lines and see if any of them is refreshing to a new domain. (So they are checked on an annual basis.)

5. As a final check I pay my father in law to check 1/12th of the links every month about 6 months out of phase of #4.
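
The HTML redirection mentioned in step 4 is usually a meta refresh tag, which a status-code check cannot see because the page itself returns 200. Here is a minimal Python sketch of spotting one that points at a different domain. It is not how Internet Researcher works; the class and function names are assumptions, and it only handles the common content="N; url=..." form.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; url=http://new.example/"
        _, _, rest = attributes.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.target = value.strip().strip("'\"")


def refresh_to_new_domain(page_url, html):
    """Return the refresh target if it is on another host, else None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    target = urljoin(page_url, finder.target)
    if urlparse(target).netloc.lower() != urlparse(page_url).netloc.lower():
        return target
    return None
```

A refresh to a page on the same host returns None here, so only listings that have really moved to another domain end up on the follow-up list.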

This is a lot of work, but I have 4000 user sessions a day, 150,000 referrals per month and 480 advertisers with an average per click of 30 cents for the advertisers - and I make money and have fun skiing at Deer Valley in Utah and playing golf.

philaweb

5:58 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

What you need is the Xenu Link Sleuth, a robot that spiders your website and reports whatever you tell it to.

When you have a website with thousands of outbound links you really should be using a MySQL database or similar in combination with a scripting language, that way you will also be able to track clicks without thinking about SE's.

Have an application written that automatically removes the link anchors of 404 errors from the HTML code. That way the text stays, the hyperlinks disappear, and everything is automated on your hard drive before you upload to the server.
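
The anchor-stripping step described above could be sketched in Python as follows. This is only an illustration: the function name unlink_dead is an assumption, the list of dead URLs is presumed to come from a link checker, and a regular expression like this only copes with simple, well-formed anchors.

```python
import re

# Matches a simple <a ... href="...">text</a> anchor with a quoted href.
ANCHOR = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*(["\'])([^"\'>]*)\1[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def unlink_dead(html, dead_urls):
    """Replace anchors whose href is in dead_urls with their inner text."""

    def replace(match):
        if match.group(2) in dead_urls:
            return match.group(3)  # keep the text, drop the hyperlink
        return match.group(0)      # leave live links untouched

    return ANCHOR.sub(replace, html)
```

Run over the static pages before upload, this keeps each listing's text in place while removing only the links reported as dead.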

Ept103

6:01 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

My problem is that my website has an extreme number of outbound links.

Note: The most important goal here is to stay in Google's good graces (following the webmaster guidelines so we're not penalized in any way), since they send the large majority of our traffic. A close second goal is to be user friendly. Without Google our site wouldn't exist, so we have to make sure we are 100% strict (or as close as possible) with their guidelines. It sounds bad that user friendliness isn't the number 1 concern, but there wouldn't be users if it wasn't for Google.

I'm curious to hear feedback about which of the following situations will be "better" in Google's eyes:
Note: I'm trying to find a way to deal with outbound links which, because of their excessive quantity, can't be checked by hand for linking to bad neighborhoods, etc.

1) PHP redirection script used for every outbound link (can be set to 301 or 302).
Down side: Potential of linking to "bad neighborhoods"
Question:
-If a link goes through a redirection script and the script is blocked via robots.txt, does the link get crawled and open up the door for potential bad neighborhood penalization?

2) JavaScript outbound links, either stored in an external .js file (blocked from search engines) or as regular JavaScript links on the page (uncrawlable in some way).
Down side: Google may frown upon excessive use of JavaScript.
Question:
-Is this cloaking?
-Has excessive use of javascript ever been shown to cause a penalty?

3) Plain text outbound links
Down side: Potential of linking to "bad neighborhoods"
Question:
-Does rel="nofollow" make Google ignore the links, i.e. get rid of the bad neighborhood problem?

4) TEXT URLs, no outbound links. Example:
Title: Widgets for sale
URL: www[dot]widgets[dot]com (without the [dot]'s of course, in the form of a URL just not a hyperlink)
instead of <a href="http://www.widgets.com">Widgets for sale</a>
Down side: Usability
Questions:
-Does Google crawl text URLs?
-Will Google penalize for text URLs?
-Is there anything wrong with text URLs in this situation over links, aside from usability?

Note: The links are NOT paid nor necessarily reciprocal, so there's no link partner obligation.
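
To make options 3 and 4 above concrete, here is a small Python sketch that renders one directory listing as a normal link, a rel="nofollow" link, or a plain text URL. The function name render_listing and the mode names are assumptions; it only shows the markup and does not settle how any search engine treats it.

```python
from html import escape


def render_listing(title, url, mode="nofollow"):
    """Render a listing as a plain link, a nofollow link, or an unlinked text URL."""
    if mode == "text":
        # Option 4: the URL is visible but is not a hyperlink.
        return f"{escape(title)}: {escape(url)}"
    # Option 3: an ordinary anchor, optionally marked rel="nofollow".
    rel = ' rel="nofollow"' if mode == "nofollow" else ""
    return f'<a href="{escape(url)}"{rel}>{escape(title)}</a>'
```

For the widgets example, the nofollow mode gives an anchor identical to the one shown in option 4 except for the added rel="nofollow" attribute.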

[edited by: Ept103 at 6:31 pm (utc) on Mar. 8, 2005]

mgeyman

6:17 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

I run a regional directory and use a popular directory script program where my links are using the following format:

www.#*$!.com/cgi-bin/jump.cgi?ID=1111

Obviously, these re-direct/resolve to the listing's actual URL.

Is this hurting me in the eyes of Google, Yahoo, MSN, etc. when they spider my site?

The main reason I used this format is so that it is more difficult for folks to scrape the site for URLs. I've had this happen several times in the past. I know it won't prevent it but it will make it more difficult.

Also, how about using a product that does not allow copying of links on a web page?

Ept103

7:45 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

What type of redirect is it giving you: 301 or 302?

carguy84

7:53 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

why not use ASP or something?

Set HttpObj = Server.CreateObject("AspHTTP.Conn")
HTTPObj.Url = "http://www.company.com/"

check for a valid response and redirect to the site if it exists, or show some other page with related links if it doesn't exist, and mark that URL as 404 in the DB.
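
The check-then-redirect idea above is language independent. Here is a Python sketch of the same logic, under the assumptions that a HEAD request is enough to judge the target, that any status below 400 counts as valid, and that a set stands in for the database. The names fetch_status, choose_destination, dead_links and the /related-links fallback are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the final HTTP status for a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


dead_links = set()  # stand-in for marking the URL as 404 in the DB


def choose_destination(url, fallback="/related-links", status_of=fetch_status):
    """Return the URL to redirect to: the target if it responds, else the fallback."""
    status = status_of(url)
    if status is not None and status < 400:
        return url
    dead_links.add(url)
    return fallback
```

Checking on every click adds a round trip before each redirect, and some sites reject HEAD requests, so a real version would probably cache results or run the check offline.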

Chip-

Ept103

8:16 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

carguy84, would Googlebot or others be able to see those links when they crawled a site?

frox

9:17 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Maybe I don't understand, but the rather new rel=nofollow attribute seems to do exactly what you need.

Treated with some detail on Googleblog:
www.google.com/#*$!/2005/01/preventing-comment-spam.html

1) it says explicitly that it does not transfer PageRank

2) it says that "This isn't a negative vote for the site where the comment was posted"

3) it does not say whether Googlebot will crawl the linked-to site, but I presume not (after all, it says "nofollow"!). So even a 404 would not be hit by Googlebot

Am I wrong?

larryhatch

9:49 pm on Mar 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Some added notes on the W3C link checker:

It worked OK on my site, but I only have around 130 pages.
No, don't click on anything while it's running, be patient.
It will NOT find all links you should fix/remove!

One odd link I had died and was eaten by a spammy portal page.
I found that by accident and fixed it in a hurry.
It pays to visit links in person. Tedious, I know.

Too many 404s will make your site look crappy and unmaintained,
both to visitors and to the SEs. -Larry

mgeyman

10:13 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Ept103,

The links are re-directed as 302s.

Just Guessing

10:34 pm on Mar 8, 2005 (gmt 0)

10+ Year Member

Google can, and sometimes does, follow and index any kind of recognisable URL, whether a hyperlink, text, javascript, or whatever.

In practice, most of the time it obeys robots.txt, but with various exceptions - if it saw the URL before it saw robots.txt, if there's an external link to the URL, if it just feels like it, and so on.

Of course, if your site has so many links that you don't know where they go, Google may not consider it to be the sort of "quality" site that it wants to rank well.

just guessing...

This 43 message thread spans 2 pages: 43
