Our outbound links go through a file called link.php, which counts the clicks. It has a function called url which, when used in this syntax, counts the click and then redirects to an outside site.
Example:
[example.com...]
This would count a click for example2.com on our script then redirect to example2.com
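A minimal sketch of what such a count-then-redirect script does, written in Python rather than PHP. The name `handle_link`, the in-memory `clicks` counter, and the choice of a 302 are all assumptions for illustration, not the poster's actual link.php:

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# In-memory click counts per target host (hypothetical; a real script would persist these).
clicks = Counter()


def handle_link(request_uri):
    """Return (status, headers) for a request such as /link.php?url=http://example2.com/."""
    query = parse_qs(urlsplit(request_uri).query)
    target = query.get("url", [""])[0]
    parts = urlsplit(target)
    # Only redirect to absolute http(s) URLs; refuse anything else.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return 400, {}
    clicks[parts.netloc] += 1
    return 302, {"Location": target}
```

Refusing anything that is not an absolute http(s) URL is a design choice here, so the script cannot be used to redirect to arbitrary schemes.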
The problem is that we have over 100,000 outbound links on our site, and it is virtually impossible to check them all manually.
What preventive measures can I take to make sure I'm not penalized for broken links, or for linking to any site that is involved with methods that search engines frown upon? (I'm referring to the links after the url=)
I would prefer a script that redirects in such a way that it doesn't count as an outbound link. Is this possible?
I'm seriously thinking about writing a script that puts the link in a text box, since I'm so sick of seeing 404 errors every time I use a validator. I don't want to be penalized if any of these links are bad.
Any thoughts?
Since it is a "redirect" and I don't actually have a link to anything directly, could I still be penalized for my redirected links? (every outside link on my site uses the link.php?url= format)
...I was thinking, what if I add a robots.txt file like this:
User-Agent: *
Disallow: /link.php
That should solve the whole problem, right?
I'm not sure whether blocking the robots has anything to do with whether a link counts as an outbound link.
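One way to confirm that the rule above matches the redirect URLs is Python's standard `urllib.robotparser`. This is only a sketch: www.example.com is a placeholder host, and the result shows what a robots.txt-compliant crawler would be allowed to fetch, not how any search engine counts the links:

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines proposed above.
rules = """\
User-Agent: *
Disallow: /link.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# www.example.com is a placeholder for the site's own host.
redirect_allowed = parser.can_fetch(
    "Googlebot", "http://www.example.com/link.php?url=http://example2.com/"
)
home_allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")

print(redirect_allowed, home_allowed)
```

Because the rule is a prefix match on the path, every link.php?url=... address falls under it while the rest of the site stays crawlable.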
Looking for some opinions on this subject.
Having quality, direct, outbound links is a good thing!
If you do not agree, just put the link directory in a certain folder and make that folder available to robots.
Let's say the script was removed. Is 1) really worse?
1) Having 100,000+ outbound links that pass through a redirection php script (which also makes sure they aren't 404s..)
or
2) Having about 25,000 links (at first, until it is cleaned up) returning 404 errors and about 75,000 links returning code 200.
25% is a high guess, but let's just use that as an example.
Other feedback?
Direct links are the best way to go. But 25% bad links (404 errors) is a lot! You could easily cut that number in half, or eliminate the bad links completely, by using a free program like the one at anybrowser.com. You aren't going to be able to keep running the link directory in future years unless you take the time to do this.
Hire someone in another country (with a lower minimum wage) to do the work for you. For 100,000 links, it would take your employee about 20 days at 8 hours a day. It would be worth the money or the time. My timetable is based on how long it takes me, or my own foreign employee, to check my links, a little over 5000. Either one of us can check them all in one 8-hour day.
Repeat the process about once per year.
But you do have to combine this with manual spot checks, because many sites do not allow automated link checking or else do not generate a 404 error when the content is gone.
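A rough sketch of the automated half of that process, using only the Python standard library. The function names and the bucket labels ("ok", "dead", "review", "unreachable") are hypothetical, chosen for illustration. Anything that is not clearly fine or clearly dead goes into a manual-review bucket, for the reason given above:

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status for url, or None if the request failed outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


def classify(status):
    """Sort a status code into the buckets a link directory cares about."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"  # may still be a page whose content is gone; spot-check by hand
    if status in (404, 410):
        return "dead"
    return "review"  # e.g. 403, 405, 5xx: often a site refusing automated checks


def check_links(urls, fetch=fetch_status):
    """Map each URL to a bucket; fetch is injectable so the logic can be tested offline."""
    return {url: classify(fetch(url)) for url in urls}
```

Note that `urlopen` follows redirects, so a link that now redirects to an unrelated page would still come back as "ok" here; that case needs the manual check.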
There is a box where you can enter how many layers deep to check recursively.
W3C did a fine job. It even suggested improvements to my links
like leaving out index.html or adding a trailing slash and the like.
It was something of an eye-opener. - Larry
If you really want to check if your links are broken then use the W3C link checker.
However, I suspect that's not what you're after. You don't actually want the sites you are linking to to get any PageRank from you. If that is the case, then use...
rel="nofollow"
But beware. You won't get the benefits you think you will.
I found help with my site through word of mouth. I know someone who has a site for WAHMs (work-at-home moms). Visit some sites like this and post a message that you're looking for help. There are plenty of WAHMs looking for work.
I still like to manually check the links every now and again. If a page turns into a search engine or into a different site, link checking programs won't always catch it.
Links that produce error or redirect responses and links that have changed since the last check.
From there, I can click and see any pages that have changed and remove them manually if needed.
If you're interested in a copy when I've finished it, sticky me now and I'll let you know.
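The kind of report described above (error or redirect responses, plus pages that changed since the last check) could be sketched as follows. This is not the poster's tool. The data shapes and the names `fingerprint` and `build_report` are assumptions, and a plain content hash is a crude change detector that will also flag pages with rotating ads or timestamps:

```python
import hashlib


def fingerprint(body):
    """Hash the page body (bytes) so the next run can tell whether it changed."""
    return hashlib.sha256(body).hexdigest()


def build_report(previous, current):
    """previous: url -> fingerprint from the last run.
    current: url -> (status, final_url, body) from this run; status may be None.
    """
    report = {"error": [], "redirect": [], "changed": []}
    for url, (status, final_url, body) in current.items():
        if status is None or status >= 400:
            report["error"].append(url)
        elif final_url != url:
            report["redirect"].append(url)
        elif url in previous and previous[url] != fingerprint(body):
            report["changed"].append(url)
    return report
```

Each list in the result is a queue for manual review rather than for automatic deletion.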
1. I copy my working directory to an Apache htdocs folder on my design machine and, using UltraEdit, remove all the redirection scripts across the 800 pages.
2. Using Apache and Xenu, I check all the links for a HEAD response and for redirection to a new domain, and make changes to the directory as needed. I call each company after 2 weeks to see if they have a new domain or what is happening.
3. As NetSol no longer returns a redirection, I then locate every Netscape server and check those links for redirections through Internet Researcher.
4. As that still does not find the dumb designers using HTML redirection (as compared to a 301 server redirection) to a new domain, I run Internet Researcher every month on 1/12 of the URLs to locate all the refresh lines and see if any are refreshing to a new domain. (So they are checked on an annual basis.)
5. As a final check I pay my father in law to check 1/12th of the links every month about 6 months out of phase of #4.
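Step 4 above, finding HTML "refresh" redirections that point to a new domain, can be approximated with Python's standard HTML parser. This is a sketch under assumptions, not a replacement for the tools named in the list. `MetaRefreshFinder` and `refresh_to_new_domain` are hypothetical names, and only the first meta refresh tag on a page is considered:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://new.example/".
        _, _, rest = (attrs.get("content") or "").partition(";")
        name, _, value = rest.strip().partition("=")
        if name.strip().lower() == "url" and self.target is None:
            self.target = value.strip().strip("'\"")


def refresh_to_new_domain(page_url, html):
    """Return the refresh target if it is on a different host than page_url, else None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    target = urljoin(page_url, finder.target)
    if urlsplit(target).hostname != urlsplit(page_url).hostname:
        return target
    return None
```

A refresh to another page on the same host returns None, so only moves to a new domain are flagged.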
This is a lot of work, but I have 4000 user sessions a day, 150,000 referrals per month and 480 advertisers with an average per click of 30 cents for the advertisers - and I make money and have fun skiing at Deer Valley in Utah and playing golf.
When you have a website with thousands of outbound links, you really should be using a MySQL database or similar in combination with a scripting language. That way you will also be able to track clicks without having to think about search engines.
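A small sketch of that idea, with Python's built-in sqlite3 standing in for MySQL so the example is self-contained. The table layout and the function names are assumptions for illustration only:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE links (
           id INTEGER PRIMARY KEY,
           url TEXT NOT NULL UNIQUE,
           clicks INTEGER NOT NULL DEFAULT 0,
           last_status INTEGER
       )"""
)


def add_link(url):
    """Store a new outbound link and return its numeric ID."""
    return conn.execute("INSERT INTO links (url) VALUES (?)", (url,)).lastrowid


def record_click(link_id):
    """Count one click and return the URL to send the visitor to, or None if unknown."""
    conn.execute("UPDATE links SET clicks = clicks + 1 WHERE id = ?", (link_id,))
    row = conn.execute("SELECT url FROM links WHERE id = ?", (link_id,)).fetchone()
    return row[0] if row else None


def dead_links():
    """List URLs whose last recorded check came back 404 or 410."""
    rows = conn.execute(
        "SELECT url FROM links WHERE last_status IN (404, 410) ORDER BY id"
    )
    return [row[0] for row in rows]
```

Keeping the last check result next to each link means pages can be generated from the table with dead links already left out.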
Have an application written that automatically removes the link anchors for 404 errors from the HTML code. That way the text stays, the hyperlinks disappear, and it is all done automatically on your hard drive before you upload to the server.
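A minimal sketch of such a tool, assuming the set of dead URLs is already known from a link check. It uses a regular expression, which is only adequate for simple, well-formed anchors with quoted href values; `unlink_dead` is a hypothetical name:

```python
import re

# Matches <a ... href="...">text</a> with a quoted href; simple anchors only.
ANCHOR = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def unlink_dead(html, dead_urls):
    """Replace anchors whose href is in dead_urls with their inner text."""

    def replace(match):
        href, inner = match.group(2), match.group(3)
        return inner if href in dead_urls else match.group(0)

    return ANCHOR.sub(replace, html)


page = '<p><a href="http://gone.example/">Gone site</a> and <a href="http://ok.example/">OK site</a></p>'
print(unlink_dead(page, {"http://gone.example/"}))
```

Anchors whose href is not in the dead set are left exactly as they were.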
Note: The most important goal here is to stay in Google's good graces (following the webmaster guidelines so we're not penalized in any way), since they send the large majority of our traffic. A close second goal is to be user friendly. Without Google our site wouldn't exist, so we have to make sure we are 100% strict (or as close as possible) with their guidelines. It sounds bad that user friendliness isn't the number 1 concern, but there wouldn't be users if it wasn't for Google.
I'm curious to hear feedback about which of the following situations will be "better" in Google's eyes:
Note: I'm trying to find a way to deal with outbound links which, because of their excessive quantity, can't be checked by hand for links to bad neighborhoods, etc.
1) A PHP redirection script used for every outbound link; it can be set to return either a 301 or a 302.
Down side: Potential of linking to "bad neighborhoods"
Question:
-If a link goes through a redirection script and the script is blocked via robots.txt, does the link get crawled and open the door to a potential bad neighborhood penalty?
2) JavaScript outbound links, either stored in an external .js file (blocked from search engines) or as regular JavaScript links on the page (uncrawlable in some way).
Down side: Google may frown upon excessive use of JavaScript.
Question:
-Is this cloaking?
-Has excessive use of JavaScript ever been shown to cause a penalty?
3) Plain text outbound links
Down side: Potential of linking to "bad neighborhoods"
Question:
-Does rel="nofollow" make Google ignore the links, i.e. get rid of the bad neighborhood problem?
4) TEXT URLs, no outbound links. Example:
Title: Widgets for sale
URL: www[dot]widgets[dot]com (without the [dot]'s of course, in the form of a URL just not a hyperlink)
instead of <a href="http://www.widgets.com">Widgets for sale</a>
Down side: Usability
Questions:
-Does Google crawl text URLs?
-Will Google penalize for text URLs?
-Is there anything wrong with text URLs in this situation over links, aside from usability?
Note: The links are NOT paid nor necessarily reciprocal, so there's no link partner obligation.
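For option 3 above, adding rel="nofollow" to 100,000 links by hand is not realistic either, so it would have to be scripted. A regex-based sketch, assuming simple well-formed anchor tags with quoted href values; `add_nofollow` and the `own_host` parameter are hypothetical, and whether nofollow actually addresses the bad neighborhood concern is exactly the open question asked above:

```python
import re
from urllib.parse import urlsplit

A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
HREF = re.compile(r'\bhref\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)
HAS_REL = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(html, own_host):
    """Add rel="nofollow" to anchors that point at a host other than own_host."""

    def fix(match):
        attrs = match.group(1)
        href = HREF.search(attrs)
        if not href:
            return match.group(0)
        host = urlsplit(href.group(2)).hostname
        # Leave relative links, links to our own host, and tags that already set rel.
        if host is None or host == own_host or HAS_REL.search(attrs):
            return match.group(0)
        return "<a" + attrs + ' rel="nofollow">'

    return A_TAG.sub(fix, html)


print(add_nofollow('<a href="http://www.widgets.com">Widgets for sale</a>', "www.example.com"))
```

Internal and relative links are left untouched, so only links leaving the site are marked.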
[edited by: Ept103 at 6:31 pm (utc) on Mar. 8, 2005]
www.#*$!.com/cgi-bin/jump.cgi?ID=1111
Obviously, these re-direct/resolve to the listing's actual URL.
Is this hurting me in the eyes of Google, Yahoo, MSN, etc. when they spider my site?
The main reason I used this format is so that it is more difficult for folks to scrape the site for URLs. I've had this happen several times in the past. I know it won't prevent it, but it will make it more difficult.
Also, how about using a product that does not allow copying of links on a web page?
Treated with some detail on Googleblog:
www.google.com/#*$!/2005/01/preventing-comment-spam.html
1) it says explicitly that it does not transfer PageRank
2) it says that "This isn't a negative vote for the site where the comment was posted"
3) it does not say whether the Googlebot will crawl the linked-to site, but I presume not (after all, it says "nofollow!"). So even a 404 would not be hit by Googlebot
Am I wrong?
It worked OK on my site, but I only have around 130 pages.
No, don't click on anything while it's running, be patient.
It will NOT find all links you should fix/remove!
One odd link I had died and was eaten by a spammy portal page.
I found that by accident and fixed it in a hurry.
It pays to visit links in person. Tedious, I know.
Too many 404s will make your site look crappy and unmaintained,
both to visitors and to the SEs. -Larry
In practice, most of the time it obeys robots.txt, but with various exceptions - if it saw the URL before it saw robots.txt, if there's an external link to the URL, if it just feels like it, and so on.
Of course, if your site has so many links that you don't know where they go, Google may not consider it to be the sort of "quality" site that it wants to rank well.
just guessing...