|Salvaging Old Links|
Almost as useful as new link development
| 4:48 am on Jul 23, 2010 (gmt 0)|
I've just been digging in someone's Google Webmaster Tools account, and discovered some gold nuggets in the error reports. The reports identify hundreds of old links on external sites pointing to URLs that no longer exist on his site. Some individual URLs had dozens of links.
So, I've compiled a list of the old broken URLs, matched them with suitable new URLs, and requested that his web team set up 301 redirects. This will give users a better experience, and of course I'm keen to salvage whatever PageRank can be gained here, too. The links are mostly "little" ones, but there are lots of them.
ACTION STEPS: Log into Google Webmaster Tools for the site, and go to the "Not Found" Crawl Errors report. At the bottom of the page, there's a link to "Download all sources of errors on this site."
That gives you a .csv file which lists all the problem URLs Google has found, and also lists the source of each error. If your site is young, your report might have few entries, but if it has been around for a while there might be hundreds.
If any of the errors are coming from within your own site, or a source within your control, make it a priority to locate and update those broken links right away. Broken links waste PageRank, and won't help your signals of quality, either.
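A rough way to do this triage on the downloaded file, sketched in Python. The column names `url` and `linked_from` and the host `example.com` are assumptions for illustration only; check the header row of your own .csv, which may well differ.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlparse


def split_sources(rows, own_host):
    """Group link sources per broken URL, split into internal and external.

    `rows` is any iterable of dicts with the (assumed) keys "url" and
    "linked_from". Sets are used so duplicate rows collapse automatically.
    """
    internal = defaultdict(set)
    external = defaultdict(set)
    for row in rows:
        broken = row["url"].strip()
        source = row["linked_from"].strip()
        host = (urlparse(source).hostname or "").lower()
        # Treat the site itself and its subdomains as "within your control".
        if host == own_host or host.endswith("." + own_host):
            internal[broken].add(source)
        else:
            external[broken].add(source)
    return internal, external


# Made-up sample data standing in for the downloaded report.
sample = io.StringIO(
    "url,linked_from\n"
    "http://example.com/old-a,http://example.com/index.html\n"
    "http://example.com/old-b,http://blog.example.org/post\n"
    "http://example.com/old-b,http://blog.example.org/post\n"
    "http://example.com/old-b,http://other.example.net/links\n"
)
internal, external = split_sources(csv.DictReader(sample), "example.com")
```

The `internal` entries are the links to fix at the source; the `external` ones are the candidates for 301 redirects.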
Then, gather the URLs that have links coming in from external sites, weed out any duplicates, and decide on new URLs to redirect them to. Set up 301 redirects from the old to the new, taking care to avoid chains of redirects: each old URL should reach its new URL in a single step.
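If the old-to-new list is long, chains can creep in without anyone noticing (an old URL pointing at another old URL that is itself redirected). Here is a minimal sketch for flattening the list before handing it to the web team. It assumes the mapping is kept as a plain Python dict of paths; the paths shown are hypothetical.

```python
def flatten_redirects(redirects):
    """Return a map where each old path points straight at its final target."""
    flat = {}
    for old in redirects:
        seen = {old}
        target = redirects[old]
        # Follow the chain until we reach a path that is not redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[old] = target
    return flat


# Hypothetical mapping: /old-a would otherwise take two hops to reach /new-b.
redirects = {
    "/old-a": "/old-b",
    "/old-b": "/new-b",
    "/old-c": "/new-c",
}
flat = flatten_redirects(redirects)
```

After flattening, `/old-a` goes directly to `/new-b`, and a loop in the list raises an error instead of being shipped to the server config.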
There's no way to measure the value of PageRank that could be salvaged this way, but it's better to have it than not, right? :)
| 7:25 am on Jul 23, 2010 (gmt 0)|
Another way to salvage link value within a site is to fix dead-end pages, pages that don't link to anything else. These are commonly PDFs, Word docs, and other non-HTML files. Rather than let the PR flow end there, modify those files so they link back to the home page.
| 3:52 pm on Jul 23, 2010 (gmt 0)|
Those outbound links from pdfs and docs are just as powerful as any - and they're more laser focused, too.
| 5:05 pm on Jul 23, 2010 (gmt 0)|
Agreed, it is better to have it.
For some reason many web devs (at least the ones I've run into) won't 301 those PDFs, docs, and other non-HTML pages. But if there are a lot of quality links pointing to one, it's a good idea to 301 it as you said, or put the file back (and in the process educate the dev so you don't have to keep asking).
| 5:29 pm on Aug 10, 2010 (gmt 0)|
With PDFs created from a Word doc it's simple to include a link at the end of the doc or in the content, but what would you recommend with PDFs that are scans? A 'return to xyz site' link in the bookmarks panel, with the panel set to always show?
| 8:01 am on Sep 19, 2010 (gmt 0)|
|what would you recommend with PDFs that are scans? |
Haven't tried this, but shouldn't it be possible to edit scanned PDFs to include links (and/or other matter)?
| 3:13 am on Sep 20, 2010 (gmt 0)|
This might be a stupid question, but...
|but what would you recommend with PDFs that are scans? |
How would such a pdf be indexed? Wouldn't it just be an image (like a .jpg or something) embedded in a .pdf file?
Or am I missing something?
| 7:05 am on Sep 20, 2010 (gmt 0)|
I think Alex might be talking about a document that is scanned to a PDF file. Web_savvy answered the question. If you have Adobe Acrobat then it should be a simple matter to highlight the text and insert a URL so that it's a live URL. It has to be done manually.
When the document is scanned using software like PaperPort [nuance.com], the OCR (Optical Character Recognition) part of the software will "read" the image and change it into editable words. PaperPort can also scan a document directly to PDF.
A side note. As I learned it in digital arts class many years ago, PDF files are related to Adobe Illustrator files. They share(d) a common foundation.
| 4:49 pm on Sep 21, 2010 (gmt 0)|
This is a great idea, Buckworks. Up until two days ago, WMT was showing a lot of 404s. Now it doesn't show any.
Guess I'll have to wait.