Forum Moderators: phranque

Links from PDFs on high-quality sites like NASA.gov

How do my competitors do that?

7:05 pm on Sep 6, 2009 (gmt 0)

New User

5+ Year Member

joined:Sept 6, 2009
posts: 2
votes: 0

I am using a tool to analyze my competitors' backlinks, and I was quite surprised to find that they have links from places like NASA.gov and other trustworthy, high-PR websites.

I would like to understand how they do it. According to my analysis tool, all of these high-quality links come from PDF files. But when I open the PDFs I cannot find the link in question, and it would not make sense for my competitor to have a link in these PDFs anyway.

Obviously something sneaky is going on. But what? Anyone know anything about this?


7:48 am on Sept 10, 2009 (gmt 0)

New User

5+ Year Member

joined:Aug 20, 2009
posts: 18
votes: 0

It is difficult to judge whether this is a sneaky tactic without seeing the actual URL.

But Google started using optical character recognition (OCR) to extract text from PDFs in late 2008.

Basically, it takes snapshots of the PDF pages as input, runs optical character recognition on them, and indexes the resulting text just like regular text.

If it can see the text, wouldn't it be seeing the links too?

If you want the geeky details about OCRopus, the open source OCR software that Google sponsors, refer to: [code.google.com...]

(If you have Acrobat Pro 9, you can see the option under Documents => OCR Text Recognition => Recognize Text using OCR)


3:10 am on Sept 14, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
votes: 0

If your tool is showing you something that you can't verify by manual inspection, especially something that doesn't seem logical or rational...

Well, I would think about the value of that tool for a minute, and just perhaps ... I'd ask the tool maker what was up with that before I scrapped it.

6:56 am on Sept 14, 2009 (gmt 0)

New User

5+ Year Member

joined:Sept 6, 2009
posts: 2
votes: 0

Hi guys, thanks for taking the time to share your thoughts on the subject. I think you're both on to something. I can see that all the PDFs actually contain my competitor's brand name (he has a generic name; let's say it is "stool"). So the OCR technology at least recognizes the keyword "stool" as important for the PDF. However, I don't know how the connection to my competitor's website is made (there are no visible links). Perhaps, as Claus says, it is an error in the tool, but I don't believe that, since my competitor works in the field of SEO. There must be more to it, but what?
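
One thing that could be checked here is whether the PDFs carry link annotations that are not obvious on the page. Below is a minimal Python sketch, assuming the PDF stores its links as uncompressed /URI actions. Many PDFs compress their objects, in which case this finds nothing and a proper PDF library would be needed. The function name `find_uri_actions` is hypothetical, and this is not necessarily how any backlink tool works.

```python
import re
from typing import List

# Matches the target of a PDF URI action, e.g. "/URI (http://www.example.com/)".
# The group allows backslash-escaped characters inside the parentheses.
URI_ACTION = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")


def find_uri_actions(pdf_bytes: bytes) -> List[str]:
    """Return link targets found in uncompressed /URI actions.

    Assumption: the link objects are stored uncompressed. Links inside
    compressed object streams will not be found by this raw byte scan.
    """
    return [m.group(1).decode("latin-1") for m in URI_ACTION.finditer(pdf_bytes)]
```

To try it on a downloaded file, read it in binary mode, for example `find_uri_actions(open("report.pdf", "rb").read())`, where `report.pdf` is a placeholder name. An empty result only means no uncompressed link was found, not that the PDF has no links.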
8:08 am on Sept 14, 2009 (gmt 0)

Junior Member

5+ Year Member

joined:Feb 7, 2009
votes: 0

I do not know how your competitors are getting these links, but I know for sure that such links work. Try to get some links from internal pages of sites with high domain PR and you will see the results.
9:14 am on Sept 14, 2009 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
votes: 122

Crosscheck your tool with [search.yahoo.com...]

In the following search, replace example.com with the domain name in question.

linkdomain:example.com site:.gov

Does it show links from those PDF files?
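
If there are several competitor domains to check, the query above can be put together with a small helper. This is only a sketch in Python: the function name `build_linkdomain_query` is hypothetical, and it just builds the query text to paste into the search box; it does not contact any search engine.

```python
def build_linkdomain_query(domain: str, site_filter: str = ".gov") -> str:
    """Build a 'linkdomain:<domain> site:<filter>' query string.

    Accepts a bare domain or a full URL and reduces it to the host part.
    """
    domain = domain.strip().lower()
    # Drop the scheme if a full URL was passed in.
    for prefix in ("http://", "https://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    # Drop any path after the host name.
    domain = domain.split("/", 1)[0]
    return f"linkdomain:{domain} site:{site_filter}"
```

For example, `build_linkdomain_query("example.com")` gives the same query as shown above, and passing `site_filter=".edu"` would restrict the results to .edu sites instead.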