
Forum Moderators: phranque


Links from PDFs on high quality sites like NASA.gov

How do my competitors do that?



7:05 pm on Sep 6, 2009 (gmt 0)

5+ Year Member

I am using a tool to analyze my competitors' backlinks, and I was quite surprised to find that they have links from NASA.gov and other trustworthy websites with a high PR.

I would like to understand how they do it. According to my analysis tool, all of these high quality links come from PDF files. But when I open the PDFs I cannot find the link in question, and it would not make sense for my competitor to have a link in these PDFs anyway.

Obviously something sneaky is going on. But what? Does anyone know anything about this?



7:48 am on Sep 10, 2009 (gmt 0)

5+ Year Member

It is difficult to judge whether this is a sneaky tactic without seeing the actual URL.

However, since late 2008 Google has used a technology called optical character recognition (OCR) to extract text from PDFs.

Basically, it takes snapshots of the PDF pages as input, runs optical character recognition on them, and indexes the result just like regular text.

If it can see the text, might it be seeing the links too?

If you want the geeky details about OCRopus, the open source OCR software that Google sponsors, refer to: [code.google.com...]

(If you have Acrobat Pro 9, you can see the option under Document => OCR Text Recognition => Recognize Text Using OCR.)



3:10 am on Sep 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

If your tool is showing you something that you can't verify by manual inspection, especially something that doesn't seem logical or rational...

Well, I would think about the value of that tool for a minute, and just perhaps ... I'd ask the tool maker what was up with that before I scrapped it.


6:56 am on Sep 14, 2009 (gmt 0)

5+ Year Member

Hi guys, thanks for taking the time to give your thoughts on the subject. I think you're both on to something. I can see that all the PDFs actually contain my competitor's brand name (he has a generic name; let's say it is "stool"). So OCR technology at least recognizes the keyword "stool" as important for the PDF. However, I don't know how the connection to my competitor's website is made (there are no visible links). Perhaps, as Claus says, it is an error, but I don't believe that, as my competitor works in the field of SEO. There must be more to it, but what?
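
One way to check for links that are not visible on the page is to look inside the PDF file itself for URI actions, which is how PDFs usually store clickable links. The following is only a rough sketch, not a full PDF parser: the function name find_pdf_uris is made up for illustration, and it assumes the links are stored either in plain text or in ordinary Flate-compressed streams. Encrypted files or other encodings would be missed, so an empty result does not prove there are no links.

```python
import re
import zlib

# A URI action in raw PDF syntax looks like: /URI (http://example.com/)
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")
# Body of a stream object; many PDFs keep their objects in compressed streams.
STREAM_PATTERN = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_pdf_uris(data: bytes) -> list[str]:
    """Return the link targets found in raw PDF bytes (hypothetical helper)."""
    chunks = [data]
    for match in STREAM_PATTERN.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes left after the data.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not a Flate stream (image data, other filters): skip it.
            continue

    found: list[str] = []
    for chunk in chunks:
        for uri_match in URI_PATTERN.finditer(chunk):
            uri = uri_match.group(1).decode("latin-1")
            if uri not in found:
                found.append(uri)
    return found
```

To try it, read a downloaded file in binary mode and pass the bytes in, for example find_pdf_uris(open("report.pdf", "rb").read()), where report.pdf is a placeholder name. If the competitor's domain never shows up, the tool's report deserves the same suspicion raised earlier in the thread.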


8:08 am on Sep 14, 2009 (gmt 0)

5+ Year Member

I do not know how your competitors are getting links from these PDFs, but I know for sure that such links work. Try to get some links from internal pages of domains with a high PR and you will see the results.


9:14 am on Sep 14, 2009 (gmt 0)

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Crosscheck your tool with [search.yahoo.com...]

In the following search, replace example.com with the domain name in question.

linkdomain:example.com site:.gov

Does it show links from those PDF files?

