Home / Forums Index / Marketing and Biz Dev / SEM Research Topics
Forum Library, Charter, Moderators: phranque

SEM Research Topics Forum

    
Links from PDFs on high-quality sites like NASA.gov
How do my competitors do that?
webjuice

5+ Year Member



 
Msg#: 3985181 posted 7:05 pm on Sep 6, 2009 (gmt 0)

I am using a tool to analyze which backlinks my competitors have, and I was quite surprised to find that they have links from places like NASA.gov and other quite trustworthy websites with a high PageRank (PR).

The thing is, I would like to understand how they do it. All these high-quality links are from PDF files, or so my analysis tool tells me. But when I visit the PDFs I cannot find the specific link. It also would not make sense for these PDFs to link to my competitor.

Obviously something sneaky is going on. But what? Anyone know anything about this?

/webjuice

 

abilitydesigns

5+ Year Member



 
Msg#: 3985181 posted 7:48 am on Sep 10, 2009 (gmt 0)

It is difficult to judge your claim of a sneaky tactic without seeing the actual URL.

But Google started using a technology called optical character recognition (OCR) to extract text from PDFs from late 2008 onwards.

Basically, it takes snapshots of the PDF pages as input, runs optical character recognition on them, and indexes the resulting text just like regular text.

If it can see the text, wouldn't it be seeing the links too?
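To make that question concrete, here is a minimal sketch of how a URL written out as plain text could be picked up once a page has been turned into text. It assumes the OCR step has already happened; the function name `find_urls_in_text` and the sample page are made up for illustration, and nothing here is meant to describe how Google actually handles this.

```python
import re

# Matches http(s) URLs and bare "www." hostnames in plain text.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>()\"']+", re.IGNORECASE)


def find_urls_in_text(text):
    """Return URL-like strings found in plain text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        urls.append(match.group(0).rstrip(".,;:!?"))
    return urls


# Hypothetical OCR output for one page of a PDF.
sample_page = (
    "Seating study. Supplier: Stool Ltd, see www.example.com. "
    "Contact http://example.org/info for data."
)
print(find_urls_in_text(sample_page))
```

A page that only mentions a brand name, with no URL written out, gives an empty list here, so text recognition alone would not explain a reported link.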

If you want the geek details about OCRopus, the open source OCR software that Google sponsors,
refer to: [code.google.com...]

(If you have Acrobat Pro 9, you can see the option under Documents => OCR Text Recognition => Recognize Text using OCR)

-AD

claus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3985181 posted 3:10 am on Sep 14, 2009 (gmt 0)

If your tool is showing you something that you can't verify by manual inspection, especially something that doesn't seem logical or rational...

Well, I would think about the value of that tool for a minute, and just perhaps ... I'd ask the tool maker what was up with that before I scrapped it.
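One way to take the manual inspection a step further is to look inside the PDF file itself, since a clickable link does not have to be visible as text on the page. The sketch below is a rough check and not a full PDF parser: it assumes the link targets are stored as uncompressed literal `/URI (...)` strings, so it will miss links held in compressed object streams or written as hex strings. The function names and the sample fragment are made up for illustration.

```python
import re

# In a PDF, a clickable link is an annotation whose action carries a /URI
# entry, e.g. "/URI (http://example.com/)". This pattern only handles that
# literal-string form, with no escaped parentheses inside the URL.
URI_PATTERN = re.compile(rb"/URI\s*\(([^)]*)\)")


def find_link_targets(pdf_bytes):
    """Return link targets stored as literal /URI strings in raw PDF bytes."""
    return [m.group(1).decode("latin-1") for m in URI_PATTERN.finditer(pdf_bytes)]


def find_link_targets_in_file(path):
    """Read a PDF from disk and scan its raw bytes for /URI link targets."""
    with open(path, "rb") as handle:
        return find_link_targets(handle.read())


# Hypothetical fragment of an uncompressed PDF link annotation.
sample = b"<< /Type /Annot /Subtype /Link /A << /S /URI /URI (http://example.com/) >> >>"
print(find_link_targets(sample))
```

If this finds nothing in a downloaded copy of the PDF, that is not proof there is no link, but it is one more data point to bring to the tool maker.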

webjuice

5+ Year Member



 
Msg#: 3985181 posted 6:56 am on Sep 14, 2009 (gmt 0)

Hi guys, thanks for taking the time to share your thoughts on the subject. I think you're both on to something. I can see that all the PDFs actually contain the brand name of my competitor (he has a generic name; let's say it was "stool"). So OCR technology at least recognizes the keyword "stool" as important for the PDF. However, I don't know how the connection to my competitor's website is made (there are no visible links). Perhaps, as claus says, it is an error in the tool, though I don't believe that, since my competitor works in the field of SEO. There must be more to it, but what?

stephen186

5+ Year Member



 
Msg#: 3985181 posted 8:08 am on Sep 14, 2009 (gmt 0)

I do not know how your competitors are getting these links, but I know for sure that such links do work. Try to get some links from internal pages of sites with a high domain PR and you will see the results.

martinibuster

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors of the Month



 
Msg#: 3985181 posted 9:14 am on Sep 14, 2009 (gmt 0)

Crosscheck your tool with [search.yahoo.com...]

In the following search, replace example.com with the domain name in question.

linkdomain:example.com site:.gov

Does it show links from those PDF files?
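If you have several competitor domains to check, a small helper can build the query text for each one. This is only a sketch: the function name `build_linkdomain_query` is made up, the operators are simply the ones from the search above, and the search address itself is left out because it is truncated in this post.

```python
from urllib.parse import quote_plus


def build_linkdomain_query(domain, site_filter=".gov"):
    """Build the query text 'linkdomain:<domain> site:<site_filter>'."""
    domain = domain.strip().lower()
    # Drop the scheme and any path if a full URL was pasted in.
    for prefix in ("http://", "https://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    domain = domain.split("/", 1)[0]
    return f"linkdomain:{domain} site:{site_filter}"


query = build_linkdomain_query("http://Example.com/page")
print(query)              # linkdomain:example.com site:.gov
print(quote_plus(query))  # linkdomain%3Aexample.com+site%3A.gov
```

The second line of output is the same query in URL-encoded form, in case you want to paste it into a query string by hand.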


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved