Forum Moderators: Robert Charlton & goodroi


Rogue outbound links: how to find them?

Site broken into. I suspect planted links. How to find them?


1script

4:10 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Picture a situation: a VERY old, VERY big site with an almost 10-year history of CGI programming, all of it still represented and in use. Many scripts were (and some may still be) susceptible to XSS attacks, and a couple of rogue outbound links have been found (by chance) in the past. The site was also broken into a couple of times in its history, and bad links might have been left behind. The site apparently lost the notorious G Trust despite all its history, its links from authority sites and such; PR dropped to zero on the last update. I suspect I was not able to find all the bad links.

What would be the best way to search for those bad outbound links on your site?

tedster

5:37 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can use MSN's Live Search to build a list of outbound links from your domain - they support a LinkFromDomain:example.com operator.

Then you would evaluate them one by one to find bad neighborhoods. Only a hand check of the unbroken outbound links will do this, because you've got to evaluate the quality of the neighborhood you are linking out to.

TheSeoDude

5:46 pm on Aug 29, 2007 (gmt 0)



Is your site dynamic? PHP? If so, you could check links easily by using output buffering and parsing the buffered output for links.

For an easier approach:
If not, check your databases (table by table, field by field) with a regexp and extract all links.
Or just write a script and parse all the static HTML files on disk for links.

It's kind of easy ... for a coder.

Good luck!

buckworks

6:01 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You can also get useful reports from a number of tools that check for broken links. Check out any URLs you don't recognize, and take special note of links that are redirecting.

1script

1:36 am on Aug 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@buckworks:

I guess in order to be a rogue link, the link must point to a live page, so dead-link checkers would not help at all.

@TheSEODude:
I am a coder and I could write something like that, but the problem is: in the case of an XSS attack, the rogue link does not exist on your site! It is only created for Googlebot, and it only exists when a specially crafted URI is fed into the script from ANOTHER SITE. That makes any reporting Google provides so much more valuable than any testing tool I could write, because I need to know how Google (or Y! or MSN, for that matter) PERCEIVES my site, not what I (and other "normal" visitors) think it contains.

Trouble is: they don't provide any reporting for this particular site. In their view the site is clearly abusing something (only G knows what): it's been stripped of its former PR5, and Webmaster Central says Googlebot last visited my homepage on Jan 01, 2007, even though I see its hits every other day.

I wish MSN could be used as tedster suggests, but their reach is laughable: they have not managed to index more than 300 pages of this 300,000-page site in all these years.

Bottom line: not knowing exactly why G banned the site makes me really paranoid, and makes me look for, and blame, things that are out of my control. A couple of confirmed cases of XSS links (long since fixed) just pours more oil on the flames, so to speak.

buckworks

3:36 am on Aug 30, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



1script, you missed my point; sorry if I wasn't clear enough.

Some link checkers generate reports that show a lot more than just the broken links, and that's what you're interested in. Those reports can help you spot links that shouldn't be there.

jd01

4:34 am on Aug 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's a big job, but I would consider changing all my links to canonical form if they are not already, running Xenu, and sorting all the links alphabetically. That should let you check your outbounds (relatively) easily without worrying about the internals, because the internal links will all be grouped together and can be skipped when checking.

Justin
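Justin's sorting trick can be sketched as follows. This is a hypothetical Python example, assuming you have exported the crawler's link list to plain text (one URL per line) and that every canonical internal URL contains your hostname; `own_host` is a placeholder:

```python
def split_links(urls, own_host):
    """Sort URLs so internal links group into one block, then split
    them out, leaving only the outbound links to hand-check."""
    ordered = sorted(urls)
    internal = [u for u in ordered if own_host in u]
    external = [u for u in ordered if own_host not in u]
    return internal, external
```

The alphabetical sort is what makes the manual pass tolerable: every internal URL lands in one contiguous run you can scroll past, while the external block is short enough to review link by link.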

TheSeoDude

6:48 am on Aug 30, 2007 (gmt 0)



1script, then write a small PHP & cURL robot, give it a Googlebot user agent, and parse your own site with it.
And regexp all [*...] out of it. Then add them to a DB and use SQL queries to keep playing with them. Also remember where each link was found.
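A minimal sketch of that robot, in Python rather than PHP (the user-agent string and URLs are illustrative; a real run would need throttling and a page queue). The point is that fetching with a Googlebot user agent can surface cloaked XSS-injected links that a normal browser never sees, and recording which page each link came from makes cleanup possible:

```python
import re
import sqlite3
import urllib.request

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")
LINK_RE = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)

def fetch_as_googlebot(url):
    """Fetch a page presenting a Googlebot user agent, so any links
    served only to crawlers are included in the response."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    return urllib.request.urlopen(req).read().decode(errors="ignore")

def store_links(db, page_url, html):
    """Record every href found in html, remembering the page it came
    from, so the results can be queried with plain SQL afterwards."""
    db.execute("CREATE TABLE IF NOT EXISTS links (page TEXT, href TEXT)")
    for href in LINK_RE.findall(html):
        db.execute("INSERT INTO links VALUES (?, ?)", (page_url, href))
    db.commit()
```

With the links in SQLite you can, for instance, `SELECT DISTINCT href FROM links WHERE href NOT LIKE '%yourdomain%'` to list only the outbound candidates.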

If you can't use any of the methods mentioned in this thread, you should not have asked the question, unless you just needed reassuring words from us like:

I'm sure your site is clean and pretty!
I'm sure googlebot / others won't mind the extra links!
I'm sure attackers could not break in!
I'm sure hacking links into your site is not possible!

Good luck.

dublinmike

7:08 am on Aug 30, 2007 (gmt 0)

10+ Year Member



Anyone using osTicket for support can have thousands of rogue outbound links. Make sure attachments are turned off on the ticket submission form, as it's easy to submit a PHP file containing a million links you would never see.