

PLEASE HELP>>> Understanding anchor link syntax

a href link syntax

     
9:35 am on Oct 22, 2009 (gmt 0)

Junior Member

5+ Year Member

joined:Oct 20, 2009
posts:70
votes: 0


I've been working on a crawl script that pulls all links on a page. I can successfully obtain the link value with my script, but to properly format the results I need to better understand anchor (href) syntax.

What I've concluded so far:
1. http or https -> direct link to another page (GOOD)
2. mailto:, javascript:, # -> browser interactivity (IGNORE)
3. / -> site base (GOOD)

The issues arise when my crawler is on a page such as:
http://www.example.com/folder/thisone.html

and the link result is:
search.php?some=value

I think a refresher on how browsers turn links into their full target addresses would very much help me. PLEASE, ANYONE: I've been working on this for 2 days now and can't find anything through Google on anchor or link syntax and structure.

1:12 pm on Oct 22, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member swa66 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 7, 2003
posts:4783
votes: 0


If you want the real knowledge of how a URL needs to be parsed:
ftp://ftp.rfc-editor.org/in-notes/rfc3986.txt
ftp://ftp.rfc-editor.org/in-notes/rfc1738.txt

Your point 1 is wrong in the sense that the link must start with http:// or https://

There are far more possibilities, and a parser that goes into a bot needs to understand them all.

Hence, fall back on the BNF syntax in the RFCs.
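
For the specific case in the question (a bare search.php?some=value found on a page inside /folder/), RFC 3986 resolves the link relative to the URL of the page it was found on. Here is a minimal sketch in Python, using the standard library's urllib.parse rather than a hand-written parser. The function name resolve_link and the set FOLLOWED_SCHEMES are made up for illustration, and the page URL is assumed to be http://www.example.com/folder/thisone.html.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: the crawler only follows these schemes; anything else
# (mailto:, javascript:, ftp:, ...) is skipped.
FOLLOWED_SCHEMES = {"http", "https"}


def resolve_link(page_url, href):
    """Return the absolute URL for href as found on page_url, or None to ignore it."""
    href = href.strip()
    if not href or href.startswith("#"):
        return None  # empty link or same-page fragment
    # urljoin resolves a relative reference against the page URL.
    absolute = urljoin(page_url, href)
    if urlsplit(absolute).scheme not in FOLLOWED_SCHEMES:
        return None
    return absolute


page = "http://www.example.com/folder/thisone.html"
samples = [
    "search.php?some=value",       # relative to the current folder
    "/search.php",                 # relative to the site root
    "../other.html",               # one folder up
    "//cdn.example.com/a.js",      # keeps the page's scheme
    "mailto:someone@example.com",  # ignored
    "#top",                        # ignored
]
for href in samples:
    print(href, "->", resolve_link(page, href))
```

With this, search.php?some=value on the page above comes out as http://www.example.com/folder/search.php?some=value, which is what a browser would request. Whether to keep or drop the #fragment part of resolved links is a separate choice the sketch does not make.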

 
