Not sure I understand your question. You don't want your site's links to its own pages to be structured 'http://link.html'; you want them to be structured simply as 'link.html'?
The links should be seen on an SE as 'http://www.mysite.com/link.html', but they are appearing as only 'link.html', because that's the way I've linked them within my site under the same domain...
I am a bit new to SEO, and was wondering the same thing. If a link is written "page1.html" as opposed to "http://www.mydomain.com/page1.html", are spiders still able to follow the link successfully?
Thanks in advance for any help.
Sure. Any relatively modern spider is going to understand relative links.
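For anyone who wants to see what "understanding relative links" amounts to, here is a minimal sketch using Python's standard urljoin. It only illustrates the general resolution rule, not how any particular search engine's spider is written; the page URL is an assumption built from the example domain in the question above.

```python
from urllib.parse import urljoin

# Assumed URL of the page that contains the link (example domain from the question).
page_url = "http://www.mydomain.com/index.html"

def resolve(href):
    """Resolve an href found on page_url the way a standards-following client would."""
    return urljoin(page_url, href)

# A relative link and the equivalent absolute link end up at the same address.
print(resolve("page1.html"))                           # http://www.mydomain.com/page1.html
print(resolve("http://www.mydomain.com/page1.html"))   # http://www.mydomain.com/page1.html
```

Either way of writing the link gives the spider the same target, provided it resolves the href against the URL of the page it was found on.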
and great site btw.
I saw this as well today. I was looking at a site with the SimSpider, and they have just 1 link from the index.html. The tag looks like this:
All this lives in a table <td>, and I have picked up a dangling <p> tag inside the <td> that the link is in. Could this cause problems with SimSpider's ability to correctly parse the link?
If you hover over, or use the link, it's fine. SimSpider reports it as pointing to "http://navigation.html" though. I also note with interest that the site in question now has a PR of 0.
Any thoughts anyone?
I'm going out on a limb here, and may haze things up a degree or two.
In my experience, it is easier to create sites with relative links and let the SEs figure it out.
Does that yield the best results? Not necessarily. I am with Brett in believing that the SEs' spiders, bots and crawlers can understand where the link goes without any issues.
However, looking at engines like Google, which index AND cache pages for their users, perhaps they prefer absolute links over relative links, as it makes the archiving of pages easier.
Again, just a theory in the most undeveloped sense.
Having had a lot more experience since my original post in June, I don't think it matters. Relative links are the same as absolute links in a spider's eyes. Spider technology is quite advanced...
True, I'm just curious as to why the SimSpider isn't reading a perfectly good (at least it LOOKS perfectly good) relative link on this other site correctly. It insists it is being pointed to "http://navigation.html"
I just find it odd that this site ALSO has a PR of 0, AND the effect of the fault would be to make virtually the entire site unreachable, if an SE spider behaves the same way as the SimSpider.
<a little later>
Ding, got it!
Spidering www.domain.co.uk gets the link wrong
Spidering www.domain.co.uk/ gets the link right
I bet SimSpider constructs relative paths from the entered URL (not unreasonably), by just tagging the linked filename onto the end of the base URL without checking for a terminal /. Also, entering .../index.html gets it right, so I bet it's smart enough to swap the filename correctly, because it'll have a / there.
So, my fault for being sloppy <grin> Who'd have guessed that? ;) It may have been a total waste of 1/2 an hour, but it makes me happy, y'know?
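The guess above can be reproduced with a small sketch. naive_join below is a hypothetical stand-in for the suspected behaviour (keep everything up to the last "/" and append the filename); it is not SimSpider's actual code, and it assumes the spider prepends "http://" to whatever is entered. Python's standard urljoin is shown next to it for comparison.

```python
from urllib.parse import urljoin

def naive_join(base, href):
    """Hypothetical buggy resolver: cut the base after its last '/' and append href.

    With no trailing slash on a bare domain, the last '/' is the one in 'http://',
    so the hostname itself gets thrown away.
    """
    return base[: base.rfind("/") + 1] + href

href = "navigation.html"

print(naive_join("http://www.domain.co.uk", href))             # http://navigation.html
print(naive_join("http://www.domain.co.uk/", href))            # http://www.domain.co.uk/navigation.html
print(naive_join("http://www.domain.co.uk/index.html", href))  # http://www.domain.co.uk/navigation.html

# urljoin treats a bare domain as having an empty path and gets all three right.
print(urljoin("http://www.domain.co.uk", href))                # http://www.domain.co.uk/navigation.html
```

All three results from the sketch line up with what was observed: wrong without the trailing slash, right with it, and right when index.html is entered. That is consistent with the theory, though it does not prove this is what the spider does internally.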
Exactly. I'm using a stock module to extract links. It works on whatever it is fed. I looked at fixing it, but the exceptions were many (base hrefs...etc). I preferred to leave it stock.
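To give a feel for the kind of exception mentioned here, below is a rough sketch of a link extractor that honours a base href and resolves against the page URL with urljoin. It is an illustration only, using Python's standard library; LinkExtractor and extract_links are made-up names, and this is not the stock module SimSpider uses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href>, honouring a <base href> if one appears."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url   # starting point for resolving relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            # A <base href> replaces the page URL as the resolution base.
            self.base = urljoin(self.base, attrs["href"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base, attrs["href"]))

def extract_links(page_url, html):
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.links

# No trailing slash on the entered URL, plus a dangling <p> inside the <td>.
html = '<table><tr><td><p><a href="navigation.html">Nav</a></td></tr></table>'
print(extract_links("http://www.domain.co.uk", html))
# ['http://www.domain.co.uk/navigation.html']
```

Even this small version has to special-case the base tag, which gives some idea of why leaving a stock module alone can be the pragmatic choice.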