More spider and relative link problems

Spiders ignoring ../../

         

johneagle

6:21 pm on May 5, 2001 (gmt 0)



I have been getting some 404 errors generated by a few search engines, including "wanadoo.fr".
I use Frontpage 2000 and it generates all the relative links OK. No browser problems. (checked with linkbot)
The search engine does the following:
Read index page..
Generate all links to next level and read..
Generate from last pages links to next level..
BUT in the process it generates incorrect relative links. i.e.
the page has html: <a href="../../index.shtml">
The generated link should be:
[mydomain.com...]
the search engine gets:
[mydomain.com...]
The page where the link is:
[mydomain.com...]

Any ideas about what the problem is?
I really don't want to put absolute links everywhere.
Thanks

paynt

6:43 pm on May 5, 2001 (gmt 0)



Hello johneagle,
In my experience that's Frontpage. I have used Frontpage 2000 in the past for editing and experienced this. I really suggest switching to Dreamweaver and saving yourself a lot of hassle. No offense to FrontPage or those of you using it to create your pages, but the code is not clean and there are often problems like this. You'll probably have to do a mass search and replace to be sure this error is fixed and always watch it in the future if you continue to use FrontPage.

johneagle

3:41 am on May 6, 2001 (gmt 0)



Thanks paynt for your reply but I must be dense as I cannot understand your answer. As I said I think Frontpage has actually done the correct html. The problem is, I think, either that I do not understand relative links or the spiders do not. If anybody can direct me as to the solution I would very much appreciate it. I do hope that I do not have to change all my relative links to absolute links.

Marcia

2:15 am on May 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



johneagle, welcome to WebmasterWorld.

The way Front Page does certain kinds of code is quite often "different" from the accepted standard, and therefore creates difficulties.

To be honest, most often I'll use absolute URLs, but not always. I have never, ever had a problem with standard coded relative URLs.

I'd suggest running your pages through an HTML validator. If it's not working, I'm afraid there's no choice but to make some changes. There is software available to do global search and replace, but there is always difficulty with software such as Front Page that uses proprietary features. It's OK to stay with the software as long as you avoid its features that differ from accepted, commonly used HTML standards.

Here's an interesting article on relative vs. absolute addressing:

[searchengineworld.com...]

Here's the HTML validator:

[searchengineworld.com...]

Woz

2:23 am on May 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>The way Front Page does certain kinds of code is quite often "different" from the accepted standard.

Sorry Marcia but that statement doesn't hold water in my experience. I would be interested in seeing some examples to the contrary.

I refer to johneagle's original statement: "I use Frontpage 2000 and it generates all the relative links OK. No browser problems. (checked with linkbot)"

John, exactly which engines are causing the problems? Perhaps that will help us identify the cause.

Onya
Woz

Marcia

2:31 am on May 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Woz, I've only ever worked with one site done with FP, which could not get spidered - which is why the gentleman requested assistance.

The only site navigation was some kind of FP-generated navigation bar across the top. The site owner replaced it with regular HTML links, which looked the same when done, and added regular text links to the page bottoms, in addition.

The site then got crawled, and got the rankings he wanted (all still there). There's no example to show, however, since it was changed.

Woz

3:48 am on May 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, then that goes to my many previous statements that FP produces clean code as long as you stay away from the bots. However, from what John is saying, I believe he is not using the bots (John, please correct me if I am wrong), as he has already stated that the site checks out OK in a link checker.

Therefore, the problem seems to be a more subtle one, rather than an FP problem simply because he uses FP.

Let's await John's reply for more information.

paynt > You'll probably have to do a mass search and replace to be sure this error is fixed...

paynt, can you give us an example of what you think the problem is please?

Onya
Woz

johneagle

2:06 pm on May 7, 2001 (gmt 0)



Hey, Thanks for all the help going on here. I really appreciate it.
I will try to answer some of the questions:
Marcia: I read the links you gave me and tried the Search Engine World code checker. Although it came up with several errors, none were related to the relative links.
Woz: I should explain that I am tracking down 404 errors. The spiders that I am having trouble with are "analysis.he.net" and "wanadoo.fr". The main spiders do not seem to have a problem. I have only figured out in the last couple of days what some of the errors are, by looking at the log files. (I have been using WebTrends to tell me, and it does not show most of the errors.)
Yes. I have checked the links using Frontpage and Linkbot. I do not use bots. I am not using the Navigation links that FP provides.
I cannot give any more info, as I have recently moved my server and built a new web site (with the relative links), so information is kind of sketchy. Sorry.
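
For anyone tracking down spider 404s the same way, here is a minimal Python sketch of pulling them out of a raw access log. It assumes the server writes Common Log Format; the host name and paths in the sample lines are hypothetical and are not taken from John's logs.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

def count_404s(lines):
    """Count how often each (host, path) pair produced a 404."""
    hits = Counter()
    for line in lines:
        m = CLF.match(line)
        if m and m.group(4) == "404":
            hits[(m.group(1), m.group(3))] += 1
    return hits

# Hypothetical sample lines, for illustration only.
sample = [
    'spider.example.net - - [05/May/2001:18:21:00 +0000] "GET /missing/index.shtml HTTP/1.0" 404 210',
    'spider.example.net - - [05/May/2001:18:21:05 +0000] "GET /index.shtml HTTP/1.0" 200 5120',
    'spider.example.net - - [05/May/2001:18:22:00 +0000] "GET /missing/index.shtml HTTP/1.0" 404 210',
]
result = count_404s(sample)
```

Grouping by requesting host makes it easier to see whether the bad paths come from one or two spiders or from everyone.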

I am using FP as it is the easiest to quickly get code up. I use Dreamweaver 3 to validate and check some of the code.

I guess the big question is this: as a new (self-taught) webmaster, it seemed easier to use relative links, especially as I am working on a family of web sites that share some data. Should I really be doing this? Most of the sites I have looked at do not seem to use many, if any, relative links.

paynt

2:21 pm on May 7, 2001 (gmt 0)



woz,
Maybe I didn't understand his problem correctly, but in my experience FrontPage has changed my links, if I had not set them as absolutes initially, to this .../.../ and then problems came up from that. I also saw this [mydomain.com...] problem, where it put those periods in, and I've had to fix that.

For me, I like clean code and straight directions. I've never had problems like this with Dreamweaver, and when I used FrontPage previously I would then open the page in Arachnophilia and clean up the code before uploading the page. I just learned not to trust that FrontPage wouldn't insist on screwing up my code.

Marcia's comment
>>…The way Front Page does certain kinds of code is quite often "different" from the accepted standard…>>
does ring true in my experience. I know this wasn't a FrontPage-specific question, but with the engines' focus on clean code I feel it's a very important issue to point out to folks who may not know that FrontPage could be causing them problems. I just spent a very hectic month removing FrontPage Publishing from a client's site. Oh my goodness, you cannot possibly realize what a mess FrontPage creates when you publish with it.

Anyway, I may not have captured the true essence of the original question. I do appreciate the reference to the article you pointed out Marcia, Relative Addressing vs Absolute. That does appear to answer questions regarding addressing.

johneagle

2:36 pm on May 7, 2001 (gmt 0)



paynt: I am sorry, but I was not clear in my original post.
The links that I was showing were in my error log. They were generated by the spider after seeing the correct html for the relative link.

i.e. <a href="../../index.shtml"> is MY code in page [mydomain.com...]
If you click it in a browser, it generates [mydomain.com...]

the spider gets: [mydomain.com...]

I am sorry I misled you.

John
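
As a rough cross-check of what a standards-following client should request, the resolution can be reproduced with Python's standard `urllib.parse.urljoin`. This is only a sketch: the thread elides the real URLs, so the `example.com` paths below are hypothetical stand-ins, and `resolve_link` is an illustrative helper name.

```python
from urllib.parse import urljoin

def resolve_link(page_url, href):
    """Resolve an href against the URL of the page that contains it."""
    return urljoin(page_url, href)

# Hypothetical page two directories below the site root.
page = "http://example.com/products/widgets/page.shtml"

# Each "../" climbs one directory from the page's own directory.
resolved = resolve_link(page, "../../index.shtml")
one_up = resolve_link(page, "../index.shtml")
```

Here `resolved` is `http://example.com/index.shtml` and `one_up` is `http://example.com/products/index.shtml`. Comparing values like these against the paths in the 404 log entries shows how far the spider's request differs from the correct one.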

Marcia

1:15 am on May 8, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"../../index.shtml

I have had a problem with this. It would not work this way; I had to use ../filename.ext

From a directory called /Graphics/

This code works for table background on /Graphics/index.html:
background="../images/bg_blue1.gif"

All of the links are absolute except this one, which also works:
<A HREF="Tiles/FabricBackgrounds">
So that leads to www.domain.com/Graphics/Tiles/FabricBackgrounds/index.html
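
The two relative references above can be checked the same way with Python's `urllib.parse.urljoin`. A small sketch, assuming the site is served over plain `http://` (the scheme is not stated in the post):

```python
from urllib.parse import urljoin

# The page that contains both references; the http scheme is an assumption.
base = "http://www.domain.com/Graphics/index.html"

# "../" steps out of /Graphics/ to the site root before entering /images/.
background = urljoin(base, "../images/bg_blue1.gif")

# A reference with no leading "../" stays inside /Graphics/.
tiles = urljoin(base, "Tiles/FabricBackgrounds")
```

`background` comes out as `http://www.domain.com/images/bg_blue1.gif` and `tiles` as `http://www.domain.com/Graphics/Tiles/FabricBackgrounds`. The server then supplies the directory's index page for the second one.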

Wow - how could I have missed noticing shtml, not html! John, are you using server side includes?

johneagle

1:36 pm on May 8, 2001 (gmt 0)



Yes, I am using SSI. Is that wrong too? It's probably too late to change, as we are listed on the search engines.

Marcia

3:31 pm on May 8, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



John, it's a tremendous convenience and time saver. The only thing I can suggest with ../../ is to try using ../dir/filename.ext in a couple of different ways and see which works.