

HTML Forum

repeated 'page not found' results

 2:22 pm on Sep 20, 2011 (gmt 0)

I have a message board installed on my site (like the one here at WebmasterWorld) and repeatedly find 404 ("Page not Found") error codes in my server stats. The URLs always follow the same pattern:


The correct URL should be: www.mysite.com/board/page.php

I've checked my pages over and over for links to "/board/board/..." but never find any. My internal links (from pages within the message board) are coded like so: <a href="/board/page.php">link</a>

The site works fine when I navigate, and I've never received any email complaints, so I'm wondering if this is from some bot that spits out bad URLs for whatever reason.

Any ideas? (PS - My site redirects these errors to the correct page, but I'd still like to eliminate the repeated error if possible.)



 4:18 pm on Sep 20, 2011 (gmt 0)

It's got to be a relative link somewhere?


 4:27 pm on Sep 20, 2011 (gmt 0)

See if WMT lists any links pointing to those duff URLs.

Run Xenu LinkSleuth over the site to see if it finds anything.

Who requests the duff URLs? Look in your logs.


 7:29 pm on Sep 20, 2011 (gmt 0)

rocknbil: are you asking me or telling me? If you're asking, I'd say yes, since by now I'm sure I have no absolute URLs like that.

g1smd: I can't find anywhere in WMT that gives that info; can you be more specific?

I've checked the links with other diagnostic tools -- no problems.

My host doesn't post those details in my server stats, so I have no way to know. But it seems from the sheer number of them that it must be from an automated source, either a search bot or spammer.


 9:29 pm on Sep 20, 2011 (gmt 0)

I think I know what it is -- since my relative links start with "/board/..." some bots(?) are trying to add that on to the directory URL of the current page. So if they're on:


and read a relative link, they strip off the /page1.php (leaving "mysite.com/board"), then add my relative link beginning with "/board/..."

I thought browsers and bots were supposed to go back to the root directory for links beginning with "/", but maybe some don't.

I'll take out the "/board/" in my link code and see what happens.
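For anyone following along, here is a rough Python sketch of the difference between standard resolution and the kind of mistake guessed at above. The URLs are hypothetical, modelled on the examples in this thread, and resolve_like_broken_bot is only a guess at what a faulty crawler might do, not a description of any real bot.

```python
from urllib.parse import urljoin

# Hypothetical URLs, modelled on the examples in this thread.
BASE = "http://www.mysite.com/board/page1.php"
LINK = "/board/page.php"


def resolve_correctly(base: str, link: str) -> str:
    """Standard resolution: a link starting with "/" replaces the
    whole path of the base URL, so it counts from the site root."""
    return urljoin(base, link)


def resolve_like_broken_bot(base: str, link: str) -> str:
    """Assumed bug: strip the file name from the base URL, then glue
    the link onto the remaining directory without checking for "/"."""
    directory = base.rsplit("/", 1)[0]
    return directory + link


print(resolve_correctly(BASE, LINK))
print(resolve_like_broken_bot(BASE, LINK))
```

The first function gives the intended address; the second doubles the directory, which matches the pattern showing up in the error reports.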


 9:40 pm on Sep 20, 2011 (gmt 0)

You should link by using a URL that begins with a slash and counts from the root.

The place to look for these errors in WMT is in the "crawl errors" section. You're relying on Google having seen those URLs, and having recorded the URL of the page that supposedly pointed at that duff URL.

You also need the raw log files for your site, not some noddy "stats" data. The logs are usually in a folder called logs, above the web root, when you look via FTP.
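Once you have the raw logs, something like the following Python sketch can answer the "who requests them" question. It assumes the common Apache "combined" log format, which may not be what your host writes, and the sample lines, IP addresses and bot name are made up for illustration.

```python
import re
from collections import Counter

# Assumed Apache "combined" format; adjust if the host logs differently.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Paths with a doubled directory, e.g. /board/board/ or /board//board/.
SUSPECT = re.compile(r"/board/+board/")


def bad_requests(lines):
    """Count requests for doubled-directory paths per (IP, referrer)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and SUSPECT.search(match.group("path")):
            counts[(match.group("ip"), match.group("referrer"))] += 1
    return counts


# Made-up sample lines using documentation-only IP ranges.
SAMPLE = [
    '203.0.113.7 - - [21/Sep/2011:15:02:11 +0000] '
    '"GET /board/board/page.php HTTP/1.1" 404 512 '
    '"http://www.mysite.com/board/board/index.php" "ExampleBot/1.0"',
    '198.51.100.4 - - [21/Sep/2011:15:02:12 +0000] '
    '"GET /board/page.php HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for (ip, referrer), hits in bad_requests(SAMPLE).most_common():
    print(hits, ip, referrer)
```

To run it against a real file, pass the open file object instead of SAMPLE, since the function only needs something it can iterate over line by line.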


 11:19 pm on Sep 20, 2011 (gmt 0)

"I'll take out the '/board/' in my link code and see what happens."

Don't do that. The links are correctly formed, so it's the other end's fault if they're not getting it right. Google does a lot of demented things, but it knows how to read links. It's only when they're getting the link from some third party that they fail.

If your host does not allow you to see your raw logs, change hosts.


 11:41 pm on Sep 20, 2011 (gmt 0)

Thanks, I'll check into all that...and leave my code as is.


 3:18 pm on Sep 21, 2011 (gmt 0)

I checked WMT - only 1 crawl error listed, and it wasn't the /board/board/ issue.

I also checked my logs. Almost all the bogus links are from foreign sources (crawlers?). Examples:

RIPE Database query service (3: Ukraine, UK, Poland)
Internet Service Provider in Hong Kong
China Unicom Jiangsu province network
LG DACOM Corporation (Korea)

And the referrer is invariably a page on my site with the same URL error. It's also not just "/board/board/": sometimes it's "/board//board/", sometimes it's "/board/board/board/".

Obviously it's not on my end; should I just ignore it?
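For reference, all three variants can be mapped back to the intended path with one pattern. This is only a Python sketch of the idea, not the redirect my site actually uses; the PREFIX value and the canonical_path name are made up for the example.

```python
import re

# Hypothetical directory name, taken from the example paths in this thread.
PREFIX = "board"

# Two or more "/board" segments in a row at the start of the path,
# allowing extra slashes between them, followed by at least one slash.
_REPEATED = re.compile(r"^(?:/+" + re.escape(PREFIX) + r"){2,}/+")


def canonical_path(path: str) -> str:
    """Collapse a doubled or tripled leading directory to a single one.
    Paths that are already correct are returned unchanged."""
    return _REPEATED.sub("/" + PREFIX + "/", path)


for bad in ("/board/board/page.php",
            "/board//board/page.php",
            "/board/board/board/page.php",
            "/board/page.php"):
    print(bad, "->", canonical_path(bad))
```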


 6:52 pm on Sep 21, 2011 (gmt 0)

How badly do you want to keep unwanted robots happy? :)

(That was a rhetorical question. I assume yours was too.)

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved