Forum Moderators: Ocean10000 & phranque

Search Engine Spider Indexing Urls incorrectly

Help needed on the way spiders are displaying my url

8:58 pm on Oct 1, 2005 (gmt 0)

New User

10+ Year Member

joined:Sept 3, 2005
votes: 0

I know there is a way to solve this automatically through .htaccess, because I was reading a post about it last night, but I had to go to sleep because it was 4:30 am. I have now searched for two hours and cannot find that post again.

I just moved my site from static HTML pages to a PHP-based CMS with a MySQL database. Everything is going well so far. Spiders can reach my site and follow the internal links, but they are omitting the domain name when indexing.
Instead of indexing: http://www.example.com/directory/article1
they index it as: http://directory/article1
How can I resolve this with a general rule, without having to create a RewriteRule for each piece of content?

[edited by: jdMorgan at 1:19 am (utc) on Oct. 2, 2005]
[edit reason] Example.com, de-linked. [/edit]

1:19 am on Oct 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
votes: 0

You're going to have to go over your site with a fine-toothed comb and figure out why your CMS is publishing malformed URLs like that. Once you fix the root problem, things should settle down by themselves.

If you cannot 'see' the malformed URLs using your regular browser settings, then try spoofing a search engine robot user-agent with your browser or with something like WannaBrowser.
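As a rough illustration of that check, here is a minimal Python sketch that fetches a page while sending a spider-like User-Agent and lists any links that resolve to a host other than your own. The function names, the expected host, and the sample markup are all hypothetical; what your CMS actually emits may differ, so treat this as a starting point rather than a diagnosis.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen

# User-Agent string commonly sent by Googlebot; any spider UA could be substituted.
SPIDER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_as_spider(url):
    """Fetch a page while presenting a spider User-Agent (hypothetical helper)."""
    req = Request(url, headers={"User-Agent": SPIDER_UA})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def off_site_links(html, page_url, expected_host):
    """Return (raw href, resolved URL) pairs whose host is not expected_host."""
    collector = LinkCollector()
    collector.feed(html)
    suspicious = []
    for href in collector.links:
        absolute = urljoin(page_url, href)  # resolve the way a spider would
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.hostname != expected_host:
            suspicious.append((href, absolute))
    return suspicious
```

To use it, pass the result of fetch_as_spider("http://www.example.com/") to off_site_links along with the page URL and "www.example.com". Legitimate outbound links will show up too, so look specifically for entries whose host is really one of your own directory names.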

There is no way to 'repair' these search engine listings for the simple reason that a URL like 'http://directory/article1' won't resolve to your host (or to any host). If the spider can't follow that link to your host, then you can't intercept the request and redirect it to 'fix' the URL in the search engine's index.
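To see why such a URL can never reach your server, it helps to look at how it is parsed. A small Python sketch using the standard library (the helper name is just for illustration):

```python
from urllib.parse import urlsplit


def host_and_path(url):
    """Return the (hostname, path) a client would derive from the URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.path


# The intended URL: the host is your domain.
good = host_and_path("http://www.example.com/directory/article1")

# The malformed URL: the first path segment has been promoted to the hostname,
# so a client would try to contact a machine called "directory".
bad = host_and_path("http://directory/article1")
```

Here good is ("www.example.com", "/directory/article1") while bad is ("directory", "/article1"). Since the request is addressed to a host named "directory", it never arrives at your server, and no rule in your .htaccess gets a chance to act on it.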


9:11 pm on Oct 2, 2005 (gmt 0)

New User

10+ Year Member

joined:Sept 3, 2005
votes: 0

Thanks, Jim. I tried the spoofing, and I can browse all my links through the spider simulators. My concern came from running the spider simulator on