Getting rid of www.mysite.com/page.htm%22

Site:www.mysite.com showing strange dead urls

     
7:47 am on May 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 6, 2005
posts:1837
votes: 95


Hi all

For the last few weeks my site listings have been showing strange dead URLs of this kind:

www.mysite.com/page.htm%22
Similar pages

When clicking the link I get:
---------
Not Found
The requested URL /page.htm" was not found on this server.
Apache/1.3.33 Server at www.mysite.com Port 80
------------------

I have successfully removed some of them using Google's URL removal tool (it takes around 4 days). However, whenever I get rid of some of those links, new dead links of the same kind show up again (:(

Is there a permanent solution for this problem?

Thanks!

7:35 pm on May 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0



You have an internal link somewhere on your site, with a simple typo in it. Google follows the typo as if it were a real link and indexes it.

You need to find the typo. There are millions of %20 entries on the end of URLs in Google; that is a space. I don't recall what %22 is, but I think it is a double quote. You might have a stray one in your script.

Run your site through the W3C HTML validator and it is likely that it will find it.
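If it helps to check what an escape like %22 stands for, here is a minimal sketch using Python's standard urllib.parse.unquote. The list of escapes is just a sample chosen for illustration.

```python
from urllib.parse import unquote

# Decode a few percent-escapes to the characters they stand for.
for escape in ("%20", "%22", "%23", "%60"):
    print(escape, "->", repr(unquote(escape)))

# The dead URL from the listing, decoded back to what the server sees.
decoded = unquote("/page.htm%22")
print(decoded)
```

The decoded path ends in a double quote, which is the same character shown in the Apache "Not Found" message.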

10:50 pm on May 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 6, 2005
posts:1837
votes: 95


g1smd

Thanks.

Almost all my internal links are on the menu bar on most of my pages, and I haven't made any changes to it during the last two months or so.

As you suggested, I ran my home page and several other pages through the W3C HTML validator. There were a few errors in table and color tags, but none in links. Then I took a look at all the internal links on my menu bar and they look OK.

11:44 pm on May 4, 2005 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5749
votes: 96


There might be someone linking to you with an "oops" in the link. That's happened to me a couple of times.

Check in every search engine you can think of to see if the problem URL has any backlinks, and if you find such a link send a polite request to the webmaster to fix it.

12:07 am on May 5, 2005 (gmt 0)

Junior Member

joined:Nov 1, 2003
posts:101
votes: 0


reseller

I am experiencing the same problem as you.

Ignore these suggestions of this problem being a typo or an external link. That is not the case at all.

I believe it's a spidering problem. Google creates the %20 for some reason.

Is your site getting spidered regularly? Is your traffic down? Can you find your site when you search for it by domain name?

12:26 am on May 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


It isn't %20 this time, instead it is %22 in the URL.

I often see %20 and that is a space. Every time I have seen that, it has been a typo in a link somewhere.

8:39 am on May 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 6, 2005
posts:1837
votes: 95


Friends!

Thanks for your feedback, which kept me busy for a few hours this morning checking different things as you suggested.

Here is an "interim report" on the problem.

Googlebot visits my site around twice a day, and when I add new pages they show up in the index within around two days.

- I checked Google for backlinks to the affected page by running the command allinurl:www.mysite.com/affected_page.htm

and I saw that Google sees two versions of the same page, the original page and the dead-link URL, as follows:

----------------
Title: Free bla..bla bla.. ...
snippet: Get free bla...bla...
www.mysite.com/affected_page.htm - 63k - Cached - Similar pages

www.mysite.com/affected_page.htm%22
Similar pages
---------------------------

I fear that Google sees that as duplicate content.

- Yahoo, running command
linkdomain:www.mysite.com -site:www.mysite.com

returned about 10,300 links, and I have no energy to follow all of them (:(

then I ran the command
link:http://www.mysite.com/affected_page.htm -site:www.mysite.com

and got 24 links, all of which I went through. All were OK except one, which upset me. Maybe you'll want to check your own site for this one too.

Somebody has framed my whole site under a free redirect service, with a banner window popping up on each redirect to my site.

So I went to the redirect service's site and clicked "edit or delete" to remove my site, but I have no password. So I requested that the password be emailed to me (Forgot your password?), in case somebody had used one of my email accounts to register me with the redirect service. It displayed a message:
--------------
Done
Your password has been mailed to your e-mail address, detroiterz@strange_site.com
------------------------------

And there I could see that it wasn't I who ordered the service, because I have never had such an email address. This is not a nice thing to do to my site at all.

Then I went to www.strange_site.com to investigate that email address, but ended up redirected to another site saying "This site under reconstruction and coming soon".

This morning I sent a request to the free redirect service asking them to remove the redirect to my domain.

However, the problem is still there with:
www.mysite.com/affected_page.htm%22

So I'm going to remove it with Google's URL removal tool, which takes around 4 days, and hope no other %22 dead links show up again in my site listings on Google.

9:52 am on May 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 10, 2001
posts:1551
votes: 10


The %22 is a double quote. This can be caused by links like this:

<a href=page.html">a link</a>

Note that there is a " missing after the href=. If this happens, then the spider will assume that the final " is part of the URL, which would result in the symptoms you're seeing.

Of course it might also be a spider bug, but it seems more likely that Googlebot is seeing this kind of broken links somewhere (doesn't have to be on your own site).
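A rough way to hunt for this kind of broken link in your own pages is sketched below, using Python's standard html.parser. The class and function names are made up for this example, the sample markup is invented (www.example.com is a placeholder), and the check is only a heuristic: it flags any href value that contains a quote character or has whitespace at either end.

```python
from html.parser import HTMLParser


class SuspectLinkFinder(HTMLParser):
    """Collect href values that contain quotes or stray whitespace."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or value is None:
                continue
            # A quote inside the value, or whitespace at either end, usually
            # means the attribute quoting in the markup is broken.
            if '"' in value or "'" in value or value != value.strip():
                self.suspects.append(value)


def find_suspect_links(html):
    """Return the list of suspicious href values found in the markup."""
    finder = SuspectLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.suspects


# Invented sample markup: one good link and two broken ones.
sample = '''
<a href="good.html">fine</a>
<a href=page.html">missing opening quote</a>
<a href="http://www.example.com ">trailing space</a>
'''
print(find_suspect_links(sample))
```

This only covers pages you can feed to it, so it will not find a broken link that sits on somebody else's site.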

1:37 pm on May 5, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 24, 2005
posts:10
votes: 0


One of my sites has had this problem with Google for the past month, except with %23s. Nothing has changed in that time, of course. In Google's SERPs, the results look like:

www.mysite.comprivacy.asp%23cookies/

Since the beginning, I've validated all my pages (including this one) with CSE HTML Validator. There are definitely no typos in any of the links.

Almost the entire first page of Google's SERPs are these crap pages. In an attempt to fix the problem, I've deleted privacy.asp and removed all links on my site to it. That was about a month ago and despite daily spidering they haven't gone away.

So, thanks to Google I'm not posting a privacy policy any longer. I'm hopeful that considering how slow Google is that it will disappear in the next 6 months or so. Amazing what web search has become.

2:17 pm on May 5, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 31, 2005
posts:11
votes: 0


If these addresses don't actually lead anywhere, can they do any harm?

We have:
1) www.www.oursite.co.uk
2) www.oursite.co.uk%20
3) www.oursite.co.uk/page.htm. (i.e. an extra full point at the end of the URL)

Neither 1) nor 2) actually leads to a real page or generates any sort of header code. 3) leads to our custom 404 page and generates a 404 header response.

Do we need to ask for them to be deleted? Are any of them likely to cause a problem with Google?

2:48 pm on May 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 6, 2005
posts:1837
votes: 95


bird

Thanks.

<The %22 is a double quote. This can be caused by links like this:
<a href=page.html">a link</a>

Note that there is a " missing after the href=. If this happens, then the spider will assume that the final " is part of the URL, which would result in the symptoms you're seeing.>

Have checked all my internal links again and can't see a missing (") on any of them.

<Of course it might also be a spider bug, but it seems more likely that Googlebot is seeing this kind of broken links somewhere (doesn't have to be on your own site).>

I'm afraid that the reason might be links outside my site, which makes the problem more complicated to solve. Better if it is just a spider bug, of course.

3:30 pm on May 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 23, 2003
posts:801
votes: 0


Panacea wrote:
I believe it's a spidering problem. Google creates the %20 for some reason.

That's a very confident statement that seems to be flying in the face of many of the other posters.
Any idea why you think Google is broken in this way?

I go with the belief that there's a bad link on another website, but it's hard to find inside a search engine because you can't syntactically call up a search with a separator character in it.

DerekH
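Since the search engines make this hard to look up, another place worth trying is your own access log: when a human visitor follows the broken link, the Referer field may name the page that carries it. The sketch below assumes the Apache "combined" log format. The function name and the two sample log lines are invented for illustration, and crawler hits without a referer will not help here.

```python
import re

# Assumes Apache "combined" format: request line, status, size, quoted referer.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def referers_for_bad_urls(lines, markers=("%22", "%20")):
    """Return (path, referer) pairs for 404 requests whose path contains a marker."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match["status"] != "404":
            continue
        path, referer = match["path"], match["referer"]
        # Skip requests that carry no referer; they cannot point to a source page.
        if any(marker in path for marker in markers) and referer not in ("", "-"):
            hits.append((path, referer))
    return hits


# Made-up sample lines, not taken from any real log.
sample_log = [
    '192.0.2.1 - - [05/May/2005:09:00:00 +0000] "GET /page.htm%22 HTTP/1.1" '
    '404 210 "http://other.example/links.html" "Mozilla/4.0"',
    '192.0.2.2 - - [05/May/2005:09:01:00 +0000] "GET /page.htm HTTP/1.1" '
    '200 6300 "-" "Mozilla/4.0"',
]
print(referers_for_bad_urls(sample_log))
```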

4:04 pm on May 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


>> 2) www.oursite.co.uk%20 <<

Believe me, that IS a space. Somewhere there is a
<a href="http://www.oursite.co.uk ">
link pointing to you.

Do a search for "mailto:%20" and http://%20 and you'll find millions.

1:19 pm on May 6, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 24, 2005
posts:10
votes: 0


Assuming that someone would create a link to a non-existent page, why would Google add such a non-existent page to their listing?

What you're saying doesn't make sense to me.

I think there are a lot of Google-apologists running around thinking that Google never makes mistakes and that if you're not in their index, it's your fault and not theirs. I'd like to see if this attitude continues once their site's SERPs solely return a bunch of non-existent pages.

There's an increasing theme in these forums of "What is wrong with Google?" I guess there's nothing wrong with Google - everyone else is wrong.

4:20 pm on May 6, 2005 (gmt 0)

Full Member

10+ Year Member

joined:June 21, 2003
posts:240
votes: 0


I'm trying to remove a www.mysite.com%60/ URL. When I submit it to the removal tool, however, I get this message:
"That URL does not seem to be in the correct format. Please try entering the URL again."

Am I doing something wrong?

7:06 pm on May 6, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


By chance, today, I was visiting someone who uses Freeserve/Wanadoo internet in the UK. He had just received an email from a friend that said:

>> Have a look at this page:
>> "http://www.somesite.com/somepage.htm"

with the URL in quotes.

He used webmail to read the message. When he clicked the link he got a "Page Not Found" error and the URL in the browser URL bar said:

http://www.somesite.com/somepage.htm%22

So, if you see these URLs in your logs the visitor might be reading email. If you see them in Google's SERPs then the link is on another site somewhere.
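How the webmail program actually builds its links is not known, but the behaviour is what you would get from a linkifier that grabs everything up to the next whitespace. A small sketch of that assumption follows; both function names are hypothetical, and the second one shows the more careful variant that stops at quote characters.

```python
import re
from urllib.parse import quote


def naive_linkify(text):
    """Grab everything from http:// up to the next whitespace, quotes included."""
    return [quote(url, safe=":/") for url in re.findall(r"https?://\S+", text)]


def careful_linkify(text):
    """Same idea, but stop at quote characters and angle brackets."""
    pattern = r"https?://[^\s\"'<>]+"
    return [quote(url, safe=":/") for url in re.findall(pattern, text)]


message = 'Have a look at this page:\n"http://www.somesite.com/somepage.htm"'
print(naive_linkify(message))
print(careful_linkify(message))
```

The naive version swallows the closing quote and encodes it as %22, which matches the URL that appeared in the browser bar.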

7:21 pm on May 6, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


>> why would Google add such a non-existent page to their listing? <<

Google adds to its database every URL that it has ever seen in links on pages. If the link leads to content then that gets indexed. If the link doesn't work, it still gets into the list as a URL-only listing. These can give you warning about faulty links on, or pointing to, your site.

.

>> I'm trying to remove a www.mysite.com%60/ URL. <<

If that %60 was after a filename (rather than directly after the domain) then you could submit it under the "Remove an outdated (404) link" option, since no content will be returned by your server.

If that does not work, then add the URL to the Disallow list in your robots.txt file and then submit the URL of the robots.txt file to the "Remove a URL using robots.txt" option.

I would assume that Disallow: /%60 is what you need.

9:09 pm on May 6, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 6, 2005
posts:1837
votes: 95


g1smd

<http://www.somesite.com/somepage.htm%22

So, if you see these URLs in your logs the visitor might be reading email. If you see them in Google's SERPs then the link is on another site somewhere.>

Thanks. Very interesting observation.

It seems that "It Comes With the Territory". The more interesting your content is which people might link to or mention in Ezines, the more %22 things you get :-)

 
