
Webmaster General Forum

    
GWT reporting Double Slash ( // ) internal links
GWT false reporting
kartoshka



 
Msg#: 4594207 posted 11:47 am on Jul 18, 2013 (gmt 0)

After a small redesign by our developers, we started to get strange messages from Google Webmaster Tools.

There are three types:

First, it says there are links to URLs with no lang parameter.

Second, it says there are links to URLs that are missing directories where they should have them.

Lastly, it says I have links to wrong URLs from pages that never really existed. Those URLs have always been redirects to the homepage (for some reason they were in the sitemap), and now they are no longer in the sitemap.


Could you please help me to find out what's wrong?

[edited by: phranque at 1:13 pm (utc) on Jul 18, 2013]
[edit reason] no screenshots, please [/edit]

 

phranque




 
Msg#: 4594207 posted 1:25 pm on Jul 18, 2013 (gmt 0)

welcome to WebmasterWorld, kartoshka!


i assume GWT was reporting these as 404 or 410 (Not Found) "errors"?

if they don't exist and you're not linking to those urls internally then it's not a problem.
it could be that google discovered some of these urls elsewhere.
if it is a problem on your end (such as the erroneous inclusion in your sitemap) they are reporting these for your convenience so you may fix the problem.

since it occurred right after a redesign, i would suggest using a site crawler to see if you are linking to any of these urls internally.
if you return a 410 Gone status code instead of a 404 Not Found they should disappear soon(er).
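A single page can be checked the way phranque suggests using only Python's standard library. This is a minimal sketch, not any particular crawler's logic: it collects the href of every <a> tag, skips external and relative links, and flags internal paths that contain an empty segment (//) or do not start with a two-letter language code. The language-code rule mirrors the /en/, /de/ URLs in this thread and is an assumption. A real crawler would also fetch each link and record its status code.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: internal page paths start with a two-letter language code, e.g. /en/...
LANG_PREFIX = re.compile(r"^/[a-z]{2}(/|$)")


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def suspicious_links(html: str, host: str) -> list[str]:
    """Return internal links whose path has '//' or lacks a language prefix."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        parts = urlsplit(href)
        if parts.netloc and parts.netloc != host:
            continue  # link to another site
        if not parts.netloc and not href.startswith("/"):
            continue  # relative, mailto:, #fragment ...: skipped in this sketch
        path = parts.path
        if path in ("", "/"):
            continue  # homepage link
        if "//" in path or not LANG_PREFIX.match(path):
            flagged.append(href)
    return flagged
```

Feeding it the HTML of /en/haiti/rastalife (for example, saved from "view source") would list any link of the /haiti or /en//rastalife shape that the page really contains.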

kartoshka



 
Msg#: 4594207 posted 2:11 pm on Jul 18, 2013 (gmt 0)

I have been using Moz for 2 months and analyzed the crawl results before and after each modification. I have tried almost every crawler available, and none of them reports problems like the ones GWT shows.

Here is the thing.
The URL www.mywebsite.com/en/haiti is a 301 redirect to my homepage, yet somehow GWT says it links to www.mywebsite.com/haiti, which is actually a 404 page.

lucy24




 
Msg#: 4594207 posted 9:55 pm on Jul 18, 2013 (gmt 0)

www.mywebsite.com/haiti which is 404 page actually

Watch out here. The form
www.example.com/anyname
will always be a 404, because there is no such file on your server. As a URL, it can work in one of two ways.

EITHER you've got a physical directory named
www.example.com/anyname/
with mod_dir (or its IIS equivalent) slapping on the final slash and then finding a file named index.something
OR the extensionless "anyname" is quietly rewritten to "anyname.php" or "anyname.html" or whatever it happens to be.
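The two cases above, plus a real file saved without an extension (the case phranque raises below), can be modeled as a toy resolver. This is a simplified illustration, not how Apache or IIS is implemented: `files` and `dirs` are hypothetical stand-ins for the document root, and the order of checks (directory, exact file, then .php/.html) is an assumption of the sketch.

```python
from __future__ import annotations


def resolve(path: str, files: set[str], dirs: set[str]) -> tuple[int, str]:
    """Toy model of how a server might answer a request for `path`.

    files: hypothetical file paths under the document root, e.g. {"/haiti.php"}
    dirs:  hypothetical directory paths under the document root, e.g. {"/en"}
    Returns (status, detail): the file served, or the redirect target.
    """
    stripped = path.rstrip("/") or "/"
    if stripped in dirs:
        if not path.endswith("/"):
            return 301, path + "/"  # mod_dir-style: add the trailing slash
        for index in ("index.html", "index.php"):
            candidate = stripped.rstrip("/") + "/" + index
            if candidate in files:
                return 200, candidate
        return 403, path  # assumption: directory without index, listing disabled
    if path in files:
        return 200, path  # a real file, with or without an extension
    for ext in (".php", ".html"):
        if path + ext in files:
            return 200, path + ext  # quiet internal rewrite; the URL is unchanged
    return 404, path
```

With none of these conditions met, /anyname falls through to the 404 at the end.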

If you don't use this URL, then a 404 is the correct response and you don't need to think about it... UNLESS you've unintentionally got something in your own pages that links to this nonexistent page. Which brings us to:

What do you mean by "GWT says it links to"? If a URL redirects, it can't link to anything. A robot can choose not to follow a redirect-- but it can't simply ignore it and crawl the originally requested URL. Did you mean that it "redirects to" /haiti when it should be redirecting to / root? If so, that's a concrete problem in the wording of your redirects.
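To see what a URL actually returns, without a browser or crawler silently following the redirect, you can request it once and read the status code and the Location header. Below is a minimal sketch using only Python's standard library (http.client does not follow redirects on its own). HEAD is used to skip the body, which assumes the server answers HEAD the same way it answers GET.

```python
from __future__ import annotations

import http.client
from urllib.parse import urlsplit


def check_url(url: str, timeout: float = 10.0) -> tuple[int, str | None]:
    """Request `url` once with HEAD and return (status, Location header).

    Redirects are not followed, so a 301 or 302 comes back as-is,
    together with the target it points to.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Running it against a URL such as www.mywebsite.com/en/haiti shows whether the server sends a 301 or a 302 and exactly where it points.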

phranque




 
Msg#: 4594207 posted 11:49 pm on Jul 18, 2013 (gmt 0)

the form www.example.com/anyname will always be a 404

on most server filesystems extensionless filenames are valid.

lucy24




 
Msg#: 4594207 posted 4:40 am on Jul 19, 2013 (gmt 0)

In theory, maybe, but have you ever personally met someone who used them? If you're going to pull the wool over someone's eyes, let it be the end user rather than your own server ;)

kartoshka



 
Msg#: 4594207 posted 6:55 am on Jul 19, 2013 (gmt 0)

lucy24, Google Webmaster Tools' Not Found report says that www.mywebsite.com/en/haiti links to www.mywebsite.com/haiti.

The URL www.mywebsite.com/en/haiti was in my sitemap for a long time, but it has always redirected to www.mywebsite.com/en.

Every page on my website now has footer links from
www.mywebsite.com/en/AnyPage
to the other language versions of the same page:
www.mywebsite.com/de/AnyPage
www.mywebsite.com/it/AnyPage
www.mywebsite.com/ru/AnyPage
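How the site generates these footer links isn't shown in the thread. As a hypothetical sketch, assuming every page path starts with a language code, the links can be built by swapping only the first path segment and refusing to emit a URL when a segment is missing. `LANGS` and `alternate_links` are illustrative names, not the site's code.

```python
from __future__ import annotations

# Hypothetical language codes used as the first path segment.
LANGS = ["en", "de", "it", "ru"]


def alternate_links(path: str, langs: list[str] = LANGS) -> list[str]:
    """Build footer links to the other language versions of `path`.

    Assumes paths look like /<lang>/<rest>; raises instead of emitting
    a URL with a missing language code or an empty middle segment.
    """
    segments = path.split("/")  # "/en/haiti" -> ["", "en", "haiti"]
    if len(segments) < 3 or segments[1] not in langs:
        raise ValueError(f"unexpected path shape: {path!r}")
    rest = segments[2:]
    if any(s == "" for s in rest[:-1]):  # a trailing slash is allowed
        raise ValueError(f"empty path segment in {path!r}")
    current = segments[1]
    return ["/" + "/".join([lang] + rest) for lang in langs if lang != current]
```

Raising here surfaces a malformed source path during development, instead of publishing links that only a crawler will notice.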

For some reason, GWT says I have a URL linking to www.mywebsite.com/haiti from the page www.mywebsite.com/en/haiti.

In some cases it says I have URLs linking to
www.mywebsite.com/en//rastalife
from the page
www.mywebsite.com/en/haiti/rastalife

phranque




 
Msg#: 4594207 posted 9:43 am on Jul 19, 2013 (gmt 0)

have you verified that googlebot has crawled these urls since your redesign?

what do the "Detected" dates look like in GWT relative to the redesign deployment?

kartoshka



 
Msg#: 4594207 posted 11:07 am on Jul 19, 2013 (gmt 0)

URL:
http://www.example.com/haiti


Linked from:
http://www.example.com/ge/haiti
http://www.example.com/en/haiti

Last crawled: 7/15/13
First detected: 7/15/13

As I said, the "Linked from" URLs were in the sitemap a month ago.
They are 301 redirects to the homepage now, and were back then too, so they can't contain links to any pages. That's the problem.

g1smd




 
Msg#: 4594207 posted 12:22 pm on Jul 19, 2013 (gmt 0)

Double slash URLs should redirect to the correct URL if such a page exists, or directly return '410 Gone' if there is no such page to redirect to.

There should be no mass redirect to a single target as that's a 'soft 404' scenario that you want to avoid.
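g1smd's rule can be written as a small decision function. In this sketch, `existing` is a hypothetical set of valid page paths standing in for whatever lookup the site really uses, and runs of slashes are collapsed to a single slash before checking.

```python
from __future__ import annotations

import re


def handle_slashes(path: str, existing: set[str]) -> tuple[int, str | None]:
    """Decide how to answer a request whose path may contain '//'.

    existing: hypothetical set of valid page paths on the site.
    Returns (status, redirect_target_or_None).
    """
    if "//" not in path:
        return (200, None) if path in existing else (404, None)
    collapsed = re.sub(r"/{2,}", "/", path)
    if collapsed in existing:
        return 301, collapsed  # one redirect straight to the real page
    return 410, None  # nothing to redirect to: report it as gone
```

Collapsing slashes is only a guess. When // stands in for a missing directory (as in /en//rastalife vs /en/haiti/rastalife), the collapsed path may be a different page or no page at all. That is why the sketch redirects only when the collapsed path is a known page, and never to a catch-all target.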

phranque




 
Msg#: 4594207 posted 12:40 pm on Jul 19, 2013 (gmt 0)

maybe your 301 is causing confusion and the GWT reporting is a result of that.

http://support.google.com/webmasters/answer/2409439:
Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Such pages are called soft 404s, and can be confusing to both users and search engines.

(my emphasis)
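To spot the pattern the Google help page describes, you can group the redirects found by a crawl (one that does not follow redirects) by their target. This sketch assumes the crawl results are (url, status, location) tuples. The threshold of three source URLs per target is an arbitrary choice for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

REDIRECT_CODES = (301, 302, 303, 307, 308)


def mass_redirect_targets(
    results: Iterable[tuple[str, int, str | None]], min_sources: int = 3
) -> dict[str, list[str]]:
    """Return redirect targets that many distinct URLs point to.

    A target such as the homepage receiving redirects from many unrelated
    URLs is a soft-404 suspect in the sense of the help page quoted above.
    """
    by_target: dict[str, list[str]] = defaultdict(list)
    for url, status, location in results:
        if status in REDIRECT_CODES and location:
            by_target[location].append(url)
    return {t: sorted(srcs) for t, srcs in by_target.items()
            if len(srcs) >= min_sources}
```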

kartoshka



 
Msg#: 4594207 posted 12:53 pm on Jul 19, 2013 (gmt 0)

Oh, sorry, it's a 302 redirect (as soon as we launch the country, the page goes live).

The thing is, those URLs should never have been in our sitemap, and now they are neither in our sitemap nor linked from any of our pages.

it could have returned soft 404 if we had link to it, but it says the URL exists and there is a URL from it to a 404 page.

lucy24




 
Msg#: 4594207 posted 8:41 pm on Jul 19, 2013 (gmt 0)

If you've checked your own site and verified that you have no links to the nonexistent pages, you've done enough. Sometimes google will simply invent something out of its own fevered imagination, and all you can do is click "Fixed" and wait for the issue to disappear.

I think some servers may ignore superfluous slashes at the end of the domain name. I once found a human visitor in my logs whose entire visit (page plus all supporting files) came through as www.example.com//et cetera. (Directory-specific images, so it was all done with relative links.) It was a one-off, but it played havoc with log-wrangling, so it's the kind of thing I remember.
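That follows from how relative links resolve. The page's path is kept up to its last slash and the link is appended, so one extra slash in the page URL is copied into every relative link on that page, which is how browsers resolve them. The sketch writes out that RFC 3986 "merge" step by hand for plain relative references only (no ./ or ../, query, or fragment). It deliberately avoids urllib.parse.urljoin, which may drop empty segments when merging.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_simple(base: str, ref: str) -> str:
    """Resolve a plain relative reference such as "images/logo.png" against base.

    Keeps the base path up to and including its last "/", then appends ref.
    Empty segments in the base path (the "//") are kept as-is.
    """
    parts = urlsplit(base)
    if not parts.path:
        directory = "/"  # RFC 3986: base with authority and empty path
    else:
        directory = parts.path[: parts.path.rfind("/") + 1]
    return urlunsplit((parts.scheme, parts.netloc, directory + ref, "", ""))
```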

it could have returned soft 404 if we had link to it, but it says the URL exists and there is a URL from it to a 404 page.

I do not understand this sentence.

kartoshka



 
Msg#: 4594207 posted 6:17 am on Jul 20, 2013 (gmt 0)

lucy24, in the GWT Not Found tab:

There is a link to
http://www.example.com/haiti
from the pages
http://www.example.com/ge/haiti
http://www.example.com/en/haiti

but the last 2 pages do not exist and are simply 302 redirects to the homepage.

lucy24




 
Msg#: 4594207 posted 8:37 am on Jul 20, 2013 (gmt 0)

:: forehead-slap ::

302.
302.
302.
NOT 301?

If that was right, and wasn't a typo, it may mean that anything the search engine finds on the "target" page (the page you're redirecting to) is attributed to the original page (the one you're redirecting from).

Time to fine-tooth-comb that home page.

kartoshka



 
Msg#: 4594207 posted 8:55 am on Jul 20, 2013 (gmt 0)

I get that it's not the right thing to do, but in any case we don't have any links to those pages.
For some reason our CTO decided it would be right to not return 404 for these pages (e.g. Google might crawl them just because it thinks they exist). And when we launch the country, the page becomes live.

Do you think it's stupid and we shall change it?

However I don't think it's the reason for the problem described in the beginning of the page (and I did not see you saying it is).

phranque




 
Msg#: 4594207 posted 1:38 pm on Jul 20, 2013 (gmt 0)

the last 2 pages do not exist and are simply 302 redirects to the homepage

Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic.

For some reason our CTO decided it would be right to not return 404 for these pages...
Do you think it's stupid and we shall change it?

do you think?

kartoshka



 
Msg#: 4594207 posted 1:46 pm on Jul 20, 2013 (gmt 0)

I see your point, but the problem here is not solely that, unfortunately.

As I said I also have this URL reported as 404 in GWT
www.mywebsite.com/en//content
and GWT reports it is coming from this page
www.mywebsite.com/en/anypage/content
which exists and when I look into its source I can't find any problem, when I fetch it as google again the last does not find any problem etc.

lucy24




 
Msg#: 4594207 posted 12:07 am on Jul 21, 2013 (gmt 0)

As I said I also have this URL reported as 404 in GWT
www.mywebsite.com/en//content
and GWT reports it is coming from this page
www.mywebsite.com/en/anypage/content
which exists and when I look into its source I can't find any problem, when I fetch it as google again the last does not find any problem etc.

Is that an exact quotation of the offending URLs? (Changing the names, duh.) What I see now is not doubled slashes but a null directory name. They look identical // to the naked eye, but they might come from entirely different causes. Do you ever have a .* where a .+ is needed? Or PHP that generates a URL even if some variable is empty?
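The .* vs .+ difference is easy to show. The patterns below are hypothetical (URLs shaped /<lang>/<section>/<page>), not the site's real rewrite rules: with .* the section may be empty, so /en//content still matches and gets treated as a valid page.

```python
from __future__ import annotations

import re

# Hypothetical pattern for /<lang>/<section>/<page>.
# .* allows an empty section, so "/en//content" matches with section == "".
LOOSE = re.compile(r"^/([a-z]{2})/(.*)/([^/]+)$")
# .+ requires at least one character in the section, so "/en//content" fails.
STRICT = re.compile(r"^/([a-z]{2})/(.+)/([^/]+)$")


def split_path(pattern: re.Pattern, path: str) -> tuple[str, ...] | None:
    """Return the captured (lang, section, page), or None if no match."""
    m = pattern.match(path)
    return m.groups() if m else None
```

A character class such as [^/]+ for the section would be stricter still, since .+ can also match across slashes.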

g1smd




 
Msg#: 4594207 posted 8:32 pm on Jul 21, 2013 (gmt 0)

Yes, the faulty URL is looking more and more like the components are:

/en + /null + /content
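
In code terms, /en + /null + /content is what a template produces when it concatenates a variable that happens to be empty. The site's own templates aren't shown in the thread, so this Python is only an illustrative sketch. `naive_url` and `safe_url` are hypothetical helpers, and the empty middle argument stands in for the directory that went missing.

```python
from __future__ import annotations


def naive_url(*segments: str) -> str:
    """Joins segments blindly: an empty one becomes '//' in the result."""
    return "/" + "/".join(segments)


def safe_url(*segments: str) -> str:
    """Refuses to build a URL when any segment is empty."""
    if any(not s for s in segments):
        raise ValueError(f"empty URL segment in {segments!r}")
    return "/" + "/".join(segments)
```

Calling naive_url("en", "", "content") yields exactly the /en//content shape seen in GWT. The guarded version fails loudly instead, which points straight at the code path that left the variable empty.
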
kartoshka



 
Msg#: 4594207 posted 6:29 am on Jul 23, 2013 (gmt 0)

lucy24, thanks for describing the problem more accurately; I am not a developer myself.

Do you think it's solely a problem on our side? If so, what could cause such a problem?

[edited by: bill at 7:39 am (utc) on Jul 23, 2013]
[edit reason] see sticky [/edit]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved