Malformed URLs being rendered same as correct ones.

         

geekay

4:11 pm on May 4, 2023 (gmt 0)

10+ Year Member



On a very small static site Googlebot has found out that
https://www.example.com/folder///page.html

(i.e. with two or more slashes between folders or between a folder and a page) results in the same page as the correct
https://www.example.com/folder/page.html

The only difference I notice is that the malformed URLs do not display the inline images.
In Search Console this weird error results in the notice "Duplicate without user-selected canonical. First detected: 9/14/22." These duplicates now outnumber the indexed pages and could grow endlessly.
Any idea what might cause this? All internal links are relative but correct. Server is Apache and its configuration is not in any way under my control.

not2easy

5:01 pm on May 4, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



When you say that the server configuration is not under your control, I hope you mean above the root domain's public_html folder. If you use SFTP to upload and download to/from the server you should see where you are on that 'tree'. If you cannot edit your .htaccess file you may need to consult with the host to ensure that all URLs land on the same version via 301 redirect.

If you are using cPanel to upload/download you may need to look in your File Manager and enable 'invisible' files to find the .htaccess file.

Without that 301 in place, Google sees it as 4 different domains:
http://example.com
https://example.com
http://www.example.com
https://www.example.com

You may have a single GSC property; whichever form it uses is the one to redirect to. If it is set up as https://www.example.com, ensure that visits to http://example.com land on that preferred URL. You can test it by requesting one of the non-preferred forms and then checking your logs for the server response.
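
As a rough sketch of such a redirect, assuming mod_rewrite is enabled, the .htaccess file sits in the site root, and https://www.example.com is the preferred form (all of these are assumptions; adjust to your own setup):

```apache
RewriteEngine On

# Redirect if the request did not arrive over HTTPS ...
RewriteCond %{HTTPS} !=on [OR]
# ... or if the hostname is anything other than the preferred one.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]

# Send every such request to the same path on the preferred host.
# The query string, if any, is carried over automatically.
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

On hosts that terminate HTTPS at a proxy or load balancer, %{HTTPS} may not reflect the visitor's scheme, so it is worth checking with the host before relying on this.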

lucy24

5:59 pm on May 4, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, I know this one. Years ago, I goofed in some auto-generated links, resulting in things like
https://example.com/paintings//catsrats/
To this day, I get occasional search-engine requests for those extra slashes, which have to be redirected. This led to the interesting discovery that you can't simply put // in the pattern of a RewriteRule, because the duplication isn't recognized. It has to go in a Condition, like this:
RewriteCond %{REQUEST_URI} /paintings//+(.*)
RewriteRule ^paintings https://example.com/paintings/%1 [R=301,L]
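
A more general sketch, not part of the original post and not tested on any particular host, would catch duplicate slashes anywhere in the path rather than only under /paintings. It assumes the .htaccess file is in the site root, that mod_rewrite is enabled, and that https://example.com is the canonical host. It also relies on the behavior described above, namely that the RewriteRule pattern in .htaccess sees the path with runs of slashes already merged:

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, e.g.
#   GET /folder///page.html HTTP/1.1
# so duplicate slashes are still visible here. The character class
# stops at "?" so that "//" inside a query string does not trigger it.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*//

# In .htaccess the pattern is matched against the already-merged path,
# so $1 is the cleaned-up form (e.g. folder/page.html).
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
```

If the server did not merge slashes before matching, $1 would still contain the duplicates and the redirect would loop, so test with a single malformed URL and watch the response headers before leaving it in place.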


Psst! not2easy! I don't think this is really an html question. Apache maybe?

geekay

6:02 pm on May 4, 2023 (gmt 0)

10+ Year Member



My .htaccess file is working. All four possible domain versions are rewritten to one canonical. The problem is inside the site, below the .com level. The site has been online since the late 1990s, and never before last year was folder///page.html treated the same as folder/page.html. There are no such malformed external backlinks. Of course it could be just Googlebot making up URLs, but folder///page.html should not work at all on my site. Therefore I wonder if it could be a server configuration issue.

(When you talk about control above the public_html folder, I would be happy to have some control at that level too. The cPanel login is one level above public_html, and some blocked spiders have found out that they can mess up my raw access logs by requesting my cPanel login page... All the files of the login page show up in the raw log each time. The .htaccess file does not work above public_html.)

geekay

6:05 pm on May 4, 2023 (gmt 0)

10+ Year Member



I love you, Lucy24!

not2easy

7:10 pm on May 4, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Sorry, I was trying to hit a bunch of bases there. Now, with the extra information I agree with lucy24.
(and we all love her!)

lucy24

11:32 pm on May 4, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Aw, shucks.

:: blushing modestly ::