Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
.htaccess (bad links containing dots)
normcdn




msg:4412322
 3:10 pm on Jan 30, 2012 (gmt 0)

Hi,

I always check my Webmaster Tools for broken links. For the past couple of months I keep getting 404 errors from miscoded links. Instead of linking to my site with the correct filename, they use a truncated filename ending with .. or ...
Ex: instead of excursions.html it is linked to excurs...
I keep getting more and more 404 errors, and I am really mad about these bad links. I used to manually redirect each bad link to the correct one, but there is no end to it.
My question (I tried different scenarios without success): how can I use wildcards to redirect bad links containing .. or ... to the index page of the site?

It would be so great to fix this problem.

Many thanks,

Normand

 

g1smd




msg:4412327
 3:32 pm on Jan 30, 2012 (gmt 0)

Since Google has itself internally "invented" these bad links you should not adopt them by redirecting.

They should continue to return the HTTP 404 status. Google will eventually fix their issue.

penders




msg:4412344
 4:06 pm on Jan 30, 2012 (gmt 0)

Ex: instead of excursions.html it is linked to excurs...


Some forum software can mess up and truncate links in this way.

g1smd




msg:4412348
 4:12 pm on Jan 30, 2012 (gmt 0)

Google are looking at links on the page and extracting both the URL in the href and the anchor text. Where the anchor text "looks" like a URL, they are treating it as a URL and requesting it from your server. In many cases, forum and blog software has truncated the anchor text. With Google's new "be very nosey" way of doing things, they end up requesting millions of junk URLs. These requests should be rejected with a 404 response.
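
As an illustration of rejecting such requests, here is a minimal sketch for a per-directory .htaccess file. It assumes mod_rewrite is enabled, and the pattern is only a guess at what "truncated" looks like (a path ending in two or more dots). A request for a file that does not exist already gets a 404 by default, so an explicit rule like this may only matter when some other rule would otherwise catch the request.

```apache
# Hypothetical sketch, not taken from the thread; assumes mod_rewrite is available.
RewriteEngine On

# Path ends in two or more dots (e.g. "excurs..."): answer 404 and stop processing.
# With a non-3xx status in the R flag the substitution "-" is dropped.
RewriteRule \.{2,}$ - [R=404,L]
```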

wilderness




msg:4412397
 5:48 pm on Jan 30, 2012 (gmt 0)

g1smd,
Pardon the intrusion.

The inquiry doesn't say that these requests actually came from Google, only that they are showing up in Webmaster Tools (i.e., logs). Is that a correct assumption?

There have long been malformed bots/harvesters that make these types of requests.
Here are some older examples, one of which is obviously a botnet. (I have more of these logs, but these serve the purpose.)

/index.php?topage=../../../../../../../../../../../../../../../../../../../../../../../../../../../proc/self/environ

/MyFolder/../SubFolder/MyImage.gif

/../MyFile.html HTTP/1.0" 200 85618 "-" "Mozilla/3.0 (compatible)"

g1smd




msg:4412400
 5:57 pm on Jan 30, 2012 (gmt 0)

The clue in the original post was that the poster was looking in their WMT report.

We discussed URLs ending with "..." appearing in WMT, just a few weeks ago.

The likely method of their "discovery" was proposed at that time.

Bots do request a huge range of other random junk. Much of it is quite easy to block with a few simple rules.
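
One possible shape for such a rule, as a sketch only: it assumes mod_rewrite in a per-directory .htaccess file and targets the "../" traversal attempts shown in the log examples earlier in the thread. The 403 response is an assumption, and the pattern could catch legitimate requests on some sites, so it would need testing first.

```apache
# Hypothetical sketch; assumes mod_rewrite is available.
RewriteEngine On

# THE_REQUEST holds the raw request line as the client sent it,
# so "../" is seen whether it appears in the path or in the query string.
RewriteCond %{THE_REQUEST} \.\./
# Leave the URL untouched and answer 403 Forbidden.
RewriteRule ^ - [F]
```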

normcdn




msg:4412413
 6:21 pm on Jan 30, 2012 (gmt 0)

Thanks for your help.
g1smd, do you know where I can find the post you are referring to?

The problem I have is recent. My websites have been online for many years, BUT only recently have I seen this problem. In WMT I checked where the bad links come from, and found that the links there are not written correctly.

wilderness




msg:4412414
 6:31 pm on Jan 30, 2012 (gmt 0)

Need help forbidding URLs with "./" [webmasterworld.com]

normcdn




msg:4412471
 9:55 pm on Jan 30, 2012 (gmt 0)

Hi,
To keep it simple, I added this line to the .htaccess file:

RewriteRule \.\. "http://www.mywebsite.com/" [R=301,L]

This way, all bad links will be redirected to the main page.
Thanks again!

g1smd




msg:4412482
 10:22 pm on Jan 30, 2012 (gmt 0)

You don't need the quotes.

I would also strip all parameters in the redirect, otherwise duff requests with parameters will "ghost" those parameter-based URLs as duplicates of your home page.

You also need another rule that detects URLs with .. in parameters and fixes those.

normcdn




msg:4412488
 10:37 pm on Jan 30, 2012 (gmt 0)

OK, so the best way to write the line should be ...

RewriteRule \.\. [mywebsite.com...]

Am I right?
I have many websites, and all of them contain redirects. ON EVERY LINE, I use the quotes and end with [R=301,L]
Should I change them all?

Also, you mentioned ...
[You also need another rule that detects URLs with .. in parameters and fixes those]
I do not understand. Is the line I set up not enough by itself?

Many thanks!
This forum is a great source of info!

g1smd




msg:4412499
 11:01 pm on Jan 30, 2012 (gmt 0)

Add a question mark to the end of the target URL to suppress parameters, otherwise they are re-appended in the redirect.
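
A minimal sketch of that change, assuming the rule lives in a per-directory .htaccess file and using www.example.com as a placeholder hostname:

```apache
# Sketch only; www.example.com is a placeholder hostname.
RewriteEngine On

# The trailing "?" gives the target an empty query string,
# so the parameters of the original request are not re-appended.
RewriteRule \.\. http://www.example.com/? [R=301,L]
```

On Apache 2.4 and later the QSD flag is another way to discard the query string.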

Your redirect will not redirect requests like example.com/foobar?some../../../thing

The RewriteRule RegEx pattern looks only at path information.

You need a separate RewriteRule with a preceding RewriteCond that tests %{QUERY_STRING} for ".." in the request. This ruleset will redirect those requests. This rule MUST suppress parameters in the redirect, otherwise it will generate an infinite redirect loop.
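
The ruleset described here might look roughly like the sketch below. It assumes a per-directory .htaccess file, uses www.example.com as a placeholder hostname, and sends everything to the home page as in the earlier rule.

```apache
# Sketch only; www.example.com is a placeholder hostname.
RewriteEngine On

# Match any request whose query string contains ".."
RewriteCond %{QUERY_STRING} \.\.
# Redirect to the home page. The trailing "?" drops the query string, so the
# redirected request no longer satisfies the condition and cannot loop.
RewriteRule ^ http://www.example.com/? [R=301,L]
```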

Use example.com in this forum to stop URL auto-linking.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved