Apache Web Server Forum

    
Google crawling my Query String URLs not one I created with the .htacc
Google crawls my URLs which is with query string. Well I like to crawl URLs
sweetguyzzz




msg:4305230
 7:32 am on Apr 28, 2011 (gmt 0)

Hi to all,

Please let me know the solution to this problem as this is very important for me:

I am using the Query String in my site for the pages of categories and others.
I used this .htaccess code:

#Redirect .com/main/page.php to the variable of $category_type and $category_name
RewriteRule ^(main)/([^.,^/]+)\.php$ /index.php?category_type=main&category_name=$2 [L]


All the links on my site point to main/link-name.php, but the problem is that Google is crawling URLs of the form index.php?category_type=name&category_name=link-name

What could the problem be?
I look forward to a helpful response.

Thanks in Advance
sweetguyzzz

 

g1smd




msg:4305232
 7:36 am on Apr 28, 2011 (gmt 0)

Presumably the site used to link to URLs with parameters? Google has a long memory and while those URLs return "200 OK", Google will continue to crawl them.

You need to add a redirect to get them to change over to the new URLs. That question is asked several times per week, and as such there are several THOUSAND prior threads with example code in this forum.

g1smd




msg:4305233
 7:38 am on Apr 28, 2011 (gmt 0)

Describe in words what this pattern is supposed to do, what characters it matches or does not match:

^(main)/([^.,^/]+)\.php$
sweetguyzzz




msg:4305254
 8:43 am on Apr 28, 2011 (gmt 0)

Thanks g1smd for your reply.

This pattern matches our MAIN category: the literal folder name main, then a /, then one or more characters other than . and /, ending with the extension .php

When anybody opens mysite.com/main/page-name.php, the server internally asks for mysite.com/index.php?category_type=main&category_name=page-name

I used the same rewrite on my first site, and there Google crawls only the links I want crawled. I did not use any redirect for this pattern on that site, so why does it happen on this one?

Please clarify.

Thank you.
sweetguyzzz

sweetguyzzz




msg:4305261
 8:53 am on Apr 28, 2011 (gmt 0)

My site is new, and URLs like index.php?category_type=main&category_name=name-used are never used in any internal link, yet Google crawls them. I link only to .com/main/name-used.php, so I do not know how Google found the others.

g1smd




msg:4305282
 10:31 am on Apr 28, 2011 (gmt 0)

The [^.,^/]+ pattern is a problem.

The correct syntax here is [^/.]+
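
To see why the original character class is a problem, here is a small Python sketch comparing the two patterns. It assumes Python's re module treats these character classes the same way Apache's regex engine does, which holds for this simple case as far as I know. The test paths are made up for illustration. Inside a character class only a leading ^ negates, so the comma and the second ^ are literal characters that the class also excludes.

```python
import re

# Original class: excludes the dot, the comma, the caret and the slash.
original = re.compile(r"^(main)/([^.,^/]+)\.php$")
# Corrected class: excludes only the slash and the dot.
corrected = re.compile(r"^(main)/([^/.]+)\.php$")


def matches(pattern, path):
    """Return True when the pattern matches the whole path."""
    return pattern.search(path) is not None


# An ordinary page name matches both patterns.
both_ok = matches(original, "main/page-name.php") and matches(corrected, "main/page-name.php")

# A name containing a comma or a caret is rejected only by the original.
comma_original = matches(original, "main/red,blue.php")
comma_corrected = matches(corrected, "main/red,blue.php")
caret_original = matches(original, "main/a^b.php")
caret_corrected = matches(corrected, "main/a^b.php")
```

For everyday page names the two patterns behave the same, so the fix is mostly about writing the class the way it was meant.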
g1smd




msg:4305283
 10:33 am on Apr 28, 2011 (gmt 0)

It is likely that you have redirects and rewrites in your .htaccess file and that you have the rules in the wrong order.

You should list all redirects before you list any internal rewrites.

If you get the order wrong then the redirect will expose the previously rewritten internal pointer back out on the web as a new URL.
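
A toy Python model may make the ordering problem easier to see. It is only a sketch of the behaviour described above, not how mod_rewrite is implemented. The function name process, the rule tuples and the page name shoes are all made up for illustration, and the path and query string are kept in one string for simplicity. It assumes, as happens in .htaccess context, that the rule list runs again after an internal rewrite.

```python
import re

# Each rule: (kind, host condition or None, pattern, substitution).
REWRITE = ("rewrite", None, r"^(main)/([^/.]+)\.php$",
           r"index.php?category_type=main&category_name=\2")
REDIRECT = ("redirect", r"^example\.com", r"^(.*)$",
            r"http://www.example.com/\1")


def process(host, path, rules, max_passes=10):
    """Apply rules in order; restart after a rewrite, stop at a redirect."""
    for _ in range(max_passes):
        for kind, host_re, pattern, target in rules:
            if host_re and not re.search(host_re, host, re.I):
                continue
            m = re.search(pattern, path)
            if not m:
                continue
            result = m.expand(target)
            if kind == "redirect":
                return ("301", result)  # the client sees this URL
            path = result               # internal rewrite
            break                       # run the rule list again
        else:
            return ("200", path)        # no rule matched: serve this path
    return ("500", path)                # too many passes


wrong_order = [REWRITE, REDIRECT]
right_order = [REDIRECT, REWRITE]

exposed = process("example.com", "main/shoes.php", wrong_order)
clean = process("example.com", "main/shoes.php", right_order)
served = process("www.example.com", "main/shoes.php", right_order)
```

With the rewrite listed first, the request for example.com/main/shoes.php is redirected to the index.php URL with parameters, which is how a crawler can discover a URL that was never linked. With the redirect first, the client is sent to the www version of the same friendly URL.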

sweetguyzzz




msg:4305290
 10:54 am on Apr 28, 2011 (gmt 0)

Options +FollowSymLinks
RewriteEngine On


#Redirect any / to index.html
RewriteCond %{REQUEST_URI} /[A-Za-z0-9-\./]+/$
RewriteRule (.*) /$1index.html [R=301,L]



#Redirect .com/main/page.php to the variable of $category_type and $category_name
RewriteRule ^(main)/([^.,^/]+)\.php$ /index.php?category_type=main&category_name=$2 [L]

#Redirect .com/main/payment-verify.php&id=has221 to the variable of $category_id and $category_name and others
RewriteRule ^(main)/([^.,^/]+)\.php(&.*)$ /index.php?category_type=main&category_name=$2&$3 [L]

#Redirect .com/123-categoryname/index.html to the variable of $category_id and $category_name
RewriteRule ^([0-9]*)\-([^.,^/]+)/index\.html$ /index.php?category_id=$1&category_name=$2 [L]

#Redirect .com/123-categoryname/12-pagename to the variable of $category_id, $category_name and others
RewriteRule ^([0-9]*)\-([^.,^/]+)/([0-9]*)\-([^/]+)$ /index.php?category_id=$1&category_name=$2&product_id=$3&product_name=$4 [L]



#For redirect user to www.example.com if they ask for example.com
RewriteCond %{http_host} ^example.com [nc]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,nc]


That is my whole .htaccess file.

g1smd




msg:4305300
 11:10 am on Apr 28, 2011 (gmt 0)

The non-www to www redirect as the very last item is a major problem. This should be the SECOND ruleset on the page (after the index vs. slash ruleset and before the first internal rewrite). It also needs the [L] flag.

The index / slash ruleset is completely wrong. Never redirect TO a named index page. The canonical URL for a folder or the index page of a folder ends in a trailing slash. Additionally, a redirect should include the domain name as part of the target URL.


This pattern is concerning:
(main)/([^.,^/]+)\.php(&.*)

Do you really have requests like example.com/main/payment-verify.php&id=has221 on your site?

The & and = should only be used in a query string. There is no query string here.

Finally, four of the things that you call redirects are actually internal rewrites.
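
The point about & and = belonging only in a query string can be checked with Python's standard URL parser. This is just an illustration using the example URL from this post. The variable names are mine.

```python
from urllib.parse import urlsplit, parse_qs

# With an ampersand, everything after the host is treated as the path.
with_ampersand = urlsplit("http://example.com/main/payment-verify.php&id=has221")

# With a question mark, the parser separates the path from the query string.
with_question_mark = urlsplit("http://example.com/main/payment-verify.php?id=has221")

ampersand_path = with_ampersand.path
ampersand_query = with_ampersand.query
question_path = with_question_mark.path
question_params = parse_qs(with_question_mark.query)
```

In the first URL there is no query string at all, so the id value is simply part of the file name as far as the server is concerned.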

[edited by: g1smd at 11:19 am (utc) on Apr 28, 2011]

sweetguyzzz




msg:4305303
 11:13 am on Apr 28, 2011 (gmt 0)

Thanks for your response.

Here is the result I want from each of the rewrite rules. If there is any mistake, kindly tell me the solution.

1) The rewrite condition and rule send any visitor to /folder-name/index.html if they ask for /folder-name/ only.

2) This shows the page index.php?category_type=main&category_name=any-category-name when anybody visits the link .com/main/any-category-name.php

3) This is the line I have a problem with. I am using a workaround, but I know it is not a good one. The purpose of this line is that when anybody opens the link .com/main/any-category-name.php?a=123, the request goes to the index page like this: index.php?category_type=main&category_name=any-category-name&a=123
BUT I found that the question mark in .php?a=123 causes a problem, so I changed it to an ampersand (&). It works now, but I really want a question mark there. If you have a solution for this, it would be a good learning point for me.

4) When the URL opened is .com/12-my-category/index.html, fetch the page from index.php?category_id=12&category_name=my-category

5) Same as 4, but with more query string parameters.

6) Force www in the URL.

Waiting for your response.
Thanks

sweetguyzzz




msg:4305305
 11:21 am on Apr 28, 2011 (gmt 0)

I have now made the following changes:
1) Moved the www-forcing ruleset to second position.
2) Added the [L] flag to the www-forcing rewrite rule.
3) Replaced [^.,^/]+ with [^/.]+

It looks like this:
Options +FollowSymLinks
RewriteEngine On



#Redirect any / to index.html
RewriteCond %{REQUEST_URI} /[A-Za-z0-9-\./]+/$
RewriteRule (.*) /$1index.html [R=301,L]



#For redirect user to www.example.com if they ask for example.com
RewriteCond %{http_host} ^example.com [nc]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,nc,L]



#Redirect .com/main/page.php to the variable of $category_id and $category_name
RewriteRule ^(main)/([^/.]+)\.php$ /index.php?category_type=main&category_name=$2 [L]

#Redirect .com/main/payment-verify.php?id=has221 to the variable of $category_id and $category_name and others
RewriteRule ^(main)/([^/.]+)\.php(&.*)$ /index.php?category_type=main&category_name=$2&$3 [L]

#Redirect .com/123-categoryname/ to the variable of $category_id and $category_name
RewriteRule ^([0-9]*)\-([^/.]+)/index\.html$ /index.php?category_id=$1&category_name=$2 [L]

#Redirect .com/123-categoryname/12-pagename to the variable of $category_id, $category_name and others
RewriteRule ^([0-9]*)\-([^/.]+)/([0-9]*)\-([^/]+)$ /index.php?category_id=$1&category_name=$2&product_id=$3&product_name=$4 [L]


Please let me know if anything else needs to be changed.

I am very thankful for this help.
sweetguyzzz

g1smd




msg:4305307
 11:28 am on Apr 28, 2011 (gmt 0)

The parameter handling needs a RewriteCond looking at QUERY_STRING.


RewriteRule ^([0-9]*)\-([^/.]+)/([0-9]*)\-([^/]+)$

The above pattern allows this to be a valid URL:

example.com/-words/-words

so change the * to + in the patterns.
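
The difference between * and + here can be sketched in Python. The patterns are the ones from this post, with the hyphens left unescaped. The variable names are mine, and I am assuming Python's re module behaves like Apache's regex engine for these quantifiers.

```python
import re

# [0-9]* allows zero digits, so the numeric IDs may be missing entirely.
star = re.compile(r"^([0-9]*)-([^/.]+)/([0-9]*)-([^/]+)$")
# [0-9]+ requires at least one digit in each ID.
plus = re.compile(r"^([0-9]+)-([^/.]+)/([0-9]+)-([^/]+)$")

star_no_ids = star.search("-words/-words")
plus_no_ids = plus.search("-words/-words")
plus_with_ids = plus.search("123-categoryname/12-pagename")
```

With *, the URL without IDs matches and both ID captures come back empty, so index.php would be called with blank category_id and product_id values. With +, that URL no longer matches the rule at all.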



Hyphens should not be escaped.


New code:

DirectoryIndex index.html index.php

Options +FollowSymLinks
RewriteEngine On

# Externally redirect only direct client requests for /index.php
# and /index.html and /index.htm to URL ending with slash.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

# Externally redirect to canonicalize the domain name if a non-canonical
# hostname is requested, in order to prevent duplicate-content problems
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

#REWRITE .com/main/page.php to the variable of $category_type and $category_name
RewriteCond %{QUERY_STRING} !.
RewriteRule ^(main)/([^/.]+)\.php$ /index.php?category_type=$1&category_name=$2 [L]

#REWRITE .com/main/payment-verify.php?id=has221 to the variable of $category_id and $category_name and others
RewriteCond %{QUERY_STRING} (^|&)id=([^&]+)(&|$)
RewriteRule ^(main)/([^/.]+)\.php$ /index.php?category_type=$1&category_name=$2&id=%2 [L]

#REWRITE .com/123-categoryname/ to the variable of $category_id and $category_name
RewriteRule ^([0-9]+)-([^/.]+)/$ /index.php?category_id=$1&category_name=$2 [L]

#REWRITE .com/123-categoryname/12-pagename to the variable of $category_id, $category_name and others
RewriteRule ^([0-9]+)-([^/.]+)/([0-9]+)-([^/.]+)$ /index.php?category_id=$1&category_name=$2&product_id=$3&product_name=$4 [L]
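
As a rough illustration of what the QUERY_STRING condition for the id parameter captures, here is the same regular expression in Python. The helper name extract_id is made up for this sketch. In the .htaccess code the second group is what %2 refers to.

```python
import re

# Same expression as the RewriteCond: id must start the query string or follow an &.
ID_PARAM = re.compile(r"(^|&)id=([^&]+)(&|$)")


def extract_id(query_string):
    """Return the value of the id parameter, or None when it is absent or empty."""
    m = ID_PARAM.search(query_string)
    return m.group(2) if m else None


first = extract_id("id=has221")
middle = extract_id("a=1&id=x&b=2")
lookalike = extract_id("pid=5")
empty = extract_id("")
```

The (^|&) prefix is what keeps a parameter such as pid from being mistaken for id.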

[edited by: g1smd at 11:30 am (utc) on Apr 28, 2011]

sweetguyzzz




msg:4305308
 11:28 am on Apr 28, 2011 (gmt 0)

g1smd, I have sent you a request on Skype. Can we talk there? It would be much faster. After you give me a solution, I will post it here.

g1smd




msg:4305310
 11:31 am on Apr 28, 2011 (gmt 0)

No Skype on this netbook.

sweetguyzzz




msg:4305315
 11:46 am on Apr 28, 2011 (gmt 0)

Hi,

Thanks again,
Can you please explain some points in your new code?

1) The use of DirectoryIndex index.html index.php
2) The use of the 1st RewriteCond. Please give a sample so I can understand it quickly. Maybe I am annoying you, but I have to understand Apache rewriting so that I will not need to bother you next time.
3) In the 3rd RewriteCond, what is the use of %{QUERY_STRING} !.
4) The 4th RewriteCond you sent is OK, but I want the following result from it.

When anyone requests .com/main/category-name.php?any-query-string-name=any-query-string-value, it should show the page index.php?category_type=main&category_name=category-name&any-query-string-name=any-query-string-value

Very much thankful to you

g1smd




msg:4305318
 11:57 am on Apr 28, 2011 (gmt 0)

1. This allows a request for www.example.com/ to show the index page content without needing "index" in the URL.

2. Redirects www.example.com/<anything>/index.php to www.example.com/<anything>/ ending in a trailing slash. Does the same for index.html and index.htm too.

3. The %{QUERY_STRING} variable holds the query string data. The test in 3. makes sure there is no query string ( !. is "not something" which is the same as "nothing"). Without it, the rule in 3. would match first and rule 4 could never run for requests containing the id parameter.

4. Try this code instead:
#REWRITE .com/main/payment-verify.php?id=has221 to the variable of $category_id and $category_name and others
RewriteCond %{QUERY_STRING} .
RewriteRule ^(main)/([^/.]+)\.php$ /index.php?category_type=$1&category_name=$2 [QSA,L]


The QSA flag re-appends the original query string.
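
Here is a minimal Python sketch of what the two QUERY_STRING rules do together. It is an approximation for illustration only: the function name rewrite_main is made up, and it folds the "no query string" rule and the QSA rule into one function.

```python
import re


def rewrite_main(path, query_string):
    """Mimic the /main/ rewrite, appending the original query string like QSA."""
    m = re.search(r"^(main)/([^/.]+)\.php$", path)
    if not m:
        return None  # the rule does not apply to this path
    new_query = f"category_type={m.group(1)}&category_name={m.group(2)}"
    # RewriteCond %{QUERY_STRING} .  is true when at least one character exists.
    if re.search(r".", query_string):
        new_query += "&" + query_string  # the part QSA adds
    return "/index.php?" + new_query


with_params = rewrite_main("main/payment-verify.php", "id=has221")
without_params = rewrite_main("main/page.php", "")
not_main = rewrite_main("other/page.php", "")
```

Because the original query string is appended as-is, any parameter name works, not just id.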

sweetguyzzz




msg:4305331
 12:20 pm on Apr 28, 2011 (gmt 0)

ok,

In RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
What does %{THE_REQUEST} return? I do not know about it.
What does [A-Z]{3,9} match?
What does \ HTTP/ mean?

Kindly clarify these, as this will be my last question.

Thanks in Advance
sweetguyzzz

g1smd




msg:4305332
 12:24 pm on Apr 28, 2011 (gmt 0)

Use the Live HTTP Headers extension for Firefox and you will see that when you type:
www.example.com/folder/index.html

your browser sends:
GET /folder/index.html HTTP/1.1
Host: www.example.com

to the server. THE_REQUEST holds the first of those lines, exactly as the client sent it.

The ^[A-Z]{3,9}\ part matches the GET, POST, HEAD, etc, part of the request.
Literal spaces have also been escaped, which is what the backslash in \ HTTP/ is for.
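
To experiment with the THE_REQUEST pattern outside Apache, the same expression can be tried in Python against sample request lines. The helper name is_direct_index_request and the sample lines are made up for this sketch. Because THE_REQUEST is not changed by internal rewrites, a match means the client itself asked for an index file.

```python
import re

# Same pattern as the RewriteCond, written as a Python raw string.
INDEX_REQUEST = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/")


def is_direct_index_request(request_line):
    """Return True when the request line names an index file directly."""
    return INDEX_REQUEST.search(request_line) is not None


named_index = is_direct_index_request("GET /folder/index.html HTTP/1.1")
root_index = is_direct_index_request("HEAD /index.php HTTP/1.1")
folder_only = is_direct_index_request("GET /folder/ HTTP/1.1")
other_page = is_direct_index_request("GET /folder/page.html HTTP/1.1")
```

Only the first two request lines match, so only those would be redirected to the URL ending in a slash.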


Your next move is to spend a few days reading previous threads where all of this stuff has been posted in great detail many times before, and the Apache manual and related tutorials.

sweetguyzzz




msg:4305336
 12:31 pm on Apr 28, 2011 (gmt 0)

Thank you very much for this great help. I will read the previous threads for these details.

sweetguyzzz




msg:4305480
 4:31 pm on Apr 28, 2011 (gmt 0)

Hi again,

I am quite confused, because what I am trying to achieve is now somewhat different from what I described above; some variable changes forced this. I have tried hard but cannot get it right. Can you please let me know the code for the following results?

1) If the request is for .com/index.html or .com/index.php or .com/index.htm, redirect it to .com/
2) If the request is for .com/12-category-name/, redirect it to .com/12-category-name/index.html, which gets its data from .com/index.php?category_id=12&category_name=category-name
3) If .com/12-category-name/index-2.html then get data from .com/index.php?category_id=12&category_name=category-name&page_number=2
4) If .com/12-category-name/54-page-name.html then get data from .com/index.php?category_id=12&category_name=category-name&page_id=54&page_name=page-name
5) If .com/main/page-name.html then get data from .com/index.php?category_type=main&page_name=page-name
6) If .com/main/page-name.html?qry_strg_name=qry_strg_value then get data from .com/index.php?category_type=main&page_name=page-name&qry_strg_name=qry_strg_value
7) .com/css/, .com/scripts/ and .com/includes/ are all folders uploaded to our site. If any of these folders is requested, we need to redirect to .com/css/index.html, .com/scripts/index.html and so on, to protect our files.

Please help me to solve this problem,

I will be very thankful to you.

Thanks
sweetguyzzz

sweetguyzzz




msg:4305565
 7:05 pm on Apr 28, 2011 (gmt 0)

DirectoryIndex index.html index.php

Options +FollowSymLinks
RewriteEngine On


#Redirect .com/index.html .php .htm to .com/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(html?|php)\ HTTP/
RewriteRule ^(.*)$ http://www.example.com/ [R=301,L]


#Redirect .com/12-category-name/ to .com/12-category-name/index.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([0-9]+)-([^/.]+)/\ HTTP/
RewriteRule ^(.*)/$ http://www.example.com/%1-%2/index.html [R=301,L]
#Rewrite .com/12-category-name/index.html to index.php?category_id=12&category_name=category-name
RewriteRule ^([0-9]+)-([^/.]+)/index\.html$ /index.php?category_id=$1&category_name=$2 [L]


#Rewrite .com/12-category-name/index-2.html to index.php?category_id=12&category_name=category-name&page_number=2
RewriteRule ^([0-9]+)-([^/.]+)/index-([0-9]+)\.html$ /index.php?category_id=$1&category_name=$2&page_number=$3 [L]


#Rewrite .com/12-category-name/54-page-name.html to index.php?category_id=12&category_name=category-name and further
RewriteRule ^([0-9]+)-([^/.]+)/([0-9]+)-([^/.]+)\.html$ /index.php?category_id=$1&category_name=$2&page_id=$3&page_name=$4 [L]


#Rewrite .com/main/page-name.html to index.php?category_name=main&page_name=page-name
RewriteCond %{QUERY_STRING} !.
RewriteRule ^(main)/([^/.]+)\.html$ /index.php?category_name=$1&page_name=$2 [L]


#Rewrite .com/main/page-name.html?any=any to index.php?category_name=main&page_name=page-name&any=any
RewriteCond %{QUERY_STRING} .
RewriteRule ^(main)/([^/.]+)\.html$ /index.php?category_name=$1&page_name=$2 [QSA,L]


#Redirect any .com/css/ to .com/css/index.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)+\ HTTP/
RewriteRule ^([^/]+/)+$ http://www.example.com/$1index.html [R=301,L]


# Externally redirect to canonicalize the domain name if a non-canonical
# hostname is requested, in order to prevent duplicate-content problems
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


I created the code above to get those results. It gives all the right results, but I am not sure whether it is completely correct or whether the rules are in the wrong order.

Waiting for your reply.

Thanks
sweetguyzzz

g1smd




msg:4305596
 8:13 pm on Apr 28, 2011 (gmt 0)

Check the code we worked on earlier because this new code has many of the same errors that I thought we had already discussed and fixed.

The first two (.*) patterns are incorrect. They need to be changed.

The rules are in the wrong order. Redirects always go first.

Redirecting TO index.html will likely come back to haunt you as a big problem at some time in the future.

sweetguyzzz




msg:4305753
 6:30 am on Apr 29, 2011 (gmt 0)

Hi,

OK, I understand your points. In the new code I have changed those first two patterns
and arranged the rules in the order below:
1) THE_REQUEST (according to my needs)
2) HTTP_HOST
3) QUERY_STRING

But can you tell me which index.html you mean when you say it will create a problem in the future: .com/12-category-name/index.html or .com/css/index.html?

If it will create a problem in the future then I want to change it, but I do not know the solution for either of them. If you can also tell me the solution, that would be very good.

I have changed the code, and the new code is as follows:

DirectoryIndex index.html index.php

Options +FollowSymLinks
RewriteEngine On


#Redirect .com/index.html .php .htm to .com/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(html?|php)\ HTTP/
RewriteRule ^index\.(html?|php)$ http://www.example.com/ [R=301,L]


#Redirect .com/12-category-name/ to .com/12-category-name/index.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([0-9]+)-([^/.]+)/\ HTTP/
RewriteRule ^([0-9]+)-([^/.]+)/$ http://www.example.com/$1-$2/index.html [R=301,L]


#Redirect any .com/css/ to .com/css/index.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)+\ HTTP/
RewriteRule ^([^/]+/)+$ http://www.example.com/$1index.html [R=301,L]


# Externally redirect to canonicalize the domain name if a non-canonical
# hostname is requested, in order to prevent duplicate-content problems
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


#Rewrite .com/main/page-name.html to index.php?category_name=main&page_name=page-name
RewriteCond %{QUERY_STRING} !.
RewriteRule ^(main)/([^/.]+)\.html$ /index.php?category_name=$1&page_name=$2 [L]


#Rewrite .com/main/page-name.html?any=any to index.php?category_name=main&page_name=page-name&any=any
RewriteCond %{QUERY_STRING} .
RewriteRule ^(main)/([^/.]+)\.html$ /index.php?category_name=$1&page_name=$2 [QSA,L]


#Rewrite .com/12-category-name/index.html to index.php?category_id=12&category_name=category-name
RewriteRule ^([0-9]+)-([^/.]+)/index\.html$ /index.php?category_id=$1&category_name=$2 [L]


#Rewrite .com/12-category-name/index-2.html to index.php?category_id=12&category_name=category-name&page_number=2
RewriteRule ^([0-9]+)-([^/.]+)/index-([0-9]+)\.html$ /index.php?category_id=$1&category_name=$2&page_number=$3 [L]


#Rewrite .com/12-category-name/54-page-name.html to index.php?category_id=12&category_name=category-name and further
RewriteRule ^([0-9]+)-([^/.]+)/([0-9]+)-([^/.]+)\.html$ /index.php?category_id=$1&category_name=$2&page_id=$3&page_name=$4 [L]


Can you please check this test of mine and tell me how many marks I got?

Waiting for my result.

Thanks
sweetguyzzz

g1smd




msg:4305774
 7:51 am on Apr 29, 2011 (gmt 0)

It looks OK, except that the rule for /css/ does not mention "css". It will redirect other folders too. Is that what you really want?
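
To see how broadly that folder rule matches, here is a small Python sketch of the pattern and its substitution. The helper name redirect_target and the folder names other than css and scripts are made up. It also surfaces something worth testing on the real server: in Python's re, a repeated group keeps only its last repetition, and I would expect Apache's regex engine to do the same, so a nested folder may lose its leading part in the redirect target.

```python
import re

# Same pattern as the "css" rule: one or more path segments, each ending in /.
folder_rule = re.compile(r"^([^/]+/)+$")


def redirect_target(path):
    """Mirror the substitution http://www.example.com/$1index.html."""
    m = folder_rule.search(path)
    if not m:
        return None
    return "http://www.example.com/" + m.group(1) + "index.html"


css = redirect_target("css/")
scripts = redirect_target("scripts/")
nested = redirect_target("images/icons/")
stylesheet = redirect_target("css/style.css")
```

Any single folder is redirected as intended, but in this sketch images/icons/ comes out as http://www.example.com/icons/index.html. Wrapping the whole repetition in an outer group, as in ^(([^/]+/)+)$, would keep the full path in the first capture.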

The next step is to thoroughly test the code, and watch the responses using the Live HTTP Headers extension for Firefox.

sweetguyzzz




msg:4305775
 7:53 am on Apr 29, 2011 (gmt 0)

Hi,

Yes, I want to redirect every request for a folder ending in / to its index.html, not only CSS, except for those I handle before this redirect rule.

BUT you are saying that:
Redirecting TO index.html will likely come back to haunt you as a big problem at some time in the future.


Could you please explain this for me?

Thanks

g1smd




msg:4306002
 5:10 pm on Apr 29, 2011 (gmt 0)

Google usually prefers to index www.example.com/folder/ and not www.example.com/folder/index.html.

Your redirect will clearly signal they need to do the reverse.

The danger comes if you ever change the name (as referenced in the URL) of the default document. In that case you'll need to change the existing redirect to point to that new document and you will also need an extra redirect from index.html to the new document. Search engines may well be confused by that for a while.

Serving the content at www.example.com/folder/ and redirecting all index document URLs to that canonical URL ending with a slash alleviates that problem and all future problems.

What you have now will work OK. Just be aware of the limitations if you change things around in the future.
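
For comparison, the slash-canonical approach recommended earlier in the thread can be sketched in Python. This is only an illustration of the URL mapping, using the pattern from the earlier index redirect. The helper name canonical_redirect is made up.

```python
import re

# Pattern from the earlier "index to slash" redirect: optional folders, then an index file.
INDEX_URL = re.compile(r"^(([^/]+/)*)index\.(html?|php)$")


def canonical_redirect(path):
    """Return the slash-ending URL an index request would be redirected to."""
    m = INDEX_URL.search(path)
    if not m:
        return None  # not an index document, so no redirect
    return "http://www.example.com/" + m.group(1)


folder_index = canonical_redirect("folder/index.html")
root_index = canonical_redirect("index.php")
deep_index = canonical_redirect("a/b/index.htm")
plain_folder = canonical_redirect("folder/")
```

Because the file name never appears in the canonical URL, renaming the default document later would not change any public URL.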

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved