URL does not exist

but Google is still crawling it


experienced

8:32 am on Aug 13, 2007 (gmt 0)

10+ Year Member



I am using mod_rewrite on one of my websites, which has 25 directories with one page inside each of them. Every URL is rewritten. The problem is that Google is crawling pages that do not exist. Correct examples:

domain.com/folder1/subfolder1/index.html
domain.com/folder2/subfolder2/index.html

But Google has picked up URLs like

domain.com/folder1/subfolder1/folder2/index.html
domain.com/folder1/subfolder1/folder2/subfolder2/index.html

which are totally wrong and do not exist. Because everything goes through mod_rewrite, a page opens even though no such page exists. I have only 50 pages on the site, but Google's count of indexed pages has reached 147.

Any help will be appreciated.

I have set up a custom 404 error page, but it does not work for these URLs. It only takes effect when the file name is wrong, not when the folder name is wrong.

Examples:

domain.com/folder1/subfolder1/folder2/ind11ex.html - error page
domain.com/folder1/subfolder1/folder2/su22bfolder2/index.html - page opens

experienced

11:28 am on Aug 13, 2007 (gmt 0)

10+ Year Member



Anybody? Please...

jdMorgan

1:26 pm on Aug 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



  • i am using mod rewrite
  • google is crawling pages that do not exist
  • because everything is in mod rewrite, a page will open even when no page exists (More correctly, no *file* exists, but the page (the URL) does indeed exist, because it opens)
  • i have set a custom error page 404 pages but that works only on the file name not folder

    Looking at this list, I'd say you have a bug in your mod_rewrite code... But you did not post the code.

    Jim

    experienced

    4:53 am on Aug 14, 2007 (gmt 0)

    10+ Year Member



    Here is the code. I am not sure whether I should paste all of it here or not. To repeat the problem: when I type a URL with a wrong folder name, the page opens even though no such file is present. Example URLs:

    correct
    domain.com/folder1/index.html
    domain.com/folder2/index.html
    domain.com/folder3/index.html

    wrong but page opens
    domain.com/folderrr1/index.html
    domain.com/folderrr2/index.html
    domain.com/folderrr3/index.html
    domain.com/folderrr1/folder123/index.html
    domain.com/folderrr1/folder374/index.html

    RewriteEngine on

    RewriteRule ^index.html$ index.php [L]
    RewriteRule ^aboutus.html$ aboutus.php [L]
    RewriteRule ^write-us.html$ write-us.php [L]
    RewriteRule ^list-your-business.html$ list-your-business.php [L]
    RewriteRule ^info.html$ info.php [L]
    RewriteRule ^advertiser-center.html$ advertiser-center.php [L]
    RewriteRule ^advertiser-login.html$ advertiser-login.php [L]
    RewriteRule ^featured-listings.html$ featured-listings.php [L]
    RewriteRule ^news.html$ news.php [L]
    RewriteRule ^toc.html$ toc.php [L]
    RewriteRule ^disclaimer.html$ disclaimer.php [L]
    RewriteRule ^sitemap.html$ sitemap.php [L]

    RewriteRule ^([a-z-]+)/index.html$ categories.php?cat_name=$1 [L]
    RewriteRule ^([a-z-]+)/([a-z-]+)/index.html$ innercat1.php?cat_name=$1&innercat1_name=$2 [L]
    RewriteRule ^([a-z-]+)/([a-z-]+)/([a-z-]+)/index.html$ innercat2.php?cat_name=$1&innercat1_name=$2&innercat2_name=$3 [L]
    RewriteRule ^([a-z-]+)/([a-z-]+)/([a-z-]+)/([a-z-]+)/index.html$ finalcat.php?cat_name=$1&innercat1_name=$2&innercat2_name=$3&finalcat_name=$4 [L]

    RewriteRule ^([a-z-]+)/([a-z-]+)/more([0-9]+)(.html)$ innercat1.php?cat_name=$1&innercat1_name=$2&start=$3 [L]
    RewriteRule ^([a-z-]+)/([a-z-]+)/([a-z-]+)/more([0-9]+)(.html)$ innercat2.php?cat_name=$1&innercat1_name=$2&innercat2_name=$3&start=$4 [L]
    RewriteRule ^([a-z-]+)/([a-z-]+)/([a-z-]+)/([a-z-]+)/more([0-9]+)(.html)$ finalcat.php?cat_name=$1&innercat1_name=$2&innercat2_name=$3&finalcat_name=$4&start=$5 [L]

    jdMorgan

    5:01 am on Aug 14, 2007 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I am not sure how literally to interpret your "folderrr1" in this URL:

    domain.com/folderrr1/index.html

    Does it actually contain the number "1"?

    If not, then this rule will be applied:

    RewriteRule ^([a-z-]+)/index.html$ categories.php?cat_name=$1 [L]

    and the request will be rewritten to categories.php, with cat_name set to the folder name from the URL (for example, cat_name=folderrr).

    The same is true for the other 'wrong' URLs as well.
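To see why the 'wrong' URLs are matched, here is a minimal sketch that tests the first catch-all pattern from the posted rules using Python's re module instead of Apache. The function name rewrite_target is made up for illustration, and the sketch assumes the rules live in a .htaccess file at the site root, so the path is matched without its leading slash.

```python
import re

# Python translation of the first catch-all rule from the posted .htaccess.
# Assumption: per-directory (.htaccess) context, so the path has no leading slash.
CATEGORY_RULE = re.compile(r"^([a-z-]+)/index.html$")


def rewrite_target(path):
    """Return the target this one rule would rewrite to, or None if it does not match."""
    match = CATEGORY_RULE.match(path)
    if match is None:
        return None
    return "categories.php?cat_name=" + match.group(1)


# A real category and a made-up one are both accepted by the pattern.
print(rewrite_target("real-estate/index.html"))
print(rewrite_target("real-estate-junk-text/index.html"))
# A digit is outside [a-z-], so this rule does not match.
print(rewrite_target("folderrr1/index.html"))
```

Any name made of lowercase letters and hyphens matches, so real-estate-junk-text is handed to categories.php exactly like real-estate. The pattern only checks the shape of the name, not whether the category exists.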

    If all 'wrong' URLs are rewritten to your script, then your script must decide if the cat_name is valid, and if not, then the script must return a 410-Gone or 404-Not Found response.

    Jim

    experienced

    5:24 am on Aug 14, 2007 (gmt 0)

    10+ Year Member



    Basically, folder1 is just an example. All of my folders have category names, like
    domain.com/computers/index.html
    domain.com/real-estate/index.html
    domain.com/internet/index.html

    If I try to open the URL domain.com/real-estate(any characters)/index.html, it opens instead of returning a 404 error. The critical problem is that the page that opens is the main page of the website, with all of the categories linked inside this bogus real-estate folder:

    domain.com/real-estate-junk-text/computers/index.html
    domain.com/real-estate-junk-text/internet/index.html
    domain.com/real-estate-junk-text/education/index.html

    It would be helpful if you could guide me.

    jdMorgan

    5:35 am on Aug 14, 2007 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    The RewriteRules are rewriting these URLs to your scripts, so in effect they all 'exist'.

    Therefore, I would suggest modifying your scripts to fully validate the parameters passed to them by the URL rewrites and to return correct error responses if the categories, inner categories, final categories, etc. are not valid (that is, they are not found in the database).
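As a rough illustration of that validation step, here is a language-neutral sketch in Python (the site's real scripts are PHP). The CATEGORY_TREE dictionary is a hypothetical stand-in for the category table in the database, the "hardware" sub-category is invented for the example, and status_for is a made-up helper name. It only shows the decision, not how the response is sent.

```python
# Hypothetical stand-in for the category hierarchy stored in the database.
# Each key is a valid category name; its value holds the valid names one level below.
CATEGORY_TREE = {
    "computers": {"hardware": {}},  # "hardware" is an invented example sub-category
    "real-estate": {},
    "internet": {},
}


def status_for(segments):
    """Return 200 if every segment is a real category under its parent, else 404."""
    node = CATEGORY_TREE
    for name in segments:
        if name not in node:
            # Unknown name at this level: the script should answer 404 (or 410).
            return 404
        node = node[name]
    return 200


print(status_for(["real-estate"]))                         # valid top-level category
print(status_for(["real-estate-junk-text", "computers"]))  # bogus parent folder
print(status_for(["computers", "hardware"]))               # valid two-level path
```

In the actual PHP scripts the same check would run against the database before any output is produced, and the error status would have to be sent before the page body, so that crawlers receive a real 404 or 410 instead of a normal page.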

    Jim

    experienced

    5:51 am on Aug 14, 2007 (gmt 0)

    10+ Year Member



    Thanks a lot for your advice :-)