|It took a few days to index those 600 pages (due to a bug when I upgraded Joomla) but you are telling me it can take years to remove them, that is terrible... |
Perhaps not years, but certainly much longer than it takes to index them.
|In other words, what is best to use: the disallow in robots.txt, the URL removal tool, or URL Parameters, or should I use all three at once to give myself the best chance and get the penalty removed as quickly as possible? |
If you want to remove these pages from Google's index, you should either noindex these pages (see above from JD_Toims) or block them in robots.txt AND use the URL Removal tool.
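For the noindex route, a minimal sketch of what goes in the <head> of each page you want dropped (this only works if the page is NOT also blocked in robots.txt, so Googlebot can still fetch the page and see the tag):

<!-- in the <head> of each page to be removed from the index -->
<meta name="robots" content="noindex">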
There have been several threads on the problem you had with your Joomla upgrade. Have you considered becoming a supporter and putting your site up for review in the Review my site [webmasterworld.com] forum? In that forum you can post the URL of your site and you would probably get more focused responses.
To comment on some questions I see you're asking that are not related to the URL Parameter function....
|I currently have pages indexed in Google with the following description: "A description for this result is not available because of this site's robots.txt – learn more." Is it because of the disallow I have? |
These generally result when a url is disallowed by robots.txt, but (as JD_Toims mentioned) there are existing links to the url from pages that are accessible to Googlebot.
The urls could have been generated by earlier versions of your CMS and gotten indexed. I have no idea if this is the case; it would take someone familiar with your CMS to identify the patterns. If it is the case, though, I'm not sure how you would use meta robots noindex to remove them, as these are variants of "pages" that no longer exist, so there would be nowhere to place the meta robots tags. It seems to me that you should have requests for such urls 301 redirected by the server to your current preferred "canonical" versions.
Note also that you can't combine meta robots noindex and robots.txt. A robots.txt disallow would prevent Googlebot from spidering the url, either to discover the meta robots noindex on the page... or to discover that the page is gone and the request returns a 404.
Again, this suggests to me that using 301 redirects on the server to a single canonical form might be the most efficient way of handling this... assuming that you can identify all of the patterns and likely url variants. If this is a problem that occurred widely during a Joomla upgrade, it's very likely that the patterns have been catalogued somewhere in the Joomla community.
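As a minimal sketch of that approach, assuming the stray urls do follow a recognisable pattern (both paths below are hypothetical placeholders, not anything from your site), a single .htaccess line can map a whole family of old urls onto their canonical equivalents:

# .htaccess (mod_alias); /old-component/ and /current-section/ are placeholder paths
RedirectMatch 301 ^/old-component/(.*)$ /current-section/$1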
PS: I'd be very wary about using the url removal tool.
|There have been several threads on the problem you had with your Joomla upgrade. Have you considered becoming a supporter and putting your site up for review |
At a minimum there's the Content Management [webmasterworld.com] forum on the free side. It's littered with Joomla-related questions.
I get the impression-- based partly on information from outside this thread-- that the underlying problem has to do with the CMS returning valid pages when given invalid values for legitimate parameters. This can't be fixed in gwt; it's a combination of htaccess (for existing problems) and fixing the upgrade.
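A hedged sketch of the htaccess half, assuming the duplicates are ordinary urls with an unwanted Itemid query string tacked on (if other query parameters need to be preserved, this is too blunt and would need refining):

# .htaccess (mod_rewrite); redirects e.g. /page.html?Itemid=123 to /page.html
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)Itemid=\d+ [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]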
Thank you for your answer about the line of code to add, but the issue is that I don't know which directory the problem is coming from, because googlebot has surfed our FTP in a certain way and created pages that I think are random (I am sure it is not random, but it is impossible to figure out which way it surfed and why). Is it still possible to use your method?
Can you give an example of what you would replace "the-path" and "to-the-directory" with, using for example www.cnn.com, so that I understand.
Header set X-Robots-Tag "noindex"
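To illustrate with the domain asked about: if, say, everything under www.cnn.com/video/ were the section to be noindexed (/video/ is just a stand-in directory here, not a recommendation about that site), the block quoted later in this thread would become:

<LocationMatch "^/video/">
    Header set X-Robots-Tag "noindex"
</LocationMatch>

Note that <LocationMatch> only works in the main server or vhost config; an .htaccess-friendly variant is sketched further down the thread.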
This morning my URL Parameters page is showing more URLs monitored than a week ago, when I decided to set NO URL on all my itemid parameters. Does it mean Google is finding new duplicate content pages?
|Thank you for your answer about the line of code to add, but the issue is that I don't know which directory the problem is coming from, because googlebot has surfed our FTP in a certain way and created pages that I think are random |
Personally, if it was for files/directories I was not using, I would likely go with a negative match along the lines of !^/something/i-use/ where ! means *not* a match to the pattern I normally use. Hope that makes sense.
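A minimal .htaccess sketch of that negative-match idea, assuming mod_setenvif and mod_headers are available and that /something/i-use/ stands in for the one directory you actually want kept in the index:

# flag requests that ARE under the directory in use
SetEnvIf Request_URI "^/something/i-use/" KEEP_INDEXED
# send the noindex header only when that flag is NOT set (the ! is the negative match)
Header set X-Robots-Tag "noindex" env=!KEEP_INDEXED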
|googlebot has surfed our FTP in a certain way |
Can someone translate, please? :(
|<LocationMatch "^/the-path/to-the-directory/to-noindex"> |
Header set X-Robots-Tag "noindex"
What the bleep? I thought all this was happening in htaccess on shared hosting.
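For what it's worth, <LocationMatch> isn't allowed in .htaccess (it's server/vhost config only), but on Apache 2.4 shared hosting a roughly equivalent .htaccess sketch would be (same hypothetical path as above):

<If "%{REQUEST_URI} =~ m#^/the-path/to-the-directory/to-noindex#">
    Header set X-Robots-Tag "noindex"
</If>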
The answer to this question:
"if selecting the NO URL in the URL Parameter will remove those"
Using the parameter tool doesn't affect what's already indexed.
Using robots.txt to Disallow: /folder
We can then use the removal tool to remove the directory from the index...
As in "domain/folder/"
Same process for "domain/folder/pages", "domain/folder/pages/2" and so on
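As a sketch of that two-step process ("folder" is a placeholder for the real directory name):

# robots.txt at the site root
User-agent: *
Disallow: /folder/

With that Disallow in place, a directory removal request for "domain/folder/" can then be submitted in the URL Removal tool, per the steps above; the robots.txt block is what the removal request relies on.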