Forum Moderators: Robert Charlton & goodroi
aakk9999: A 301 would be a long job. I am not much into coding and, as you said, there might be other spelling mistakes or other kinds of errors, so such URLs can still pop up.
www.example.com/1428383-this-fantastic-product as you can then deliberately post links like example.com/1428383 to Twitter and other places, knowing that your site will automatically redirect the request to the correct URL.
In Webmaster Tools, under Crawl Diagnostics, it lists something like 700 URLs blocked by robots.txt, and if it is something that is being measured by Google, I can't help but think that they are somehow using that information for something.
if it is something that is being measured by Google, I can't help but think that they are somehow using that information for something.
For years, I have simply blocked the long query string URLs via the robots.txt file.
The net result would be webmasters not blocking pages and letting Googlebot crawl zillions of extra pages.
Once Googlebot finds a non-canonical version of a URL, it will continue to crawl it forever.
Googlebot crawls them now when listed in robots.txt. They don't index, but they crawl. I've never been fond of robots.txt because Google interprets the guidelines literally. I've seen sites show thousands, even hundreds of thousands, of URI-only entries due to this crap.
If you are certain the application does not generate abnormal links then you're OK, because you don't care what others are injecting or if external sites manipulate your site's links.
The only way to get dupe content is if your domain somehow generates or recreates the duplicated links or if it's prone to URL poisoning.
Googlebot follows the line directed at it, rather than the line directed at everyone.
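To illustrate the point about user-agent groups, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content and the "OtherBot" name are made up for the example, and this shows how the stdlib parser resolves groups, not how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for every crawler, one for Googlebot only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search

User-agent: Googlebot
Disallow: /private
"""


def build_parser(robots_txt):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot matches its own group, so the "*" group's /search rule is not applied to it.
googlebot_search = parser.can_fetch("Googlebot", "http://www.example.com/search/widgets")

# A crawler without its own group falls back to the "*" group.
otherbot_search = parser.can_fetch("OtherBot", "http://www.example.com/search/widgets")

# Googlebot is still bound by the rule inside its own group.
googlebot_private = parser.can_fetch("Googlebot", "http://www.example.com/private/page")
```

If the intent is to keep Googlebot out of /search as well, the Disallow line would need to be repeated inside the Googlebot group.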
You have made this bold statement in several recent threads. It isn't true. If Google requests a URL and it returns "200 OK" then it is fair game for indexing.
I 301 redirect to remove *every* unknown or un-needed parameter.
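A rough sketch of that idea in Python, assuming a whitelist of parameters the site actually uses. The `ALLOWED_PARAMS` list, the `page` parameter, and the function name are hypothetical; a real site would wire the result into its own server or framework.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: the only query parameters this site understands.
ALLOWED_PARAMS = ("page",)


def strip_unknown_params(url, allowed=ALLOWED_PARAMS):
    """Return (301, clean_url) if the URL carries unknown parameters, else None."""
    parts = urlsplit(url)
    # keep_blank_values=True so bare junk like "?random-appended-junk" is seen too.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in allowed]
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    if clean == url:
        return None  # nothing to remove, serve the page normally
    return 301, clean
```

With this, `http://www.example.com/list?page=2&sessionid=abc` would be redirected to `http://www.example.com/list?page=2`, while a URL that only carries known parameters is served as-is.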
As long as you remove the problematic content on your site, you can ignore the external, spammy links to that content. Our algorithms understand that you can't control everything outside of your site, but we do expect that you could do that within your site.
Google lists the parameters they’ve found in the URLs on your site.
www.example.com/12345-this-cool-widget
example.com/12345-this-cool-widget
example.com/12345-the-old-name
example.com/12345-an-old-typo
example.com/12345-some-random-junk
www.example.com/12345-this-coo
www.example.com/12345-this-cool-widget/a>
www.example.com/12345-this-cool-widget?random-appended-junk
www.example.com:80/12345-this-cool-widget
example.com/12345
Is example.com/this-coo a typo for example.com/this-cool-widget or example.com/this-cool-gadget? Without the record number, we'll never know.
www.example.com/23456-exact-text-match is redirected to the canonical URL.
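The record-number approach from the examples above could be sketched roughly like this in Python. The `SLUGS` table stands in for a database lookup, and the `http://` scheme, the 404 for unknown requests, and the function name are assumptions made for the example rather than anything stated in the thread.

```python
import re

CANONICAL_HOST = "www.example.com"

# Hypothetical lookup table standing in for the product database.
SLUGS = {
    12345: "this-cool-widget",
    23456: "exact-text-match",
}

# Leading record number, followed by "-", "/", "?" or the end of the path.
_ID_RE = re.compile(r"^/(\d+)(?=[-/?]|$)")


def resolve(host, path_and_query):
    """Return (status, location); location is None unless status is 301."""
    match = _ID_RE.match(path_and_query)
    if match is None:
        return 404, None  # no record number, so there is nothing to look up
    record_id = int(match.group(1))
    slug = SLUGS.get(record_id)
    if slug is None:
        return 404, None  # unknown record number
    canonical_path = f"/{record_id}-{slug}"
    if host == CANONICAL_HOST and path_and_query == canonical_path:
        return 200, None  # exact match on the canonical URL, serve the page
    # Wrong host, old name, typo, truncation or appended junk: one redirect fixes all.
    return 301, f"http://{CANONICAL_HOST}{canonical_path}"
```

Only the record number is trusted; whatever text follows it is compared against the stored value and redirected if it differs, which is why a request such as example.com/this-coo cannot be resolved at all.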