

Google SEO News and Discussion Forum

    
Google Webmaster Tools Is Reporting Wordpress URL Redirects
Sgt_Kickaxe




msg:4466924
 10:09 pm on Jun 18, 2012 (gmt 0)

I have been combining and redirecting pages in order to alleviate some Panda effects on a small subset of my pages and have, apparently, stumbled onto a problem with a wordpress feature that I'm not sure how to best resolve.

If you didn't know, WordPress attempts to find the correct url when someone makes a mistake, and it automatically redirects visitors to what it believes to be the correct url. This isn't a problem in general; it does a good job. But when I redirect the CORRECT url manually, GWT gets confused by the leftover non-existent (never existed) ones and reports them as 404.

To visualize: an example of a CORRECT url

example.com/my-really-cool-and-super-useful-page

Example of urls that wordpress will automatically redirect via 301 to the above

example.com/my-really-cool-and-super-useful
example.com/my-really-cool-and-super
example.com/my-really-cool-and
example.com/my-really-cool

The problem
GWT does NOT report the extra versions of the same page because they redirect and resolve properly. When you redirect that page manually, however, GWT is left with a bunch of shortened URL versions that wordpress is NO LONGER automatically redirecting. I don't know if Google is automatically shortening urls to see if they resolve, but if a request returns anything other than a 404 it's reasonable to assume Google records data about that url, even if it never existed (because it's just a wordpress quirk).
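To make the before/after behaviour concrete, here is a toy model in Python. It is not WordPress code: it simply assumes the guess works as a prefix match against known slugs, and that the merged page's slug no longer exists once the manual redirect is in place. The function name `resolve` and the slug sets are hypothetical.

```python
def resolve(path, slugs):
    """Return (status, location) for a requested path.

    Toy model only: an exact slug match is served with 200; otherwise the
    first known slug that starts with the requested text gets a 301;
    otherwise the request is a 404.
    """
    slug = path.strip("/")
    if slug in slugs:
        return 200, None
    for candidate in sorted(slugs):
        if slug and candidate.startswith(slug):
            return 301, "/" + candidate
    return 404, None


# While the full page exists, a truncated request is guessed and redirected.
slugs = {"my-really-cool-and-super-useful-page"}
before = resolve("/my-really-cool-and", slugs)

# Assumption: after the pages are combined, the old slug is gone, so the
# guess has nothing left to match and the truncated URL falls through.
after = resolve("/my-really-cool-and", set())

print(before)  # (301, '/my-really-cool-and-super-useful-page')
print(after)   # (404, None)
```

Under these assumptions the truncated URL changes from a quiet 301 to a reported 404 without anything about the truncated URL itself having changed.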

Some concerns
- What, if anything, are all these extra redirected url versions doing to my rankings? To my Panda scores? etc
- What, if anything, can I do to get Google OR Wordpress to knock it off?
- Does Google think I have a ton of 301 pages because it checks for shortened url versions?

 

legaleagle




msg:4466930
 10:31 pm on Jun 18, 2012 (gmt 0)

Lots of CMSs and e-commerce systems have redirects built in for shortened urls. I wouldn't stress it, as long as you are not linking to the shortened urls I don't see why google would spider them in the first place.

netmeg




msg:4466933
 10:46 pm on Jun 18, 2012 (gmt 0)

I dunno; I have a boatload of WP sites and I haven't had any problems with this type of redirect.

g1smd




msg:4466934
 10:57 pm on Jun 18, 2012 (gmt 0)

I'm trying to figure something vaguely similar out on a CMS (not WordPress though) driven site.

Site has a couple of thousand products. Redesigned site had URL scheme changed from multiple parameters including product ID, category details and parent category details to simple extensionless URL with product ID and product name.

In the old URLs only the ID parameter decided which content was displayed, and the other parameter values went mostly unchecked (maybe altering just one or two words on the page or in the metadata) leading to infinite duplicate content on the old site. The other parameters could contain random values and the page would still display. Old URLs now redirect to new format via a rewrite to a special PHP script hooked up to the database to check which old product IDs are still valid and where they redirect to. Removed products now return 410. Invalid IDs now return 404.
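The status logic described above can be sketched roughly as follows. This is a Python model, not the PHP script from the site; the lookup tables, the product IDs and the `1001-blue-widget` URL shape are hypothetical stand-ins for the database check.

```python
def legacy_redirect(product_id, live, removed):
    """Map an old-format product ID to (status, location).

    live    -- dict of still-valid IDs to their new extensionless path
    removed -- set of IDs for products that were deliberately taken down
    """
    if product_id in live:
        # Valid product: one 301 straight to the single new URL.
        return 301, "http://www.example.com/" + live[product_id]
    if product_id in removed:
        # Product existed once but is gone for good.
        return 410, None
    # The ID was never valid.
    return 404, None


live = {"1001": "1001-blue-widget"}   # hypothetical data
removed = {"1002"}                    # hypothetical data

print(legacy_redirect("1001", live, removed))  # (301, 'http://www.example.com/1001-blue-widget')
print(legacy_redirect("1002", live, removed))  # (410, None)
print(legacy_redirect("9999", live, removed))  # (404, None)
```

Only the ID takes part in the decision, which matches the point that the other old parameters never controlled which content was shown.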

WMT reports and analytics data for the last few years indicated there were a number of duplicate content URLs for each product, in a variety of different formats. These all now redirect.

It's become apparent from looking at the logs since the new site launched that each product has 40 or 50 or more (sometimes a LOT more) old format URLs that now redirect to a single new URL per product. After a few weeks of spidering the new site eating a massive number of redirects, Googlebot crawling suddenly flatlined to a couple of hundred URLs per day. Google still requests large numbers of redirected URLs as well as hitting the odd old URL that returns 404 or 410 and just a few of the new URLs.

I'm wondering if the large number of redirects has triggered some sort of crawl budget limiter. I have not seen this behaviour before, but then again I've never seen a site with quite such a mess of a (previous) URL structure as this one.

Sgt_Kickaxe




msg:4466965
 12:36 am on Jun 19, 2012 (gmt 0)

I wouldn't stress it, as long as you are not linking to the shortened urls I don't see why google would spider them in the first place


I wasn't stressing it, until I made some manual redirects and have a boat load of shortened urls being reported in GWT as 404, meaning Google had crawled them even though I never linked to them. It's not until I changed things that they showed up and if I had never made any manual changes I would never have known about all the shortened urls Google knows about.

After doing some digging: adding this to the site's functions.php file stops wordpress from redirecting shortened versions of urls.

remove_filter('template_redirect', 'redirect_canonical');

g1smd




msg:4467100
 5:57 am on Jun 19, 2012 (gmt 0)

Does WMT report where the shortened URLs are supposedly linked from?

I see a number of truncated URLs and URLs with appended random junk appearing in the reports for a number of different sites. These links are usually from junk sites using scraped content. Sometimes they aren't even links on the other site but plain text that looks like a URL and Google has decided to request that to see if it works.

phranque




msg:4467107
 6:22 am on Jun 19, 2012 (gmt 0)

what exactly is a "manual redirect"?

have you checked your server access logs to see if you are getting any referred traffic for the truncated urls?
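One rough way to run that check is to pull the referrer for every request to a truncated url out of the access log. The Python sketch below assumes the Apache "combined" log format; the two sample lines, the IP addresses and the scraper hostname are made up purely to exercise the parser.

```python
import re
from collections import Counter

# Combined format: host ident user [time] "request" status bytes "referer" "agent"
LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"$'
)


def referers_for(lines, wanted_paths):
    """Count (path, referer) pairs for requests to the given paths."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group("path") in wanted_paths:
            counts[(m.group("path"), m.group("referer"))] += 1
    return counts


# Made-up sample lines, not real log data.
sample = [
    '203.0.113.5 - - [19/Jun/2012:06:00:00 +0000] "GET /my-really-cool HTTP/1.1" '
    '404 512 "http://scraper.example.net/page" "Mozilla/5.0"',
    '203.0.113.9 - - [19/Jun/2012:06:01:00 +0000] "GET /my-really-cool-and-super-useful-page HTTP/1.1" '
    '200 9000 "-" "Mozilla/5.0"',
]

hits = referers_for(sample, {"/my-really-cool"})
for (path, referer), n in hits.items():
    print(path, referer, n)
```

A referrer of "-" means the client sent none, which is what crawler requests usually look like.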

lucy24




msg:4467112
 6:31 am on Jun 19, 2012 (gmt 0)

Obvious question: Is this absolutely and definitely unrelated to mod_speling?

Sgt_Kickaxe




msg:4467245
 4:14 pm on Jun 19, 2012 (gmt 0)


Obvious question: Is this absolutely and definitely unrelated to mod_speling?

It has nothing to do with mod_speling. Wordpress has a canonical function that redirects pages to the correct url if you, for example, try to visit a shortened version or a version with missing words/parameters. It does this quick check before returning a 404 error code, and if it finds a likely match it redirects via 301 instead of serving the 404. The result is a lot of redirects for pages that have never existed and never will.

what exactly is a "manual redirect"?

When you assign an old url a 301 redirect code and point it to a new url. I do this manually via htaccess though there are some popular plugins that can do this for you.

Does WMT report where the shortened URLs are supposedly linked from?

No, it doesn't show where the url was linked from OR that it ever appeared in a sitemap file.

Added information: Many of the urls being reported in GWT are EXACTLY the same as the old ones with .. added to the end. I don't know why Google would be testing urls with a double period at the end of the url, or any other characters, but they are in GWT so...

I'd like to think none of this has any bearing on serps but from an SEO standpoint I don't want the extra 301's, and resulting 404's when I make changes, on pages that never existed.

Sgt_Kickaxe




msg:4467266
 4:59 pm on Jun 19, 2012 (gmt 0)

Update

I decided not to use the filter shown above and to allow wordpress to act like wordpress despite creating a lot of 301's on my behalf. If a visitor lands on the right page it's beneficial, imo.

I dealt with the trailing dots with htaccess, is there a more efficient way?
RewriteRule ^(.+)\.$ http://www.example.com/$1 [R=301,L]

I'm still uneasy about the sheer volume of 404's being reported in GWT, and 301's I don't even know exist not being reported, but as long as the user lands on the right page I can live with it.

phranque




msg:4467307
 6:21 pm on Jun 19, 2012 (gmt 0)

that RewriteRule pattern should be designed to handle an arbitrary number of trailing dots - as written, it looks like it takes two 301s to solve your trailing '..' problem.

phranque




msg:4467309
 6:25 pm on Jun 19, 2012 (gmt 0)

i would also be looking at page source for extraneous dots in href attribute values or any obvious (internal) url citations.

have you done a site crawl to see if you are internally linking to those urls that are getting redirected?
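A full crawler is beyond a forum post, but the per-page check could look something like this Python sketch, which flags any link whose path ends in a dot. The class name and the sample markup are hypothetical; the relative hrefs "." and ".." are deliberately ignored.

```python
from html.parser import HTMLParser


class DotLinkFinder(HTMLParser):
    """Collect <a href> values whose path ends in one or more dots."""

    def __init__(self):
        super().__init__()
        self.suspect = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Drop fragment and query string, then any trailing slash.
            path = value.split("#", 1)[0].split("?", 1)[0].rstrip("/")
            last = path.rsplit("/", 1)[-1]
            if path.endswith(".") and last not in (".", ".."):
                self.suspect.append(value)


# Hypothetical page source for illustration.
page = (
    '<p><a href="/my-page..">bad</a> '
    '<a href="/my-page">good</a> '
    '<a href="/file.html">ok</a> '
    '<a href="../">up</a></p>'
)

finder = DotLinkFinder()
finder.feed(page)
print(finder.suspect)  # ['/my-page..']
```

Feeding each crawled page through a finder like this would show whether the dotted urls originate on the site itself or only elsewhere.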

g1smd




msg:4467321
 7:13 pm on Jun 19, 2012 (gmt 0)

RewriteRule ^(.+)\.$ http://www.example.com/$1 [R=301,L]

The RegEx pattern reads to the end of the requested URL, then has to back off one step and retry. The redirect is issued and one trailing period is stripped.

For multiple periods there will be multiple chained redirects. This is a disaster.
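The chaining is easy to reproduce outside Apache. The Python sketch below applies the pattern from the thread repeatedly, counting one redirect per match, and compares it with an alternative pattern that consumes the whole run of dots. The alternative is an assumption for illustration, not something proposed in the thread; paths are written without the leading slash, as a per-directory RewriteRule would see them.

```python
import re

SINGLE = re.compile(r"^(.+)\.$")    # pattern from the thread: strips one dot
MULTI = re.compile(r"^(.*?)\.+$")   # assumed alternative: strips every trailing dot


def hops(path, pattern):
    """Return (number of redirects, final path) when the rule is re-applied."""
    count = 0
    while True:
        m = pattern.match(path)
        if not m:
            return count, path
        path = m.group(1)   # the redirect target becomes the next request
        count += 1


print(hops("my-page...", SINGLE))  # (3, 'my-page')
print(hops("my-page...", MULTI))   # (1, 'my-page')
```

With the original pattern every extra dot costs another round trip, which is the chained-redirect problem described above.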


I would do something completely different.
This RewriteRule goes near the beginning of the htaccess file:
RewriteRule \.$ /fixer.php [L]

In your non-www/www redirect add this line:
RewriteCond %{REQUEST_URI} !^/fixer\.php
otherwise you will expose "fixer.php" as a new URL when there is a non-www request with trailing periods.

In fixer.php detect the requested URL and clean it up using preg_match and preg_replace. Finally, use the header() function to send the 301 status and the new location, including protocol and domain name.
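The clean-up step could be modelled along these lines. This is a Python sketch of the logic only, not the fixer.php script itself; the function name, the default host and the handling of the query string are assumptions.

```python
import re


def fix_request(request_uri, host="www.example.com", scheme="http"):
    """Build the single 301 response for a request whose path ends in dots.

    Returns (status, headers). The Location is absolute, so the visitor
    gets the right protocol and hostname in one hop.
    """
    path, sep, query = request_uri.partition("?")
    # Strip the whole run of trailing dots at once; fall back to the root.
    cleaned = re.sub(r"\.+$", "", path) or "/"
    location = f"{scheme}://{host}{cleaned}{sep}{query}"
    return 301, {"Location": location}


print(fix_request("/my-page.."))
# (301, {'Location': 'http://www.example.com/my-page'})
print(fix_request("/my-page...?ref=1"))
# (301, {'Location': 'http://www.example.com/my-page?ref=1'})
```

Because all the dots are removed before the redirect is sent, any number of trailing periods resolves in a single 301.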
