
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
SEO-Friendly URLs Killed My Site - But I Overcame
wesmaster




msg:4321885
 9:01 am on Jun 4, 2011 (gmt 0)

I wanted to share a success I had with getting Google to reindex my SEO-friendly URLs that they didn't seem to like when I created them 2 years ago:

I run a website that has 4 distinct categories, Movies, Books, Music, and Websites. Two years ago I decided to make my URLs SEO-friendly for the Website portion of my site. This section of the site lists websites that are similar to other websites. My URLs originally were www.example.com/view-sites.php?id=1000. I changed them to www.example.com/view-sites/domain.com. I 301'd the pages and modified my XML sitemaps, etc.

Google quickly removed almost all of the indexed pages whose URLs I had changed. For some reason, though, Google kept a few hundred pages indexed under extended querystring versions of the URLs. So for some websites Google listed www.example.com/view-sites.php?id=1000&param2=2000, but never listed the fancy URL or the original URL. They deindexed probably 98% of the URLs for this section of the website.

I posted on this forum when it happened, and most people said I just needed to wait it out. Six months later there was no improvement. Traffic had dropped dramatically, and I submitted a reinclusion request every few months; I never got a response. I also used the Sitemaps settings to "block" some of the querystring parameters Google was including in the URLs it listed, hoping it would list the friendly URL instead. But nothing got Google to list the fancy version of the URL, or to relist the thousands of website pages it had deindexed. I tried a few on-page changes that might have caused me to be filtered, like removing outbound links that could point to bad neighborhoods, but nothing I did ever made a difference.

It had been over 2 years since I made the first URL change. I had given up, but recently I decided that making another big URL change wouldn't hurt, since I had very little traffic to this section of the website, and it just might get Google to index the pages. I took advantage of the opportunity to make the URLs what I wished I had made them the first time around. Previously my SEO-friendly URLs did not end in a forward slash; other sites with similar content used that format with success. But since I was trying to get Google to completely change its mind about my URLs, I added a trailing slash to the new URLs (with a 301 redirect when the slash is omitted). I of course 301'd the old URLs to the new URLs.

So now the URLs are www.example.com/website/domain.com/.
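For readers wanting to replicate the redirect chain described above, here is a minimal .htaccess sketch of this kind of setup. It assumes Apache mod_rewrite; the `domain` parameter and the rule details are illustrative, not wesmaster's actual code:

```apache
RewriteEngine On

# 301 the intermediate friendly URLs (/view-sites/domain.com) to the new format
RewriteRule ^view-sites/([^/]+?)/?$ /website/$1/ [R=301,L]

# 301 the new format without a trailing slash to the slash-terminated version
RewriteRule ^website/([^/]+)$ /website/$1/ [R=301,L]

# Internally map the canonical URL onto the real script (no redirect)
RewriteRule ^website/([^/]+)/$ /view-sites.php?domain=$1 [L]
```

Note that the original /view-sites.php?id=1000 URLs would still need their 301 issued at the application level, since mapping a numeric id to a domain name requires a database lookup that mod_rewrite alone can't do.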

Immediately Google began to index the new URLs. It's only been a few weeks but now we have 15K URLs indexed that had disappeared for 2 years!

I just wanted to share my final success in case anyone else has had a similar problem.

 

Robert Charlton




msg:4322068
 8:35 pm on Jun 4, 2011 (gmt 0)

wesmaster - Congratulations, and thanks for sharing.

For those who have been following along on your adventure and might still be trying to sort it out, here are your earlier posts on the topic, starting at the beginning, back in August, 2009....

Rewritten URLs removed from Google
http://www.webmasterworld.com/google/3968906.htm [webmasterworld.com]

Google Won't Index My Rewritten URLs
http://www.webmasterworld.com/google/4102593.htm [webmasterworld.com]

Sub-page Ranks Best for Domain Search
http://www.webmasterworld.com/google/4106826.htm [webmasterworld.com]

I'm sure there are still some unanswered questions about what the problem might have been.

brinked




msg:4322072
 9:03 pm on Jun 4, 2011 (gmt 0)

I find your situation fascinating.

If I understand correctly, you originally changed your URLs to:

www.example.com/view-sites/domain.com

and Google did not index them at all until you changed them again to:

www.example.com/website/domain.com/

I have read that having a forward slash at the end of URLs is best practice; however, I see URLs without the trailing slash ranking all the time.

Perhaps Google saw "view-sites" vs. "website" as an attempt to stuff unneeded keywords into the URL?

Are there any other variables in play here? Anything you may have done differently this time than the first attempt?

Congrats on your recovery...it really is a wonderful feeling.

g1smd




msg:4322080
 9:23 pm on Jun 4, 2011 (gmt 0)

I can only assume the previous version had some technical errors in the implementation. There's not one correct way to do this stuff, but there are many wrong ways!

As for URL formats, note these long-standing URL conventions:

A URL ending in a trailing slash is for a folder or for the index page in a folder.

The URL for a page should not end in a trailing slash, and it may or may not have an optional extension.

URLs for images, stylesheets and files (ZIP, PDF, etc) and so on usually require an extension.

wesmaster




msg:4322123
 2:53 am on Jun 5, 2011 (gmt 0)

Robert, thanks for the links.

G1smd, your assumption is wrong, unfortunately. There was no technical error or issue with the previous implementation. Every other search engine correctly indexed the new rewritten URLs, although the traffic from them is obviously not worth discussing. G Sitemaps never reported any problems, etc. I run/own more than 10 websites, and half of them use rewritten URLs. This site, one of our first (developed back in 2006), wasn't rewritten from the start, but the implementation of the rewrite was definitely not flawed.

I did think that G might have thought the pages were over SEO'd once the URLs became friendly. I did, over time, lessen some of the on-page SEO (nothing that was in violation) to see if that made a difference. It didn't.

Regardless, I think my point is that if all else fails, a "big" change might actually be worth trying even though it's believed that letting G sort out the issue is best.

One thing I forgot to mention, I made the URL change after my last reinclusion request actually got some sort of response saying that there was no manual filter of the pages.

deadsea




msg:4322159
 8:59 am on Jun 5, 2011 (gmt 0)

Clearly Google doesn't like urls ending in .com. I have a theory.

.com files are Windows executable files. Google wouldn't want to index files that are an easy attack vector for hacking Windows computers. Internet Explorer is famous for ignoring the MIME type in favor of the extension on a URL. Maybe Google filters out such URLs for the safety of Windows IE users. If they didn't, you could hack together a web page with a .com extension and a text/html MIME type that would still look enough like an executable that IE would run it in some cases.

This theory doesn't hold up as well if your .net and .org pages also got no traffic; in that case Google would have to be filtering all unknown extensions, not just .com.

I've played around with the Google appliance boxes that they sell for intranet search. They have a pared-down version of the Google algorithm that you can use to make your company's intranet searchable. The configuration for those boxes certainly has options to filter by file extension instead of just by MIME type.

martinibuster




msg:4322161
 9:16 am on Jun 5, 2011 (gmt 0)

As I read your post I also thought along the same lines as deadsea: that the .com file name was the issue, and that the WHY was a matter for speculation.

wesmaster




msg:4322163
 9:26 am on Jun 5, 2011 (gmt 0)

I considered that early on, but I found my competitors ranked well with their URLs ending in ".com", in the exact same format (domain/page/sitename). In fact, I even researched some of the prominent websites that do nothing but list whois-type information on domains, and many of them had the same format. I ruled that out early on based on finding tons of rewritten URLs ending in .com that performed well.

g1smd




msg:4322169
 9:52 am on Jun 5, 2011 (gmt 0)

The .com filename certainly was an issue in the past. There was a write-up on SEOmoz about certain file extensions causing problems some 2 or 3 years ago. There was also a post saying the problem was being fixed.

Robert Charlton




msg:4322321
 11:34 pm on Jun 5, 2011 (gmt 0)

There's a Matt Cutts 2008 blog post, prompted by an excellent bit of research by Jane Copland on the SEOmoz blog [seomoz.org...], that covers Google's avoidance of some filename extensions....

Don't end your urls with .exe
http://www.mattcutts.com/blog/dont-end-your-urls-with-exe/ [mattcutts.com]

The SEOmoz post observed that Google wasn't indexing pages with filenames ending in the number zero. As Matt noted, Google avoids crawling pages with file extensions "that are mostly binary data, such as .exe", and he suggests using the filetype: query to check what Google will crawl and index....

There's a simple way to check whether Google will crawl things with a certain filetype extension. If you do a query such as [filetype:exe] and you don't see any urls that end directly in ".exe" then that means either 1) there are no such files on the web, which we know isn't true for .exe, or 2) Google chooses not to crawl such pages at this time usually because pages with that file extension have been unusually useless in the past. So for example, if you query for [filetype:tgz] or [filetype:tar], you'll see urls such as "papers.ssrn.com/pape.tar?abstract_id" that contain ".tar" but no files that end directly in .tar. That means that you probably shouldn't make your html pages end in .tar.

The SEOmoz folks stumbled across this when they had a url that ended with "/web2.0". It looks like previously they had a url that looked like "/web2.0/" (note the trailing slash), which we were happy to crawl/index/rank. But when their linkage shifted enough that "/web2.0" became their preferred url, Google wouldn't crawl urls ending in ".0", so the page became uncrawled.

The above examples would explain the extended querystring versions of the URLs that wesmaster noted: a URL like view-sites.php?id=1000&param2=2000 doesn't end in ".com", so it would escape an extension-based filter.

Matt also mentions that Google decided to change this particular situation and to revisit similar crawling preferences....

Google is willing to revisit old decisions and test them again, which is what we're doing with the ".0" filetype extension.

It's possible that Google's avoidance of binary extensions may have been involved in wesmaster's original situation. Currently, though, files ending in ".com" are common in Google. There are roughly 120-million results reported to satisfy the query [filetype:com].

So it could be a question of timing... Google has since made changes, but wesmaster's change fixed his problem before Google changed things... or it may be that wesmaster's changes fixed some errors in his code.

I'm curious about feedback on timing, and whether wesmaster was seeing other pages like his ending with .com in the serps before he made his last changes. There are clearly plenty of .com page url extensions now. A date for when wesmaster's changes went live would help track this down.

With regard to the trailing slash as a general practice....

A URL ending in a trailing slash is for a folder or for the index page in a folder.

The URL for a page should not end in a trailing slash, and it may or may not have an optional extension.

I completely agree with this, but it's worth noting that it's been common practice for some blog software to do it the other way... ie, to end page urls with trailing slashes. I feel it can lead to problems, but it's often what's done.

g1smd




msg:4322326
 11:58 pm on Jun 5, 2011 (gmt 0)

Ah, yes, it was .exe rather than .com, but interesting circumstances nonetheless.

deadsea




msg:4322452
 12:07 pm on Jun 6, 2011 (gmt 0)

I looked through the top 30 results on Google for
allinurl:webmasterworld.com

About 10 of them are webmasterworld.com itself
About 15 of them are review sites that have pages for webmasterworld.com with the ".com" extension (no trailing slash).

Clearly ending urls in ".com" can work. Either it is something else about the implementation, or ending pages in ".com" is something that Google allows for some (but not all) sites.

wesmaster




msg:4322583
 7:15 pm on Jun 6, 2011 (gmt 0)

Indexed pages, as reported in G Sitemaps, is up to 38K, out of 75K total pages submitted in sitemaps for this category. Now I wonder if G will ever restore my rankings for these pages. My site often showed up right under the target site's listing when you Googled their domain name. Only time will tell; so far traffic increases have been minor.
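For anyone tracking a similar migration, the sitemap side of the change is just resubmitting the slash-terminated URLs in place of the old entries. A minimal illustrative fragment (the URLs are placeholders following wesmaster's pattern, not his actual sitemap):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- new canonical form, with the trailing slash -->
    <loc>http://www.example.com/website/domain.com/</loc>
  </url>
  <!-- the old http://www.example.com/view-sites/domain.com entries are dropped,
       since those URLs now 301 to the new form -->
</urlset>
```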

martinibuster




msg:4322699
 12:40 am on Jun 7, 2011 (gmt 0)

I looked through the top 30 results on Google for
allinurl:webmasterworld.com


allinurl searches, like backlink searches, don't really tell you what is going on in the actual SERPs (backlink searches are samples, not representative of what is powering a ranking). ;)

It used to be a popular sport to ask why a site ranks so well for allinurl, allintitle, etc., but not in the SERPs. Well, that's why: those searches are not directly relevant to the SERPs.

brinked




msg:4325280
 8:54 pm on Jun 12, 2011 (gmt 0)

I just want to report back on this. When I read this thread I took a particular interest in it, because about 2 months ago I changed my URL structure in a very similar fashion to wesmaster. My new URLs were something like www.widgets.com/widget/details/widget-name-here. Previously I was ranking top ten for many of these pages, but since the new URL structure I was nowhere to be found, not even in the top 10 pages.

I added a forward slash to the end of the URLs like wesmaster did, and today I am starting to recover for these URLs. Granted, they are still being crawled, but today I have seen a lot of these terms recover to page 3, and I expect them to move up in the weeks to come.

A big thank you to wesmaster for sharing his experience.

g1smd




msg:4325297
 10:23 pm on Jun 12, 2011 (gmt 0)

I added a forward slash to the end of the URLs like wesmaster did, and today I am starting to recover for these URLs.

I wouldn't want WebmasterWorld to be the creator of an urban myth, so unless there's some crazy bug in Google's system, this is merely coincidence. :)

brinked




msg:4325298
 10:28 pm on Jun 12, 2011 (gmt 0)

g1smd,

I don't think coincidence can explain it away. I changed the URLs over 2 months ago and they never ranked. After I made the change and added the trailing slash, as soon as the URLs were crawled and cached by Google they started ranking again.

There are likely other factors at play here, but this is hardly a coincidence.

walkman




msg:4325299
 10:56 pm on Jun 12, 2011 (gmt 0)

I don't think coincidence can explain it away. I changed the URLs over 2 months ago and they never ranked. After I made the change and added the trailing slash, as soon as the URLs were crawled and cached by Google they started ranking again.

There are likely other factors at play here, but this is hardly a coincidence.

Since you're sure, I'm going to speculate that maybe Google included some code to stop proxies and "what's-this-site-worth" review sites that use /site.com as the URL? The algo must have lots and lots of lines by now, and some may no longer make sense and can trap others.

I know you, Brinked, don't like speculation, or "too much speculating and theorizing" [webmasterworld.com...], but that's all I can offer.

anand84




msg:4325873
 12:18 pm on Jun 14, 2011 (gmt 0)

I'm a bit naive in this area. But could someone explain if this "theory" should also hold true - if it indeed does - for URLs ending with extensions like .php and .html?

wesmaster




msg:4326137
 9:10 pm on Jun 14, 2011 (gmt 0)

I don't know what to attribute this to, but G Sitemaps is reporting a higher percentage of indexed pages than I EVER had before, even with the querystring URLs. Of the 75,657 submitted URLs in this section of the website, 72,086 are indexed. Previously G would ignore about 40-50% of the URLs for some reason.

brinked




msg:4331076
 9:04 pm on Jun 25, 2011 (gmt 0)

I just want to report back that all the new 301'd URLs have recovered and are now ranking in pretty much the exact same positions as before.

It all started happening as soon as I added a forward slash to the end of the URLs. Coincidence or not, it worked for me, and that is all I care about.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved