
Content Management Forum

    
WordPress Trackbacks Causing Duplicate Content in Google
SOLVED: Disqus causing trackback URLs to get indexed
S0ha1L



 
Msg#: 4130044 posted 9:38 am on May 10, 2010 (gmt 0)

Hi, I have a WordPress blog. Yesterday I found out that Google is indexing the trackback URLs of almost every one of my posts. For example:

http://example.com/2010/05/post-title/trackback/

I am very worried about the situation as this is creating duplicate content issues. I have already installed the "no-self-pings" plugin and have disabled the following two options in the Discussion tab in wp-admin:
-->Attempt to notify any blogs linked to from the article (slows down posting.) [Disabled]
-->Allow link notifications from other blogs (pingbacks and trackbacks.) [Disabled]

I have also blocked trackback and feed URLs via my robots.txt:

Disallow: */trackback
Disallow: */feed
Disallow: */comments
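
For readers wondering what these wildcard rules actually cover, here is a minimal Python sketch of Google-style pattern matching (`*` matches any run of characters, a trailing `$` anchors the end of the URL path). The helper names are made up for illustration, and this is not a full robots.txt parser: it ignores User-agent groups and Allow rules.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate one Disallow pattern into a regex (Google-style wildcards)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, patterns):
    """True if any pattern matches the URL path from its start."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)

# The three rules quoted in the post above.
RULES = ["*/trackback", "*/feed", "*/comments"]
```

With these rules, `/2010/05/post-title/trackback/` is blocked and `/2010/05/post-title/` is not. Note that `*/feed` also catches a path such as `/feedback/`, so patterns this broad can block more than intended.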

I am also using a canonical URL for each of my posts :S

How can I completely disable this trackback and ping option in WordPress?

 

ergophobe

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4130044 posted 4:30 pm on May 10, 2010 (gmt 0)

I checked one of my blogs and I don't see any pages that show up with a "site:example.tld inurl:trackback" search, so normally I don't think it's an issue.

You could try setting up a 301 redirect for any */trackback URL. You'll need to do that in your .htaccess or httpd.conf since, by default, it looks like WordPress returns an "HTTP/1.1 302 Found" header when you click on a trackback link. I can't believe they haven't gotten this right yet!

The robots.txt solution will only get you so far. Just because a page doesn't get crawled, doesn't mean it won't potentially get indexed based on backlinks. You still might get indexed based on those and you still might get some dupe content problems because of the snippets that get thrown into the trackback.

Of course, you can't stop anyone else from using your trackback URL, so that's not a complete solution either. You can, however, remove it from your theme.

S0ha1L



 
Msg#: 4130044 posted 6:17 pm on May 10, 2010 (gmt 0)

Ok. Is there any chance that Google may index the trackback URLs of my future posts after restricting the crawl of Googlebot with robots.txt?

ergophobe




 
Msg#: 4130044 posted 6:54 pm on May 10, 2010 (gmt 0)

I don't know that I would say that. As I mentioned, do not confuse "disallow" with "noindex". They are not the same thing. Google can and will index URLs of pages that it is not allowed to crawl if those pages have inbound links with anchor text relevant to the user's search.
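
To make the distinction concrete, here is a toy Python model of that behaviour. It is only a sketch of the reasoning, not Google's actual algorithm, and the function and parameter names are invented for illustration.

```python
def may_appear_in_index(crawl_allowed, has_noindex, has_inbound_links):
    """Toy model: can a URL end up listed in the index?"""
    if not crawl_allowed:
        # The crawler never fetches the page, so it cannot see a noindex
        # directive. The bare URL can still be listed based on links to it.
        return has_inbound_links
    # The page is fetched, so a noindex directive is seen and honoured.
    return not has_noindex
```

In this model `may_appear_in_index(False, True, True)` is True: a noindex on a page that is blocked in robots.txt is never read, so the inbound links decide.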

I think it's rare to end up with trackback URLs indexed. I did a random survey of some popular blogs running WP and the vast majority have no results for the inurl:trackback search, but some do. I'm not sure why as some of the blogs I checked are definitely popular enough to have lots of inbound links and no doubt some that are to the /trackback URL.

When I look at one blog in particular, he has a lot of /trackback URLs indexed, and when I find one of those and search for it using a URL keyword (so searching on site:example.com inurl:keyword), it returns *only* the trackback page. When you click on the link, WordPress redirects and sends a "302 Found" (yes, no typo) header to the URL without the /trackback.

I would think that if most of your inbound links are to the /trackback page, you can help crawlers out with

1. The "canonical" link tag, i.e. rel="canonical" (I'm sure there's a plugin for WP that does just that, but you can try Headspace2, which I think offers this along with dozens of other functions).

2. Change your theme so it does not publish a trackback URL (look for trackback_url() in your theme files).

3. Make sure that /trackback URLs get 301 redirects at the server level before they get handled by WP with 302s.
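
Point 3 would normally be done with a rewrite rule in .htaccess or httpd.conf. As a rough illustration of the mapping such a rule needs to perform, here is a small Python sketch; `trackback_redirect` is a hypothetical helper, not part of WordPress or Apache.

```python
def trackback_redirect(path):
    """Return (301, target) for a /trackback URL, or None for anything else."""
    trimmed = path.rstrip("/")
    suffix = "/trackback"
    if not trimmed.endswith(suffix):
        return None
    # Strip the suffix and restore the trailing slash used by the post URL.
    target = trimmed[: -len(suffix)] + "/"
    return 301, target
```

Bear in mind that trackback pings arrive as POST requests to the same URL, so a blanket redirect like this only makes sense if you want trackbacks switched off entirely.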

S0ha1L



 
Msg#: 4130044 posted 9:50 pm on May 10, 2010 (gmt 0)

Have a look at this [google.com...]

ergophobe




 
Msg#: 4130044 posted 5:00 am on May 11, 2010 (gmt 0)

Yup, that's what I noticed too. Some of the most popular blogs have the same problem as you. Surprising.

S0ha1L



 
Msg#: 4130044 posted 7:27 am on May 11, 2010 (gmt 0)

Thanks ergophobe. My blog is now completely sealed off using robots.txt, so I was wondering whether some other website is using my trackback URL. I only use two services: one is Disqus Comments (I disabled the trackback feature) and the other is Retweet via TweetMeme. I think TweetMeme doesn't send trackbacks to WordPress, but Topsy does.

"Checking Topsy, it seems to already be sending trackbacks to wordpress. So they should appear as normal trackbacks. However IDC has a tendency to send them in moderation"

Quoted from [getsatisfaction.com...]

S0ha1L



 
Msg#: 4130044 posted 7:43 am on May 11, 2010 (gmt 0)

[topsy.com...]

Look at the page source:

<form action="/trackback" method="get" id="filter-form">

<input type="hidden" name="url" value="http://example.com/example-page/"/>
<div class="text-field">
<input type="text" name="contains" id="contains-input" value="" title="filter tweets"/>
<a href="#" id="filter-box-hide" class="filter-box-btn">Clear</a>
</div>
</form>

[edited by: ergophobe at 3:45 pm (utc) on May 11, 2010]
[edit reason] generalized example; formatting adjustment [/edit]

ergophobe




 
Msg#: 4130044 posted 3:54 pm on May 11, 2010 (gmt 0)

>>some other website is using my trackback URL


That's what I meant about the difference between controlling the crawl with robots.txt and controlling indexing with noindex. They are not equivalent.

Looking at the Topsy example, it always seems to generate URLs in the format

http://topsy.com/tb/example.com/blah/blah/trackpage=trackback

So I'm not seeing where Topsy itself is creating an inbound link to example.com/trackback

ergophobe




 
Msg#: 4130044 posted 3:55 pm on May 11, 2010 (gmt 0)

Are you signed up for Google Webmaster Tools? What does it tell you about your inbound links?

Have you checked your referrer logs to see if you have any inbound links (IBLs) to /trackback pages?
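
If it helps, here is a rough Python sketch for the log check. It assumes an Apache-style "combined" access log, where the referrer is the quoted field after the status and size; the regex and function name are illustrative, and your log format may differ.

```python
import re
from collections import Counter

# Request line, status, size, then the quoted referrer (combined log format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)

def trackback_referrers(lines):
    """Count referrers for requests whose path ends in /trackback."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        # Ignore any query string and trailing slash before checking the path.
        path = match.group("path").split("?", 1)[0].rstrip("/")
        referrer = match.group("referrer")
        if path.endswith("/trackback") and referrer not in ("", "-"):
            counts[referrer] += 1
    return counts
```

Usage would be something like `trackback_referrers(open("access.log"))`, where the file name is a placeholder; the most frequent referrers are the pages linking to your /trackback URLs.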

S0ha1L



 
Msg#: 4130044 posted 6:39 pm on May 11, 2010 (gmt 0)

Yup, I'm using Webmaster Tools. I just noticed something really interesting this morning. As soon as I removed over 600 404 pages (last week) using the Google Webmaster Tools removal tool, my indexed page count in Google rose to 831. There are 840 URLs in my sitemap. Before removing these 404 pages, Google was showing only 781 indexed URLs.

Should I remove the trackback restriction from my robots.txt? That would allow the crawler to see a 302 redirect to the original post. I wonder whether Google will index my page if it sees a 302 redirect?

Moreover, I haven't pinged or tweeted my most recent post. Let's see whether Google indexes its trackback URL or not. I am sure it's due to TweetMeme or Topsy.

PS: My blog traffic is improving dramatically. The duplicate content was really hurting my blog.

ergophobe




 
Msg#: 4130044 posted 7:23 pm on May 11, 2010 (gmt 0)

>>see a 302 redirect

It should be a 301

S0ha1L



 
Msg#: 4130044 posted 7:16 am on May 12, 2010 (gmt 0)

Oops, Google indexed the trackback URL of my new post again. It was never tweeted or pinged. Crawling of trackback URLs on my site is blocked via robots.txt. Now the only possibility left is the feeds. I have also taken precautions to block Google from indexing my feeds, via robots.txt as well as some options in FeedBurner itself. As far as I can tell, some website is tracking my feed: as soon as I publish a post, this website copies my content and sends a trackback. I think FriendFeed is the first one to look at.

S0ha1L



 
Msg#: 4130044 posted 7:17 am on May 12, 2010 (gmt 0)

I would also like to mention that Google started indexing my trackback URLs as soon as I installed Disqus comments on my blog.

S0ha1L



 
Msg#: 4130044 posted 9:59 pm on May 19, 2010 (gmt 0)

[SOLVED] Disqus Comment System was the real culprit. 0 trackbacks indexed after removal of Disqus from my blog.

ergophobe




 
Msg#: 4130044 posted 10:41 pm on May 19, 2010 (gmt 0)

Thanks for following up with the confirmation

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved