Forum Moderators: ergophobe

Wordpress Trackbacks Causing Duplicate Content in Google

SOLVED: Disqus causing trackback URLs to get indexed

   
9:38 am on May 10, 2010 (gmt 0)

5+ Year Member



Hi, I have a WordPress blog. Yesterday I found out that Google is indexing the trackback URLs of almost every one of my posts. For example:

http://example.com/2010/05/post-title/trackback/

I am very worried about the situation, as this is creating duplicate content issues. I have already installed the "no-self-pings" plugin and have disabled the following two options in the Discussion tab in wp-admin:
-->Attempt to notify any blogs linked to from the article (slows down posting.) [Disabled]
-->Allow link notifications from other blogs (pingbacks and trackbacks.) [Disabled]

I have also blocked trackback and feed URLs via my robots.txt:

Disallow: */trackback
Disallow: */feed
Disallow: */comments
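
For reference, here is a minimal sketch of how wildcard rules like these could be matched against a URL path. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path, and everything else is a prefix match). `rule_matches` is a hypothetical helper written for illustration, not part of any robots.txt library, and other crawlers may read these patterns differently.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumption: Google-style wildcards, where '*' matches any sequence
    of characters and a trailing '$' anchors the match at the path end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and put '.*' wherever the pattern had '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives prefix-match behaviour.
    return re.match(regex, path) is not None


rules = ["*/trackback", "*/feed", "*/comments"]
for path in ["/2010/05/post-title/trackback/", "/2010/05/post-title/"]:
    blocked = any(rule_matches(rule, path) for rule in rules)
    print(path, "blocked" if blocked else "allowed")
```

A match here only says whether a crawler honoring the rule would skip fetching the URL.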

I am also using a canonical URL for each of my posts :S

How can I completely disable this trackback and ping option in WordPress?
4:30 pm on May 10, 2010 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I checked one of my blogs and I don't see any pages that show up with a "site:example.tld inurl:trackback" search. So normally, I don't think it's an issue.

You could try setting up a 301 redirect for any */trackback URL. You'll need to do that in your .htaccess or httpd.conf since, by default, it looks like WordPress returns an "HTTP/1.1 302 Found" header when you click on a Trackback. I can't believe they haven't gotten this right yet!

The robots.txt solution will only get you so far. Just because a page doesn't get crawled, doesn't mean it won't potentially get indexed based on backlinks. You still might get indexed based on those and you still might get some dupe content problems because of the snippets that get thrown into the trackback.

Of course, you can't stop anyone else from using your trackback URL, so that's not a complete solution either. You can, however, remove it from your theme.
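
To see which header a trackback URL actually returns, something like the following sketch could be used. It relies only on Python's standard `http.client`, which does not follow redirects on its own. The function names `fetch_status` and `describe` are made up for this example, and the URL shown is just the placeholder from this thread.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_status(url: str) -> Tuple[int, Optional[str]]:
    """Send a HEAD request; return (status, Location) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status: int, location: Optional[str]) -> str:
    """Summarize whether a response is a permanent or temporary redirect."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    return f"no redirect (status {status})"


# Offline illustration using the header values described in this thread.
print(describe(302, "http://example.com/2010/05/post-title/"))
```

Against a live blog you would call `describe(*fetch_status("http://example.com/2010/05/post-title/trackback/"))` with a real post URL in place of the placeholder.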
6:17 pm on May 10, 2010 (gmt 0)

5+ Year Member



Ok. Is there any chance that Google may index the trackback URLs of my future posts after restricting the crawl of Googlebot with robots.txt?
6:54 pm on May 10, 2010 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I don't know that I would say that. As I mentioned, do not confuse "disallow" with "noindex". They are not the same thing. Google can and will index the URLs of pages that it is not allowed to crawl if those pages have inbound links with anchor text relevant to the user's search.

I think it's rare to end up with trackback URLs indexed. I did a random survey of some popular blogs running WP and the vast majority have no results for the inurl:trackback search, but some do. I'm not sure why as some of the blogs I checked are definitely popular enough to have lots of inbound links and no doubt some that are to the /trackback URL.

When I look at one blog in particular, he has a lot of /trackback URLs indexed, and when I find one of those and search for it using a URL keyword (so searching on site:example.com inurl:keyword), it returns *only* the trackback page. When you click on the link, WordPress redirects and sends a "302 Found" (yes, no typo) header to the URL without the /trackback.

I would think that if most of your inbound links are to the /trackback page, you can help crawlers out with

1. The rel="canonical" link tag (I'm sure there's a plugin for WP that does just that, but you can try Headspace2 which I think offers this along with dozens of other functions).

2. Change your theme so it does not publish a trackback URL (look for trackback_url() in your theme).

3. Make sure that /trackback URLs get 301 redirects at the server level before they get handled by WP with 302s.
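
The mapping behind step 3 can be sketched in a few lines. This is Python rather than actual .htaccess or httpd.conf syntax, and `trackback_redirect` is a hypothetical name. It only illustrates which requests would get a 301 and where they would be sent. It assumes trackbacks are disabled, so only GET and HEAD requests are redirected and a POST (a real trackback ping) is left alone.

```python
import re
from typing import Optional, Tuple

# Matches a path ending in /trackback or /trackback/ and captures the post path before it.
TRACKBACK_RE = re.compile(r"^(?P<post>/.*?)/trackback/?$")


def trackback_redirect(path: str, method: str = "GET") -> Optional[Tuple[int, str]]:
    """Return (301, target_path) for a trackback URL, or None to pass the request through."""
    if method not in ("GET", "HEAD"):
        return None  # assumption: POSTed trackback pings are not redirected
    match = TRACKBACK_RE.match(path)
    if match is None:
        return None
    return 301, match.group("post") + "/"


print(trackback_redirect("/2010/05/post-title/trackback/"))
print(trackback_redirect("/2010/05/post-title/"))
```

Whatever form the real server rule takes, the point is that it has to run before WordPress gets a chance to answer with its own 302.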
9:50 pm on May 10, 2010 (gmt 0)

5+ Year Member



Have a look at this [google.com...]
5:00 am on May 11, 2010 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yup, that's what I noticed too. Some of the most popular blogs have the same problem as you. Surprising.
7:27 am on May 11, 2010 (gmt 0)

5+ Year Member



Thanks, ergophobe. If my blog is completely sealed off by robots.txt, I wonder whether some other website is using my trackback URL. I only use two services: one is Disqus Comments (I disabled the trackback feature) and the other is Retweet via TweetMeMe. I think TweetMeMe doesn't send trackbacks to WordPress, but Topsy does.

"Checking Topsy, it seems to already be sending trackbacks to wordpress. So they should appear as normal trackbacks. However IDC has a tendency to send them in moderation"

Quoted from [getsatisfaction.com...]
7:43 am on May 11, 2010 (gmt 0)

5+ Year Member



[topsy.com...]

Look at the source:

<form action="/trackback" method="get" id="filter-form">

<input type="hidden" name="url" value="http://example.com/example-page/"/>
<div class="text-field">
<input type="text" name="contains" id="contains-input" value="" title="filter tweets"/>
<a href="#" id="filter-box-hide" class="filter-box-btn">Clear</a>
</div>
</form>

[edited by: ergophobe at 3:45 pm (utc) on May 11, 2010]
[edit reason] generalized example; formatting adjustment [/edit]

3:54 pm on May 11, 2010 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>>some other website is using my trackback URL


That's what I meant about the differences between controlling the crawl with robots.txt and controlling indexing with noindex. They are not equivalent.

Looking at the Topsy example, it always seems to generate urls in the format

http://topsy.com/tb/example.com/blah/blah/trackpage=trackback

So I'm not seeing where Topsy itself is creating an inbound link to example.com/trackback.
3:55 pm on May 11, 2010 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Are you signed up for Google Webmaster Tools? What does it tell you about your inbound links?

Have you checked your referrer logs to see if you have any IBLs to /trackback pages?
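
If you want to script that log check, a rough sketch follows. It assumes the server writes the Apache "combined" log format. The sample lines, the referrer hostname, and the helper name `trackback_referrers` are all invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed Apache "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def trackback_referrers(lines: Iterable[str]) -> Counter:
    """Count referrers for requests whose path ends in /trackback."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # line is not in the assumed format
        path = match.group("path")
        referrer = match.group("referrer")
        if path.rstrip("/").endswith("/trackback") and referrer not in ("", "-"):
            counts[referrer] += 1
    return counts


# Made-up sample lines; in practice, pass an open access log file instead.
sample = [
    '203.0.113.5 - - [11/May/2010:15:55:00 +0000] "GET /2010/05/post-title/trackback/ HTTP/1.1" 302 - "http://referrer.example/page" "Mozilla/5.0"',
    '203.0.113.6 - - [11/May/2010:15:56:00 +0000] "GET /2010/05/post-title/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
for referrer, hits in trackback_referrers(sample).most_common():
    print(hits, referrer)
```

Any referrer that turns up repeatedly is a candidate for the page that links to your /trackback URLs.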
6:39 pm on May 11, 2010 (gmt 0)

5+ Year Member



Yup, I'm using Webmaster Tools. I just noticed something really interesting this morning. As soon as I removed 600+ 404 pages (last week) using the Google Webmaster Tools removal tool, my count of indexed pages in Google rose to 831. There are 840 URLs in my sitemap. Before I removed these 404 pages, Google was showing only 781 indexed URLs.

Should I remove the trackback restriction from my robots.txt? That would allow the crawler to see a 302 redirect to the original post. I wonder whether Google will index my page if it sees a 302 redirect?

Moreover, I haven't pinged or tweeted my most recent post. Let's see whether Google indexes its trackback URL or not. I am sure it's due to TweetMeMe or Topsy.

PS: My blog traffic is improving dramatically. The duplicate content was really damaging my blog.
7:23 pm on May 11, 2010 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>>see a 302 redirect

It should be a 301
7:16 am on May 12, 2010 (gmt 0)

5+ Year Member



Oops, Google indexed the trackback URL of my new post again. It was never tweeted or pinged. Trackback indexing on my site is blocked via robots.txt. Now the only possibility left is the feeds. I have also taken precautions to block Google from indexing my feeds, both via robots.txt and via some options in FeedBurner itself. My guess is that some website is tracking my feed: as soon as I publish a post, this website copies my content and sends a trackback. I think FriendFeed is the first one to look at.
7:17 am on May 12, 2010 (gmt 0)

5+ Year Member



I would also like to mention that Google started indexing my trackbacks as soon as I installed Disqus comments on my blog.
9:59 pm on May 19, 2010 (gmt 0)

5+ Year Member



[SOLVED] Disqus Comment System was the real culprit. 0 trackbacks indexed after removal of Disqus from my blog.
10:41 pm on May 19, 2010 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Thanks for following up with the confirmation.