| 4:30 pm on May 10, 2010 (gmt 0)|
I checked one of my blogs and I don't see any pages that show up with a
search. So normally, I don't think it's an issue.
You could try a 301 redirect for any */trackback URL. You'll need to do that in your .htaccess or httpd.conf since, by default, it looks like WordPress returns an "HTTP/1.1 302 Found" header when you click on a trackback. I can't believe they haven't gotten this right yet!
The robots.txt solution will only get you so far. Just because a page doesn't get crawled doesn't mean it won't get indexed based on backlinks. You might also still get some dupe content problems because of the snippets that get thrown into the trackback.
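To make the redirect idea concrete, here is a minimal Python sketch of the mapping such a rule would perform. It is only an illustration, not WordPress's own code and not a replacement for the actual .htaccess or httpd.conf rule. The function name trackback_redirect and the example paths are hypothetical, and it assumes permalinks end in a trailing slash and carry no query string.

```python
from http import HTTPStatus


def trackback_redirect(path):
    """Return (status, location) for a */trackback path, or None otherwise.

    Assumption: the canonical post URL is the same path with the trailing
    "/trackback" segment removed and a trailing slash added.
    """
    stripped = path.rstrip("/")
    if not stripped.endswith("/trackback"):
        return None  # not a trackback URL, let the request through untouched
    target = stripped[: -len("/trackback")] + "/"
    # 301 (permanent) rather than the 302 (temporary) described above
    return (HTTPStatus.MOVED_PERMANENTLY.value, target)


print(trackback_redirect("/example-page/trackback/"))
print(trackback_redirect("/example-page/"))
```

The point of returning 301 instead of 302 is to signal that the /trackback URL has permanently given way to the post URL.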
Of course, you can't stop anyone else from using your trackback URL, so that's not a complete solution either. You can, however, remove it from your theme.
| 6:17 pm on May 10, 2010 (gmt 0)|
OK. Is there any chance that Google may still index the trackback URLs of my future posts after I restrict Googlebot's crawl with robots.txt?
| 6:54 pm on May 10, 2010 (gmt 0)|
I don't know that I would say that. As I mentioned, do not confuse "disallow" with "noindex". They are not the same thing. Google can and will index URLs even for pages that it is not allowed to crawl if those pages have inbound links with anchor text relevant to the user's search.
I think it's rare to end up with trackback URLs indexed. I did a random survey of some popular blogs running WP and the vast majority have no results for the inurl:trackback search, but some do. I'm not sure why, as some of the blogs I checked are definitely popular enough to have lots of inbound links, no doubt including some to the /trackback URL.
When I look at one blog in particular, he has a lot of /trackback URLs indexed, and when I find one of those and search for it using a URL keyword (so searching on site:example.com inurl:keyword), it returns *only* the trackback page. When you click on the link, WordPress redirects and sends a "302 Found" (yes, no typo) header to the URL without the /trackback.
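A small Python sketch may help show what "disallow" actually controls. It uses the standard library's urllib.robotparser with a made-up robots.txt; the rule, the helper name can_crawl, and the example.com URLs are assumptions for illustration only. Python's parser does plain prefix matching, so the rule is written without wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site
ROBOTS_TXT = """\
User-agent: *
Disallow: /example-post/trackback
"""


def can_crawl(url, agent="Googlebot"):
    """Report whether robots.txt permits fetching the URL. Nothing more."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


blocked = can_crawl("http://example.com/example-post/trackback/")
allowed = can_crawl("http://example.com/example-post/")
print(blocked, allowed)
```

The answer is only ever about fetching. Nothing in robots.txt tells a search engine to keep a URL out of its index, which is why a disallowed URL with inbound links can still show up in results.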
I would think that if most of your inbound links are to the /trackback page, you can help crawlers out with:
1. The "canonical" link tag (I'm sure there's a plugin for WP that does just that, but you can try Headspace2, which I think offers this along with dozens of other functions).
2. Change your theme to not publish a trackback URL (look for trackback_url() in your theme).
3. Make sure that /trackback URLs get 301 redirects at the server level before they get handled by WP with 302s.
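For the first item in the list above, this is roughly what a canonical hint looks like when generated by hand. It is a Python sketch with a hypothetical helper name, canonical_link, and it assumes the post's preferred URL is the trackback URL minus its final /trackback segment, with a trailing slash. A plugin would normally emit this for you.

```python
from html import escape


def canonical_link(url):
    """Build a <link rel="canonical"> element pointing at the post itself."""
    base = url.rstrip("/")
    if base.endswith("/trackback"):
        base = base[: -len("/trackback")]
    # escape() guards against characters that would break the attribute
    return '<link rel="canonical" href="%s/" />' % escape(base, quote=True)


print(canonical_link("http://example.com/example-page/trackback/"))
```

The element belongs in the head of the page, and it is a hint to crawlers rather than a directive, so it complements the 301 redirect rather than replacing it.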
| 9:50 pm on May 10, 2010 (gmt 0)|
Have a look at this [google.com...]
| 5:00 am on May 11, 2010 (gmt 0)|
Yup, that's what I noticed too. Some of the most popular blogs have the same problem as you. Surprising.
| 7:27 am on May 11, 2010 (gmt 0)|
Thanks ergophobe. Now my blog is completely sealed using robots.txt, so I was wondering whether some other website is using my trackback URL. I only use two services: one is Disqus Comments (I disabled the trackback feature) and the other is Retweet via TweetMeme. I think TweetMeme doesn't send trackbacks to WordPress but Topsy does.
"Checking Topsy, it seems to already be sending trackbacks to wordpress. So they should appear as normal trackbacks. However IDC has a tendency to send them in moderation"
Quoted from [getsatisfaction.com...]
| 7:43 am on May 11, 2010 (gmt 0)|
Look at the source:
<form action="/trackback" method="get" id="filter-form">
<input type="hidden" name="url" value="http://example.com/example-page/"/>
<input type="text" name="contains" id="contains-input" value="" title="filter tweets"/>
<a href="#" id="filter-box-hide" class="filter-box-btn">Clear</a>
[edited by: ergophobe at 3:45 pm (utc) on May 11, 2010]
[edit reason] generalized example; formatting adjustment [/edit]
| 3:54 pm on May 11, 2010 (gmt 0)|
|some other website is using my trackback URL |
That's what I meant about the differences between controlling the crawl with robots.txt and controlling indexing with noindex. They are not equivalent.
Looking at the Topsy example, it always seems to generate URLs in the format
So I'm not seeing where Topsy itself is creating an inbound link to example.com/trackback
| 3:55 pm on May 11, 2010 (gmt 0)|
Are you signed up for Google Webmaster Tools? What does it tell you about your inbound links?
Have you checked your referrer logs to see if you have any IBLs to /trackback pages?
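If you want to automate the referrer log check, something along these lines would do it. This is a rough Python sketch that assumes your server writes the common "combined" access log format. The function name, the sample log lines, and the hosts in them are all made up for illustration.

```python
import re
from urllib.parse import urlparse

# Combined Log Format: ... "METHOD path HTTP/x" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def trackback_referrers(lines, own_host="example.com"):
    """Return (path, referrer) pairs for external hits on /trackback URLs."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path, referrer = match.group("path"), match.group("referrer")
        if not path.rstrip("/").endswith("/trackback"):
            continue  # not a trackback URL
        if referrer in ("", "-") or urlparse(referrer).hostname == own_host:
            continue  # no referrer recorded, or a link from your own site
        hits.append((path, referrer))
    return hits


# Hypothetical log lines, invented for this example
SAMPLE = [
    '203.0.113.5 - - [11/May/2010:15:54:00 +0000] "GET /example-page/trackback/ HTTP/1.1" 302 0 "http://other.example.org/links.html" "Mozilla/5.0"',
    '203.0.113.6 - - [11/May/2010:15:55:00 +0000] "GET /example-page/ HTTP/1.1" 200 5120 "http://other.example.org/links.html" "Mozilla/5.0"',
    '203.0.113.7 - - [11/May/2010:15:56:00 +0000] "POST /example-page/trackback/ HTTP/1.1" 200 78 "-" "ExampleBot/1.0"',
]

for path, referrer in trackback_referrers(SAMPLE):
    print(path, "<-", referrer)
```

Any referrer this turns up is a page that links to a /trackback URL, which is exactly the kind of inbound link that can get a disallowed URL indexed.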
| 6:39 pm on May 11, 2010 (gmt 0)|
Yup, I'm using Webmaster Tools. I just noticed something really interesting this morning. As soon as I removed 600+ 404 pages (last week) using the Google Webmaster removal tool, my indexed page count in Google rose to 831. There are 840 URLs in my sitemap. Before removing these 404 pages, Google was showing only 781 indexed URLs.
Should I remove the trackback restriction from my robots.txt? That would allow the crawler to see a 302 redirect to the original post. I wonder whether Google will index my page if it sees a 302 redirect.
Moreover, I haven't pinged or tweeted my most recent post. Let's see whether Google indexes its trackback URL or not. I am sure it's due to TweetMeme or Topsy.
PS: My blog traffic is improving dramatically. The duplicate content was really damaging my blog.
| 7:23 pm on May 11, 2010 (gmt 0)|
>>see a 302 redirect
It should be a 301
| 7:16 am on May 12, 2010 (gmt 0)|
Oops, Google indexed the trackback URL of the new post again. It was never tweeted or pinged, and crawling of trackback URLs on my site is blocked via robots.txt. Now the only possibility left is the feeds. I have also taken precautions to block Google from indexing my feeds, both in robots.txt and through some options in FeedBurner itself. My guess is that some website is watching my feed and, as soon as I publish a post, copying my content and sending a trackback. I think FriendFeed is the first one to look at.
| 7:17 am on May 12, 2010 (gmt 0)|
I would also like to tell you that Google started indexing my trackbacks as soon as I installed Disqus comments on my blog.
| 9:59 pm on May 19, 2010 (gmt 0)|
[SOLVED] Disqus Comment System was the real culprit. 0 trackbacks indexed after removal of Disqus from my blog.
| 10:41 pm on May 19, 2010 (gmt 0)|
Thanks for following up with the confirmation.