Forum Moderators: open
Please help!
Not sure why, but I wish they would stop: all of them show the same title and description. This causes two problems for me:
1. It screws up my ROI tracking.
2. Duplicate content: the main page is in the index six times, but there is only one page.
Our solution to prevent new tracking codes being picked up was to put all inbound tracking links through a script which extracts the tracking code and then 301s the requester to a clean URL. So far this seems to have worked OK and we haven't seen any new codes appearing in Google.
Unfortunately once the pages _are_ in the index it seems to be very hard to get them out again. The only way I could think of (aside from asking Google nicely to do it for you) was to detect requests for pages using the old tracking methods and redirect them to a clean URL in the same way as the tracking script. The problem is that you need to do this for _all_ the possible landing pages :(
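A minimal sketch of that tracking-code stripper in Python, assuming the tracking parameter is called `ref` (as in the examples later in this thread); the helper name is just illustrative:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def clean_url(url, tracking_params=("ref",)):
    """Return (tracking_code, clean_url): the extracted code for ROI
    logging, and the URL with tracking parameters stripped, which is
    what you would 301 the requester to."""
    parts = urlsplit(url)
    kept, code = [], None
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in tracking_params:
            code = value            # record this for ROI tracking
        else:
            kept.append((key, value))  # keep non-tracking parameters
    return code, urlunsplit(parts._replace(query=urlencode(kept)))
```

So `clean_url("http://www.mydomain.com/index.html?ref=123")` gives back the code `"123"` and the clean target `http://www.mydomain.com/index.html` for the 301.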
If anyone has a better solution I'd like to hear it because we still have pages in the index that use tracking codes which we stopped using 6 months ago!
Any idea how I can prevent Google from indexing these?
Also, I see Google indexing incorrect URLs, some with spaces, some totally wrong URLs like: www.domain.com/%1Fnmasbdkjabsdlkvjb%20slkhs876d98fyts
All of my pages are static HTML, so I have no idea how this stuff is getting in.
Any idea how I can prevent Google from indexing these?
The only protection I know of is to make sure that if those pages are requested from your server by Google, the response is either a 404 (if the page is actually invalid) or to 301 them to a clean URL that you don't mind appearing in the SERPs. Not sure how other SEs handle 301s though.
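A rough sketch of that 404-vs-301 decision in Python (the set of valid paths here is purely illustrative; in practice it would be your real list of static pages):

```python
from urllib.parse import urlsplit

# Illustrative whitelist of the pages that actually exist on the site.
VALID_PATHS = {"/", "/index.html", "/products.html"}

def respond(request_url):
    """Decide what to send back for a crawled URL: 404 for genuinely
    invalid paths, 301 to the clean path if tracking junk is attached,
    200 otherwise. Returns (status, target_path_or_None)."""
    parts = urlsplit(request_url)
    if parts.path not in VALID_PATHS:
        return 404, None              # page doesn't exist: let it drop out
    if parts.query:
        return 301, parts.path        # real page + query string: redirect clean
    return 200, parts.path            # serve the page normally
```

For example, a mangled URL like the one quoted above would get a 404, while `index.html?ref=123` would get a 301 to `/index.html`.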
Also, I see Google indexing incorrect URLs, some with spaces, some totally wrong URLs like: www.domain.com/%1Fnmasbdkjabsdlkvjb%20slkhs876d98fyts
All of my pages are static HTML, so I have no idea how this stuff is getting in.
Google sometimes shows spaces in SERP URLs even when there are none in the link - my guess is that this is protection against screen scraping.
As to the totally wrong URLs - I can't explain that one.
My suspicion is that these bad URLs being picked up by Google are down to the increase in so-called "directories" that are filled with affiliate and PPC links and scrape content from search engines. The scraping seems to be a bit hit-and-miss sometimes, and things get mangled.
Put up a robots.txt to block the pages with tracking codes, or use the other methods explained here:
[google.com...]
Then go to
[services.google.com:8882...]
Register and submit changes.
I have removed many pages this way before, when Google crawled a lot of useless redirect scripts. It took around one day.
www.mydomain.com/?ref=123
or
www.mydomain.com/index.html?ref=123
You can't put this in robots.txt:
User-agent: *
Disallow: /?ref=123
Disallow: /index.html?ref=123
can you? I'm happy to be proved wrong on this :)
I guess you could do it using the robots metatag, provided your page is dynamic, by checking the querystring in the page code and serving a noindex if a reference code is set. But if your pages are static HTML this wouldn't work either.
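A minimal sketch of that meta-tag approach, assuming a dynamic page and a tracking parameter called `ref` (the function name is illustrative):

```python
def robots_meta(query_params):
    """Pick the robots metatag for a page based on its query string:
    emit noindex when a tracking code is present, so the tracked copy
    never enters the index."""
    if "ref" in query_params:
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'
```

The dynamic page would call this while rendering its `<head>`, so `?ref=123` requests carry noindex while clean requests stay indexable.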
They are correct!
www.mydomain.com/?ref=123
or
www.mydomain.com/index.html?ref=123
You can't put this in robots.txt:
User-agent: *
Disallow: /?ref=123
Disallow: /index.html?ref=123
BUT, remember that
Disallow: /?ref=123 will not only DISALLOW
www.mydomain.com/?ref=123
but also
www.mydomain.com/?ref=123456
So if all your tracking URLs start with /?ref you just need one line:
Disallow: /?ref
or one more for safety:
Disallow: /index.html?ref
I recommend adding the meta tag too, as a backup, but not as the only solution. I always discourage people from using the meta tag alone to stop robots. With robots.txt, a disallowed file is never fetched at all; with the meta tag, the page has to be fetched before the bot knows it can't index it. GoogleBot is a hungry monster and will eat up a lot of bandwidth for nothing. And since GoogleBot crawls a certain number of links per site, you reduce its chances of crawling your more useful pages. Lastly, and most importantly, using the meta robots tag to stop a bot will usually end up with the page appearing in Google's database with no title, description or contents (just the URL).
Regarding Google including just the URL of a page in the index when you use the robots metatag, I was told that the opposite is true:
[webmasterworld.com...]