Forum Moderators: DixonJones


Tricky Webtrends filter

...for the Webtrend gurus out there....


pixie

12:40 am on Sep 16, 2002 (gmt 0)

10+ Year Member



I have a bit of a problem and was hoping someone could help (I'll try to make this quick).

I have a site that uses different servernames to access unique portions of the site. (i.e., [careers.company.com...] goes to the careers section of the site). Only pages related to the careers should be using this URL.

This works really well, but somewhere along the way the Googlebot managed to index a number of pages with the wrong URL (e.g., a page in the news section was indexed as [careers.company.com...]).

From the user's standpoint, it doesn't matter: when they click a link in Google, they still land on the correct page. It's only for me that it's suddenly creating a reporting nightmare.

I need to exclude certain portions of our site for some reporting. Up until now I could easily do this by using a combination of "Multi Homed Domain" and "Directory" filters.

However, now, because of these rogue URLs, I'm running into a problem.

Okay, so here's what I am trying to accomplish...

I need to EXCLUDE everything that goes to the root URL "http://resources.company.com/", but at the same time I need to INCLUDE the rogue URLs that come up as, for example, [resources.company.com...] (this literally could be any extended path).

If I exclude "resources.company.com" using the "Multi Homed Domain" filter, then I end up losing the rogue pages as well - which I need to keep. I've tried the "Full URL", "File Only", and "Entry Page" filters, but they just ignore me when I try to filter out the path "http://resources.company.com/".

Am I making sense?

If I had the Analyzer Suite from Webtrends it would be no problem because it has a URL Search and Replace feature that would allow me to replace the rogue URL with the correct one (i.e., replace "resources.company.com" with "www.company.com"). But, I only have the Log Analyzer and it doesn't have this feature.

Any suggestions on how I can get this to work would be greatly appreciated.

Hannu

11:12 am on Sep 16, 2002 (gmt 0)




How about using an external search and replace tool on the logfiles? (I guess I don't need to say that you should be very careful.)

And have you tried using full-URL include filters only, leaving the root out?
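The external search-and-replace pass could be sketched roughly like this (the hostnames are the ones mentioned in this thread; everything else is illustrative, and the pass writes a cleaned copy rather than touching the original log):

```python
# Sketch of an external search-and-replace pass over log files,
# as suggested above. Always work on a copy, never the originals.

ROGUE = "resources.company.com"      # hostname to rewrite (from the thread)
CANONICAL = "www.company.com"        # hostname to rewrite it to

def rewrite_host(line, rogue=ROGUE, canonical=CANONICAL):
    """Replace every occurrence of the rogue hostname in one log line."""
    return line.replace(rogue, canonical)

def clean_log(src_path, dst_path):
    """Write a cleaned copy of the log; the source file is left untouched."""
    with open(src_path, "r", errors="replace") as fin, \
         open(dst_path, "w") as fout:
        for line in fin:
            fout.write(rewrite_host(line))
```

You would then point the log analyzer at the cleaned copy instead of the original.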

Mikkel Svendsen

11:49 am on Sep 16, 2002 (gmt 0)




In the long run you will have to solve this on the server side. The problem is that your unique pages can be requested from multiple entry points. As you say, from a user's point of view it may not matter at all, but for tracking, analysis, and indexing in web search engines this IS in fact a problem.

The best solution I have used in similar situations is to check on the server side which version (entry point) of the page is being requested. You then set the default you want (your third-level domain structure), and if the request does not match it, you dynamically insert a META robots noindex tag.

This way the spiders will only be able to index one entry point for each unique page, and you secure consistent logging for your analysis. This also makes sure you don't spam by accident (having multiple entry points can look like multiple identical copies to a search engine, which might punish you for it).
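The server-side check described above could be sketched like this (the section-to-host mapping and function names are illustrative, not from any particular framework or from this site's actual configuration):

```python
# Sketch of the server-side entry-point check described above:
# compare the requested Host against the canonical host for the
# page's section, and emit a robots noindex tag when they differ.

# Assumed mapping of site sections to their canonical hostnames.
CANONICAL_HOSTS = {
    "careers": "careers.company.com",
    "resources": "resources.company.com",
    "news": "www.company.com",
}

def robots_meta_tag(request_host, section):
    """Return a noindex meta tag if the page was reached via the
    wrong hostname, or an empty string if the host is canonical."""
    canonical = CANONICAL_HOSTS.get(section, "www.company.com")
    if request_host.lower() != canonical:
        return '<meta name="robots" content="noindex">'
    return ""
```

Each page template would then include the result of this check in its HTML head, so only the canonical entry point remains indexable.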

pixie

4:47 pm on Sep 16, 2002 (gmt 0)



Hannu: An external search and replace is what I finally resorted to. However, I was hoping for a solution that wouldn't require me to alter the original log files (of course, I'm still working from a backup copy - I avoid working with the original files in any case).

Mikkel: Thanks for the tip. I will be sure to implement your suggestion going forward.