
The SEO, The PPC and The Ugly balancing act of the two!

11:22 am on Sept 5, 2016 (gmt 0)

New User


joined:Sept 5, 2016
posts: 5
votes: 0


I will be as brief and concise as possible! Here is my issue:

- We currently have 10 international sites set up, and the .com serves the UK & Ireland. We also have a .eu site that is an exact replica of the .com site but must be kept for a truckload of reasons I won't bore you with. Essentially the site is live but kept out of the index by a block in the robots.txt file.
- The PPC team now need to set up a Google Shopping feed for Ireland; however, the .com site cannot be used and the .eu site is necessary, as the feed has to be in euros.
- Bringing the .eu site back into the index will cause a large number of duplication issues; however, I need the site indexable again for this shopping feed. I do not need it to rank well, as it would cannibalise the .com site.

Any suggestions? Two pointers:
1) I have thought about hreflang tags for the Ireland/.eu site to avoid duplication penalties. But this would cause the .com to drop out of the index for Irish searches, hopefully to be replaced just as well by the .eu site. It could prove very costly as well as unpredictable.
2) I am thinking about canonical tags throughout the .eu site pointing to their .com equivalents (illustrated below). This should handle the duplication issue too, as well as allow Google feeds. But what effect will this have on indexing? Will .com rankings be business as usual? Despite the .eu canonical tags, will I still see .eu URLs ranking and affecting the .com results served?
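
For anyone picturing option 2: each .eu page would carry a cross-domain canonical pointing at its .com twin, something like this (example.eu/example.com and the path are placeholders, not our real URLs):

<!-- In the <head> of https://www.example.eu/widgets/blue-widget -->
<link rel="canonical" href="https://www.example.com/widgets/blue-widget" />

And option 1 would instead pair the two sites up with hreflang annotations on both sides:

<!-- On both the .eu and .com versions of the page -->
<link rel="alternate" hreflang="en-ie" href="https://www.example.eu/widgets/blue-widget" />
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/widgets/blue-widget" />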
11:55 am on Sept 5, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


Welcome to WebmasterWorld, Mr Binco!

Maybe this is a stupid question, but could you not keep the .eu site blocked for Googlebot in robots.txt while allowing it for AdsBot? This would leave the site blocked as far as organic search is concerned, but you would be able to run AdWords on it.
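
Something like this in the .eu robots.txt, as a rough sketch (AdsBot-Google is the user-agent token for Google's ads landing-page crawler; do test before relying on it):

User-agent: AdsBot-Google
Allow: /

User-agent: *
Disallow: /

# Note: AdsBot-Google is documented as ignoring the generic
# "User-agent: *" block unless it is named explicitly, but the
# explicit Allow above makes the intent unambiguous.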
2:05 pm on Sept 5, 2016 (gmt 0)

New User


joined:Sept 5, 2016
posts: 5
votes: 0


Thank you.

Are you sure it will not index any of the pages it crawls? If so, this solution is definitely worth rolling out!

Thanks again
3:01 pm on Sept 5, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


A robots.txt directive blocks crawling, but the URL may still be shown in Google's index (usually only when you run a site:example.com query), although the content of the page will not be indexed because Google cannot see it. However, this is no different from what you have currently, as you are already using robots.txt.

While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results.
Learn about robots.txt files [support.google.com]


Why not explicitly allow AdsBot-Google to a couple of pages and run a test AdWords campaign for a few days, then check whether the page is in the index (other than with "A description for this result is not available because of this site's robots.txt")?

Pick the URL to do the test with.
First, check the current situation by running the following query:
site:example.eu inurl:example.eu/url-you-will-use-for-testing

The URL will either be there with "A description for this result is not available because of this site's robots.txt" or the query may not return any results.

Also, search for a unique piece of text from this page by putting it in quotes:

site:example.eu "some unique text on the page you will use as a test page"
This should return no results.

Now change the robots.txt to allow AdsBot to this URL only, then run a test AdWords campaign with this page as a landing page for a few days.
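
A rough sketch of that robots.txt change (the path is a placeholder for whatever test URL you pick):

User-agent: AdsBot-Google
Allow: /url-you-will-use-for-testing
Disallow: /

User-agent: *
Disallow: /

# Google resolves Allow/Disallow conflicts by the longest (most
# specific) matching rule, so the Allow line wins for the test URL only.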

After a few days, repeat the two queries above.
The first one may or may not show the URL with the "blocked by robots" message, but the second one should not return the page, despite AdsBot having seen the content of the page (because Googlebot is still blocked).
3:05 pm on Sept 5, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3902
votes: 222


Blocking crawling in robots.txt does not prevent indexing of pages. If links point to those pages from other sites, the pages can be indexed even though crawling has been blocked. The way to prevent indexing is to use either a page-by-page "noindex" meta tag or X-Robots-Tag response headers.
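
For reference, the two forms look like this (the Apache line is just one common way to set the header; syntax varies by server):

<!-- In the <head> of each page -->
<meta name="robots" content="noindex">

# Or as an HTTP response header, e.g. via Apache's mod_headers:
Header set X-Robots-Tag "noindex"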
12:02 am on Sept 6, 2016 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12155
votes: 348


Blocking crawling in robots.txt does not prevent indexing pages.
A distinction should be made here between "page" and "URL", as well as between "in the index" and "displayed in the SERPs".

Google will index the "reference" or "URL" if the page is blocked by robots.txt, but it will not crawl the content of that page. When Google does return a result for that indexed URL, that's when you get...
"A description for this result is not available because of this site's robots.txt"

Depending on where that message displays and what's in the referenced file... this message might bother you a lot, or it may not bother you at all.

The meta robots "noindex" tag keeps all reference to the URL out of the SERPs, but Google does in fact have to crawl the page content in order to see the "noindex" tag in the first place, so you can't use "noindex" and robots.txt together.
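
To illustrate the conflict (placeholder path; this is the combination that does not work):

# robots.txt
User-agent: *
Disallow: /eu-pages/

<!-- On a page under /eu-pages/ - Googlebot never fetches the page,
     so it never sees this noindex tag: -->
<meta name="robots" content="noindex">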

For a longish but worthwhile read, which discusses the interaction of these elements and clarifies definitions of some vocabulary we commonly misuse, I suggest this thread....

Pages are indexed even after blocking in robots.txt
https://www.webmasterworld.com/google/4490125.htm [webmasterworld.com]

The thread doesn't go into AdsBot-Google, which is a somewhat different animal, and I don't know how much, if any, of the discussion would apply in your Product Feed setup.

Regarding your situation... Google does like to crawl the pages included in your product feed, but I don't know whether AdsBot crawl would suffice to index product feed pages.

It is possible, though, to block Googlebot but allow AdsBot-Google, and what aakk9999 suggests is a very ingenious test....
Why not explicitly allow AdsBot-Google to a couple of pages and run a test AdWords campaign for a few days, then check whether the page is in the index (other than with "A description for this result is not available because of this site's robots.txt")?
Since these pages have been blocked for a long time, you might want to give them more than a few days to see if they get indexed. Please keep us posted on what happens. This is a very intriguing issue.
3:40 pm on Sept 6, 2016 (gmt 0)

New User


joined:Sept 5, 2016
posts: 5
votes: 0


Thanks, guys, for all your feedback. I will definitely keep you posted.

I guess my main aim is to achieve 3 things:
1. Make sure the .eu URLs are not present in search results.
2. Make sure Google does not see the .eu URLs as duplicate content.
3. Have the .eu site crawled and served for Google's Shopping feeds only!

Based on the feedback here, my best bet is to test this by:
- Allowing AdsBot access to crawl the site (maybe allowing all bots access if AdsBot alone isn't enough for Google to accept the feed)
- Adding "noindex, follow" tags to each page
- Testing this for several days before opening up the site to all bots, if that isn't sufficient
- Monitoring how this affects the .com site and the SERPs within Ireland (rough sketch of the starting setup below)
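
In case it helps anyone following along, the combined starting point might look like this (placeholder domain; note Robert's caveat above: while Googlebot is still blocked in robots.txt it can never see the noindex tag, so the tag only takes effect once the block is lifted):

# robots.txt on example.eu
User-agent: AdsBot-Google
Allow: /

User-agent: *
Disallow: /

<!-- On every .eu page, ready for the stage when Googlebot is allowed back: -->
<meta name="robots" content="noindex, follow">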

I'll post the findings here soon.