

Can <meta name="origin"> be used instead of canonicals?

     
8:43 am on Sep 2, 2019 (gmt 0)

New User

joined:Sept 2, 2019
posts:5
votes: 0


So my CTO is running a duplicate of our www. site on the api. subdomain. I am not 100% clear on the reasons for this - but he is doing it. We apparently can't have different robots.txt files on the www and api subdomains, as the files must all be identical.

The whole of api.sitename.com has now been indexed which just should not be.

To mitigate the problem, his solution was to add

<meta name="origin" content="https://www.sitename.com/webpage"> to pages such as api.sitename.com/webpage

I have not heard of Google respecting the origin meta tag, treating it as a faux canonical tag.
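
For reference, the canonical element that Google actually documents (which these pages don't currently carry) would look like this in the <head> of the api copy:

<link rel="canonical" href="https://www.sitename.com/webpage">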

Does anyone have experience with this?
4:16 pm on Sept 2, 2019 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:12368
votes: 403


I've never heard of the origin meta tag either.

The whole of api.sitename.com has now been indexed which just should not be.

I've encountered IT departments that keep exact dupes of a site for development, and I've seen a lot of grief created by this kind of situation. If you block the subdomain with a password, there's essentially no way to communicate with the site from the outside - which also means it can't be crawled or indexed.
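
For what it's worth, the mechanics of that password block are usually just HTTP basic auth across the whole duplicate host. A minimal sketch, assuming an Apache server (the .htpasswd path is hypothetical):

# hypothetical vhost snippet for the duplicate subdomain
<Location "/">
    AuthType Basic
    AuthName "Restricted"
    AuthUserFile "/etc/apache2/.htpasswd"
    Require valid-user
</Location>

Googlebot then gets a 401 on every URL, and a page that can't be fetched can't stay indexed.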

4:29 pm on Sept 2, 2019 (gmt 0)

New User

joined:Sept 2, 2019
posts:5
votes: 0


The problem is... I can't see any way that what you are saying could be impossible.

And yet, my CTO is looking me in the eye... and saying... that's how it is.

So it really just comes down to the meta origin tag either working or not. Technically? I guess it should. It's described here [doc.ohreally.nl...] though I don't know the real value of this site.

My issue is... does Google respect it or not?
5:01 pm on Sept 2, 2019 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Jan 22, 2011
posts:115
votes: 6


These are the robots meta tag values Google understands:

index - Allow the page to be indexed.
follow - Follow any links on the page as part of crawling.
noindex - Prevent the page from being indexed.
nofollow - Don't follow links from this page as part of crawling.
nosnippet - Don't show a text snippet or video preview in the search results. For video, a static image will be shown instead, if possible. Example: <meta name="robots" content="nosnippet">
noarchive - Don't show a Cached link for a page in search results.
unavailable_after:[date] - Lets you specify the exact time and date at which crawling and indexing of this page should stop.
noimageindex - Don't show the page as the referring page for an image in Google Image search results.
none - Equivalent to noindex, nofollow.
all - [Default] Equivalent to index, follow.

Source: [support.google.com...]
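
So if the goal is to keep api.sitename.com out of the index, the documented route - purely illustrative markup, assuming the api pages' templates can carry their own head section - would be:

<!-- in the <head> of every page served from api.sitename.com -->
<meta name="robots" content="noindex, nofollow">

rather than an origin tag Google has never documented.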
7:15 pm on Sept 2, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15818
votes: 854


We apparently can't have a different robots.txt in the www and api subdomain as the files all must be identical.
Say what now? You not only can, you have to have a different robots.txt for each valid hostname, including subdomains, because they will be separately requested.

It is possible that www.example.com/robots.txt and api.example.com/robots.txt both serve the same physical file, whether because of the site's directory structure or some behind-the-scenes rewriting. But it's nonsense to say they "must be" identical.

Is this your CTO's way of saying "I don't know how to do it so I'll tell the boss it can't be done"?
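
To illustrate the behind-the-scenes rewriting direction - a minimal sketch assuming Apache, with robots-api.txt as a made-up filename - a couple of lines can hand the api host its own file:

# hypothetical .htaccess lines: give api.example.com a separate robots.txt
RewriteEngine On
RewriteCond %{HTTP_HOST} ^api\. [NC]
RewriteRule ^robots\.txt$ /robots-api.txt [L]

where robots-api.txt simply shuts crawlers out:

User-agent: *
Disallow: /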
3:59 am on Sept 3, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10334
votes: 1061


If it is not here, it probably isn't valid: [w3schools.com...]

You can put a robots.txt anywhere ... that does not mean the bots will respect it one way or the other! And I suspect that G has its own "understanding" of robots.txt, and that's the one that needs to be addressed.
5:04 am on Sept 3, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4470
votes: 332


It almost looks like someone has confused an old referrer-tracking technique with something else. About five years ago, when more sites were starting to 301 to https, Moz.com's blog suggested using
<meta name="referrer" content="origin">
to enable tracking of "https --> http" traffic sources/referrers. That is the only place I know of where "origin" was ever discussed in a meta tag. It was not an effective method, because it passed only the 'origin' domain, with no page or full URL. It was not related to canonicals in any way. I vaguely recalled it and had to look it up: [moz.com...]
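
To illustrate what that tag actually did: a browser honoring it sends only the bare origin as the referrer, so a click from, say, https://www.example.com/some/page arrives as just

Referer: https://www.example.com/

The receiving site could see the domain after an https-to-http hop, but never the page.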

It is not a Google thing; they are clear about what they read and use.
7:03 am on Sept 3, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10334
votes: 1061


It is not a Google thing; they are clear about what they read and use.


Exactly!

OP ... might provide a link to this discussion to the powers that be? :)

Or not!
7:34 am on Sept 3, 2019 (gmt 0)

Junior Member from DK 

Top Contributors Of The Month

joined:Oct 24, 2018
posts: 47
votes: 4


I looked at the source provided; the origin tag in this context deals with the origin of intellectual work: "... Indicate sources that were used to create an original work; a list of sources which is readable to the end user (e.g. footnotes or a separate page) should be used for that ...". The example used is the ISBN of a book. I don't see any way in which G would respect (or even know how to deal with) this in the situation described.

Alternative suggestion to the otherwise excellent suggestions already provided: an X-Robots-Tag: noindex header for the api version.
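
A minimal sketch, assuming Apache with mod_headers in front of the api host (adjust for whatever actually serves it) - one line in the vhost covers every response without touching any page template:

# hypothetical: in the api.sitename.com vhost
Header set X-Robots-Tag "noindex, nofollow"

It works for non-HTML responses too, which a meta tag can't reach.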

My guess is that you're in-house, like myself?
7:44 am on Sept 3, 2019 (gmt 0)

New User

joined:Sept 2, 2019
posts:5
votes: 0


OK. I can't stress enough what I am dealing with, in terms of a locked-down codebase, frameworks, and derision at suggesting anything that wasn't the CTO's idea. It's really not worth my time telling this guy there's no way something is impossible... when I well know... it is possible.

a. The robots.txt is pulled from a database field. The database is the same for the main domain and the subdomain, but the output could still be made conditional on the hostname it is being served under (see the sketch after this list).

b. Google says its list of meta tags it can read is not exhaustive. I guess he is relying on this.

c. Yes, I also thought at first it was a confusion with, or mangling of, the referrer meta tag, but it seems somewhat legit as a meta tag... I just don't see Google valuing it.
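
To sketch what I mean in (a) - our stack isn't necessarily anything like this, so treat it as a hypothetical Flask-style handler, and load_robots_from_db() is a made-up stand-in for the real database field lookup:

from flask import Flask, Response, request

app = Flask(__name__)

BLOCK_ALL = "User-agent: *\nDisallow: /\n"

def load_robots_from_db():
    # stand-in for however the robots.txt database field is actually fetched
    return "User-agent: *\nDisallow: /private/\n"

@app.route("/robots.txt")
def robots():
    body = load_robots_from_db()
    # same database for both hosts, different output on the api subdomain
    if request.host.startswith("api."):
        body = BLOCK_ALL
    return Response(body, mimetype="text/plain")

Same database, one branch on the Host header - which is why "the files must all be identical" doesn't hold up.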

The problem is, I have no way to prove it won't work, and a difficult time showing harm... that is... till the harm is done.

Of course the solution is "follow best practices, take no chances, do what Google says" ... but all you guys are living in rational town, and I am in crazy world.
4:36 am on Sept 4, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10334
votes: 1061


@richinberl ... as an employee you do as directed, but MEANWHILE, document these directions to CYA when the fit hits the shan!
7:29 am on Sept 5, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 5, 2002
posts:900
votes: 4


As far as the Google index is concerned, you may just want to verify the subdomain in GSC and have it removed from the index using the URL removal tool. That will give you relief for 3 months. "Just a double protection" can be the justification to your CTO.
8:02 am on Sept 5, 2019 (gmt 0)

New User

joined:Sept 2, 2019
posts: 5
votes: 0


Yes indeed, McMohan, I am already all over that solution, even prepared to do it permanently.
8:42 am on Sept 5, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 5, 2002
posts:900
votes: 4


Unless you disallow it in robots.txt or use a noindex tag, the removal can't be permanent. You may have to keep using the removal tool every time it reappears in the index.
8:46 am on Sept 5, 2019 (gmt 0)

New User

joined:Sept 2, 2019
posts: 5
votes: 0


What I meant: every 3 months I remove the URLs. I do this as a permanent task in my calendar of stupid things I do.
10:31 pm on Sept 8, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10334
votes: 1061


chuckles ... for all the tech noise that AI can read your mind ... they can't read commonsense "disavow this krap 4ever and don't bother me!"

Pretty sure there's a human engineered routine that prevents the machine from actually getting that directive correct.

</sarcasm>
 
