
Google SEO News and Discussion Forum

URL seen when performing site:example.com search
gouri




msg:4442653
 8:57 pm on Apr 18, 2012 (gmt 0)

I performed a site:example.com search for a website that I am working on and in addition to the pages of the website displaying in the SERPS, I saw the following message: In order to show you the most relevant results, we have omitted some entries very similar to the (some number) already displayed.
If you like, you can repeat the search with the omitted results included

I then clicked the repeat link and saw the pages of the website that I saw when I performed the site:example.com search and the following url: www.example.com/scripts/siteUtil.js?somenumber

Some number represents a number.

I am not sure what the url represents and how it is coming up when performing a site:example.com search. I clicked on the url, and I saw a page with the following function code on it:

<!--
function getCopyrightDate(iStartYear, iRangeSize, separatorString)
{
    var date = new Date();

    // if no start year is passed in, then use the current year
    if (iStartYear == null || iStartYear == 0 || iStartYear == "")
    {
        iStartYear = date.getFullYear();
    }

    // default end year to current year
    var iEndYear = date.getFullYear();
    if (iRangeSize != null && iRangeSize != "")
    {
        iEndYear = iStartYear + iRangeSize;
    }

    if (iStartYear == iEndYear)
    {
        return iStartYear.toString();
    }
    else
    {
        var separator = "-";
        if (separatorString != null && separatorString != "")
            separator = separatorString;
        return iStartYear + separator + iEndYear;
    }
}
-->

I was wondering if some of you could tell me what this url represents, how it is appearing when I perform a site:example.com search, and what does the function code mean?

I am not really familiar with what I am seeing.

Thanks.

 

enigma1




msg:4442677
 10:10 pm on Apr 18, 2012 (gmt 0)

You mean how it ended up in Google's index? There are several scenarios: the bot saw a link like <a href="www.example.com/scripts/siteUtil.js?somenumber"> and followed it, there is some broken html somewhere, or the bot saw the string and thought it was a URL, etc.

Since you can see the code, it should be easy to find out which file uses it.

The js code looks like it returns a string of a start and end year. Maybe it tries to create a copyright date range of the document to display it somewhere?
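To illustrate that reading, here is a rough sketch of how a page template might call the function. The function body is condensed from the one quoted in the opening post; the example arguments and the " to " separator are assumptions, since the thread never shows the code that actually calls it.

```javascript
// Condensed from the function quoted in the opening post.
function getCopyrightDate(iStartYear, iRangeSize, separatorString) {
  var date = new Date();

  // If no start year is passed in, use the current year.
  if (iStartYear == null || iStartYear == 0 || iStartYear == "") {
    iStartYear = date.getFullYear();
  }

  // Default the end year to the current year unless a range size is given.
  var iEndYear = date.getFullYear();
  if (iRangeSize != null && iRangeSize != "") {
    iEndYear = iStartYear + iRangeSize;
  }

  if (iStartYear == iEndYear) {
    return iStartYear.toString();
  }
  var separator = "-";
  if (separatorString != null && separatorString != "") {
    separator = separatorString;
  }
  return iStartYear + separator + iEndYear;
}

// Hypothetical calls (the real template's arguments are unknown).
var fixedRange = getCopyrightDate(2008, 4);              // "2008-2012"
var customSeparator = getCopyrightDate(2008, 4, " to "); // "2008 to 2012"
var startToNow = getCopyrightDate(2008);                 // 2008 through the current year
```

A template would then presumably write the returned string into something like a page footer; where exactly is not shown in the thread.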

gouri




msg:4442684
 10:18 pm on Apr 18, 2012 (gmt 0)

You mean how it ended up in Google's index?


Yes. That is one thing that I am trying to figure out.

Could this be related to Adsense?

lucy24




msg:4442694
 10:45 pm on Apr 18, 2012 (gmt 0)

Remember the old rule about child-proofing?

If a child can see it, the child can reach it.
If a child can reach it, the child can touch it.
If a child can touch it, the child can hold it.
If a child can hold it, the child can break it (or put it in its mouth, or use it to destroy your home, or ... et cetera, depending on what "it" is).

Change a few words and you've got Today's Google and URLs.

Andy Langton




msg:4442711
 11:57 pm on Apr 18, 2012 (gmt 0)

Google is "greedy" about URLs - it wants to retrieve as much as possible, and that includes URLs it gets from javascript, forms and many other areas.

If you want to avoid such URLs appearing in search results, I would recommend you look at the (sometimes complex!) subjects of canonicalisation and robots exclusion.

gouri




msg:4485870
 1:30 pm on Aug 17, 2012 (gmt 0)

@enigma1,

The js code looks like it returns a string of a start and end year. Maybe it tries to create a copyright date range of the document to display it somewhere?


Can you tell me where the js code might be trying to display the copyright date range? Is it somewhere on the site?

Or is it Google that is trying to do this and display it somewhere in the SERP?

You mean how it ended up in Google's index?


Also, can urls such as www.example.com/scripts/siteUtil.js?somenumber
showing up in Google's index have an impact on rankings?



@Andy Langton,

If you want to avoid such URLs appearing in search results, I would recommend you look at the (sometimes complex!) subjects of canonicalisation and robots exclusion.


I don't have access to the root host file so I don't think that I can do anything with robots exclusion. Is there anything else that I can do?

Canonicalisation, I have heard that this is something that is used to set the preferred domain (www.example.tld instead of example.tld), but is this something that I can use for this situation as well? And would I need access to the root host file to do this?

tedster




msg:4485957
 5:25 pm on Aug 17, 2012 (gmt 0)

If you can't create a new file at the root - then you are at a big disadvantage. However, if you can upload a new text file so that it has the address http://www.example.com/robots.txt then you are set. Anyone responsible for a website should have this kind of access. If you don't then I think you should ask for it!

---------

Canonical issues come up any time a server file can be accessed with more than one URL - and any difference in the URL at all makes a second URL. That can be spelling, capitalization, order of variables, extra characters - on and on. The "canonical" URL is the exact character string that you intended to be indexed.
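As a small illustration of "any difference in the URL at all makes a second URL", the sketch below counts a handful of variants as plain strings and then groups them by host and path. The URLs and the numeric query strings are made up for the example; only the file name comes from this thread.

```javascript
// Hypothetical URL variants; the numeric query strings are invented.
const indexedUrls = [
  "http://www.example.com/scripts/siteUtil.js",
  "http://www.example.com/scripts/siteUtil.js?1334782620",
  "http://www.example.com/scripts/siteUtil.js?1345212345",
  "http://www.example.com/Scripts/siteUtil.js",
];

// Compared as plain character strings, every variant is a different URL.
const distinctStrings = new Set(indexedUrls).size;

// Group the variants by host and path, ignoring the query string.
// Path capitalization is kept, so /Scripts/ and /scripts/ stay separate.
function groupByPath(urls) {
  const groups = new Map();
  for (const raw of urls) {
    const url = new URL(raw);
    const key = url.host + url.pathname;
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(raw);
  }
  return groups;
}

const groups = groupByPath(indexedUrls);
```

Here the four strings collapse into two groups once the query string is ignored, and three of them share one path. Whether the differently capitalized path also serves the same file depends on the server.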

There's a thread on many of those possibilities in our Hot Topics area [webmasterworld.com], which is always pinned to the top of this forum's index page. See Canonical URL Issues - including some new ones [webmasterworld.com].

gouri




msg:4486230
 3:02 am on Aug 19, 2012 (gmt 0)

@tedster,

Thanks for the response and the recommendation.

One thing that I wanted to ask is if I am able to access my robots.txt file, what would I write in it so that these urls don't appear in Google's index?

Also, I am not sure how to interpret the fact that these urls are appearing in Google's index when I do a site: operator search but they are not appearing in Bing's index?

The function code that I am seeing on these urls appears to be some sort of calendar script. Where on my site are these pages located?

Since the url has js in it, I am interpreting this to mean that it is some kind of javascript. The only javascript that I think I have on the site is Adsense. Could this be adsense or is it something else?

A reason that I feel it is not Adsense is I don't think that a calendar script is related to adsense, but at the same time, I don't know what it could be.

Could the presence of these urls in Google's index affect the site's rankings?

This is an area that I am not very familiar with, and I would really appreciate your thoughts.

[edited by: Robert_Charlton at 5:07 am (utc) on Aug 28, 2012]
[edit reason] fixed (smiley) formatting [/edit]

bhartzer




msg:4486261
 1:43 pm on Aug 19, 2012 (gmt 0)

It is not related to Adsense. In the robots.txt file, you would want to disallow the /scripts directory; I do not see any reason to let that directory get indexed.
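For readers unsure what such a rule looks like, here is a rough sketch: the robots.txt text is held in a string, and a deliberately simplified matcher checks which paths it would block from crawling. The rule text is an assumption based on the /scripts directory mentioned above, and the matcher only does prefix matching; real crawlers also handle User-agent groups, Allow rules, and wildcards.

```javascript
// Hypothetical robots.txt contents for the rule described above.
const robotsTxt = [
  "User-agent: *",
  "Disallow: /scripts/",
].join("\n");

// Collect Disallow path prefixes. Simplified: ignores User-agent grouping,
// Allow rules, and wildcards, all of which real parsers handle.
function disallowedPrefixes(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^disallow:/i.test(line))
    .map((line) => line.slice(line.indexOf(":") + 1).trim())
    .filter((prefix) => prefix !== "");
}

// True when the path starts with any disallowed prefix.
function isCrawlBlocked(path, text) {
  return disallowedPrefixes(text).some((prefix) => path.startsWith(prefix));
}

const scriptBlocked = isCrawlBlocked("/scripts/siteUtil.js?12345", robotsTxt); // true
const pageBlocked = isCrawlBlocked("/about.html", robotsTxt);                  // false
```

Note that a Disallow rule asks crawlers not to fetch matching URLs; on its own it is not a request to drop them from the index.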

gouri




msg:4486272
 3:07 pm on Aug 19, 2012 (gmt 0)

Could the presence of these URLs in Google's index spread the page rank of the site and impact rankings?

So instead of all of the site's page rank going to the content pages, it is also going to these URLs and having an impact on the terms that the site's content pages rank for.

I am also wondering if these URLs could be related to the website's external stylesheets? The reason I ask is I looked at the site logs and I saw URLs such as:

http://www.example.tld/color_2.css?somenumber
http://www.example.tld/theme.css?somenumber

and somenumber are the same numbers that I am seeing in the URLs in Google's index.

I would appreciate if you can help me to make sense of all this.

I am trying to figure out how it all fits together.

phranque




msg:4486343
 11:25 pm on Aug 19, 2012 (gmt 0)

the question mark (?) in a url is followed by a query string which the server passes to the requested resource.
in your case the requested resources are javascript and css files which are static and therefore ignore the query string.

however these urls are non-canonical for these resources and these requests should be externally redirected to the canonical url (i.e. with no query string appended)
how you do this depends on your server configuration.
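To make the "canonical url (i.e. with no query string appended)" idea concrete, here is a minimal sketch that strips the query string from a URL using the standard URL class. The function name and the numeric query values are assumptions for illustration; the actual redirect would still have to be configured on the server.

```javascript
// Return the URL with the "?" and everything after it removed.
function canonicalStaticUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.search = ""; // clears the query string
  return url.href;
}

// Hypothetical query values standing in for "somenumber".
const canonicalCss = canonicalStaticUrl("http://www.example.tld/color_2.css?12345");
// "http://www.example.tld/color_2.css"
const canonicalTheme = canonicalStaticUrl("http://www.example.tld/theme.css?67890");
// "http://www.example.tld/theme.css"
```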

Could the presence of these URLs in Google's index spread the page rank of the site and impact rankings?

these urls can't pass page rank and you probably aren't linking to them with an anchor tag.
the only search impact they might have on your site is wasted crawl budget.

In the robots.txt file, you would want to disallow the /scripts directory; I do not see any reason to let that directory get indexed.

disallowing a url pattern in robots.txt will not prevent a url from being discovered and indexed without being crawled.

gouri




msg:4486370
 2:11 am on Aug 20, 2012 (gmt 0)

@phranque,

Thanks for the explanation.

If I am seeing http://www.example.tld/color_2.css?somenumber for example, should it be redirected to

(1) http://www.example.tld/color_2.css? (2) http://www.example.tld/color_2.css or (3) http://www.example.tld

Which of the above is the canonical URL? If it is something else, I would appreciate if you can tell me.

I don't have access to my root host file. Is there something that I can do to externally redirect the URLs that I am seeing to the canonical url or is access to the root host file necessary to do this? If access to the root host file is necessary and I don't have it, is there another way that I can accomplish this?

I am also wondering, for several years on this site, I did not have URLs such as

www.example.com/scripts/siteUtil.js?somenumber

appearing in the SERP. What could be making them appear in the SERP now?

If I have some idea of what is causing this, then maybe I can focus on that area(s).

I really appreciate your help with this.

phranque




msg:4486393
 5:43 am on Aug 20, 2012 (gmt 0)

http://www.example.tld/color_2.css?somenumber should be redirected (with a 301 status code) to http://www.example.tld/color_2.css

I don't have access to my root host file.

i'm not sure what that means technically.
do you mean you don't have access to the server configuration file or the document root directory or ...?

what type of server is it?

What could be making them appear in the SERP now?

most likely it was this series of events:
- google discovered the url
- googlebot requested the url from your server
- your server responded to that request with a 200 OK status code
- the url was indexed
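The third step in that list is the one a site owner can change. As a sketch only, the handler below answers requests for .js and .css files that carry a query string with a 301 to the bare path instead of a 200. It is written for Node's http module purely for illustration; that is an assumption, since the site in this thread is probably on Apache, where the same rule would live in the server configuration. The function names and the www.example.tld base are hypothetical.

```javascript
// Decide how to answer a request, given the path-plus-query string
// a server receives (for example "/scripts/siteUtil.js?12345").
function decideResponse(requestUrl) {
  // The base origin is an assumption; it is only needed to parse the path.
  const url = new URL(requestUrl, "http://www.example.tld");
  const isStatic = /\.(?:js|css)$/i.test(url.pathname);
  const hasQuery = requestUrl.includes("?");
  if (isStatic && hasQuery) {
    // Non-canonical request: redirect to the same path without the query.
    return { status: 301, location: url.pathname };
  }
  return { status: 200, location: null };
}

// Handler with the (req, res) shape that Node's http.createServer expects.
function handleRequest(req, res) {
  const decision = decideResponse(req.url);
  if (decision.status === 301) {
    res.writeHead(301, { Location: decision.location });
    res.end();
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("file contents would be served here");
}
```

With a rule like this in place, a crawler requesting /scripts/siteUtil.js?12345 would be told the content lives at /scripts/siteUtil.js rather than receiving a fresh 200 OK for every query-string variant.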

gouri




msg:4486640
 2:31 am on Aug 21, 2012 (gmt 0)

@phranque,

do you mean you don't have access to the server configuration file or the document root directory or ...?

I don't have access to the document root directory (I believe that is the same as the server configuration file). I can't upload a .htaccess file.

what type of server is it?

It is a linux platform.


With the way things are, is there a way for me to externally redirect the non-canonical urls to the canonical url?

I would definitely like to make good use of my crawl budget.

I appreciate your help.

phranque




msg:4486663
 4:59 am on Aug 21, 2012 (gmt 0)

It is a linux platform.

linux is an operating system, not a server.
i'll assume you meant it's an apache server.

I don't have access to the document root directory (I believe that is the same as the server configuration file).

it had better not be the same!
if you don't have access to the document root directory how did you upload those javascript and css files?
i'm not sure there's anything you can do without the intervention of your hosting service.

gouri




msg:4486741
 12:05 pm on Aug 21, 2012 (gmt 0)

@phranque,

Thanks for the information about the server.

it had better not be the same!
if you don't have access to the document root directory how did you upload those javascript and css files?

To add the javascript, I added a script box into the body text of my template and pasted the Adsense code there.

The template is CSS enabled so I think that this is where the CSS files come from.

Do you think that there is additional javascript on the site in addition to the Adsense and the urls that I am seeing in the SERP are coming from that other javascript? Could there be javascript in the external CSS files or another area of the website?

Also, can you tell me the difference between a document root directory and the server configuration file?

I apologize for the questions, but I think that this will help me to figure out if I can do something about the urls that I am seeing in the SERP.

Thanks.

gouri




msg:4488295
 5:28 pm on Aug 25, 2012 (gmt 0)

I observed something, but I am not sure what it means. I would appreciate if you guys can give me your opinion.

I did a site:operator search recently and did not see the urls such as www.example.com/scripts/siteUtil.js?somenumber in the SERP.

The next day, when I performed a site:operator search, I saw those urls in the SERP.

A couple of days later, I was looking at the cache versions of the site's pages and I saw that they were around the same time as when I did the site:operator search and did not see the urls such as www.example.com/scripts/siteUtil.js?somenumber in the SERP.

Does this mean something? Can you help to interpret this?

Thanks.

[edited by: incrediBILL at 4:24 am (utc) on Aug 28, 2012]
[edit reason] fixed formatting [/edit]

g1smd




msg:4488581
 7:48 am on Aug 27, 2012 (gmt 0)

Go to the very last page of the site: listings and click on the "show omitted results" link. Google sometimes hides the duplicates, sometimes not.

Why are css and js files in the search results anyway?

lucy24




msg:4488590
 9:31 am on Aug 27, 2012 (gmt 0)

Why are css and js files in the search results anyway?

If you have a small enough site-- in filecount, not traffic-- you can get the entire site to show up in Unwanted Smiley Search simply by searching for something like the letter "e" and constraining it to your domain.

gouri




msg:4488647
 1:31 pm on Aug 27, 2012 (gmt 0)

Go to the very last page of the site: listings and click on the "show omitted results" link. Google sometimes hides the duplicates, sometimes not.

When I do this, I see urls such as www.example.com/scripts/siteUtil.js?somenumber

somenumber is different for all the urls but if I click on the link, I think that I am seeing what I have in my opening post for all of them.

Why are css and js files in the search results anyway?

That is something that I am trying to understand :)

Can you tell me why this might be happening?

Also, where could these files be coming from?

I am not sure if this helps but for several years, when I performed a site:operator search, I did not see this, but now I am.

[edited by: incrediBILL at 4:25 am (utc) on Aug 28, 2012]
[edit reason] fixed formatting [/edit]

lucy24




msg:4488742
 4:41 pm on Aug 27, 2012 (gmt 0)

It sounds as if you have very little control over your site content, both from the production side (css or js files that you don't know about, parameters that you can't explain) and the upload side (no access to the physical site directory, if I'm reading right).

It's tricky without naming names, but I for one would understand better if you explained how your CMS works and what your hosting setup is. Or, if that sentence was Hungarian to you: How do you make the website? How do you change it?

gouri




msg:4488932
 2:18 am on Aug 28, 2012 (gmt 0)

@lucy24,

The website is made with a WYSIWYG editor. Changes are made by going to the particular page that you want to make a change to and adding text, etc.(I know that you know how WYSIWYG editors work, but I just thought that I would mention it). It is shared hosting.

The WYSIWYG editor has good features and is user-friendly, but I don't have root access. By not having root access, I don't think that I can put a robots.txt or .htaccess file in the site's directory.

On the site, I believe that I can add meta tags to the site's pages.

Is there anything that I can do about the urls appearing in the SERP? Are there some functions and features that I should see if I have in the hosting setup and/or site content that might help me?

phranque




msg:4488950
 4:12 am on Aug 28, 2012 (gmt 0)

do you have any type of web-based control panel access to manage your server?
something like cpanel or plesk?

lucy24




msg:4488965
 5:20 am on Aug 28, 2012 (gmt 0)

When you say "don't have root access", do you mean that you can't FTP into the site (or SFTP or SSH or something web-based or, or, or), look at everything that's there, and move/upload/delete files? Is this your own domain or are you piggybacking on someone else's? Or is it one of those package deals where the site gives you the software and it uploads itself?

Oh, wait. Didn't you start out saying it was your own domain? That would tend to exclude WordPress-type things where, as far as I know, they take you by the hand and do everything for you.

This may seem like an awful lot of questions, but it's hard to give concrete advice without knowing the exact physical setup. I know I'd be very unpleasantly surprised if a site:mydomain search turned up anything like
<!--
function getCopyrightDate(iStartYear, iRangeSize, separatorString)
{
    var date = new Date();

    // if no start year is passed in, then use the current year
    if (iStartYear == null || iStartYear == 0 || iStartYear == "")
    {
        iStartYear = date.getFullYear();
    }

et cetera because I know very well I haven't done anything like that myself. And the only thing in my domain that I didn't personally make is piwik, which is strictly off limits to robots. ("Strictly" = htaccess 403, no futzing about with robots.txt)

I don't even know what that is. Javascript i-thingie whose name I keep forgetting? (And incidentally, isn't it doing one of those standard no-no's like dynamically generating a copyright date?)

gouri




msg:4489482
 3:38 pm on Aug 29, 2012 (gmt 0)

@phranque, @lucy24,

I am figuring out the answers to the questions that you asked, but I wanted to ask a couple of questions that I think might help to figure out why this is happening. They expand, to some extent, on some of the questions that I asked in earlier posts.

The first thing is that the urls such as www.example.com/scripts/siteUtil.js?somenumber are only appearing in Google's index. They are not appearing in Bing's index. What could be the reason for this? Are there different ways to crawl a site and maybe the search engines are using different methods? I am not sure if this is possible, but if it is, can you mention what these methods could be?

Another thing is that for several years, I did not see urls such as www.example.com/scripts/siteUtil.js?somenumber in Google's index. How did they start becoming discovered after several years? Could there have been some kind of change in crawling method? (This probably relates to the question that I asked in the previous paragraph.)

One more thing I wanted to mention that I feel might help to figure out what is happening is the urls that I am seeing in Google's index have increased over the past couple of months. When I perform a site:operator search, I have gradually seen more of them. They did not all appear in the index at the same time.

Thanks.

phranque




msg:4489506
 4:19 pm on Aug 29, 2012 (gmt 0)

their respective crawlers discover urls in different places and parse documents in different ways.
and they will obviously go new places and use new methods over time.

as long as those urls aren't referred to by anything on your domain i would focus on providing the proper redirect or error response.

gouri




msg:4489515
 5:44 pm on Aug 29, 2012 (gmt 0)

as long as those urls aren't referred to by anything on your domain i would focus on providing the proper redirect or error response.


Does "as long as those urls aren't referred to by anything on your domain" mean as long as I myself am not creating links to these urls on the site, the way that I would link text from a paragraph on the homepage to an inner page, for example?

lucy24




msg:4489579
 8:31 pm on Aug 29, 2012 (gmt 0)

Yes. But don't overlook invisible links. To sum up scattered remarks from earlier in this thread:

Your page might, for example, have a tracking code that consists of a couple lines of javascript which link to the "real" tracking function living somewhere else. The user can't see this; it just executes quietly in the background. But that invisible link such as <a href = "http://www.example.com/trackername/trackername.php">* is visible to any passing robot, because they're reading the raw html, not the surface of the page. And each new request comes with a whacking huge query string containing basically every speck of information the analytics program was able to extract.

So if the robot is exceptionally stupid and/or in an unusually bad mood, you've suddenly got a pile of Duplicate Content for stuff it was never meant to see in the first place.

The same invisible-link pattern applies to any external file: images, stylesheets, shared javascript. Except that those aren't likely to have query strings.
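A rough way to see what a robot sees is to scan the raw html for href and src attributes. The sketch below does that with a regular expression over a made-up page source; the file names echo the ones in this thread, the numeric query strings are invented, and a real crawler uses a proper HTML parser rather than a regex.

```javascript
// Hypothetical page source. None of these URLs is visible as a clickable
// link except the last one, yet all of them are in the raw html.
const rawHtml = `
<html>
  <head>
    <link rel="stylesheet" href="/theme.css?1345212345">
    <script src="/scripts/siteUtil.js?1345212345"></script>
  </head>
  <body>
    <a href="/about.html">About</a>
  </body>
</html>`;

// Collect every href="..." or src="..." value, in document order.
function extractUrls(html) {
  const pattern = /\b(?:href|src)\s*=\s*["']([^"']+)["']/gi;
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

const found = extractUrls(rawHtml);
// ["/theme.css?1345212345", "/scripts/siteUtil.js?1345212345", "/about.html"]
```

In this made-up page the stylesheet and script references do carry query strings, which is the situation described earlier in the thread.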


* Hasty edit here to remove actual name of program ;)

gouri




msg:4489657
 2:59 am on Aug 30, 2012 (gmt 0)

But that invisible link such as <a href = "http://www.example.com/trackername/trackername.php">* is visible to any passing robot, because they're reading the raw html, not the surface of the page.


Is the raw html what you see in the source code? Or is it something else, something that a robot can see when crawling a page but isn't visible in the source code?

Also, can these invisible links be something that a robot did not pick up for years and then started to pick up? Or do you think that there might have been some type of change made to the hosting settings and/or template of the site that caused these urls to get indexed?

lucy24




msg:4489668
 5:40 am on Aug 30, 2012 (gmt 0)

Raw HTML = source code. Yes. (Edit: Well, sort of. If you've got php or even simple Includes, "raw html" can mean different things depending on context. But basically, search engines see what you see if you're online and select View Source or equivalent.)

Googlebot only recently started reading and acting on javascript. There are probably other changes, but that's a big and obvious one.

Someone somewhere can probably pinpoint pretty exactly when g### learned how to do javascript. It's pretty recent-- within the last year or two, I think. But that's why I used analytics as an example. The link is inside a <script> section.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved