

inurl mystery

inurl: returning more results than there are pages


enotalone

11:09 pm on Jun 24, 2006 (gmt 0)

10+ Year Member



Hi everyone, I would appreciate an explanation of why inurl: returns much larger, impossible numbers. This has been going on for a long time. At first I thought it might be a bug, a work in progress on G’s end, but it seems to be here to stay, and I am afraid something might be wrong on my end.

When I search with inurl:domain.com/dir/, the number of results returned is 10 times larger than the number of pages that directory has ever hosted.

The same happens if I search with inurl:domain.com/dir/ site:www.domain.com

For a directory that never had more than, let's say, 10,000 pages, it returns 100,000 results!

I did try to review the result set manually, but I cannot get past 100 pages, and everything up to that point looks fine.

enotalone

11:38 pm on Jul 2, 2006 (gmt 0)

10+ Year Member



This thread was on hold for two days or so, I guess because it contained a link, and by the time it was approved it was already on the 2nd or 3rd page of WW.

If anyone has any experience at all with the issue above, it would be really helpful if you could share what might be causing inurl: to return a much higher number of results than expected, or even possible.

Thanks!

g1smd

12:19 am on Jul 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For any query that returns more than 1000 results, Google simply cannot make an accurate estimate of the number.

This has been ongoing for at least a year now.
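To see why an inflated estimate cannot be checked by hand, here is a small Python sketch. It assumes Google serves at most 1,000 results for any query, shown 10 per page; both numbers, and the names `MAX_SERVED`, `RESULTS_PER_PAGE` and `reviewable_pages`, are assumptions for illustration and do not come from this thread. Under those assumptions the listing stops at page 100 no matter how large the estimate at the top of the page is.

```python
import math

# Assumed limits (not stated in this thread): at most MAX_SERVED results
# are served for a query, RESULTS_PER_PAGE at a time.
MAX_SERVED = 1000
RESULTS_PER_PAGE = 10


def reviewable_pages(estimated_total: int) -> int:
    """Number of result pages you can actually click through.

    The total shown above the results is only an estimate; anything
    beyond MAX_SERVED can never be verified by paging.
    """
    served = min(estimated_total, MAX_SERVED)
    return math.ceil(served / RESULTS_PER_PAGE)


if __name__ == "__main__":
    for estimate in (23, 10_000, 100_000):
        print(f"estimate {estimate:>7,}: {reviewable_pages(estimate)} pages")
```

With these assumptions, an estimate of 10,000 and an estimate of 100,000 both give 100 reviewable pages, so paging through the results cannot tell the two apart.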

daveVk

12:43 am on Jul 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



See [webmasterworld.com...]. It is probably still relevant, although the site: fixes mean it is no longer seen on these queries.

rmoore007ri

12:53 pm on Jul 21, 2006 (gmt 0)

10+ Year Member



Perhaps related to the above:

We always used this syntax
<input . . . value="inurl:www.example.com/Departments/Psychology/" />
which would return only pages containing the search term in and beyond the URL. For example, 13 hits on the search term "moore" . . .

Some time ago, the search began returning all pages in and beyond www.example.com/. For example, 34,000+ hits on the search term "moore" . . .

Changing "inurl:" to "+inurl:" fixed the problem, and so did changing "inurl:" to "site:". But we have no real idea why the search broke in the first place, nor why this should fix it.

Something feels very fuzzy here.

<Sorry, no specifics - use example.com.
See Forum Charter [webmasterworld.com]>

[edited by: tedster at 4:56 pm (utc) on July 21, 2006]

rmoore007ri

3:15 am on Jul 25, 2006 (gmt 0)

10+ Year Member



Expanding my previous comment:

In our case, we are searching beyond the domain on a Public Service Search at an educational institution.

Sorry this is so long-winded, but what we have is syntax that used to work and is now broken, so I want to be very clear about what the problem is.

At the domain level we search using:
<http://www.google.com/u/school>
embedded in a form:

<form action="http://www.google.com/u/school" method="get" name="f2">
<input type="hidden" value="school.edu;schoolalumnimagazine.com;schoolbears.collegesports.com" name="domains">
<input type="hidden" value="school.edu" name="sitesearch">
<input type="text" class="txt" name="query" value="Search the Web" />
<input type="submit" class="but" name="sa" value="Search" />
</form>

and this works very well.

To search below the domain level, for example within a university department, Google suggested using inurl: as an interim solution.
I've only found this documented on the blog:
<http://tenderlover.blogspot.com/2005/11/google-site-search.html>
where the suggested syntax is
<input type="hidden" name="hq" value="inurl:www.example.com/yourdirectory">
embedded in a form:

<form method="get" action="http://www.google.com/u/school">
<input type="hidden" name="hq" value="inurl:www.school.edu/Departments/departmentname/" />
<label for="query">Search the Department Web </label>
<input size="30" name="query" id="query" value="Enter search term..." onfocus="clearDefault(this)">
<input type="submit" name="submit" value="Go" />
</form>
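For reference, this is roughly what the department form sends when submitted with GET: the browser URL-encodes each field into the query string. The Python sketch below reproduces that encoding with `urllib.parse` for the `hq` and `query` fields only (the search term "moore" is borrowed from the earlier post, and the submit field is left out; `form_url` and `decoded_hq` are hypothetical helper names). One detail it makes visible: inside a form value, the leading + of +inurl: is sent as %2B, whereas a raw + typed into a hand-built URL is decoded as a space, which would quietly turn +inurl: back into plain inurl:. How Google then combines `hq` with `query` is not modelled here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Form action from the department search form quoted above.
FORM_ACTION = "http://www.google.com/u/school"


def form_url(hq: str, query: str) -> str:
    """Build the GET URL a browser would produce for the hq and query fields.

    urlencode() applies form-style encoding: ':' -> %3A, '/' -> %2F,
    '+' -> %2B, and spaces become '+'.
    """
    return FORM_ACTION + "?" + urlencode([("hq", hq), ("query", query)])


def decoded_hq(url: str) -> str:
    """Return the hq value as a server sees it after decoding the query string."""
    return parse_qs(urlsplit(url).query)["hq"][0]


if __name__ == "__main__":
    dept = "www.school.edu/Departments/departmentname/"
    for prefix in ("inurl:", "+inurl:", "site:"):
        print(form_url(prefix + dept, "moore"))

    # Typed by hand, a raw '+' in the URL is decoded as a space,
    # so "+inurl:" silently turns into " inurl:".
    hand_built = FORM_ACTION + "?hq=+inurl:" + dept + "&query=moore"
    print(repr(decoded_hq(hand_built)))
```

So when testing variants by editing the address bar rather than the form, write the plus as %2B.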

which used to work but stopped working some time ago; we don't really know when. Searching a department with a search term that should return 20 hits returns 30,000 hits instead! Various departments have various versions of this form and none of them work, so it seems unlikely to be a subtle syntax problem.

With a little experimentation on variations of inurl:, or substituting site: for inurl: in the example above, I obtained these results:

value="inurl:www.school.edu/Departments/departmentname/" 100,000,000 hits
value="+inurl:www.school.edu/Departments/departmentname/" 31 hits
value="site:www.school.edu/Departments/departmentname/" 23 hits
value="+site:www.school.edu/Departments/departmentname/" 23 hits

The difference between 31 and 23 in these results is that Google is finding pages outside of our domain with both the department URL and the search term in them. Obviously we get the results that make the most sense with:
value="site:www.school.edu/Departments/departmentname/"
but can anyone see any problem with this syntax? We have no idea why the original syntax stopped working, and we need to change this on dozens (and dozens) of department pages. (Don't you just hate going back to 80 people in your academic departments and saying "sorry, we got the syntax wrong again"?)
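A simplified way to picture the 31 vs 23 gap: treat inurl: as "this text appears somewhere in the URL" and site: as "the URL is on this host and under this path". The Python sketch below applies both rules to a few made-up URLs; the redirector URL and the function names are hypothetical, and Google's real matching (word-based, and evidently buggy at the time) is not reproduced. It only shows how an inurl: restriction can legitimately pick up pages on other domains that a site: restriction excludes.

```python
from urllib.parse import urlsplit

DEPT = "www.school.edu/Departments/departmentname/"

# Hypothetical URLs for illustration only.
URLS = [
    "http://www.school.edu/Departments/departmentname/people.html",
    "http://www.school.edu/Departments/other/index.html",
    "http://redirector.example.com/go?to=www.school.edu/Departments/departmentname/",
]


def matches_inurl(url: str, pattern: str) -> bool:
    """Simplified inurl: the pattern appears anywhere in the URL."""
    return pattern in url


def matches_site(url: str, pattern: str) -> bool:
    """Simplified site: same host, and the path starts with the given path."""
    host, _, path = pattern.partition("/")
    parts = urlsplit(url)
    return parts.hostname == host.lower() and parts.path.startswith("/" + path)


if __name__ == "__main__":
    for url in URLS:
        inurl_hit = matches_inurl(url, DEPT)
        site_hit = matches_site(url, DEPT)
        print(f"inurl={inurl_hit!s:5} site={site_hit!s:5} {url}")
```

Under this model the redirector URL passes the inurl: test but not the site: test. It says nothing about the 100,000,000-hit result for the bare inurl: value.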

thanks all.

daveVk

8:00 am on Jul 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The article from Nov 2005 is probably past its use-by date, as G has changed since then and claims to have fixed the site: command this year. Have you tried reverting to the arrangement you used before this fix? Failing that, I would go for the site: alternative, as I don't think the others are any less buggy.
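If you do go with site:, changing dozens of department pages is a mechanical find-and-replace. Here is a minimal Python sketch, assuming each page contains the hidden input in the same shape as the department form quoted earlier (double-quoted attributes, `name="hq"` before `value=`, and a value beginning with inurl: or +inurl:). The pattern, the function names and the script name `migrate_hq.py` are all hypothetical; try it on copies of the pages first.

```python
import re
import sys
from pathlib import Path

# Hypothetical pattern for the hidden hq input used in the department forms.
# Assumes double-quoted attributes with name="hq" appearing before value=;
# other attribute orders are not handled.
HQ_INPUT = re.compile(
    r'(<input\b[^>]*\bname="hq"[^>]*\bvalue=")\+?inurl:([^"]*")',
    re.IGNORECASE,
)


def inurl_to_site(html: str) -> tuple[str, int]:
    """Replace an inurl:/+inurl: hq value with site: and count replacements."""
    return HQ_INPUT.subn(r"\1site:\2", html)


def migrate(paths: list[Path]) -> None:
    """Rewrite each file in place, reporting how many forms were changed."""
    for path in paths:
        new_text, count = inurl_to_site(path.read_text(encoding="utf-8"))
        if count:
            path.write_text(new_text, encoding="utf-8")
            print(f"{path}: {count} form(s) updated")


if __name__ == "__main__":
    # Usage: python migrate_hq.py dept1.html dept2.html ...
    migrate([Path(p) for p in sys.argv[1:]])
```

Files without a matching input are left untouched, so it is safe to pass it a whole directory listing.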

g1smd

8:55 am on Jul 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For some sites, I see that site:www.domain.com returns some results, and

site:www.domain.com -inurl:www returns a load of Supplemental results, but they are all www pages (shouldn't the -inurl:www exclude www pages?)

site:www.domain.com inurl:www returns www pages, but without any Supplemental results.

I see this for a large number of sites. I cannot yet confirm that it happens for all sites.
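Taking the operators at face value, -inurl:www should drop every URL containing www, so site:www.domain.com -inurl:www ought to return nothing at all. The small Python model below makes that expected behaviour explicit on a few hypothetical pages; it treats site: as an exact host match and inurl: as a plain substring test, both simplifications, and it models what the operators are meant to do, not what Google actually returned.

```python
from urllib.parse import urlsplit

# Hypothetical pages on www.domain.com, for illustration only.
PAGES = [
    "http://www.domain.com/",
    "http://www.domain.com/about.html",
    "http://www.domain.com/dir/page1.html",
]


def site(urls: list[str], host: str) -> list[str]:
    """Simplified site: keep URLs whose host is exactly `host`."""
    return [u for u in urls if urlsplit(u).hostname == host]


def inurl(urls: list[str], text: str, negate: bool = False) -> list[str]:
    """Simplified inurl: / -inurl: keep (or, if negate, drop) URLs containing `text`."""
    return [u for u in urls if (text in u) != negate]


if __name__ == "__main__":
    on_site = site(PAGES, "www.domain.com")
    print("site:www.domain.com            ->", len(on_site))  # 3
    print("site:www.domain.com inurl:www  ->", len(inurl(on_site, "www")))  # 3
    print("site:www.domain.com -inurl:www ->", len(inurl(on_site, "www", negate=True)))  # 0
```

In this model the three queries return 3, 3 and 0 pages; the Supplemental www pages that survive -inurl:www in the observations above are exactly the part that does not fit.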

daveVk

11:02 am on Jul 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



shouldn't the -inurl:www exclude www pages?

It appears to exclude only the www pages in the main index, in the same way that the positive version includes only main-index pages. That's on a good day.