Oct 9, 2018

I’ve recently seen some presentation decks online mentioning Google’s AROUND(X) and Bing’s NEAR:X proximity search operators. They reminded me of a recent exchange with a connection on LinkedIn, who had reached out to ask about proximity search and to have me look at one of the searches they were running.

Here’s what they shared: (Komodo OR KMD OR “Agama Wallet” OR JL777) AROUND(10) (wallet OR merkle OR ICO OR blockchain OR decentralized OR decentralised OR dICO OR crypto OR cryptocurrency OR atomic OR etomic OR DEX OR exchange OR CEX OR dPOW OR “delayed proof of work” OR ERC20 OR contracts OR bitcointalk OR fintech OR “alternatives to” OR invest* OR “pay attention”)

Given the large OR statements on either side of the AROUND(10) operator, I couldn’t easily tell whether the proximity operator was working. I did tell them, though, that in some of my previous tests over the years, AROUND(X) appeared not to work.

What’s the big deal?

First, proximity search is something I’ve been using for nearly 20 years and writing about since AROUND(2009). Being able to control how close terms are to each other in search results lets you target sentences in which people describe a responsibility. A responsibility is typically a verb (e.g., developing, designing, implementing) combined with a noun (e.g., Node.js, Hadoop, SAP). This is incredibly powerful and is a form of semantic search. Targeting sentences and finding people based on what they have specifically done is a great deal more predictive of a match than searching for keywords regardless of their location or relation to each other. Some ATSs (e.g., iCIMS), job boards (e.g., Monster) and search & match solutions (e.g., Sovren, Textkernel) reliably support proximity search.

Second, almost everything you see and read about sourcing focuses on the searches, not on reviewing and processing the results. I find that many people don’t pay enough attention to their search results. So I want to take the time to stress the importance of examining results for two things: the terms you searched for, and whether your search is actually “working.” As I like to say, (almost) all searches work. However, that doesn’t necessarily mean the search engine adhered to your search syntax. In other words, it may not have worked as you intended.

I proved myself wrong

I find that many articles and presentations on sourcing are about the “answers” rather than the process of arriving at them. I’m also aware that many people simply want the “answers”: specific Boolean search strings, email subject lines, etc. However, I’ve found that some of the most effective learning comes from making mistakes, experimenting thoughtfully and questioning the status quo. In the process of writing this article, I (re)learned some lessons I think everyone can benefit from.

You may find it interesting that I began this post as an article about how Google’s AROUND(X) search operator doesn’t work. However, as I crunched through a bunch of searches to test my claim (I tried over 20), I struggled to find clear evidence that it doesn’t work.

I eventually settled on a pair of terms that rarely appear together, so I wouldn’t have to wade through a ton of results to find examples of AROUND(X) not working.

decrypt AROUND(3) hadoop

At first glance, the search appeared to be working, especially if you allow for automatic stemming (e.g., decrypting).

I started digging into the results to find any in which decrypt and Hadoop were not mentioned within three words of each other.

On page one, I found a result with no mention of Hadoop. I found that odd, but I didn’t take it at face value as proof that Google’s proximity search wasn’t working properly.

As I inspected the result, I noticed that although Hadoop wasn’t mentioned anywhere, Hive was mentioned many times, once right next to decrypt. Pay special attention to the bottom of the image below.

What I think we see here is that Google is actually meeting my search request to see Hadoop within three words of decrypt by using terms synonymous with Hadoop, such as Hive in this case and HDFS in at least one other.

This is quite cool, but it makes it more difficult to figure out whether or not Google’s AROUND(X) operator is actually working because you have to scan for synonyms (or at least what Google might believe to be synonyms).

So I decided to remove that variable from the equation and searched for the exact terms: “decrypt” AROUND(3) “Hadoop”

Remember, I purposefully chose a relatively uncommon combination of terms to find nearby to make it easy to inspect results to see if Google’s AROUND operator worked. If you use very common terms and/or a large distance between terms, you can easily be fooled into thinking Google is obeying the rules of the AROUND(X) operator, when in fact your search terms are close together anyway, and not because you “forced” them to be by using AROUND(X). It is critical to understand this.

Back to my search – if you look through the ~31 search results (minus ads), they all mention Hadoop within three words of decrypt. However, they do shed some light on what “within X terms” means: in two cases, there were three words between decrypt and Hadoop.

I’d never thought too deeply about exactly what “within X” means. Thinking about it now, I would typically interpret “within three terms” as allowing at most two words between the search terms, so the second term can be no further than the third word from the first, forwards or backwards. Many of Google’s results from this search fit that criterion. In a few cases, however, Google’s AROUND(X) appears to interpret X as the maximum number of words that can separate the search terms. Or perhaps, in the example above, Google is simply ignoring the common word “the” and not counting it as the third word between decrypt and Hadoop. Either way works for me.
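To make the competing readings of “within X” concrete, here is a minimal Python sketch of a checker you could run against text copied from a result. Google doesn’t document how it counts, so the two switches below are assumptions that mirror the interpretations described above. `max_gap` allows up to X words *between* the terms. `skip_stopwords` ignores a small, hypothetical list of common words such as “the.” The snippet is made up, not taken from the actual results.

```python
import re

# Hypothetical list of common words a search engine *might* skip when counting.
# Google does not publish such a list for AROUND(X); this is an assumption.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def within(text, a, b, x, max_gap=False, skip_stopwords=False):
    """Return True if words a and b occur within x words of each other, in either order.

    max_gap=False: b may be at most the x-th word away from a (at most x-1 words between).
    max_gap=True:  up to x words may separate a and b.
    skip_stopwords: drop STOPWORDS before counting positions.
    """
    tokens = tokenize(text)
    if skip_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    limit = x + 1 if max_gap else x  # largest allowed difference in word positions
    pa = [i for i, t in enumerate(tokens) if t == a.lower()]
    pb = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= limit for i in pa for j in pb)


snippet = "Tools to decrypt the data in Hadoop"  # made-up example text
print(within(snippet, "decrypt", "hadoop", 3))                       # False
print(within(snippet, "decrypt", "hadoop", 3, max_gap=True))         # True
print(within(snippet, "decrypt", "hadoop", 3, skip_stopwords=True))  # True
```

The same result can pass or fail depending purely on how “within” is defined. That is why it helps to decide on a definition before judging whether an engine is honoring the operator.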

I decided to try one more search to ensure what I found with “decrypt” AROUND(3) “Hadoop” wasn’t some kind of fluke.

Here’s the second test search I settled on:

(developer | programmer | engineer) migrate AROUND(4) JavaScript resume -sample

Keep in mind I wasn’t trying to run the ultimate resume finding search – just test Google’s AROUND(X) functionality with a different search.

I didn’t go through all 100+ results. (Remember to ignore the “About 1,060 results” estimate: keep clicking through to the last page you can reach, and you will often find the actual number of results is a fraction of the initial estimate.) I did, however, spot-check many random results across many pages. The majority of the results I inspected strictly adhered to migrate within four words of JavaScript. Even so, I found a couple of suspect results in which I couldn’t rule out that Google was substituting other terms for “migrate.”

So, just as in the previous example (I should have learned!), I used quotation marks to eliminate that variable. To be honest, though, I was happy with how Google was finding JavaScript-related terms that met my criteria.

(developer | programmer | engineer) “migrate” AROUND(4) “JavaScript” resume -sample

I checked over 20 results across many pages and found a couple that I couldn’t explain, as they didn’t mention migrate at all.

Like this one.

In disfusion (a portmanteau of disbelief and confusion), I kept hitting CTRL-F and using the Multi-Highlight Chrome extension (which doesn’t always work properly – grrr).

Nope – no mention of migrate anywhere.

I was about to chalk it up to Google’s AROUND(X) not working, but that one result really bothered me. It made no sense for Google to return it when every other result did, in fact, mention migrate.

Then, out of curiosity, I decided to view the page source and poke around.

Guess what I found?

This reminds me why I like to write about this kind of stuff.

I checked the page source of the other few suspect results and found the same thing.
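CTRL-F and highlighting extensions only search the text a browser renders, while the page source can contain text that is never displayed. The article doesn’t say exactly where in the source the term turned up. The sketch below assumes, purely for illustration, that it sits in unrendered markup such as a meta description. It uses Python’s standard `html.parser` to compare the rendered text against the raw HTML. The sample page is made up. The parser is only a rough approximation of “visible”: it skips `<script>`/`<style>` contents but does not evaluate CSS.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text a browser would normally render, skipping script/style contents.

    Rough approximation: CSS is not evaluated, so text hidden with display:none
    is still treated as visible, and attribute values (e.g. meta content) are ignored.
    """

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def where_is(term, html):
    """Report whether term appears in the rendered text and/or anywhere in the raw source."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.chunks).lower()
    term = term.lower()
    return {"visible_text": term in visible, "page_source": term in html.lower()}


page = (  # made-up page: "migrate" only appears in a meta tag attribute
    '<html><head><meta name="description" '
    'content="Developer who can migrate legacy code to JavaScript"></head>'
    "<body><p>Senior JavaScript engineer</p></body></html>"
)
print(where_is("migrate", page))  # {'visible_text': False, 'page_source': True}
```

A result that reports `False` for visible text but `True` for the page source is exactly the kind of case CTRL-F alone will miss.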

While it would take significantly more testing for me to say with 100% confidence that the AROUND(X) operator works under all possible search conditions, my test searches above clearly show Google’s AROUND(X) working as designed.

What about Bing and Yandex?

In the case of Bing, I can keep it pretty simple: NEAR:X definitely doesn’t work.

Here’s the documentation. Check out this search: decrypt near:3 Hadoop. It doesn’t work if you capitalize NEAR either: decrypt NEAR:3 Hadoop

For Yandex, it’s not as easy to tell.

I ran “decrypt” /3 “hadoop” as a first test.

As you can see from the search results, it looks like it’s working. However, I realized I had assumed that quotation marks were the right operator for exact terms. According to Yandex’s search operators, quotation marks are for “search for exact words order,” so I can’t be certain they work on single terms.

Interestingly, another Yandex search operator guide describes quotation marks as matching “the exact number of words” – but its own example shows results in which the order of the terms changes.

How’s that for a contradiction‽

In any event, ! is Yandex’s operator for searching for words in their exact form.

Even though I initially used the wrong search operator for searching for exact terms, you can see the search ran and returned results that looked “right” on the surface, so I did dig into the results, and it didn’t take much effort to find examples where decrypt and Hadoop were not mentioned within 3 words of each other.

So, I took a second go using the ! operator: !decrypt /3 !hadoop

The results changed from the first search, but once again, on the surface, things look generally pretty good.

However, after digging in and testing random results, I found that while many results appear to meet the proximity requirement, many do not. Some of the results at the bottom of page 1 don’t even mention the search term decrypt, even when I checked the page source code. It doesn’t get any better on page 2. Feel free to check for yourself.

I decided to try my other proximity test search but simplify it to focus solely on the proximity aspect: !migrate /4 !JavaScript

At first glance, things look great, right? (Do you see a pattern yet?)

However, don’t assume that just because you see a bunch of mentions of migrate close to JavaScript that Yandex is actually honoring the proximity condition.

All you need to do is randomly open results, look for migrate and JavaScript, and check whether they appear within four words of each other at least once in each result. From my random sampling, many results do not mention migrate within four words of JavaScript, although you do have to get to page 5 to find the more obvious examples. That’s because migrate and JavaScript are fairly commonly mentioned together in pages and documents. It is also why my decrypt/Hadoop search was a better test.
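The random spot-check described above can be scripted once you have saved the text of some result pages. The following Python sketch makes a few assumptions. First, `pages` is a hypothetical mapping of result URL to page text that you collected yourself; no search-engine API is implied. Second, terms are matched in their exact form, as Yandex’s `!` operator intends. Third, “within four words” means the word positions differ by at most four. The URLs and texts are placeholders.

```python
import random
import re


def min_distance(text, a, b):
    """Smallest gap in word positions between exact-form a and b, or None if either is missing."""
    tokens = re.findall(r"\w+", text.lower())
    pa = [i for i, t in enumerate(tokens) if t == a.lower()]
    pb = [i for i, t in enumerate(tokens) if t == b.lower()]
    if not pa or not pb:
        return None
    return min(abs(i - j) for i in pa for j in pb)


def spot_check(pages, a, b, x, sample_size=10, seed=None):
    """Randomly sample result pages and return those that fail the proximity condition.

    A page fails if either term is missing (value None) or if the closest pair
    of occurrences is more than x positions apart (value = that distance).
    """
    rng = random.Random(seed)
    urls = rng.sample(sorted(pages), min(sample_size, len(pages)))
    failures = {}
    for url in urls:
        d = min_distance(pages[url], a, b)
        if d is None or d > x:
            failures[url] = d
    return failures


pages = {  # placeholder results
    "https://example.com/a": "We migrate legacy apps to JavaScript",
    "https://example.com/b": "JavaScript tips. Nothing about moving code here.",
    "https://example.com/c": "How to migrate a database; later, a JavaScript section",
}
failures = spot_check(pages, "migrate", "JavaScript", 4, seed=1)
print(sorted(failures.items()))
# [('https://example.com/b', None), ('https://example.com/c', 5)]
```

Sampling across pages, rather than reading page 1 only, matters here because the obvious failures may not show up until deeper pages.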

The other variant of proximity searching with Yandex is the & operator, which I’ve seen written about on SourceCon. It is supposed to “search for the terms to appear in one sentence.” Since /X didn’t appear to be working, at least for my test searches, I decided to try &.

!decrypt & !Hadoop

Yes, once again, page 1 looks good on the surface. It actually looked pretty good when I dove into individual results too.

However, I only had to get to page 2 to find results that didn’t even mention decrypt anywhere.
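Checking the & condition by hand means confirming that both terms occur in at least one shared sentence. Here is a minimal Python sketch of that check. It assumes a naive sentence splitter that breaks on “.”, “!” or “?” followed by whitespace. That is an approximation: abbreviations like “e.g.” will cause false splits, and Yandex’s own notion of a sentence is not documented here.

```python
import re


def same_sentence(text, a, b):
    """Return True if exact-form words a and b both appear in at least one sentence.

    Sentences are split naively on ., ! or ? followed by whitespace, so
    abbreviations such as "e.g." can cause incorrect splits.
    """
    a, b = a.lower(), b.lower()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"\w+", sentence.lower()))
        if a in words and b in words:
            return True
    return False


print(same_sentence("Hadoop stores the files. We decrypt them later.", "decrypt", "Hadoop"))  # False
print(same_sentence("We decrypt files stored in Hadoop.", "decrypt", "Hadoop"))  # True
```

A page that doesn’t mention decrypt at all fails this check trivially, which is the kind of result that turned up on page 2.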

The Moral(s) of the Story

Remember, I started out this article as a “Google’s AROUND(X) operator doesn’t work” piece. As I dove into my testing, I had to go back and change the title and the entire article, which I believe makes this post so much better than what it was going to be.

You see, when I began running searches to support my belief that Google’s AROUND(X) operator didn’t work properly, I found evidence otherwise and changed my perspective.

I always love being proved wrong, even if I do it to myself. Perhaps that’s the best way.

Here’s what I am hoping you can take away from my experience and this article:

  • Don’t blindly believe everything you see and hear
  • Test for yourself before believing what other people say or write
  • When experimenting, don’t just look for supporting evidence
  • When testing hypotheses, seek to remove as many variables as possible
  • Almost all searches “work” – even those with syntactical errors often produce results!
  • Don’t be fooled by how results look on the surface
  • When reviewing search results, don’t just look for what you expect – look for what you don’t
  • When you modify your searches, pay attention to changes in the number of results and the nature of the results themselves
  • If something doesn’t make sense, don’t give up – get creative and keep at it! What are you missing? Is there any other way you haven’t thought of?
  • Stay curious and open-minded: question and challenge others and yourself
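Paying attention to how results change between versions of a search is easier if you record the result URLs for each version and compare them. Here is a small Python sketch, assuming you have already collected each version’s URLs in ranked order (by hand or otherwise); the URLs are placeholders.

```python
def compare_results(before, after):
    """Compare two ordered lists of result URLs from successive versions of a search."""
    before_set, after_set = set(before), set(after)
    return {
        "count_change": len(after) - len(before),
        "dropped": [u for u in before if u not in after_set],  # in the old search only
        "added": [u for u in after if u not in before_set],    # in the new search only
    }


before = ["https://example.com/1", "https://example.com/2", "https://example.com/3"]
after = ["https://example.com/2", "https://example.com/4"]
print(compare_results(before, after))
# {'count_change': -1, 'dropped': ['https://example.com/1', 'https://example.com/3'], 'added': ['https://example.com/4']}
```

The dropped and added lists are often more telling than the count alone. They show which documents a change in syntax (such as adding quotation marks) actually included or excluded.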

Final thoughts on Google’s AROUND(X):

  • If you use Google’s AROUND(X) to allow 8, 9, 10 or more words between two search terms, it may be incredibly challenging to verify whether every result is, in fact, adhering to the maximum distance of X, because the terms are likely to appear many times throughout each document. And if the terms are in separate sentences, the semantic aspect of the search is most likely broken. Remember, when you are searching for people who have performed a specific responsibility (verb/noun combinations), the real power of proximity search is the ability to target terms within the same sentence.
  • If you try to use OR statements on either side of the AROUND(X) operator, I can’t say for certain whether that actually works; that would take additional testing. Combining OR statements with the AROUND(X) operator can make it difficult to be certain that the returned results adhere to the proximity limit.
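One way to remove the OR variable when testing is to break an (A OR B) AROUND(X) (C OR D) search into one single-pair query per combination. You can then check each pair on its own, the way decrypt/Hadoop was checked above. Here is a Python sketch that generates those query strings. It quotes each term to limit synonym substitution, following the quotation-mark approach used in the tests above. It leaves wildcard terms such as invest* unquoted, on the assumption that quoting them could change how the wildcard behaves. Note how fast the count grows: the LinkedIn search quoted earlier has 4 terms on the left and 23 on the right, which would expand into 4 × 23 = 92 queries.

```python
from itertools import product


def quote(term):
    """Wrap a term in quotation marks unless it is already quoted or uses a * wildcard."""
    if term.startswith('"') or "*" in term:
        return term
    return f'"{term}"'


def pairwise_around_queries(left, right, x):
    """Expand (left terms OR ...) AROUND(x) (right terms OR ...) into one query per pair."""
    return [f"{quote(l)} AROUND({x}) {quote(r)}" for l, r in product(left, right)]


for q in pairwise_around_queries(["Komodo", "KMD"], ["wallet", "blockchain"], 10):
    print(q)
# "Komodo" AROUND(10) "wallet"
# "Komodo" AROUND(10) "blockchain"
# "KMD" AROUND(10) "wallet"
# "KMD" AROUND(10) "blockchain"
```

Testing a handful of these single-pair queries, ideally with uncommon terms and a small X, gives clearer evidence than inspecting results from the combined OR search.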