Version 4

    Search is one of the most critical components of a healthy community. A solid search tool helps you find people, places, and content quickly and easily. It lets you search far and wide or narrow your question to very specific parameters. It can make your job much easier – especially if you know how to make it work for you.

     

    We’ve found that the best way to use search effectively in Jive is to understand how it works. In this post, we’ll cover the nitty-gritty of Jive’s search engine so that you’re armed with the knowledge you need to search with success.

     

    Let’s start with the basics

     

    To make sure we’re all on the same page, let’s start with some vocabulary.

    • Field: A single piece of information within the content or user profile you're searching. For example, in a document, you have the title, the content, and the tags. For a user, you have their first name, last name, expertise, tags, and many more.
    • Spotlight search: The search box included in the header of every page. It pops up a limited number of results from content, users, and places.
    • Advanced search: A special search page you can open from the Spotlight search. It is used for more complex searches.
    • Stop words: Common words often occurring in any text. For example, the stop words for the English language include of, the, this, a, and and.
    • Stemming: Searching for the root of the word. For example, fishing, fished, and fisher have the same stem fish.

     

     

    Cloud Search 4.x architecture

     

    The new Jive search is powered by Amazon Elasticsearch Service and Amazon Comprehend. The main benefits are improved stability and performance, as well as greater relevance from enriching Jive objects with additional metadata.

    With this version, we have deprecated social search and replaced it with an engine where results are based on a combination of the previously weighted fields (title, body, tags) and newly extracted metadata (key phrases, entities, topics).

     

     

    Searching for content, people and places

     

    With Jive, you can search for content, users, and places, such as spaces, groups, and projects. There are some important differences to note with these types of search.

     

    Unless filtered by content type, the system will search for the search phrase in all of these content types:

    • Direct message
    • Poll
    • Blog post
    • Idea
    • Announcement
    • Document
    • Discussion
    • Question
    • File
    • Photo
    • Status update
    • Task
    • Event
    • Video
    • External Activity
    • Comments on content

     

    Previously, content search used only the subject, body, and tags fields. Cloud Search now searches the following fields:

    • Content items' fields: subject, body, and tags (or language-dependent alternatives for subject and body)
    • Fields with phrases extracted by Amazon Comprehend: keyphrases, entities.text
    • Video fields: videoCaption, videoCelebrities, videoContent, videoLabels, videoText

     

    Stop words are removed from queries before these fields are further processed.
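
    As a rough illustration of this step, here is a minimal Python sketch. The stop-word list contains only the English examples given in the vocabulary section; the real list used by Cloud Search is not published in this post, so treat both the list and the whitespace tokenization as assumptions.

```python
# Illustrative stop-word list: only the English examples named in this post.
# The actual list used by Cloud Search is an unknown here.
STOP_WORDS = {"of", "the", "this", "a", "and"}


def remove_stop_words(query: str) -> list[str]:
    """Lowercase the query, split it on whitespace, and drop stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(remove_stop_words("The history of the Jive community"))
```

    In this sketch, only the words that carry meaning ("history", "jive", "community") are passed on to field matching.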

     

    When searching for users, the system searches for the phrase in each of the profile fields that the person performing the search has access to according to their user settings (for instance, an admin has access to more fields than a standard user). This includes searching for users through the front end (Spotlight and Advanced search) as well as searching for users in the admin console People tab.

     

    When searching places, such as spaces, groups or projects, Jive searches the title, the description, and the tags.

    The same search algorithm applies here; a field that contains 5 words, one of which is a match, will receive a higher score than a field that contains 25 words, one of which is a match. If you're having trouble getting your place to show up at the top of a search for a particular term, be sure to use the search term in the title, description and tag field as many times as possible, with as few other words as possible.

     

    Searchable place types

    • Space
    • Group
    • Project
    • Personal blogs

     

    @mentioning

    When you @mention someone or something, Jive searches in a similar way to Spotlight search. The search algorithm takes what you've typed in so far and adds a wildcard (*) to it; like the Spotlight search, this means that no stemming is done with this search. The main difference from the Spotlight search is that @mentioning only searches the title (of content or place) and username, name, and email of a user.

     

    Exact matches

    When searching in the Spotlight search, if there is a person or a place in the community that matches the exact term used for search, the user will see a highlighted display of the person or place above all the content search results.

     

    The exact match follows the following criteria:

    • Top result is shown only for places and people.
    • The comparison for "exact match" is performed after "normalizing" the texts (both the query and the place name):
      • Replacing commas and pluses with spaces
      • Replacing all whitespace characters with a simple space
      • Collapsing multiple adjacent spaces into one
      • Trimming whitespace off the start
      • Lowercasing
    • The top result appears when there is an exact match, no matter what the rank of the result is.
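
    The normalization steps above can be sketched in a few lines of Python. This is an illustration of the listed rules, not Jive's actual code; the function names are made up for the example, and the steps are applied in the order they are listed.

```python
import re


def normalize(text: str) -> str:
    """Apply the normalization steps listed above, in order."""
    text = re.sub(r"[,+]", " ", text)   # commas and pluses become spaces
    text = re.sub(r"\s", " ", text)     # every whitespace character becomes a plain space
    text = re.sub(r" {2,}", " ", text)  # runs of adjacent spaces collapse into one
    text = text.lstrip()                # trim whitespace off the start
    return text.lower()                 # lowercase


def is_exact_match(query: str, name: str) -> bool:
    """Compare a query with a person or place name after normalizing both."""
    return normalize(query) == normalize(name)


print(is_exact_match("jive   search team", "Jive Search, Team"))
```

    Because the list only mentions trimming the start, this sketch leaves trailing whitespace in place.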

     

     

    Spotlight search vs. Advanced search

     

    The Spotlight search appears at the top of each page. It’s intended as the "quick access" search feature, with suggestions, frequently used items, and search history. So, when typing the first few letters in the search box you can see suggestions based on the quick search performed in the background – including content completion and spelling correction options.

    In order to initiate the search, you will need to click on one of the suggestions or press Enter. At this point, the Spotlight search adds a wildcard (*) to the end of your search term. This means that if you’re searching for Computing and press Enter after typing com, it searches for computer, communications, comma, not just com. Note that the Spotlight search searches for tags as well as content, people, and places.
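
    To make the trailing wildcard concrete, here is a small Python sketch that uses the standard library's fnmatch module as a stand-in for the search engine's prefix matching. The helper names are hypothetical, and real matching happens inside the search index rather than over a word list.

```python
from fnmatch import fnmatchcase


def spotlight_pattern(typed: str) -> str:
    """Append the trailing wildcard that Spotlight search adds to the typed term."""
    return typed.lower() + "*"


def wildcard_matches(typed: str, words: list[str]) -> list[str]:
    """Return the words that match the typed prefix, ignoring case."""
    pattern = spotlight_pattern(typed)
    return [word for word in words if fnmatchcase(word.lower(), pattern)]


words = ["computer", "Communications", "comma", "com", "economy"]
print(wildcard_matches("com", words))
```

    Note that fnmatch also treats ?, [ and ] as special characters, so this sketch is only suitable for plain alphanumeric input.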

     

    The Advanced search takes place on the main search page. It offers more options to refine your search and does not apply the wildcard, as it expects you to provide all of your detailed criteria for the most specific results. For example, here you can limit a document search by author.

     

    Both searches use the same Cloud Search algorithm to return results. The algorithm uses an “AND” search, which means that it first finds results that include all the words of the search phrase. The algorithm searches all included text, including attachments and comments, not just the initial blog post, document, or discussion.

     

     

     

    Suggestions

     

    Cloud Search 4.x introduced suggestions based on the indexed content: when you type the first few letters in the search box, you see suggestions based on a quick search performed in the background.

     

    Cloud Search presents autocomplete suggestions and spelling corrections to users. They are based on the user's search queries and the community context:

    • Personal search history – marked with the counterclockwise icon
    • Community specific suggestions – based on the relevant content, topics, and terms most common in your company
    • People suggestions – based on actual names of individuals at your company
    • General suggestions – based on public knowledge graphs
    • Spelling corrections – possible corrections for spelling mistakes, marked with Did you mean?

     

    These suggestions are based on special multi-valued field data (hidden from search) that is provided during document indexing. For people, it contains the person's first name and full name; for other content, it contains titles (subjects) and fields that result from Amazon Comprehend processing (keyphrases, entities, and topic terms). Cloud Search checks for prefix matches against what the user types in the search box while applying a fuzzy parameter, so that it can correct some spelling mistakes. In practice, the first letter must always match, and the longer the search phrase, the less exact the match needs to be.
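
    The behavior described above (first letter fixed, more tolerance for longer input) can be approximated with a short Python sketch. The length thresholds are an assumption modeled on Elasticsearch's AUTO fuzziness setting; the values Jive actually configures are not stated in this post, and all function names are illustrative.

```python
def allowed_edits(length: int) -> int:
    """Assumed thresholds: longer input tolerates more typos."""
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_prefix_match(typed: str, candidate: str) -> bool:
    """True if `typed` is a prefix of `candidate`, allowing a few typos."""
    typed, candidate = typed.lower(), candidate.lower()
    if not typed or not candidate or typed[0] != candidate[0]:
        return False  # the first letter must always match
    budget = allowed_edits(len(typed))
    # Compare the typed text with candidate prefixes of similar length.
    for end in range(max(1, len(typed) - budget), len(typed) + budget + 1):
        if edit_distance(typed, candidate[:end]) <= budget:
            return True
    return False


print(fuzzy_prefix_match("documnet", "Documentation"))
print(fuzzy_prefix_match("xocument", "Documentation"))
```

    With these assumed thresholds, a two-letter input must match exactly, while an eight-letter input may contain up to two typos after the first letter.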

     

    Suggestions respect the user's rights to see names and documents, as well as type filtering; technically, we use a completion suggester with context filtering. This means that if a user selects a filter to search only for blog posts, only suggestions created from blog posts are shown, and only those coming from documents that the user has the right to see.
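
    For readers familiar with Elasticsearch, a completion suggester request with a category context looks roughly like the body built below. The request shape follows the Elasticsearch suggester API, but the suggestion name, the field name suggest, and the context name content_type are hypothetical; Jive's actual index mapping is not documented here, and permission filtering is left out of the sketch.

```python
import json


def build_suggest_request(prefix: str, content_types: list[str]) -> dict:
    """Build a completion-suggester request body restricted by a category context."""
    return {
        "suggest": {
            "content-suggest": {                 # hypothetical suggestion name
                "prefix": prefix,
                "completion": {
                    "field": "suggest",          # hypothetical completion field
                    "size": 5,
                    "fuzzy": {"fuzziness": "AUTO", "prefix_length": 1},
                    # hypothetical context name; limits suggestions to these types
                    "contexts": {"content_type": content_types},
                },
            }
        }
    }


print(json.dumps(build_suggest_request("sear", ["blogpost"]), indent=2))
```

    The prefix_length of 1 corresponds to the rule that the first letter must always match.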

    Suggestions are not affected by search synonyms defined for the community.

     

     

    How to get the most relevant search results: 5 parameters that influence content and place search rankings

     

    Relevant search results are critical to the success of your community. Let’s explore what parameters impact the relevancy score for a piece of content and the rank it will get when you search for a specific search phrase.

    Several parameters impact the rank of a piece of content and can provide a boost to get it to the top of the search results. This gets a bit technical, but provides a comprehensive overview of how Jive search “thinks”.

     

    Note that stop words are removed from queries before these fields are further processed.

     

    1. Similarity score

    When searching for a phrase, the system looks at each word in the phrase and checks the match type and place of match. Each match type/place has a boost score. The boost score is normalized by the number of times the term appears in the given content: the more often it appears, the higher the score. It is also normalized by the number of times the term appears in the search index in general, but here the effect is reversed: the more common a term is, the less impact it has on the rank.

     

    Match types reflect how well your search query matches the results:

    • Raw: exact match of the search term
    • Analyzed: matches created by the language analyzer that use stemming, looking for the root of the word. For example, focusing will find focus, focused, etc.
    • Edgengram: for wildcard (*) search matches and for search-as-you-type queries

     

    Place of match is exactly what it sounds like: where in content the match was discovered.

    • Subject: title field
    • Body: content
    • Tags: tags added to the content

     

    The combination of these parameters determines the content’s similarity to the search query and boosts the more similar results accordingly. The higher the boost score, the more relevant the result.

     

    Match place / Match type    Raw     Analyzed    Edgengram
    subject                     1.0     1.0         1.0
    body                        0.1     0.1         0.1
    tag                         0.5     0.5
    keyphrases                  1.5
    entities.text *             1.5
    videoCaption *              1.25
    videoCelebrities *          1.25
    videoContent *              1.25
    videoLabels *               1.25
    videoText *                 1.25

    * The boost parameter is fixed and cannot be changed.
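
    To show how a field boost, term frequency, and term rarity can combine, here is a toy Python sketch in the spirit of TF-IDF. It is not the formula Elasticsearch or Jive actually uses; the functions, the logarithmic rarity factor, and the sample index are all assumptions made for illustration. Only the subject and body boosts for Raw matches come from the table above.

```python
import math

# Raw-match boosts for subject and body, taken from the table above.
FIELD_BOOST = {"subject": 1.0, "body": 0.1}


def tokens(text: str) -> list[str]:
    return text.lower().split()


def similarity(term: str, doc: dict, index: list[dict]) -> float:
    """Toy score: field boost x in-field frequency x rarity across the index."""
    term = term.lower()
    containing = sum(
        1 for d in index if any(term in tokens(d[field]) for field in FIELD_BOOST)
    )
    if containing == 0:
        return 0.0
    rarity = math.log(1 + len(index) / containing)  # common terms weigh less
    score = 0.0
    for field, boost in FIELD_BOOST.items():
        words = tokens(doc[field])
        if words:
            frequency = words.count(term) / len(words)  # shorter fields score higher
            score += boost * frequency * rarity
    return score


# Hypothetical three-document index used only for this example.
index = [
    {"subject": "Search tips", "body": "How to use filters"},
    {"subject": "Release notes", "body": "Improved search and search suggestions"},
    {"subject": "Team lunch", "body": "Pizza on Friday"},
]
for doc in index:
    print(doc["subject"], round(similarity("search", doc, index), 3))
```

    In this toy index, the document with "search" in its short subject outranks the one that mentions it twice in the body, because the subject boost is ten times the body boost.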

     

    2. Proximity score

    The proximity score checks how closely the phrase the user searches for matches what appears in the content. When a user searches for a phrase built from several words, this phrase may appear exactly the same way in the content or it may appear in the content in a slightly different way. For example, content with the term "product one-pager brochure" is an approximate match when searching for "product brochure".

     

    This proximity is also used to boost more relevant results. Exact matches get boosted more than proximity matches.

    • Exact match: when all the search terms appear in the content next to each other.
    • Proximity match: when all the search terms appear less than 3 words apart from each other.

    The exact and proximity matches are boosted by using only the Raw sub-fields for tags, and only the Analyzed sub-fields for other search fields.

     

    Match place    Proximity boost    Exact match boost
    subject        0.5 (default)      1.6 (default)
    body           0.5 (default)      1.0 (default)
    tag            0.1 (default)      1.0 (default)
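
    The difference between an exact match and a proximity match can be sketched in Python as follows. This is a simplified illustration: it assumes the terms must appear in query order, reads "less than 3 words apart" as fewer than three words in between, splits on whitespace only, and ignores the Raw/Analyzed sub-field distinction. The function names are made up for the example; the boost values are the defaults from the table above.

```python
# (proximity boost, exact match boost) defaults from the table above.
BOOSTS = {"subject": (0.5, 1.6), "body": (0.5, 1.0), "tag": (0.1, 1.0)}


def classify_match(query: str, text: str, max_gap: int = 3) -> str:
    """Return "exact", "proximity", or "none" for a query against a field value."""
    terms = query.lower().split()
    words = text.lower().split()
    if not terms:
        return "none"
    best = "none"
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        position, adjacent, found = start, True, True
        for term in terms[1:]:
            # Look for the next term among the following max_gap words.
            window = words[position + 1 : position + 1 + max_gap]
            if term not in window:
                found = False
                break
            offset = window.index(term)
            if offset != 0:
                adjacent = False  # at least one other word sits in between
            position += 1 + offset
        if found and adjacent:
            return "exact"
        if found:
            best = "proximity"
    return best


def phrase_boost(query: str, field: str, text: str) -> float:
    """Look up the default boost for the kind of match found in the field."""
    proximity, exact = BOOSTS[field]
    return {"exact": exact, "proximity": proximity, "none": 0.0}[classify_match(query, text)]


print(classify_match("product brochure", "Product one-pager brochure"))
print(phrase_boost("product brochure", "subject", "New product brochure"))
```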

     

    We also look at the frequency. The score has a lot to do with how many occurrences of the word you're searching for exist in the field. For instance, if you write a 20,000 word essay that makes a single reference to the movie "Finding Nemo" somewhere in the document and you have another document in the system (or a status update or a blog post or a thread, etc.) that's only 50 words and includes "Finding Nemo", the system assumes that the latter is more relevant to a query for "nemo".

     

    3. Outcome type

    Content in Jive can be marked with structured outcomes. These outcomes impact the score of that content in the search results.

     

    Outcome     Boost            Outcome     Boost
    Final       1.4 (default)    Official    2.0 (default)
    Outdated    0.1 (default)    Default     1.0 (default)

     

    The content’s rank score is multiplied by the boost in the table based on its structured outcome. A higher boost will result in content being ranked higher in the search results and vice versa, so the 0.1 score for outdated documents significantly reduces their rank.

     

    4. Object type

    Similarly, content is boosted in search results based on its type. Documents and blogs are ranked higher as these are usually used for more comprehensive content that may be more relevant for the searching user.

     

    Object       Boost    Object        Boost    Object           Boost
    Document     1.4      Discussion    1.0      Idea             1.0
    Blog         1.4      Question      1.0      Video            1.0
    Blog post    1.4      Poll          1.0      Status update    1.0

     

    5. Recency

    Recency (also known as time decay) lowers the score for older content. The impact of content age on the score can be seen this way:

    [Figure: Recency score vs. content item age]

     

    The recency score calculation is based on these parameters:

    • Drop speed: 50. This determines how fast the algorithm reduces the content score by age.
    • Max value: 4 weeks. All the new content from the last 4 weeks has the same score without decay.
    • Minimum score: set to 0.9. This makes the maximum score difference of a very old document and a just-created document 2x. It is set so that even the oldest relevant content can still be found, but fresh content retains precedence.

     

    Final Rank

    The final rank of a content item is based on a combination of all of these parameters. This is how it is calculated:

     

    Rank = (SimilarityScore + ProximityScore) * OutcomeType * ObjectType * Recency
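
    As a worked example of this formula, the Python sketch below plugs in the default outcome and object boosts from the tables above. The similarity, proximity, and recency inputs are made-up numbers for illustration, and the function name is hypothetical.

```python
# Default boosts from the outcome and object type tables above.
OUTCOME_BOOST = {"Final": 1.4, "Official": 2.0, "Outdated": 0.1, "Default": 1.0}
OBJECT_BOOST = {
    "Document": 1.4, "Blog": 1.4, "Blog post": 1.4,
    "Discussion": 1.0, "Question": 1.0, "Poll": 1.0,
    "Idea": 1.0, "Video": 1.0, "Status update": 1.0,
}


def final_rank(similarity: float, proximity: float, outcome: str = "Default",
               object_type: str = "Discussion", recency: float = 1.0) -> float:
    """Rank = (SimilarityScore + ProximityScore) * OutcomeType * ObjectType * Recency."""
    return (similarity + proximity) * OUTCOME_BOOST[outcome] * OBJECT_BOOST[object_type] * recency


# Same hypothetical text scores, different outcomes on a document.
official = final_rank(2.0, 1.0, outcome="Official", object_type="Document")
outdated = final_rank(2.0, 1.0, outcome="Outdated", object_type="Document")
print(round(official, 2), round(outdated, 2))
```

    With identical text scores, marking the document Official instead of Outdated changes its rank by a factor of 20 (2.0 versus 0.1).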

     

    The final rank numbers determine what will be displayed in your search results and in what order, with the objective of surfacing the most relevant content first.

     

    Default parameter configuration

    Here are the Default Cloud Search Index Parameters currently being used. They will help you understand which fields, content types, and boosts are weighted higher and how they determine the search relevancy ranking.

     

     

    Admin tip: Using synonyms to improve search

     

    Cloud Search supports synonyms for content and user searches. (User search synonyms are fully supported starting with Cloud Search 4.x.)

     

    You can define common synonyms for your particular system, like docs and documentation (for content) or Richard, Richy, Dick, and Ricky (for user names). The search word must exactly match the one in the dictionary (synonyms list) you defined. This means that if a user types a wildcard (*) at the end of the word they search for, the search will not include synonyms for this word, unless you add the word ending with * to the synonyms list.
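
    The exact-match rule for synonyms can be illustrated with a small Python sketch. The data structure and function names are assumptions made for this example; they do not reflect how Jive stores synonyms internally.

```python
def build_synonym_index(groups: list[list[str]]) -> dict[str, set[str]]:
    """Map every word in a synonym group to the full group (lowercased)."""
    index: dict[str, set[str]] = {}
    for group in groups:
        members = {word.lower() for word in group}
        for word in members:
            index.setdefault(word, set()).update(members)
    return index


def expand(term: str, index: dict[str, set[str]]) -> set[str]:
    """Return the term plus its synonyms; the lookup requires an exact match."""
    term = term.lower()
    return set(index.get(term, {term}))


synonyms = build_synonym_index([
    ["docs", "documentation"],
    ["Richard", "Richy", "Dick", "Ricky"],
])
print(sorted(expand("docs", synonyms)))
print(sorted(expand("doc*", synonyms)))
```

    Here "docs" expands to both words, while "doc*" is left unchanged because the wildcard form is not in the list.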

     

    Please note that adding synonyms will not affect the results of suggestions; only search results are altered when synonyms are defined. This happens because suggestions are based only on the originally typed search prefix.

     

    To add synonyms, go to the Admin Console > Content > Content Synonyms or User Synonyms, enter a pair of words separated by a comma in the Synonyms box, then click Add Synonym.

     

     

    Tips for tweaking your search

    Finally, here are some tips and tricks for searching more effectively in Jive.

    • Add wildcards (*) to your search.
      • Note that wildcards can't be used as the first character of a search. This means that you can't search for all users with a particular email domain. For example, a search for '*@jivesoftware.com' will return no results (unless you have a user who has the exact string '@jivesoftware.com' as part of their profile, such as their username.)
    • Use filters to narrow down the default search range.
      • You can choose a different search range (other than the default “all”) if you're only looking for more recent items, specific content types, or a particular author, for example. You can also narrow the search by outcome types.
    • Change the order of your search results.
      • The order of search results is set by the system according to relevance. You can change the order on the Advanced Search page by sorting by the last modification date.

     

    * * * * *

     

    This should provide a better understanding of how Jive search works and how to make it work for you. You can check Jive documentation for more details: Managing search in the Jive Cloud Community Manager Help and Finding people, places, and content in the Jive Cloud User Help.

     

    Questions? Let us know in the comments.