.. _searches: ======== Searches ======== A search is composed of a query, and various parameters to control which part of the set of matching documents to return, which fields to return in the matching documents, and which additional information about the matching documents to calculate and return. :: SEARCH = { "query": QUERY, "from": , "fromdoc": FROMDOC, "size": , "check_at_least": , "info": [ INFO ], "order_by": ORDER_BY, "display": , "verbose": , } Basic queries ============= Matching everything, or nothing ------------------------------- These seem a little pointless on first sight, but are useful placeholders for slotting into a standard query structure, and occasionally have other uses. In many situations, these queries will be optimised away when performing the search. A query which matches all documents:: QUERY = { "matchall": true } A query which matches no documents:: QUERY = { "matchnothing": true } A search for a value in a particular field ------------------------------------------ :: QUERY = { "field": [ , , ] } Various types of search are possible: - "is": searches for documents in which a value stored in the field is equal to the value to search for. This type is available for "exact", "id" and "category" field types. The value to search for must be an array of values, each of which is either a string or an integer (in the range 0..2^64-1). - "is_descendant": searches for documents in which a value stored in the field is a descendant of a value specified in the search. This type is available for "category" field types. The value to search for must be an array of values, each of which is either a string or an integer (in the range 0..2^64-1). - "is_or_is_descendant": searches for documents in which a value stored in the field is a value specified in the search, or is a descendant of a value specified in the search. This type is available for "category" field types. The value to search for must be an array of values, each of which is either a string or an integer (in the range 0..2^64-1). This is produces an equivalent query to combining an "is" search with an "is_descendant" search using the "or" operator (though this version may be slightly faster to parse). - "range": searches for documents in which a stored value is in a given range. This type is currently available only for "double", "date" and "timestamp" field types. - "distscore": searches for documents within a given range of a center point, and returns a score which increases the closer a document is to that point. The value to search for must be an object holding the following parameters (or, at least, those parameters which are marked as required): - "center": Required. The coordinate of the center point for the query; expressed either as a (longitude, latitude) pair, or as an object with "lon" and "lat" properties. - "max_range": Optional. The maximum distance, in meters, that a document's coordinate may be from the center for that document to match the query. Defaults to an unlimited distance. - "text": searches for a piece of text in a text field. The value to search for may be a single string, or an object holding the following parameters: - "text": - "op": - "window": - "parse": parses a query, and searches for the query in a text field. The value to search for may be a single string, or an object holding the following parameters: - "text": - "op": - "exists": used on the meta field (by default, named `_meta`) to search for documents in which a field exists. The value to search must be an array of values, each of which is either a fieldname to search for existence of, or "null" to search for existence of any field. Note that the ID and type special fields are excluded from the meta field, so it is not possible to search for their existence. - "nonempty": used on the meta field (by default, named `_meta`) to search for documents in which a field exists and has a non-empty value. The value to search must be an array of values, each of which is either a fieldname to search for non-empty values in, or "null" to search for non-empty values in any field. Note that the ID and type special fields are excluded from the meta field, so it is not possible to search for nonempty values in them. - "empty": used on the meta field (by default, named `_meta`) to search for documents in which a field exists and has an empty value. The value to search must be an array of values, each of which is either a fieldname to search for empty values in, or "null" to search for empty values in any field. Note that the ID and type special fields are excluded from the meta field, so it is not possible to search for empty values in them. - "error": used on the meta field (by default, named `_meta`) to search for documents in which a field caused an error when processing. The value to search must be an array of values, each of which is either a fieldname to search for error values in, or "null" to search for error values in any field. Note that the ID and type special fields are excluded from the meta field, so it is not possible to search for error values in them. Filtering results from another query ------------------------------------ The results from the primary query are returned, filtered so that only those results which also match the filter are returned. :: QUERY = { "query": QUERY, "filter": QUERY } Combining Queries ================= :: QUERY = { "and": [QUERY, ...] } QUERY = { "or": [QUERY, ...] } QUERY = { "xor": [QUERY, ...] } QUERY = { "and_not": [QUERY, ...] } QUERY = { "and_maybe": [QUERY, ...] } Scale the weights returned by a query. ====================================== Weights of a query, at any point in the tree, can be scaled by multiplying them by a constant factor. :: QUERY = { "scale": { "query": QUERY, "factor": = 0. Required.> } } Getting additional information ============================== Get co-occurrence counts for words in matching documents -------------------------------------------------------- Warning - fairly slow (and O(L*L), where L is the average document length). Returns counts for each pair of terms seen, in decreasing order of cooccurrence. The count entries are of the form: [suffix1, suffix2, co-occurrence count] or [suffix1, suffix2, co-occurrence count, termfreq of suffix1, termfreq of suffix2] if get_termfreqs was true. :: INFO = { "cooccur": { "prefix": , "doc_limit": "result_limit": "get_termfreqs": "stopwords": } } Getting term occurrence counts for words in matching documents -------------------------------------------------------------- Warning - fairly slow. Returns counts for each term seen, in decreasing order of occurrence. The count entries are of the form: [suffix, occurrence count] or [suffix, occurrence count, termfreq] if get_termfreqs was true. :: INFO = { "occur": { "prefix": , "doc_limit": "result_limit": "get_termfreqs": "stopwords": } } Setting custom sort orders ========================== By default, search results are ordered by a relevance score, calculated using the BM25 weighting scheme. The internal RestPose architecture allows for considerable flexibility in how weights are calculated, and also allows for ordering by schemes other than relevance score (eg, by a field value). As yet, little of this flexibility is exposed in the API, but more is planned to be. Contact the author if you wish particular options to be made available. Currently, the sort order can be set using the ``order_by`` configuration. A sort order may be set using a field, as follows:: ORDER_BY = [ {"field": , "ascending": ASCENDING // Optional - defaults to true } ] ASCENDING = , Alternately, the sort order can be set to be relevance order (which is the default order):: ORDER_BY = [ {"score": "weight", "ascending": false // Optional - defaults to false. true is not allowed, but this is included for completeness. } ] At present, the list of sort orders may only contain exactly one item. Specifying a result set offset relative to a document ===================================================== Sometimes, it's useful to be able to get the section of a search result set containing a particular document, rather than just a section of a result set based on an offset. For example, imagine you're providing an interface which allows users to page through a resultset where documents are constantly being added to the underlying database, so additional documents may be added at any time. In this situation, it may be better to ask for the results which follow a particular document, rather than to ask for the next page of results by a fixed offset. This is particularly true if documents are being returned sorted by most-recent first; using this technique. Note that this interface can also be used to get the rank of a given document in a set of search results, without having to iterate through the search results on the client side. To get a set of search results based on the position of a document in the search results, you can add a "fromdoc" property to the search:: FROMDOC = { "type": "id": "from": "pagesize": } Note that if a "fromdoc" property is supplied for a search, the "from" property must be 0 (or absent). .. _search_results: Search results ============== Search results are returned as a JSON object, with the following properties. * ``from``: (int) The `from` value used when performing the search. * ``size_requested``: (int) The `size` value used when performing the search. * ``check_at_least``: (int) The `check_at_least` value used when performing the search. * ``total_docs``: (int) The total number of documents searched through. * ``matches_lower_bound``: (int) A lower bound on the number of matching documents. This will be precise if `check_at_least` was -1, or was high enough to ensure that all matches were checked. * ``matches_estimated``: (int) An estimate on the number of matching documents. This will be precise if `check_at_least` was -1, or was high enough to ensure that all matches were checked. * ``matches_upper_bound``: (int) An upper bound on the number of matching documents. This will be precise if `check_at_least` was -1, or was high enough to ensure that all matches were checked. * ``items``: (array) An array of results from searching. Each result is a object, keyed by fieldname, holding the stored fields for that result. The search may limit which fields are returned.