BibRank Record Sorter API
The Invenio BibRank Record Sorter can be called from within your Python
programs via a high-level API and a mid-level API.
1. High-level API
Description:
High-level access to the BibRank Record Sorter is provided
by the same function that the web interface calls when a user
submits a query with a rank method selected. Given the same
parameters, it should therefore behave exactly as the web
interface does.
There are three things to note: (i) When a search is done, the
search engine passes an intbitset containing all the records that
match the query. Since only records in the intbitset are ranked, you
must create an intbitset containing the records you want ranked and pass
it as a parameter to the function. (ii) Some rank methods, such as the
"Similar Records" function, may ignore the intbitset. (iii) If an error
occurs while ranking the records, the returned data is different.
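To illustrate note (i), the following sketch uses a plain Python set as a stand-in for an intbitset and a hypothetical dictionary of scores. Neither rank_within_hitset nor scores is part of Invenio; the sketch only shows that records outside the hitset never appear in the ranked output.

```python
def rank_within_hitset(scores, hitset):
    """Rank only the records present in hitset (illustrative, not Invenio code).

    scores: hypothetical mapping {recid: rank value}
    hitset: set of record IDs, standing in for an intbitset
    """
    # Keep only records that are in the hitset, then sort by ascending value.
    ranked = sorted(
        ((recid, value) for recid, value in scores.items() if recid in hitset),
        key=lambda pair: pair[1],
    )
    records = [recid for recid, _ in ranked]
    values = [value for _, value in ranked]
    return records, values


scores = {1: 40, 2: 10, 3: 100, 4: 0}
hitset = {1, 2, 4}  # record 3 did not match the query
records, values = rank_within_hitset(scores, hitset)
print(records, values)
```

Record 3 has the highest score but is absent from the result because it is not in the hitset.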
Signature:
def rank_records(rank_method_code, rank_limit_relevance,
                 hitset_global, pattern, verbose=0):
    """
    rank_method_code - 'jif', 'wrd' or other methods
    rank_limit_relevance - a number defining the relevance threshold
        used when ranking records; may be ignored by the rank method.
    hitset_global - search engine hits; if all records should be
        ranked, fill the intbitset with ones.
    pattern - search engine query or record ID; must be a
        list, e.g. ['CERN', 'fermilab'] or ['recid:12345']
    verbose - verbose level, 0-9; defines how much debug information
        should be shown

    output if successful:
        list of records - [123, 321, 12451, 123, 12, 4]
        list of rank values - ascending, same length as the list of
            records [0, 10, 20, 30, 40, 100]
        prefix - text to show before the rank value, '<--' hides rank
            value, defined in config file.
        postfix - text to show after the rank value, '-!>' hides rank
            value, defined in config file.
        verbose_output - the debug text depending on the verbose level.

    output if error:
        list of records - is None
        list of rank values - is None
        prefix - contains the headline of the error
        postfix - error message, or error box if an exception occurred.
        verbose_output - the debug text depending on the verbose level.
    """
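Because the returned data differs between success and error, a caller has to check for None before using the record list. The sketch below assumes the five outputs arrive as one 5-element tuple in the order documented above; unpack_rank_result is a hypothetical helper, and the sample tuples are hand-written rather than real BibRank output.

```python
def unpack_rank_result(result):
    """Split a rank_records-style result into (pairs, verbose_output).

    Assumes result is a 5-element sequence:
    (records, rank_values, prefix, postfix, verbose_output).
    Raises RuntimeError when the record list is None (the error case).
    """
    records, rank_values, prefix, postfix, verbose_output = result
    if records is None:
        # In the error case, prefix holds the headline and postfix the message.
        raise RuntimeError("%s: %s" % (prefix, postfix))
    return list(zip(records, rank_values)), verbose_output


# Hand-written sample results, not real BibRank output.
ok = ([123, 321, 12], [0, 10, 20], "(", ")", "")
failed = (None, None, "Ranking failed", "unknown rank method", "")

pairs, _ = unpack_rank_result(ok)
print(pairs)

try:
    unpack_rank_result(failed)
except RuntimeError as err:
    error_text = str(err)
    print(error_text)
```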
Examples:
>>> # import the function:
>>> from invenio.bibrank_record_sorter import rank_records
>>> # a_hitset is an intbitset holding the candidate record IDs, see note (i) above
>>> # rank the records in a_hitset containing the words 'higgs' and 'boson' using the method "wrd"
>>> rank_records('wrd', 0, a_hitset, ['higgs', 'boson'], 0)
>>> # find records similar to record 12345; the hitset is ignored here by 'similar records'
>>> rank_records('wrd', 0, a_hitset, ['recid:12345'], 0)
>>> # rank all records in a_hitset based on jif value
>>> rank_records('jif', 0, a_hitset, [], 0)
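The examples above pass either a list of words, a single 'recid:N' entry, or an empty list as the pattern. How rank_records tells these apart internally is not described here; the sketch below shows one way a caller could classify a pattern before choosing what to do with it. The helper split_pattern is hypothetical and not part of bibrank_record_sorter.

```python
def split_pattern(pattern):
    """Classify a pattern list as a record ID lookup or a word query.

    Returns ('recid', int) for a list like ['recid:12345'],
    otherwise ('words', list). Hypothetical helper for illustration.
    """
    if len(pattern) == 1 and pattern[0].startswith("recid:"):
        recid_text = pattern[0][len("recid:"):]
        if recid_text.isdigit():
            return "recid", int(recid_text)
    # Anything else, including an empty list, is treated as a word query.
    return "words", list(pattern)


print(split_pattern(["recid:12345"]))
print(split_pattern(["higgs", "boson"]))
print(split_pattern([]))
```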
2. Mid-level API
Description:
Using the mid-level API, you can call the various ranking methods
directly. These functions do not return data in a form the search
engine understands. Nor do they check beforehand that the function
called is the correct one for the given rank method code; they return
an error if the wrong code/function combination is used.
Signatures:
def combine_method(rank_method_code, pattern, hitset, rank_limit_relevance, verbose):
-This method calls each method mentioned in the config file and adds the results together.
def find_similar(rank_method_code, recID, hitset, rank_limit_relevance, verbose):
-This method finds records similar to the one given in the recID field. The recID field
must be an integer value. hitset is ignored. rank_method_code has to be 'wrd'.
def word_similarity(rank_method_code, lwords, hitset, rank_limit_relevance, verbose):
-This method ranks records based on the list of words in the lwords field. rank_method_code has to be
'wrd'. Only records in hitset are ranked.
def rank_by_method(rank_method_code, lwords, hitset, rank_limit_relevance, verbose):
-All other rank methods use this function together with data from the rnkMETHODDATA table
(a dictionary of {recid: (text, value)}) to rank the data. Only records in the hitset are ranked.
These mid-level API functions require that create_rnkmethod_cache() has been called first,
since it loads the config options they need.
All rank methods return data in the same form:
([[recid, value], [recid, value]], prefix, postfix, verbose_output)
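Since the mid-level functions return [recid, value] pairs rather than the two parallel lists of the high-level API, a caller may want to convert between the two. The following is a minimal sketch of such a conversion; to_parallel_lists is a hypothetical helper, the sample data is hand-written, and sorting by ascending value is an assumption taken from the high-level description.

```python
def to_parallel_lists(mid_level_result):
    """Convert mid-level output to (records, values, prefix, postfix, verbose_output).

    Assumes mid_level_result has the form
    ([[recid, value], ...], prefix, postfix, verbose_output).
    """
    pairs, prefix, postfix, verbose_output = mid_level_result
    # Order by ascending rank value, as the high-level API describes.
    ordered = sorted(pairs, key=lambda pair: pair[1])
    records = [recid for recid, _ in ordered]
    values = [value for _, value in ordered]
    return records, values, prefix, postfix, verbose_output


# Hand-written sample, not real BibRank output.
sample = ([[12, 30], [7, 10], [99, 20]], "(", ")", "")
converted = to_parallel_lists(sample)
print(converted)
```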
Examples:
>>> # import the functions:
>>> from invenio.bibrank_record_sorter import find_similar, rank_by_method, word_similarity, combine_method
>>> # find records similar to 12345, hitset must be full
>>> find_similar('wrd', 12345, hitset, 0, 0)
>>> # rank records according to a method called jif, using the single_tag...based method.
>>> # the list of words is ignored here; only the records in the hitset are used.
>>> rank_by_method('jif', ['higgs'], hitset, 0, 0)
>>> # rank records containing ['higgs', 'boson'] using word similarity ('wrd')
>>> word_similarity('wrd', ['higgs', 'boson'], hitset, 0, 0)
>>> # rank records using several methods; which methods to use is read from the config file.
>>> combine_method('cmb', ['higgs', 'boson'], hitset, 0, 0)
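The description of combine_method says only that the results of each configured method are added together. As a rough illustration of that idea, the sketch below sums rank values per record across several lists of [recid, value] pairs. The helper add_results and the sample data are hypothetical; any weighting or normalisation the real method may apply is not covered here.

```python
def add_results(results_per_method):
    """Sum rank values per record across several methods (illustrative only).

    results_per_method: list of [[recid, value], ...] lists, one per method.
    Returns [[recid, total], ...] sorted by ascending total, then by recid.
    """
    totals = {}
    for pairs in results_per_method:
        for recid, value in pairs:
            totals[recid] = totals.get(recid, 0) + value
    ordered = sorted(totals.items(), key=lambda item: (item[1], item[0]))
    return [[recid, total] for recid, total in ordered]


# Hand-written sample pairs for two hypothetical methods.
wrd_pairs = [[1, 10], [2, 30]]
jif_pairs = [[2, 5], [3, 20]]
combined = add_results([wrd_pairs, jif_pairs])
print(combined)
```

Record 2 appears in both inputs, so its values are added; records 1 and 3 keep the value from the single method that ranked them.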