BibSort's main goal is to make the sorting of search results faster. It does this by creating several sorting buckets (which hold recids) that are then loaded by the search_engine and cached.
The BibSort module is active when the search_engine uses the sorting buckets to sort the search results quickly. It can be deactivated by setting CFG_BIBSORT_BUCKETS=0 in the invenio.conf file. The module is also inactive if the bsrMETHOD table does not contain any data. The search engine looks into the BibSort data structures to check whether the method requested for sorting the search results exists. If it does not, the old-style sorting function (using the bibxxx tables) is used instead.
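The activation rules above can be summarised as a small decision function. This is only a sketch of the prose: the function name, its parameters, and the returned labels are hypothetical and are not taken from Invenio's code.

```python
def choose_sorting_backend(cfg_bibsort_buckets, bsr_methods, requested_method):
    """Return 'bibsort' when the sorting buckets can be used, else 'bibxxx'.

    Hypothetical helper mirroring the rules described in the text:
    - cfg_bibsort_buckets: value of CFG_BIBSORT_BUCKETS (0 means deactivated)
    - bsr_methods: names of the methods found in the bsrMETHOD table
    - requested_method: the sorting method the user asked for
    """
    if cfg_bibsort_buckets == 0:
        return "bibxxx"   # module deactivated in invenio.conf
    if not bsr_methods:
        return "bibxxx"   # bsrMETHOD holds no data, module not active
    if requested_method not in bsr_methods:
        return "bibxxx"   # unknown method, fall back to old-style sorting
    return "bibsort"


print(choose_sorting_backend(100, {"title", "year"}, "title"))
print(choose_sorting_backend(100, {"title", "year"}, "author"))
```

The bucket count of 100 and the method names in the usage lines are made-up values for illustration.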
Currently there is no web interface for configuring this module. All the configuration is done via a configuration file. The location of this file is:
CFG_ETCDIR/bibsort/bibsort.cfg
Each sorting method has a section in this config file, that looks like this:
[sort_field_1]
name = title
washer = sort_alphanumerically_remove_leading_articles
definition = FIELD: title

Each section of the file corresponds to a method.
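To make the section layout concrete, the sketch below reads such a section with Python's standard configparser and splits the definition into its kind (the part before the colon) and its comma-separated values. It is an illustration only: the `read_methods` helper and the shape of the returned dictionary are assumptions, not BibSort's own loader.

```python
import configparser

# Sample section copied from the guide.
SAMPLE = """
[sort_field_1]
name = title
washer = sort_alphanumerically_remove_leading_articles
definition = FIELD: title
"""


def read_methods(text):
    """Map each method name to its washer and parsed definition (hypothetical helper)."""
    # Only '=' is a delimiter, so the ':' inside the definition stays in the value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    methods = {}
    for section in parser.sections():
        options = parser[section]
        kind, _, argument = options["definition"].partition(":")
        methods[options["name"]] = {
            "washer": options.get("washer"),
            "kind": kind.strip(),
            "values": [v.strip() for v in argument.split(",") if v.strip()],
        }
    return methods


methods = read_methods(SAMPLE)
print(methods["title"])
```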
definition = RNK: method_name
means that the data should be taken from the rnkMETHODDATA table, based on method_name (which should correspond to a method in the rnkMETHOD table).
definition = MARC: marc1,marc2,marc3..
means that BibSort will look sequentially in all the listed MARC fields (bibxxx tables) and retrieve the data. The order of the MARC fields matters: for a given record, BibSort keeps the value from the first MARC field that has data.
definition = FIELD: foo
is similar to the MARC option, but for cases where logical fields are already defined in Invenio: BibSort looks into the logical field to retrieve the list of MARC fields that need to be queried.

For adding a new sorting method, one needs to add a new section to the bibsort.cfg file.
Once this is done, the config file needs to be loaded into the database:
$ ./bibsort --load-config
Similarly, for deleting a method, one needs to remove the corresponding section from the bibsort.cfg file and load the config into the database.
$ ./bibsort --dump-config
There are several command line instructions that can be used in order to update the BibSort data. For each instruction, one can define the methods and the records that the command should run on, like this:
$ ./bibsort --methods=method1,method2 --recids=4,7-17,23,1
If these options are left empty, the bibsort operation runs on all the defined methods, and either on all the records existing in the database or on all the updated records (depending on the operation: rebalancing or update-sorting, both described below).
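The --recids value mixes single ids and ranges. The following sketch shows one way such a specification could be expanded into a set of record ids. It assumes that ranges like 7-17 include both ends; the `parse_recids` helper is hypothetical and not part of bibsort.

```python
def parse_recids(spec):
    """Expand a spec such as '4,7-17,23,1' into a set of integers.

    Assumption: 'low-high' ranges are inclusive on both ends.
    """
    recids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            low, high = part.split("-", 1)
            recids.update(range(int(low), int(high) + 1))
        else:
            recids.add(int(part))
    return recids


recids = parse_recids("4,7-17,23,1")
print(sorted(recids))
```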
Rebalancing is the operation that redoes the sorting from scratch and recreates the sorting buckets. It should be performed once at the beginning and then perhaps once per day, to make sure that the database is in complete sync with the BibSort data structures and that the buckets are balanced. (Imagine a big upload of new records that all have the same publication year. All these records will be added to the same bucket for the 'publication date' method, making it much bigger than the others and slower for any data calculation, including intersecting it with the search engine output.) If you have a clear idea of how the data changes during one day, you can set up the rebalancing only for the methods that contain frequently updated data.
$ ./bibsort -R [--methods=method1,method2]
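The idea behind buckets and rebalancing can be illustrated with a toy model. The sketch below is not BibSort's implementation: it assumes buckets are equal-sized slices of the recids ordered by their sort value, and that sorting a search result means walking the buckets in order and keeping the recids that are in the hit set. All names and data are made up.

```python
def build_buckets(sort_values, nb_buckets):
    """Split recids, ordered by sort value, into at most nb_buckets equal slices."""
    ordered = sorted(sort_values, key=lambda recid: (sort_values[recid], recid))
    if not ordered:
        return []
    size = -(-len(ordered) // nb_buckets)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def sort_hits(buckets, hits):
    """Return the hits in sorted order by intersecting them with each bucket in turn."""
    result = []
    for bucket in buckets:
        result.extend(recid for recid in bucket if recid in hits)
    return result


# recid -> publication year (invented data)
years = {1: 2001, 2: 1999, 3: 2010, 4: 2005, 5: 1987, 6: 2010}
buckets = build_buckets(years, 3)
print(buckets)
print(sort_hits(buckets, {6, 2, 4}))

# A large upload of records from the same year: appending them to the last
# bucket leaves sizes 2, 2, 6, while rebuilding from scratch evens them out.
years.update({7: 2010, 8: 2010, 9: 2010, 10: 2010})
rebalanced = build_buckets(years, 3)
print([len(bucket) for bucket in rebalanced])
```

In this model, rebalancing is simply calling `build_buckets` again on the full data set, which is why it is more expensive than an incremental update but restores evenly sized buckets.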
Inserting, updating and deleting records in BibSort is done via the update-sorting operation. Ideally, this operation should run at short intervals and, for the benefit of the user, right after BibIndex, so that the updates become visible as soon as possible. If no methods are defined, it runs for all the methods defined in bibsort.cfg. However, if you have a good overview of the nature of the changes in the data over a period of time, update-sorting can run more frequently for some methods (like sort by year or sort by title) and less frequently for others (like sort by most cited, since the citation dictionaries are not updated so frequently).
Defining the recids makes update-sorting run only on those records. If no records are defined, bibsort grabs all the records modified since its last run. Since ranking methods grab all the data anyway, update-sorting for a ranking method is basically a rebalancing.
$ ./bibsort -S [--methods=method1,method2] [--recids=4,7-17,23,1]
Using the BibSort functionality will have the following impact on the 'Sort by' functionality of Invenio:
CFG_WEBSEARCH_NB_RECORDS_TO_SORT, currently set to 200.
jrec, so only up to those 'seen' by the user. This means that using of=id to retrieve the list of recids will not give the full list of recids in case these also need to be sorted.