ckanext-interlinking
A CKAN extension that allows resource interlinking using recline.
Lucene Servlet for interlinking
The Lucene servlet interlinks given Greek terms with reference datasets. The reference datasets must be provided in CSV format and indexed once before they can be searched. The user must also specify the field to be indexed.
Once a dataset is indexed, search queries can be run against it. For each query the servlet returns the 10 best results, ordered by score in descending order. Besides the matching term of the indexed field, each result carries its score and the values of the remaining fields.
Indexing datasets
The reference dataset must use commas as the delimiter, and its first row should carry the column names. The CSV file must be placed inside the 'WebContent/WEB-INF/data/' folder. Then an indexing POST request must be sent to the service (e.g. at http://localhost:8080/LuceneInterlinking/Interlinker) with the following JSON body:
{
"mode": "index",
"index": <index>,
"index_field": <index_field>,
"file": <file>
}
<index> is the name of the index which will be created. <index_field> is the name of the column to be indexed. <file> is the name of the file to be indexed.
After this step the dataset is available for search queries.
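As an illustration of the indexing step, the following Python sketch checks that the chosen column appears in the CSV header row and then builds the indexing POST request without sending it. The function names, the `Content-Type` header, and the sample values (`places`, `name`, `places.csv`) are assumptions made for this example; only the URL and the JSON keys come from the description above.

```python
import csv
import io
import json
import urllib.request

# Example URL from the text; adjust it to your own deployment.
SERVICE_URL = "http://localhost:8080/LuceneInterlinking/Interlinker"


def header_has_field(csv_text, index_field):
    """Return True if the first CSV row (the column names) contains index_field."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=",")
    header = next(reader, [])
    return index_field in [name.strip() for name in header]


def build_index_request(index, index_field, file_name, url=SERVICE_URL):
    """Build, but do not send, the indexing POST request."""
    body = {
        "mode": "index",
        "index": index,
        "index_field": index_field,
        "file": file_name,
    }
    data = json.dumps(body).encode("utf-8")
    # The Content-Type header is an assumption; the text does not specify it.
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_index_request("places", "name", "places.csv")
```

Passing `request` to `urllib.request.urlopen` would send it to a running servlet. The format of the servlet's reply is not described here, so the sketch stops before that point.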
Querying indexed datasets
There are two ways to query an indexed dataset: a stemmed term search query and a wildcard search query.
Stemmed term search query
This query stems the search term and uses the stem to search an index that holds stemmed values of the dataset's indexed field. The request is a POST sent to the same URL as the indexing request. The body of the request is as follows:
{
"mode": "search",
"term": <term>,
"reference": <reference>
}
<term> is the search term. <reference> is the name of the reference dataset to be queried, and it essentially refers to