ckanext-fulltext - Fulltext searching plugin for CKAN
This extension provides plugins that allow CKAN to store and search full text data. It uses a new Solr field
to do a full text search and then display the matches in CKAN.
The full text field lets users find datasets that contain the text they are looking for, even when that text is not
part of any CKAN field. The full text is stored separately from the other CKAN package data, both in
Solr and in the PostgreSQL database.
Additionally, you can parse the full text of documents using a JCC wrapper for Apache Tika.
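As a rough illustration of searching the full text field, the sketch below builds a URL for CKAN's standard package_search action that restricts the query to a Solr field. It assumes the field is named fulltext (as configured in the installation steps) and that field-qualified queries are passed through to Solr unchanged; the helper name build_search_url and the site URL are hypothetical and not part of this extension.

```python
from urllib.parse import urlencode


def build_search_url(site_url, term, rows=10):
    """Build a package_search URL that queries only the 'fulltext' Solr field.

    Hypothetical helper for illustration; it only builds the URL and does
    not send a request.
    """
    # "fulltext:<term>" is Solr field-query syntax; the field name is assumed
    # to match the one added to schema.xml.
    params = {"q": "fulltext:{}".format(term), "rows": rows}
    return "{}/api/3/action/package_search?{}".format(
        site_url.rstrip("/"), urlencode(params)
    )


if __name__ == "__main__":
    # Example site URL is an assumption; replace it with your CKAN instance.
    print(build_search_url("http://localhost:5000", "budget"))
```

Because the schema also copies fulltext into the catch-all text field, a plain query without the field prefix should match full text content as well.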
Plugin Installation
Install the extension into your Python environment:
(pyenv) $ pip install -e git+https://github.com/transparenzportalhamburg/ckanext-fulltext.git#egg=ckanext-fulltext
Your CKAN configuration ini file should contain the following plugin:
ckan.plugins = inforeg_solr_search
Add a new field to your conf/schema.xml that acts like a catch-all field for the content of all resources:
<field name="fulltext" type="textgen" indexed="true" stored="true"/>
...
<copyField source="fulltext" dest="text"/>
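A quick way to confirm the schema change is to parse the file and look for both elements. The following is a minimal sketch, not part of the extension: the function name check_fulltext_schema and the sample schema string are made up for illustration, and only the field and copyField definitions shown above are taken from this README.

```python
import xml.etree.ElementTree as ET


def check_fulltext_schema(schema_xml):
    """Return (has_field, has_copy_field) for a Solr schema given as a string."""
    root = ET.fromstring(schema_xml)
    # Look for <field name="fulltext" .../> anywhere in the schema.
    has_field = any(f.get("name") == "fulltext" for f in root.iter("field"))
    # Look for <copyField source="fulltext" dest="text"/>.
    has_copy = any(
        c.get("source") == "fulltext" and c.get("dest") == "text"
        for c in root.iter("copyField")
    )
    return has_field, has_copy


# Trimmed-down sample schema, for illustration only.
SAMPLE = """<schema name="ckan" version="1.5">
  <fields>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
    <field name="fulltext" type="textgen" indexed="true" stored="true"/>
  </fields>
  <copyField source="fulltext" dest="text"/>
</schema>"""

if __name__ == "__main__":
    print(check_fulltext_schema(SAMPLE))
```

To check a real file, read conf/schema.xml and pass its contents to the function. Remember that Solr has to be restarted before schema changes take effect.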
Create a fulltext table:
paster --plugin=ckanext-fulltext fulltext init_fulltext_table --config=/etc/ckan/default/development.ini
Tika-Wrapper Installation (for Ubuntu)
In order to use the tikaparser you have to install jcc (http://lucene.apache.org/jcc/).
JCC requires a recent C++ compiler and a Java JDK 1.7+.
If you don't have these installed, run:
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install openjdk-7-jdk
After that you should be able to install jcc:
pip install jcc
Now install the tikaparser:
cd /path/to/ckanext-fulltext/ckanext/fulltext/parser
python setup.py build
python setup.py install
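If the jcc or tikaparser build fails, it can help to confirm that the compiler and JDK are actually on the PATH. The sketch below is an assumption-laden helper, not part of the extension: the name missing_tools is hypothetical, and g++ and javac are assumed to be the executables provided by the build-essential and openjdk-7-jdk packages.

```python
import shutil


def missing_tools(tools=("g++", "javac")):
    """Return the names of the given executables that are not found on PATH."""
    # shutil.which returns None when an executable cannot be located.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build prerequisites: " + ", ".join(missing))
    else:
        print("C++ compiler and JDK found.")
```

This only checks that the tools exist; it does not verify their versions.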
API usage
Once you’ve downloaded a full text online resource that you want to