Extension Fulltext


Extension Basics

Title
Fulltext
Name
ckanext-fulltext
Type
Public extension
Description
The **ckanext-fulltext** extension enhances CKAN's search capabilities by enabling full-text indexing and searching of resources.
CKAN versions
Download-Url (zip)
Last commit
2 years ago (2023-08-31 20:10:40)
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

ckanext-fulltext - Fulltext searching plugin for CKAN

This extension provides plugins that allow CKAN to store and search full-text data. It adds a new Solr field that is used for full-text search, and then displays the matches in CKAN.

The full-text field lets users find datasets that contain the text they are looking for, even when that text is not part of any CKAN field. That means the full text is stored separately from the other CKAN package data, both in Solr and in the PostgreSQL database.
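Because the schema copies the `fulltext` field into Solr's catch-all `text` field, a plain dataset search should already match text extracted from resources. The following is a minimal sketch that only builds request URLs for CKAN's standard `package_search` action. It assumes a hypothetical portal at `ckan.example.org`, and it assumes that a field-qualified query such as `fulltext:(...)` reaches Solr unchanged; this page does not confirm how the `inforeg_solr_search` plugin handles queries.

```python
from urllib.parse import urlencode

# Hypothetical CKAN base URL (assumption); replace with your own portal.
CKAN_URL = "https://ckan.example.org"


def fulltext_search_url(term: str, field_qualified: bool = False, rows: int = 10) -> str:
    """Build a package_search URL for datasets whose resource text matches `term`.

    Assumes the `fulltext` Solr field is indexed and copied into `text`,
    so an unqualified query already covers resource content.
    """
    # Grouping with (...) applies the field name to every word in `term`.
    q = f"fulltext:({term})" if field_qualified else term
    params = {"q": q, "rows": rows}
    return f"{CKAN_URL}/api/3/action/package_search?{urlencode(params)}"


if __name__ == "__main__":
    # Search all indexed fields, including the copied full text.
    print(fulltext_search_url("Hafen"))
    # Restrict matches to the fulltext field only (assumed to be supported).
    print(fulltext_search_url("Hafen", field_qualified=True))
```

The unqualified form is the safer default, since it relies only on the `copyField` entry shown in the installation steps.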

Additionally, you can parse the full text of documents using a JCC wrapper for Apache Tika.

Plugin Installation

  • Install the extension into your python environment:

    (pyenv) $ pip install -e git+https://github.com/transparenzportalhamburg/ckanext-fulltext.git#egg=ckanext-fulltext

  • Your CKAN configuration ini file should contain the following plugin:

    ckan.plugins = inforeg_solr_search
    
  • Add a new field to your conf/schema.xml that acts like a catch-all field for the content of all resources:

    <field name="fulltext" type="textgen" indexed="true" stored="true"/>
    ...
    <copyField source="fulltext" dest="text"/> 
    
  • Create a fulltext table:

    paster --plugin=ckanext-fulltext fulltext init_fulltext_table --config=/etc/ckan/default/development.ini
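After the steps above, a quick check that the plugin is enabled and the Solr schema entries are present can save a restart cycle. This is a minimal sketch using only the Python standard library. The function names are hypothetical helpers, not part of ckanext-fulltext, and the check assumes the plugin list lives under `[app:main]`, as in a standard CKAN ini file.

```python
import configparser
import xml.etree.ElementTree as ET

PLUGIN = "inforeg_solr_search"


def plugin_enabled(ini_text: str) -> bool:
    """Return True if ckan.plugins in [app:main] lists the fulltext plugin."""
    # interpolation=None keeps values such as %(here)s from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Assumption: plugins are configured in the [app:main] section.
    plugins = parser.get("app:main", "ckan.plugins", fallback="")
    # ckan.plugins is a whitespace-separated list, possibly spanning lines.
    return PLUGIN in plugins.split()


def schema_has_fulltext(schema_xml: str) -> bool:
    """Return True if schema.xml defines `fulltext` and copies it into `text`."""
    root = ET.fromstring(schema_xml)
    has_field = any(f.get("name") == "fulltext" for f in root.iter("field"))
    has_copy = any(
        c.get("source") == "fulltext" and c.get("dest") == "text"
        for c in root.iter("copyField")
    )
    return has_field and has_copy
```

For example, `plugin_enabled(Path("/etc/ckan/default/development.ini").read_text())` (with `from pathlib import Path`) checks the configuration file used in the `paster` command above; pass the contents of your `conf/schema.xml` to `schema_has_fulltext` in the same way.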
    

Tika-Wrapper Installation (for Ubuntu)

In order to use the tikaparser, you have to install JCC (http://lucene.apache.org/jcc/).
JCC requires a recent C++ compiler and Java JDK 1.7+.

If you don't have these installed, run:

sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install openjdk-7-jdk

After that, you should be able to install JCC:

 pip install jcc

Now install the tikaparser:

cd /path/to/ckanext-fulltext/ckanext/fulltext/parser
python setup.py build
python setup.py install
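If the build fails, the following sketch can help narrow down which prerequisite is missing. It reports whether a C++ compiler (`g++`, provided by build-essential), a JDK compiler (`javac`), and the `jcc` Python module are visible. It does not check the tikaparser itself, because this page does not give its import name; `tika_prerequisites` is a hypothetical helper.

```python
import importlib.util
import shutil


def tika_prerequisites() -> dict:
    """Report which JCC build prerequisites are visible on this machine."""
    return {
        # C++ compiler installed by build-essential
        "g++": shutil.which("g++") is not None,
        # javac ships with a JDK (e.g. openjdk-7-jdk), not with a plain JRE
        "javac": shutil.which("javac") is not None,
        # JCC installed via `pip install jcc`; find_spec does not import it
        "jcc": importlib.util.find_spec("jcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in tika_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```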

API usage

Once you’ve downloaded a full text online resource that you want to

Version
Version release date
(not set)
Contact name
Transparenzportal Hamburg
Contact email
(not set)
Contact Url
(not set)


Installation Guide

Configuration hints
Plugins to configure (ckan.ini)
inforeg_solr_search
CKAN Settings (ckan.ini)
DB migration to be executed
(not set)