ckanext-versioned-datastore
A CKAN extension providing a versioned datastore using MongoDB and Elasticsearch.
Overview
This plugin provides a complete replacement for CKAN’s datastore plugin and therefore shouldn’t be used in conjunction with it. Rather than storing data in PostgreSQL, resource data is stored in MongoDB and then made available to frontend APIs using Elasticsearch.
This design allows the plugin to:
- provide full versioning of resource records - records can be updated when new resource data is uploaded without preventing access to the old data
- expose advanced search features using Elasticsearch’s extensive feature set
- achieve fast search response times, particularly when compared to PostgreSQL, due to Elasticsearch’s search performance
- store large resources (millions of rows) and still provide high speed search responses
- store complex data, since both MongoDB and Elasticsearch are JSON-based and support nested objects and arrays
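To illustrate what "full versioning" means in the first point above, here is a toy sketch in plain Python. It is not how this plugin or Splitgill stores data; the `VersionedRecord` class, its methods, and the integer version numbers are all hypothetical, chosen only to show that an update adds a new version while older versions stay readable.

```python
from bisect import bisect_right


class VersionedRecord:
    """Toy record that keeps every version of its data (illustrative only)."""

    def __init__(self):
        self._versions = []  # version numbers, kept in increasing order
        self._data = []      # one data dict per version, parallel to _versions

    def update(self, version, data):
        # This sketch only accepts versions newer than the latest one stored.
        if self._versions and version <= self._versions[-1]:
            raise ValueError("versions must be added in increasing order")
        self._versions.append(version)
        self._data.append(dict(data))

    def at(self, version):
        # Return the data as it was at the given version, or None if the
        # record did not exist yet.
        index = bisect_right(self._versions, version)
        return self._data[index - 1] if index else None


record = VersionedRecord()
record.update(100, {"name": "first upload"})
record.update(200, {"name": "second upload"})

old = record.at(150)     # data from version 100 is still accessible
latest = record.at(250)  # data from version 200
```

Here a lookup at version 150 falls between the two uploads, so it resolves to the data stored at version 100.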
This plugin is built on Splitgill.
Installation
Installing from PyPI
pip install ckanext-versioned-datastore
Installing from source
- Clone the repository into the src folder:
cd $INSTALL_FOLDER/src
git clone https://github.com/NaturalHistoryMuseum/ckanext-versioned-datastore.git
- Activate the virtual env:
. $INSTALL_FOLDER/bin/activate
- Install via pip:
pip install $INSTALL_FOLDER/src/ckanext-versioned-datastore
Post-install setup
- Add ‘versioned_datastore’ to the list of plugins in your config file:
ckan.plugins = ... versioned_datastore
- Install lessc globally:
npm install -g "less@~4.1"
Other requirements
- MongoDB 4.x
- Elasticsearch 6.7.x (6.x is probably ok, but untested)
- CKAN’s job queue
Configuration
Required
| Name | Description | Example |
|---|---|---|
| ckanext.versioned_datastore.elasticsearch_hosts | A comma separated list of elasticsearch server hosts | 1.2.3.4,1.5.4.3,es.mydomain.local |
| ckanext.versioned_datastore.elasticsearch_port | The port for elasticsearch server hosts | 9200 |
| ckanext.versioned_datastore.elasticsearch_index_prefix | The prefix to use for index names in elasticsearch | nhm- |
| ckanext.versioned_datastore.mongo_host | The mongo server host | 10.54.24.10 |
| ckanext.versioned_datastore.mongo_port | The port to connect to the mongo host | 27017 |
| ckanext.versioned_datastore.mongo_database | The name of the mongo database | nhm |
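As a quick sanity check before starting CKAN, the required options above can be verified with a few lines of Python. This is a minimal sketch: the `missing_required_options` helper is hypothetical and not part of the plugin, and it assumes the config has already been loaded into a plain dict.

```python
# Required option names, as listed in the table above.
REQUIRED_OPTIONS = [
    "ckanext.versioned_datastore.elasticsearch_hosts",
    "ckanext.versioned_datastore.elasticsearch_port",
    "ckanext.versioned_datastore.elasticsearch_index_prefix",
    "ckanext.versioned_datastore.mongo_host",
    "ckanext.versioned_datastore.mongo_port",
    "ckanext.versioned_datastore.mongo_database",
]


def missing_required_options(config):
    """Return the required option names that are absent or empty in config."""
    return [name for name in REQUIRED_OPTIONS if not config.get(name)]


# Example values taken from the table above.
example_config = {
    "ckanext.versioned_datastore.elasticsearch_hosts": "1.2.3.4,1.5.4.3,es.mydomain.local",
    "ckanext.versioned_datastore.elasticsearch_port": "9200",
    "ckanext.versioned_datastore.elasticsearch_index_prefix": "nhm-",
    "ckanext.versioned_datastore.mongo_host": "10.54.24.10",
    "ckanext.versioned_datastore.mongo_port": "27017",
    "ckanext.versioned_datastore.mongo_database": "nhm",
}

missing = missing_required_options(example_config)
```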
Optional
| Name | Description | Example |
|---|---|---|
| ckanext.versioned_datastore.redis_host | The redis server host (enables slugging) | 14.1.214.50 |
| ckanext.versioned_datastore.redis_port | The port to connect to the redis host | 6379 |
| ckanext.versioned_datastore.redis_database | The redis database index for slugs | 1 |
| ckanext.versioned_datastore.slug_ttl | Slug time-to-live in days (default: 7) | 7 |
Usage
The plugin automatically detects resources on upload that can be added to the datastore. Accepted formats: CSV, TSV, XLS, XLSX.
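The plugin's actual detection logic is not shown here. As a rough illustration of the accepted-format rule only, a check along the following lines would match the list above; `ACCEPTED_FORMATS` and `is_ingestible` are hypothetical names, and comparing on the resource's format string is an assumption.

```python
# Formats listed above as accepted for ingestion.
ACCEPTED_FORMATS = {"csv", "tsv", "xls", "xlsx"}


def is_ingestible(resource_format):
    """Return True if the format string names one of the accepted formats."""
    # Treat a missing format as not ingestible; ignore case and whitespace.
    return (resource_format or "").strip().lower() in ACCEPTED_FORMATS
```

For example, `is_ingestible("CSV")` is true while `is_ingestible("PDF")` is false.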
Data is added in two steps:
1. Ingesting records into MongoDB
2. Indexing documents from MongoDB into Elasticsearch
Search is available via the datastore_search and datastore_search_raw actions.
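CKAN actions are normally reachable over HTTP through the action API at `/api/3/action/<action_name>`. The sketch below builds (but does not send) such a request using only the standard library. The site URL and resource ID are placeholders, `build_search_request` is a hypothetical helper, and passing `resource_id` as the only parameter is an assumption based on CKAN's standard datastore_search; check the plugin's action documentation for the parameters it actually accepts.

```python
import json
import urllib.request


def build_search_request(site_url, resource_id, action="datastore_search"):
    """Build a POST request for a CKAN action (the request is not sent)."""
    url = f"{site_url.rstrip('/')}/api/3/action/{action}"
    # Assumption: the action accepts a JSON body containing a resource_id.
    body = json.dumps({"resource_id": resource_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder site URL and resource ID.
request = build_search_request("https://ckan.example.org", "my-resource-id")

# To send it and decode the JSON response:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```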
Commands
ckan -c $CONFIG_FILE initdb - ensure tables needed by this plugin exist
ckan -c $CONFIG_FILE reindex $OPTIONAL_RESOURCE_ID - reindex a specific resource or all resources
Interfaces
IVersionedDatastore - general interface for modifying data dicts, searches, results, fields, and index documents
IVersionedDatastoreQuery - hooks for search queries
IVersionedDatastoreDownloads - hooks for downloads