Extension Datastore Solr


Extension Basics

Title
Datastore Solr
Name
ckanext-datasolr
Type
Public extension
Description
CKAN extension to use Solr for datastore queries, providing faster searches on very large datasets by replacing the PostgreSQL-based datastore_search.
CKAN versions
Download-Url (zip)
Last commit
2019-11-26 12:39:43
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

ckanext-datasolr

⚠️ This extension is no longer maintained. (Archived on Nov 26, 2019)

datasolr is a CKAN extension to use Solr to perform datastore queries.

Motivation

Motivated by low PostgreSQL performance on very large datasets, datasolr provides an alternative API endpoint to perform searches using Solr. datasolr is compatible with and can be configured to replace the datastore_search API endpoint.

Use Case

datasolr replaces only the search component of the datastore. It is not a full replacement for the datastore: it is intended for large datasets that are either static or updated only at regular intervals.

  • The data is still stored in (and the actual values fetched from) the PostgreSQL database
  • datasolr does not currently provide automatic indexing
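Because the actual values are still fetched from PostgreSQL, a Solr query presumably yields only the ids of matching rows, which then have to be joined back to database rows without losing the order Solr returned them in. The sketch below illustrates that two-step pattern under exactly this assumption. `merge_solr_ids_with_rows` is a hypothetical helper, not part of datasolr, and the PostgreSQL fetch is represented by a plain list of dicts.

```python
from typing import Any, Dict, Iterable, List


def merge_solr_ids_with_rows(
    solr_ids: List[Any],
    pg_rows: Iterable[Dict[str, Any]],
    id_field: str = "_id",
) -> List[Dict[str, Any]]:
    """Return full records in the order Solr ranked them (hypothetical helper).

    solr_ids: ids returned by the Solr query, already sorted and paginated by Solr.
    pg_rows:  rows fetched from PostgreSQL, e.g. with WHERE _id = ANY(%s);
              the database does not guarantee any particular row order.
    """
    # Index the PostgreSQL rows by their unique id for constant-time lookup.
    by_id = {row[id_field]: row for row in pg_rows}
    # Keep Solr's ordering and skip ids missing from PostgreSQL, which can
    # happen when the Solr index is stale (there is no automatic indexing).
    return [by_id[i] for i in solr_ids if i in by_id]


if __name__ == "__main__":
    rows = [{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}, {"_id": 3, "name": "c"}]
    print([r["name"] for r in merge_solr_ids_with_rows([3, 1, 2], rows)])  # ['c', 'a', 'b']
```

Reordering in Python rather than with an SQL ORDER BY keeps the database query a simple id lookup, which is the part PostgreSQL handles well on very large tables.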

Key Features

  • Can replace or work alongside standard datastore_search
  • Supports Solr stats on fields (min, max, sum, etc.)
  • Special filter _solr_not_empty to restrict results to rows where the given fields are not empty
  • Resource-specific configuration support
  • Extensible through IDataSolr interface
  • Data Import Handler support for indexing from PostgreSQL
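Several of these features map naturally onto standard Solr request parameters: filter queries (`fq`), range queries such as `field:[* TO *]` for "field has a value", and the StatsComponent (`stats=true`, `stats.field=...`) for min, max, sum and similar figures. The sketch below shows one plausible translation of datastore-style filters into such parameters, including the `resource_id_field` restriction for multi-dataset cores. It is an assumption, not datasolr's actual code: `build_solr_params`, `solr_escape`, and the shape `{"_solr_not_empty": [field, ...]}` for the special filter are hypothetical, and the exact queries datasolr emits may differ.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

# Characters with special meaning in the Lucene/Solr standard query syntax.
SOLR_SPECIAL = set('\\+-!():^[]"{}~*?|&/ ')


def solr_escape(value: str) -> str:
    """Backslash-escape characters that have meaning in Lucene query syntax."""
    return "".join("\\" + c if c in SOLR_SPECIAL else c for c in value)


def build_solr_params(
    resource_id: str,
    filters: Dict[str, List[str]],
    stats_fields: Optional[List[str]] = None,
    resource_id_field: str = "resource_id",
    rows: int = 100,
) -> List[Tuple[str, str]]:
    """Translate datastore-style filters into Solr select parameters (hypothetical).

    A list of tuples is used because Solr accepts repeated fq / stats.field keys.
    Field names are assumed to be plain identifiers and are not escaped.
    """
    params: List[Tuple[str, str]] = [("q", "*:*"), ("rows", str(rows)), ("wt", "json")]
    # Multi-dataset core: restrict every query to the requested resource.
    params.append(("fq", f"{resource_id_field}:{solr_escape(resource_id)}"))
    for field, values in filters.items():
        if field == "_solr_not_empty":
            # Assumed mapping: each listed field must have some indexed value.
            for name in values:
                params.append(("fq", f"{name}:[* TO *]"))
        else:
            # Equality filter; several values for one field are OR-ed together.
            clause = " OR ".join(f"{field}:{solr_escape(v)}" for v in values)
            params.append(("fq", clause))
    if stats_fields:
        # Solr's StatsComponent returns min, max, sum, mean, etc. per field.
        params.append(("stats", "true"))
        params.extend(("stats.field", f) for f in stats_fields)
    return params


if __name__ == "__main__":
    p = build_solr_params(
        "abc-123", {"year": ["2010"], "_solr_not_empty": ["species"]}, stats_fields=["year"]
    )
    print(urlencode(p))
```

Putting filters in `fq` rather than `q` lets Solr cache each filter independently of scoring, which suits repeated queries against a large, rarely updated core.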

Differences with datastore_search

  • Does not accept double quotes in field names
  • Only accepts DISTINCT queries on a single field
  • Does not support PostgreSQL full text query syntax
  • Implements full text search on fields as wildcard search
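The differences above amount to a few input checks plus a rewrite of per-field full text queries into wildcard clauses. The sketch below illustrates that under stated assumptions: the function names are hypothetical, and splitting the search text on whitespace and AND-ing one `*word*` clause per word is an assumed behaviour, not something the datasolr documentation confirms.

```python
from typing import Dict, List

# Characters with special meaning in the Lucene/Solr standard query syntax.
_SPECIAL = set('\\+-!():^[]"{}~*?|&/ ')


def _escape(term: str) -> str:
    """Escape a term so only the wildcards we add ourselves are active."""
    return "".join("\\" + c if c in _SPECIAL else c for c in term)


def check_field_names(fields: List[str]) -> None:
    """Reject field names containing double quotes (not accepted by datasolr)."""
    for name in fields:
        if '"' in name:
            raise ValueError(f"double quotes are not allowed in field names: {name!r}")


def check_distinct(distinct: bool, fields: List[str]) -> None:
    """DISTINCT is only accepted when exactly one field is selected."""
    if distinct and len(fields) != 1:
        raise ValueError("distinct queries must select exactly one field")


def fields_to_wildcard_query(q: Dict[str, str]) -> str:
    """Turn a per-field full text query into AND-ed wildcard clauses (hypothetical).

    {"species": "canis lupus"} -> "species:*canis* AND species:*lupus*"
    """
    check_field_names(list(q))
    clauses = []
    for field, text in q.items():
        # Assumption: every whitespace-separated word must occur in the field.
        for word in text.split():
            clauses.append(f"{field}:*{_escape(word)}*")
    # With no usable terms, match all documents.
    return " AND ".join(clauses) if clauses else "*:*"


if __name__ == "__main__":
    print(fields_to_wildcard_query({"species": "canis lupus"}))
```

Leading wildcards such as `*canis*` are comparatively expensive in Solr, which is one reason this behaves differently from PostgreSQL full text search rather than reproducing its syntax.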

Configuration

# Replace datastore_search API calls (default: False)
datasolr.replace_datastore_search = False

# Fallback action when resource is not handled by datasolr
datasolr.fallback = ckanext.datastore.logic.action.datastore_search

# Solr search URL
datasolr.search_url = http://localhost:8080/solr/collection2/select

# Unique field in dataset (default: _id)
datasolr.id_field = _id

# Solr schema field matching id_field
datasolr.solr_id_field = _id

# Field holding resource id for multi-dataset cores
datasolr.resource_id_field = resource_id

# Resource-specific override: insert resource.<resource_id> after the datasolr. prefix of a setting (UUID below stands for the resource id)
datasolr.resource.UUID.search_url = http://localhost:8080/solr/collection2/select
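The settings above suggest a simple lookup order: a `datasolr.resource.<resource_id>.<key>` value overrides the global `datasolr.<key>`, and `datasolr.fallback` names the action used for resources datasolr does not handle when it replaces `datastore_search`. The sketch below models that order with a plain dict standing in for the parsed ckan.ini. The helper names are hypothetical, and `is_handled` is supplied by the caller because how datasolr decides which resources it handles is not described here.

```python
from typing import Dict, Optional

DEFAULT_ACTION = "ckanext.datastore.logic.action.datastore_search"


def resolve_setting(
    config: Dict[str, str],
    key: str,
    resource_id: Optional[str] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Look up datasolr.<key>, preferring datasolr.resource.<resource_id>.<key>."""
    if resource_id is not None:
        specific = config.get(f"datasolr.resource.{resource_id}.{key}")
        if specific is not None:
            return specific
    return config.get(f"datasolr.{key}", default)


def as_bool(value: Optional[str]) -> bool:
    """Interpret ini-style booleans such as "True", "yes", "on" or "1"."""
    return str(value).strip().lower() in {"true", "yes", "on", "1"}


def route_datastore_search(config: Dict[str, str], is_handled: bool) -> str:
    """Name the action that would serve a datastore_search call (illustrative)."""
    if not as_bool(config.get("datasolr.replace_datastore_search", "False")):
        # Default: datastore_search is left untouched.
        return DEFAULT_ACTION
    if is_handled:
        return "datasolr"
    # Resource not handled by datasolr: use the configured fallback action.
    return config.get("datasolr.fallback", DEFAULT_ACTION)


if __name__ == "__main__":
    cfg = {
        "datasolr.search_url": "http://localhost:8080/solr/collection2/select",
        "datasolr.resource.abc.search_url": "http://localhost:8080/solr/other/select",
    }
    print(resolve_setting(cfg, "search_url", "abc"))
    print(resolve_setting(cfg, "search_url", "xyz"))
```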

License

Developed by the Natural History Museum.

Version
1.0.5
Version release date
2018-02-28
Contact name
Natural History Museum
Contact email
(not set)
Contact Url
(not set)


Installation Guide

Configuration hints

Archived repository, no longer maintained. Requires a Solr installation and schema configuration. The data itself is still stored in PostgreSQL.

Plugins to configure (ckan.ini)
datasolr
CKAN Settings (ckan.ini)
# datasolr.replace_datastore_search = False
# datasolr.search_url = http://localhost:8080/solr/collection2/select
# datasolr.id_field = _id
# datasolr.resource_id_field = resource_id
DB migration to be executed
(not set)