ckanext-datasolr
⚠️ This extension is no longer maintained. (Archived on Nov 26, 2019)
datasolr is a CKAN extension to use Solr to perform datastore queries.
Motivation
Motivated by poor PostgreSQL performance on very large datasets, datasolr provides an alternative API endpoint that performs searches using Solr. It is compatible with the datastore_search API endpoint and can be configured to replace it.
Use Case
datasolr aims to replace only the search component of the datastore. It is not a full replacement for the datastore; it is intended for large datasets that are either never updated or updated only at regular intervals.
- The data is still stored in (and the actual values fetched from) the PostgreSQL database
- datasolr does not currently provide automatic indexing
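Because datasolr is compatible with datastore_search, a client can keep issuing the same request it would send to the standard datastore. The sketch below only builds such a request through CKAN's action API and does not send it. The CKAN base URL, the resource id, and the filter field are placeholders, and `build_search_request` is a hypothetical helper, not part of the extension.

```python
import json
from urllib.request import Request


def build_search_request(ckan_url, resource_id, filters=None, limit=100):
    """Build (but do not send) a POST request for the datastore_search action."""
    payload = {"resource_id": resource_id, "limit": limit}
    if filters:
        payload["filters"] = filters
    return Request(
        url=ckan_url.rstrip("/") + "/api/3/action/datastore_search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values, for illustration only.
req = build_search_request(
    "http://localhost:5000", "my-resource-id", filters={"genus": "Helix"}
)
```

Passing `req` to `urllib.request.urlopen` would send it; whether Solr or PostgreSQL answers depends on how the extension is configured for that resource.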
Key Features
- Can replace or work alongside standard datastore_search
- Supports Solr stats on fields (min, max, sum, etc.)
- Special filter _solr_not_empty to ensure fields are not empty
- Resource-specific configuration support
- Extensible through IDataSolr interface
- Data Import Handler support for indexing from PostgreSQL
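For context on the stats feature, Solr itself computes field statistics through its stats component, which is switched on with `stats=true` and one `stats.field` parameter per field. The sketch below builds such a query URL directly against a Solr select handler. How datasolr exposes stats through its own API is not described here, so the helper name `build_stats_url` and the field names are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_stats_url(search_url, query, stats_fields):
    """Return a Solr select URL that requests statistics for the given fields."""
    params = [
        ("q", query),
        ("rows", 0),        # only the statistics are wanted, not the documents
        ("wt", "json"),
        ("stats", "true"),  # enable Solr's stats component
    ]
    # Solr expects one stats.field parameter per field.
    params += [("stats.field", name) for name in stats_fields]
    return search_url + "?" + urlencode(params)


# Field names are hypothetical.
url = build_stats_url(
    "http://localhost:8080/solr/collection2/select", "*:*", ["year", "length"]
)
```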
Differences from datastore_search
- Does not accept double quotes in field names
- Only accepts DISTINCT queries on a single field
- Does not support PostgreSQL full text query syntax
- Implements full text search on fields as wildcard search
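A client that switches between the two backends may want to check its parameters against these restrictions before sending a request. The following is a minimal client-side sketch, assuming the standard datastore_search parameters `fields` (a list of field names) and `distinct` (a boolean). `check_search_params` is a hypothetical helper and does not reflect how datasolr validates requests internally.

```python
def check_search_params(fields=None, distinct=False):
    """Return a list of reasons why datasolr might not accept these parameters."""
    fields = list(fields or [])
    problems = []
    for name in fields:
        # datasolr does not accept double quotes in field names.
        if '"' in name:
            problems.append("field name contains a double quote: %r" % name)
    # DISTINCT queries are only accepted on a single field.
    if distinct and len(fields) != 1:
        problems.append("distinct requires exactly one field, got %d" % len(fields))
    return problems


ok = check_search_params(fields=["genus"], distinct=True)
bad = check_search_params(fields=["genus", 'my "field"'], distinct=True)
```

An empty list means none of the checked restrictions were violated; the PostgreSQL full text query syntax restriction is not covered by this check.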
Configuration
# Replace datastore_search API calls (default: False)
datasolr.replace_datastore_search = False
# Fallback action when resource is not handled by datasolr
datasolr.fallback = ckanext.datastore.logic.action.datastore_search
# Solr search URL
datasolr.search_url = http://localhost:8080/solr/collection2/select
# Unique field in dataset (default: _id)
datasolr.id_field = _id
# Solr schema field matching id_field
datasolr.solr_id_field = _id
# Field holding resource id for multi-dataset cores
datasolr.resource_id_field = resource_id
# Resource-specific configuration (prefix with resource.<resource_id>)
datasolr.resource.UUID.search_url = http://localhost:8080/solr/collection2/select
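To illustrate how the resource-specific keys relate to the global ones, here is a small sketch that looks up a setting for one resource and falls back to the global `datasolr.<key>` value. It assumes the configuration is available as a plain dictionary; `resolve_setting` is a hypothetical helper, and the extension's actual lookup logic may differ.

```python
def resolve_setting(config, resource_id, key, default=None):
    """Return the resource-specific value for key, else the global datasolr value."""
    specific = "datasolr.resource.{}.{}".format(resource_id, key)
    general = "datasolr.{}".format(key)
    if specific in config:
        return config[specific]
    return config.get(general, default)


# Example configuration; the resource id "abc-123" and collection3 are placeholders.
config = {
    "datasolr.search_url": "http://localhost:8080/solr/collection2/select",
    "datasolr.resource.abc-123.search_url": "http://localhost:8080/solr/collection3/select",
    "datasolr.id_field": "_id",
}

specific_url = resolve_setting(config, "abc-123", "search_url")
global_url = resolve_setting(config, "other-resource", "search_url")
```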
License
Developed by the Natural History Museum.