datapusher-plus
DataPusher+ is a CKAN extension by dathere that replaces the legacy DataPusher web service. It uses qsv, a Rust-based CSV data-wrangling toolkit, for fast type inference and data analysis.
Key Features
- Guaranteed data type inference, achieved by scanning entire files rather than a sample
- PostgreSQL COPY for direct data loading (no API overhead)
- Jinja2 formula system for metadata inference/suggestion
- DRUF (Dataset Resource Upload First) workflow support
- Supports CSV, Excel, ODS, Shapefile, and GeoJSON formats
- Auto-indexing based on cardinality/dates
- PII screening with configurable regex patterns
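To illustrate why scanning entire files matters for type inference (the first feature above), here is a minimal Python sketch. It is not how DataPusher+ or qsv infers types; the `infer_type` helper and the sample data are hypothetical. It only shows that a type guessed from the first few rows can be contradicted by a later row.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of "integer", "float", "text" that fits every value.

    Hypothetical helper for illustration only; not part of DataPusher+ or qsv.
    """
    kind = "integer"
    for value in values:
        if value == "":
            continue  # treat empty cells as nulls that fit any type
        try:
            int(value)
            continue  # still compatible with the current guess
        except ValueError:
            pass
        try:
            float(value)
            kind = "float"  # widen from integer to float
        except ValueError:
            return "text"  # a single non-numeric cell forces text
    return kind


# Made-up data: the last row breaks the pattern seen in the first three.
SAMPLE_CSV = "id,amount\n1,10\n2,20\n3,30\n4,N/A\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
amounts = [row["amount"] for row in rows]

sampled_guess = infer_type(amounts[:3])  # looks only at the first 3 rows
full_guess = infer_type(amounts)         # looks at every row

print("sampled:", sampled_guess)
print("full scan:", full_guess)
```

A loader that trusted the sampled guess would create an integer column and then fail, or drop data, on the fourth row. A full scan avoids that class of error at the cost of reading the whole file.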
Requirements
- CKAN 2.10+
- qsv v4.0.0+
- ckanext-scheming
- PostgreSQL datastore
- Redis Queue (RQ)
Installation
git clone https://github.com/dathere/datapusher-plus.git
cd datapusher-plus
pip install -e .
Add datapusher_plus to the ckan.plugins setting in your CKAN configuration file (for example, /etc/ckan/default/ckan.ini).
Database Migration
ckan -c /etc/ckan/default/ckan.ini db upgrade -p datapusher_plus
Configuration
ckanext.datapusher_plus.qsv_bin = /usr/local/bin/qsv
ckanext.datapusher_plus.formats = csv xls xlsx tsv ods
ckanext.datapusher_plus.preview_rows = 1000
ckanext.datapusher_plus.auto_index_threshold = 3
ckanext.datapusher_plus.prefer_dmy = false
ckanext.datapusher_plus.enable_druf = false
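As a rough illustration of how these settings map to typed values, the following Python sketch parses them with the standard-library `configparser`. The `[app:main]` section header and the `load_settings` helper are assumptions made for the example and are not part of DataPusher+; CKAN reads these options through its own configuration machinery.

```python
import configparser

# The settings from this README, placed under an assumed [app:main] section.
INI_TEXT = """
[app:main]
ckanext.datapusher_plus.qsv_bin = /usr/local/bin/qsv
ckanext.datapusher_plus.formats = csv xls xlsx tsv ods
ckanext.datapusher_plus.preview_rows = 1000
ckanext.datapusher_plus.auto_index_threshold = 3
ckanext.datapusher_plus.prefer_dmy = false
ckanext.datapusher_plus.enable_druf = false
"""

PREFIX = "ckanext.datapusher_plus."


def load_settings(text):
    """Parse the DataPusher+ options into Python types (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["app:main"]
    return {
        "qsv_bin": section.get(PREFIX + "qsv_bin"),
        # Formats are a whitespace-separated list in the ini file.
        "formats": section.get(PREFIX + "formats").split(),
        "preview_rows": section.getint(PREFIX + "preview_rows"),
        "auto_index_threshold": section.getint(PREFIX + "auto_index_threshold"),
        "prefer_dmy": section.getboolean(PREFIX + "prefer_dmy"),
        "enable_druf": section.getboolean(PREFIX + "enable_druf"),
    }


settings = load_settings(INI_TEXT)
print(settings)
```

A check like this can be handy for catching typos, such as a non-numeric preview_rows value, before restarting CKAN.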
License
AGPL-3.0
Info
Version 3.0.0. By dathere / Joel Natividad. 50 contributors. Active development.