ckanext-datastore-profiler
This is a CKAN extension for profiling datastore resources, developed for Toronto Open Data.
What is the Profiler?
The profiler will:
- Summarize each attribute in datastore resources
- Assign classifications to each attribute
- Expose the results through the portal’s API (and possibly its frontend)
Descriptive Statistics for Integers and Floats
- mean
- min
- max
- median
- count of distinct values
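This README does not show the extension's own code. The sketch below is a minimal pure-Python illustration of the numeric statistics listed above. It assumes a column's values have already been loaded into a Python list, with None standing in for SQL NULL. The function name profile_numeric is hypothetical. For large tables it may be preferable to compute the same aggregates inside the database instead.

```python
import statistics
from typing import Iterable, Optional, Union

Number = Union[int, float]


def profile_numeric(values: Iterable[Optional[Number]]) -> dict:
    """Hypothetical helper: summarize an integer or float column, skipping None (NULL)."""
    present = [v for v in values if v is not None]
    if not present:
        # An empty or all-null column has nothing to summarize.
        return {"mean": None, "min": None, "max": None, "median": None, "distinct": 0}
    return {
        "mean": statistics.mean(present),
        "min": min(present),
        "max": max(present),
        "median": statistics.median(present),
        "distinct": len(set(present)),  # count of distinct values
    }


print(profile_numeric([3, 1, 4, 1, 5, None]))
```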
Descriptive Statistics for Dates and Datetimes
- earliest date
- latest date
- earliest time
- latest time
- count of distinct dates
- count of distinct month values
- count of distinct year values
- count of distinct times
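The date and datetime statistics above can be illustrated with a similar pure-Python sketch. It uses two assumptions that are not taken from the extension itself. First, "month" and "year" counts are read as the number of distinct calendar month (1 to 12) and year components. Second, time-of-day statistics only apply to datetime values. The function name profile_dates is hypothetical.

```python
from datetime import date, datetime
from typing import Iterable, Optional, Union


def profile_dates(values: Iterable[Optional[Union[date, datetime]]]) -> dict:
    """Hypothetical helper: summarize a date/datetime column, skipping None (NULL)."""
    present = [v for v in values if v is not None]
    # datetime is a subclass of date, so check for datetime first and take its date part.
    dates = [v.date() if isinstance(v, datetime) else v for v in present]
    # Assumption: only datetime values carry a time-of-day component.
    times = [v.time() for v in present if isinstance(v, datetime)]
    return {
        "earliest_date": min(dates, default=None),
        "latest_date": max(dates, default=None),
        "earliest_time": min(times, default=None),
        "latest_time": max(times, default=None),
        "distinct_dates": len(set(dates)),
        # Assumption: month/year counts use the calendar components of each date.
        "distinct_months": len({d.month for d in dates}),
        "distinct_years": len({d.year for d in dates}),
        "distinct_times": len(set(times)),
    }


rows = [
    datetime(2021, 3, 5, 9, 30),
    datetime(2020, 12, 1, 17, 0),
    datetime(2021, 3, 5, 8, 15),
    None,
]
print(profile_dates(rows))
```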
Descriptive Statistics for Strings
- count of distinct values
- count of distinct “words” (substrings separated by spaces, ignoring common stop words such as “this”, “that” and “a”)
- minimum and maximum string length
- minimum and maximum word count per string
- count of distinct “masks” (each string with every letter replaced by “L” and every digit by “D”)
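The string statistics above, including masks, can be sketched the same way. The sketch makes several assumptions that the README does not specify:

- STOP_WORDS is a small hypothetical stop-word list, not the extension's own.
- Words are split on whitespace and lowercased before comparison.
- Characters that are neither letters nor digits are kept unchanged in a mask.
- Word counts per string include stop words.

The names mask and profile_strings are also hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "this", "that", "of", "to", "in"}


def mask(value: str) -> str:
    """Replace each letter with "L" and each digit with "D"; keep other characters."""
    return "".join("L" if ch.isalpha() else "D" if ch.isdigit() else ch for ch in value)


def profile_strings(values: Iterable[Optional[str]]) -> dict:
    """Hypothetical helper: summarize a text column, skipping None (NULL)."""
    present = [v for v in values if v is not None]
    word_lists = [v.split() for v in present]  # split on runs of whitespace
    # Assumption: words are compared case-insensitively and stop words are dropped.
    words = {w.lower() for ws in word_lists for w in ws if w.lower() not in STOP_WORDS}
    lengths = [len(v) for v in present]
    counts = [len(ws) for ws in word_lists]  # all words, stop words included
    return {
        "distinct_values": len(set(present)),
        "distinct_words": len(words),
        "min_length": min(lengths, default=None),
        "max_length": max(lengths, default=None),
        "min_words": min(counts, default=None),
        "max_words": max(counts, default=None),
        "distinct_masks": len({mask(v) for v in present}),
    }


print(profile_strings(["M5V 3L9", "M4C 1B5", "This is a test", "", None]))
```

Both postal codes in the example mask to "LDL DLD". A low count of distinct masks can therefore hint that a column follows a consistent format.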
Requirements
Compatibility with core CKAN versions:
| CKAN version | Compatible? |
| --- | --- |
| 2.6 and earlier | not tested |
| 2.7 | not tested |
| 2.8 | not tested |
| 2.9 | yes |
Installation
To install ckanext-datastore-profiler:
- Activate your CKAN virtual environment, for example:
. /usr/lib/ckan/default/bin/activate
- Clone the source and install:
git clone https://github.com/open-data-toronto/ckanext-datastore-profiler.git
cd ckanext-datastore-profiler
pip install -e .
pip install -r requirements.txt
- Add datastore-profiler to the ckan.plugins setting in your CKAN config file.
- Restart CKAN.
Tests
To run the tests:
pytest --ckan-ini=test.ini
Contact
- Reach out to opendata@toronto.ca
- Reach out on the Civic Tech Toronto Slack
License
AGPL