Extension Datastore Profiler


Extension Basics

Title
Datastore Profiler
Name
ckanext-datastore-profiler
Type
Public extension
Description
CKAN extension that creates data profiling summaries for datastore resources, calculating descriptive statistics for integers, floats, dates, and strings.
CKAN versions
Download-Url (zip)
Last commit
2023-07-11 21:32:36
Url to repo
Category
Visualization & Analytics


Background Information

Description (long)

ckanext-datastore-profiler

This is a CKAN extension for profiling datastore resources, developed for Toronto Open Data.

What is the Profiler?

The profiler will:

  1. Summarize each attribute in datastore resources
  2. Assign classifications to each attribute
  3. Expose the summaries and classifications through the portal’s API (and possibly the frontend)

Descriptive Statistics for Integers and Floats

  • mean
  • min
  • max
  • median
  • count of distinct values
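As a rough illustration of the numeric statistics listed above, the following is a minimal sketch using only the Python standard library. It is not the extension's own code: the function name `profile_numbers`, the result keys, and the choice to skip `None` values are all assumptions made for this example.

```python
import statistics


def profile_numbers(values):
    """Summarize a numeric column (hypothetical helper, not the extension's API).

    Assumption: missing values arrive as None and are skipped.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {}
    return {
        "mean": statistics.mean(present),
        "min": min(present),
        "max": max(present),
        "median": statistics.median(present),
        "distinct_count": len(set(present)),
    }


# Example column with one missing value and one repeated value.
summary = profile_numbers([3, 1, 4, 1, 5, None])
print(summary)
```

For the sample column, the repeated `1` counts once toward the distinct count and the `None` entry is ignored entirely.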

Descriptive Statistics for Dates and Datetimes

  • earliest date
  • latest date
  • earliest time
  • latest time
  • count of distinct values for date
  • count of distinct values for date month
  • count of distinct values for date year
  • count of distinct values for time
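The date and datetime statistics above could be computed along these lines. This is a sketch under stated assumptions, not the extension's implementation: `profile_datetimes` and its result keys are hypothetical, values are assumed to be `datetime` objects (or `None`), and "date month" is interpreted here as the calendar month number (1 to 12) rather than a year-month pair.

```python
from datetime import datetime


def profile_datetimes(values):
    """Summarize a datetime column (hypothetical helper, not the extension's API).

    Assumption: missing values arrive as None and are skipped.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {}
    dates = [v.date() for v in present]
    times = [v.time() for v in present]
    return {
        "earliest_date": min(dates),
        "latest_date": max(dates),
        "earliest_time": min(times),
        "latest_time": max(times),
        "distinct_dates": len(set(dates)),
        # Assumption: "month" means the calendar month number, ignoring the year.
        "distinct_months": len({d.month for d in dates}),
        "distinct_years": len({d.year for d in dates}),
        "distinct_times": len(set(times)),
    }


stamps = [
    datetime(2022, 1, 15, 9, 30),
    datetime(2022, 1, 15, 17, 0),
    datetime(2023, 6, 1, 9, 30),
    None,
]
date_summary = profile_datetimes(stamps)
print(date_summary)
```

Splitting each value into its date and time parts first keeps the earliest/latest date and the earliest/latest time independent of each other, which matches how the list above treats them as separate statistics.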

Descriptive Statistics for Strings

  • count of distinct values
  • count of distinct “words” (substrings separated by spaces, ignoring common words such as “this”, “that”, and “a”)
  • min and max string length
  • min and max word count per string
  • count of distinct “masks” (where every letter is turned into an “L” and every digit turned into a “D”)
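To make the string statistics above more concrete, especially the "mask" idea, here is a minimal sketch. It is illustrative only: `to_mask`, `profile_strings`, the result keys, and the small `STOPWORDS` set are assumptions for this example, and the extension may use a different stop-word list or tokenization. The sketch also assumes that per-string word counts include every space-separated token, and that only ASCII letters and digits are masked.

```python
import re

# Assumption: a tiny illustrative stop-word list; the real list is not shown in the source.
STOPWORDS = {"this", "that", "a", "an", "the"}


def to_mask(text):
    """Replace every ASCII letter with 'L' and every ASCII digit with 'D'."""
    masked = re.sub(r"[A-Za-z]", "L", text)
    return re.sub(r"[0-9]", "D", masked)


def profile_strings(values):
    """Summarize a string column (hypothetical helper, not the extension's API)."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    word_lists = [v.split() for v in present]
    # Distinct words are compared case-insensitively, with stop words dropped.
    words = {
        w.lower() for ws in word_lists for w in ws if w.lower() not in STOPWORDS
    }
    return {
        "distinct_count": len(set(present)),
        "distinct_words": len(words),
        "min_length": min(len(v) for v in present),
        "max_length": max(len(v) for v in present),
        "min_word_count": min(len(ws) for ws in word_lists),
        "max_word_count": max(len(ws) for ws in word_lists),
        "distinct_masks": len({to_mask(v) for v in present}),
    }


column = ["M5V 2T6", "M4C 1B5", "this is a test", "M5V 2T6"]
string_summary = profile_strings(column)
print(to_mask("M5V 2T6"))
print(string_summary)
```

In the sample column, both postal codes collapse to the same mask, so a low distinct-mask count hints that a column follows a consistent format even when its distinct-value count is high.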

Requirements

Compatibility with core CKAN versions:

CKAN version      Compatible?
2.6 and earlier   not tested
2.7               not tested
2.8               not tested
2.9               yes

Installation

To install ckanext-datastore-profiler:

  1. Activate your CKAN virtual environment:

     . /usr/lib/ckan/default/bin/activate

  2. Clone the source and install:

     git clone https://github.com/open-data-toronto/ckanext-datastore-profiler.git
     cd ckanext-datastore-profiler
     pip install -e .
     pip install -r requirements.txt

  3. Add datastore-profiler to the ckan.plugins setting in your CKAN config file.

  4. Restart CKAN.

Tests

To run the tests:

pytest --ckan-ini=test.ini

Contact

  • Reach out to opendata@toronto.ca
  • Reach out on the Civic Tech Toronto Slack

License

AGPL

Version
Version release date
(not set)
Contact name
Open Data Toronto
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Creates data profiling summaries with descriptive statistics for different data types.

Plugins to configure (ckan.ini)
datastore-profiler
CKAN Settings (ckan.ini)
DB migration to be executed
(not set)