Extension Datastore Profiler


Extension Basics


Background Info

Description (long)

Tests

ckanext-datastore-profiler

What is this?

This is a public repo for logic and documentation for Toronto’s Open Data Profiler.

What is the Profiler?

Eventually, it will:

  1. Summarize each attribute in Toronto Open Data datastore resources
  2. Assign classifications to each attribute
  3. Present those on the Portal’s API (and maybe frontend)

How can I contribute?

  • Reach out to opendata@toronto.ca
  • Reach out on the Civic Tech Toronto Slack

Descriptive Statistics for Integers and Floats:

  • mean
  • min
  • max
  • median
  • count of distinct values
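
The numeric summary above can be sketched with the Python standard library. This is only an illustration under assumptions, not the extension's actual code: the function name `profile_numbers`, the result keys, and the choice to skip `None` values are all hypothetical.

```python
from statistics import mean, median


def profile_numbers(values):
    """Summarize a numeric column (hypothetical helper, not the extension's API).

    Assumption: missing values arrive as None and are ignored.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to summarize: report empty statistics.
        return {"mean": None, "min": None, "max": None,
                "median": None, "distinct_count": 0}
    return {
        "mean": mean(present),
        "min": min(present),
        "max": max(present),
        "median": median(present),
        "distinct_count": len(set(present)),
    }


number_stats = profile_numbers([3, 1, 4, 1, 5, None])
print(number_stats)
```

For the sample column the minimum is 1, the maximum is 5, the median is 3, and there are 4 distinct values.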

Descriptive Statistics for Dates and Datetimes:

  • earliest date
  • latest date
  • earliest time
  • latest time
  • count of distinct values for date
  • count of distinct values for date month
  • count of distinct values for date year
  • count of distinct values for time
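
A minimal sketch of the date and datetime summary, assuming the column has already been parsed into naive `datetime` objects. The name `profile_datetimes` and the result keys are hypothetical. "Distinct values for date month" is read here as distinct calendar month numbers regardless of year, which is an assumption about the intended meaning.

```python
from datetime import datetime


def profile_datetimes(values):
    """Summarize a datetime column (hypothetical helper, not the extension's API).

    Assumption: values are naive datetime objects; None marks a missing value.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {}
    dates = [v.date() for v in present]
    times = [v.time() for v in present]
    return {
        "earliest_date": min(dates),
        "latest_date": max(dates),
        "earliest_time": min(times),
        "latest_time": max(times),
        "distinct_dates": len(set(dates)),
        # Assumption: month number only (1-12), ignoring the year.
        "distinct_months": len({d.month for d in dates}),
        "distinct_years": len({d.year for d in dates}),
        "distinct_times": len(set(times)),
    }


datetime_stats = profile_datetimes([
    datetime(2021, 1, 5, 9, 30),
    datetime(2021, 3, 5, 9, 30),
    datetime(2022, 1, 20, 17, 0),
])
print(datetime_stats)
```

Splitting each value into its date and time parts first keeps "earliest time" independent of the day on which it occurred.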

Descriptive Statistics for Strings:

  • count of distinct values
  • count of distinct “words” (strings separated by spaces, ignoring things like “this”, “that”, “a”, etc.)
  • min and max string length
  • min and max word count per string
  • count of distinct “masks” (where every letter of a string is turned into an “L” and every digit turned into a “D”)
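
The string summary, including the "mask" idea, can be sketched as follows. Everything named here is hypothetical: `STOPWORDS` is a tiny illustrative list rather than the extension's real one, and the sketch assumes that distinct words are compared case-insensitively, that word counts per string include ignored words, and that characters other than letters and digits are left unchanged in a mask.

```python
# Illustrative stop-word list (assumption); the real list is not given in the source.
STOPWORDS = {"this", "that", "a", "an", "the"}


def mask(text):
    """Turn every letter into "L" and every digit into "D"; keep other characters."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append("L")
        elif ch.isdigit():
            out.append("D")
        else:
            out.append(ch)
    return "".join(out)


def profile_strings(values):
    """Summarize a string column (hypothetical helper, not the extension's API)."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    # str.split() with no argument splits on any run of whitespace.
    words_per_string = [v.split() for v in present]
    distinct_words = {
        w.lower()
        for words in words_per_string
        for w in words
        if w.lower() not in STOPWORDS
    }
    lengths = [len(v) for v in present]
    word_counts = [len(words) for words in words_per_string]
    return {
        "distinct_count": len(set(present)),
        "distinct_words": len(distinct_words),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "min_word_count": min(word_counts),
        "max_word_count": max(word_counts),
        "distinct_masks": len({mask(v) for v in present}),
    }


string_stats = profile_strings(["M5V 2T6", "M4C 1B5", "this is a test", "M5V 2T6"])
print(mask("M5V 2T6"))
print(string_stats)
```

In the sample, both postal codes share the mask `LDL DLD`, so the four strings produce only two distinct masks. A low mask count like this suggests a column with a consistent format.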

Requirements

Compatibility with core CKAN versions:

CKAN version      Compatible?
2.6 and earlier   not tested
2.7               not tested
2.8               not tested
2.9               yes

Installation

To install ckanext-datastore-profiler:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate

  2. Clone the source and install it on the virtualenv

    git clone https://github.com/open-data-toronto/ckanext-datastore-profiler.git
    cd ckanext-datastore-profiler
    pip install -e .

Version
Version release date
(not set)
Contact name
(not set)
Contact email
Contact URL
(not set)


Installation Guide

Configuration hints

To install ckanext-datastore-profiler:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate

  2. Clone the source and install it on the virtualenv

    git clone https://github.com/open-data-toronto/ckanext-datastore-profiler.git
    cd ckanext-datastore-profiler
    pip install -e .
    pip install -r requirements.txt

  3. Add datastore-profiler to the ckan.plugins setting in your CKAN config file (by default the config file is located at `/et

Plugins to configure (ckan.ini)
datastore_profiler
CKAN Settings (ckan.ini)
# ckanext.datastore_profiler.some_setting = some_default_value
DB migration to be executed
(not set)