ckanext-datapreview
This CKAN extension supplies data from local storage or via a remote download, parses CSV/XLS files, and serves the result at a URL that can be called by Recline or another CKAN data previewer.
For example, for a spreadsheet it returns a JSON object such as:
{
"fields": ["Name", "Age"],
"data": [["Bob", 42], ["Jill", 54]],
"extra_text": "This preview shows only the first 10 rows",
"max_results": 10,
"length": 435,
"url": "http://data.com/file.csv"
}
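As a minimal sketch of how a client might consume this response, the Python 3 snippet below pairs each row in "data" with the names in "fields". The helper name rows_as_dicts is hypothetical and is not part of the extension; the sample string simply repeats the response shown above.

```python
import json


def rows_as_dicts(preview):
    """Pair each data row with the field names from the same response.

    `rows_as_dicts` is an illustrative helper, not part of ckanext-datapreview.
    """
    fields = preview["fields"]
    return [dict(zip(fields, row)) for row in preview["data"]]


# The sample response from this README, as a JSON string.
sample = """{
  "fields": ["Name", "Age"],
  "data": [["Bob", 42], ["Jill", 54]],
  "extra_text": "This preview shows only the first 10 rows",
  "max_results": 10,
  "length": 435,
  "url": "http://data.com/file.csv"
}"""

preview = json.loads(sample)
records = rows_as_dicts(preview)
```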
This extension is a modified, local implementation of the OKFN dataproxy that runs as a CKAN extension rather than on Google App Engine. It was written to improve performance on data.gov.uk and to increase the maximum file size that can be processed.
The interface to the extension:
/data/preview/<resource_id>?max-results=N&encoding=utf-8
is not exactly the same as dataproxy's: dataproxy requires the URL rather than the resource id, but the data returned is identical. Rather than always fetching the data from the remote site, the new controller at the above route first attempts to find the data in ckanext-archiver’s local archive.
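A client-side sketch of building a request URL for this route is shown here in Python 3. The function name build_preview_url and the host ckan.example.org are assumptions made for illustration; only the path and the max-results and encoding parameters come from the route above.

```python
from urllib.parse import quote, urlencode


def build_preview_url(site_url, resource_id, max_results=None, encoding=None):
    """Build a /data/preview/<resource_id> URL (hypothetical helper).

    Query parameters are only added when a value is supplied.
    """
    params = {}
    if max_results is not None:
        params["max-results"] = max_results
    if encoding is not None:
        params["encoding"] = encoding

    url = "{}/data/preview/{}".format(
        site_url.rstrip("/"), quote(resource_id, safe="")
    )
    if params:
        url += "?" + urlencode(params)
    return url


# Example host; replace with the address of your own CKAN site.
example_url = build_preview_url(
    "http://ckan.example.org/", "abc-123", max_results=10, encoding="utf-8"
)
```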
Installation
The most straightforward method of installation is:
git clone git://github.com/datagovuk/ckanext-datapreview.git
cd ckanext-datapreview
python setup.py develop
Or alternatively install directly using pip:
pip install -e git+https://github.com/datagovuk/ckanext-datapreview.git#egg=ckanext-datapreview
Once installation is complete, add datapreview to the ckan.plugins setting in the appropriate .ini file.
Config
In your CKAN config file, configure the following options:
limit
The ‘limit’ is the maximum size of a file that will be downloaded or loaded into memory. It exists so that, when the data is not stored locally, a request does not wait indefinitely on a large download before the data can be proxied.
The limit is expressed in bytes, so the default of 5MB is written as:
ckan.datapreview.limit = 5242880
Local CSV files are not subject to this limit because the first 100 rows can be loaded without loading the whole file into memory.
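The two behaviours described in this section can be sketched in Python 3 as follows. This is an illustration under stated assumptions, not the extension's actual code: read_limited, first_rows and TooLargeError are hypothetical names, and the chunk size is arbitrary. The first function stops reading once the byte limit is exceeded; the second shows why a local CSV does not need the limit, since only the requested rows are ever pulled from the file.

```python
import csv
import io
from itertools import islice

DEFAULT_LIMIT = 5 * 1024 * 1024  # 5MB expressed in bytes (5242880)


class TooLargeError(Exception):
    """Raised when a stream holds more bytes than the limit (hypothetical)."""


def read_limited(stream, limit=DEFAULT_LIMIT, chunk_size=64 * 1024):
    """Read a binary stream in chunks, giving up once `limit` is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise TooLargeError("more than {} bytes".format(limit))
        chunks.append(chunk)
    return b"".join(chunks)


def first_rows(text_stream, max_rows=100):
    """Return at most `max_rows` CSV rows without reading the rest of the file."""
    return list(islice(csv.reader(text_stream), max_rows))


# Usage with in-memory streams standing in for a download and a local file.
body = read_limited(io.BytesIO(b"Name,Age\nBob,42\n"), limit=1024)
rows = first_rows(io.StringIO("Name,Age\nBob,42\nJill,54\n"), max_rows=2)
```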
Requirements
- ckanext-archiver - for the resource cache
- messytables - (in setup.py)
Improvements
- Increases the limit on download size (the App Engine download limit does not apply).
- Uses the local archive cache if it exists rather than hitting the remote site (only if ckanext-archiver has retrieved the file).
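The cache-first behaviour in the second point can be sketched roughly as below, in Python 3. The function choose_source and its arguments are hypothetical; how the extension actually looks up the archived path through ckanext-archiver is not shown here.

```python
import os


def choose_source(cached_path, remote_url):
    """Prefer the archived copy when it exists, else fall back to the URL.

    `cached_path` stands for whatever path the archiver recorded for the
    resource (an assumption for this sketch); it may be None if the
    resource has not been archived.
    """
    if cached_path and os.path.isfile(cached_path):
        return ("local", cached_path)
    return ("remote", remote_url)


# With no archived copy, the remote URL is used.
source = choose_source(None, "http://data.com/file.csv")
```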
Note: This repository was archived by the owner on Jun 19, 2023. It is now read-only.