
ckanext-bioschemaharvester
This plugin extends the CKAN Harvester to harvest datasets from repositories that publish (Bio)schemas markup, for example MassBank.
It is built on the official CKAN Harvester (https://github.com/ckan/ckanext-harvest) and follows its harvester interface, with the usual gather, fetch and import stages.
Once installed, it appears as the "BioSchema Scrapper/Harvest" option when you create a harvest source.

As the name suggests, this harvester is essentially a web scraper: it uses Beautiful Soup to fetch metadata from each dataset's HTML page. It has only been tested on the MassBank repository.
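Bioschemas markup is typically embedded in a page as JSON-LD inside `<script type="application/ld+json">` tags. The plugin itself does its extraction in Python with Beautiful Soup. What follows is only a rough shell sketch of the same idea, which you can use to check by hand whether a repository page carries such markup before you configure a harvest source. The function name `extract_jsonld` is hypothetical. The line-based `awk` matching is a simplification that assumes lowercase tags and at most one JSON-LD opening tag per line, so it is not a real HTML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this plugin): print the JSON-LD blocks
# embedded in an HTML document read from stdin.
# Simplifications: lowercase tags, at most one JSON-LD <script> opening tag per line.
extract_jsonld() {
  awk '
    # Opening tag: start capturing and drop everything up to the end of the tag.
    /<script[^>]*type="application\/ld[+]json"[^>]*>/ {
      inblock = 1
      sub(/.*<script[^>]*>/, "")
    }
    inblock {
      # Closing tag: keep only the text before it and stop capturing.
      if (index($0, "</script>")) {
        sub(/<\/script>.*/, "")
        inblock = 0
      }
      # Skip empty remainders left over from stripping the tags.
      if (length($0)) print
    }
  '
}
```

Usage, where `RECORD_URL` is a placeholder for a dataset page URL: `curl -s "$RECORD_URL" | extract_jsonld`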
Note: This plugin stores metadata in tables created by the migrations of other plugins, so the default CKAN tables in the database are not overwritten. Make sure these tables already exist in your CKAN instance. They come from the following plugins:
- https://github.com/bhavin2897/ckanext-rdkit-visuals
- https://github.com/bhavin2897/ckanext-related_resources
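Before enabling the harvester, you can check that the required tables exist. The sketch below illustrates one way to do this under stated assumptions. `missing_tables` is a hypothetical helper. The table names you pass to it must be taken from the migrations of the two plugins above, because this README does not list them. `CKAN_DB_URL`, `TABLE_A` and `TABLE_B` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this plugin).
# stdin: existing table names, one per line.
# args:  table names that must be present.
# Prints "missing: <name>" for each absent table; returns 1 if any are missing.
missing_tables() {
  local existing t status=0
  existing=$(cat)
  for t in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! printf '%s\n' "$existing" | grep -qxF -- "$t"; then
      echo "missing: $t"
      status=1
    fi
  done
  return "$status"
}
```

Usage, listing the public tables with `psql` (`-A` unaligned output, `-t` rows only, `-c` run one command):
`psql -Atc "SELECT tablename FROM pg_tables WHERE schemaname = 'public'" "$CKAN_DB_URL" | missing_tables "$TABLE_A" "$TABLE_B"`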
Requirements
Compatibility with core CKAN versions:
| CKAN version | Compatible? |
| --- | --- |
| 2.8 and earlier | not tested |
| 2.9 | yes |
Installation
To install ckanext-bioschemaharvester:
Activate your CKAN virtual environment, for example:
. /usr/lib/ckan/default/bin/activate
Clone the source and install it into the virtualenv:
git clone https://github.com/bhavin2897/ckanext-bioschemaharvester.git
cd ckanext-bioschemaharvester
pip install -e .
pip install -r requirements.txt
Add bioschemaharvester to the ckan.plugins s