Extension Bioschemaharvester


Extension Basics

Title
Bioschemaharvester
Name
ckanext-bioschemaharvester
Type
Public extension
Description
The `ckanext-bioschemaharvester` extension enhances CKAN's harvesting capabilities by enabling the extraction of metadata from websites and repositories that utilize (bio)schema markup.
CKAN versions
Download-Url (zip)
Last commit
2024-08-05 09:40:15
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)
Tests

ckanext-bioschemaharvester

This plug-in extends the CKAN Harvester to harvest datasets from repositories that publish (bio)schema markup. Example: MassBank Repo

This harvester is built on the official CKAN Harvester (https://github.com/ckan/ckanext-harvest) and follows its harvest interface, which consists of the gather, fetch, and import stages.
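The three stages can be pictured with a small, self-contained sketch. It does not import ckanext-harvest or reproduce this plugin's code: `HarvestObject`, `SketchHarvester`, `run_harvest`, and the in-memory `remote` dictionary are hypothetical names used only to show how gather, fetch, and import hand data to each other.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HarvestObject:
    """Hypothetical stand-in for one harvested record (not the ckanext-harvest model)."""
    guid: str
    content: Optional[str] = None


class SketchHarvester:
    """Illustrates the gather -> fetch -> import flow with in-memory data."""

    def __init__(self, remote: Dict[str, dict]):
        self.remote = remote                 # hypothetical remote repository: id -> metadata
        self.catalog: Dict[str, dict] = {}   # stands in for the datasets created in CKAN

    def gather_stage(self) -> List[HarvestObject]:
        # Gather: list the identifiers of the records available at the source.
        return [HarvestObject(guid=record_id) for record_id in sorted(self.remote)]

    def fetch_stage(self, obj: HarvestObject) -> bool:
        # Fetch: retrieve the raw metadata for one record and store it on the object.
        record = self.remote.get(obj.guid)
        if record is None:
            return False
        obj.content = json.dumps(record)
        return True

    def import_stage(self, obj: HarvestObject) -> bool:
        # Import: turn the stored content into a dataset entry.
        if obj.content is None:
            return False
        data = json.loads(obj.content)
        self.catalog[obj.guid] = {"name": obj.guid.lower(), "title": data["name"]}
        return True


def run_harvest(harvester: SketchHarvester) -> int:
    """Run the three stages in order and return the number of imported records."""
    imported = 0
    for obj in harvester.gather_stage():
        if harvester.fetch_stage(obj) and harvester.import_stage(obj):
            imported += 1
    return imported
```

In the real extension the stages are called by the ckanext-harvest job queue rather than by a loop like `run_harvest`, so treat this only as a picture of the data flow.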

Once installed, the harvester appears as the source type option "BioSchema Scrapper/Harvest".

As the name suggests, this harvester is essentially a web scraper. It uses Beautiful Soup to fetch metadata from the HTML page of each dataset (tested only on the MassBank Repo).
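A minimal sketch of this kind of extraction follows. It rests on two assumptions that the text above does not confirm: that the dataset page embeds its (bio)schema markup as JSON-LD inside a `<script type="application/ld+json">` tag, and that reading this tag is enough to get the metadata. To stay free of dependencies it uses the standard library's `html.parser` instead of Beautiful Soup, so it is not the plugin's own code. `JsonLdExtractor`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import List


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.documents: List[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed markup instead of failing the whole harvest
            # A script tag may hold a single object or a list of objects.
            items = parsed if isinstance(parsed, list) else [parsed]
            self.documents.extend(item for item in items if isinstance(item, dict))


def extract_jsonld(html: str) -> List[dict]:
    """Return every JSON-LD object embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical page content, not taken from a real repository.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "Example record"}
</script>
</head><body><h1>Example record</h1></body></html>
"""

records = extract_jsonld(SAMPLE_HTML)
print(records[0]["name"])
```

With Beautiful Soup the same lookup would typically be a search for `script` tags with that `type` attribute; the parsing of the JSON payload stays the same.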

Note: This plugin stores metadata in tables that are created by the migrations of other plugins, so that the default CKAN tables in the database are not overwritten. Make sure these tables already exist in your CKAN instance; they come from:

  • https://github.com/bhavin2897/ckanext-rdkit-visuals
  • https://github.com/bhavin2897/ckanext-related_resources

Requirements

Compatibility with core CKAN versions:

CKAN version      Compatible?
2.8 and earlier   not tested
2.9               yes

Installation

To install ckanext-bioschemaharvester:

Activate your CKAN virtual environment, for example:

 . /usr/lib/ckan/default/bin/activate

Clone the source and install it into the virtualenv:

git clone https://github.com/bhavin2897/ckanext-bioschemaharvester.git
cd ckanext-bioschemaharvester
pip install -e .
pip install -r requirements.txt

Add bioschemaharvester to the ckan.plugins setting in your CKAN config file.

Version
Version release date
(not set)
Contact name
(not set)
Contact email
(not set)
Contact Url
(not set)


Installation Guide

Configuration hints

Add bioschemaharvester to the ckan.plugins setting in your CKAN config file (by default the config file is located at `/etc/ckan/default/c
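To check that the plugin is actually listed, the config file can be read with a few lines of Python. This is a sketch under assumptions: the `[app:main]` section is where CKAN config files normally keep `ckan.plugins`, the sample contents below are made up, and listing `harvest` alongside `bioschemaharvester` is assumed here because the extension builds on ckanext-harvest. `enabled_plugins` and `SAMPLE_INI` are hypothetical names.

```python
import configparser
from typing import List

# Hypothetical config contents; a real ckan.ini contains many more settings.
SAMPLE_INI = """
[app:main]
ckan.plugins = stats text_view harvest bioschemaharvester
"""


def enabled_plugins(ini_text: str) -> List[str]:
    """Return the plugin names listed under ckan.plugins in the [app:main] section."""
    # interpolation=None keeps configparser from interpreting '%' in real config values.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("app:main", "ckan.plugins", fallback="").split()


plugins = enabled_plugins(SAMPLE_INI)
print("bioschemaharvester" in plugins)
```

For a real installation, pass the text of your own config file to `enabled_plugins` instead of `SAMPLE_INI`.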

Plugins to configure (ckan.ini)
bioschemaharvester
CKAN Settings (ckan.ini)
DB migration to be executed
upgrade