Extension BioSchema Harvester


Extension Basics

Title
BioSchema Harvester
Name
ckanext-bioschemaharvester
Type
Public extension
Description
CKAN harvester extension that extracts metadata from websites and repositories using (bio)schema markup and JSON-LD format.
CKAN versions
Download-Url (zip)
Last commit
2024-08-05
Url to repo
Category
Standards Compliance


Background Infos

Description (long)

This plugin extends the CKAN Harvester (ckanext-harvest) to harvest datasets from repositories that publish (bio)schema markup, for example the MassBank repository. It is built on the official CKAN Harvester and follows its harvesting interface with the gather, fetch, and import stages. Once installed, it appears as the ‘BioSchema Scrapper/Harvest’ option when you choose a harvester type. The extension includes three specialized harvesters: 1) the BioSchema Harvester for sitemaps and web pages with JSON-LD, 2) the nmrXiv Swagger Harvester (under development), and 3) the Chemotion-Repository Harvester (under development). It uses Beautiful Soup for web scraping and requires the ckanext-harvest, ckanext-rdkit-visuals, and ckanext-related_resources plugins.
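
To illustrate the gather, fetch, and import stages for a sitemap with JSON-LD pages, here is a minimal, self-contained sketch. It is not the extension's code: the function names (gather_urls, extract_jsonld, to_package_dict) are hypothetical, the field mapping is an assumption, and it uses Python's standard html.parser in place of Beautiful Soup so that it runs without extra dependencies. Downloading the pages is left out on purpose.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def gather_urls(sitemap_xml):
    """Gather stage (hypothetical helper): list the page URLs in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


class JsonLdParser(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_jsonld(html):
    """Fetch stage, parsing part only: decode the JSON-LD objects in a page."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    docs = []
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        if isinstance(doc, dict):  # top-level JSON-LD arrays are ignored in this sketch
            docs.append(doc)
    return docs


def to_package_dict(doc):
    """Import stage: map a schema.org Dataset to a minimal CKAN-style dict.

    The mapping below is an assumption made for illustration only.
    """
    return {
        "title": doc.get("name", ""),
        "notes": doc.get("description", ""),
        "url": doc.get("url", ""),
    }
```

In a real harvester these steps would live in the gather_stage, fetch_stage, and import_stage methods that ckanext-harvest expects, with each gathered URL stored as a harvest object before it is fetched and imported.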

Version
0.1.5
Version release date
(not set)
Contact name
Technische Informationsbibliothek (TIB)
Contact email
Contact Url


Installation Guide

Configuration hints

Requires ckanext-harvest, ckanext-rdkit-visuals, and ckanext-related_resources to be installed. Select the harvester type in the settings of each harvest source.

Plugins to configure (ckan.ini)
bioschemaharvester
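
As a sketch of how this might look in ckan.ini, assuming a setup where ckanext-harvest is enabled through its usual harvest plugin: the plugin names for ckanext-rdkit-visuals and ckanext-related_resources are not given on this page, so they are left out here and should be taken from those extensions' own documentation. Any plugins already listed in ckan.plugins should be kept.

```ini
# ckan.ini (sketch; append to the existing ckan.plugins list rather than replacing it)
# "harvest" is the ckanext-harvest plugin; "bioschemaharvester" is the plugin named above.
ckan.plugins = harvest bioschemaharvester
```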
CKAN Settings (ckan.ini)
DB migration to be executed
(not set)