Extension Embeddings


Extension Basics

Title
Embeddings
Name
ckanext-embeddings
Type
Public extension
Description
The **Embeddings** extension enhances CKAN's search and discovery capabilities by leveraging machin
CKAN versions

~2.10, ~2.10.4

Download-Url (zip)
Last commit
a year ago (2024-05-22 08:47:42)
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

ckanext-embeddings


An extension that uses Machine Learning Embeddings to provide similarity-based features for CKAN portals. Currently there are two features explored:

  • Returning similar datasets
  • Performing Semantic Search in Solr integrated with the usual CKAN search

Note: this is just an alpha version and has not been tested in any real-world deployment.

What this does

Embeddings are a Machine Learning technique for encoding complex pieces of information, such as text or images, as numerical vectors (lists of numbers) that capture the relationships and similarities between those pieces of information. The closer two vectors are, the more similar the corresponding objects are according to the underlying model used to create the embeddings. There are many introductory resources available to learn more about embeddings.
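To make "the closer the vectors are" concrete, here is a minimal sketch that uses cosine similarity as the closeness measure. The choice of cosine similarity, the function name and the three-dimensional vectors are all assumptions for illustration; the text above does not say which measure or model the extension uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors (hypothetical); real embeddings
# typically have hundreds of dimensions.
bus_routes = [0.9, 0.1, 0.0]
tram_routes = [0.8, 0.2, 0.1]
rainfall = [0.0, 0.2, 0.9]

print(cosine_similarity(bus_routes, tram_routes))  # close to 1: similar
print(cosine_similarity(bus_routes, rainfall))     # close to 0: unrelated
```

A score near 1 means the two vectors point in nearly the same direction, and a score near 0 means they have little in common.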

In the context of CKAN, this plugin computes embeddings for all datasets using their metadata (their title or description, although any other relevant metadata field can also be used). Being able to compare datasets allows us to build features that make relevant data easier for users to discover.

Right now there are two features implemented:

1. Similar datasets

By computing the embeddings of all datasets and ranking them against the embedding of a particular dataset, we can get the most similar datasets to the one
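The ranking idea described above can be sketched roughly as follows. This is not the extension's code: the `most_similar` helper, the dataset ids and the toy vectors are hypothetical, and cosine similarity is assumed as the scoring function.

```python
import heapq
import math


def _cosine(a, b):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target_id, embeddings, limit=5):
    """Return up to `limit` (dataset_id, score) pairs, most similar first."""
    target = embeddings[target_id]
    scored = (
        (dataset_id, _cosine(target, vector))
        for dataset_id, vector in embeddings.items()
        if dataset_id != target_id  # do not rank the dataset against itself
    )
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


# Hypothetical dataset ids mapped to made-up embeddings.
embeddings = {
    "bus-routes": [0.9, 0.1, 0.0],
    "tram-routes": [0.8, 0.2, 0.1],
    "cycle-lanes": [0.6, 0.4, 0.1],
    "rainfall": [0.0, 0.2, 0.9],
}

for dataset_id, score in most_similar("bus-routes", embeddings, limit=2):
    print(dataset_id, round(score, 3))
```

Scanning every dataset like this is fine for a toy example; a real portal would more likely delegate the nearest-neighbour lookup to a search index.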

Version
0.1.1
Version release date
2024-05-22
Contact name
Adrià Mercader
Contact email
(not set)
Contact Url
(not set)


Installation Guide

Configuration hints

To install ckanext-embeddings:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate

  2. Clone the source and install it on the virtualenv

    git clone https://github.com//ckanext-embeddings.git
    cd ckanext-embeddings
    pip install .
    pip install -r requirements.txt

  3. Add embeddings to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).

  4. Add the following con

Plugins to configure (ckan.ini)
embeddings
CKAN Settings (ckan.ini)
# ckanext.embeddings.backends =
        my_embeddings_backend = ckanext.my_ext.embeddings:MyBackend
# ckanext.embeddings.backend = my_embeddings_backend
# ckanext.embeddings.solr_vector_field_name = vector
# ckanext.embeddings.solr_vector_field_name = vector_st_mpnet
# ckanext.embeddings.solr_vector_field_name = vector_openai
DB migration to be executed
(not set)
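For context on the `ckanext.embeddings.solr_vector_field_name` setting listed above: Solr 9 can search a dense vector field with its kNN query parser. The sketch below only builds such a query string. The `build_knn_query` helper is hypothetical, and whether the extension constructs its Solr queries this way is an assumption.

```python
def build_knn_query(vector, field="vector", top_k=10):
    """Build a Solr kNN query parser string for a dense vector field.

    `field` should match the Solr field that stores the embeddings,
    for instance the value of ckanext.embeddings.solr_vector_field_name.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    values = ", ".join(repr(float(v)) for v in vector)
    return "{!knn f=%s topK=%d}[%s]" % (field, top_k, values)


# Hypothetical 3-dimensional query embedding; a real one must have the
# same number of dimensions as the Solr field definition.
print(build_knn_query([0.1, 0.2, 0.3], field="vector", top_k=5))
```

The resulting string would be sent as the `q` parameter of a Solr request, with the vector computed by the same model that produced the indexed embeddings.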