Releases: IQSS/dataverse
v5.13
Dataverse Software 5.13
This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Schema.org Improvements (Some Backward Incompatibility)
The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.
Please be advised that these improvements have the chance to break integrations that rely on the old, less compliant structure. For details see the "backward incompatibility" section below. (Issue #7349)
Folder Uploads via Web UI (dvwebloader, S3 only)
For installations using S3 for storage and with direct upload enabled, a new tool called DVWebloader can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similarly to how the DVUploader tool does it on the command line, but with the convenience of using the browser UI). See Folder Upload in the User Guide for details. (PR #9096)
Long Descriptions of Collections (Dataverses) are Now Truncated
Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)
License Sorting
Licenses shown in the dropdown in the UI can now be sorted by superusers. See the Sorting Licenses section of the Installation Guide for details. (PR #8697)
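As an illustration of the new capability, a superuser call to set a license's display position might look like this (a sketch following the Sorting Licenses documentation; the license id and sort value are placeholders):

```shell
# Set the sort order of license 2 to position 1 (superuser API token required).
# The license id (2) and sort value (1) are example values.
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org

curl -X PUT -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/licenses/2/:sortOrder/1"
```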
Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search
Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)
Support for NetCDF and HDF5 Files
NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and more modern NetCDF 4 files are detected based on content. Detection for older HDF4 files is only done through the file extension ".hdf", as before.
For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the dataverse-previewers repo.
An extractNcml API endpoint has been added, especially for installations with existing NetCDF and HDF5 files. After upgrading, they can iterate through these files and try to extract an NcML file.
See the NetCDF and HDF5 section of the User Guide for details. (PR #9239)
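For installations with pre-existing NetCDF or HDF5 files, a call to the new endpoint might look like this (a sketch; the file id is a placeholder, and the exact path and required permissions should be checked against the API Guide):

```shell
# Attempt NcML extraction for an existing NetCDF/HDF5 file.
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export FILE_ID=42   # example database id of a NetCDF or HDF5 file

curl -X POST -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/$FILE_ID/extractNcml"
```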
Support for .eln Files (Electronic Laboratory Notebooks)
The .eln file format is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, and more.
Improved Security for External Tools
External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See Authorization Options in the External Tools documentation for details. (PR #9001)
Geospatial Search (API Only)
Geospatial search is supported via the Search API using two new parameters: geo_point and geo_radius.
The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
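A Search API request using the new parameters might look like this (a sketch; the coordinates and radius are example values, with geo_point given as latitude,longitude and geo_radius in kilometers per the Search API documentation):

```shell
# Find datasets within ~1.5 km of a point (example coordinates: Cambridge, MA).
export SERVER_URL=https://demo.dataverse.org

curl "$SERVER_URL/api/search?q=*&geo_point=42.3,-71.1&geo_radius=1.5"
```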
Reproducibility and Code Execution with Binder
Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)
CodeMeta (Software) Metadata Support (Experimental)
Experimental support for research software metadata deposits has been added.
By adding a metadata block for CodeMeta, we take another step toward adding first class support of diverse FAIR objects, such as research software and computational workflows.
There is more work underway to make Dataverse installations around the world "research software ready."
Note: Like the metadata block for computational workflows before, CodeMeta is listed under Experimental Metadata in the guides. Experimental means it's brand new, opt-in, and might need future tweaking based on experience of usage in the field. We hope for feedback from installations on the new metadata block to optimize and lift it from the experimental stage. (PR #7877)
Mechanism Added for Stopping a Harvest in Progress
It is now possible for a sysadmin to stop a long-running harvesting job. See Harvesting Clients in the Admin Guide for more information. (PR #9187)
API Endpoint Listing Metadata Block Details has been Extended
The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:
- controlledVocabularyValues - All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
- isControlledVocabulary - Whether or not this field has a controlled vocabulary.
- multiple - Whether or not the field supports multiple values.
See Metadata Blocks in the API Guide for details. (PR #9213)
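For example, fetching the citation block returns the extended field details (a sketch; the server URL is a placeholder):

```shell
# List details of the "citation" metadata block, including the new
# controlledVocabularyValues, isControlledVocabulary, and multiple fields.
export SERVER_URL=https://demo.dataverse.org

curl "$SERVER_URL/api/metadatablocks/citation"
```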
Advanced Database Settings
You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be sslmode=require, though installations already setting this parameter in the Postgres connection string will need to move it to dataverse.db.parameters. See the new Database Persistence section of the Installation Guide for details. (PR #8915)
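As a sketch, moving sslmode=require out of the connection string could look like this (verify the option name and your Payara path against the new Database Persistence section before applying):

```shell
# Set extra PostgreSQL connection parameters via the new MicroProfile option
# instead of appending them to the JDBC connection string.
export PAYARA=/usr/local/payara5

$PAYARA/bin/asadmin create-jvm-options "-Ddataverse.db.parameters=sslmode=require"
```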
Support for Cleaning up Leftover Files in Dataset Storage
Experimental feature: leftover files in a dataset's storage location that are not part of the dataset's file list, but that are named following the Dataverse technical convention for dataset files, can be removed with the new Cleanup Storage of a Dataset API endpoint.
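A cautious first call might be a dry run (a sketch; the dryrun parameter and path follow the API Guide, and the persistent identifier is a placeholder — verify before running against production data):

```shell
# Dry run: list leftover files that would be deleted, without deleting them.
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/EXAMPLE   # example PID

curl -H "X-Dataverse-key:$API_TOKEN" \
  "$SERVER_URL/api/datasets/:persistentId/cleanStorage?persistentId=$PERSISTENT_ID&dryrun=true"
```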
OAI Server Bug Fixed
A bug introduced in 5.12 was preventing the Dataverse OAI server from serving incremental harvesting requests from clients. It was fixed in this release (PR #9316).
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release not already mentioned above include:
- Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983, See also Configuration Guide)
- To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
- Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
- A persistent identifier, CSRT, is added to the Related Publication field's ID Type child field. For datasets published with CSRT IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
- Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.
New JVM Options and MicroProfile Config Options
The following JVM option is now available:
- dataverse.personOrOrg.assumeCommaInPersonName - the default is false
The following MicroProfile Config options are now available (these can be treated as JVM options):
- dataverse.files.uploads - alternative storage location of generated temporary files for UI file uploads
- dataverse.api.signing-secret - used by signed URLs
- dataverse.solr.host
- dataverse.solr.port
- dataverse.solr.protocol
- dataverse.solr.core
- dataverse.solr.path
- dataverse.rserve.host
The following existing JVM options are now available via MicroProfile Config:
- dataverse.siteUrl
- dataverse.fqdn
- dataverse.files.directory
- dataverse.rserve.host
- dataverse.rserve.port
- dataverse.rserve.user
- dataverse.rserve.password
- dataverse.rserve.tempdir
Notes for Developers and Integrato...
v5.12.1
Dataverse Software 5.12.1
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Bug Fix for "Internal Server Error" When Creating a New Remote Account
Unfortunately, as of 5.11 new remote users have seen "Internal Server Error" when creating an account (or checking notifications just after creating an account). Remote users are those who log in with institutional (Shibboleth), OAuth (ORCID, GitHub, or Google) or OIDC providers.
This is a transient error that can be worked around by reloading the browser (or logging out and back in again) but it's obviously a very poor user experience and a bad first impression. This bug is the primary reason we are putting out this patch release. Other features and bug fixes are coming along for the ride.
Ability to Disable OAuth Sign Up While Allowing Existing Accounts to Log In
A new option called :AllowRemoteAuthSignUp has been added, providing a mechanism for disabling new account signups for specific OAuth2 authentication providers (ORCID, GitHub, Google, etc.) while still allowing logins for already-existing accounts using this authentication method.
See the Installation Guide for more information on the setting.
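Database settings like this one are set via the admin API; a sketch follows. The payload shown is only illustrative — the exact value format (which providers to allow or disallow) is defined in the Installation Guide:

```shell
# Set the new :AllowRemoteAuthSignUp database setting.
# The JSON value below is illustrative; consult the Installation Guide
# for the format your installation expects.
export SERVER_URL=http://localhost:8080

curl -X PUT -d '{"default":"false"}' "$SERVER_URL/api/admin/settings/:AllowRemoteAuthSignUp"
```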
Production Date Now Used for Harvested Datasets in Addition to Distribution Date (oai_dc format)
This fixes the year displayed in the citation for harvested datasets, especially for the oai_dc format.
For normal datasets, the date used is the "citation date", which is by default the publication date (the first release date) unless you change it.
However, for a harvested dataset, the distribution date was used instead, and this date is not always present in the harvested metadata.
Now, the production date is used for harvested datasets in addition to the distribution date when harvesting with the oai_dc format.
Publication Date Now Used for Harvested Dataset if Production Date is Not Set (oai_dc format)
For exports and harvesting in oai_dc format, if "Production Date" is not set, "Publication Date" is now used instead. This change is reflected in the Dataverse 4+ Metadata Crosswalk linked from the Appendix of the User Guide.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Users creating an account by logging in with Shibboleth, OAuth, or OIDC should not see errors. (Issue 9029, PR #9030)
- When harvesting datasets, I want the Production Date if I can't get the Distribution Date (PR #8732)
- When harvesting datasets, I want the Publication Date if I can't get the Production Date (PR #8733)
- As a sysadmin I'd like to disable (temporarily or permanently) sign ups from OAuth providers while allowing existing users to continue to log in from that provider (PR #9112)
- As a C/C++ developer I want to use Dataverse APIs (PR #9070)
New DB Settings
The following DB settings have been added:
- :AllowRemoteAuthSignUp
See the Database Settings section of the Guides for more information.
Complete List of Changes
For the complete list of code changes in this release, see the 5.12.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.12.1.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
export PAYARA=/usr/local/payara5 (or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.12.1.war
5. Restart Payara
service payara stop
service payara start
Upcoming Versions of Payara
With the recent release of Payara 6 (Payara 6.2022.1 being the first version), the days of free-to-use Payara 5.x Platform Community versions are numbered. Specifically, Payara's blog post says, "Payara Platform Community 5.2022.4 has been released today as the penultimate Payara 5 Community release."
Given the end of free-to-use Payara 5 versions, we plan to get the Dataverse software working on Payara 6 (#8305), which will require substantial efforts from the IQSS team and community members, as this also means shifting our app to be a Jakarta EE 10 application (upgrading from EE 8). We are currently working out the details and will share news as soon as we can. Rest assured we will do our best to provide you with a smooth transition. You can follow along in Issue #8305 and related pull requests and you are, of course, very welcome to participate by testing and otherwise contributing, as always.
v5.12
Dataverse Software 5.12
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Support for Globus
Globus can be used to transfer large files. Part of "Harvard Data Commons Additions" below.
Support for Remote File Storage
Dataset files can be stored at remote URLs. Part of "Harvard Data Commons Additions" below.
New Computational Workflow Metadata Block
The new Computational Workflow metadata block will allow depositors to effectively tag datasets as computational workflows.
To add the new metadata block, follow the instructions in the Admin Guide: https://guides.dataverse.org/en/5.12/admin/metadatacustomization.html
The location of the new metadata block tsv file is scripts/api/data/metadatablocks/computational_workflow.tsv. Part of "Harvard Data Commons Additions" below.
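Loading the new block follows the standard metadata customization steps from the Admin Guide; a sketch (run from a checkout of the Dataverse source tree, then update the Solr schema as the guide describes):

```shell
# Load the Computational Workflow metadata block into a running installation.
export SERVER_URL=http://localhost:8080

curl "$SERVER_URL/api/admin/datasetfield/load" -X POST \
  --data-binary @scripts/api/data/metadatablocks/computational_workflow.tsv \
  -H "Content-Type: text/tab-separated-values"
```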
Support for Linked Data Notifications (LDN)
Linked Data Notifications (LDN) is a standard from the W3C. Part of "Harvard Data Commons Additions" below.
Harvard Data Commons Additions
As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, TDL, and others. Highlights from this work include:
- Initial support for Globus file transfer to upload to and download from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores.
- Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. via restricted and embargoed status) and/or by the remote store.
- Initial support for computational workflows, including a new metadata block and detected filetypes.
- Support for archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack).
- Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version.
- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimetype/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs.
- Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status.
- Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually.
- Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other dataset) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset.
- A new capability to provide custom per field instructions in dataset templates
- The following file extensions are now detected:
- wdl=text/x-workflow-description-language
- cwl=text/x-computational-workflow-language
- nf=text/x-nextflow
- Rmd=text/x-r-notebook
- rb=text/x-ruby-script
- dag=text/x-dagman
Improvements to Fields that Appear in the Citation Metadata Block
Grammar, style and consistency improvements have been made to the titles, tooltip description text, and watermarks of metadata fields that appear in the Citation metadata block.
This includes fields that dataset depositors can edit in the Citation Metadata accordion (i.e. fields controlled by the citation.tsv and citation.properties files) and fields whose values are system-generated, such as the Dataset Persistent ID, Previous Dataset Persistent ID, and Publication Date fields whose titles and tooltips are configured in the bundles.properties file.
The changes should provide clearer information to curators, depositors, and people looking for data about what the fields are for.
A new page in the Style Guides called "Text" has also been added. The new page includes a section called "Metadata Text Guidelines" with a link to a Google Doc where the guidelines are being maintained for now since we expect them to be revised frequently.
New Static Search Facet: Metadata Types
A new static search facet has been added to the search side panel. This new facet is called "Metadata Types" and is driven from metadata blocks. When a metadata field value is inserted into a dataset, an entry for the metadata block it belongs to is added to this new facet.
This new facet needs to be configured for it to appear on the search side panel. The configuration assigns to a dataverse what metadata blocks to show. The configuration is inherited by child dataverses.
To configure the new facet, use the Metadata Block Facet API: https://guides.dataverse.org/en/5.12/api/native-api.html#set-metadata-block-facet-for-a-dataverse-collection
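A configuration call might look like this (a sketch; the collection alias and metadata block names are example values — see the linked API documentation for the exact payload):

```shell
# Enable the "Metadata Types" facet for a collection by choosing which
# metadata blocks to show (block names here are examples).
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=root   # example collection alias

curl -X POST -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
  "$SERVER_URL/api/dataverses/$ID/metadatablockfacets" \
  -d '["socialscience", "geospatial"]'
```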
Broader MicroProfile Config Support for Developers
As of this release, many JVM options can be set using any MicroProfile Config Source.
Currently this change is only relevant to developers but as settings are migrated to the new "lookup" pattern documented in the Consuming Configuration section of the Developer Guide, anyone installing the Dataverse software will have much greater flexibility when configuring those settings, especially within containers. These changes will be announced in future releases.
Please note that an upgrade to Payara 5.2021.8 or higher is required to make use of this. Payara 5.2021.5 threw exceptions, as explained in PR #8823.
HTTP Range Requests: New HTTP Status Codes and Headers for Datafile Access API
The Basic File Access resource for datafiles (/api/access/datafile/$id) was slightly modified in order to comply better with the HTTP specification for range requests.
If the request contains a "Range" header:
- The returned HTTP status is now 206 (Partial Content) instead of 200
- A "Content-Range" header is returned containing information about the returned bytes
- An "Accept-Ranges" header with value "bytes" is returned
CORS rules/headers were modified accordingly:
- The "Range" header is added to "Access-Control-Allow-Headers"
- The "Content-Range" and "Accept-Ranges" header are added to "Access-Control-Expose-Headers"
This new functionality has enabled a Zip Previewer and file extractor for zip files, an external tool.
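The new behavior can be observed directly (a sketch; the file id is a placeholder, and the expected 206 status plus Content-Range and Accept-Ranges headers are those described above):

```shell
# Request the first 100 bytes of a datafile and print only the response headers;
# a server at this release should answer with 206 Partial Content.
export SERVER_URL=https://demo.dataverse.org
export FILE_ID=42   # example datafile id

curl -s -D - -o /dev/null -H "Range: bytes=0-99" "$SERVER_URL/api/access/datafile/$FILE_ID"
```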
File Type Detection When File Has No Extension
File types are now detected based on the filename when the file has no extension.
The following filenames are now detected:
- Makefile=text/x-makefile
- Snakemake=text/x-snakemake
- Dockerfile=application/x-docker-file
- Vagrantfile=application/x-vagrant-file
These are defined in MimeTypeDetectionByFileName.properties.
Upgrade to Payara 5.2022.3 Highly Recommended
With lots of bug and security fixes included, we encourage everyone to upgrade to Payara 5.2022.3 as soon as possible. See below for details.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer. (PR #8891)
- Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325)
- Administrators can configure the Dataverse software to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751)
- Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770)
- Users and administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the original file (for ingested tabular files) via the included checksum. (PR #8901)
- Archiving via RDA-conformant Bags is more robust and is more configurable. (PR #8773, #8747, #8699, #8609, #8606, #8610)
- Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table. (PR #8748, #8696)
- Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation. (PR #8775)
- Collection managers can create templates that include custom instructions on how to fill out specific metadata fields.
- Dataset update API users are given more information when the dataset they are updating is out of compliance with Terms of Access requirements (Issue #8859)
- Adds...
v5.11.1
Dataverse Software 5.11.1
This is a bug fix release of the Dataverse Software. The .war file for v5.11 will no longer be made available and installations should upgrade directly from v5.10.1 to v5.11.1. To do so you will need to follow the instructions for installing release 5.11 using the v5.11.1 war file. (Note specifically the upgrade steps 6-9 from the 5.11 release note; most importantly, the ones related to the citation block and the Solr schema). If you had previously installed v5.11 (no longer available), follow the simplified instructions below.
Release Highlights
Dataverse Software 5.11 contains two critical issues that are fixed in this release.
First, if you delete a file from a published version of a dataset, the file will be deleted from the file system (or S3) and lose its "owner id" in the database. For details, see Issue #8867.
Second, if you are a superuser, it's possible to click "Delete Draft" and delete a published dataset if it has restricted files. For details, see #8845 and #8742.
Notes for Dataverse Installation Administrators
Identifying Datasets with Deleted Files
If you have been running 5.11, check if any files show "null" for the owner id. The "owner" of a file is the parent dataset:
select * from dvobject where dtype = 'DataFile' and owner_id is null;
For any of these files, change the owner id to the database id of the parent dataset. In addition, the file on disk (or in S3) is likely gone. Look at the "storageidentifier" field from the query above to determine the location of the file then restore the file from backup.
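A repair for an affected file might look like this (example only; the ids and the database name dvndb are placeholders — back up your database before making changes):

```shell
# Example only: restore the owner id of an affected DataFile row.
# 123 = example database id of the file, 456 = example id of its parent dataset,
# dvndb = assumed database name.
psql dvndb -c "UPDATE dvobject SET owner_id = 456 WHERE id = 123 AND dtype = 'DataFile';"
```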
Identifying Datasets Superusers May Have Accidentally Destroyed
Check the "actionlogrecord" table for DestroyDatasetCommand. While these "destroy" entries are normal when a superuser uses the API to destroy datasets, an entry is also created if a superuser has accidentally deleted a published dataset in the web interface with the "Delete Draft" button.
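A query for those entries might look like this (a sketch; the assumption that the command name appears in the actionsubtype column, and the database name dvndb, should be checked against your schema):

```shell
# Example only: list "destroy" entries. Entries from intentional API destroys
# are normal; investigate any that correspond to a UI "Delete Draft" click.
psql dvndb -c "SELECT * FROM actionlogrecord WHERE actionsubtype = 'DestroyDatasetCommand';"
```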
Complete List of Changes
For the complete list of code changes in this release, see the 5.11.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.11.1. To upgrade from 5.10.1, follow the instructions for installing release 5.11 using the v5.11.1 war file. If you had previously installed v5.11 (no longer available), follow the simplified instructions below.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo
to change to that user first. For example, sudo -i -u dataverse
if dataverse
is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5
. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5
if you are using a csh
-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.11.1.war
5. Restart Payara
service payara stop
service payara start
v5.11
Dataverse Software 5.11
Please note: We have removed the 5.11 war file and dvinstall.zip because there are very serious bugs in the 5.11 release. For the upgrade instructions below, please use the 5.11.1 war file instead. New installations should start with 5.11.1. The bugs are explained in the 5.11.1 release notes.
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Terms of Access or Request Access Required for Restricted Files
Beginning in this release, datasets with restricted files must have either Terms of Access or Request Access enabled. This change is to ensure that for each file in a Dataverse installation there is a clear path to get to the data, either through requesting access to the data or to provide context about why requesting access is not enabled.
Published datasets are not affected by this change. Datasets that are in draft and that have neither Terms of Access nor Request Access enabled must be updated to select one or the other (or both). Otherwise, datasets cannot be further edited or published. Dataset authors will be able to tell if their dataset is affected by the presence of the following message at the top of their dataset (when they are logged in):
"Datasets with restricted files are required to have Request Access enabled or Terms of Access to help people access the data. Please edit the dataset to confirm Request Access or provide Terms of Access to be in compliance with the policy."
At this point, authors should click "Edit Dataset" then "Terms" and then check the box for "Request Access" or fill in "Terms of Access for Restricted Files" (or both). Afterwards, authors will be able to further edit metadata and publish.
In the "Notes for Dataverse Installation Administrators" section, we have provided a query to help proactively identify datasets that need to be updated.
See also Issue #8191 and PR #8308.
Muting Notifications
Users can control which notifications they receive if the system is configured to allow this. See also Issue #7492 and PR #8530.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Terms of Access or Request Access required for restricted files. (Issue #8191, PR #8308)
- Users can control which notifications they receive if the system is configured to allow this. (Issue #7492, PR #8530)
- A 500 error was occurring when creating a dataset if a template did not have an associated "termsofuseandaccess". See "Legacy Templates Issue" below for details. (Issue #8599, PR #8789)
- Tabular ingest can be skipped via API. (Issue #8525, PR #8532)
- The "Verify Email" button has been changed to "Send Verification Email" and rather than sometimes showing a popup now always sends a fresh verification email (and invalidates previous verification emails). (Issue #8227, PR #8579)
- For Shibboleth users, the emailconfirmed timestamp is now set on login and the UI should show "Verified". (Issue #5663, PR #8579)
- Information about the license selection (or custom terms) is now available in the confirmation popup when contributors click "Submit for Review". Previously, this was only available in the confirmation popup for the "Publish" button, which contributors do not see. (Issue #8561, PR #8691)
- For installations configured to support multiple languages, controlled vocabulary fields that do not allow multiple entries (e.g. journalArticleType) are now indexed properly. (Issue #8595, PR #8601, PR #8624)
- Two-letter ISO-639-1 codes for languages are now supported, in metadata imports and harvesting. (Issue #8139, PR #8689)
- The API endpoint for listing notifications has been enhanced to show the subject, text, and timestamp of notifications. (Issue #8487, PR #8530)
- The API Guide has been updated to explain that the Content-type header is now (as of Dataverse 5.6) necessary to create datasets via native API. (Issue #8663, PR #8676)
- Admin API endpoints have been added to find and delete dataset templates. (Issue 8600, PR #8706)
- The BagIt file handler detects and transforms zip files with a BagIt package format into Dataverse data files, validating checksums along the way. See the BagIt File Handler section of the Installation Guide for details. (Issue #8608, PR #8677)
- For BagIt Export, the number of threads used when zipping data files into an archival bag is now configurable using the :BagGeneratorThreads database setting. (Issue #8602, PR #8606)
- PostgreSQL 14 can now be used (though we've tested mostly with 13). PostgreSQL 10+ is required. (Issue #8295, PR #8296)
- As always, widgets can be embedded in the <iframe> HTML tag, but the HTTP header "Content-Security-Policy" is now being sent on non-widget pages to prevent them from being embedded. (PR #8662)
- URIs in the experimental Semantic API have changed (details below). (Issue #8533, PR #8592)
- Installations running Make Data Count can upgrade to Counter Processor-0.1.04. (Issue #8380, PR #8391)
- PrimeFaces, the UI framework we use, has been upgraded from 10 to 11. (Issue #8456, PR #8652)
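As a minimal sketch of the enhanced notifications listing mentioned above, the request below assumes a default installation; SERVER_URL and API_TOKEN are placeholder values to substitute with your own.

```shell
# Placeholder values; substitute your own server and API token
SERVER_URL=http://localhost:8080
API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# List the calling user's notifications; the response now includes
# the subject, text, and timestamp of each notification.
# '|| true' keeps the sketch runnable without a live server.
curl -s -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/notifications/all" || true
```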
Notes for Dataverse Installation Administrators
Identifying Datasets Requiring Terms of Access or Request Access Changes
In support of the change to require either Terms of Access or Request Access for all restricted files (see above for details), we have provided a query to identify datasets in your installation where at least one restricted file has neither Terms of Access nor Request Access enabled:
This will allow you to reach out to those dataset owners as appropriate.
Legacy Templates Issue
When custom license functionality was added, dataverses that had older legacy templates as their default template would not allow the creation of a new dataset (500 error).
This occurred because those legacy templates did not have an associated termsofuseandaccess linked to them.
In this release, we run a script that creates a default empty termsofuseandaccess for each of these templates and links them.
Note that the termsofuseandaccess records created this way default to the license with id=1 (CC0) and set fileaccessrequest to false.
See also Issue #8599 and PR #8789.
PostgreSQL Version 10+ Required
This release upgrades the bundled PostgreSQL JDBC driver to support major version 14.
Note that the newer PostgreSQL driver required a Flyway version bump, which entails positive and negative consequences:
- The newer version of Flyway supports PostgreSQL 14 and includes a number of security fixes.
- As of version 8.0 the Flyway Community Edition dropped support for PostgreSQL 9.6 and older.
This means that as foreshadowed in the 5.10 and 5.10.1 release notes, version 10 or higher of PostgreSQL is now required. For suggested upgrade steps, please see "PostgreSQL Update" in the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10
Counter Processor 0.1.04 Support
This release includes support for counter-processor-0.1.04 for processing Make Data Count metrics. If you are running Make Data Count support, you should reinstall/reconfigure counter-processor as described in the latest Guides. (For existing installations, note that counter-processor-0.1.04 requires a newer version of Python, so you will need to follow the full counter-processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse.)
New JVM Options and DB Settings
The following DB settings have been added:
:ShowMuteOptions
:AlwaysMuted
:NeverMuted
:CreateDataFilesMaxErrorsToDisplay
:BagItHandlerEnabled
:BagValidatorJobPoolSize
:BagValidatorMaxErrors
:BagValidatorJobWaitInterval
:BagGeneratorThreads
See the Database Settings section of the Guides for more information.
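Database settings like those above are managed through the Admin API, which is typically reachable only on localhost. A minimal sketch, with an illustrative value:

```shell
SERVER_URL=http://localhost:8080

# Use 4 threads when zipping data files into an archival bag
# '|| true' keeps the sketch runnable without a live server.
curl -X PUT -d 4 "$SERVER_URL/api/admin/settings/:BagGeneratorThreads" || true

# Read the setting back to confirm
curl -s "$SERVER_URL/api/admin/settings/:BagGeneratorThreads" || true
```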
Notes for Developers and Integrators
See the "Backward Incompatibilities" section below.
Backward Incompatibilities
Semantic API Changes
This release includes an update to the experimental semantic API and the underlying assignment of URIs to metadata block terms that are not explicitly mapped to terms in community vocabularies. The change affects the output of the OAI_ORE metadata export, the OAI_ORE file in archival bags, and the input/output allowed for those terms in the semantic API.
For those updating integration code or existing files intended for input into this release of Dataverse, URIs of the form

https://dataverse.org/schema/<block name>/<parentField name>#<childField title>

and

https://dataverse.org/schema/<block name>/<Field title>

are both replaced with URIs of the form

https://dataverse.org/schema/<block name>/<Field name>
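As an illustration, using the citation block's author affiliation field as an assumed example (check your own metadata blocks for the exact field names and titles):

```
Before (either form could appear):
https://dataverse.org/schema/citation/author#Affiliation
https://dataverse.org/schema/citation/Author Affiliation

After (a single form based on the field name):
https://dataverse.org/schema/citation/authorAffiliation
```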
Create Dataset API Requires Content-type Header (Since 5.6)
Due to a code change introduced in Dataverse 5.6, calls to the native API without the Content-type header will fail to create a dataset. The API Guide has been updated to indicate the necessity of this header: https://guides.dataverse.org/en/5.11/api/native-api.html#create-a-dataset-in-a-dataverse-collection
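A minimal sketch of a dataset-creation call with the required header. SERVER_URL, API_TOKEN, and the collection alias "root" are placeholders, and dataset.json stands for a dataset metadata file as described in the API Guide.

```shell
# Placeholder values; substitute your own
SERVER_URL=https://demo.dataverse.org
API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# The Content-type header is required as of Dataverse 5.6.
# '|| true' keeps the sketch runnable without a live server or input file.
curl -s -m 5 -H "X-Dataverse-key: $API_TOKEN" \
     -H "Content-type: application/json" \
     -X POST "$SERVER_URL/api/dataverses/root/datasets" \
     --upload-file dataset.json || true
```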
Complete List of Changes
...
v5.10.1
Dataverse Software 5.10.1
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Bug Fix for Request Access
Dataverse Software 5.10 contains a bug where the "Request Access" button doesn't work from the file listing on the dataset page if the dataset contains custom terms. This has been fixed in PR #8555.
Bug Fix for Searching and Selecting Controlled Vocabulary Values
Dataverse Software 5.10 contains a bug where the search option is no longer present when selecting from more than ten controlled vocabulary values. This has been fixed in PR #8521.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Users can use the "Request Access" button when the dataset has custom terms. (Issue #8553, PR #8555)
- Users can search when selecting from more than ten controlled vocabulary values. (Issue #8519, PR #8521)
- The default file categories ("Documentation", "Data", and "Code") can be redefined through the :FileCategories database setting. (Issue #8461, PR #8478)
- Documentation on troubleshooting Excel ingest errors was improved. (PR #8541)
- Internationalized controlled vocabulary values can now be searched. (Issue #8286, PR #8435)
- Curation labels can be internationalized. (Issue #8381, PR #8466)
- "NONE" is no longer accepted as a license using the SWORD API (since 5.10). See "Backward Incompatibilities" below for details. (Issue #8551, PR #8558).
Notes for Dataverse Installation Administrators
PostgreSQL Version 10+ Required Soon
Because 5.10.1 is a bug fix release, an upgrade to PostgreSQL is not required. However, this upgrade is still coming in the next non-bug fix release. For details, please see the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10
Payara Upgrade
You may notice that the Payara version used in the install scripts has been updated from 5.2021.5 to 5.2021.6. This was to address a bug where it was not possible to easily update the logging level. For existing installations, this release does not require upgrading Payara and a Payara upgrade is not part of the Upgrade Instructions below. For more information, see PR #8508.
New JVM Options and DB Settings
The following DB settings have been added:
:FileCategories
- The default list of the pre-defined file categories ("Documentation", "Data" and "Code") can now be redefined with a comma-separated list (e.g. 'Docs,Data,Code,Workflow').
See the Database Settings section of the Guides for more information.
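A minimal sketch of redefining the categories via the Admin API (illustrative values; the Admin API is typically localhost-only):

```shell
SERVER_URL=http://localhost:8080

# Replace the default file categories with a custom list.
# '|| true' keeps the sketch runnable without a live server.
curl -X PUT -d 'Docs,Data,Code,Workflow' \
  "$SERVER_URL/api/admin/settings/:FileCategories" || true
```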
Notes for Developers and Integrators
In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the SWORD API.
Backward Incompatibilities
As of Dataverse 5.10, "NONE" is no longer supported as a valid license when creating a dataset using the SWORD API. The API Guide has been updated to reflect this. Additionally, if you specify an invalid license, a list of available licenses will be returned in the response.
Complete List of Changes
For the complete list of code changes in this release, see the 5.10.1 Milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.1.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.10.1.war
5. Restart payara
service payara stop
service payara start
v5.10
Dataverse Software 5.10
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Multiple License Support
Users can now select from a set of configured licenses in addition to or instead of the previous Creative Commons CC0 choice or provide custom terms of use (if configured) for their datasets. Administrators can configure their Dataverse instance via API to allow any desired license as a choice and can enable or disable the option to allow custom terms. Administrators can also mark licenses as "inactive" to disallow future use while keeping that license for existing datasets. For upgrades, only the CC0 license will be preinstalled. New installations will have both CC0 and CC BY preinstalled. The Configuring Licenses section of the Installation Guide shows how to add or remove licenses.
Note: Datasets in existing installations will automatically be updated to conform to new requirements that custom terms cannot be used with a standard license and that custom terms cannot be empty. Administrators may wish to manually update datasets with these conditions if they do not like the automated migration choices. See the "Notes for Dataverse Installation Administrators" section below for details.
This release also makes the license selection and/or custom terms more prominent when publishing and viewing a dataset and when downloading files.
Ingest and File Upload Messaging Improvements
Messaging around ingest failures has been softened to reduce unnecessary support tickets. In addition, messaging during file upload has been improved, especially with regard to showing size limits and providing links to the guides about tabular ingest. For screenshots and additional details, see PR #8271.
Downloading of Guestbook Responses with Fewer Clicks
A download button has been added to the page that lists guestbooks. This saves a click but you can still download responses from the "View Responses" page, as before.
Also, links to the guides about guestbooks have been added in additional places.
Dynamically Request Arbitrary Metadata Fields from Search API
The Search API now allows arbitrary metadata fields to be requested when displaying results from datasets. You can request all fields from metadata blocks or pick and choose certain fields.
The new parameter is called metadata_fields and the Search API documentation contains details and examples: https://guides.dataverse.org/en/5.10/api/search.html
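A minimal sketch of the new parameter. SERVER_URL is a placeholder; field specifiers take the form <block>:<field>, or <block>:* for all fields in a block.

```shell
# Placeholder value; substitute your own server
SERVER_URL=https://demo.dataverse.org

# Return the citation block's author field and all geospatial fields
# alongside the standard search results.
# '|| true' keeps the sketch runnable without a live server.
curl -s -m 5 "$SERVER_URL/api/search?q=*&type=dataset&metadata_fields=citation:author&metadata_fields=geospatial:*" || true
```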
Solr 8 Upgrade
The Dataverse Software now runs on Solr 8.11.1, the latest available stable release in the Solr 8.x series.
PostgreSQL Upgrade
A PostgreSQL upgrade is not required for this release but is planned for the next release. See below for details.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- When creating or updating datasets, users can select from a set of licenses configured by the administrator (CC, CC BY, custom licenses, etc.) or provide custom terms (if the installation is configured to allow them). (Issue #7440, PR #7920)
- Users can get better feedback on tabular ingest errors and more information about size limits when uploading files. (Issue #8205, PR #8271)
- Users can more easily download guestbook responses and learn how guestbooks work. (Issue #8244, PR #8402)
- Search API users can specify additional metadata fields to be returned in the search results. (Issue #7863, PR #7942)
- The "Preview" tab on the file page can now show restricted files. (Issue #8258, PR #8265)
- Users wanting to upload files from GitHub to Dataverse can learn about a new GitHub Action called "Dataverse Uploader". (PR #8416)
- Users requesting access to files now get feedback that it was successful. (Issue #7469, PR #8341)
- Users may notice various accessibility improvements. (Issue #8321, PR #8322)
- Users of the Social Science metadata block can now add multiples of the "Collection Mode" field. (Issue #8452, PR #8473)
- Guestbooks now support multi-line text area fields. (Issue #8288, PR #8291)
- Guestbooks can better handle commas in responses. (Issue #8193, PR #8343)
- Dataset editors can now deselect a guestbook. (Issue #2257, PR #8403)
- Administrators with a large actionlogrecord table can read docs on archiving and then trimming it. (Issue #5916, PR #8292)
- Administrators can list locks across all datasets. (PR #8445)
- Administrators can run a version of Solr that doesn't include a version of log4j2 with serious known vulnerabilities. We trust that you have patched the version of Solr you are running now following the instructions that were sent out. An upgrade to the latest version is recommended for extra peace of mind. (PR #8415)
- Administrators can run a version of Dataverse that doesn't include a version of log4j with known vulnerabilities. (PR #8377)
Notes for Dataverse Installation Administrators
Updating for Multiple License Support
Adding and Removing Licenses and How Existing Datasets Will Be Automatically Updated
As part of installing or upgrading an existing installation, administrators may wish to add additional license choices and/or configure Dataverse to allow custom terms. Adding additional licenses is managed via API, as explained in the Configuring Licenses section of the Installation Guide. Licenses are described via a JSON structure providing a name, URL, short description, and optional icon URL. Additionally, licenses may be marked as active (selectable for new or updated datasets) or inactive (only allowed on existing datasets), and one license can be marked as the default. Custom Terms are allowed by default (backward compatible with the current option to select "No" to using CC0) and can be disabled by setting :AllowCustomTermsOfUse to false.
Further, administrators should review the following automated migration of existing licenses and terms into the new license framework and, if desired, should manually find and update any datasets for which the automated update is problematic.
To understand the migration process, it is useful to understand how the multiple license feature works in this release:
"Custom Terms", aka a custom license, are defined through entries in the following fields of the dataset "Terms" tab:
- Terms of Use
- Confidentiality Declaration
- Special Permissions
- Restrictions
- Citation Requirements
- Depositor Requirements
- Conditions
- Disclaimer
"Custom Terms" require, at a minimum, a non-blank entry in the "Terms of Use" field. Entries in other fields are optional.
Since these fields are intended for terms/conditions that would potentially conflict with or modify the terms in a standard license, they are no longer shown when a standard license is selected.
In earlier Dataverse releases, it was possible to select the CC0 license and have entries in the fields above. It was also possible to say "No" to using CC0 and leave all of these terms fields blank.
The automated process will update existing datasets as follows.
- "CC0 Waiver" and no entries in the fields above -> CC0 License (no change)
- No CC0 Waiver and an entry in the "Terms of Use" field and possibly other fields listed above -> "Custom Terms" with the same entries in these fields (no change)
- CC0 Waiver and an entry in some of the fields listed -> "Custom Terms" with the following text prepended in the "Terms of Use" field: "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions:"
- No CC0 Waiver and an entry in a field(s) other than the "Terms of Use" field -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available with limited information on how it can be used. You may wish to communicate with the Contact(s) specified before use."
- No CC0 Waiver and no entry in any of the listed fields -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available without information on how it can be used. You should communicate with the Contact(s) specified before use."
Administrators who have datasets where CC0 has been selected along with additional terms, or datasets where the Terms of Use field is empty, may wish to modify those datasets prior to upgrading to avoid the automated changes above. This is discussed next.
Handling Datasets that No Longer Comply With Licensing Rules
In most Dataverse installations, one would expect the vast majority of datasets to either use the CC0 Waiver or have non-empty Terms of Use. As noted above, these will be migrated without any issue. Administrators may however wish to find and manually update datasets that specified a CC0 license but also had terms (no longer allowed) or had no license and no terms of use (also no longer allowed) rather than accept the default migrations for these datasets listed above.
Finding and Modifying Datasets with a CC0 License and Non-Empty Terms
To find datasets with a CC0 license and non-empty terms:
select CONCAT('doi:', dvo.authority, '/', dvo.identifier),
       v.alias as dataverse_alias,
       case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate end as version,
       dv.id as datasetversion_id,
       t.id as termsofuseandaccess_id,
       t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions,
       t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer
from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v
where dv.dataset_id=dvo.id and...
v5.9
Dataverse Software 5.9
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Dataverse Collection Page Optimizations
The Dataverse Collection page, which also serves as the search page and the homepage in most Dataverse installations, has been optimized, with a specific focus on reducing the number of queries for each page load. These optimizations will be more noticeable on Dataverse installations with higher traffic.
Support for HTTP "Range" Header for Partial File Downloads
Dataverse now supports the HTTP "Range" header, which allows users to download parts of a file. Here are some examples:
- bytes=0-9 gets the first 10 bytes.
- bytes=10-19 gets 10 bytes from the middle.
- bytes=-10 gets the last 10 bytes.
- bytes=9- gets all bytes except the first 9.
Only a single range is supported. For more information, see the Data Access API section of the API Guide.
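A minimal sketch of a partial download via the Data Access API. SERVER_URL and FILE_ID are placeholder values.

```shell
# Placeholder values; substitute your own server and file database ID
SERVER_URL=https://demo.dataverse.org
FILE_ID=42

# Fetch only the first 10 bytes of the file.
# '|| true' keeps the sketch runnable without a live server.
curl -s -m 5 -H "Range: bytes=0-9" "$SERVER_URL/api/access/datafile/$FILE_ID" || true
```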
Support for Optional External Metadata Validation Scripts
The Dataverse software now allows an installation administrator to provide custom scripts for additional metadata validation when datasets are being published and/or when Dataverse collections are being published or modified. The Harvard Dataverse Repository has been using this mechanism to combat content that violates our Terms of Use, specifically spam content. All the validation or verification logic is defined in these external scripts, thus making it possible for an installation to add checks custom-tailored to their needs.
Please note that only the metadata are subject to these validation checks. This does not check the content of any uploaded files.
For more information, see the Database Settings section of the Guide. The new settings are listed below, in the "New JVM Options and DB Settings" section of these release notes.
Displaying Author's Identifier as Link
In the dataset page's metadata tab the author's identifier is now displayed as a clickable link, which points to the profile page in the external service (ORCID, VIAF etc.) in cases where the identifier scheme provides a resolvable landing page. If the identifier does not match the expected scheme, a link is not shown.
Auxiliary File API Enhancements
This release includes updates to the Auxiliary File API. These updates include:
- Auxiliary files can now also be associated with non-tabular files
- Auxiliary files can now be deleted
- Duplicate Auxiliary files can no longer be created
- A new API has been added to list Auxiliary files by their origin
- Some auxiliary files were being saved with the wrong content type (MIME type), but now the user can supply the content type on upload, overriding the type that would otherwise be assigned
- Improved error reporting
- A bugfix involving checksums for Auxiliary files
Please note that the Auxiliary files feature is experimental and is designed to support integration with tools from the OpenDP Project. If the API endpoints are not needed they can be blocked.
Major Use Cases and Infrastructure Enhancements
Newly-supported major use cases in this release include:
- The Dataverse collection page has been optimized, resulting in quicker load times on one of the most common pages in the application (Issue #7804, PR #8143)
- Users will now be able to specify a certain byte range in their downloads via API, allowing for downloads of file parts. (Issue #6397, PR #8087)
- A Dataverse installation administrator can now set up metadata validation for datasets and Dataverse collections, allowing for publish-time and create-time checks for all content. (Issue #8155, PR #8245)
- Users will be provided with clickable links to authors' ORCIDs and other IDs in the dataset metadata (Issue #7978, PR #7979)
- Users will now be able to associate Auxiliary files with non-tabular files (Issue #8235, PR #8237)
- Users will no longer be able to create duplicate Auxiliary files (Issue #8235, PR #8237)
- Users will be able to delete Auxiliary files (Issue #8235, PR #8237)
- Users can retrieve a list of Auxiliary files based on their origin (Issue #8235, PR #8237)
- Users will be able to supply the content type of Auxiliary files on upload (Issue #8241, PR #8282)
- The indexing process has been updated so that datasets with fewer files are indexed first, resulting in fewer failures and making it easier to identify problematically-large datasets. (Issue #8097, PR #8152)
- Users will no longer be able to create metadata records with problematic special characters, which would later require Dataverse installation administrator intervention and a database change (Issue #8018, PR #8242)
- The Dataverse software will now appropriately recognize files with the .geojson extension as GeoJSON files rather than "unknown" (Issue #8261, PR #8262)
- A Dataverse installation administrator can now retrieve more information about role deletion from the ActionLogRecord (Issue #2912, PR #8211)
- Users will be able to use a new role to allow a user to respond to file download requests without also giving them the power to manage the dataset (Issue #8109, PR #8174)
- Users will no longer be forced to update their passwords when moving from Dataverse 3.x to Dataverse 4.x (PR #7916)
- Improved accessibility of buttons on the Dataset and File pages (Issue #8247, PR #8257)
Notes for Dataverse Installation Administrators
Indexing Performance on Datasets with Large Numbers of Files
We discovered that whenever a full reindexing needs to be performed, datasets with large numbers of files take an exceptionally long time to index. For example, in the Harvard Dataverse Repository, it takes several hours for a dataset that has 25,000 files. In situations where the Solr index needs to be erased and rebuilt from scratch (such as a Solr version upgrade, or a corrupt index, etc.) this can significantly delay the repopulation of the search catalog.
We are still investigating the reasons behind this performance issue. For now, even though some improvements have been made, a dataset with thousands of files is still going to take a long time to index. In this release, we've made a simple change to the reindexing process, to index any such datasets at the very end of the batch, after all the datasets with fewer files have been reindexed. This does not improve the overall reindexing time, but will repopulate the bulk of the search index much faster for the users of the installation.
Custom Analytics Code Changes
You should update your custom analytics code to capture a bug fix related to tracking within the dataset files table. This release restores that tracking.
For more information, see the documentation and sample analytics code snippet provided in Installation Guide. This update can be used on any version 5.4+.
New ManageFilePermissions Permission
Dataverse can now support a use case in which an Admin or Curator would like to delegate the ability to grant access to restricted files to other users. This can be implemented by creating a custom role (e.g. DownloadApprover) that has the new ManageFilePermissions permission. This release introduces the new permission, and a Flyway script adjusts the existing Admin and Curator roles so they continue to have the ability to grant file download requests.
Thumbnail Defaults
New default values have been added for the JVM settings dataverse.dataAccess.thumbnail.image.limit and dataverse.dataAccess.thumbnail.pdf.limit, of 3MB and 1MB respectively. This means that, unless specified otherwise by the JVM settings already in your domain configuration, the application will skip attempting to generate thumbnails for image files and PDFs that are above these size limits.
In previous versions, if these limits were not explicitly set, the application would try to create thumbnails for files of unlimited size, which would occasionally cause problems with very large images.
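A minimal sketch of overriding one of these defaults with asadmin (the Payara path and the 5 MB value are illustrative):

```shell
# Illustrative Payara location; adjust as needed
PAYARA=/usr/local/payara5

# Raise the image thumbnail limit to 5 MB (value in bytes).
# '|| true' keeps the sketch runnable without a Payara installation.
"$PAYARA/bin/asadmin" create-jvm-options \
  "-Ddataverse.dataAccess.thumbnail.image.limit=5000000" || true
```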
New JVM Options and DB Settings
The following DB settings allow configuration of the external metadata validator:
- :DataverseMetadataValidatorScript
- :DataverseMetadataPublishValidationFailureMsg
- :DataverseMetadataUpdateValidationFailureMsg
- :DatasetMetadataValidatorScript
- :DatasetMetadataValidationFailureMsg
- :ExternalValidationAdminOverride
See the Database Settings section of the Guides for more information.
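A minimal sketch of enabling an external dataset validator via the Admin API; the script path and failure message below are illustrative, and the script itself is whatever check your installation needs.

```shell
SERVER_URL=http://localhost:8080

# Point Dataverse at a local validation script (hypothetical path).
# '|| true' keeps the sketch runnable without a live server.
curl -X PUT -d '/usr/local/dvn-scripts/validate_dataset.sh' \
  "$SERVER_URL/api/admin/settings/:DatasetMetadataValidatorScript" || true

# Message shown to users when validation fails (illustrative text)
curl -X PUT -d 'This dataset failed an external metadata validation check.' \
  "$SERVER_URL/api/admin/settings/:DatasetMetadataValidationFailureMsg" || true
```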
Notes for Developers and Integrators
Two sections of the Developer Guide have been updated:
- Instructions on how to sync a PR in progress with develop have been added in the version control section
- Guidance on avoiding inefficiencies in JSF render logic has been added to the "Tips" section
Complete List of Changes
For the complete list of code changes in this release, see the 5.9 Milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
...
v5.8
Dataverse Software 5.8
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Support for Data Embargoes
The Dataverse Software now supports file-level embargoes. The ability to set embargoes, up to a maximum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the Embargoes section of the Dataverse Software Guides.
- Users can configure a specific embargo, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the 'Embargo' menu item and entering information in a popup dialog. Embargoes can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.
- While embargoed, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). After the embargo expires, files become accessible. If the files were also restricted, they remain inaccessible and functionality is the same as for any restricted file.
- By default, the citation date reported for the dataset and the datafiles in version 1.0 reflects the longest embargo period on any file in version 1.0, which is consistent with recommended practice from DataCite. Administrators can still specify an alternate date field to be used in the citation date via the Set Citation Date Field Type for a Dataset API Call.
The work to add this functionality was initiated by Data Archiving and Networked Services (DANS-KNAW), the Netherlands. It was further developed by the Global Dataverse Community Consortium (GDCC) in cooperation with and with funding from DANS.
Major Use Cases and Infrastructure Enhancements
Newly-supported major use cases in this release include:
- Users can set file-level embargoes. (Issue #7743, #4052, #343, PR #8020)
- Improved accessibility of form labels on the advanced search page (Issue #8169, PR #8170)
Notes for Dataverse Installation Administrators
Mitigate Solr Schema Management Problems
With Release 5.5, the <copyField> definitions had been reincluded into schema.xml to fix searching for datasets.
This release includes a final update to schema.xml and a new script, update-fields.sh, to manage your custom metadata fields and to provide opportunities for other future improvements. The broken script updateSchemaMDB.sh has been removed.
You will need to replace your schema.xml with the one provided in order to make sure that the new script can function. If you do not use any custom metadata blocks in your installation, this is the only change to be made. If you do use custom metadata blocks you will need to take a few extra steps, enumerated in the step-by-step instructions below.
New JVM Options and DB Settings
- :MaxEmbargoDurationInMonths controls whether embargoes are allowed in a Dataverse instance and can limit the maximum duration users are allowed to specify. A value of 0 months or non-existent setting indicates embargoes are not supported. A value of -1 allows embargoes of any length.
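A minimal sketch of enabling embargoes via the Admin API (the 24-month limit is an illustrative value):

```shell
SERVER_URL=http://localhost:8080

# Allow embargoes of up to 24 months; 0 or unset disables embargoes,
# and -1 allows embargoes of any length.
# '|| true' keeps the sketch runnable without a live server.
curl -X PUT -d 24 "$SERVER_URL/api/admin/settings/:MaxEmbargoDurationInMonths" || true
```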
Complete List of Changes
For the complete list of code changes in this release, see the 5.8 Milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.8.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse-<version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.8.war
5. Restart Payara
service payara stop
service payara start
6. Update Solr schema.xml.
/usr/local/solr/solr-8.8.1/server/solr/collection1/conf is used in the examples below as the location of your Solr schema. Please adapt it to the correct location, if different in your installation. Use find / -name schema.xml if in doubt.
6a. Replace schema.xml with the base version included in this release.
wget https://github.com/IQSS/dataverse/releases/download/v5.8/schema.xml
cp schema.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf
Installations that are not using any Custom Metadata Blocks can skip the next step.
6b. For installations with Custom Metadata Blocks
Use the script provided in the release to add the custom fields to the base schema.xml installed in the previous step.
wget https://github.com/IQSS/dataverse/releases/download/v5.8/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.8.1/server/solr/collection1/conf/schema.xml
(Note that the curl command above calls the admin API on localhost to obtain the list of the custom fields. In the unlikely case that you are running the main Dataverse application and Solr on different servers, generate the schema.xml on the application node, then copy it onto the Solr server.)
7. Restart Solr
Usually service solr stop; service solr start, but this may be different on your system. See the Installation Guide for more details.
v5.7
Dataverse Software 5.7
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Experimental Support for External Vocabulary Services
Dataverse can now be configured to associate specific metadata fields with third-party vocabulary services to provide an easy way for users to select values from those vocabularies. The mapping involves use of external JavaScript scripts. Two such scripts have been developed so far: one for vocabularies served via the SKOSMOS protocol and one allowing people to be identified via their ORCID. The guides contain info about the new :CVocConf setting used for configuration and additional information about this functionality. Scripts, examples, and additional documentation are available at the GDCC GitHub Repository.
Please watch the online presentation, read the document with requirements, and join the Dataverse Working Group on Ontologies and Controlled Vocabularies if you have questions or want to contribute.
This functionality was initially developed by Data Archiving and Networked Services (DANS-KNAW), the Netherlands, and funded by SSHOC, "Social Sciences and Humanities Open Cloud". SSHOC has received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782. It was further improved by the Global Dataverse Community Consortium (GDCC) and extended with the support of semantic search.
Curation Status Labels
A new :AllowedCurationLabels setting allows sysadmins to define one or more sets of labels that can be applied to a draft dataset version via the user interface or API to indicate the status of the dataset with respect to a defined curation process.
Labels are completely customizable (alphanumeric or spaces, up to 32 characters, e.g. "Author contacted", "Privacy Review", "Awaiting paper publication"). Superusers can select a specific set of labels, or disable this functionality, per collection. Anyone who can publish a draft dataset (e.g. curators) can set/change/remove labels (from the set specified for the collection containing the dataset) via the user interface or via an API. The API also allows external tools to search for, read, and set labels on datasets, providing an integration mechanism. Labels are visible on the Dataset page and in Dataverse collection listings/search results. Internally, the labels have no effect, and at publication, any existing label will be removed. A reporting API call allows admins to get a list of datasets and their curation statuses.
The Solr schema must be updated as part of installing the release of Dataverse containing this feature for it to work.
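As a sketch (the set name and labels below are hypothetical examples, not defaults), a superuser could define an allowed label set via the database settings API:

```shell
# Define one named set of curation labels (JSON object: set name -> list of labels)
curl -X PUT http://localhost:8080/api/admin/settings/:AllowedCurationLabels \
  -d '{"Standard Process": ["Author contacted", "Privacy Review", "Awaiting paper publication"]}'
```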
Major Use Cases
Newly-supported major use cases in this release include:
- Administrators will be able to set up integrations with external vocabulary services, allowing for autocomplete-assisted metadata entry, metadata standardization, and better integration with other systems (Issue #7711, PR #7946)
- Users viewing datasets in the root Dataverse collection will now see breadcrumbs that have a link back to the root Dataverse collection (Issue #7527, PR #8078)
- Users will be able to more easily differentiate between datasets and files through new iconography (Issue #7991, PR #8021)
- Users retrieving large guestbooks over the API will experience fewer failures (Issue #8073, PR #8084)
- Dataverse collection administrators can specify which language will be used when entering metadata for new Datasets in a collection, based on a list of languages specified by the Dataverse installation administrator (Issue #7388, PR #7958)
- Users will see the language used for metadata entry indicated at the document or element level in metadata exports (Issue #7388, PR #7958)
- Administrators will now be able to specify the language(s) of controlled vocabulary entries, in addition to the installation's default language (Issue #6751, PR #7959)
- Administrators and curators can now receive notifications when a dataset is created (Issue #8069, PR #8070)
- Administrators with large files in their installation can disable the automatic checksum verification process at publish time (Issue #8043, PR #8074)
Notes for Dataverse Installation Administrators
Dataset Creation Notifications for Administrators
A new :SendNotificationOnDatasetCreation setting has been added. When true, administrators and curators (those who can publish the dataset) will get a notification when a new dataset is created. This makes it easier to track activity in a Dataverse and, for example, allow admins to follow up when users do not publish a new dataset within some period of time.
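To enable this behavior, the setting can be turned on via the admin API (a sketch, assuming the default localhost endpoint):

```shell
# Notify curators/admins whenever a new dataset is created
curl -X PUT -d true http://localhost:8080/api/admin/settings/:SendNotificationOnDatasetCreation
```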
Skip Checksum Validation at Publish Based on Size
When a user requests to publish a dataset, the time taken to complete the publishing process varies based on the dataset/datafile size.
With the additional settings of :DatasetChecksumValidationSizeLimit and :DataFileChecksumValidationSizeLimit, the checksum validation can be skipped while publishing.
If the Dataverse administrator chooses to set these values, it's strongly recommended to have an external auditing system run periodically in order to monitor the integrity of the files in the Dataverse installation.
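Both limits can be set via the admin API. A sketch with illustrative thresholds (the byte values below are examples only, not recommendations):

```shell
# Skip dataset-level checksum validation at publish for datasets larger than ~5 GB
curl -X PUT -d 5000000000 http://localhost:8080/api/admin/settings/:DatasetChecksumValidationSizeLimit

# Skip per-file checksum validation at publish for files larger than ~2 GB
curl -X PUT -d 2000000000 http://localhost:8080/api/admin/settings/:DataFileChecksumValidationSizeLimit
```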
Guestbook Response API Update
With this release, the Retrieve Guestbook Responses for a Dataverse Collection API will no longer produce a file by default. You may specify an output file by adding -o $YOURFILENAME to the curl command.
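For example (the collection alias and API token below are placeholders), responses can be saved to a CSV file like so:

```shell
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
# "mycollection" is a hypothetical collection alias; substitute your own
curl -H "X-Dataverse-key:$API_TOKEN" \
  "http://localhost:8080/api/dataverses/mycollection/guestbookResponses" -o responses.csv
```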
Dynamic JavaServer Faces Configuration Options
This release includes a new way to easily change JSF settings via MicroProfile Config, especially useful during development.
See the development guide on "Debugging" for more information.
Enhancements to DDI Metadata Exports
Several changes have been made to the DDI exports to improve support for internationalization and to improve compliance with CESSDA requirements. These changes include:
- Addition of xml:lang attributes specifying the dataset metadata language at the document level and for individual elements such as title and description
- Specification of controlled vocabulary terms in duplicate elements in multiple languages (in the installation default language and, if different, the dataset metadata language)
While these changes are intended to improve harvesting and integration with external systems, they could break existing connections that make assumptions about the elements and attributes that have been changed.
New JVM Options and DB Settings
- :SendNotificationOnDatasetCreation - A boolean setting that, if true, will send an email and notification to additional users when a Dataset is created. Messages go to those, other than the dataset creator, who have the ability/permission necessary to publish the dataset.
- :DatasetChecksumValidationSizeLimit - Disables the checksum validation while publishing for any dataset size greater than the limit.
- :DataFileChecksumValidationSizeLimit - Disables the checksum validation while publishing for any datafiles greater than the limit.
- :CVocConf - A JSON-structured setting that configures Dataverse to associate specific metadatablock fields with external vocabulary services and specific vocabularies/sub-vocabularies managed by that service.
- :MetadataLanguages - Sets which languages can be used when entering dataset metadata.
- :AllowedCurationLabels - A JSON Object containing lists of allowed labels (up to 32 characters, spaces allowed) that can be set, via API or UI by users with the permission to publish a dataset. The set of labels allowed for datasets can be selected by a superuser - via the Dataverse collection page (Edit/General Info) or set via API call.
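For instance, :MetadataLanguages takes a JSON array of locale/title pairs. A sketch (the two languages shown are only an example; list whichever languages your installation supports):

```shell
# Offer English and French as dataset metadata languages
curl -X PUT http://localhost:8080/api/admin/settings/:MetadataLanguages \
  -d '[{"locale":"en","title":"English"},{"locale":"fr","title":"Français"}]'
```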
Notes for Tool Developers and Integrators
Bags Now Support File Paths
The original Bag generation code stored all dataset files directly under the /data directory. With the addition in Dataverse of a directory path for files and then a change to allow files with different paths to have the same name, archival Bags will now use the directory path from Dataverse to avoid name collisions within the /data directory. Prior to this update, Bags from Datasets with multiple files with the same name would have been created with only one of the files with that name (with warnings in the log, but still generating the Bag).
Complete List of Changes
For the complete list of code changes in this release, see the 5.7 Milestone on GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.7.
If yo...