Merging changes from main (import API; larger file sizes; misc fixes) #742

Merged 4 commits on Mar 1, 2025

1 change: 1 addition & 0 deletions .github/workflows/tests.yml
@@ -32,6 +32,7 @@ jobs:

- name: Install prerequisites
run: |
sudo apt-get update
sudo apt-get install -y pkg-config libxml2-dev libxmlsec1-dev libxmlsec1-openssl

- name: set up docker
2 changes: 1 addition & 1 deletion docker/nginx_containerized/conf.d/default.conf
@@ -122,7 +122,7 @@ server {
uwsgi_hide_header X-Session-ID;
uwsgi_hide_header X-User-ID;
# Max upload size for files is set to 50GB (configure as needed).
- client_max_body_size 50G;
+ client_max_body_size 500G;
}
# Static content is served directly by nginx and not the application server.
location /static {
2 changes: 1 addition & 1 deletion docker/nginx_production/conf.d/default.conf
@@ -100,7 +100,7 @@ server {
uwsgi_request_buffering off;

# Max upload size for files is set to 50GB (configure as needed).
- client_max_body_size 50G;
+ client_max_body_size 500G;
}
# Static content is served directly by nginx and not the application server.
location /static {
2 changes: 1 addition & 1 deletion docs/source/README.md
@@ -2,7 +2,7 @@

Knowledge Commons Works is a collaborative tool for storing and sharing academic research. It is part of Knowledge Commons and is built on an instance of the InvenioRDM repository system.

- Version 0.3.5-beta8
+ Version 0.3.7-beta10

## Copyright

21 changes: 18 additions & 3 deletions docs/source/api.md
@@ -149,12 +149,10 @@ This request must be made with a multipart/form-data request. The request body m
| `review_required` | no | `text/plain` | A string representation of a boolean (either "true" or "false") indicating whether the work should be reviewed before publication. This setting is only relevant if the work is intended for publication in a collection that requires review. It will override the collection's usual review policy, since the work is being uploaded by a collection administrator. (Default: "true") |
| `strict_validation` | no | `text/plain` | A string representation of a boolean (either "true" or "false") indicating whether the import request should be rejected if any validation errors are encountered. If this value is "false", the imported work will be created in KCWorks even if some of the provided metadata does not conform to the KCWorks metadata schema, provided these are not required fields. If this value is "true", the import request will be rejected if any validation errors are encountered. (Default: "true") |
| `all_or_none` | no | `text/plain` | A string representation of a boolean (either "true" or "false") indicating whether the entire import request should be rejected if any of the works fail to be created (whether for validation errors, upload errors, or other reasons). If this value is "false", the import request will be accepted even if some of the works cannot be created. The response in this case will include a list of works that were successfully created and a list of errors for the works that failed to be created. (Default: "true") |
+ | `notify_record_owners` | no | `text/plain` | A string representation of a boolean (either "true" or "false") indicating whether the owners of the work should be notified by email of the work's creation. (Default: "true") |
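
As an illustration of the flag parameters in the table above, here is a minimal sketch that builds the optional text fields of the multipart form. The helper name `build_import_flags` is hypothetical and is not part of KCWorks; the field names and defaults are taken from the table. The point it shows is that each flag travels as the string "true" or "false", not as a JSON boolean.

```python
def build_import_flags(
    review_required: bool = True,
    strict_validation: bool = True,
    all_or_none: bool = True,
    notify_record_owners: bool = True,
) -> dict[str, str]:
    """Return the optional flag fields as "true"/"false" strings.

    Hypothetical helper for illustration only. The field names and
    defaults mirror the documented parameter table.
    """
    flags = {
        "review_required": review_required,
        "strict_validation": strict_validation,
        "all_or_none": all_or_none,
        "notify_record_owners": notify_record_owners,
    }
    # The API expects string representations of booleans.
    return {name: "true" if value else "false" for name, value in flags.items()}


# Suppress the owner notification email while keeping the other defaults.
fields = build_import_flags(notify_record_owners=False)
print(fields["notify_record_owners"])  # prints "false"
```

These fields would be sent as `text/plain` parts alongside the other parts of the same multipart/form-data request.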

- #### Identifying the owners of the work
-
- The array of owners, if provided in a metadata object's `parent.access.owned_by` property, must include at least the full name and email address of the users to be added as owners of the work. If the user already has a Knowledge Commons account, their username should also be provided. Additional identifiers (e.g., ORCID) may be provided as well to help avoid duplicate accounts, since a KCWorks account will be created for each user if they do not already have one.
#### Identifying the owners of the work

The array of owners, if provided in a metadata object's `parent.access.owned_by` property, must include at least the full name and email address of the users to be added as owners of the work. If the user already has a Knowledge Commons account, their username should also be provided. Additional identifiers (e.g., ORCID) may be provided as well to help avoid duplicate accounts, since a KCWorks account will be created for each user if they do not already have one.

| key | required | type | description |
@@ -201,6 +199,23 @@ KCWorks will create an internal KCWorks account for each work owner who does not

If an owner does not already belong to the collection to which the records are being imported, that owner will also be added to the collection's membership with the "reader" role. This allows them access to any records restricted to the collection's membership, but does not afford them any additional permissions. What it does mean is that collection managers will be able to see all of the work owners in the list of collection members on the collection's landing page.

#### Email notifications for work owners

When a work is imported into a collection, the work owners will receive an email notification unless the `notify_record_owners` parameter is set to "false". This email will include a link to the work's landing page on KCWorks. The email subject line and the email template used for this notification are configurable on a collection-by-collection basis. Authorized organizations should discuss the desired content with the KCWorks team.

For KCWorks developers: The configuration for this email is found in the config variable `RECORD_IMPORTER_COMMUNITIES` in the KCWorks instance's `invenio.cfg` file. This is a dictionary whose keys are the collection slugs and whose values are dictionaries with the following keys:

- `email_subject_import`: The subject line for the email notification.
- `email_template_import`: The name of the Jinja2 template file to use for the email notification. These templates must be located in the `templates/security/email` directory. One template file with an `.html` extension and one with a `.txt` extension are required, with identical names apart from the extension. The name provided in the `email_template_import` key should be the filename without the `.html` or `.txt` extension.

The template will receive the following variables:

- `record`: A dictionary containing the metadata for the imported work.
- `community_page_url`: The URL of the collection's landing page on KCWorks.
- `kc_registration_link`: The URL of the Knowledge Commons registration page.
- `user`: The KCWorks User object for the user being notified.
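
To make the configuration shape described above concrete, here is a minimal sketch of looking up the subject line and the `.html`/`.txt` template pair for one collection. It is not the KCWorks lookup code: the function `import_email_settings`, the `example-collection` slug, and the sample subject and template name are all hypothetical. The key names follow the documentation text above; note that the `invenio.cfg` entry in this same change uses `email_subject_register` and `email_template_register`, so the keys may need adjusting to match the running code.

```python
from pathlib import PurePosixPath

# Sample configuration in the documented shape. The slug, subject, and
# template name below are hypothetical placeholders.
RECORD_IMPORTER_COMMUNITIES = {
    "example-collection": {
        "email_subject_import": "Your work has been imported",
        "email_template_import": "import_notice",
    }
}

# Directory in which the documentation says the templates must live.
TEMPLATE_DIR = PurePosixPath("templates/security/email")


def import_email_settings(config: dict, slug: str) -> dict:
    """Return the subject and the .html/.txt template paths for a collection."""
    entry = config[slug]
    base = entry["email_template_import"]  # filename without its extension
    return {
        "subject": entry["email_subject_import"],
        "html_template": str(TEMPLATE_DIR / f"{base}.html"),
        "text_template": str(TEMPLATE_DIR / f"{base}.txt"),
    }


settings = import_email_settings(RECORD_IMPORTER_COMMUNITIES, "example-collection")
print(settings["html_template"])
```

Both template files are expected to exist, since the documentation requires one `.html` and one `.txt` file with the same base name.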


#### Identifying the work for import

It is crucial that each work to be imported is assigned a unique identifier. This may be an identifier used internally by the importing organization, it may be a universally unique string such as a UUID, or it may be a universal identifier such as a DOI or a handle. In every case it must be unique across all works to be imported for the collection. This identifier will be used to identify the work in the response, and will be used to identify the work when checking for duplicate imports.
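
Because the identifier must be unique across the whole batch, an importing organization may want to check for collisions before sending a request. The following is a minimal client-side sketch under stated assumptions: the function `find_duplicate_ids` and the `import_id` key are hypothetical, since this excerpt does not name the field that carries the identifier.

```python
from collections import Counter


def find_duplicate_ids(works: list[dict], id_key: str = "import_id") -> list[str]:
    """Return, in sorted order, every identifier used by more than one work.

    `id_key` is a hypothetical field name; substitute whichever field
    actually carries the identifier in the import metadata.
    """
    counts = Counter(work[id_key] for work in works)
    return sorted(identifier for identifier, n in counts.items() if n > 1)


works = [
    {"import_id": "10.1234/abc"},
    {"import_id": "org-internal-0042"},
    {"import_id": "10.1234/abc"},  # collides with the first work
]
print(find_duplicate_ids(works))
```

An empty list means the batch satisfies the uniqueness requirement on the client side; it does not check against works already imported into the collection.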
8 changes: 7 additions & 1 deletion docs/source/changelog.md
@@ -4,7 +4,13 @@

# Changes

- ## 0.4.0-beta9 (2025-02-25)
+ ## 0.3.7-beta10 (2025-03-01)
+
+ - Importer
+ - Added email notifications for failed imports along with a new API flag to disable them.
+
+
+ ## 0.3.6-beta9 (2025-02-25)

- Importer
- Added a new streamlined importer API.
9 changes: 8 additions & 1 deletion invenio.cfg
@@ -261,6 +261,8 @@ COMMONS_API_REQUEST_PROTOCOL = os.getenv(
KC_WORDPRESS_DOMAIN = os.getenv("INVENIO_KC_WORDPRESS_DOMAIN", "hcommons-dev.org")
KC_HELP_URL = f"https://support.{KC_WORDPRESS_DOMAIN}"
KC_WORKS_HELP_URL = f"{KC_HELP_URL}/kcworks/"
KC_HELP_EMAIL = "hello@hcommons.org"
KC_CONTACT_FORM_URL = f"{KC_HELP_URL}/contact-us/"
KC_FAQ_URL = f"{KC_HELP_URL}/faqs/#kcworks-faq"
KC_PROFILES_URL_BASE = f"https://{KC_WORDPRESS_DOMAIN}/members/"
KC_REGISTER_URL = f"https://{KC_WORDPRESS_DOMAIN}/membership/"
@@ -1642,7 +1644,12 @@ RECORD_IMPORTER_DATA_DIR = Path(
RECORD_IMPORTER_LOGS_LOCATION = Path(
os.getenv("INVENIO_LOGGING_FS_LOGFILE", "/opt/invenio/src/logs/invenio.log")
).parent

RECORD_IMPORTER_COMMUNITIES = {
"neh": {
"email_subject_register": "Your NEH Open Access Deposit is Ready",
"email_template_register": "welcome_neh",
}
}

# Invenio-Previewer
# =================
2 changes: 1 addition & 1 deletion site/kcworks/__init__.py
@@ -18,4 +18,4 @@

"""KCWorks customizations to InvenioRDM."""

- __version__ = "0.3.5-beta8"
+ __version__ = "0.3.7-beta10"
@@ -36,7 +36,7 @@
</tr>
<tr>
<td><b>{{ _("Draft ID") }}</b></td>
<td>{{ draft_id }} (<a href='{{ config.get("SITE_UI_URL") }}/records/{{ draft_id }}'>{{ _("View draft") }}</a>)
<td>{{ draft_id }} (<a href="{{ config.get('SITE_UI_URL') }}/records/{{ draft_id }}">{{ _("View draft") }}</a>)
</td>
</tr>
{# <tr>
2 changes: 1 addition & 1 deletion site/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "kcworks"
- version = "0.3.5-beta8"
+ version = "0.3.7-beta10"

[project.optional-dependencies]
tests = ["pytest-invenio>=2.1.0,<3.0.0"]
4 changes: 2 additions & 2 deletions site/run_tests.sh
@@ -54,9 +54,9 @@ eval "$(PIPENV_DOTENV_LOCATION=/Users/ianscott/Development/knowledge-commons-wor
# Note: expansion of pytest_args looks like below to not cause an unbound
# variable error when 1) "nounset" and 2) the array is empty.
if [ ${#pytest_args[@]} -eq 0 ]; then
- PIPENV_DOTENV_LOCATION=/Users/ianscott/Development/knowledge-commons-works/site/tests/.env pipenv run python -m pytest -vv
+ PIPENV_DOTENV_LOCATION=/Users/ianscott/Development/knowledge-commons-works/site/tests/.env pipenv run python -m pytest -vv --disable-warnings
else
- PIPENV_DOTENV_LOCATION=/Users/ianscott/Development/knowledge-commons-works/site/tests/.env pipenv run python -m pytest ${pytest_args[@]}
+ PIPENV_DOTENV_LOCATION=/Users/ianscott/Development/knowledge-commons-works/site/tests/.env pipenv run python -m pytest ${pytest_args[@]} --disable-warnings
fi
# python -m sphinx.cmd.build -qnN -b doctest docs docs/_build/doctest
tests_exit_code=$?
48 changes: 47 additions & 1 deletion site/tests/api/test_accounts.py
@@ -1,10 +1,10 @@
import copy
from pprint import pformat
import pytest
import datetime
from flask import Flask
from invenio_accounts import current_accounts
from invenio_accounts.models import User
from invenio_record_importer_kcworks.services.users import UsersHelper
import json
from kcworks.services.accounts.saml import (
knowledgeCommons_account_info,
@@ -16,6 +16,7 @@
from requests_mock.adapter import _Matcher as Matcher
from types import SimpleNamespace
from typing import Callable, Optional

from ..fixtures.saml import idp_responses
from ..fixtures.users import user_data_set, AugmentedUserFixture

@@ -365,6 +366,16 @@ def test_account_register_on_login(
assert mailbox[0].subject == "Welcome to KCWorks!"
assert mailbox[0].recipients == [user_data["email"]]
assert mailbox[0].sender == app.config["MAIL_DEFAULT_SENDER"]
assert "Welcome to Knowledge Commons Works!" in mailbox[0].html
assert (
f"Welcome {user_data['email']} to Knowledge Commons Works," in mailbox[0].body
)
assert user_data["email"] in mailbox[0].html
assert user_data["email"] in mailbox[0].body
assert app.config["KC_HELP_URL"] in mailbox[0].html
assert app.config["KC_HELP_URL"] in mailbox[0].body
assert app.config["KC_CONTACT_FORM_URL"] in mailbox[0].html
assert app.config["KC_CONTACT_FORM_URL"] in mailbox[0].body

user: User = current_accounts.datastore.get_user_by_email(user_data["email"])
assert user.email == user_data["email"]
@@ -394,3 +405,38 @@ def test_account_register_on_login(
assert not any([r for r in user.roles if r.name not in expected_roles])

assert next_url == "https://localhost/next-url.com"


def test_create_user_via_importer(
running_app,
appctx,
db,
mailbox,
celery_worker,
search_clear: Callable,
) -> None:
"""
Test the creation of a user programmatically via the importer.

Among other things, test that the correct welcome email is sent to the user.
"""
app: Flask = running_app.app
user = UsersHelper().create_invenio_user(
user_email="test@example.com",
full_name="Test User",
community_owner=[],
orcid="0000-0002-1825-0097",
other_user_ids=[
{"identifier": "test", "scheme": "neh_user_id"},
{"identifier": "test2", "scheme": "import_user_id"},
],
)
assert user["user"] is not None
assert user["user"].email == "test@example.com"
assert user["user"].username is None
assert user["user"].user_profile.get("full_name") == "Test User"
assert user["user"].user_profile.get("identifier_kc_username") is None
assert user["user"].user_profile.get("identifier_orcid") == "0000-0002-1825-0097"
assert user["user"].user_profile.get("name_parts") is None
assert user["user"].user_profile.get("affiliations") is None
assert len(mailbox) == 0