Merging changes for v0.5.2 #820

Merged
merged 4 commits into from
May 29, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@

Knowledge Commons Works is a collaborative tool for storing and sharing academic research. It is part of Knowledge Commons and is built on an instance of the InvenioRDM repository system.

-Version 0.5.1
+Version 0.5.2

## Copyright

2 changes: 1 addition & 1 deletion docs/source/README.md
@@ -6,7 +6,7 @@

![KCWorks logo](../../static/images/kc_works_logos/SVG/kc_works_logo_wordmark.svg)

-Version 0.5.1
+Version 0.5.2

Knowledge Commons Works is a collaborative platform for storing and sharing academic research. It is part of [Knowledge Commons](https://about.hcommons.org/) and is built on an instance of the [InvenioRDM](https://inveniordm.docs.cern.ch/) repository system.

4 changes: 4 additions & 0 deletions docs/source/changelog.md
@@ -3,6 +3,10 @@

# Changes

## 0.5.2 (2025-05-29)

- Added a new CLI command (`invenio kcworks-records bulk-update`) that lets admins set a single metadata field to one fixed value for every record in a collection.

## 0.5.1 (2025-05-13)

- Removed BETA status from the site.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -9,7 +9,7 @@
project = "Knowledge Commons Works"
copyright = "2025, Mesh Research"
author = "Mesh Research"
-release = "0.5.1"
+release = "0.5.2"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
30 changes: 29 additions & 1 deletion docs/source/reference/cli_commands.md
@@ -55,9 +55,37 @@ KCWorks includes a number of custom CLI commands that are not part of the core I
- destroys search indices for the KCWorks instance that are *not* destroyed by the main KCWorks index destroy command. These are primarily the indices for storing usage events and aggregated usage data.
- **WARNING:** This data *only* exists in the OpenSearch indices. It is not backed up by the database and will be lost if the indices are destroyed. Use this command with extreme caution.

### `invenio kcworks-records`

- **provided by the main KCWorks package** (kcworks/site/cli.py and kcworks/services/records/cli.py)


#### `invenio kcworks-records bulk-update`

Updates a single metadata field to a single new fixed value for **every** record in a collection (community).

Arguments:
- `community_id`: the UUID of the collection whose records will be updated.
- `metadata_field`: the dot-separated path of the field to update (e.g., `metadata.title`).
- `new_value`: the new value to set for the field.

Example:
```shell
invenio kcworks-records bulk-update 123e4567-e89b-12d3-a456-426614174000 metadata.title "New Title"
```

```{note}
The `new_value` argument may be either a Python literal or a plain string. Anything that cannot be parsed as a Python literal is treated as a plain string.
```

```{note}
The `community_id` argument is the UUID of the collection, not the collection name or its URL slug. If you're not sure what the collection ID is, you can find it in the API response for the collection.
```
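
The parsing rule described in the first note can be sketched in isolation. This is a minimal illustration that assumes the command applies `ast.literal_eval` with a plain-string fallback; the helper name `parse_new_value` is hypothetical and is not part of the KCWorks package.

```python
import ast
from typing import Any


def parse_new_value(raw: str) -> Any:
    """Parse ``raw`` as a Python literal, falling back to the plain string.

    Hypothetical helper for illustration only.
    """
    try:
        return ast.literal_eval(raw)
    except (SyntaxError, ValueError):
        # Not a valid literal (e.g. unquoted words), so keep the raw text
        return raw


# Each string below is what the command would receive after shell quoting
examples = ["New Title", '"New Title"', "123", '"123"', '["a", "b"]', "True"]
for raw in examples:
    value = parse_new_value(raw)
    print(f"{raw!r} -> {value!r} ({type(value).__name__})")
```

One consequence worth keeping in mind: a value that looks like a number or boolean is parsed as one. To store such a value as a string, the quotes must reach the command, for example by passing `'"123"'` on the shell command line.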


### `invenio kcworks-users`

-- **provided by the main KCWorks package** (kcworks/site/cli.py)
+- **provided by the main KCWorks package** (kcworks/site/cli.py and kcworks/services/users/cli.py)

#### `invenio kcworks-users name-parts`
Either reads or updates the dictionary of name parts that KCWorks will use to construct the full name of a user (e.g., first name, last name, middle name, etc.) for display in the UI and in creating record metadata.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "kcworks"
-version = "0.5.1"
+version = "0.5.2"
requires-python = ">=3.12"
dependencies = [
"aiohttp>=3.11.15",
2 changes: 1 addition & 1 deletion site/kcworks/__init__.py
@@ -16,4 +16,4 @@

"""KCWorks customizations to InvenioRDM."""

-__version__ = "0.5.1"
+__version__ = "0.5.2"
5 changes: 5 additions & 0 deletions site/kcworks/cli.py
@@ -20,6 +20,7 @@
import click
from flask.cli import with_appcontext
from invenio_search.cli import abort_if_false, search_version_check
from kcworks.services.records.cli import kcworks_records as records_command
from kcworks.services.search.indices import delete_index
from kcworks.services.users.cli import group_users as group_users_command
from kcworks.services.users.cli import groups as groups_command
@@ -106,3 +107,7 @@ def destroy_indices(force):
    ) as bar:
        for name, _response in bar:
            bar.label = name


# Register the records command group
kcworks_users.add_command(records_command)
98 changes: 98 additions & 0 deletions site/kcworks/services/records/bulk_operations.py
@@ -0,0 +1,98 @@
"""Bulk operations for records."""

from pprint import pformat
from typing import Any, TypedDict

from flask import current_app
from invenio_access.permissions import system_identity
from invenio_communities.proxies import current_communities
from invenio_pidstore.errors import PIDDoesNotExistError
from invenio_rdm_records.proxies import current_rdm_records_service
from invenio_record_importer_kcworks.utils.utils import replace_value_in_nested_dict
from invenio_search.proxies import current_search_client
from kcworks.utils.utils import get_value_by_path
from opensearchpy.helpers.search import Search


class UpdateResult(TypedDict):
    """Result report after updating a record during a bulk operation."""

    total_record_count: int
    updated_record_count: int
    failed_record_count: int
    updated_records: list[dict[str, Any]]
    errors: list[str]


def update_community_records_metadata(
    community_id: str, metadata_field: str, new_value: Any
) -> UpdateResult:
    """Update a specific metadata field for all records in a community.

    Args:
        community_id (str): The ID of the community whose records should be updated
        metadata_field (str): The metadata field to update (e.g. 'metadata.title')
        new_value (any): The new value to set for the field

    Returns:
        UpdateResult: A summary of the operation including:
            - total_record_count (int): Total number of records found
            - updated_record_count (int): Number of records successfully updated
            - failed_record_count (int): Number of records that failed to update
            - updated_records (list[dict]): List of dictionaries representing
              the records successfully updated, each with the keys:
                - id (str)
                - metadata_field (str)
                - old_value (any)
                - new_value (any)
            - errors (list[str]): List of error messages for failed updates
    """
    results: UpdateResult = {
        "total_record_count": 0,
        "updated_record_count": 0,
        "failed_record_count": 0,
        "updated_records": [],
        "errors": [],
    }

    # Fail early if the community does not exist
    try:
        current_communities.service.read(system_identity, community_id)
    except PIDDoesNotExistError as e:
        raise ValueError(f"Community {community_id} not found") from e

    prefix = current_app.config.get("SEARCH_INDEX_PREFIX", "")
    search = Search(using=current_search_client, index=f"{prefix}rdmrecords-records")
    search = search.filter("term", parent__communities__ids=community_id)

    # Use scan (scroll) to allow for more than 10k records
    for hit in search.scan():
        # Routine progress output, so log at debug level
        current_app.logger.debug(f"Processing record {pformat(hit)}")
        results["total_record_count"] += 1

        try:
            # Update the record via a draft
            draft = current_rdm_records_service.edit(system_identity, hit["id"])
            draft_data = draft.to_dict()
            old_value = get_value_by_path(draft_data, metadata_field)
            draft_data = replace_value_in_nested_dict(
                draft_data, metadata_field.replace(".", "|"), new_value
            )
            current_rdm_records_service.update_draft(
                system_identity, draft.id, draft_data
            )
            current_rdm_records_service.publish(system_identity, draft.id)
            results["updated_record_count"] += 1
            results["updated_records"].append(
                {
                    "id": hit["id"],
                    "metadata_field": metadata_field,
                    "old_value": old_value,
                    "new_value": new_value,
                }
            )

        except Exception as e:
            # Record the failure and continue with the remaining records
            results["failed_record_count"] += 1
            results["errors"].append(f"Failed to update record {hit['id']}: {str(e)}")

    return results
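
The control flow of `update_community_records_metadata` (read the old value by dotted path, write the new one, and isolate failures per record) can be tried without an InvenioRDM instance. The following is a rough sketch under stated assumptions: plain in-memory dicts stand in for the records service, and `get_by_path`, `set_by_path`, and `bulk_update_in_memory` are hypothetical names. The real `get_value_by_path` and `replace_value_in_nested_dict` helpers are not shown in this diff and may behave differently, for example on missing keys.

```python
from typing import Any, TypedDict


class UpdateResult(TypedDict):
    """Same shape as the result report in bulk_operations.py."""

    total_record_count: int
    updated_record_count: int
    failed_record_count: int
    updated_records: list[dict[str, Any]]
    errors: list[str]


def get_by_path(data: dict[str, Any], path: str) -> Any:
    """Hypothetical stand-in: read the value at a dot-separated path."""
    current: Any = data
    for key in path.split("."):
        current = current[key]  # raises KeyError if a segment is missing
    return current


def set_by_path(data: dict[str, Any], path: str, value: Any) -> None:
    """Hypothetical stand-in: overwrite the value at a dot-separated path."""
    *parents, last = path.split(".")
    target = get_by_path(data, ".".join(parents)) if parents else data
    target[last] = value


def bulk_update_in_memory(
    records: list[dict[str, Any]], field: str, new_value: Any
) -> UpdateResult:
    """Apply one fixed value to ``field`` in every record, collecting a report."""
    results: UpdateResult = {
        "total_record_count": 0,
        "updated_record_count": 0,
        "failed_record_count": 0,
        "updated_records": [],
        "errors": [],
    }
    for record in records:
        results["total_record_count"] += 1
        try:
            old_value = get_by_path(record, field)
            set_by_path(record, field, new_value)
            results["updated_record_count"] += 1
            results["updated_records"].append(
                {
                    "id": record["id"],
                    "metadata_field": field,
                    "old_value": old_value,
                    "new_value": new_value,
                }
            )
        except Exception as e:
            # One bad record must not abort the rest of the batch
            results["failed_record_count"] += 1
            results["errors"].append(f"Failed to update record {record['id']}: {e}")
    return results


records = [
    {"id": "a1", "metadata": {"title": "Old A"}},
    {"id": "b2"},  # no metadata section, so this record takes the failure path
    {"id": "c3", "metadata": {"title": "Old C"}},
]
summary = bulk_update_in_memory(records, "metadata.title", "New Title")
print(summary["updated_record_count"], summary["failed_record_count"])
```

Catching the exception inside the loop is the design choice that matters here: the batch always runs to completion, and the caller decides what to do with the collected `errors`.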
63 changes: 63 additions & 0 deletions site/kcworks/services/records/cli.py
@@ -0,0 +1,63 @@
"""CLI commands for record operations."""

import ast
from pprint import pformat

import click
from flask.cli import with_appcontext
from kcworks.services.records.bulk_operations import update_community_records_metadata


@click.group()
def kcworks_records():
    """CLI utility command group for record operations."""
    pass


@kcworks_records.command("bulk-update")
@click.argument("community_id", type=str, required=True)
@click.argument("metadata_field", type=str, required=True)
@click.argument("new_value", type=str, required=True)
@with_appcontext
def bulk_update(community_id: str, metadata_field: str, new_value: str) -> None:
    """Update a metadata field for all records in a community.

    Parameters:
        community_id (str): The ID of the community whose records should be updated
        metadata_field (str): The metadata field to update (e.g. 'metadata.title')
        new_value (str): The new value to set for the field. If it's a valid Python literal
            (e.g. '"string"', '123', '["list", "of", "items"]'), it will be parsed as such.
            Otherwise, it will be treated as a string.
    """
    try:
        # First try to parse as a Python literal
        parsed_value = ast.literal_eval(new_value)
    except (SyntaxError, ValueError):
        # If parsing fails, use the value as-is
        parsed_value = new_value

    print(
        f"Updating {metadata_field} to '{parsed_value}' "
        f"for all records in community {community_id}"
    )

    try:
        results = update_community_records_metadata(
            community_id=community_id,
            metadata_field=metadata_field,
            new_value=parsed_value,
        )
    except ValueError as e:
        print(f"Error updating records: {e}")
        return

    print("\nResults:")
    print(f"Total records found: {results['total_record_count']}")
    print(f"Successfully updated: {results['updated_record_count']}")
    print(f"Failed to update: {results['failed_record_count']}")
    print(f"Updated records: {pformat(results['updated_records'])}")

    if results["errors"]:
        print("\nErrors:")
        for error in results["errors"]:
            print(f"- {error}")