@calumcalder calumcalder commented Nov 14, 2024

To support importing data via DTP, developers are required to implement a Java extension module exposing a TransferExtension and Importer, with the Importer interfacing with a new or existing API also supported by the developer. Writing a Java extension is a barrier for developers who don't have the context or bandwidth to implement and maintain one.
To make it easier to integrate with DTP, this PR implements a GenericTransferExtension and GenericImporter, which developers can reuse without committing or maintaining Java code.

Generic Importers can be configured to call an HTTP endpoint created and maintained by the developer, allowing them to use the programming language and frameworks of their choice, as long as their API conforms to the OAuth Authorization Code Flow and the HTTP JSON API laid out in the included README.
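
As a rough illustration of what calling such an endpoint involves, the sketch below builds (without sending) a JSON POST that carries an OAuth bearer token, using the JDK's `java.net.http` API. The class name, endpoint URL, and payload are hypothetical; the actual paths and schemas are the ones defined in the README, which this sketch does not reproduce.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the endpoint, payload shape, and class name are
// illustrative assumptions, not the API defined in the PR's README.
final class GenericImportRequestSketch {
  private GenericImportRequestSketch() {}

  /** Builds, but does not send, a JSON POST carrying an OAuth bearer token. */
  static HttpRequest buildImportRequest(URI endpoint, String accessToken, String jsonBody) {
    return HttpRequest.newBuilder(endpoint)
        .header("Content-Type", "application/json")
        // Access token assumed to come from the Authorization Code Flow.
        .header("Authorization", "Bearer " + accessToken)
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody, StandardCharsets.UTF_8))
        .build();
  }
}
```

The resulting request could then be passed to `java.net.http.HttpClient#send`; keeping construction separate from sending makes the request easy to inspect in tests.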

While this PR contains a functional implementation for the BLOBS, CALENDAR, MEDIA, and SOCIAL-POSTS verticals, I anticipate that the implementation and API will be iterated on as developers begin testing and integrating.

TODO

  • Auth
  • Configuration
    • Rework TransferExtension loading
  • Data-types
    • Blobs
    • Calendar
    • Media
    • Social Posts
  • Documentation
    • Service endpoints
    • Service schema
    • OAuth flow
    • Configuration
  • Tests
    • Serialization
    • OAuth
    • Importer

In preparation for creating a generic transfer extension, set the
`@type` field when serializing `SocialActivity*` objects.
Adds a generic `Importer` for SOCIAL_POSTS. The importer passes the
social data straight through, and is built to be extensible to other
data types.
Extends the `GenericImporter` from the previous commit to support data
with both JSON metadata and file data. As an example of usage, provide
an implementation for `BLOBS` data.
@calumcalder calumcalder force-pushed the feat/generic-exporters branch from dfd07cd to 312862b Compare November 15, 2024 12:23
@calumcalder calumcalder force-pushed the feat/generic-exporters branch from 7afade1 to 40e18b1 Compare November 15, 2024 15:23
Extracts the logic for serializing BLOBS data to a separate module.
Adds tests for the serializer.
Adds a generic wrapper around the payload, which will wrap all other
types after they undergo the same refactoring.
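
A generic wrapper of this kind could look something like the sketch below. It is not the `GenericPayload` class from this PR: the class name `PayloadEnvelope` and the `schemaSource` field are assumptions made purely to illustrate wrapping vertical-specific data in a single envelope type.

```java
import java.util.Objects;

// Hypothetical sketch: the class name and its fields are assumptions for
// illustration, not the GenericPayload class added in this PR.
final class PayloadEnvelope<T> {
  private final T payload;           // the vertical-specific data being wrapped
  private final String schemaSource; // assumed: names the serializer that produced it

  PayloadEnvelope(T payload, String schemaSource) {
    this.payload = Objects.requireNonNull(payload, "payload");
    this.schemaSource = Objects.requireNonNull(schemaSource, "schemaSource");
  }

  T getPayload() {
    return payload;
  }

  String getSchemaSource() {
    return schemaSource;
  }
}
```

Because the envelope is generic in `T`, every data type can share the same outer structure while keeping its own payload type.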
Extracts the SOCIAL_POSTS generic importer serialization logic to a
separate module.
Also adds tests, and wraps the payload in GenericPayload, as was done
for BLOBS in the previous commit.
Extracts the CALENDAR generic importer serialization logic to a separate
module.
Also adds tests, and wraps the payload in GenericPayload, like with
other data types in previous commits.
Extracts the MEDIA generic importer serialization logic to a separate
module.
Also adds tests, and wraps the payload in GenericPayload, like with
other data types in previous commits.
Moves `CachedDownloadableItem` to the `BlobbySerializer` file, since
that's the only place it's used.
In previous commits, the type information of exported data was lost
at serialization time. This change delays conversion to the JSON
representation until the data is sent on the wire, preserving type
information for as long as possible and avoiding double serialization
of the data.

The exported data have a union of possible types (the MediaSerializer,
for example, generates a list of MediaAlbums, VideoModels, and
PhotoModels), but unfortunately Java doesn't support proper union types.
A solution to this is to create an empty `ExportData` interface for each
serializer, have the possible subtypes 'implement' the empty
interface, and then use the `ExportData` interface as a type bound.

This makes Media and Calendar serializers more verbose as we're having
to duplicate the underlying models, but it gives us a layer of
separation between the model and the serialized data, allowing them to
evolve independently, and guaranteeing that changes to the model will
require a corresponding change in the serializer, reducing the chance of
the serializer interface being broken.
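
The marker-interface approach described above can be sketched as follows. All names here (`MediaExportData`, `AlbumExport`, `PhotoExport`, `VideoExport`, `ExportDataSketch`) are hypothetical stand-ins rather than the classes in this PR; the point is only that an empty interface plus a type bound approximates a union type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this only illustrates the pattern.
// Empty marker interface: stands in for the "union" of exportable media types.
interface MediaExportData {}

final class AlbumExport implements MediaExportData {
  final String name;
  AlbumExport(String name) { this.name = name; }
}

final class PhotoExport implements MediaExportData {
  final String title;
  PhotoExport(String title) { this.title = title; }
}

final class VideoExport implements MediaExportData {
  final String title;
  VideoExport(String title) { this.title = title; }
}

final class ExportDataSketch {
  private ExportDataSketch() {}

  // The bound accepts any member of the union and rejects everything else
  // at compile time.
  static <T extends MediaExportData> List<String> typeNames(List<T> items) {
    List<String> names = new ArrayList<>();
    for (T item : items) {
      names.add(item.getClass().getSimpleName());
    }
    return names;
  }
}
```

Passing a `List<String>` to `typeNames` would fail to compile, which is the guarantee the type bound buys.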
Since we've added a layer between generic importer serializers and the
underlying models to better preserve type information, make use of this
by using a custom schema for media items. This lets us:
a) use consistent schemas for Photos and Videos
b) use TZ-aware date-times (although we have to assume the underlying
   Date is UTC when copying the model)
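
The UTC assumption in (b) can be illustrated with a minimal sketch. The class and method names are hypothetical, and this is not necessarily how the serializer in this PR performs the conversion; it only shows one standard way to turn a legacy `java.util.Date` into a timezone-aware value.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper; names are assumptions for illustration.
final class UtcDateSketch {
  private UtcDateSketch() {}

  // java.util.Date carries no zone of its own (it is an instant), so the
  // offset is attached explicitly. UTC is an assumption, as noted above.
  static OffsetDateTime toUtc(Date date) {
    return date.toInstant().atOffset(ZoneOffset.UTC);
  }

  // Renders the value in ISO-8601 form with an explicit offset.
  static String toIsoString(Date date) {
    return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(toUtc(date));
  }
}
```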
Simplifies the `@type` annotations of exported data. Might be worth
doing this for nested types too, although some of them are directly
using Models defined by DTP.
@calumcalder calumcalder force-pushed the feat/generic-exporters branch from da4861a to 23bec4a Compare November 28, 2024 10:43
Expands the `TransferServiceConfig` object to allow arbitrary data in a
`serviceConfig` field, to be later parsed by `TransferExtension`s at
initialization time.
Extends `GenericTransferExtension` to read from `serviceConfig` to
configure itself for the import service being targeted.

Also allows `TransferExtension`s to declare support for services
through a `supportsService` method, turning Service->Extension mappings
from one-to-one to many-to-one, meaning `GenericTransferExtension` can
be configured to support multiple services.

For now the configuration is fairly barebones; a list of supported
verticals (wrapped in an object to allow future support for
vertical-level configuration), the base of the API endpoint, and the
service name.
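
The many-to-one lookup through `supportsService` described above might look roughly like this. The interface and classes are simplified, hypothetical stand-ins (DTP's real `TransferExtension` has more methods), and the lookup helper is an assumption about how a loader could use the new method.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified stand-in for an extension that can declare
// which services it supports.
interface ExtensionSketch {
  boolean supportsService(String serviceId);
}

// One extension instance configured for several services (many-to-one).
final class GenericExtensionSketch implements ExtensionSketch {
  private final Set<String> services;

  GenericExtensionSketch(Set<String> services) {
    this.services = Set.copyOf(services);
  }

  @Override
  public boolean supportsService(String serviceId) {
    return services.contains(serviceId);
  }
}

final class ExtensionLookupSketch {
  private ExtensionLookupSketch() {}

  // Returns the first extension that declares support for the service.
  static Optional<ExtensionSketch> findFor(String serviceId, List<ExtensionSketch> extensions) {
    return extensions.stream()
        .filter(extension -> extension.supportsService(serviceId))
        .findFirst();
  }
}
```

With this shape, two differently named services can resolve to the same extension instance, which a one-to-one service-to-extension mapping could not express.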
Tidies up the MediaSerializer class, and reverts some now-unnecessary
changes to the core DTP models.
Improves the serialization of Blobby data.
Refactors GenericImporter to make GenericFileImporter simpler, and to
remove some duplication in the two classes.
There was a manual test being used during development that was replaced
by unit tests.
File data had inherited some structure from when it was based on
`BlobbyStorageContainerResource`. Since refactoring to a standalone
schema, there's no need to have this structure, so simplify the schema
by flattening it.
@calumcalder calumcalder force-pushed the feat/generic-exporters branch from adb64ab to 76c37d7 Compare December 6, 2024 16:45
@calumcalder calumcalder marked this pull request as ready for review December 6, 2024 16:58
This makes the naming more consistent with e.g. MediaSerializer
lisad previously approved these changes Dec 11, 2024

@lisad lisad left a comment

love it. The documentation is already very readable to an outsider.

Based on review feedback, expand on the documentation.
- Include a high-level overview of job lifecycle
- Add details on data rates
- Clarify the ordering behaviour
- Add reference to OAuth RFC
- 201 -> 20x for success codes
- Remove implementation details for OAuth flow
  It's documented more thoroughly elsewhere and we should encourage use
  of a framework or third-party authorization server.
@calumcalder

Thanks for the reviews!

@calumcalder calumcalder requested review from lisad and sparuvu December 13, 2024 16:53
@calumcalder calumcalder merged commit 60c992b into dtinit:master Jan 6, 2025
5 checks passed
@calumcalder calumcalder deleted the feat/generic-exporters branch January 6, 2025 11:58