API spec review: UserActivityHistory #5260

Open
wants to merge 2 commits into
base: feature/UserActivityHistoryAPI
243 changes: 243 additions & 0 deletions specs/UserActivityHistory/UserActivityHistoryAPI.md
@@ -0,0 +1,243 @@
UserActivityHistory
Contributor

I have a vague concern about poisoning of history here to trick agents into doing bad things. There may be "nothing" here, but we should threat model it out.

The problem is that the user has no visibility into what UserActivityHistory items an app is saving, and the agent is probably dumb enough to be easily tricked by malformed items.

Basically, a low-privileged app (like a UWP) adds a UserActivityHistory item that claims to be something interesting (include a display string with juicy keywords). It also includes a URI that is malicious (note "malicious" might not mean it actively harms the user directly; it might be malicious in the sense that it furthers phishing attempts or something). Now when the user asks Copilot a query, Copilot finds the (fake) UserActivityHistory item and invokes it on behalf of the user, which ends up somewhere "bad."

The malicious app can't pull this off directly itself, because launching the bad URI either (1) is blocked by UWP security or (2) would look out-of-place when called directly by the app. But by having it open out of context, is it bad?

(Like I said, kind of a vague concern that may not be unique to agents or to this feature or whatever... just I worry about bad actors poisoning the inputs the CUA reasons over.)

===

# Background

The UserActivity class can be used to record and preserve the activities that the user
is currently engaged in on their computer - e.g., browsing a website, reading a Word document, etc.

To record user activity, you use UserActivityChannel to retrieve a UserActivity object via the
API GetOrCreateUserActivityAsync. If a UserActivity with the given ID already exists, it will be
returned; otherwise, a new UserActivity object will be created and returned. You can then call
the API CreateSession to return a UserActivitySession object that tracks how long the user is engaged
in that activity. This structure allows multiple sessions to be associated with the same activity,
representing the case where the user completes that activity a bit at a time - e.g., beginning to
watch a movie, then pausing, then watching more later. These will be treated as a single
user activity that spans multiple sessions.

UserActivityHistory is a new set of APIs that allow you to query the past 28 days of the user's
activity history, which will enable you to bring back content that the user has previously been
interacting with.

# API Pages

## UserActivityHistory class

This class provides static methods that enable you to query the user's activity history.
This activity history is stored in a database managed by a local service, and these APIs
Contributor

Is there a reason why we talk about implementation? If the implementation changes, will we break something?

call out to that service to retrieve data from the database.

The following example uses this class to bring back the webpage for a Korean recipe that
the user interacted with within the last day:

```c#
UserActivityHistoryQuery query = new();
Contributor

Since this is the first time we're seeing code, can we show the API to request access, too? Are you just relying on the AppCapability class? Although I think the UX for consent is more dynamic, so it is probably part of the API call itself. Big open question.

query.Keywords = new string[] { "Korean", "recipe" };
// Exclude anything that started more than a day ago.
query.EarliestStartTime = DateTime.Now.AddDays(-1);

IList<UserActivityHistoryItem> results = await UserActivityHistory.SearchAsync(
new UserActivityHistoryQuery[] { query },
Contributor

Should this be an overload? Seems strange to have to create an array for a single search item.

Member

RECOMMEND: Add an overload for 1 query

UserActivityHistoryOrderBy.DwellTime,
maxResults: 1);
Contributor

Why name the parameter?


UserActivityHistoryItem item = results.FirstOrDefault();

Member

RECOMMEND: Remove blank line

if (item != null)
{
// Now we can use item.ActivationUri to bring back the webpage in the state in which the user
// was last viewing it.
}
```

## UserActivityHistory.Search method

This method synchronously queries the user's activity history and returns a list of items matching
Contributor

Do we need to mention "synchronously" everywhere? It should be assumed that unless the API ends with "Async" that it is synchronous

the criteria specified in the `queries` parameter. The results are sorted in descending order
according to the `orderBy` parameter: either by the most recent start times, the most recent
end times, or the longest time spent on the activity.

The elements of the `queries` array are ORed together in the resulting database query, whereas the
criteria within a single element of the array are ANDed together. For example, if you provided two
queries, each of which contained the keyword "tax", one of which had a content type of
"application/pdf" and the other had a content type of "image/*", the resulting database query
would be something along these lines:

```sql
SELECT * FROM UserActivityHistory WHERE
(CONTAINS (DisplayText, '"tax"')) AND
((CONTAINS (ContentType, '"application/pdf"')) OR
(CONTAINS (ContentType, '"image/*"')));
```
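
The same OR-of-ANDs rule can be sketched in memory. This is only an illustration of the semantics described above, not the service's implementation: `HistoryRow`, `QuerySketch`, and `QuerySemanticsSketch` are hypothetical stand-ins for the real types, and the ordinal, case-insensitive comparison is an assumption.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; not the proposed WinRT types.
public record HistoryRow(string DisplayText, string ContentType);
public record QuerySketch(string[] Keywords, string ContentType);

public static class QuerySemanticsSketch
{
    // A trailing "/*" matches any subtype; otherwise the whole string must match.
    static bool ContentTypeMatches(string actual, string pattern) =>
        pattern.EndsWith("/*", StringComparison.Ordinal)
            ? actual.StartsWith(pattern[..^1], StringComparison.OrdinalIgnoreCase)
            : string.Equals(actual, pattern, StringComparison.OrdinalIgnoreCase);

    // A row satisfies one query only if every criterion in that query holds (AND).
    static bool MatchesOne(HistoryRow row, QuerySketch q) =>
        q.Keywords.All(k => row.DisplayText.Contains(k, StringComparison.OrdinalIgnoreCase))
        && ContentTypeMatches(row.ContentType, q.ContentType);

    // A row is returned if it satisfies at least one of the queries (OR).
    public static bool Matches(HistoryRow row, IEnumerable<QuerySketch> queries) =>
        queries.Any(q => MatchesOne(row, q));
}
```

With the two "tax" queries from the example, a row titled "Tax receipt" with content type "image/png" matches, while a row titled "Tax notes" with content type "text/plain" does not.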

## UserActivityHistory.SearchAsync method

This method is an asynchronous version of the `Search` method.

## UserActivityHistory.GetAppsWithUserActivity method

This method synchronously retrieves a list of all the app names with data in the user's activity
history database. You can use this, for example, to show the user the list of apps that are being
queried against, so the user can understand why an app that is not recording user activity is not
Member

...why an app not recording isn't showing up in the list of apps that recorded...

Typo? Should be "...what an app that is not recording user activity is not showing up..." ?

showing up in the results.

## UserActivityHistory.GetAppsWithUserActivityAsync method

This method is an asynchronous version of the `GetAppsWithUserActivity` method.

## UserActivityHistoryItem class

This class represents a single item in the user's activity history. It contains properties that
describe in what app the activity occurred, what the nature of the activity was, the URI of the
resource involved in the activity (e.g., a document, a webpage, a video, etc.), the URI that
can be used to bring back the state the user left the activity in, and the times when the user
started the activity and ended the activity.

If the user performed the same activity multiple times, there will be multiple
`UserActivityHistoryItem` objects returned, each with different start and end times.

## UserActivityHistoryItem.AppName property

This property contains the name of the app in which the activity occurred.
Member

what kind of name is this? a PFN? an AUMID? display name? exe path?
Is this something Windows infers from the caller, or the app provided this when recording the UserActivity?

Member Author

My current implementation contains the exe path, but it would be even better if there were a way to get the display name. I can't immediately find one. We have to infer this from the caller; the UserActivity object does not have this property anywhere.

Member

we can use CallerIdentity or similar (e.g. CoGetCallContext) to get an AUMID. When an app receives this AUMID they can find the display name of that AUMID for display purposes. We could choose to store the display name too, because the app might get uninstalled sometime after capturing the user activity and before querying it (what happens with that app's user activity history, does it get deleted?)

In any case let's take an action item to update the wording here once we have a solid caller id implementation

Contributor

Wrinkle: conversion from PFN to Display Name should happen in which context? Ideally, it is in the CUA's context so it is localized to match the CUA. But Start Menu might show a different localization so you might not be able to find it. I don't know what the right answer is.

Member

My current implementation contains the exe path

Is this an unpackaged app or packaged?

If packaged, you'll want to record the package full name. Given that, you can look up its DisplayName (and Logo), localized for the current user to view.

Is this historical? Does UserActivityHistory retain information recorded by apps after they're uninstalled? If so then you can't guarantee looking up the DisplayName. If so there are options but they have caveats so I'll wait to hear if relevant before saying more.

but it would be even better if there were a way to get the display name

p = packageManager.FindPackageForUser("", pkgfullname)
string displayName = p.DisplayName

returns the package's DisplayName localized for the calling user. Are there cases where the package isn't registered for the calling user?


## UserActivityHistoryItem.ActivityId property

This property contains the ID of the activity, which can be used to collate multiple sessions
of the same activity. For example, if the user watched a movie in multiple sessions, the
Contributor

Is watching a video an actual scenario supported by any apps we know that report Activity History? Is it the most interesting one?

I would expect a more obvious one would be opening the same Word document 5 times in a week, and them all being related somehow. Or visiting the same website (like your e-mail) every day. And so on.

`ActivityId` property can be used to identify how long in total the user spent watching that movie.
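
As a minimal sketch of that collation, the following groups session items by `ActivityId` and sums the time spent in each. `SessionItem` and `ActivityTotals` are hypothetical stand-ins that carry only the properties needed here; `DateTimeOffset` is used because that is how a WinRT `DateTime` is projected into C#.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in carrying only the properties this sketch needs.
public record SessionItem(string ActivityId, DateTimeOffset StartTime, DateTimeOffset EndTime);

public static class ActivityTotals
{
    // Groups sessions by ActivityId and sums each group's (EndTime - StartTime).
    public static Dictionary<string, TimeSpan> TotalTimeByActivity(IEnumerable<SessionItem> items) =>
        items.GroupBy(i => i.ActivityId)
             .ToDictionary(
                 g => g.Key,
                 g => g.Aggregate(TimeSpan.Zero, (sum, i) => sum + (i.EndTime - i.StartTime)));
}
```

Two sessions of 30 and 45 minutes that share one `ActivityId` would therefore be reported as a single 75-minute total.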

## UserActivityHistoryItem.DisplayText property

This property contains the string that the app chose to describe the activity. For example,
if the activity was reading the contents of a webpage, this property might contain the webpage's title.

## UserActivityHistoryItem.ContentType property

This property contains the MIME type of the content being interacted with. For example, if the user
was looking at a PNG image, this property would contain the string "image/png".

## UserActivityHistoryItem.ContentUri property

This property contains the URI of the content being interacted with. For example, if the user was
looking at a webpage, this property would contain the URI of that webpage.

## UserActivityHistoryItem.ActivationUri property

This property contains the URI that can be used to bring back the state the user left the activity in.
For example, if the user was looking at a webpage, this property would contain the URI of that webpage
with additional information such as what page the user was on, what their scroll position was, etc.
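
One of the review comments on this spec raises the concern that an app could record an item whose activation URI leads somewhere harmful. A caller might therefore check the URI before launching it. The sketch below shows one possible precaution under stated assumptions: `ActivationUriCheck`, `IsSafeToLaunch`, and the https-only allowlist are all hypothetical and are not part of the proposed API.

```c#
using System;
using System.Collections.Generic;

public static class ActivationUriCheck
{
    // Hypothetical allowlist; a real caller would choose its own policy.
    static readonly HashSet<string> AllowedSchemes =
        new(StringComparer.OrdinalIgnoreCase) { "https" };

    // Returns true only for a well-formed absolute URI whose scheme is allowlisted.
    public static bool IsSafeToLaunch(string activationUri) =>
        Uri.TryCreate(activationUri, UriKind.Absolute, out var uri)
        && AllowedSchemes.Contains(uri.Scheme);
}
```

A scheme check alone does not address phishing over https, so this is a starting point for the threat model rather than a complete mitigation.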

## UserActivityHistoryItem.StartTime property

This property contains the time when the user started the activity session.

## UserActivityHistoryItem.EndTime property

This property contains the time when the user ended the activity session.

## UserActivityHistoryQuery class

This class is used to specify criteria for what portion of the user's activity history you want to
retrieve. It allows you to specify keywords to search for, content types to filter by, and time ranges
to filter by.

## UserActivityHistoryQuery.Keywords property

This property is an array of keywords, each of which is used to lexically search against the
DisplayText column in the database. Keywords are case-insensitive, and results returned will
Contributor

Is it an ordinal search, or a search based on a specific locale? If it's locale-sensitive, hopefully it uses the locale of the caller.

be those that contain all of the keywords in the array.

## UserActivityHistoryQuery.ContentType property

This property is a string that specifies the content type associated with the activity you want
to retrieve. It allows the inclusion of an asterisk as a wildcard - e.g., "image/*" will match
all content types beginning with "image/", such as "image/png", "image/jpeg", etc.
This property is case-insensitive.
Contributor

I'd make it clear you can leave it null / empty-string to match any content.


## UserActivityHistoryQuery.EarliestStartTime property

This is a nullable DateTime property that specifies the earliest start time of the activity
you want to retrieve. Any activities with a StartTime property earlier than this will be excluded.
If this property is left as null, it will be ignored.

## UserActivityHistoryQuery.EarliestEndTime property

This is a nullable DateTime property that specifies the earliest end time of the activity
you want to retrieve. Any activities with an EndTime property earlier than this will be excluded.
If this property is left as null, it will be ignored.

## UserActivityHistoryQuery.LatestStartTime property

This is a nullable DateTime property that specifies the latest start time of the activity
you want to retrieve. Any activities with a StartTime property later than this will be excluded.
If this property is left as null, it will be ignored.

## UserActivityHistoryQuery.LatestEndTime property

This is a nullable DateTime property that specifies the latest end time of the activity
you want to retrieve. Any activities with an EndTime property later than this will be excluded.
If this property is left as null, it will be ignored.
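
Taken together, the four time properties describe a window in which any null bound is ignored. The sketch below illustrates that rule under assumptions: `TimeBounds` and `TimeFilterSketch` are hypothetical stand-ins, and the bounds are treated as inclusive because the text above excludes only values strictly earlier or later than the bound.

```c#
using System;

// Hypothetical stand-in for the four nullable bounds on UserActivityHistoryQuery.
public record TimeBounds(
    DateTimeOffset? EarliestStartTime = null,
    DateTimeOffset? EarliestEndTime = null,
    DateTimeOffset? LatestStartTime = null,
    DateTimeOffset? LatestEndTime = null);

public static class TimeFilterSketch
{
    // A null bound is ignored; a set bound excludes anything outside it.
    public static bool InRange(DateTimeOffset start, DateTimeOffset end, TimeBounds b) =>
        (b.EarliestStartTime is null || start >= b.EarliestStartTime)
        && (b.LatestStartTime is null || start <= b.LatestStartTime)
        && (b.EarliestEndTime is null || end >= b.EarliestEndTime)
        && (b.LatestEndTime is null || end <= b.LatestEndTime);
}
```

For example, "activities started within the last day" corresponds to setting only `EarliestStartTime` to one day before the current time and leaving the other three bounds null.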

## UserActivityHistoryOrderBy enum

This enum specifies what property the results should be ordered by. The options are as follows:

| Name | Description |
|-|-|
| StartTime | Results will be in descending order of their StartTime property |
| EndTime | Results will be in descending order of their EndTime property |
| DwellTime | Results will be in descending order of the difference between their EndTime and StartTime properties |
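
The three orderings can be sketched with LINQ as follows. `OrderBySketch`, `TimedItem`, and `OrderingSketch` are hypothetical stand-ins, and applying `maxResults` after sorting is an assumption about how the two parameters interact.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins mirroring the enum and the properties used for ordering.
public enum OrderBySketch { StartTime, EndTime, DwellTime }
public record TimedItem(string DisplayText, DateTimeOffset StartTime, DateTimeOffset EndTime);

public static class OrderingSketch
{
    // Sorts in descending order by the chosen key, then keeps the first maxResults items.
    public static List<TimedItem> Order(
        IEnumerable<TimedItem> items, OrderBySketch orderBy, int maxResults) =>
        (orderBy switch
        {
            OrderBySketch.StartTime => items.OrderByDescending(i => i.StartTime),
            OrderBySketch.EndTime => items.OrderByDescending(i => i.EndTime),
            OrderBySketch.DwellTime => items.OrderByDescending(i => i.EndTime - i.StartTime),
            _ => throw new ArgumentOutOfRangeException(nameof(orderBy)),
        }).Take(maxResults).ToList();
}
```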

# API Details

```c# (but really MIDL3)
namespace Microsoft.Windows.ApplicationModel.UserActivities
Contributor

Q: Should we put this somewhere else? It's not a general-purpose API that anyone can use. It's specific to AI scenarios and will be VERY locked down as to who can call it. Do we have a top-level "User context stuff useful for AI" namespace? Do we need one?

Member

Beat me to it.

Microsoft.Windows.AI.UserActivities seems more apt

{
runtimeclass UserActivityHistory
{
static IVector<UserActivityHistoryItem> Search(
Contributor

Should we have an overload for a single query?

UserActivityHistoryQuery[] queries,
UserActivityHistoryOrderBy orderBy,
UInt32 maxResults);
Contributor

Is there a material benefit to passing maxResults (e.g. perf? or making the user feel better if we include this as part of the consent prompt?).


static IAsyncOperation<IVector<UserActivityHistoryItem>> SearchAsync(
UserActivityHistoryQuery[] queries,
UserActivityHistoryOrderBy orderBy,
UInt32 maxResults);

static IVector<String> GetAppsWithUserActivity();
Contributor

Should this be a list of ProductIds rather than just strings? What data do we have from unpackaged apps recording activities (like Office)?


static IAsyncOperation<IVector<String>> GetAppsWithUserActivityAsync();
}

runtimeclass UserActivityHistoryItem
{
String AppName { get; };
Member

What is AppName?

If a packaged app is this the app's AUMID (programmatic id) or DisplayName (localized string for human consumption)?

Does this API support unpackaged apps?

String ActivityId { get; };
String DisplayText { get; };
String ContentType { get; };
String ContentUri { get; };
String ActivationUri { get; };
DateTime StartTime { get; };
DateTime EndTime { get; };
}

runtimeclass UserActivityHistoryQuery
{
UserActivityHistoryQuery();

String[] Keywords;
String ContentType;
IReference<DateTime> EarliestStartTime;
IReference<DateTime> EarliestEndTime;
IReference<DateTime> LatestStartTime;
IReference<DateTime> LatestEndTime;
}

enum UserActivityHistoryOrderBy
{
StartTime,
EndTime,
DwellTime
};
}
```