Commit d04543d

SharePoint source connector: add username/password auth, clarify ingest path format (#626)
1 parent 24398f5 commit d04543d

8 files changed, with 65 additions and 52 deletions.
Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,10 @@
11
- `<name>` (_required_) - A unique name for this connector.
2-
- `<client-id>` (_required_) - The client ID provided by SharePoint for the app registration.
32
- `<site>` (_required_) - The base URL of the SharePoint site to connect to.
4-
- `<tenant>` (_required) - The **Directory (tenant) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
3+
- `<path>` - The path from which to start parsing files. The default is **Documents** (in the UI) or `Shared Documents/` (in the URL). To start parsing from somewhere else, specify the correct path format as described previously in this article.
4+
- For `recursive`, set to `true` to recursively process data from subfolders within the specified path. The default is `false` if not otherwise specified.
5+
- `<client-id>` (_required_) - The client ID provided by SharePoint for the app registration.
6+
- `<tenant>` (_required_) - The **Directory (tenant) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
57
- `<authority-url>` - The authentication token provider URL for the Entra ID app registration. The default is https://login.microsoftonline.com.
6-
- `<user-pname>` (_required_) - The UPN for the OneDrive account in the Entra ID tenant.
78
- `<client-cred>` (_required_) - The **Client secret** for the Entra ID app registration.
8-
- `<path>` - The path from which to start parsing files. The default is `Shared Documents` if not otherwise specified.
9-
- For `recursive`, set to `true` to recursively process data from subfolders within the specified path. The default is `false` if not otherwise specified.
9+
- `<user-pname>` (_required_ for username and password authentication) - For username and password authentication, the UPN for the OneDrive account in the Entra ID tenant.
10+
- `<password>` (_required_ for username and password authentication) - For username and password authentication, the password for the target UPN.

snippets/general-shared-text/sharepoint-cli-api.mdx

Lines changed: 4 additions & 3 deletions
@@ -10,10 +10,11 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
1010

1111
The following environment variables:
1212

13-
- `ENTRA_ID_USER_PRINCIPAL_NAME` - The User Principal Name (UPN) for the target OneDrive account in the Microsoft Entra ID tenant.
1413
- `SHAREPOINT_SITE_URL` - The SharePoint site URL, represented by `--site` (CLI) or `site` (Python).
15-
- `SHAREPOINT_SITE_PATH` - The path in the SharePoint site from which to start parsing files, represented by `--path` (CLI) or `path` (Python).
14+
- `SHAREPOINT_SITE_PATH` - The path in the SharePoint site from which to start parsing files, represented by `--path` (CLI) or `path` (Python). The default is **Documents** (in the UI) or `Shared Documents/` (in the URL). To start parsing from somewhere else, specify the correct path format as described previously in this article.
1615
- `ENTRA_ID_APP_CLIENT_ID` - The **Application (client) ID** value for the Microsoft Entra ID app registration, represented by `--client-id` (CLI) or `client_id` (Python).
1716
- `ENTRA_ID_APP_TENANT_ID` - The **Directory (tenant) ID** value for the Entra ID app registration, represented by `--tenant` (CLI) or `tenant` (Python).
17+
- `ENTRA_ID_TOKEN_AUTHORITY_URL` - The token authority URL for the Entra ID app registration, represented by `--authority-url` (CLI) or `authority_url` (Python). The default is `https://login.microsoftonline.com`.
1818
- `ENTRA_ID_APP_CLIENT_SECRET` - The **Client secret** value for the Entra ID app registration, represented by `--client-cred` (CLI) or `client_cred` (Python).
19-
- `ENTRA_ID_TOKEN_AUTHORITY_URL` - The token authority URL for the Entra ID app registration (which is typically `https://login.microsoftonline.com`), represented by `--authority-url` (CLI) or `authority_url` (Python).
19+
- `ENTRA_ID_USER_PRINCIPAL_NAME` - For username and password authentication, the User Principal Name (UPN) for the target OneDrive account in the Microsoft Entra ID tenant.
20+
- `ENTRA_ID_USER_PASSWORD` - For username and password authentication, the password for the target UPN.
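
The environment variables listed above can be checked before a run with a small helper. This is a minimal sketch, not part of the commit: the function names `missing_sharepoint_vars` and `authority_url` are hypothetical, and the split into "always needed" and "username and password only" variables is an assumption based on the `_required_` markers used elsewhere in this commit.

```python
import os

# Default taken from the ENTRA_ID_TOKEN_AUTHORITY_URL description above.
DEFAULT_AUTHORITY_URL = "https://login.microsoftonline.com"

# Assumption: these mirror the settings marked as required in this commit.
BASE_VARS = (
    "SHAREPOINT_SITE_URL",
    "ENTRA_ID_APP_CLIENT_ID",
    "ENTRA_ID_APP_TENANT_ID",
    "ENTRA_ID_APP_CLIENT_SECRET",
)

# Only needed for username and password authentication.
USER_AUTH_VARS = ("ENTRA_ID_USER_PRINCIPAL_NAME", "ENTRA_ID_USER_PASSWORD")


def missing_sharepoint_vars(env, use_user_auth=False):
    """Return the names of needed variables that are unset or empty in env."""
    needed = BASE_VARS + (USER_AUTH_VARS if use_user_auth else ())
    return [name for name in needed if not env.get(name)]


def authority_url(env):
    """Return the configured token authority URL, or the documented default."""
    return env.get("ENTRA_ID_TOKEN_AUTHORITY_URL") or DEFAULT_AUTHORITY_URL


if __name__ == "__main__":
    # Check the real process environment for client credentials authentication.
    print(missing_sharepoint_vars(os.environ))
    print(authority_url(os.environ))
```

Passing the environment as a mapping keeps the helper easy to exercise with a plain dictionary instead of the real process environment.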

snippets/general-shared-text/sharepoint-platform.mdx

Lines changed: 4 additions & 3 deletions
@@ -2,10 +2,11 @@ Fill in the following fields:
22

33
- **Name** (_required_): A unique name for this connector.
44
- **Site URL** (_required_): The base URL of the SharePoint site to connect to.
5-
- **Path** (_required_): The path from which to start parsing files, for example `Shared Documents`.
5+
- **Path**: The path from which to start parsing files. The default is **Documents** (in the UI) or `Shared Documents/` (in the URL). To start parsing from somewhere else, specify the correct path format as described previously in this article.
66
- **Recursive**: Check this box to recursively process data from subfolders within the specified path.
77
- **Client ID** (_required_): The **Application (client) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
88
- **Tenant ID** (_required_): The **Directory (tenant) ID** for the Entra ID app registration.
9-
- **User Principal Name (UPN)** (_required_): The UPN for the OneDrive account in the Entra ID tenant.
10-
- **Client Credentials** (_required_): The **Client secret** for the Entra ID app registration.
119
- **Authority URL** (_required_): The authentication token provider URL for the Entra ID app registration. The default is `https://login.microsoftonline.com`.
10+
- **Client Credentials** (_required_): The **Client secret** for the Entra ID app registration.
11+
- **User Principal Name (UPN)** (_required_ for username and password authentication): For username and password authentication, the UPN for the OneDrive account in the Entra ID tenant.
12+
- **Password** (_required_ for username and password authentication): For username and password authentication, the password for the UPN.

snippets/general-shared-text/sharepoint.mdx

Lines changed: 30 additions & 24 deletions
@@ -30,26 +30,6 @@
3030
OneDrive personal accounts, and Microsoft 365 Free, Basic, Personal, and Family plans are not supported.
3131
- The SharePoint Online and OneDrive plans must share the same Microsoft Entra ID tenant.
3232
[Learn more](https://learn.microsoft.com/microsoft-365/enterprise/subscriptions-licenses-accounts-and-tenants-for-microsoft-cloud-offerings?view=o365-worldwide).
33-
- The User Principal Name (UPN) for the OneDrive account in the Microsoft Entra ID tenant. This is typically the OneDrive account user's email address. To find a UPN:
34-
35-
1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
36-
or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
37-
2. In the **Users** section, click **Active users**.
38-
3. Locate the user account in the list of active users.
39-
4. The UPN is displayed in the **Username** column.
40-
41-
The following video shows how to get a UPN:
42-
43-
<iframe
44-
width="560"
45-
height="315"
46-
src="https://www.youtube.com/embed/H0yYfhfyCE0"
47-
title="YouTube video player"
48-
frameborder="0"
49-
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
50-
allowfullscreen
51-
></iframe>
52-
5333
- The SharePoint Online site URL.
5434

5535
- Site collection-level URLs typically have the format `https://<tenant>.sharepoint.com/sites/<site-collection-name>`.
@@ -58,7 +38,12 @@
5838

5939
[Learn more](https://learn.microsoft.com/microsoft-365/community/query-string-url-tricks-sharepoint-m365).
6040

61-
- The path in the SharePoint Online site from which to start parsing files, for example `"Shared Documents"`. If the SharePoint connector is to process all sites within the tenant, this filter will be applied to all site document libraries.
41+
- The path in the SharePoint Online site from which to start parsing files. By default, parsing starts from the site's **Documents** folder (if viewed from the site's UI) or the
42+
`Shared Documents` folder (if viewed from the site's URL).
43+
44+
To start parsing from a path other than **Documents** (in the UI) or `Shared Documents/` (in the URL), specify only the part of the path that
45+
comes after that. For example, to start parsing from **Documents > my-folder > my-subfolder** (in the UI) or
46+
`Shared Documents/my-folder/my-subfolder` (in the URL), specify `my-folder/my-subfolder`.
6247

6348
The following video shows how to get the site URL and a path within the site:
6449

@@ -72,8 +57,10 @@
7257
allowfullscreen
7358
></iframe>
7459

75-
- The **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Microsoft Entra ID app registration with
76-
the correct set of Microsoft Graph access permissions. These permissions include:
60+
- Two types of authentication are supported: client credentials and a username and password. Both authentication types require a
61+
Microsoft Entra ID app registration. You will need to provide
62+
the **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Entra ID app registration, and the
63+
app registration must have the correct set of Microsoft Graph access permissions. These permissions include:
7764

7865
- `Sites.ReadWrite.All` (if both reading and writing are needed)
7966
- `User.Read.All`
@@ -106,4 +93,23 @@
10693
allowfullscreen
10794
></iframe>
10895

109-
- The token authority URL for your Microsoft Entra ID app registration. This is typically `https://login.microsoftonline.com`
96+
- The token authority URL for your Microsoft Entra ID app registration. This is typically `https://login.microsoftonline.com`.
97+
- For username and password authentication, you must also provide the User Principal Name (UPN) and its password for the OneDrive account in the Microsoft Entra ID tenant. This UPN is typically the OneDrive account user's email address. To find a UPN:
98+
99+
1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
100+
or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
101+
2. In the **Users** section, click **Active users**.
102+
3. Locate the user account in the list of active users.
103+
4. The UPN is displayed in the **Username** column.
104+
105+
The following video shows how to get a UPN:
106+
107+
<iframe
108+
width="560"
109+
height="315"
110+
src="https://www.youtube.com/embed/H0yYfhfyCE0"
111+
title="YouTube video player"
112+
frameborder="0"
113+
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
114+
allowfullscreen
115+
></iframe>
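
The path rule added in this file (specify only the part that comes after `Shared Documents/`) can be illustrated with a small helper. This is a sketch under stated assumptions: `connector_path` is a hypothetical name, and returning an empty string for the default library simply means "nothing extra to specify"; the commit does not say whether the connector accepts an empty value.

```python
def connector_path(url_path):
    """Strip the leading 'Shared Documents/' segment from a URL-style path.

    Returns an empty string when the path is the default library itself,
    and returns other paths unchanged apart from surrounding slashes.
    """
    prefix = "Shared Documents"
    cleaned = url_path.strip("/")
    if cleaned == prefix:
        return ""
    if cleaned.startswith(prefix + "/"):
        return cleaned[len(prefix) + 1:]
    return cleaned


if __name__ == "__main__":
    # The example from the text above.
    print(connector_path("Shared Documents/my-folder/my-subfolder"))
```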

snippets/source_connectors/sharepoint.sh.mdx

Lines changed: 7 additions & 6 deletions
@@ -3,15 +3,16 @@
33

44
unstructured-ingest \
55
sharepoint \
6-
--client-cred $ENTRA_ID_APP_CLIENT_SECRET \
7-
--client-id $ENTRA_ID_APP_CLIENT_ID \
8-
--user-pname $ENTRA_ID_USER_PRINCIPAL_NAME \
9-
--tenant $ENTRA_ID_APP_TENANT_ID \
10-
--authority-url $ENTRA_ID_TOKEN_AUTHORITY_URL \
116
--site $SHAREPOINT_SITE_URL \
127
--path $SHAREPOINT_SITE_PATH \
138
--recursive \
14-
--download-dir $LOCAL_FILE_DOWNLOAD_DIR\
9+
--client-id $ENTRA_ID_APP_CLIENT_ID \
10+
--tenant $ENTRA_ID_APP_TENANT_ID \
11+
--authority-url $ENTRA_ID_TOKEN_AUTHORITY_URL \
12+
--client-cred $ENTRA_ID_APP_CLIENT_SECRET \
13+
--user-pname $ENTRA_ID_USER_PRINCIPAL_NAME \
14+
--password $ENTRA_ID_USER_PASSWORD \
15+
--download-dir $LOCAL_FILE_DOWNLOAD_DIR \
1516
--partition-by-api \
1617
--api-key $UNSTRUCTURED_API_KEY \
1718
--partition-endpoint $UNSTRUCTURED_API_URL \

snippets/source_connectors/sharepoint.v2.py.mdx

Lines changed: 4 additions & 3 deletions
@@ -31,13 +31,14 @@ if __name__ == "__main__":
3131
downloader_config=SharepointDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
3232
source_connection_config=SharepointConnectionConfig(
3333
access_config=SharepointAccessConfig(
34-
client_cred=os.getenv("ENTRA_ID_APP_CLIENT_SECRET")
34+
client_cred=os.getenv("ENTRA_ID_APP_CLIENT_SECRET"),
35+
password=os.getenv("ENTRA_ID_USER_PASSWORD"), # For username and password authentication.
3536
),
37+
site=os.getenv("SHAREPOINT_SITE_URL"),
3638
client_id=os.getenv("ENTRA_ID_APP_CLIENT_ID"),
37-
user_pname=os.getenv("ENTRA_ID_USER_PRINCIPAL_NAME"),
3839
tenant=os.getenv("ENTRA_ID_APP_TENANT_ID"),
3940
authority_url=os.getenv("ENTRA_ID_TOKEN_AUTHORITY_URL"),
40-
site=os.getenv("SHAREPOINT_SITE_URL")
41+
user_pname=os.getenv("ENTRA_ID_USER_PRINCIPAL_NAME") # For username and password authentication.
4142
),
4243
partitioner_config=PartitionerConfig(
4344
partition_by_api=True,

snippets/source_connectors/sharepoint_rest_create.mdx

Lines changed: 5 additions & 4 deletions
@@ -9,14 +9,15 @@ curl --request 'POST' --location \
99
"name": "<name>",
1010
"type": "sharepoint",
1111
"config": {
12-
"client_id": "<client-id>",
1312
"site": "<site>",
13+
"path": "<path>",
14+
"recursive": <true|false>,
15+
"client_id": "<client-id>",
1416
"tenant": "<tenant>",
1517
"authority_url": "<authority-url>",
16-
"user_pname": "<user-pname>",
1718
"client_cred": "<client-cred>",
18-
"path": "<path>",
19-
"recursive": <true|false>
19+
"user_pname": "<user-pname>", # For username and password authentication.
20+
"password": "<password>" # For username and password authentication.
2021
}
2122
}'
2223
```
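
Because JSON has no comment syntax, the two username and password fields in the request body above are easier to handle by building the body programmatically and including them only when needed. The following is a minimal sketch: the field names come from the request body in this diff, while the function name `sharepoint_source_payload` and the rule that `user_pname` and `password` must be supplied together are assumptions of this example.

```python
import json


def sharepoint_source_payload(name, site, client_id, tenant, client_cred,
                              path=None, recursive=False,
                              authority_url="https://login.microsoftonline.com",
                              user_pname=None, password=None):
    """Build the JSON body for creating a SharePoint source connector."""
    # Assumption: username and password authentication needs both values.
    if (user_pname is None) != (password is None):
        raise ValueError("user_pname and password must be provided together")

    config = {
        "site": site,
        "recursive": recursive,
        "client_id": client_id,
        "tenant": tenant,
        "authority_url": authority_url,
        "client_cred": client_cred,
    }
    # Leave the path out so the documented default applies.
    if path is not None:
        config["path"] = path
    # Add the extra fields only for username and password authentication.
    if user_pname is not None:
        config["user_pname"] = user_pname
        config["password"] = password

    return json.dumps({"name": name, "type": "sharepoint", "config": config})


if __name__ == "__main__":
    print(sharepoint_source_payload(
        "<name>", "<site>", "<client-id>", "<tenant>", "<client-cred>"))
```

The returned string can be passed as the request body in place of the hand-written JSON.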

snippets/source_connectors/sharepoint_sdk.mdx

Lines changed: 5 additions & 4 deletions
@@ -16,14 +16,15 @@ with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as clien
1616
name="<name>",
1717
type=SourceConnectorType.SHAREPOINT,
1818
config=SharePointSourceConnectorConfigInput(
19-
client_id="<client_id>",
2019
site="<site>",
20+
path="<path>",
21+
recursive=<True|False>,
22+
client_id="<client_id>",
2123
tenant="<tenant>",
2224
authority_url="<authority_url>",
23-
user_pname="<user_pname>",
2425
client_cred="<client_cred>",
25-
path="<path>",
26-
recursive=<True|False>
26+
user_pname="<user_pname>", # For username and password authentication.
27+
password="<password>" # For username and password authentication.
2728
)
2829
)
2930
)
