Commit 7f1ed21

Updates Deid tests (#42029)
* Adding text encoding doc, fixing test file upload path
* directory separator
* updating samples
* property name

Co-authored-by: Josiah Vinson <jovinson@microsoft.com>
1 parent 3b2723b commit 7f1ed21

16 files changed, 62 additions and 53 deletions

sdk/healthdataaiservices/azure-health-deidentification/README.md

Lines changed: 19 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -48,10 +48,10 @@ Here's an example of setting an environment variable in Bash using Azure CLI:
4848

4949
```bash
5050
# Get the service URL for the resource
51-
export AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT=$(az deidservice show --name "<resource-name>" --resource-group "<resource-group-name>" --query "properties.serviceUrl")
51+
export HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT=$(az deidservice show --name "<resource-name>" --resource-group "<resource-group-name>" --query "properties.serviceUrl")
5252
```
5353

54-
Optionally, save the service URL as an environment variable named `AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT` for the sample client initialization code.
54+
Optionally, save the service URL as an environment variable named `HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT` for the sample client initialization code.
5555

5656
Create a client with the endpoint and credential:
5757
<!-- SNIPPET: examples.create_client -->
@@ -62,7 +62,7 @@ from azure.identity import DefaultAzureCredential
6262
import os
6363

6464

65-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
65+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
6666
credential = DefaultAzureCredential()
6767
client = DeidentificationClient(endpoint, credential)
6868
```
@@ -77,6 +77,12 @@ Given an input text, the de-identification service can perform three main operat
7777
- `Redact` returns output text where detected PHI entities are replaced with placeholder text. For example `John` replaced with `[name]`.
7878
- `Surrogate` returns output text where detected PHI entities are replaced with realistic replacement values. For example, `My name is John Smith` could become `My name is Tom Jones`.
7979

80+
### String Encoding
81+
When using the `Tag` operation, the service will return the locations of PHI entities in the input text. These locations will be represented as offsets and lengths, each of which is a [StringIndex][string_index] containing
82+
three properties corresponding to three different text encodings. **Python applications should use the `code_point` property.**
83+
84+
For more on text encoding, see [Character encoding in .NET][character_encoding].
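
To illustrate why the three encodings can report different offsets for the same entity, here is a minimal sketch in plain Python that makes no service call. The sample text and the helper name `offsets_for` are hypothetical, and the dictionary keys `utf8` and `utf16` are only labels for this example; `code_point` is the only property name taken from the text above.

```python
def offsets_for(text: str, needle: str) -> dict:
    """Return the offset of `needle` in `text` under three encodings."""
    # Python str is indexed by Unicode code point.
    code_point = text.index(needle)
    prefix = text[:code_point]
    return {
        "code_point": code_point,
        # Number of bytes before the match when encoded as UTF-8.
        "utf8": len(prefix.encode("utf-8")),
        # Number of 16-bit code units before the match in UTF-16.
        "utf16": len(prefix.encode("utf-16-le")) // 2,
    }


# Hypothetical input: an emoji (outside the BMP) and an accented letter.
text = "Patient \U0001F600 Jos\u00e9 saw Dr. Smith"
offsets = offsets_for(text, "Smith")
print(offsets)

# Only the code point offset slices a Python str correctly.
start = offsets["code_point"]
print(text[start:start + len("Smith")])
```

Because the emoji and the accented letter take more than one byte or code unit, the three offsets diverge, which is why slicing a Python string with anything other than the code point value would select the wrong characters.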
85+
8086
### Available endpoints
8187
There are two ways to interact with the de-identification service. You can send text directly, or you can create jobs
8288
to de-identify documents in Azure Storage.
@@ -94,7 +100,7 @@ from azure.health.deidentification.models import (
94100
from azure.identity import DefaultAzureCredential
95101
import os
96102

97-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
103+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
98104
credential = DefaultAzureCredential()
99105
client = DeidentificationClient(endpoint, credential)
100106

@@ -136,7 +142,7 @@ Your target Azure Storage account and container where documents will be written
136142

137143
Set the following environment variables, updating the storage account and container with real values:
138144
```bash
139-
export AZURE_STORAGE_ACCOUNT_LOCATION="https://<storageaccount>.blob.core.windows.net/<container>"
145+
export HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION="https://<storageaccount>.blob.core.windows.net/<container>"
140146
export INPUT_PREFIX="example_patient_1"
141147
export OUTPUT_PREFIX="_output"
142148
```
@@ -156,9 +162,9 @@ from azure.identity import DefaultAzureCredential
156162
import os
157163
import uuid
158164

159-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
160-
storage_location = os.environ["AZURE_STORAGE_ACCOUNT_LOCATION"]
161-
inputPrefix = os.environ["INPUT_PREFIX"]
165+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
166+
storage_location = os.environ["HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION"]
167+
inputPrefix = os.environ.get("INPUT_PREFIX", "example_patient_1")
162168
outputPrefix = os.environ.get("OUTPUT_PREFIX", "_output")
163169

164170
credential = DefaultAzureCredential()
@@ -208,7 +214,7 @@ from azure.health.deidentification.models import (
208214
from azure.identity import DefaultAzureCredential
209215
import os
210216

211-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
217+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
212218
credential = DefaultAzureCredential()
213219
client = DeidentificationClient(endpoint, credential)
214220

@@ -244,7 +250,7 @@ from azure.health.deidentification.models import (
244250
from azure.identity import DefaultAzureCredential
245251
import os
246252

247-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
253+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
248254
credential = DefaultAzureCredential()
249255
client = DeidentificationClient(endpoint, credential)
250256

@@ -272,7 +278,7 @@ from azure.health.deidentification.models import (
272278
from azure.identity import DefaultAzureCredential
273279
import os
274280

275-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
281+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
276282
credential = DefaultAzureCredential()
277283
client = DeidentificationClient(endpoint, credential)
278284

@@ -349,6 +355,8 @@ additional questions or comments.
349355
[pip]: https://pypi.org/project/pip/
350356
[azure_sub]: https://azure.microsoft.com/free/
351357
[deid_quickstart]: https://learn.microsoft.com/azure/healthcare-apis/deidentification/quickstart
358+
[string_index]: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/healthdataaiservices/azure-health-deidentification/azure/health/deidentification/models/_models.py#L548
359+
[character_encoding]: https://learn.microsoft.com/dotnet/standard/base-types/character-encoding-introduction
352360
[deid_redact]: https://learn.microsoft.com/azure/healthcare-apis/deidentification/redaction-format
353361
[deid_rbac]: https://learn.microsoft.com/azure/healthcare-apis/deidentification/manage-access-rbac
354362
[deid_managed_identity]: https://learn.microsoft.com/azure/healthcare-apis/deidentification/managed-identities
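
The README hunks above also change `INPUT_PREFIX` from a required lookup (`os.environ[...]`) to an optional one with a default (`os.environ.get(...)`). The following is a minimal sketch of that pattern, not code from the commit: the helper name `read_job_settings` and the example URLs are hypothetical, while the variable names and defaults are the ones shown in the diff.

```python
import os
from typing import Mapping


def read_job_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the settings the document-job samples read from the environment."""
    return {
        # Required: a missing variable raises KeyError right away.
        "endpoint": env["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"],
        "storage_location": env["HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION"],
        # Optional: fall back to the defaults used in the samples.
        "input_prefix": env.get("INPUT_PREFIX", "example_patient_1"),
        "output_prefix": env.get("OUTPUT_PREFIX", "_output"),
    }


# Hypothetical values, passed as a plain dict so no real resources are needed.
example_env = {
    "HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT": "https://example.invalid",
    "HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION": "https://example.invalid/container",
}
settings = read_job_settings(example_env)
print(settings["input_prefix"], settings["output_prefix"])
```

Keeping the endpoint and storage location as required lookups makes a misconfigured environment fail immediately, while the prefixes can safely fall back to sample defaults.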

sdk/healthdataaiservices/azure-health-deidentification/samples/async_samples/deidentify_documents_async.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,8 @@
1515
python deidentify_documents_async.py
1616
1717
Set the environment variables with your own values before running the sample:
18-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
19-
2) AZURE_STORAGE_ACCOUNT_LOCATION - an Azure Storage container endpoint, like "https://<storageaccount>.blob.core.windows.net/<container>".
18+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
19+
2) HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION - an Azure Storage container endpoint, like "https://<storageaccount>.blob.core.windows.net/<container>".
2020
3) INPUT_PREFIX - the prefix of the input document name(s) in the container.
2121
For example, providing "folder1" would create a job that would process documents like "https://<storageaccount>.blob.core.windows.net/<container>/folder1/document1.txt".
2222
"""
@@ -36,9 +36,9 @@
3636

3737

3838
async def deidentify_documents_async():
39-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
40-
storage_location = os.environ["AZURE_STORAGE_ACCOUNT_LOCATION"]
41-
inputPrefix = os.environ["INPUT_PREFIX"]
39+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
40+
storage_location = os.environ["HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION"]
41+
inputPrefix = os.environ.get("INPUT_PREFIX", "example_patient_1")
4242
outputPrefix = "_output"
4343

4444
credential = DefaultAzureCredential()

sdk/healthdataaiservices/azure-health-deidentification/samples/async_samples/deidentify_text_redact_async.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
python deidentify_text_redact_async.py
1515
1616
Set the environment variables with your own values before running the sample:
17-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
17+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
1818
"""
1919

2020

@@ -30,7 +30,7 @@
3030

3131

3232
async def deidentify_text_redact_async():
33-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
33+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
3434
credential = DefaultAzureCredential()
3535
client = DeidentificationClient(endpoint, credential)
3636

sdk/healthdataaiservices/azure-health-deidentification/samples/async_samples/deidentify_text_surrogate_async.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
python deidentify_text_surrogate_async.py
1515
1616
Set the environment variables with your own values before running the sample:
17-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
17+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
1818
"""
1919

2020

@@ -30,7 +30,7 @@
3030

3131

3232
async def deidentify_text_surrogate_async():
33-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
33+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
3434
credential = DefaultAzureCredential()
3535
client = DeidentificationClient(endpoint, credential)
3636

sdk/healthdataaiservices/azure-health-deidentification/samples/async_samples/deidentify_text_tag_async.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
python deidentify_text_tag_async.py
1515
1616
Set the environment variables with your own values before running the sample:
17-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
17+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
1818
"""
1919

2020

@@ -30,7 +30,7 @@
3030

3131

3232
async def deidentify_text_tag_async():
33-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
33+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
3434
credential = DefaultAzureCredential()
3535
client = DeidentificationClient(endpoint, credential)
3636

sdk/healthdataaiservices/azure-health-deidentification/samples/async_samples/list_job_documents_async.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,8 @@
1414
python list_job_documents_async.py
1515
1616
Set the environment variables with your own values before running the sample:
17-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
18-
2) AZURE_STORAGE_ACCOUNT_LOCATION - an Azure Storage container endpoint, like "https://<storageaccount>.blob.core.windows.net/<container>".
17+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
18+
2) HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION - an Azure Storage container endpoint, like "https://<storageaccount>.blob.core.windows.net/<container>".
1919
3) INPUT_PREFIX - the prefix of the input document name(s) in the container.
2020
For example, providing "folder1" would create a job that would process documents like "https://<storageaccount>.blob.core.windows.net/<container>/folder1/document1.txt".
2121
"""
@@ -35,9 +35,9 @@
3535

3636

3737
async def list_job_documents_async():
38-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
39-
storage_location = os.environ["AZURE_STORAGE_ACCOUNT_LOCATION"]
40-
inputPrefix = os.environ["INPUT_PREFIX"]
38+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
39+
storage_location = os.environ["HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION"]
40+
inputPrefix = os.environ.get("INPUT_PREFIX", "example_patient_1")
4141
outputPrefix = "_output"
4242

4343
credential = DefaultAzureCredential()

sdk/healthdataaiservices/azure-health-deidentification/samples/async_samples/list_jobs_async.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@
1313
python list_jobs_async.py
1414
1515
Set the environment variables with your own values before running the sample:
16-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
16+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
1717
"""
1818

1919

@@ -24,7 +24,7 @@
2424

2525

2626
async def list_jobs_async():
27-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
27+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
2828
credential = DefaultAzureCredential()
2929
client = DeidentificationClient(endpoint, credential)
3030

sdk/healthdataaiservices/azure-health-deidentification/samples/deidentify_documents.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,8 @@
1515
python deidentify_documents.py
1616
1717
Set the environment variables with your own values before running the sample:
18-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
19-
2) AZURE_STORAGE_ACCOUNT_LOCATION - an Azure Storage container endpoint, like "https://<storageaccount>.blob.core.windows.net/<container>".
18+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
19+
2) HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION - an Azure Storage container endpoint, like "https://<storageaccount>.blob.core.windows.net/<container>".
2020
3) INPUT_PREFIX - the prefix of the input document name(s) in the container.
2121
For example, providing "folder1" would create a job that would process documents like "https://<storageaccount>.blob.core.windows.net/<container>/folder1/document1.txt".
2222
4) OUTPUT_PREFIX - the prefix of the output document name(s) in the container. This will appear as a folder which will be created if it does not exist, and defaults to "_output" if not provided.
@@ -37,9 +37,9 @@ def deidentify_documents():
3737
import os
3838
import uuid
3939

40-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
41-
storage_location = os.environ["AZURE_STORAGE_ACCOUNT_LOCATION"]
42-
inputPrefix = os.environ["INPUT_PREFIX"]
40+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
41+
storage_location = os.environ["HEALTHDATAAISERVICES_STORAGE_ACCOUNT_LOCATION"]
42+
inputPrefix = os.environ.get("INPUT_PREFIX", "example_patient_1")
4343
outputPrefix = os.environ.get("OUTPUT_PREFIX", "_output")
4444

4545
credential = DefaultAzureCredential()

sdk/healthdataaiservices/azure-health-deidentification/samples/deidentify_text_redact.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
python deidentify_text_redact.py
1515
1616
Set the environment variables with your own values before running the sample:
17-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
17+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
1818
"""
1919

2020

@@ -29,7 +29,7 @@ def deidentify_text_redact():
2929
from azure.identity import DefaultAzureCredential
3030
import os
3131

32-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
32+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
3333
credential = DefaultAzureCredential()
3434
client = DeidentificationClient(endpoint, credential)
3535

sdk/healthdataaiservices/azure-health-deidentification/samples/deidentify_text_surrogate.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
python deidentify_text_surrogate.py
1515
1616
Set the environment variables with your own values before running the sample:
17-
1) AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT - the service URL endpoint for a de-identification service.
17+
1) HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT - the service URL endpoint for a de-identification service.
1818
"""
1919

2020

@@ -29,7 +29,7 @@ def deidentify_text_surrogate():
2929
from azure.identity import DefaultAzureCredential
3030
import os
3131

32-
endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
32+
endpoint = os.environ["HEALTHDATAAISERVICES_DEID_SERVICE_ENDPOINT"]
3333
credential = DefaultAzureCredential()
3434
client = DeidentificationClient(endpoint, credential)
3535
