Salesforce-Python-Scripts

Various scripts for interacting with Salesforce from Python.

filesExtractRunner.py

This Python script bulk-exports the "files" saved against records, in file libraries, or in your personal files.
In Salesforce, these files are stored in the ContentVersion object: https://developer.salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_objects_contentversion.htm

The script as provided fetches files by Title only. The running user should have the Query All Files permission assigned: https://help.salesforce.com/s/articleView?id=000381258&type=1

Set the authentication details in the .env file and run the script from the command line. Files are downloaded into a salesforce_downloads folder in the working directory.
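As a rough illustration of the fetch-by-Title flow described above, the sketch below builds the SOQL query, the REST URLs, and the local target path for a download. It is not taken from filesExtractRunner.py: the helper names, the API version and the file-naming scheme are all assumptions made for illustration, and the HTTP calls themselves are left out.

```python
from pathlib import Path
from urllib.parse import quote

# Assumption: substitute any REST API version your org supports.
API_VERSION = "v59.0"


def build_title_query(title: str) -> str:
    """Return a SOQL query for ContentVersion rows with exactly this Title."""
    # Escape backslashes and single quotes so the title is a valid SOQL literal.
    safe = title.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT Id, Title, FileExtension, Checksum FROM ContentVersion "
        f"WHERE Title = '{safe}'"
    )


def query_url(instance_url: str, soql: str) -> str:
    """URL of the REST query endpoint for the given SOQL string."""
    base = instance_url.rstrip("/")
    return f"{base}/services/data/{API_VERSION}/query?q={quote(soql)}"


def version_data_url(instance_url: str, version_id: str) -> str:
    """URL that returns the binary body of a single ContentVersion."""
    base = instance_url.rstrip("/")
    return (
        f"{base}/services/data/{API_VERSION}"
        f"/sobjects/ContentVersion/{version_id}/VersionData"
    )


def local_path(download_dir: str, record: dict) -> Path:
    """Choose a file name inside download_dir for a queried record."""
    ext = record.get("FileExtension")
    # Prefix with the record Id so two files sharing a Title do not collide.
    filename = f"{record['Id']}_{record['Title']}" + (f".{ext}" if ext else "")
    # Replace path separators that may appear in a Title.
    filename = filename.replace("/", "_").replace("\\", "_")
    return Path(download_dir) / filename
```

A caller would send an authenticated GET request to query_url(...), then for each returned record fetch version_data_url(...) and write the response bytes to local_path("salesforce_downloads", record).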

Example use case: Extract PDF files for use in AI training or as knowledge for RAG. You could use this data to train a Document AI OCR model on your particular documents. You could also export the documents as predetermined "knowledge" for a RAG search system, which would handle their text extraction, chunking and embedding in a vector database. https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview

Ideas for future improvement:

Use the MD5 checksum to verify file transfer integrity.
Specify the WHERE clause in the .env file or through a query string parameter.
Allow defining the LinkedEntity or ContentWorkspace (aka Library) from which the files should be retrieved. This may or may not be used in conjunction with the file title.
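The checksum idea could look something like the following minimal sketch. It assumes the Checksum field of ContentVersion was included in the query and holds the file's MD5 digest as a hexadecimal string; the function names are hypothetical and not part of the current script.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_checksum):
    """Compare a downloaded file against the checksum reported by Salesforce."""
    # Assumption: the reported value is a hex string; normalise case and spaces.
    return md5_of_file(path) == expected_checksum.strip().lower()
```

A download loop could call checksum_matches(local_file, record["Checksum"]) after each file is written and retry or log the file when it returns False.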

Install & Run:
To install: Using a virtual environment is recommended. Then git clone the repository into your local directory.
To install dependencies: python -m pip install -r requirements.txt
To run:

Save .env-sample as .env and set the environment variables for your target org.
Update the WHERE clause to select the specific files needed. A subquery such as WHERE FirstPublishedLocationId IN (SELECT Id FROM Account) will also work for files loaded against a record.
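One way the WHERE clause could be supplied from the environment instead of being edited in the script is sketched below. The variable name SF_WHERE_CLAUSE and the selected fields are assumptions made for illustration; the sketch also assumes the values in .env have already been loaded into the process environment.

```python
import os

# Assumption: the fields selected here are illustrative, not the script's own.
BASE_QUERY = "SELECT Id, Title, FileExtension FROM ContentVersion"


def build_query(environ=None):
    """Append an optional WHERE clause from the environment to the base query."""
    env = os.environ if environ is None else environ
    # SF_WHERE_CLAUSE is a hypothetical variable name for this sketch.
    where = env.get("SF_WHERE_CLAUSE", "").strip()
    if not where:
        return BASE_QUERY
    # Accept the clause with or without its leading WHERE keyword.
    if where.upper().startswith("WHERE "):
        where = where[len("WHERE "):].lstrip()
    return f"{BASE_QUERY} WHERE {where}"
```

With SF_WHERE_CLAUSE set to FirstPublishedLocationId IN (SELECT Id FROM Account), build_query() returns the base query restricted to files first published against Account records.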
