How to download a PDF file with a link that shows up in your browser as a PDF? #98
-
Here is the problem in detail: BTW, driver.NavigateTo "https://XXX.net/account/download_cfd_statement/2024-01-28T22:00:00Z/pdf/" only redirects to the web page shown below and does NOT download the PDF. So for now, I have to download it manually...
Replies: 10 comments 7 replies
-
Hi @GHRyunosuke, did you set the download preferences before navigating to the PDF URL?
Sub test_file_download2()
'WARNING - this currently will fail in Edge/Chrome if running in Incognito mode
'see https://github.com/GCuser99/SeleniumVBA/issues/87 for work-around(s)
Dim driver As WebDriver
Set driver = New WebDriver
driver.StartChrome
'set the directory path for saving download to
Dim caps As WebCapabilities
Set caps = driver.CreateCapabilities
caps.SetDownloadPrefs ".\"
driver.OpenBrowser caps
'delete legacy copy if it exists
driver.DeleteFiles ".\dummy.pdf"
driver.NavigateTo "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
driver.WaitForDownload ".\dummy.pdf"
driver.CloseBrowser
driver.Shutdown
End Sub
CORRECTED!
-
Hi @GHRyunosuke, that error points to the regex that SeleniumVBA uses to extract the file name from the caption of your VBE window. Apparently, and I remember that we have seen this before, your caption format differs from ours: our format includes the module name too. Go to the WebShared class module and change the regex to this fixed version: (image not preserved). Please let us know if this fix worked.
-
Hello @GCuser99, thanks for your suggestion. Here is the result:
Tomorrow, I will test my case, which is "https://xxx.net/account/download_cfd_statement/2024-01-28T22:00:00Z/pdf/", with the "old way" to see whether it can be downloaded or not.
-
Oh, I see from your screenshots that you have three different formats for the caption of the main VBE window, which is the only one of interest to us: Can you please confirm that? On my system, I can only see these: Once I have seen all the formats, I'll write a new fixed regex that supports them all.
-
Thanks @GHRyunosuke for helping us debug this issue, and thanks @6DiegoDiego9 for sharing your expertise on RegExp. BTW, @6DiegoDiego9, a heads-up: the pattern that you gave @GHRyunosuke to try in your post above was not the same as the one you used on https://regex101.com/. I think the GitHub Markdown is escaping characters in the pattern when you copy it from https://regex101.com/.
-
@GHRyunosuke, I understand that you obtain both A) and C) by unmaximizing the editor window. Can you please clarify how the way you obtain A) differs from the way you obtain C)? @GCuser99 oh yes, you're right, it's likely because of Markdown rendering. I'll be careful when I paste the new fixed version. Thanks!
-
Here is the fixed regex: Please let us know if it works for you with all the formats, although I doubt you can have the C) type in real cases.
-
@6DiegoDiego9 works for me too!
-
@GHRyunosuke, so it seems that you have found a way to download the PDF(?). Plus, with @6DiegoDiego9's help, we made the relative path logic more robust! On my "new way" versus "old way" of downloading a file: I had forgotten that those two "ways" are not really comparable. "driver.SetDownloadFolder" just lets you change the download folder AFTER you have set the capabilities through "caps.SetDownloadPrefs". Here is an example that uses those two methods correctly:
Sub test_file_download2()
'WARNING - this currently will fail in Edge/Chrome if running in Incognito mode
'see https://github.com/GCuser99/SeleniumVBA/issues/87 for work-around(s)
Dim driver As WebDriver
Set driver = New WebDriver
driver.StartChrome
'set download preferences through capabilities
Dim caps As WebCapabilities
Set caps = driver.CreateCapabilities
caps.SetDownloadPrefs "%USERPROFILE%\Documents\myFolder"
driver.OpenBrowser caps
'delete legacy copy if it exists
driver.DeleteFiles "%USERPROFILE%\Documents\myFolder\dummy.pdf"
driver.NavigateTo "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
driver.WaitForDownload "%USERPROFILE%\Documents\myFolder\dummy.pdf"
'now change the download folder without redoing capabilities
driver.SetDownloadFolder "%USERPROFILE%\Documents\myDifferentFolder"
'delete legacy copy if it exists
driver.DeleteFiles "%USERPROFILE%\Documents\myDifferentFolder\dummy.pdf"
driver.NavigateTo "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
driver.WaitForDownload "%USERPROFILE%\Documents\myDifferentFolder\dummy.pdf"
driver.CloseBrowser
driver.Shutdown
End Sub
Hope that clarifies how to use those two methods. I will modify the Wiki to explain that properly. Sorry for the confusion.
-
Revised regex to work with even trickier file names like "test.xls - abcd.xls.xlam":
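The revised pattern itself is not reproduced in this thread, so what follows is not SeleniumVBA's regex. It is only a minimal sketch of the general idea discussed above: pulling the file name out of the VBE main-window caption while tolerating an optional state tag and an optional trailing module part. The function name `FileNameFromCaption`, the pattern, and the assumed caption layout ("Microsoft Visual Basic for Applications - <file> [state] - [<module> (Code)]", English Office) are all hypothetical.

```vba
Option Explicit

' Hypothetical helper, NOT part of SeleniumVBA.
' Assumes an English-language VBE caption such as:
'   Microsoft Visual Basic for Applications - Book1.xlsm [break] - [Module1 (Code)]
' Returns the file name part, or an empty string if the caption does not match.
Function FileNameFromCaption(ByVal caption As String) As String
    Dim re As Object
    'late binding, so no reference to the VBScript RegExp library is required
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    'fixed prefix, lazy file name, optional " [state]", optional " - [module part]"
    re.Pattern = "^Microsoft Visual Basic for Applications - (.+?)" & _
                 "(?: \[[^\]]*\])?(?: - \[.+\])?$"
    If re.Test(caption) Then
        FileNameFromCaption = re.Execute(caption)(0).SubMatches(0)
    End If
End Function

Sub DemoFileNameFromCaption()
    Const PREFIX As String = "Microsoft Visual Basic for Applications - "
    'code window not maximized: caption carries no module part
    Debug.Print FileNameFromCaption(PREFIX & "Book1.xlsm")
    'code window maximized: module part is appended in brackets
    Debug.Print FileNameFromCaption(PREFIX & "Book1.xlsm [break] - [Module1 (Code)]")
    'tricky name that itself contains " - " and several dots
    Debug.Print FileNameFromCaption(PREFIX & "test.xls - abcd.xls.xlam - [Module1 (Code)]")
End Sub
```

The lazy `(.+?)` combined with the end anchor is what lets a name containing " - " survive: the match can only end where the remainder is either nothing, a bracketed state tag, or a bracketed module part. Captions in other Office languages or hosts may look different, so treat the pattern as a starting point to test on regex101 rather than a drop-in replacement.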