Replies: 4 comments
-
So far I have never needed to separate downloading from parsing. Anyway, when opening the files, what about programmatically turning on an extension like "Work Offline" that blocks the browser's (or a single tab's) communication with the Internet?
-
Thanks for the link. I'd like to do this without extensions (and that particular one does not have good reviews). I did research whether there was a clean capabilities-based solution and could not find one.
-
Open to suggestions on naming these methods. My preference is to name them so that they all end up in the same place in the IntelliSense properties/methods list. Here are some options I came up with (again, suggestions welcome):
WebPageToHTMLFile
SaveToHTMLFile
SavePageToHTMLFile
PageToHTMLFile
SourceToHTMLFile
-
Good question and difficult answer. I'd vote for: or even a shorter:
-
Use case: I have a VBA script that automatically saves webpage data every day for future processing. The webpages are saved to *.html files. Later, when I'm ready to process these files, I read them back into SeleniumVBA and parse them all at once. One problem is that when I navigate to the saved files, dynamic HTML content (for example, "src=" attributes in iframes) makes the navigation take a long time (sometimes tens of seconds) while that content attempts to resolve. I need a way to "render" the saved HTML files as "offline" content by disabling (or "sanitizing") the dynamic content, while leaving the DOM tree intact, before saving.
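To make the sanitizing idea concrete, here is a minimal sketch of one way it could work: rename `src` to `data-src` on tags that fetch content, so the elements stay in the DOM but nothing is requested when the saved file is opened. It is written in Python with a regular expression purely for illustration and is not the SeleniumVBA implementation; the function name `sanitize_html` and the tag list are assumptions, and it deliberately ignores other fetching attributes such as `srcset` or `href`.

```python
import re

# Hypothetical list of tags whose src attribute triggers a fetch on load.
_FETCHING_TAGS = ("iframe", "script", "img", "embed", "source", "frame")

# Match the opening of one of those tags up to a whitespace-preceded src=.
# Simplification: assumes no '>' appears inside an earlier attribute value.
_SRC_ATTR = re.compile(
    r"(<(?:%s)\b[^>]*?\s)src(\s*=)" % "|".join(_FETCHING_TAGS),
    re.IGNORECASE,
)


def sanitize_html(source: str) -> str:
    """Rename src= to data-src= on fetching tags; leave everything else as is."""
    return _SRC_ATTR.sub(r"\1data-src\2", source)


if __name__ == "__main__":
    page = '<div><iframe width="1" src="https://example.com/a"></iframe></div>'
    print(sanitize_html(page))
```

Because the attribute is renamed rather than removed, the original URL is still available for later parsing, and running the function twice leaves the result unchanged.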
Additionally, I have found that when trying to load an HTMLDocument object with unsanitized webpage source, I sometimes get an unexpected "Windows Security Warning" popup along with an instance of the IE browser, even though I'm using the Chrome/Edge driver. With sanitization, this behavior does not occur.
The above examples prompted me to develop functionality for sanitizing HTML source and, more generally, methods for supporting HTML, XML, and JSON documents:
In addition to optional sanitization of the HTML source, the JSON and XML methods optionally allow "pretty print" indentation for easier reading.
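As an illustration of what "pretty print" indentation means here, the following sketch re-indents JSON and XML text using only the Python standard library. It is not the proposed SeleniumVBA code; the helper names `pretty_json` and `pretty_xml` and the two-space indent are assumptions made for the example.

```python
import json
from xml.dom import minidom


def pretty_json(text: str, indent: int = 2) -> str:
    """Parse JSON text and re-serialize it with indentation."""
    return json.dumps(json.loads(text), indent=indent)


def pretty_xml(text: str, indent: str = "  ") -> str:
    """Parse XML text and re-serialize it with one element per line."""
    return minidom.parseString(text).toprettyxml(indent=indent)


if __name__ == "__main__":
    print(pretty_json('{"a":[1,2]}'))
    print(pretty_xml("<r><c>1</c></r>"))
```

Both helpers parse first and then re-serialize, so malformed input raises an error instead of being written to a file silently.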
Comments welcome. Any suggestions on naming these methods?
Here is a test usage:
Here is the proposed code block to add to the WebDriver class.