Commit c3e772c

feat/support cloud and non-cloud jira instance (#499)
* support cloud and non-cloud jira instance
* update fixtures
* update fixtures
* add logging
* refactor
* add typing
1 parent 2306a76 commit c3e772c

9 files changed, 143 additions and 261 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## 1.0.24
+
+* **Handle both cloud and non-cloud jira instances**
+
 ## 1.0.23
 
 * **Migrate to new Mixedbread Python SDK**

test_e2e/expected-structured-output/azure/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf.json

Lines changed: 34 additions & 10 deletions
@@ -359,9 +359,33 @@
       }
     }
   },
+  {
+    "type": "UncategorizedText",
+    "element_id": "4f2dbe3656a9ebc60c7e3426ad3cb3e3",
+    "text": "_____________________________________________________________________________________________",
+    "metadata": {
+      "filetype": "application/pdf",
+      "languages": [
+        "eng"
+      ],
+      "page_number": 2,
+      "data_source": {
+        "url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
+        "version": "0x8DB214A673DD8D8",
+        "record_locator": {
+          "protocol": "abfs",
+          "remote_file_path": "abfs://container1/"
+        },
+        "date_created": "1678440764.0",
+        "date_modified": "1678440764.0",
+        "permissions_data": null,
+        "filesize_bytes": 41164
+      }
+    }
+  },
   {
     "type": "NarrativeText",
-    "element_id": "c8fdefac1ae82fa42caeceff04853415",
+    "element_id": "cd359ae8c49885ead47318021438eead",
     "text": "this commitment, a recent report to the NLM Director recommended working across NIH to identify and develop core skills required of a biomedical data scientist to consistency across the cohort of NIH-trained data scientists. This report provides a set of recommended core skills based on analysis of current BD2K-funded training programs, biomedical data science job ads, and practicing members of the current data science workforce.",
     "metadata": {
       "filetype": "application/pdf",
@@ -385,7 +409,7 @@
   },
   {
     "type": "Title",
-    "element_id": "b5b7392d0a946f5016bfa8ad0c248a9b",
+    "element_id": "bf8321a34edb7103ec4209f3e4a8a8da",
     "text": "Methodology",
     "metadata": {
       "filetype": "application/pdf",
@@ -409,7 +433,7 @@
   },
   {
     "type": "NarrativeText",
-    "element_id": "d9d8e38d221ae621c0ddbcabaa4a28b4",
+    "element_id": "1e1d3d1a5c1397fc588393568d829bc8",
     "text": "The Workforce Excellence team took a three-pronged approach to identifying core skills required of a biomedical data scientist (BDS), drawing from:",
     "metadata": {
       "filetype": "application/pdf",
@@ -433,7 +457,7 @@
   },
   {
     "type": "NarrativeText",
-    "element_id": "ba70aa3bc3ad0dec6a62939c94c5a20c",
+    "element_id": "45d7ff56632d66a2ab2d4dd2716d4d2e",
     "text": "a) Responses to a 2017 Kaggle1 survey2 of over 16,000 self-identified data scientists working across many industries. Analysis of the Kaggle survey responses from the current data science workforce provided insights into the current generation of data scientists, including how they were trained and what programming and analysis skills they use.",
     "metadata": {
       "filetype": "application/pdf",
@@ -457,7 +481,7 @@
   },
   {
     "type": "NarrativeText",
-    "element_id": "24724b1f0d20a6575f2782fd525c562f",
+    "element_id": "bf452aac5123fcedda30dd6ed179f41c",
     "text": "b) Data science skills taught in BD2K-funded training programs. A qualitative content analysis was applied to the descriptions of required courses offered under the 12 BD2K-funded training programs. Each course was coded using qualitative data analysis software, with each skill that was present in the description counted once. The coding schema of data science-related skills was inductively developed and was organized into four major categories: (1) statistics and math skills; (2) computer science; (3) subject knowledge; (4) general skills, like communication and teamwork. The coding schema is detailed in Appendix A.",
     "metadata": {
       "filetype": "application/pdf",
@@ -481,7 +505,7 @@
   },
   {
     "type": "NarrativeText",
-    "element_id": "5e6c73154a1e5f74780c69afbc9bc084",
+    "element_id": "ca176cbef532792b1f11830ff7520587",
     "text": "c) Desired skills identified from data science-related job ads. 59 job ads from government (8.5%), academia (42.4%), industry (33.9%), and the nonprofit sector (15.3%) were sampled from websites like Glassdoor, Linkedin, and Ziprecruiter. The content analysis methodology and coding schema utilized in analyzing the training programs were applied to the job descriptions. Because many job ads mentioned the same skill more than once, each occurrence of the skill was coded, therefore weighting important skills that were mentioned multiple times in a single ad.",
     "metadata": {
       "filetype": "application/pdf",
@@ -505,7 +529,7 @@
   },
   {
     "type": "NarrativeText",
-    "element_id": "249f6c76b2c99dadbefb8b8811b0d4cd",
+    "element_id": "11b170fedd889c3b895bbd28acd811ca",
     "text": "Analysis of the above data provided insights into the current state of biomedical data science training, as well as a view into data science-related skills likely to be needed to prepare the BDS workforce to succeed in the future. Together, these analyses informed recommendations for core skills necessary for a competitive biomedical data scientist.",
     "metadata": {
       "filetype": "application/pdf",
@@ -529,7 +553,7 @@
   },
   {
     "type": "NarrativeText",
-    "element_id": "f4b34fe2b03c12e48a89276dca673bfb",
+    "element_id": "2665aadf75bca259f1f5b4c91a53a301",
     "text": "1 Kaggle is an online community for data scientists, serving as a platform for collaboration, competition, and learning: http://kaggle.com",
     "metadata": {
       "filetype": "application/pdf",
@@ -553,7 +577,7 @@
   },
   {
     "type": "NarrativeText",
-    "element_id": "75e0008cfdfecc18fb8c43490c53d6d4",
+    "element_id": "8bbfe1c3e6bca9a33226d20d69b2297a",
     "text": "2 In August 2017, Kaggle conducted an industry-wide survey to gain a clearer picture of the state of data science and machine learning. A standard set of questions were asked of all respondents, with more specific questions related to work for employed data scientists and questions related to learning for data scientists in training. Methodology and results: https://www.kaggle.com/kaggle/kaggle-survey-2017",
     "metadata": {
       "filetype": "application/pdf",
@@ -577,7 +601,7 @@
   },
   {
     "type": "PageNumber",
-    "element_id": "e5d48e29d989341ba281611d4eb9311a",
+    "element_id": "dd4a661e1a3c898a5cf6328ba56b924d",
     "text": "2",
     "metadata": {
       "filetype": "application/pdf",

test_e2e/expected-structured-output/azure/spring-weather.html.json

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,7 @@
     "element_id": "0d591849d38087378b7e9f078ce2d40f",
     "text": "National Weather Service",
     "metadata": {
+      "image_url": "/css/images/header.png",
       "link_texts": [
         "National Weather Service"
       ],
@@ -33,6 +34,7 @@
     "element_id": "0656e79f949578c1381a24fca46105d0",
     "text": "United States Department of Commerce",
     "metadata": {
+      "image_url": "/css/images/header_doc.png",
       "link_texts": [
         "United States Department of Commerce"
       ],
@@ -2615,6 +2617,7 @@
     "element_id": "7ba56f7bdf3e5a1acab250c00aaebab9",
     "text": "",
     "metadata": {
+      "image_url": "/images/wrn/Infographics/2023/outside-of-heatwaves.png",
       "languages": [
         "eng"
       ],
@@ -2638,6 +2641,7 @@
     "element_id": "81a3e31aa4cb3f2745bf23bae6127d95",
     "text": "usa.gov",
     "metadata": {
+      "image_url": "/css/images/usa_gov.png",
       "link_texts": [
         "usa.gov"
       ],
