Skip to content

Conversation

davidangb
Copy link
Contributor

@davidangb davidangb commented Dec 17, 2024

This PR is an optimization found while researching CORE-207. This should be performance only; it has no functional impact to end users.

The makeRow() function is called, as its name implies, for each row of a TSV during download. Thus, it feels worth optimizing.

  1. optimize use of collections within TSVFormatter.makeRow. Previously, we looped over all attributes in the entity, performed a Seq.indexOf() call to determine the column number that attribute should reside in, made a map of the column number->value, and then created a Seq of values based on the map's keys. Now, we loop over the headers in order and extract each column from the entity, still in order. This is less work and makes the makeRow function ~1.5x faster.
  2. use List instead of IndexedSeq when iterating. We were using IndexedSeq all over the place. IndexedSeq is optimized more for random access, while List (which extends LinearSeq) is optimized more for sequential access. Our access was sequential. Changing this added another ~1.4x speedup, for a total of ~2.2x.

Raw numbers from the JMH benchmark:

  ops/sec error +/- vs.baseline avg ops/sec avg vs. baseline
baseline 244,195.39 19,087.54   247,734.34  
  240,787.36 21,066.81 98.60%    
  258,220.27 14,516.10 105.74%    
Map->Seq for makerow 388,638.98 48,430.52 159.15% 398,863.48 161.00%
  375,149.40 43,361.68 153.63%    
  325,224.09 45,584.79 133.18%    
  315,577.37 51,806.07 129.23%    
  410,244.36 38,258.30 168.00%    
IndexedSeq -> List 578,346.66 8,294.20 236.84% 555,535.68 224.25%
  559,476.43 72,216.37 229.11%    
  546,860.46 24,416.72 223.94%    
  537,459.14 25,614.79 220.09%    

Have you read CONTRIBUTING.md lately? If not, do that first.

I, the developer opening this PR, do solemnly pinky swear that:

  • I've followed the instructions if I've made any changes to the API, especially if they're breaking changes
  • I've updated the FISMA documentation if I've made any security-related changes, including auth, encryption, or auditing

In all cases:

  • Get two thumbsworth of review and PO signoff if necessary
  • Verify all tests go green
  • Squash and merge. Make sure your branch deletes; GitHub should do this for you.
  • Test this change deployed correctly and works on dev environment after deployment

TSVFormatter.makeEntityRows(entityData.entityType, entityData.entities, entityData.headers)(entityData.model)
blackHole.consume(result)
result
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted the two example benchmarks and replaced with a benchmark that is actually useful for the current case.

@davidangb davidangb marked this pull request as ready for review December 17, 2024 19:38
@davidangb davidangb requested a review from a team as a code owner December 17, 2024 19:38
@davidangb davidangb requested review from calypsomatic and trholdridge and removed request for a team December 17, 2024 19:38
@davidangb davidangb merged commit 0575d18 into develop Dec 17, 2024
13 checks passed
@davidangb davidangb deleted the da_CORE-207_tsvOptimizations branch December 17, 2024 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants