@kleintom
Contributor

See #4582 #4584 for history.

Code mainly by ChatGPT.

With 1 FO, 1 type specimen, 6 COs, and 6 ADs we went from 92 queries to 34, according to the ChatGPT script below, run in the console via:
`load '/tmp/devPerf.rb'; DevPerf.measure("otu_distribution") { ApplicationController.helpers.otu_distribution(Otu.find(<n>)) }`

Perhaps we can find a cleaner way to include the 'if association already exists' checks added here?

Note the `conn.disable_query_cache!`:
```ruby
module DevPerf
  def self.measure(label)
    conn = ActiveRecord::Base.connection
    conn.disable_query_cache!
    total = 0
    sub = ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
      e = ActiveSupport::Notifications::Event.new(*args)
      next if e.payload[:name] == 'SCHEMA' || e.payload[:cached]
      total += 1
    end
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
    puts "[#{label}] queries=#{total}, elapsed=#{ms}ms"
  ensure
    # Unsubscribe even if the block raises, so subscribers don't pile up across runs.
    ActiveSupport::Notifications.unsubscribe(sub) if sub
    conn&.enable_query_cache!
  end
end
```

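There's still a GeographicItem (GI) load we're missing: at the end of the log below, each shape triggers its own `GeographicItem Load` followed by an `ST_AsGeoJSON` query: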
```
DevPerf.measure("otu_distribution") { ApplicationController.helpers.otu_distribution(otu) }
  Otu Load (0.4ms)  SELECT "otus".* FROM "otus" INNER JOIN "taxon_names" ON "otus"."taxon_name_id" = "taxon_names"."id" INNER JOIN "taxon_name_hierarchies" ON "taxon_names"."cached_valid_taxon_name_id" = "taxon_name_hierarchies"."descendant_id" WHERE ("taxon_name_hierarchies"."ancestor_id" IN (1159408))
  TaxonName Load (0.4ms)  SELECT "taxon_names".* FROM "taxon_names" WHERE "taxon_names"."id" IN ($1, $2, $3, $4)  [["id", 1159417], ["id", 1159415], ["id", 1159408], ["id", 1159416]]
  TaxonDetermination Load (0.3ms)  SELECT "taxon_determinations".* FROM "taxon_determinations" WHERE "taxon_determinations"."position" = $1 AND "taxon_determinations"."taxon_determination_object_type" = $2 AND "taxon_determinations"."otu_id" IN ($3, $4, $5, $6)  [["position", 1], ["taxon_determination_object_type", "FieldOccurrence"], ["otu_id", 991278], ["otu_id", 991283], ["otu_id", 1002981], ["otu_id", 1002985]]
  TaxonDetermination Load (0.3ms)  SELECT "taxon_determinations".* FROM "taxon_determinations" WHERE "taxon_determinations"."position" = $1 AND "taxon_determinations"."taxon_determination_object_type" = $2 AND "taxon_determinations"."otu_id" IN ($3, $4, $5, $6)  [["position", 1], ["taxon_determination_object_type", "CollectionObject"], ["otu_id", 991278], ["otu_id", 991283], ["otu_id", 1002981], ["otu_id", 1002985]]
  AssertedDistribution Load (0.2ms)  SELECT "asserted_distributions".* FROM "asserted_distributions" WHERE "asserted_distributions"."asserted_distribution_object_type" = $1 AND "asserted_distributions"."asserted_distribution_object_id" IN ($2, $3, $4, $5)  [["asserted_distribution_object_type", "Otu"], ["asserted_distribution_object_id", 991278], ["asserted_distribution_object_id", 991283], ["asserted_distribution_object_id", 1002981], ["asserted_distribution_object_id", 1002985]]
  Protonym Load (0.5ms)  SELECT "taxon_names".* FROM "taxon_names" WHERE "taxon_names"."type" = $1 AND "taxon_names"."id" IN ($2, $3, $4, $5)  [["type", "Protonym"], ["id", 1159417], ["id", 1159415], ["id", 1159408], ["id", 1159416]]
  FieldOccurrence Load (0.1ms)  SELECT "field_occurrences".* FROM "field_occurrences" WHERE "field_occurrences"."id" = $1  [["id", 2]]
  TypeMaterial Load (0.2ms)  SELECT "type_materials".* FROM "type_materials" WHERE "type_materials"."protonym_id" IN ($1, $2)  [["protonym_id", 1159408], ["protonym_id", 1159416]]
  CollectionObject Load (0.5ms)  SELECT "collection_objects".* FROM "collection_objects" WHERE "collection_objects"."id" IN ($1, $2, $3, $4, $5, $6, $7)  [["id", 2707728], ["id", 2707729], ["id", 2707730], ["id", 2707731], ["id", 2707732], ["id", 2707733], ["id", 2707734]]
  Identifier Load (0.2ms)  SELECT "identifiers".* FROM "identifiers" WHERE "identifiers"."identifier_object_type" = $1 AND "identifiers"."identifier_object_id" = $2  [["identifier_object_type", "FieldOccurrence"], ["identifier_object_id", 2]]
  CollectingEvent Load (0.8ms)  SELECT "collecting_events".* FROM "collecting_events" WHERE "collecting_events"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8)  [["id", 1342936], ["id", 1342935], ["id", 1342937], ["id", 1342938], ["id", 1342939], ["id", 1342940], ["id", 1342941], ["id", 1342942]]
  Identifier Load (0.9ms)  SELECT "identifiers".* FROM "identifiers" WHERE "identifiers"."identifier_object_type" = $1 AND "identifiers"."identifier_object_id" IN ($2, $3, $4, $5, $6, $7, $8)  [["identifier_object_type", "CollectionObject"], ["identifier_object_id", 2707728], ["identifier_object_id", 2707729], ["identifier_object_id", 2707730], ["identifier_object_id", 2707731], ["identifier_object_id", 2707732], ["identifier_object_id", 2707733], ["identifier_object_id", 2707734]]
  GeographicArea Load (0.3ms)  SELECT "geographic_areas".* FROM "geographic_areas" WHERE "geographic_areas"."id" IN ($1, $2, $3, $4, $5, $6)  [["id", 33446], ["id", 33459], ["id", 33451], ["id", 28924], ["id", 33851], ["id", 31943]]
  Georeference Load (0.3ms)  SELECT "georeferences".* FROM "georeferences" WHERE "georeferences"."collecting_event_id" IN ($1, $2, $3, $4, $5, $6, $7, $8)  [["collecting_event_id", 1342936], ["collecting_event_id", 1342935], ["collecting_event_id", 1342937], ["collecting_event_id", 1342938], ["collecting_event_id", 1342939], ["collecting_event_id", 1342940], ["collecting_event_id", 1342941], ["collecting_event_id", 1342942]]
  GeographicAreasGeographicItem Load (0.2ms)  SELECT "geographic_areas_geographic_items".* FROM "geographic_areas_geographic_items" WHERE "geographic_areas_geographic_items"."geographic_area_id" IN ($1, $2, $3, $4, $5, $6) ORDER BY CASE "geographic_areas_geographic_items"."data_origin" WHEN 'gadm' THEN 1 WHEN 'ne_country' THEN 2 WHEN 'ne_state' THEN 3 ELSE 4 END  [["geographic_area_id", 28924], ["geographic_area_id", 31943], ["geographic_area_id", 33446], ["geographic_area_id", 33451], ["geographic_area_id", 33459], ["geographic_area_id", 33851]]
  GeographicItem Load (0.6ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)  [["id", 189412], ["id", 189411], ["id", 189413], ["id", 189414], ["id", 189415], ["id", 189416], ["id", 189417], ["id", 189418], ["id", 27690], ["id", 30874], ["id", 33561], ["id", 33566], ["id", 33574], ["id", 34048]]
  GeographicItem Load (0.2ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189412], ["LIMIT", 1]]
   (0.2ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189412
  GeographicItem Load (0.1ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189411], ["LIMIT", 1]]
   (0.1ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189411
  GeographicItem Load (0.1ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189413], ["LIMIT", 1]]
   (0.8ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189413
  GeographicItem Load (0.3ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189414], ["LIMIT", 1]]
   (0.3ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189414
  GeographicItem Load (0.2ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189415], ["LIMIT", 1]]
   (0.3ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189415
  GeographicItem Load (0.2ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189416], ["LIMIT", 1]]
   (0.2ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189416
  GeographicItem Load (0.2ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189417], ["LIMIT", 1]]
   (0.3ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189417
  GeographicItem Load (0.2ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189418], ["LIMIT", 1]]
   (0.2ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189418
  GeographicItem Load (1.2ms)  SELECT "geographic_items".* FROM "geographic_items" WHERE "geographic_items"."id" = $1 LIMIT $2  [["id", 189411], ["LIMIT", 1]]
   (0.6ms)  SELECT ST_AsGeoJSON("geographic_items"."geography") FROM "geographic_items" WHERE "geographic_items"."id" = 189411
[otu_distribution] queries=34
```
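
For anyone reading the tail of that log: the repeated `WHERE "geographic_items"."id" = $1 LIMIT $2` lines are the per-record pattern this PR is trying to remove. A toy sketch of the difference, with no ActiveRecord involved; `FakeTable`, `find`, and `where_ids` are hypothetical stand-ins that count "queries" in memory, not TaxonWorks or Rails code:

```ruby
# Hypothetical in-memory stand-in for a table; it counts lookups instead of
# hitting a database.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # { id => record }
    @query_count = 0
  end

  # One "query" per id, like `WHERE id = $1 LIMIT 1`.
  def find(id)
    @query_count += 1
    @rows.fetch(id)
  end

  # One "query" for all ids, like `WHERE id IN (...)`.
  def where_ids(ids)
    @query_count += 1
    @rows.values_at(*ids)
  end
end

ids = [189411, 189412, 189413]
rows = ids.to_h { |id| [id, "shape-#{id}"] }

per_record = FakeTable.new(rows)
ids.each { |id| per_record.find(id) }

batched = FakeTable.new(rows)
batched.where_ids(ids)

puts "per-record: #{per_record.query_count} queries, batched: #{batched.query_count} query"
```

The per-record count grows with the number of shapes, while the batched count stays at one.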
@LocoDelAssembly
Contributor

TODO

Verify whether we need `georeferences.min_by { |g| [g.position.nil? ? 1 : 0, g.position.to_i] }`. The specs all pass without it, but that is likely because georeferences are not preloaded there. `position` is nullable.
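
A minimal sketch of what that `min_by` key does, in plain Ruby. `Georef` is a hypothetical stand-in for a Georeference row (only `position` matters here), not the real model:

```ruby
# Hypothetical stand-in for a Georeference row.
Georef = Struct.new(:id, :position)

georeferences = [
  Georef.new(1, nil),
  Georef.new(2, 3),
  Georef.new(3, 1)
]

# The key is [nil_flag, position]: rows with a position (flag 0) sort before
# rows without one (flag 1). nil.to_i is 0, so the second element is always
# an Integer and the arrays stay comparable.
first = georeferences.min_by { |g| [g.position.nil? ? 1 : 0, g.position.to_i] }
puts first.id

# Without the flag, comparing nil with an Integer raises.
begin
  georeferences.min_by(&:position)
rescue ArgumentError => e
  puts "plain min_by failed: #{e.class}"
end
```

Here the row with `position` 1 wins, and the row with a nil `position` can only be picked when no row has a position.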

LocoDelAssembly and others added 4 commits October 16, 2025 14:13
The affected spec:
```
Failures:

  1) Georeference methods #error_box with error_geographic_item returns a shape
     Failure/Error: expect(georeference.error_box.geo_object.to_s).to eq(box_1.to_s)

       expected: "POLYGON ((-1.0 1.0 0.0, 1.0 1.0 0.0, 1.0 -1.0 0.0, -1.0 -1.0 0.0, -1.0 1.0 0.0))"
            got: "POLYGON ((-1.0 1.0 0.0, -1.0 -1.0 0.0, 1.0 -1.0 0.0, 1.0 1.0 0.0, -1.0 1.0 0.0))"

       (compared using ==)
     # ./spec/models/georeference_spec.rb:359:in 'block (4 levels) in <top (required)>'
```

```ruby
specify 'with error_geographic_item returns a shape' do
  # case 2a - error geo_item
  e_g_i.save!
  georeference = Georeference::VerbatimData.new(
    collecting_event: collecting_event_with_geographic_area,
    error_geographic_item: e_g_i)
  expect(georeference.save!).to be_truthy
  expect(georeference.error_box.geo_object.to_s).to eq(box_1.to_s)
end
```

`box_1` is an RGeo square (not a geographic item), and it had a CW (incorrect) winding. `e_g_i` is a GeographicItem whose shape is `box_1`. Since the spec uses the same in-memory `e_g_i` AR object that was used to create the record, rather than one reloaded from the db, that object used to keep the wrong CW winding as well (even though the shape was saved in the db correctly as CCW).
That made `georeference.error_box.geo_object`, which was assigned from the `e_g_i` AR, have the same CW winding as `box_1`.

Now that `align_winding` is fixed for ARs, `georef.error_box.geo_object = e_g_i` has the correct CCW winding, the opposite of the incorrect winding of `box_1` (a non-GI shape), so the spec failed.

Miraculously I seem to have been able to adjust all of the box_1 through box_4 definitions to give them the correct CCW winding without affecting any other specs.
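
For reference, the two windings in the failure output above can be checked by hand with the shoelace formula. This is a plain-Ruby sketch that does not use RGeo; `signed_area` and `winding` are hypothetical helper names, and the z coordinate is dropped:

```ruby
# Signed area via the shoelace formula. `ring` is an array of [x, y] pairs
# whose last point repeats the first. With y pointing up, a positive area
# means counter-clockwise (CCW) and a negative area means clockwise (CW).
def signed_area(ring)
  ring.each_cons(2).sum { |(x1, y1), (x2, y2)| x1 * y2 - x2 * y1 } / 2.0
end

def winding(ring)
  signed_area(ring).positive? ? :ccw : :cw
end

# The "expected" (old box_1) and "got" rings from the failing expectation.
expected_ring = [[-1.0, 1.0], [1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [-1.0, 1.0]]
got_ring      = [[-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]

puts winding(expected_ring) # cw
puts winding(got_ring)      # ccw
```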
@kleintom kleintom force-pushed the tw_otu_distribution_n_plus_one branch from 055c9e0 to f0dcac7 Compare October 17, 2025 12:31