Real-world testing initialization using the EntityEvent approach #120

bryanburgers · 2024-03-08T16:54:18Z

At the 2024 RESO Winter Dev conference (March 5–6, 2024) there was discussion on the best ways for consumers to "initialize" against a producer – the best way for consumers to get all of the data they need before they get to the point where they can "keep up" with new updates.

The general consensus at the meeting was that using the new EntityEvent standard, and then fetching using keys, directly, would be the preferred way to initialize and would offer a significant speedup over the current best practice of following @odata.nextLink.

Because we also talked about using evidence to make decisions several times, I thought it would be beneficial to try out these ideas and see what results I could come up with. SCIENCE!

tl;dr

- In most cases, the EntityEvent approach was faster – sometimes significantly so. In at least one case it was slower.
- I was never able to request 1,000 keys at a time. The maximum was 200.

Methodology

I tested five different MLSes across (I think) four different providers that Zenlist has access to.

I initialized between 10,000 and 40,000 properties for each MLS (depending on a variety of factors), first using the next link approach, and then using the entity event approach, and recorded information.

As far as I am aware, none of these MLSes support the very new EntityEvent standard yet. However, it's still possible to test the entity event approach because EntityEvent provides a list of keys to fetch, so it's easy to simulate as long as the test knows the list of keys in advance. Since I did the next link test first, I had the list of keys in advance.

In this test, I have not factored in the time it would take to traverse the entity event history; I assume that traversing that will not significantly add to the time.

When using the entity event approach, for four MLSes I generated a request that included a filter like

$filter=ListingKey in ('k1','k2','k3',...)

The fifth MLS did not support the in operator, so instead I generated a request that included a filter like

$filter=ListingKey eq 'k1' or ListingKey eq 'k2' or ListingKey eq 'k3' or ...
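
The two filter shapes above could be generated along these lines. This is a minimal sketch, not the code used in the test: the function names are made up for illustration, and it assumes keys are plain strings that only need standard OData single-quote escaping (a single quote inside a literal is doubled).

```python
def quote_key(key: str) -> str:
    """Quote a string as an OData literal; embedded single quotes are doubled."""
    return "'" + key.replace("'", "''") + "'"


def in_filter(field: str, keys: list[str]) -> str:
    """Build `field in ('k1','k2',...)` for servers that support the `in` operator."""
    return f"{field} in ({','.join(quote_key(k) for k in keys)})"


def or_filter(field: str, keys: list[str]) -> str:
    """Fallback for servers without `in`: chain `eq` comparisons with `or`."""
    return " or ".join(f"{field} eq {quote_key(k)}" for k in keys)


def batches(keys: list[str], size: int) -> list[list[str]]:
    """Split keys into consecutive batches of at most `size` keys."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


if __name__ == "__main__":
    print(in_filter("ListingKey", ["k1", "k2", "k3"]))
    print(or_filter("ListingKey", ["k1", "k2", "k3"]))
```

The resulting string still has to be URL-encoded before it goes into the `$filter` query parameter; most HTTP clients do that when given a parameter dictionary.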

For all of the MLSes, I initially tried the entity event approach using 100 concurrent requests. In two cases, I needed to scale that down to complete the tests.
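
One way to cap the number of requests in flight is a semaphore around each request. The sketch below assumes Python's asyncio; `fetch_batch` is a hypothetical coroutine standing in for the real HTTP call, and `fake_fetch_batch` exists only so the example runs without a network. It is not the code used in the test.

```python
import asyncio


async def fetch_all(key_batches, fetch_batch, max_concurrency=100):
    """Run fetch_batch over every batch with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(batch):
        # Each request waits here until one of the max_concurrency slots is free.
        async with semaphore:
            return await fetch_batch(batch)

    # gather returns results in the same order as the input batches.
    return await asyncio.gather(*(guarded(b) for b in key_batches))


async def fake_fetch_batch(batch):
    """Stand-in for a real HTTP request; returns one record per key."""
    await asyncio.sleep(0.01)
    return [{"ListingKey": key} for key in batch]


def run_demo():
    key_batches = [[f"k{i}-{j}" for j in range(3)] for i in range(10)]
    results = asyncio.run(fetch_all(key_batches, fake_fetch_batch, max_concurrency=5))
    return sum(len(records) for records in results)


if __name__ == "__main__":
    print(run_demo())
```

Scaling down for a provider that drops connections or rate limits is then a matter of passing a smaller `max_concurrency`.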

Rate limits

Rate limits were not a major concern while performing these tests. Only one API rate limited me, and turning down the concurrency took care of that.

However, I opted not to test one particular provider because I know from past experience that they wouldn't rate limit Zenlist's access token while running the test, but would rate limit our access token for a full hour later in the day.

Results

MLS	keys	nextLink	EntityEvent
`dvbgs`	10000	412s	442s
`fewdk`	10000	49s	23s
`ftzal`	10000	89s	40s
`snozd`	40000	882s	53s
`zxyar`	20000	1872s	44s
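
For a rough sense of scale, the ratio of the two columns can be computed directly from the table. Two caveats apply: the nextLink column is a sum of sequential request durations while the EntityEvent column is wall time under concurrency, and the time to traverse the entity event history is not included.

```python
# (nextLink seconds, EntityEvent seconds), copied from the results table above.
results = {
    "dvbgs": (412, 442),
    "fewdk": (49, 23),
    "ftzal": (89, 40),
    "snozd": (882, 53),
    "zxyar": (1872, 44),
}


def speedup(next_link_s: float, entity_event_s: float) -> float:
    """How many times faster the EntityEvent run was; below 1 means it was slower."""
    return next_link_s / entity_event_s


if __name__ == "__main__":
    for mls, (nl, ee) in results.items():
        print(f"{mls}: {speedup(nl, ee):.1f}x")
```

By this measure the ratio ranges from about 0.9x (dvbgs, slightly slower) to about 42.5x (zxyar).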

More details

MLS	keys	nl P/r	nl Reqs	nl Time	ee P/r	ee Reqs	ee Wall Time	ee AvgR	ee MaxC
`dvbgs`	10000	250	40	412s	100	100	442s	21.8s	5
`fewdk`	10000	200	50	49s	200	50	23s	17.8s	100
`ftzal`	10000	200	50	89s	200	50	40s	28.6s	100
`snozd`	40000	500	80	882s	200	200	53s	4.9s	20
`zxyar`	20000	200	100	1872s	200	100	44s	16.4s	100

MLS: A random unique ID for the MLS
keys: The total number of keys used in the test
nl P/r: Using the next link approach, how many Properties per request were requested and received
nl Reqs: Using the next link approach, the total number of requests made
nl Time: Using the next link approach, the sum of all of the durations of the requests
ee P/r: Using the entity event approach, how many Properties per request were requested and received
ee Reqs: Using the entity event approach, how many requests were made
ee Wall Time: Using the entity event approach, the total amount of time initialization took. Because of concurrency, this is smaller than the sum of the durations of the requests
ee AvgR: Using the entity event approach, the average time it took the server to respond to a request
ee MaxC: Using the entity event approach, the maximum number of requests in flight

Choosing the number of properties per request for the entity event approach

The discussion at the dev conference suggested 1,000 properties per request. Across the 5 MLSes tested, I was never able to get more than 200 properties per request for a variety of reasons.

I typically started at 1,000, but would receive an HTTP 414: URI Too Long or an HTTP 413: Content Too Large error.

But even when I kept the URI within the HTTP spec limits, there was still typically an application limit that prevented going over 200.

For dvbgs, choosing anything over 150 resulted in an error "The query specified in the URI is not valid. Max Node Limit Exceeded 750".

For fewdk and ftzal, anything over 200 would result in an error "Maximum value for $top is 200". Making a request without a $top=... would return 10 items.

For snozd, anything over 200 would result in an error "You have exceeded the maximum value count. Please limit to 200 values in a single expression."

For zxyar, anything over 200 would return successfully, but would only include 200 results.
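
Since both a URI length limit and a per-request key cap can apply, a client could pack keys into batches that respect both. The sketch below is an illustration only: `plan_batches` and `filter_length` are made-up names, and the 6,000-character budget is an arbitrary assumption, not a limit reported by any of the tested servers.

```python
from urllib.parse import quote


def filter_length(field, keys):
    """Length of the percent-encoded `in` filter for these keys."""
    literal = ",".join("'" + k.replace("'", "''") + "'" for k in keys)
    return len(quote(f"{field} in ({literal})", safe=""))


def plan_batches(keys, field="ListingKey", max_keys=200, max_filter_chars=6000):
    """Greedily pack keys into batches under a key-count cap and a length budget.

    The length is recomputed for every candidate batch, which is simple
    rather than fast; that is fine for a sketch.
    """
    planned, current = [], []
    for key in keys:
        candidate = current + [key]
        too_many = len(candidate) > max_keys
        too_long = filter_length(field, candidate) > max_filter_chars
        if current and (too_many or too_long):
            # Close the current batch and start a new one with this key.
            planned.append(current)
            current = [key]
        else:
            current = candidate
    if current:
        planned.append(current)
    return planned


if __name__ == "__main__":
    keys = [f"key{i:05d}" for i in range(450)]
    print([len(b) for b in plan_batches(keys)])
```

The budget only covers the filter expression, so the base URL and any other query parameters need to be left out of it or subtracted from it.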

Per-MLS caveats

dvbgs
- Frequently closed the HTTP connection when testing; I eventually needed to drop the concurrency to 5 to succeed.
- OData filter didn't support the in approach; I used the or approach instead.
fewdk
- I intended to initialize 40,000 listings. However, following @odata.nextLink only worked for the first 10,000 listings. After that, it returned the error "Maximum value for $skip is 10000". Custom code would need to be written to initialize beyond 10,000 listings.
ftzal
- I intended to initialize 40,000 listings. However, following @odata.nextLink only worked for the first 10,000 listings. After that, it returned the error "Maximum value for $skip is 10000". Custom code would need to be written to initialize beyond 10,000 listings.
snozd
- Connection failures then rate limiting when I tried to run 100 requests concurrently. After a bit of time, I retried with 20 concurrent requests and everything went fine. I did not experiment with values between 20 and 100.
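
One possible shape for the custom code mentioned for fewdk and ftzal is to page with a key cursor instead of $skip. This was not tried in the test. The sketch assumes the server supports `$orderby` on ListingKey and a `gt` comparison on string keys, which may not hold for every provider; `fetch_page` is a hypothetical callable that performs the request, and `make_fake_server` is an in-memory stand-in so the example runs.

```python
def next_page_params(last_key, page_size=200, field="ListingKey"):
    """Query parameters for the page that follows last_key (None for the first page)."""
    params = {"$orderby": field, "$top": str(page_size)}
    if last_key is not None:
        escaped = last_key.replace("'", "''")
        params["$filter"] = f"{field} gt '{escaped}'"
    return params


def fetch_all_by_key(fetch_page, page_size=200, field="ListingKey"):
    """Page with a key cursor instead of $skip until a short page comes back."""
    records, last_key = [], None
    while True:
        page = fetch_page(next_page_params(last_key, page_size, field))
        records.extend(page)
        if len(page) < page_size:
            return records
        last_key = page[-1][field]


def make_fake_server(keys):
    """In-memory stand-in for an OData endpoint; handles only this sketch's filter."""
    ordered = sorted(keys)

    def fetch_page(params):
        flt = params.get("$filter")
        start = flt.split("'")[1] if flt else ""
        top = int(params["$top"])
        return [{"ListingKey": k} for k in ordered if k > start][:top]

    return fetch_page


if __name__ == "__main__":
    server = make_fake_server([f"k{i:05d}" for i in range(450)])
    print(len(fetch_all_by_key(server)))
```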

Which MLSes were tested?

The point of this was not to glorify or shame any particular MLS or provider. It was to get an early idea of how the entity event approach performs in the real world. So I'm not actively reporting which MLSes or providers were tested.

I used the user agent Zenlist/1.0 EntityEventTest/{5-letter id} (+bryan@zenlist.com) when making requests, so if your MLS was tested and you want to correlate with the above table, you can look in your request logs. Or reach out to me privately.

darnjo · 2024-03-13T04:28:28Z

Maintainer

Thanks for looking into this, Bryan. Some of the key-fetch results are similar to what we found when we were doing research into Web API Core and replication in 2018 when we first started discussing it and EntityEvent.

Current Test Results

We haven't sampled average next link times for DD 2.0 yet, so we don't have data for them. Next link also isn't guaranteed to work in all cases right now because we haven't certified people on it, and most don't pass on the first try.

Also, most people aren't respecting the header value that sets a larger page size at the moment, so it may not be as efficient now as it will be in 6-12 months. It will be interesting to retest this against a larger sample of providers once they've all been certified for DD 2.0.

However, we do have extensive stats on how fast it is to fetch using modification timestamp queries (which many still use because nextLink may not be generally available).

It's roughly 12s per 1,000 records across all resources, standard or local, with a page size of 100 records per page and a sample size of 661 markets and up to 100,000 records per resource. This means it would take roughly 120s for 10,000 records, on average. Many support more than 100 records per page using this approach. YMMV.

The Property Resource on its own is roughly 22s / 1,000 records, meaning 220s for 10,000. This seems faster than the average of your results above? I'd be curious if you have a similar result locally.
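
To put the two sets of numbers on the same footing, the next link timings from the results table in the original post can be normalized to seconds per 10,000 records. This is plain arithmetic on the posted figures. The page sizes differ (200 to 500 per request there versus 100 here), so the comparison is only approximate.

```python
from statistics import mean, median

# (keys, nextLink seconds), copied from the results table in the original post.
next_link = {
    "dvbgs": (10000, 412),
    "fewdk": (10000, 49),
    "ftzal": (10000, 89),
    "snozd": (40000, 882),
    "zxyar": (20000, 1872),
}


def seconds_per_10k(keys: int, seconds: float) -> float:
    """Scale a measured duration to a 10,000-record equivalent."""
    return seconds * 10000 / keys


per_10k = {mls: seconds_per_10k(k, s) for mls, (k, s) in next_link.items()}

if __name__ == "__main__":
    for mls, value in per_10k.items():
        print(f"{mls}: {value:.1f}s per 10,000")
    print(f"mean: {mean(per_10k.values()):.1f}s, median: {median(per_10k.values()):.1f}s")
```

On those five data points the mean is about 341s per 10,000 records and the median is 220.5s.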

As a side note, we don't test concurrent requests since their behavior varies widely between providers and when we've tried in testing we've been rate limited a lot or they didn't support them. Everything is done sequentially, one-page-at-a-time per resource for a more "apples to apples" comparison.

EntityEvent Goals

While performance is one aspect of the EntityEvent proposal, it is not the only consideration. Some others are:

- EntityEvent lets people drive their replication from a single resource rather than polling on many.
- Deleted items are addressed, which is not the case when trying to get changes on a per-resource basis.
- Initial syncing and ongoing replication can be handled using the same fairly straightforward strategy: batch-fetch the keys and pull data.
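
The "batch-fetch the keys and pull data" strategy could be sketched as below. Assumptions are labeled here and in the comments: events are dictionaries with EntityEventSequence, ResourceName and ResourceRecordKey fields, a key that comes back missing from a fetch is treated as deleted, and `fetch_by_keys` is a hypothetical callable that performs the key query. It illustrates the loop and is not a reference implementation.

```python
def pending_keys(events, resource_name, last_sequence):
    """Return (unique keys to fetch, new high-water mark) for events after last_sequence."""
    keys, seen, high = [], set(), last_sequence
    for event in sorted(events, key=lambda e: e["EntityEventSequence"]):
        sequence = event["EntityEventSequence"]
        if sequence <= last_sequence:
            continue
        high = max(high, sequence)
        if event["ResourceName"] != resource_name:
            continue
        key = event["ResourceRecordKey"]
        if key not in seen:  # several events for one record need only one fetch
            seen.add(key)
            keys.append(key)
    return keys, high


def sync(events, last_sequence, fetch_by_keys, store,
         resource_name="Property", key_field="ListingKey", batch_size=200):
    """Apply events to a local store; works for initial sync (last_sequence=0) and updates."""
    keys, high = pending_keys(events, resource_name, last_sequence)
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        found = {record[key_field]: record for record in fetch_by_keys(batch)}
        for key in batch:
            if key in found:
                store[key] = found[key]
            else:
                # Assumption: a key that no longer resolves has been deleted.
                store.pop(key, None)
    return high
```

The same function covers both cases: an empty store with `last_sequence=0` is an initial sync, and later calls pass the previously returned high-water mark.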

It's great if it can be as fast as possible, and we probably wouldn't want things to be much slower than they are now for the items above alone.

However, there are some limitations:

1. As outlined above, we haven't tested nextLink yet and can't guarantee it's working correctly or optimally at this point as a basis for comparison. The results above are probably fairly accurate, but I'd like to see a larger sample once we've done more DD 2.0 testing. We will request a max page size of 1,000 records per page in testing, which hopefully most providers will eventually support. This doesn't affect the key fetches addressed in point (3) below; it matters more for the overall total when using next link to fetch keys or replicate data.
2. One reason for my point in (1) is that in DD 1.7, many providers started off slower than the average until they saw how they compared, then made adjustments, after which we usually saw 5-20x improvements in response times. I expect similar with nextLink and DD 2.0. Therefore, ModificationTimestamp queries are probably the most reliable comparison at this point in terms of standards.
3. Perhaps more important than (1) or (2) is the fact that OData uses query strings for GET requests, and the query string max length varies widely among platforms. So, the number of records you'll be able to fetch using key queries might vary widely among providers at the moment.

EnFinlay · 2024-03-15T18:07:28Z

Maintainer

Thank you for doing this detailed analysis, Bryan! This does help build the case for EntityEvent and its effectiveness for replication.

Nicely done with the headers, by the way. It was really handy to tell which of those were our systems. I can tell you that the nextLink behaviour and some of the limitations you hit will go away in the near future (as part of our work to be DD 2.0 compliant).
