feat: contract log query #1601
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1601 +/- ##
============================================
- Coverage 77.41% 21.45% -55.96%
============================================
Files 78 78
Lines 11161 11302 +141
Branches 2487 2521 +34
============================================
- Hits 8640 2425 -6215
- Misses 2403 8198 +5795
- Partials 118 679 +561
... and 52 files with indirect coverage changes
This looks awesome. Just left a couple comments and some discussion items.
```ts
/** @param { import("node-pg-migrate").MigrationBuilder } pgm */
exports.down = pgm => {
  pgm.dropIndex('contract_logs', 'value_json_path_ops_idx');
```
This index is never created; is this an old line?
```ts
contract_identifier: string;
topic: string;
value: PgBytea;
value_json: string;
```
It would be best if we set this as `string | null` IMO, to avoid storing empty strings.
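A minimal sketch of the suggested typing (the interface name and the `PgBytea` stub are placeholders for this illustration, not the actual names in the codebase):

```ts
type PgBytea = Buffer; // stub for illustration; the real type lives in the API's postgres helpers

interface ContractLogInsertValues {
  contract_identifier: string;
  topic: string;
  value: PgBytea;
  // null (rather than an empty string) when the log value has no JSON encoding
  value_json: string | null;
}
```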
```ts
/*
test.skip('generate', () => {
  const result = clarityCompactJsonVectors.map(([, hexEncodedCV, compactJson]) => {
    const encodedJson = clarityValueToCompactJson(hexEncodedCV);
    const reprResult = decodeClarityValueToRepr(hexEncodedCV);
    return [reprResult, hexEncodedCV, JSON.stringify(encodedJson)];
  });
  console.log(result);
});
*/
```
Remove?
src/tests/jsonpath-tests.ts
```ts
/*
test('generate vector data', () => {
  const result = postgresExampleVectors.map((input) => {
    const complexity = containsDisallowedJsonPathOperation(input);
    return [input, complexity ? complexity.operation : null];
  });
  console.log(result);
});
*/
```
Remove?
```ts
async function complete_1680181889941_contract_log_json(
```
Is there a pressing issue that would require us to ship this feature with this kind of migration? This code looks good to me (and I like this experimentation as you specified in the PR desc), but I worry that it could create some problems if we want to add newer migrations that rely on the new columns or something. IMO it would be easier and more maintainable to just ship this as a regular new migration and rely on replay, or to explicitly mark this code as a "patch" somehow which we could drop once we switch to a new v8 release.
What do you think?
I'd like to get more feedback on this feature to prioritize work on it. My sense is that if this were available, then A) contract authors would take more advantage of structured logs (i.e. print statements) and B) apps would leverage this API feature to make more straightforward and efficient queries.
Closes #1598
This PR adds two new query params to the `/extended/v1/contract/<principal>/events` endpoint:

- `?filter_path=<jsonpath expression>`: Optional `jsonpath` expression to select only results that contain items matching the expression.
- `?contains=<json object>`: Optional stringified JSON to select only results that contain the given JSON.
Rationale
Clarity smart contracts are able to generate structured log outputs during the execution of a contract-call or contract-deploy transaction. For more context see the Clarity `print` docs. These contract logs can then be ingested by an event-observer (such as the API) to store and serve contract-specific data. Today we have several examples of this:

- The `bns` contract emits logs used by the API to process, track, and serve BNS information to clients, like ownership, zonefiles, etc.
- The `pox` contract emits logs used by the API to store and track stacking-related data, which is used in account balance endpoints (i.e. `locked` balance), Rosetta queries, endpoints for Stacking pool operators, etc.
- The `send-many-stx` contract, used by essentially all exchanges, performs batch stx-transfers and uses contract logs to emit a corresponding memo, which the API stores and returns in an endpoint also used by exchanges to track when their wallets have received a batch stx-transfer and the associated memo.

These are some of the contracts and logs that this API and others currently have special handling for. Typically, each of these contracts requires unique code to denormalize the log payload and store it in a new table, plus new endpoints and sql queries.
There are many more deployed contracts which have their own application-specific usage of contract logs, with their own unique payloads. The API doesn't and cannot reasonably implement custom handling for all of the current and future contracts in the same way it has for the contracts listed above. Currently, clients must use the `/extended/v1/contract/{contract_id}/events` endpoint. This simply returns the latest N logs for a given contract. Clients must paginate through these requests, and manually parse out and filter the logs they need. Even for the contracts that already receive special treatment (e.g. `pox`, `bns`), there are requests for new endpoints with different query criteria, which would require us to implement new endpoints and sql queries.

In the Ethereum world, the ability to emit and query structured logs is a critical element of the app ecosystem. Token standards that you may be familiar with (e.g. ERC20) specify the logs that contracts must emit when token operations happen. The `eth_getLogs` RPC endpoint provides a mechanism for efficiently querying those logs, for example "give me all transfers for a given FT associated with a given recipient address". In Stacks, we have the building blocks for offering similar capabilities.
Contract log query implementation
Prior to this PR, contract logs (i.e. structured Clarity print outputs) were stored in the `contract_logs` table in their original consensus-serialized binary encoding. This encoding made it essentially impossible to perform queries against the contents of the log object. Postgres supports storing arbitrary structured documents in a `jsonb` column type. This column type also supports a few types of advanced indexes (e.g. `GIN(jsonb_ops)` and `GIN(jsonb_path_ops)`) that make it possible to perform efficient queries against the arbitrary JSON.

So what we can do is 1) decode the binary Clarity buffers into an object, 2) encode that into a JSON object, 3) store that in a new `jsonb` column in the `contract_logs` table, then 4) implement API endpoint(s) that allow more precise queries against these logs.

This PR implements the above. One of the design decisions was to implement yet another way to encode Clarity values into JSON. The existing schemas we have are lossless (you can reverse the JSON back into the original Clarity value); however, that comes with overhead from having to store Clarity type information (because there is not a 1-1 mapping between JSON primitives and Clarity primitives), so that JSON is less readable and takes up more space. In our use-case, we aren't necessarily concerned with preserving exact Clarity typing; rather, we want the JSON to be easy-to-query, readable, space-efficient, index-efficient, and to have good interoperability with JSON-based workflows (e.g. `jsonpath` expressions). Here is the spec for how the Clarity values are encoded into a JSON object in this PR:

- Optional `none` values are encoded as `null`.
- Response `err` values are encoded as an object with the key `_error`, which is set to the unwrapped error value.
- Buffers are encoded as an object with the key `hex` with the hex-encoded string as the value, and the key `utf8` with the utf8-encoded string as the value. When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD (�) will be used to represent those errors. Both of these are included because Clarity Buffers are used for storing both binary data and string data (e.g. BNS names).
- Principals are encoded as strings of the form `<address>` or `<address>.<contract_name>`.
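As a rough illustration of this encoding (the payload below is hypothetical, and details such as tuples mapping to JSON objects, ASCII strings mapping to JSON strings, and the exact hex formatting are assumptions for the sketch rather than taken from the spec above):

```ts
// Hypothetical Clarity print value (repr):
//   (tuple (op "name-revoke") (owner 'SP2J6ZY4...) (zonefile-hash 0x68656c6c6f) (expiry none))
// Possible compact JSON stored in the jsonb column:
const exampleValueJson = {
  op: 'name-revoke',
  owner: 'SP2J6ZY4...', // principals become plain strings
  'zonefile-hash': {
    hex: '68656c6c6f', // buffers carry both a hex and a utf8 representation
    utf8: 'hello',
  },
  expiry: null, // optional none becomes null
};
```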
Usage

Here's an example of how the subnets-nft-demo app can be refactored. The app fetches a list of all events, then on the client decodes the Clarity values and filters for specific events. This can now be done in a single query leveraging the new `filter_path` query param, which accepts a `jsonpath` expression.

Previous code:
https://github.com/hirosystems/subnets-nft-demo/blob/34e433a2d2893f36e1767ce635ee280baf1acbf6/src/stacks/apiCalls.ts#L96-L111
Example of doing the same thing with a single fetch using a `jsonpath` expression filter:
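(The snippet below is an illustrative sketch, not the original example from this PR; the API host, contract id, and payload field are assumptions.)

```ts
// Single request: let the API filter events server-side with a jsonpath expression
// instead of paginating through all events and decoding/filtering on the client.
const contractId = 'ST000000000000000000002AMW42H.subnet-nft'; // placeholder contract id
const filterPath = '$.type ? (@ == "nft-mint-event")';          // assumed payload shape
const url =
  `https://<api-host>/extended/v1/contract/${contractId}/events` +
  `?filter_path=${encodeURIComponent(filterPath)}`;
const { results } = await (await fetch(url)).json();
```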
Example of using the new `contains` param to perform a similar query:
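(Also an illustrative sketch with an assumed payload shape, not the PR's original snippet.)

```ts
// Same query expressed with `contains`: stringified JSON, matched by containment.
const contractId = 'ST000000000000000000002AMW42H.subnet-nft'; // placeholder contract id
const contains = JSON.stringify({ type: 'nft-mint-event' });
const containsUrl =
  `https://<api-host>/extended/v1/contract/${contractId}/events` +
  `?contains=${encodeURIComponent(contains)}`;
const containsResults = await (await fetch(containsUrl)).json();
```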
Here's another example of using a `jsonpath` expression filter to fetch recent BNS `name-revoke` and `name-renewal` events:
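(Illustrative sketch; the path into the BNS print payload is an assumption.)

```ts
// Select BNS log events whose op is either name-revoke or name-renewal.
const bnsContractId = 'SP000000000000000000002Q6VF78.bns';
const bnsFilterPath =
  '$.attachment.metadata.op ? (@ == "name-revoke" || @ == "name-renewal")';
const bnsUrl =
  `https://<api-host>/extended/v1/contract/${bnsContractId}/events` +
  `?filter_path=${encodeURIComponent(bnsFilterPath)}`;
```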
Here's an example of using the new `contains` param. This is also a filter, but it takes a JSON object and returns only the contract log events that contain the given object. Here's an example of fetching recent BNS `name-revoke` events:
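(Illustrative sketch; the nested payload shape is an assumption.)

```ts
// Return only BNS log events whose payload contains this nested object.
const bnsContractId = 'SP000000000000000000002Q6VF78.bns';
const bnsContains = JSON.stringify({ attachment: { metadata: { op: 'name-revoke' } } });
const bnsContainsUrl =
  `https://<api-host>/extended/v1/contract/${bnsContractId}/events` +
  `?contains=${encodeURIComponent(bnsContains)}`;
```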
Considerations

The `jsonpath` expressions supported in Postgres can include relatively complex operations, for example recursive key-value lookups, regex, math operations, type coercions, and more. Many of these cannot take advantage of the postgres JSON-specific indexes, which can result in unreasonably expensive queries. The vast majority of expected use-cases only need simple jsonpath expressions, which my testing so far has shown to be reasonably efficient. So we need to determine when a given jsonpath expression is "too complex" and reject the request. This PR has a very simple regex-based version of that implemented; however, these regexes are easy to trick and bypass. We need to validate against the actual AST of a jsonpath expression. Existing `jsonpath` parsing libraries in js do not support the postgres flavor of expressions, so I'm in the process of switching the regex-based validation to `jsonpath-pg`. This is a library I've written which compiles the grammar and lexer C code from the postgres codebase into WASM and exposes the AST object. This can be used to reliably determine the complexity of an expression.
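For illustration only (not the PR's actual validator), a regex-based check in this spirit might look like the sketch below; the specific disallowed patterns and the function name are assumptions, and the AST-based `jsonpath-pg` approach would replace this kind of pattern matching:

```ts
// Reject jsonpath expressions that use operations unlikely to be index-assisted.
// These patterns are illustrative; robust validation should inspect the parsed AST.
const disallowedJsonPathOps: [string, RegExp][] = [
  ['recursive descent (.**)', /\.\*\*/],
  ['regex match (like_regex)', /\blike_regex\b/i],
  ['type coercion methods (.double(), etc.)', /\.(double|datetime|keyvalue)\s*\(/i],
];

function findDisallowedJsonPathOperation(path: string): { operation: string } | false {
  for (const [operation, pattern] of disallowedJsonPathOps) {
    if (pattern.test(path)) {
      return { operation };
    }
  }
  return false;
}

// Example: findDisallowedJsonPathOperation('$.** ? (@ == "x")')
//   => { operation: 'recursive descent (.**)' }
```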
Alternatives

The capabilities described above could potentially be implemented with other tools, for example no-sql solutions like GraphQL or MongoDB. However, I'd like to explore the possibility of getting this approach to work before considering adding new systems/layers into the project's tech stack.
Upgrading + database migration
This PR changes the sql schema and adds a new non-nullable column to an existing table. Typically, we'd modify the existing migration tables (which breaks existing dbs) and release this as a major version that requires an event-replay. This is easy for API contributors; however, these kinds of breaking changes are difficult for deployments, both internal and external, because event-replays take increasingly more time and resources.
So this PR is experimenting with a more advanced sql schema migration which is compatible with existing deployments and does not require a major version bump or event-replay. Migration libraries (including the one we use) tend not to support advanced table manipulation like what this PR requires, so a new function is called directly after the regular migrations in order to perform the "advanced" migrations. In this case, it takes around half an hour to run this migration because it involves querying and updating every row in the `contract_logs` table (which internally postgres treats as a DELETE and INSERT, without the ability to batch).

In a future release when a breaking change is strictly required, the "advanced migration" function for this case can be dropped.
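A rough sketch of what this kind of "patch" migration can look like (not the PR's actual code; the table/column/index names, the primary key, and the decode helper's signature are assumptions):

```ts
import { Client } from 'pg';

// Assumed helper from this PR that encodes a hex-encoded Clarity value into compact JSON.
declare function clarityValueToCompactJson(hexEncodedCV: string): unknown;

// Called once, directly after the regular node-pg-migrate migrations have run.
async function completeContractLogJsonMigration(client: Client): Promise<void> {
  const colCheck = await client.query(
    `SELECT 1 FROM information_schema.columns
     WHERE table_name = 'contract_logs' AND column_name = 'value_json'`
  );
  if ((colCheck.rowCount ?? 0) > 0) {
    return; // column already exists, nothing to patch
  }

  await client.query(`ALTER TABLE contract_logs ADD COLUMN value_json jsonb`);

  // Backfill: decode every consensus-serialized Clarity value into compact JSON.
  // Each UPDATE is internally a DELETE+INSERT in postgres, hence the long runtime.
  const rows = await client.query<{ id: string; value: Buffer }>(
    `SELECT id, value FROM contract_logs`
  );
  for (const row of rows.rows) {
    const json = clarityValueToCompactJson(row.value.toString('hex'));
    await client.query(`UPDATE contract_logs SET value_json = $1 WHERE id = $2`, [
      JSON.stringify(json),
      row.id,
    ]);
  }

  await client.query(
    `CREATE INDEX contract_logs_value_json_path_ops_idx
       ON contract_logs USING GIN (value_json jsonb_path_ops)`
  );
}
```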
TODO