Commit f08dcf3

adding pct howto doc and removing outdated PCT Report
1 parent 8df7789 commit f08dcf3

2 files changed

+346
-1046
lines changed

docs/community/legend-pct-howto.md

Lines changed: 346 additions & 0 deletions
@@ -0,0 +1,346 @@
### Required Development Setup
Set up the development environment for the *legend-pure* and *legend-engine* repositories:
- https://github.com/finos/legend-pure/blob/master/README.md#development-setup
- https://github.com/finos/legend-engine/blob/master/README.md#development-setup
- Note: legend-engine depends on legend-pure. Once you have built legend-pure, and before building legend-engine, **modify** ```legend.pure.version``` in the legend-engine pom.xml to depend on the legend-pure **SNAPSHOT** version.

#### Test your Development Setup
Once you have built the repos below, try running Pure code in each.

*legend-pure*
- ```TestFunction_TestHelper_Compiled.java```
- ```TestFunction_TestHelper_Interpreted.java```


*legend-engine*

PureIDE provides a ```welcome.pure``` file. Press F9 within this file to execute Pure code.

# PCT Contribution Steps
*Awareness of the dev environment in which each step of the development loop occurs is key to maintaining velocity.*

| Language | Development Environment |
| --- | --- |
| Pure | [PureIDE](https://github.com/finos/legend-engine/blob/master/README.md#starting-pure-ide) |
| Java | IntelliJ |

The Pure Runtime cross-compiles *Pure native functions* to multiple targets. Each native function has a reference implementation in the Pure Runtime, and the Pure Runtime cross-compiles it to *equivalent* implementations in target runtimes.

Pure has two code paths for cross-compilation:
1) *Interpreted* - Pure code is compiled on the fly (useful for interactive dev).
2) *Compiled* - Pure code has already been compiled to a jar (useful for prod).

Note that a PCT Function can be Pure-only (implemented in Pure) or "Native" (implemented in Java). The steps below cover Native Functions, but Pure-only PCT Functions should follow the same principles and similar dev loops.

## Native PCT Functions
#### 1. Define: a) the Native Function Signature, and b) the Reference/Spec Implementation
The *Pure Reference/Spec Implementation* encapsulates the platform-specific expectations for the behavior of the function; it is implemented as a *Native Function*. It executes natively in the Pure Runtime and thus is written in Java.

When defining a Native Function Signature, it is ***important to compare the equivalent function on at least 2 targets***. Comparing a minimum of 2 targets helps ensure that the function signature and reference implementation are properly abstracted.

##### Example
These two queries do not yield equivalent results. Further, the parameters they accept are different.
```Java
// DuckDb
Select time_bucket(Interval '2 Day', timestamp '2024-01-31 00:32:34');
// results in
2024-01-31 00:00:00

// Snowflake
SELECT TIME_SLICE(TIMESTAMP_FROM_PARTS(2024, 01, 31, 00, 32, 34), 2, 'DAY', 'START')
// results in
2024-01-30 00:00:00.000
```
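The one-day difference between the two results can be reproduced with a small Java sketch. It is not the reference implementation; it only illustrates how the choice of origin shifts bucket boundaries. The class and method names are hypothetical, the DuckDB origin of 2000-01-03 is an assumption taken from DuckDB's documented default for day-based buckets, and the sketch assumes the timestamp is not earlier than the origin.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration class; not part of legend-pure or legend-engine.
class TimeBucketOriginSketch
{
    // Unix epoch origin.
    static final LocalDateTime UNIX_EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);
    // Assumption: DuckDB's documented default origin for buckets narrower than a month.
    static final LocalDateTime DUCKDB_DEFAULT_ORIGIN = LocalDateTime.of(2000, 1, 3, 0, 0);

    // Floors ts to the start of its bucket, counting buckets of (quantity x unit) from origin.
    static LocalDateTime bucket(LocalDateTime ts, long quantity, ChronoUnit unit, LocalDateTime origin)
    {
        long unitsSinceOrigin = unit.between(origin, ts);
        long bucketStart = Math.floorDiv(unitsSinceOrigin, quantity) * quantity;
        return origin.plus(bucketStart, unit);
    }

    public static void main(String[] args)
    {
        LocalDateTime ts = LocalDateTime.of(2024, 1, 31, 0, 32, 34);
        System.out.println(bucket(ts, 2, ChronoUnit.DAYS, DUCKDB_DEFAULT_ORIGIN)); // 2024-01-31T00:00
        System.out.println(bucket(ts, 2, ChronoUnit.DAYS, UNIX_EPOCH));            // 2024-01-30T00:00
    }
}
```

With a 2-day bucket, the same timestamp lands on 2024-01-31 or 2024-01-30 depending only on the origin, which is why a cross-platform signature has to pin the origin down.
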
Understanding the minimal common elements is key to determining the appropriate Native Function signature. Consider the set of parameters and the input/output required for cross-platform compilation.

```Java
// for the above, the native function signature we ended up with was:
native function
<<PCT.function>>
{
    doc.doc='calculates a time bucket for DateTime, where the bucket is of size quantity unit (e.g. 5 Hour). Uses unix Epoch as the origin for the calculation.'
}
meta::pure::functions::date::timeBucket(date:DateTime[1], quantity:Integer[1], unit:DurationUnit[1]):DateTime[1];

//notice how we have a docstring as part of the function signature
```
**Important: docstrings are required**

In this example, we only needed one function signature. A native function can have more than one signature. **All signatures for the same native function should be written in the same file.** E.g. the above native function signature and its associated PCT functions exist in *timeBucket.pure*.

###### Dev Envs: PureIDE (native function signature); IntelliJ (reference/spec implementation); target runtimes for testing behavior of the equivalent function (e.g. DuckDb, Snowflake)

#### 2. Native Function Development Loop
##### 2.i. Start by writing a draft native function signature in Pure
Depending on which repo your function belongs in (see 2.ii), use welcome.pure in legend-engine or TestFunction_TestHelper_Interpreted.java in legend-pure.

##### How to use TestFunction_TestHelper_Interpreted.java
TestFunction_TestHelper_Interpreted.java (legend-pure)
```Java
// Excerpt from the test class; imports and the class declaration are omitted.
@After
public void cleanRuntime()
{
    runtime.delete("myScratchFn.pure");
}

protected static FunctionExecution getFunctionExecution()
{
    return new FunctionExecutionInterpreted();
}

@Test
public void testNativeFunctionTesterHelperBeforeAddingToPCT()
{
    compileTestSource("myScratchFn.pure",
            "native function meta::pure::functions::standard::myScratchFn(myString:String[1]):String[2];" +
                    // this is an example of a pure function call
                    "function meta::functions::myScratchFn(myString:String[1]):String[2]" +
                    "{" +
                    "   println('123');" +
                    "   let myStrList = [$myString, $myString];" +
                    "   println($myStrList);" +
                    "   $myStrList;" +
                    "}" +
                    "function test():Any[*]\n" +
                    "{" +
                    "   let myString = 'abc';" +
                    "   meta::functions::myScratchFn($myString);" +
                    // if you call this function the call would fail due to missing wiring
                    "   meta::pure::functions::standard::myScratchFn($myString); " +
                    "}");
    this.execute("test():Any[*]");
    runtime.delete("myScratchFn.pure");
}
```

##### How to use PureIDE welcome.pure (legend-engine)
The same code passed as the second param of "compileTestSource" in the TestHelper example above can be written in the go():Any[*] function of welcome.pure. In PureIDE, pressing F9 executes the code.
```Java
function go():Any[*]
{
    // Your pure code here
}
```

###### Dev Envs: PureIDE welcome.pure (legend-engine) or IntelliJ TestFunction_TestHelper_Interpreted.java (legend-pure)

##### 2.ii. Determine where the signature of the PCT Function and associated tests should reside.
Once you know how to run Pure code, begin your development loop. Leverage the PCT Taxonomy conventions and the results of weekly Taxonomy review sessions to determine the repo/package in which to put your Pure code.

###### PCT Taxonomy Conventions
The structure of *.pure files containing Platform Functions has been harmonized to reflect the four categories of Pure Functions:
1. Grammar (legend-pure) - functions needed for the grammar of the platform. Should almost never be modified. If needed, consult with a Legend CODEOWNER.
2. Essential (legend-pure) - foundational functions required for running tests in the platform (e.g. assert, eq). Should almost never be modified. If needed, consult with a Legend CODEOWNER.
3. Standard (legend-engine) - the majority of platform functions fall into this category
4. Relation (legend-engine) - platform functions specific to operating on relations (e.g. join)

It has not yet been possible to harmonize Pure packages with the new taxonomy reflected by the filesystem. Where it makes sense (e.g. adding a brand new package or function), the package name should mirror the new taxonomy reflected by the filesystem. The best example of the ideal taxonomy is the directory structure of legend-pure essential functions; however, please consult with an SME to confirm the proposed package/naming for your PCT function. In the future, Pure package naming will be harmonized to follow the new taxonomy.

##### Example
```Java
// PCT Function Signature containing docstring; note that it specifies the expected params, output, and multiplicity
native function
<<PCT.function>>
{
    doc.doc='calculates a time bucket for DateTime, where the bucket is of size quantity unit (e.g. 5 Hour). Uses unix Epoch as the origin for the calculation.'
}
meta::pure::functions::date::timeBucket(date:DateTime[1], quantity:Integer[1], unit:DurationUnit[1]):DateTime[1];

// this is a "standard" function and so its signature and implementation live in this package:
legend-engine-core/legend-engine-core-pure/legend-engine-pure-code-functions-standard/

// in the above function signature, you can see the current pure package is
meta::pure::functions::date

// filesystem taxonomy of timeBucket.pure file containing the PCT function (in legend-engine)
core_functions_standard/date/operation/timeBucket.pure

// based on the filesystem taxonomy, ideally the package would be named thusly:
meta::pure::functions::date::operation

// however, in this instance we kept the existing pure package taxonomy for dates - until the taxonomy has been upgraded, please speak to an SME to double check your proposed package/naming
```

###### Dev Envs: PureIDE code search (legend-engine) or IntelliJ code search (legend-pure)

##### 2.iii. Start writing some Pure Compatibility Tests (PCT) to encapsulate the understanding gained in 1., being careful to test for edge cases.
**PCT Tests should be written in the same file as your native function signature**

Note: test package naming is important (examples below).
###### Example
```Java
// Example Good package name: a test package named like so
meta::pure::functions::date::tests::timeBucket

// can be registered in expectedFailures with one line for unsupported targets
pack("meta::pure::functions::date::tests::timeBucket", "\"meta::pure::functions::date::timeBucket_DateTime_1__Integer_1__DurationUnit_1__DateTime_1_ is not supported yet!\"")

// Example Bad package name: a test package named like so can only be registered in expectedFailures one by one
meta::pure::functions::date::tests
```
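
The reasoning behind the naming advice can be sketched in Java. This is not the legend-engine expected-failures API: `matchesPack` is a hypothetical stand-in that assumes a package-level rule matches every test whose qualified name sits under that package, and the test names other than `testTimeBucketSeconds` are made up for illustration.

```java
import java.util.List;

// Hypothetical illustration class; not part of legend-engine.
class ExpectedFailuresSketch
{
    // Assumption: a package-level rule matches every test whose qualified name is under the package.
    static boolean matchesPack(String pack, String testName)
    {
        return testName.startsWith(pack + "::");
    }

    // Counts how many tests a single package-level rule would cover.
    static long countMatches(String pack, List<String> tests)
    {
        return tests.stream().filter(t -> matchesPack(pack, t)).count();
    }

    public static void main(String[] args)
    {
        // Good layout: each function's tests live in their own package.
        List<String> good = List.of(
                "meta::pure::functions::date::tests::timeBucket::testTimeBucketSeconds",
                "meta::pure::functions::date::tests::timeBucket::testTimeBucketHours",
                "meta::pure::functions::date::tests::otherFn::testOtherFn");
        // One rule covers exactly the timeBucket tests.
        System.out.println(countMatches("meta::pure::functions::date::tests::timeBucket", good));

        // Bad layout: tests for different functions share one package.
        List<String> bad = List.of(
                "meta::pure::functions::date::tests::testTimeBucketSeconds",
                "meta::pure::functions::date::tests::testOtherFn");
        // The only available package rule also swallows tests of unrelated functions.
        System.out.println(countMatches("meta::pure::functions::date::tests", bad));
    }
}
```

In the shared-package layout, a package-level rule is too broad, so each test has to be registered individually.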

###### Dev Envs: PureIDE welcome.pure (legend-engine) or IntelliJ TestFunction_TestHelper_Interpreted.java (legend-pure)

##### 2.iv. Write the Reference/Spec implementation
Your native function implementation (Java code) will be split across 3 modules in the taxonomy package:
1. -compiled- the compiled module of the function taxonomy contains the wiring needed by the Pure Runtime to *generate the compiled* Java code
2. -interpreted- the interpreted module of the function taxonomy contains the wiring needed for the Pure Runtime to *interpret* and compile Pure code on the fly
3. -shared- the class containing the code common to 1 and 2 should be put in this module. These utility classes help avoid duplication.
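
The split can be pictured with a minimal Java sketch. All class names here are hypothetical, and the real compiled and interpreted wiring classes build on Pure Runtime types that are not shown; the only point is that both paths delegate to one shared implementation.

```java
import java.util.List;

// Hypothetical "-shared-" class: the single place where the behavior is implemented.
class MyScratchFnSharedSketch
{
    static List<String> myScratchFn(String myString)
    {
        return List.of(myString, myString);
    }
}

// Hypothetical "-interpreted-" wiring: unwraps runtime values, then delegates.
class MyScratchFnInterpretedSketch
{
    static List<String> execute(Object rawArgument)
    {
        // Assumption: the interpreter hands over loosely typed values that need unwrapping.
        return MyScratchFnSharedSketch.myScratchFn((String) rawArgument);
    }
}

// Hypothetical "-compiled-" wiring: generated code calls the shared logic with typed values.
class MyScratchFnCompiledSketch
{
    static List<String> myScratchFn(String myString)
    {
        return MyScratchFnSharedSketch.myScratchFn(myString);
    }
}
```

Because both wrappers are thin, the interpreted and compiled paths cannot drift apart in behavior.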

Note that you can wire the function in even if it hasn't been fully written. Use the helpers described in step 3.i. to test out the wiring of your native function.

###### Example
See [TimeBucketShared in the example PR](https://github.com/finos/legend-engine/pull/3491/files#diff-6fdf9132208dc083a35c61ff70294ea2eb3b51e12f167a89f69c4a66bc741c6a).

###### Dev Env: IntelliJ

##### 2.v. Running PCTs against the Native Function
Once the native function is wired across the compiled and interpreted paths, you can iterate to refine the signature, implementation, and tests using the *InMemory PCT Adapter*. PCT Adapters are functions that enable the cross-compilation and evaluation of PCT functions on target environments. The InMemory adapter passes through to the Pure Runtime and is useful for running the native reference/spec implementation.

###### Example Usage
```Java
let inmemoryadapter = meta::pure::test::pct::testAdapterForInMemoryExecution_Function_1__X_o_;

// runs the PCT "testTimeBucketSeconds" against Reference Implementation (eval in pure runtime)
meta::pure::functions::date::tests::testTimeBucketSeconds($inmemoryadapter);
```
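
As a rough analogy in Java, a PCT can be thought of as a test that takes its evaluator as a parameter. The sketch below is only a conceptual model under that assumption; the names are hypothetical and it does not reflect how adapters are implemented in Pure.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration class; not part of legend-pure or legend-engine.
class PctAdapterSketch
{
    // The test hands the expression to the adapter, then asserts on whatever comes back.
    static <T> boolean runTest(Function<Supplier<T>, T> adapter, Supplier<T> expression, T expected)
    {
        T actual = adapter.apply(expression); // where evaluation happens is the adapter's choice
        return expected.equals(actual);       // the comparison itself always happens locally
    }

    // In-memory adapter: a pass-through that evaluates the expression in the local runtime.
    static <T> Function<Supplier<T>, T> inMemoryAdapter()
    {
        return Supplier::get;
    }

    public static void main(String[] args)
    {
        boolean passed = runTest(PctAdapterSketch.<Integer>inMemoryAdapter(), () -> 1 + 1, 2);
        System.out.println(passed);
    }
}
```

Swapping in a different adapter changes where the expression is evaluated without touching the test body, which is what lets the same PCT run against every target.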

###### Dev Envs: PureIDE welcome.pure (legend-engine) or IntelliJ TestFunction_TestHelper_Interpreted.java (legend-pure)

#### 3. Target (cross-compilation) development loop
Once you have completed the Native Function development loop, it is time to wire your native function to cross-compile to the target environments (legend-engine).

##### 3.i. PCTs and PureToTarget wiring (e.g. PureToSQL).
PCTs are useful for testing your wiring, as they are evaluated in the context of the target platform. The assert happens in the Pure Runtime, but the assertion is over the result of an eval on the target platform (i.e. it checks that the cross-compilation behaves as the platform expects).

PCT Adapters are functions that enable the cross-compilation of PCTs to target environments. To find the current list of adapters, search for the phrase ```<<PCT.adapter>>``` in PureIDE. Notes on selected adapters are below:

| Target | Notes |
| --- | --- |
| Snowflake | Snowflake integration requires a connection to your Snowflake account. Look at the code in SnowflakeTestConnectionIntegration and set up the needed env vars for accessing Snowflake in order to run this adapter. |
| Java Platform Binding | The Legend platform can generate Java code from Pure code. Think of it as the equivalent of pureToSQL, but pureToJava instead. This is distinct from the Pure Runtime's own Java code: it is code generated by the Pure Runtime to encapsulate queries meant to run on a Java Target Environment. |

##### Example Target Adapter Usage
```Java
let duckdbadapter = meta::relational::tests::pct::testAdapterForRelationalWithDuckDBExecution_Function_1__X_o_;

// runs the PCT "testTimeBucketSeconds" against DuckDb (eval on DuckDb)
meta::pure::functions::date::tests::testTimeBucketSeconds($duckdbadapter);
```

##### Example Target SQL wiring for timeBucket native function
```Java
// PureToDuckDb - this code was added in duckdbExtension.pure
dynaFnToSql('timeBucket', $allStates, ^ToSql(format='cast(time_bucket(%s) as timestamp_s)', transform={p:String[3] | constructIntervalFunction($p->at(2), $p->at(1)) + ', ' + $p->at(0) + ', ' + constructTimeBucketOffset($p->at(2))})),

// DuckDb uses a different origin for calculation of timebuckets; this offset helps to standardize toward unix epoch as origin and
// the offset for intervals < WEEK are set to align with Snowflake's methodology, as opposed to that which is outlined in DuckDb
// ref: https://github.com/duckdb/duckdb/blob/68bd4a5277430245e3d9edf1abbb9813520a3dff/extension/core_functions/scalar/date/time_bucket.cpp#L18
function meta::relational::functions::sqlQueryToString::duckDB::constructTimeBucketOffset(unit:String[1]):String[1]
{
    let unitWithoutQuotes = $unit->removeQuotesIfExist();
    let ISOMondayEpochOffset = 'timestamp \'1969-12-29 00:00:00\'';
    let EpochOffset = 'timestamp \'1970-01-01 00:00:00\'';

    let offset = [
        pair(DurationUnit.YEARS->toString(), $EpochOffset),
        pair(DurationUnit.MONTHS->toString(), $EpochOffset),
        pair(DurationUnit.WEEKS->toString(), $ISOMondayEpochOffset),
        pair(DurationUnit.DAYS->toString(), $EpochOffset),
        pair(DurationUnit.HOURS->toString(), $EpochOffset),
        pair(DurationUnit.MINUTES->toString(), $EpochOffset),
        pair(DurationUnit.SECONDS->toString(), $EpochOffset)
    ]->filter(p | $p.first == $unitWithoutQuotes).second->toOne('Unit not found: ' + $unitWithoutQuotes);
}

//------------------------
// PureToSnowflake - this code was added in snowflakeExtension.pure
dynaFnToSql('timeBucket', $allStates, ^ToSql(format='TIME_SLICE(%s)', transform={p:String[3]|$p->at(0) + ', ' + constructInterval($p->at(2), $p->at(1))})),

function meta::relational::functions::sqlQueryToString::snowflake::constructInterval(unit:String[1], i:String[1]):String[1]
{
    let unitWithoutQuotes = $unit->removeQuotesIfExist();

    let interval = [
        pair(DurationUnit.YEARS->toString(), '\'YEAR\''),
        pair(DurationUnit.MONTHS->toString(), '\'MONTH\''),
        pair(DurationUnit.WEEKS->toString(), '\'WEEK\''),
        pair(DurationUnit.DAYS->toString(), '\'DAY\''),
        pair(DurationUnit.HOURS->toString(), '\'HOUR\''),
        pair(DurationUnit.MINUTES->toString(), '\'MINUTE\''),
        pair(DurationUnit.SECONDS->toString(), '\'SECOND\'')
    ]->filter(p | $p.first == $unitWithoutQuotes).second->toOne('Unit not supported: ' + $unitWithoutQuotes);

    $i + ', ' + $interval;
}
```
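
To make the Snowflake wiring easier to follow, here is a Java re-expression of what the `format` and `transform` pair appears to produce. It is a sketch only: the class is hypothetical, and it assumes the three parameters arrive already rendered as SQL strings in signature order (date, quantity, unit), with the unit possibly wrapped in single quotes.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; mirrors the Pure snippet above, not an engine API.
class TimeSliceSqlSketch
{
    // Same mapping as constructInterval: Pure DurationUnit name -> Snowflake date part literal.
    private static final Map<String, String> UNITS = Map.of(
            "YEARS", "'YEAR'", "MONTHS", "'MONTH'", "WEEKS", "'WEEK'", "DAYS", "'DAY'",
            "HOURS", "'HOUR'", "MINUTES", "'MINUTE'", "SECONDS", "'SECOND'");

    // Strips one pair of surrounding single quotes, if present.
    static String removeQuotesIfExist(String s)
    {
        boolean quoted = s.length() >= 2 && s.startsWith("'") && s.endsWith("'");
        return quoted ? s.substring(1, s.length() - 1) : s;
    }

    static String constructInterval(String unit, String quantity)
    {
        String key = removeQuotesIfExist(unit);
        String datePart = UNITS.get(key);
        if (datePart == null)
        {
            throw new IllegalArgumentException("Unit not supported: " + key);
        }
        return quantity + ", " + datePart;
    }

    // Applies the transform to the parameters, then substitutes the result into the format string.
    static String toSnowflakeSql(List<String> p)
    {
        return String.format("TIME_SLICE(%s)", p.get(0) + ", " + constructInterval(p.get(2), p.get(1)));
    }

    public static void main(String[] args)
    {
        System.out.println(toSnowflakeSql(List.of("my_ts", "2", "'DAYS'"))); // TIME_SLICE(my_ts, 2, 'DAY')
    }
}
```

Note how the parameter order changes between the Pure signature (date, quantity, unit) and the emitted SQL; the DuckDb wiring reorders them differently and adds an origin offset.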

###### Dev Envs: PureIDE (legend-engine)

##### 3.ii. Native Function Registration in Handlers.java and other relevant files.
To find the places where you may need to wire in the function, use Ctrl+Shift+F (IntelliJ) to see how similar functions were wired.

###### Dev Env: IntelliJ

##### 3.iii. Update ExpectedFailures of PCT Targets where you have not yet implemented the function.
In legend-engine, use Ctrl+Shift+N (IntelliJ file search) to search for ```Test_*_PCT``` and find the files of the targets where you need to register your expectedFailures.

You can build these modules individually to run the tests - avoid rebuilding the entire project.

###### Dev Env: IntelliJ

##### 3.iv. Leverage Adapters to easily run the PCT Tests written in the Native Function Development Loop against target environments.
You may need to loop between this step and earlier steps to improve what you wrote previously. Remember that if you are developing in legend-pure, ```TestFunction_TestHelper_Compiled.java``` and ```TestFunction_TestHelper_Interpreted.java``` enable you to run Pure code without having to rebuild/restart PureIDE in legend-engine.

###### Dev Envs: PureIDE (legend-engine) or IntelliJ TestFunction_TestHelper_Interpreted.java (legend-pure)

-----------------
#### 4. FINAL Step: Preparing for PR
This should be the final step and should only happen once. ***Maven builds are expensive and should be avoided until absolutely necessary.***
Run mvn clean install with tests to identify any tests that could fail due to your new module.

##### Example command to build with threads
```
mvn clean install -T 3
```

#### Example PRs
##### New Native Function
[timeBucket](https://github.com/finos/legend-engine/pull/3491/files)

##### Conversion of Existing Function to PCT
<https://github.com/finos/legend-engine/pull/3424/files#diff-00c42b86368a6fcd15328a4041dbacdd70df22b3ecab95e3c75138af224c3f2e>

-------------------------

# Appendix
## Legend Platform Conventions
Conventions ensure that we can achieve our **Goal - a clean (minimal) and transparent platform api**.

It is critical that the utmost care is taken when deciding on:
* Function Signature
* Code Location
* Naming/Style

### Style
- One file per PCT Function
- All Function Signatures for the PCT Function belong in the same file
- All PCT tests for the PCT function belong in the same file
- package names are all lower-case
- function names are camelCase, with the first letter lower-case
- function signatures **must have a docstring (doc.doc)**

### Practices
PCTs measure the level of cross-target support for a given Platform Function. When contributing to PCT on Legend, keep this preference order in mind:

``` PCT Passed(Green) > Failed PCT with Good Error Message > Failed PCT ```

*One key priority is to improve error messages on the platform - Good Error Messages are important.*

Note: It is highly unlikely you will need to make changes to existing reference specs/implementations. If you feel the need, contact a CODEOWNER about your proposed change.
