
Commit 295d109

Authored by HenriqueTolentino, yamini, alexahaushalter, kirit93, and nina-xu

GA Blueprint Release (#574)

* Update blueprints for launch (#553)
* add evaluate step
* update blueprints for GA
* new blueprint cards for DD & SS
  - Pulled in notebooks and new structure for DD notebooks from main
  - Added new blueprint cards
    -- DD multi-turn, rag-eval, text-to-code
    -- SS hipaa and gdpr
  - Added details pages
  - Added sample datasets for SS
* fixed details file link and added placeholder details for transform
* updates to links, new transform details
* update copy
* minor changes
* tweaks
* still tweaking copy
* copy changes
* Navigator removal & Transform config updates (#558)
* Add otnotes5 data sample and preview (#562)
* Lots of blueprint updates for GA & v2 (#561)
* Many blueprint updates for GA
* Adding rope_scaling_factor to the default config
* Fixing where rope_scaling_factor goes
* Updating .md files to remove outdated info & v1 notebook links
* Reducing tabFT num records to 1000 default for faster runs
* Removing old images & icons
* Fixing name of file
* Trying to get rid of errors
* underscore to hyphen typo
* Changing model blueprints to use a workflow config (#563)
* Changing model blueprints to use a workflow config
* Updating to use new evaluate task name
* Updating task names (#564)
* updating task names
* Swapping order for consistency
* Fixing hyphen vs underscore typo
* Updated dd image for blueprint card (#569)
* Updated dd image
* Updated blueprint card to add 101 notebook
* Updated gretel.json

---------

Co-authored-by: Kirit93 <kthadaka@nvidia.com>

* Updated notebook links (#572)

Co-authored-by: Kirit93 <kthadaka@nvidia.com>

* Fixing small typo (#573)
* Updating num_records to be 1000 across the board (#575)
* Update gretel.json with DD notebook link (#580)
* Update gretel.json (#583)
* nit fixes to workflow blueprints (#585)
* nit: make spacing consistent
* fix num_records for one blueprint
* change hyphen to underscore in task names
* add collab links pointing to main and 101 notebook (#587)
* Update workflow tests
* Improve tests
* Ensure we catch errors that are returned with 200
* Also log in "200 with error" responses

---------

Co-authored-by: Yamini <yamini@users.noreply.github.com>
Co-authored-by: alexahaushalter <alexahaushalter@hotmail.com>
Co-authored-by: Kirit Thadaka <kirit.thadaka@gmail.com>
Co-authored-by: Kirit93 <kthadaka@nvidia.com>
Co-authored-by: Nina Xu <nina.ning.xu@gmail.com>
Co-authored-by: Matt Kornfield <mckornfield@gmail.com>
Co-authored-by: Henrique Tolentino <htolentino@nvidia.com>
1 parent 73bfa97 commit 295d109


56 files changed: +9084 / -462 lines

config_templates/gretel/synthetics/navigator-ft-differential-privacy.yml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ models:
       order_training_examples_by: null

      generate:
-        num_records: 5000
+        num_records: 1000

        # With DP, enabling structured generation can help with
        # increasing the percentage of valid records.

config_templates/gretel/synthetics/navigator-ft.yml

Lines changed: 11 additions & 2 deletions
@@ -17,7 +17,7 @@ models:
       order_training_examples_by: null

      generate:
-        num_records: 5000
+        num_records: 1000

      params:
        # The parameter below is a proxy for training time.
@@ -28,4 +28,13 @@ models:
        # (we downsample), larger (we resample), or the same
        # size as your input dataset. A starting value to
        # experiment with is 25,000.
-        num_input_records_to_sample: auto
+        num_input_records_to_sample: auto
+
+        # Scale the base LLM's context length by this factor
+        # using RoPE scaling to handle datasets with more
+        # columns, or datasets containing groups with more
+        # than a few records. You can try increasing the
+        # rope_scaling_factor (you could first try the value 2)
+        # if you hit an error for maximum tokens. It must be
+        # an integer value. The default is 1 and maximum is 6.
+        rope_scaling_factor: 1
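Assembled from the two hunks above, the updated generation and sampling settings in navigator-ft.yml now read roughly as follows. This is a condensed sketch: the long explanatory comments are shortened, and the surrounding model keys and exact indentation are assumed rather than copied from the file.

      generate:
        num_records: 1000                  # reduced from 5000 for faster default runs

      params:
        num_input_records_to_sample: auto  # proxy for training time; try 25,000 as a starting point
        # Integer, default 1, maximum 6; try 2 first if generation hits a
        # maximum-token error on wide datasets or large record groups.
        rope_scaling_factor: 1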

config_templates/gretel/tasks/tabular_ft__default.yaml

Lines changed: 31 additions & 24 deletions
@@ -3,31 +3,38 @@ name: default
 task:
   name: tabular_ft
   config:
-    train:
-      # Optionally group records by the column(s) set below.
-      # This is useful if you need to maintain correlations
-      # across multiple records. Otherwise, the training
-      # assumes records are independent.
-      group_training_examples_by: null
+    train:
+      # Optionally group records by the column(s) set below.
+      # This is useful if you need to maintain correlations
+      # across multiple records. Otherwise, the training
+      # assumes records are independent.
+      group_training_examples_by: null

-      # Optionally order records by the column set below.
-      # This is useful if your records are sequential.
-      # Note that this parameter can only be used when
-      # your records are grouped using the above parameter.
-      order_training_examples_by: null
+      # Optionally order records by the column set below.
+      # This is useful if your records are sequential.
+      # Note that this parameter can only be used when
+      # your records are grouped using the above parameter.
+      order_training_examples_by: null

-    params:
-      # The parameter below is a proxy for training time.
-      # If set to 'auto', we will automatically choose an
-      # appropriate value. An integer value will set the
-      # number of records from the input dataset that the
-      # model will see during training. It can be smaller
-      # (we downsample), larger (we resample), or the same
-      # size as your input dataset. A starting value to
-      # experiment with is 25,000.
-      num_input_records_to_sample: auto
-
-    generate:
-      num_records: 5000
+    params:
+      # The parameter below is a proxy for training time.
+      # If set to 'auto', we will automatically choose an
+      # appropriate value. An integer value will set the
+      # number of records from the input dataset that the
+      # model will see during training. It can be smaller
+      # (we downsample), larger (we resample), or the same
+      # size as your input dataset. A starting value to
+      # experiment with is 25,000.
+      num_input_records_to_sample: auto

+      # Scale the base LLM's context length by this factor
+      # using RoPE scaling to handle datasets with more
+      # columns, or datasets containing groups with more
+      # than a few records. You can try increasing the
+      # rope_scaling_factor (you could first try the value 2)
+      # if you hit an error for maximum tokens. It must be
+      # an integer value. The default is 1 and maximum is 6.
+      rope_scaling_factor: 1

+    generate:
+      num_records: 1000
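Stitching the added lines together, the tabular_ft default task template ends up roughly as below. A condensed sketch: the schema_version line and the full comments are omitted, and the indentation is approximate.

    name: default
    task:
      name: tabular_ft
      config:
        train:
          group_training_examples_by: null   # optional grouping to preserve cross-record correlations
          order_training_examples_by: null   # optional ordering; requires grouping above
        params:
          num_input_records_to_sample: auto  # proxy for training time; try 25,000 as a starting point
          rope_scaling_factor: 1             # integer, 1 (default) to 6; raise on maximum-token errors
        generate:
          num_records: 1000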

config_templates/gretel/tasks/tabular_ft__differential_privacy.yaml

Lines changed: 50 additions & 43 deletions
@@ -3,46 +3,53 @@ name: differential_privacy
 task:
   name: tabular_ft
   config:
-    train:
-      # Optionally group records by the column(s) set below.
-      # This is useful if you need to maintain correlations
-      # across multiple records. Otherwise, the training
-      # assumes records are independent.
-      group_training_examples_by: null
-
-      # Optionally order records by the column set below.
-      # This is useful if your records are sequential.
-      # Note that this parameter can only be used when
-      # your records are grouped using the above parameter.
-      order_training_examples_by: null
-
-    privacy_params:
-      dp: true
-
-      # Defines the privacy budget - the larger the value, the
-      # less privacy we get. A value between 2 and 8 is deemed
-      # reasonable, usually.
-      epsilon: 8
-
-    params:
-      # The parameter below is a proxy for training time.
-      # If set to 'auto', we will automatically choose an
-      # appropriate value. An integer value will set the
-      # number of records from the input dataset that the
-      # model will see during training. It can be smaller
-      # (we downsample), larger (we resample), or the same
-      # size as your input dataset. A starting value to
-      # experiment with is 25,000.
-      num_input_records_to_sample: auto
-
-      # You can try increasing this until you run out-of-memory.
-      batch_size: 4
-
-    generate:
-      num_records: 5000
-
-      # With DP, enabling structured generation can help with
-      # increasing the percentage of valid records.
-      use_structured_generation: true
-
-
+    train:
+      # Optionally group records by the column(s) set below.
+      # This is useful if you need to maintain correlations
+      # across multiple records. Otherwise, the training
+      # assumes records are independent.
+      group_training_examples_by: null
+
+      # Optionally order records by the column set below.
+      # This is useful if your records are sequential.
+      # Note that this parameter can only be used when
+      # your records are grouped using the above parameter.
+      order_training_examples_by: null
+
+    privacy_params:
+      dp: true
+
+      # Defines the privacy budget - the larger the value, the
+      # less privacy we get. A value between 2 and 8 is deemed
+      # reasonable, usually.
+      epsilon: 8
+
+    params:
+      # The parameter below is a proxy for training time.
+      # If set to 'auto', we will automatically choose an
+      # appropriate value. An integer value will set the
+      # number of records from the input dataset that the
+      # model will see during training. It can be smaller
+      # (we downsample), larger (we resample), or the same
+      # size as your input dataset. A starting value to
+      # experiment with is 25,000.
+      num_input_records_to_sample: auto
+
+      # Scale the base LLM's context length by this factor
+      # using RoPE scaling to handle datasets with more
+      # columns, or datasets containing groups with more
+      # than a few records. You can try increasing the
+      # rope_scaling_factor (you could first try the value 2)
+      # if you hit an error for maximum tokens. It must be
+      # an integer value. The default is 1 and maximum is 6.
+      rope_scaling_factor: 1
+
+      # You can try increasing this until you run out-of-memory.
+      batch_size: 4
+
+    generate:
+      num_records: 1000
+
+      # With DP, enabling structured generation can help with
+      # increasing the percentage of valid records.
+      use_structured_generation: true
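The differential-privacy variant follows the same layout, adding the DP-specific knobs next to the new rope_scaling_factor. Condensed from the hunk above, with comments shortened and indentation approximate:

        privacy_params:
          dp: true
          epsilon: 8                        # privacy budget; values between 2 and 8 are usually reasonable
        params:
          num_input_records_to_sample: auto
          rope_scaling_factor: 1            # integer, 1 (default) to 6
          batch_size: 4                     # can be increased until you run out of memory
        generate:
          num_records: 1000
          use_structured_generation: true   # helps increase the share of valid records under DP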

config_templates/gretel/tasks/tabular_gan__default.yaml

Lines changed: 1 addition & 1 deletion
@@ -13,4 +13,4 @@ task:
       batch_size: auto
       auto_transform_datetimes: False
     generate:
-      num_records: 5000
+      num_records: 1000

config_templates/gretel/tasks/text_ft__default.yaml

Lines changed: 1 addition & 1 deletion
@@ -13,5 +13,5 @@ task:
       lr_scheduler: "linear"
       learning_rate: 0.0001
     generate:
-      num_records: 80
+      num_records: 1000
       maximum_text_length: 100

config_templates/gretel/tasks/text_ft__differential_privacy.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 schema_version: "1.0"
-name: default
+name: differential_privacy
 task:
   name: text_ft
   config:
@@ -23,5 +23,5 @@ task:
       epsilon: 5 # Privacy budget (lower values = stronger privacy)
       delta: auto # Probability of privacy leakage (auto-calculated)
     generate:
-      num_records: 80 # Number of records to generate
+      num_records: 1000 # Number of records to generate
       maximum_text_length: 128 # Maximum length of generated texts in tokens
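After this change, the text_ft differential-privacy template carries the corrected name and the larger generation default. A sketch pieced together from the two hunks above; the sections between config and the privacy settings are not shown in the diff, so their enclosing keys are omitted here.

    schema_version: "1.0"
    name: differential_privacy
    task:
      name: text_ft
      config:
        # ... training settings and enclosing privacy section unchanged ...
        epsilon: 5      # Privacy budget (lower values = stronger privacy)
        delta: auto     # Probability of privacy leakage (auto-calculated)
        generate:
          num_records: 1000            # Number of records to generate
          maximum_text_length: 128     # Maximum length of generated texts in tokens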
