
SparkMLModel SAGEMAKER_SPARK_ML_SCHEMA can only accept 16 features #12

Open
rchazelle opened this issue May 16, 2020 · 4 comments


@rchazelle

Hello, I would like to understand why this limitation is in place. Presumably most machine learning models take in far more than 16 features.

I created a model with over 100 features. When I passed all of them in SAGEMAKER_SPARKML_SCHEMA, I got the following error:

An error occurred (ValidationException) when calling the CreateModel operation: 1 validation error detected: Value '{SAGEMAKER_SPARKML_SCHEMA={"input": [list_of_column_names_and_types_omitted_due_to_privacy], "output": {"type": "double", "name": "prediction"}}}' at 'primaryContainer.environment' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\S\s]*]
Traceback (most recent call last):
  File "<stdin>", line 46, in deploy_model
  File "/usr/local/lib/python2.7/site-packages/sagemaker/model.py", line 479, in deploy
    self._create_sagemaker_model(instance_type, accelerator_type, tags)
  File "/usr/local/lib/python2.7/site-packages/sagemaker/model.py", line 195, in _create_sagemaker_model
    tags=tags,
  File "/usr/local/lib/python2.7/site-packages/sagemaker/session.py", line 2125, in create_model
    self.sagemaker_client.create_model(**create_model_request)
  File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: 1 validation error detected: Value '{SAGEMAKER_SPARKML_SCHEMA={"input": [list_of_column_names_and_types_omitted_due_to_privacy], "output": {"type": "double", "name": "prediction"}}}' at 'primaryContainer.environment' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\S\s]*]

[list_of_column_names_and_types_omitted_due_to_privacy] stands in for the correctly formatted input. None of the column names exceeds 1024 characters; all of them are under 50 characters.

After some searching, I found the following under create_model at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html:

Environment (dict) --
The environment variables to set in the Docker container. Each key and value in the Environment string to string map can have length of up to 1024. We support up to 16 entries in the map.

So I reduced the number of features to 15 and it works. How can I make this work for 100+ features? My pipeline consists of a series of StringIndexers followed by OneHotEncoderEstimators.

I then tried increasing it to 17, and that worked. Next I tried 53, which failed. 117 was my original attempt, and that fails as well.
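
Since 17 features pass and 53 do not, the constraint that is actually failing may be the one quoted in the error message (each environment value at most 1024 characters) rather than a count of 16. Below is a minimal sketch for checking the serialized schema length before calling deploy. The column names, the helper names, and the per-column {"name": ..., "type": ...} shape are assumptions for illustration, since the real input list is omitted above.

```python
import json

# Limit quoted in the CreateModel validation error for each environment value.
MAX_ENV_VALUE_LENGTH = 1024


def build_schema(feature_names, feature_type="double"):
    """Serialize a schema with the input/output layout seen in the error message.

    The per-column entry shape is an assumption; adjust it to the real schema.
    """
    schema = {
        "input": [{"name": name, "type": feature_type} for name in feature_names],
        "output": {"name": "prediction", "type": "double"},
    }
    # Compact separators drop the spaces json.dumps inserts by default.
    return json.dumps(schema, separators=(",", ":"))


def fits_in_environment(schema_json):
    """Return True if the serialized schema is within the per-value limit."""
    return len(schema_json) <= MAX_ENV_VALUE_LENGTH


if __name__ == "__main__":
    # Hypothetical column names standing in for the omitted ones.
    for count in (15, 17, 53, 117):
        names = ["feature_{:03d}".format(i) for i in range(count)]
        schema_json = build_schema(names)
        print(count, len(schema_json), fits_in_environment(schema_json))
```

With real column names the cutoff will fall at a different feature count, because it depends on the total string length, not on how many columns there are.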

@orchidmajumder
Contributor

For now, I think your best bet is to build a Docker image from the code in this repository and define the schema as an environment variable in the Dockerfile itself. The limitation you are hitting comes from the SageMaker platform, not from this library.

@rchazelle
Author

Sweet, thanks for the response. Is there a GitHub repository for that, or should I reach out to AWS directly?

@orchidmajumder
Contributor

That's part of the standard AWS SDK for SageMaker. You will probably need to contact AWS so that they can pass the request on to the appropriate service team.

@chelseacjole1

Same issue here
