Feature: Dynamic EMR clusters #86

@yanivkrol


Have you searched for this feature request?

  • I searched but did not find similar requests

Problem Statement

Today the MCP requires all Spark servers to be configured in advance in config.yaml.
This is not usable for teams that have dozens or hundreds of static or ephemeral EMR clusters.

Possible Solution

The MCP tools should be able to accept a cluster ID, name, or ARN dynamically from the agent.

  • If it's an ARN, create the client with the same logic as today.
  • If it's an ID, get the cluster ARN by ID using the EMR API and then create the client with the same logic.
  • If it's a name, find the cluster ARN by name using the EMR API and then create the client with the same logic.
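
The three cases above could be sketched roughly as follows. This is only an illustration, not the code from the draft PR: `classify_identifier`, `resolve_cluster_arn`, the regular expressions, and the choice to search only active clusters by name are all assumptions. `emr_client` is expected to behave like a boto3 EMR client (`describe_cluster` and `list_clusters`).

```python
import re
from typing import Any, Dict, List

# Assumed patterns: EMR cluster IDs look like "j-" plus uppercase letters/digits.
_CLUSTER_ID_RE = re.compile(r"^j-[A-Z0-9]+$")
_CLUSTER_ARN_RE = re.compile(
    r"^arn:aws[\w-]*:elasticmapreduce:[\w-]+:\d{12}:cluster/j-[A-Z0-9]+$"
)

# Assumption: name lookups only consider clusters that are still alive.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def classify_identifier(identifier: str) -> str:
    """Return "arn", "id", or "name" for an agent-supplied identifier."""
    if _CLUSTER_ARN_RE.match(identifier):
        return "arn"
    if _CLUSTER_ID_RE.match(identifier):
        return "id"
    return "name"


def resolve_cluster_arn(emr_client: Any, identifier: str) -> str:
    """Resolve a cluster ARN, ID, or name to a cluster ARN."""
    kind = classify_identifier(identifier)
    if kind == "arn":
        return identifier
    if kind == "id":
        response = emr_client.describe_cluster(ClusterId=identifier)
        return response["Cluster"]["ClusterArn"]

    # Name: page through list_clusters and collect exact name matches.
    kwargs: Dict[str, Any] = {"ClusterStates": ACTIVE_STATES}
    matches: List[str] = []
    while True:
        page = emr_client.list_clusters(**kwargs)
        for cluster in page.get("Clusters", []):
            if cluster.get("Name") == identifier:
                matches.append(cluster["ClusterArn"])
        marker = page.get("Marker")
        if not marker:
            break
        kwargs["Marker"] = marker

    if not matches:
        raise LookupError(f"no active EMR cluster named {identifier!r}")
    if len(matches) > 1:
        raise LookupError(f"multiple active EMR clusters named {identifier!r}")
    return matches[0]
```

Raising on an ambiguous name is one possible choice; picking the most recently created cluster would be another.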

Cache:

  • Global cache keyed by ID/ARN.
  • Session-scoped cache keyed by name (since a different EMR cluster can reuse a terminated cluster's name).
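
A minimal sketch of the two cache scopes, assuming the cached value is whatever a `factory` callable builds for an identifier (for example, a client). The class name `ClusterClientCache`, the `session_id` parameter, and `end_session` are hypothetical and not taken from the existing code base.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class ClusterClientCache(Generic[T]):
    """Global cache for ID/ARN lookups, per-session cache for name lookups."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._global: Dict[str, T] = {}
        self._by_session: Dict[Tuple[str, str], T] = {}

    def get(self, identifier: str, *, session_id: str, by_name: bool) -> T:
        if by_name:
            # Names can be reused by a later cluster, so never share
            # name-based entries across sessions.
            key = (session_id, identifier)
            if key not in self._by_session:
                self._by_session[key] = self._factory(identifier)
            return self._by_session[key]
        # IDs and ARNs are stable, so one entry can serve every session.
        if identifier not in self._global:
            self._global[identifier] = self._factory(identifier)
        return self._global[identifier]

    def end_session(self, session_id: str) -> None:
        """Drop all name-based entries that belong to a finished session."""
        stale = [key for key in self._by_session if key[0] == session_id]
        for key in stale:
            del self._by_session[key]
```

With this split, a name resolved in one session is looked up again in the next one, while ID and ARN entries survive across sessions.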

The current static servers configuration options should be kept. To use the new feature, set dynamic_emr_clusters_mode=true in the configuration or env.
When dynamic_emr_clusters_mode=true, static servers cannot be specified (the two modes are mutually exclusive).
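
The mutual exclusion could be checked at startup along these lines. This is a sketch under assumptions: the environment variable name `DYNAMIC_EMR_CLUSTERS_MODE`, the `servers` config key, the set of accepted truthy strings, and the rule that the environment overrides the file are all illustrative guesses.

```python
import os
from typing import Any, Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


def dynamic_mode_enabled(
    config: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> bool:
    """Read dynamic_emr_clusters_mode from env (assumed to win) or config."""
    env = os.environ if env is None else env
    raw: Any = env.get("DYNAMIC_EMR_CLUSTERS_MODE")  # hypothetical variable name
    if raw is None:
        raw = config.get("dynamic_emr_clusters_mode", False)
    if isinstance(raw, bool):
        return raw
    return str(raw).strip().lower() in _TRUTHY


def validate_modes(
    config: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True for dynamic mode; reject configs that mix both modes."""
    dynamic = dynamic_mode_enabled(config, env)
    if dynamic and config.get("servers"):  # "servers" key is an assumption
        raise ValueError(
            "dynamic_emr_clusters_mode=true cannot be combined with static servers"
        )
    return dynamic
```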

Example prompts:

  • by name:

    use spark mcp to understand how long did application_1711941627784_93864
    take on cluster in-site-graviton-prod

  • by ID:

    use spark mcp to understand how long did application_1711941627784_93864
    take on cluster j-17MUJH7WF1HKH

  • by ARN:

    use spark mcp to understand how long did application_1711941627784_93864
    take on cluster with ARN arn:aws:elasticmapreduce:us-east-1:135511037392:cluster/j-I4VIWMNGOIP7

* I have a draft PR for this that I will open soon

Alternatives Considered

No response

Labels: enhancement (New feature or request)