Calculate member based ensemble loss without running the forward pass twice. #26318

HarveySouth · 2025-02-04T22:29:36Z

HarveySouth
Feb 4, 2025

I want to implement negative correlation learning (NCL) in JAX. NCL is a regression ensemble training algorithm which updates each member in the ensemble with it's own loss function: the squared error between the member prediction and the target value AND the squared error between the ensemble output and the member prediction

member_loss = (0.5 * jnp.square(member_output - y)) 
    - (λ * jnp.square(ensemble_output - member_output))

Ideally I can:

run the ensembles in parallel
find the ensemble output (in the case of NCL this is just the mean: jnp.mean(predictions) )
Update each ensemble member with each member's own call to the loss function.

I'm having two difficulties:

The gradient of the member's loss with respect to it's weights has a dependency on the ensemble output
Since the ensemble output is needed in the member_loss, I'm finding it hard to overcome the first difficulty without running the whole ensemble twice for every training step.

I solved this inefficiently in PyTorch by ignoring the second difficulty, and just running the ensemble twice:

with torch.no_grad():
# runs all ensemble members with vmap to get an output (member_number, batch_number, output_size)
        _, member_outputs = ens_runner.forward(batch_x, ensemble_models) 

for i, member in enumerate(ensemble_models):
        optims[i].zero_grad()
        member_prediction = member(batch_x)
        member_grad_output = torch.cat((member_output[:i], member_prediction.unsqueeze(dim=0), member_output[i+1:]))
        ens_grad_output = torch.mean(member_grad_output)
        losses[i] = ((0.5 * squared_error(member_prediction, batch_y.unsqueeze(dim=-1))) 
            - (λ * squared_error(member_prediction, ens_grad_output)))
        losses[i].backward()
        optims[i].step()

I've been able to run the ensemble members in parallel with vmap as I did in python, but I haven't been able to come up with an alternative approach to efficiently running the training step with JAX and looking for help

model_init_key = jax.random.PRNGKey(SEED)
model_keys = jax.random.split(model_init_key, n_ensemble_members)

# simple as possible flax linen FFNN to make a homogeneous ensemble
base_model = SimpleFFNN(hidden_dim=base_model_hidden_dim, output_dim=base_model_output_dim) 
representative_input = jnp.zeros((1, 11))

params = list(map(lambda x, y: x.init(y, representative_input), [base_model]*n_ensemble_members, list(model_keys)))
params = jax.tree_util.tree_map(lambda *x: jnp.stack(x), *params)

base_optimizer = optax.sgd(learning_rate = lr)
optims = map(lambda x, y: x.init(y), [base_optimizer]*n_ensemble_members, params)

v_forward = jax.vmap(base_model.apply, in_axes=(0,None), out_axes=0)

for epoch in range(epoch_num):
    ...
    for step, (batch_x, batch_y) in enumerate(training_set):
        ...
        predictions = v_forward(params, batch_x)

HarveySouth · 2025-02-09T03:15:11Z

HarveySouth
Feb 9, 2025
Author

I've moved my simple FFNN from linen to nnx and come up with:

@nnx.jit
def jit_loss_calculation(member_prediction, training_labels, non_current_member_outputs):
    member_error = (jnp.square(member_prediction - training_labels))
    
    member_contribution_to_ensemble = jnp.divide(member_prediction, n_ensemble_members)
    ensemble_centroid = member_contribution_to_ensemble + non_current_member_outputs

    member_diversity = jnp.square(ensemble_centroid - member_prediction)
    full_loss = member_error - (resolved_lambda * member_diversity)
    return full_loss.mean()

def run_ensemble_member_loss_and_grad_in_parallel(training_input, training_labels, shared_data, lock_memory, condition, member_index, model):
    def all_predictions_set():
        return all(lock_memory)
    def ncl_member_loss(model):
        member_prediction = model(training_input).squeeze()

        with condition:
            shared_data.at[member_index].set(member_prediction)
            lock_memory[member_index] = True
            condition.notify_all()
            
        with condition:
            condition.wait_for(all_predictions_set)

        jax.block_until_ready(shared_data)

        non_current_member_outputs = jnp.sum( jnp.concatenate((shared_data[:member_index], shared_data[member_index+1:])), axis=0) / n_ensemble_members

        return jit_loss_calculation(member_prediction, training_labels, non_current_member_outputs)
    
    return nnx.value_and_grad(ncl_member_loss)(model)


def setup_parallel_execution(training_input, training_labels, models):
    shared_data = jnp.zeros((n_ensemble_members, batch_size))
    lock_memory = [False] * n_ensemble_members
    condition = threading.Condition()  # Use threading.Condition to try and avoid deadlock with jax

    with concurrent.futures.ThreadPoolExecutor(max_workers=n_ensemble_members) as executor:
        futures = [executor.submit(run_ensemble_member_loss_and_grad_in_parallel, 
                                   training_input, training_labels, shared_data, lock_memory, condition, member_index, model) for member_index, model in enumerate(models)]
        losses_and_grads = [future.result() for future in futures]
    
    losses, grads = zip(*losses_and_grads)

    return losses, grads

with

for epoch in range(epoch_num):
    ...
    for step, (batch_x, batch_y) in enumerate(training_set):
        ...
        losses, grads = setup_parallel_execution(batch_x, batch_y, ensemble_models)
        for i in range(len(member_optimizers)):
            member_optimizers[i].update(grads[i])

Validity TBD, and definitely not the best solution, but seems to work and doesn't require running the ensemble more than necessary

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Calculate member based ensemble loss without running the forward pass twice. #26318

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Calculate member based ensemble loss without running the forward pass twice. #26318

Uh oh!

Uh oh!

HarveySouth Feb 4, 2025

Replies: 1 comment

Uh oh!

Uh oh!

HarveySouth Feb 9, 2025 Author

HarveySouth
Feb 4, 2025

HarveySouth
Feb 9, 2025
Author