How to do a simple parameter merge in a memory efficient manner? #17010
Unanswered · aniquetahir asked this question in Q&A · 0 replies
I have the parameters of a moderately large model, and I wrote a function to do the LoRA operation, i.e. $(W + AB)x$ instead of $Wx$, where $AB$ is the additional low-rank update. Now I want to add $AB$ into $W$, where $AB$ applies to only some of the parameters (not all).
Here, `params` are the original parameters as a nested NamedTuple, `params_flat` is just the flattened tree obtained from `params`, and `lora_params` are the additional LoRA parameters.
Here is my code:
Here is the issue: `merge_lora_params` (which adds $AB$ to $W$ for some parameters) works fine. `insert_q_v_params`, which puts the merged $W + AB$ back into the parameter structure, runs out of memory. `get_merged_params`, which computes $W + AB$, puts it in a flat parameter list, and then unflattens the tree, takes so much RAM that compilation fails. Is there a workaround for this? Everything works without jit compilation.
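One common way to cut the peak memory of a merge like this under `jit` is buffer donation: `donate_argnums` lets XLA reuse the base parameters' buffers for the output instead of allocating a second full copy of the model. The sketch below is illustrative only (the names `merge_lora`, `merge_one`, and the toy trees are assumptions, not the original poster's code); it merges the low-rank updates leaf-by-leaf with `jax.tree_util.tree_map`, which accepts a second tree whose structure extends the first's at leaf positions, so each weight can be paired with either `None` or an `(A, B)` pair.

```python
from functools import partial

import jax
import jax.numpy as jnp


def merge_one(w, delta):
    """Add the low-rank update to one weight; delta is None where no LoRA applies."""
    if delta is None:
        return w
    a, b = delta
    return w + a @ b


# donate_argnums=(0,) tells XLA it may overwrite the buffers of `params`
# with the output, avoiding a second full copy of the model under jit.
# (Donation is ignored on CPU with a warning; on GPU/TPU it roughly
# halves the peak memory of the merge.)
@partial(jax.jit, donate_argnums=(0,))
def merge_lora(params, lora_params):
    # `lora_params` is flattened up to the structure of `params`, so each
    # leaf of `params` is paired with either None or an (A, B) pair.
    return jax.tree_util.tree_map(merge_one, params, lora_params)


params = {"q": jnp.ones((2, 2)), "mlp": jnp.ones((2, 2))}
lora = {"q": (0.5 * jnp.ones((2, 1)), jnp.ones((1, 2))), "mlp": None}
merged = merge_lora(params, lora)
```

This keeps the merge inside a single jitted call without building an intermediate flat list of merged parameters, which may be what is blowing up RAM in `get_merged_params`; whether donation fully solves it depends on the backend and on no other live reference to `params` existing at call time.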