derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented

Hello,

When trying to apply the Sine Wave example approach to a transformer-based model, I get the following error:

  
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 767, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented


The setup is a regression task over multiple sequences.

Is there a way to work around this?

Thank you,


derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented #429
