Train on single H200 possible? #46

As a "hard-core" test, is it also possible to train on, e.g., spider (7,000 sequences) using a single H200 system (141 GB GPU memory, 700 GB RAM, 32 cores)?

I had to adapt the code to Python 3.13, transformers 5 and flash-attn-4. Since transformers 5 lacks `tokenizer.batch_encode_plus()`, I adjusted the call so it works with both transformers 4 and 5:
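```python
# tokenizer_kwargs: placeholder for the arguments the original call passes (elided in this post)
if hasattr(tokenizer, "batch_encode_plus"):
    # transformers 4.x: keep the existing call
    result = tokenizer.batch_encode_plus(sequences, **tokenizer_kwargs)
else:
    # transformers 5.x: batch_encode_plus() is gone; calling the tokenizer directly batch-encodes too
    result = tokenizer(sequences, **tokenizer_kwargs)
```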
I also set `num_processes: 1` in *accelerate_config_7b.yaml*.

Unfortunately, after 63 batches the GPU runs out of memory during `accelerator.backward(loss)`.
So my humble question is: *what config changes are needed to keep the GPU from running out of memory during training?*
Or alternatively: *what changes (apart from `use_cpu: true`) are needed to switch to CPU training?*
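One hypothesis worth ruling out (an assumption, not something established in this issue) is that the failing batch simply contains the longest sequences, since activation memory grows with the number of tokens per batch. A quick offline check, assuming unshuffled, fixed-size batches padded to the longest sequence in each batch, is to compute the padded token count per batch. `padded_tokens_per_batch` and `heaviest_batches` are hypothetical helpers, not part of the training code.

```python
from typing import List, Sequence


def padded_tokens_per_batch(lengths: Sequence[int], batch_size: int) -> List[int]:
    """Hypothetical helper: tokens each batch occupies after pad-to-longest.

    Assumes batches are consecutive, unshuffled slices of `lengths` with
    `batch_size` items each (the last batch may be smaller).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    totals = []
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        # every sequence in the batch is padded to the batch's longest one
        totals.append(len(batch) * max(batch))
    return totals


def heaviest_batches(lengths: Sequence[int], batch_size: int, top_k: int = 3) -> List[int]:
    """Hypothetical helper: 0-based indices of the top_k batches by padded tokens."""
    totals = padded_tokens_per_batch(lengths, batch_size)
    return sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)[:top_k]


# Made-up lengths for illustration; substitute the real tokenized lengths.
lengths = [120, 300, 90, 80, 1500, 60, 200, 210]
print(padded_tokens_per_batch(lengths, batch_size=2))  # [600, 180, 3000, 420]
print(heaviest_batches(lengths, batch_size=2, top_k=2))  # [2, 0]
```

If the batch where the OOM occurs is among the heaviest, a per-batch memory peak is the likely cause; if heavier batches were processed earlier without failure, memory accumulating across steps becomes the more plausible explanation. With a shuffled dataloader, the lengths must be taken in the order the sampler actually yields them.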
