
One compile error that should be fixed, and one improvement to reduce CUDA memory usage #22

Closed
@hongfangyu

Description


1. One compile error that should be fixed

When installing apex, compilation fails with four errors about converting unsigned long to long. To fix them, edit apex_22.01_pp/csrc/mlp.cpp:

(1) line 65:

    auto reserved_space = at::empty({reserved_size}, inputs[0].type());

change to:

    auto reserved_space = at::empty({static_cast<long>(reserved_size)}, inputs[0].type());

(2) line 138:

    auto work_space = at::empty({work_size / sizeof(scalar_t)}, inputs[0].type());

change to:

    auto work_space = at::empty({static_cast<long>(work_size / sizeof(scalar_t))}, inputs[0].type());

Alternatively, you can relax the compiler options instead of patching the source.

2. One improvement to reduce CUDA memory usage

When launching owl_demo.py on a GPU with 16 GB of memory, I ran into a CUDA out-of-memory error. I then edited lines 33 and 34 in interface.py:

    model = model.to(device)
    model = model.to(dtype)

change to:

    model = model.to(dtype)
    model = model.to(device)

After the demo starts, memory usage is about 14 GB, so it runs well on a 16 GB GPU.
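The order matters because model.to(device) uploads the full-precision (fp32) weights to the GPU before model.to(dtype) casts them, so the device holds 4 bytes per parameter at the peak; casting on the CPU first means only the reduced-precision copy is ever transferred. A back-of-the-envelope sketch (the ~7B parameter count and the weights-only peak-memory model are assumptions, not measurements):

```python
FP32_BYTES = 4  # bytes per parameter at full precision
FP16_BYTES = 2  # bytes per parameter after casting to half precision

def peak_device_bytes(n_params: int, device_first: bool) -> int:
    """Crude peak-GPU-memory model for weights only (ignores activations)."""
    if device_first:
        # model.to(device) ships fp32 weights to the GPU; the cast to fp16
        # only happens after they are already resident.
        return n_params * FP32_BYTES
    # model.to(dtype) runs on the CPU, so the GPU only ever receives
    # the fp16 copy.
    return n_params * FP16_BYTES

n = 7_000_000_000  # assumed ~7B-parameter model
print(round(peak_device_bytes(n, device_first=True) / 2**30, 1))   # 26.1 GiB
print(round(peak_device_bytes(n, device_first=False) / 2**30, 1))  # 13.0 GiB
```

With activations and framework overhead on top of the ~13 GiB of fp16 weights, this lines up with the ~14 GB usage observed above.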
