- SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery - https://arxiv.org/pdf/2304.09974
- Building and better understanding vision-language models: insights and future directions - https://arxiv.org/pdf/2408.12637

- GELU (Gaussian Error Linear Unit)
- 2D RoPE (rotary positional embedding)
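
A minimal sketch of the two terms above, for reference. The notes do not say which 2D RoPE variant is meant, so this assumes the common axial form: the first half of the feature vector is rotated by the row index and the second half by the column index. The function names (`gelu`, `rope_1d`, `rope_2d`) and the `base=10000.0` default are illustrative choices, not taken from either paper.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def rope_1d(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * base**(-i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def rope_2d(vec, row, col, base=10000.0):
    """Axial 2D RoPE (assumed variant): first half encodes row, second half column."""
    if len(vec) % 4 != 0:
        raise ValueError("vector length must be divisible by 4")
    half = len(vec) // 2
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is only rotated, the dot product between a rotated query and a rotated key depends on the row and column offsets between their positions, not on the absolute positions, which is the property that makes RoPE useful in attention.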