VILA is a visual language model (VLM) pretrained with interleaved image-text data at scale, enabling multi-image VLM. VILA is deployable on the edge, including Jetson Orin and laptop by AWQ 4bit ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results