Abstract:
In this paper, we propose a novel technique for zero-shot generation of 3-dimensional models using only a target text prompt. This paper builds upon existing research by using a pre-trained CLIP model as its core, which bridges the gap between text and visual entities by comparing the semantic similarity between the input text prompt and renderings of the 3D model. Furthermore, we integrate a diffusion model that generates the corresponding CLIP image embeddings from the text embedding. This integration enables results more faithful to the text prompt at the stage of predicting the color and local geometric details that match the target prompt. © 2024 IEEE.
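The core guidance step the abstract describes, scoring 3D-model renderings against the text prompt with CLIP, can be illustrated with a short sketch. This is a minimal illustration under assumptions, not the paper's implementation: it assumes the open-source openai/CLIP Python package, and `rendered_images` is a hypothetical batch of views from a differentiable renderer, already resized and normalized to CLIP's 224×224 input format.

```python
# Minimal sketch of CLIP-based guidance between a text prompt and 3D-model
# renderings. Assumes the openai/CLIP package; `rendered_images` is a
# hypothetical (N, 3, 224, 224) tensor of rendered views, already
# normalized for CLIP's image encoder.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clip_guidance_loss(rendered_images: torch.Tensor, prompt: str) -> torch.Tensor:
    """Negative mean cosine similarity between CLIP embeddings of the
    renderings and of the target prompt; minimizing it pulls the model's
    appearance toward the text description."""
    text_emb = model.encode_text(clip.tokenize([prompt]).to(device))
    image_emb = model.encode_image(rendered_images.to(device))
    # Normalize so the dot product is cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    # The paper's diffusion prior would replace `text_emb` here with a
    # CLIP image embedding predicted from the text embedding.
    return -(image_emb @ text_emb.T).mean()
```

In an optimization loop, such a loss would be backpropagated through the renderer to update the model's appearance parameters, which corresponds to the abstract's stage of predicting color and local geometric detail.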
Year: 2024
Page: 443-447
Language: English