Indexed in:
Abstract:
In this paper, we propose a novel technique for zero-shot generation of 3D models from only a target text prompt. Building on existing research, we use a pre-trained CLIP model as the core, bridging the gap between text and visual entities by measuring the semantic similarity between the input text prompt and renderings of the 3D model. Furthermore, we integrate a diffusion model that generates the corresponding CLIP image embedding from the text embedding. This integration yields results that are more faithful to the text prompt when predicting the color and local geometric details that match the target. © 2024 IEEE.
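The abstract describes two mechanisms: a CLIP-similarity objective between the text prompt and rendered views of the 3D model, and a diffusion prior that maps the CLIP text embedding to a CLIP image embedding. Below is a minimal sketch of such a guidance loss, assuming OpenAI's `clip` package; the `diffusion_prior` interface and the preprocessed `renderings` tensor are hypothetical stand-ins, not the paper's actual implementation.

```python
# Sketch of a CLIP-guidance loss of the kind the abstract describes.
# Assumptions (not from the paper): OpenAI's `clip` package is installed,
# `renderings` is a batch of differentiably rendered views already resized
# and normalized to CLIP's input format (V, 3, 224, 224), and
# `diffusion_prior` is a hypothetical module mapping CLIP text embeddings
# to CLIP image embeddings.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clip_guidance_loss(renderings, prompt, diffusion_prior=None):
    """Negative mean cosine similarity between rendered views and the
    target CLIP embedding (the text embedding, or the image embedding
    predicted from it by a diffusion prior)."""
    with torch.no_grad():
        tokens = clip.tokenize([prompt]).to(device)
        target = model.encode_text(tokens).float()
        if diffusion_prior is not None:
            # Map text embedding -> CLIP image embedding (hypothetical).
            target = diffusion_prior(target)
        target = target / target.norm(dim=-1, keepdim=True)
    # Gradients must flow through the renderings back to the 3D model's
    # parameters, so encode_image stays outside torch.no_grad().
    img_emb = model.encode_image(renderings).float()
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    return -(img_emb @ target.T).mean()  # minimizing aligns views with text
```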
Keywords:
Corresponding author:
Email address:
Source:
Year: 2024
Pages: 443-447
Language: English
Affiliated department: