3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.
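To make the single-stage paradigm concrete, the sketch below shows one possible shape of the joint objective: per-object latent codes are optimized as an auto-decoder against a rendering loss, while a latent diffusion loss on the same codes provides the prior gradient. This is a minimal illustration under assumed names (`NeRFDecoder`, `LatentDiffusion`, `prior_weight`, the code table), not the authors' implementation; the actual method uses full volume rendering and a far more expressive denoiser.

```python
import torch
import torch.nn as nn

# Hypothetical single-stage training sketch (assumed names/shapes, not the
# released SSDNeRF code). Each object i owns a learnable latent code z_i that
# a shared decoder renders into pixels, while a diffusion model learns a
# prior over the same codes; both losses backpropagate into z_i jointly.

class NeRFDecoder(nn.Module):
    """Placeholder decoder: maps a latent code plus rays to RGB values
    (volume rendering is omitted for brevity)."""
    def __init__(self, code_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(code_dim + 6, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, code, rays):             # rays: (N, 6) origin+direction
        h = torch.cat([code.expand(rays.size(0), -1), rays], dim=-1)
        return torch.sigmoid(self.mlp(h))      # (N, 3) rendered colors


class LatentDiffusion(nn.Module):
    """Placeholder denoiser: predicts the noise added to a latent code."""
    def __init__(self, code_dim=128, num_steps=1000):
        super().__init__()
        self.num_steps = num_steps
        self.denoiser = nn.Sequential(
            nn.Linear(code_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )
        # Simple linear noise schedule (an assumption for this sketch).
        betas = torch.linspace(1e-4, 2e-2, num_steps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))

    def loss(self, code):
        t = torch.randint(0, self.num_steps, (1,), device=code.device)
        eps = torch.randn_like(code)
        ab = self.alpha_bar[t]
        noisy = ab.sqrt() * code + (1 - ab).sqrt() * eps
        t_feat = t.float().expand(noisy.size(0), 1) / self.num_steps
        pred = self.denoiser(torch.cat([noisy, t_feat], dim=-1))
        return (pred - eps).pow(2).mean()      # standard epsilon-prediction loss


# Per-object latent codes optimized as an auto-decoder (no image encoder).
num_objects, code_dim = 100, 128
codes = nn.Parameter(torch.randn(num_objects, code_dim) * 0.01)
decoder, diffusion = NeRFDecoder(code_dim), LatentDiffusion(code_dim)
optim = torch.optim.Adam(
    [codes, *decoder.parameters(), *diffusion.parameters()], lr=1e-4)


def training_step(obj_idx, rays, target_rgb, prior_weight=0.01):
    """End-to-end objective: rendering loss + diffusion prior on the code,
    so reconstruction and prior learning happen in one stage."""
    code = codes[obj_idx].unsqueeze(0)                       # (1, code_dim)
    render_loss = (decoder(code, rays) - target_rgb).pow(2).mean()
    prior_loss = diffusion.loss(code)   # prior gradient also shapes the code
    loss = render_loss + prior_weight * prior_loss
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()
```

Because the diffusion loss is applied to the codes inside the same objective, its gradient regularizes under-constrained codes during training, which is what allows fitting from sparse views; at test time one would instead sample codes from the trained diffusion model (unconditional generation) or optimize a code against observed views with the prior as guidance (reconstruction).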