We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and materials of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation. While naively training an end-to-end network fails to produce high-quality results, we show that incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms enables the same network to jointly tackle these tasks. Our method outperforms existing methods designed for the individual tasks, demonstrating its effectiveness at utilizing 3D visual information. In a simulated study on the Matterport3D-NVAS dataset, our model achieves near-perfect accuracy on source localization, a PSNR of 26.44 dB and an SDR of 14.23 dB for source separation and dereverberation, resulting in a PSNR of 25.55 dB and an SDR of 14.20 dB on novel-view acoustic synthesis. We release our code and model on our project website at https://github.com/apple/ml-nvas3d. Please wear headphones when listening to the results.
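For intuition, the sketch below illustrates the core rendering idea behind this pipeline: once the sources have been localized, separated, and dereverberated, audio at a novel viewpoint can be synthesized by convolving each dry source signal with an RIR simulated from the reconstructed room to that viewpoint and summing the contributions. This is a minimal illustration under our own assumptions, not the released implementation; the function and argument names (`render_novel_view`, `dry_sources`, `rirs`) are hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve


def render_novel_view(dry_sources, rirs):
    """Render audio at a novel viewpoint (illustrative sketch only).

    dry_sources: list of 1-D float arrays, one separated and
        dereverberated signal per localized source.
    rirs: list of 1-D float arrays, where rirs[i] is the simulated
        impulse response from source i's estimated location to the
        target viewpoint, derived from the reconstructed 3D room.
    """
    # Full convolution of signal s with RIR h has length len(s) + len(h) - 1.
    out_len = max(len(s) + len(h) - 1 for s, h in zip(dry_sources, rirs))
    out = np.zeros(out_len)
    for s, h in zip(dry_sources, rirs):
        wet = fftconvolve(s, h)   # re-reverberate the dry source for the new view
        out[: len(wet)] += wet    # mix the per-source contributions
    return out
```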