In this paper, we introduce a novel method for estimating vehicle pose and extent from camera data in traffic surveillance scenarios. State estimation is performed in a common world frame, which enables seamless integration of image data from different viewpoints. Our approach incorporates the non-linear transformation between the measurements and the states directly into the framework of an unscented Kalman filter. Two measurement models are proposed: one designed for bounding boxes and another for discretized object contours extracted from segmentation masks. The method is evaluated on data from a real-world traffic surveillance system, demonstrating that the approach is both effective and feasible for localizing passing cars.