How to Localize Humanoids with a Single Camera?
Pablo F. Alcantarilla, Olivier Stasse, Sebastien Druon, Luis M. Bergasa and Frank Dellaert.
Autonomous Robots, September 2012.

General Description
---------------------------------------------------------------------------------------------------------------------------

We provide several stereo-vision datasets with calibration parameters and ground truth information for visual SLAM and localization, so that other researchers working with humanoid robots can use these datasets to evaluate their vision-based algorithms. Ground truth was obtained with a MOCAP system (translation only for the Toulouse dataset). We also provide the output of our stereo visual SLAM system + Bundle Adjustment as an "approximate" ground truth. The visual SLAM framework was validated on the challenging KITTI visual odometry / SLAM benchmark sequences, obtaining results competitive with state-of-the-art methods.

Each .tar.gz file contains one stereo sequence, where the left and right images are stored in the sub-folders left and right respectively. The images are provided in greyscale PGM format. The images are raw, i.e. no undistortion or rectification has been applied.

Format of the calibration file
---------------------------------------------------------------------------------------------------------------------------

We provide two calibration files for the stereo rig: Tsukuba and Toulouse. Each calibration file consists of several rows with information about the stereo rig. You can check the calibration_file.txt example that is included in the folder. The first row of each calibration file is a description of the calibration and the stereo rig. The rest of the rows have the following format:

Image Width
Image Height
Left Camera Fx
Left Camera Fy
Left Camera u0
Left Camera v0
Right Camera Fx
Right Camera Fy
Right Camera u0
Right Camera v0
R11 R12 R13
R21 R22 R23
R31 R32 R33
Tx Ty Tz
Left kc1 Left kc2 Left kc3 Left kc4
Right kc1 Right kc2 Right kc3 Right kc4

The rotation matrix between both cameras is denoted by R, and T is the translation vector. kc is a vector that contains the radial and tangential distortion parameters. The camera coordinate frame is the typical one from the Matlab calibration toolbox:

X positive axis pointing right
Y positive axis pointing down
Z positive axis pointing forward

The calibration parameters were obtained using the Matlab camera calibration toolbox. For more information about the calibration parameters, please check:

http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/parameters.html
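The following is a minimal parsing sketch in Python (with NumPy), assuming that the rows after the single free-form description row contain only the numeric values, in exactly the order listed above. The function name load_calibration is ours, not part of the dataset:

    import numpy as np

    def load_calibration(path):
        """Parse a calibration file laid out as described above."""
        with open(path) as f:
            lines = f.readlines()
        # The first row is a free-form description of the stereo rig; skip it
        # and flatten all remaining numeric tokens into one list.
        vals = [float(tok) for line in lines[1:] for tok in line.split()]
        width, height = int(vals[0]), int(vals[1])
        fx_l, fy_l, u0_l, v0_l = vals[2:6]       # left camera intrinsics
        fx_r, fy_r, u0_r, v0_r = vals[6:10]      # right camera intrinsics
        R = np.array(vals[10:19]).reshape(3, 3)  # rotation between the cameras
        T = np.array(vals[19:22])                # translation vector
        kc_l = np.array(vals[22:26])             # left distortion (radial + tangential)
        kc_r = np.array(vals[26:30])             # right distortion
        K_l = np.array([[fx_l, 0.0, u0_l], [0.0, fy_l, v0_l], [0.0, 0.0, 1.0]])
        K_r = np.array([[fx_r, 0.0, u0_r], [0.0, fy_r, v0_r], [0.0, 0.0, 1.0]])
        return (width, height), K_l, K_r, R, T, kc_l, kc_r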
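Since the images are distributed raw, you will normally want to undistort and rectify them before stereo matching. The sketch below does this with OpenCV, reusing load_calibration from the sketch above; it assumes that R and T map points from the left camera frame to the right one (the Matlab toolbox convention) and that the four kc values are ordered k1, k2, p1, p2. The image file names are placeholders:

    import cv2

    (w, h), K_l, K_r, R, T, kc_l, kc_r = load_calibration('calibration_file.txt')

    # Rectification transforms and projection matrices for both cameras.
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K_l, kc_l, K_r, kc_r,
                                                      (w, h), R, T)

    # Per-camera remapping tables, then warp a raw image pair.
    map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, kc_l, R1, P1, (w, h), cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, kc_r, R2, P2, (w, h), cv2.CV_32FC1)

    left  = cv2.imread('left/frame_0000.pgm', cv2.IMREAD_GRAYSCALE)   # placeholder name
    right = cv2.imread('right/frame_0000.pgm', cv2.IMREAD_GRAYSCALE)  # placeholder name
    left_rect  = cv2.remap(left,  map_lx, map_ly, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right, map_rx, map_ry, cv2.INTER_LINEAR)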
Format of the Visual SLAM output files
---------------------------------------------------------------------------------------------------------------------------

For each sequence we also provide the output of the stereo visual SLAM system described in the paper. The visual SLAM output consists of two different files, one for the translation and one for the rotation.

The translation file contains Nx3 rows and 1 column, where N is the number of frames in the sequence. The translation is represented as a 3D vector with respect to a global coordinate frame, which is the first frame of the sequence. The format of the file is as follows:

Tx1
Ty1
Tz1
.
.
.
TxN
TyN
TzN

The orientation file contains Nx4 rows and 1 column, where N is the number of frames in the sequence. The orientation is represented by means of unit quaternions with respect to a global coordinate frame, which is the first frame of the sequence. The format of the file is as follows:

q01
qx1
qy1
qz1
.
.
.
q0N
qxN
qyN
qzN

Given the translation and the orientation, a 3D point Yi expressed in the world coordinate frame can be transformed to any of the camera poses in the dataset as:

hi = R_CW * ( Yi - Tcam )

where Tcam is the translation vector for a given camera, and R_CW is the rotation matrix that can be derived from the quaternion. The quaternions given in the output file represent a rotation from the camera frame to the world coordinate frame, qCam -> R_WC, and R_CW = R_WC^T. Given a quaternion qCam = [q0 qx qy qz], where q0 is the scalar part, the rotation matrix R_WC can be obtained as:

R(1,1) = q0*q0 + qx*qx - qy*qy - qz*qz;
R(2,1) = 2.0*(qx*qy + q0*qz);
R(3,1) = 2.0*(qx*qz - q0*qy);
R(1,2) = 2.0*(qx*qy - q0*qz);
R(2,2) = q0*q0 - qx*qx + qy*qy - qz*qz;
R(3,2) = 2.0*(qy*qz + q0*qx);
R(1,3) = 2.0*(qx*qz + q0*qy);
R(2,3) = 2.0*(qy*qz - q0*qx);
R(3,3) = q0*q0 - qx*qx - qy*qy + qz*qz;

A short Python sketch that loads these files and applies this transformation is given at the end of this file.

Format of the MOCAP output file
---------------------------------------------------------------------------------------------------------------------------

Ground truth was obtained with a MOCAP system (translation only for the Toulouse dataset). The MOCAP data was subsampled and interpolated to match the number of frames of each sequence. In the circular sequence from the Toulouse dataset, there are some parts of the sequence where the MOCAP could not obtain measurements.

The translation file contains Nx3 rows and 1 column, where N is the number of frames in the sequence. The translation is represented as a 3D vector with respect to a global coordinate frame, which is the first frame of the sequence. The format of the file is as follows:

Mx1
My1
Mz1
.
.
.
MxN
MyN
MzN

Contact Info
---------------------------------------------------------------------------------------------------------------------------

If you have any questions or comments, please contact me:

Pablo F. Alcantarilla
email: pablofdezalc@gmail.com
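Below is the sketch referenced in the visual SLAM section above: a minimal Python example that loads the single-column translation and orientation files, builds R_WC from a quaternion using the formulas given, and expresses a world point in a camera frame. The file names are placeholders; a MOCAP translation file can be loaded in exactly the same way as the SLAM translation file:

    import numpy as np

    def quat_to_R_WC(q):
        """Unit quaternion [q0 qx qy qz] (q0 scalar) -> camera-to-world rotation."""
        q0, qx, qy, qz = q
        return np.array([
            [q0*q0 + qx*qx - qy*qy - qz*qz, 2.0*(qx*qy - q0*qz),           2.0*(qx*qz + q0*qy)],
            [2.0*(qx*qy + q0*qz),           q0*q0 - qx*qx + qy*qy - qz*qz, 2.0*(qy*qz - q0*qx)],
            [2.0*(qx*qz - q0*qy),           2.0*(qy*qz + q0*qx),           q0*q0 - qx*qx - qy*qy + qz*qz]])

    # Single-column files: reshape into one row per frame.
    T_all = np.loadtxt('translation.txt').reshape(-1, 3)   # N x 3, placeholder name
    Q_all = np.loadtxt('orientation.txt').reshape(-1, 4)   # N x 4, placeholder name

    i = 0                               # frame index
    Yi = np.array([1.0, 0.0, 2.0])      # example 3D point in the world frame
    R_CW = quat_to_R_WC(Q_all[i]).T     # R_CW = R_WC^T
    hi = R_CW @ (Yi - T_all[i])         # the same point expressed in camera frame i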