I saw that RTAB-Map allows exporting camera poses in Bundler format, but I somehow fail to find any way in Bundler to import these camera poses. I guess I would still have to re-run a big chunk of the Bundler pipeline to recreate the 3D data even when the poses are known. So my main question is: is there any way in RTAB-Map to export the 3D scene information in bundle format, i.e. as bundle.out? And if not, what is the intended way to use the exported Bundler poses, and can they somehow be used in Bundler processing?
This function has only been tested with MeshLab. This tutorial may be a little out of date (the export dialog has changed), but it shows what the bundler format can be used for. Originally, we added the bundler format to export poses and images to MeshLab in order to use its mesh texturing tools. We didn't test compatibility with Bundler.
Thank you Mathieu, it makes sense.
I guess I will have to write some code myself then :) Could you perhaps point me in the right direction? I'd like to use RTAB-Map datasets with the ACG Localizer (aka Active Search). The ACG Localizer expects the data in Bundler format (luckily it's all ASCII, so it should not be hard to reverse-engineer). Specifically, it expects a list of images, a list of SIFT feature descriptors for every image, and finally a list of 3D points, where for every 3D point the following is stored: the 3D coordinates, the RGB color, the list of frame indices it is visible in, and the indices of the SIFT features which correspond to the 3D point in those views.
Edit: I have actually implemented a big part of it. I'd be thankful if you could correct me where I am wrong. I found almost all the data in the Map_Node_Word table:
1) depth_x, depth_y, depth_z are the 3D coordinates of the word in 3D space. The RGB color is missing here, but I can retrieve it from the actual image file using the x,y coordinate.
2) by node ID I can retrieve the camera pose and the frame indices where the feature is visible
3) by word ID I can retrieve the descriptor. There is something I don't get yet: one word ID can have multiple different descriptor values for different node IDs. I guess you store here the descriptors of the same feature seen from different perspectives. Then what is stored under the same ID in the Words table? Is it some averaged feature descriptor vector?
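The first lookup above could be sketched with Python's sqlite3. The table and column names (Map_Node_Word, word_id, node_id, depth_x/y/z) are the ones mentioned in this thread; they are assumptions that may differ between RTAB-Map versions, so treat this as a rough sketch rather than a tested exporter:

```python
import sqlite3

def load_word_observations(db_path):
    """Return {word_id: [(node_id, x, y, z), ...]} from an RTAB-Map database.

    Table/column names follow this discussion (the table is reportedly
    called Feature in versions since 0.13.0) and may need adjusting.
    """
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT word_id, node_id, depth_x, depth_y, depth_z "
            "FROM Map_Node_Word"
        )
        obs = {}
        for word_id, node_id, x, y, z in rows:
            # Group all 3D observations of the same visual word together.
            obs.setdefault(word_id, []).append((node_id, x, y, z))
        return obs
    finally:
        con.close()
```

From the returned dictionary, each word's list gives both the 3D coordinates and the node (frame) IDs it was observed in.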
Just in case, I attach the description straight from Bundler's documentation:
The bundle files contain the estimated scene and camera geometry and have
the following format:

# Bundle file v0.3
<num_cameras> <num_points> [two integers]

Each camera entry <cameraI> contains the estimated camera intrinsics
and extrinsics, and has the form:

<f> <k1> <k2> [the focal length, followed by two radial distortion coeffs]
<R> [a 3x3 matrix representing the camera rotation]
<t> [a 3-vector describing the camera translation]

The cameras are specified in the order they appear in the list of images.

Each point entry <pointI> has the form:

<position> [a 3-vector describing the 3D position of the point]
<color> [a 3-vector describing the RGB color of the point]
<view list> [a list of views the point is visible in]

The view list begins with the length of the list (i.e., the number of
cameras the point is visible in). The list is then given as a list of
quadruplets <camera> <key> <x> <y>, where <camera> is a camera index,
<key> the index of the SIFT keypoint where the point was detected in
that camera, and <x> and <y> are the detected positions of that
keypoint. Both indices are 0-based (e.g., if camera 0 appears in the
list, this corresponds to the first camera in the scene file and the
first image in "list.txt").
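The format quoted above is simple enough to emit directly. Below is a minimal sketch of a bundle.out writer; the container shapes for `cameras` and `points` are my own choice for illustration, not anything RTAB-Map or Bundler prescribes:

```python
def write_bundle(path, cameras, points):
    """Write a Bundler v0.3 scene file.

    cameras: list of (f, k1, k2, R, t), with R a 3x3 nested list and t a
             3-vector.
    points:  list of (xyz, rgb, views), where views is a list of
             (camera_index, key_index, x, y) quadruplets.
    """
    with open(path, "w") as f:
        f.write("# Bundle file v0.3\n")
        f.write("%d %d\n" % (len(cameras), len(points)))
        for focal, k1, k2, R, t in cameras:
            # <f> <k1> <k2>, then the 3x3 rotation, then the translation.
            f.write("%g %g %g\n" % (focal, k1, k2))
            for row in R:
                f.write("%g %g %g\n" % tuple(row))
            f.write("%g %g %g\n" % tuple(t))
        for xyz, rgb, views in points:
            # <position>, <color>, then the view list: its length followed
            # by <camera> <key> <x> <y> quadruplets (0-based indices).
            f.write("%g %g %g\n" % tuple(xyz))
            f.write("%d %d %d\n" % tuple(rgb))
            f.write("%d" % len(views))
            for cam, key, x, y in views:
                f.write(" %d %d %g %g" % (cam, key, x, y))
            f.write("\n")
```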
We cannot directly translate the RTAB-Map database to bundler format without some processing, because a word in the dictionary can be linked to more than one 3D point. We could get all <pointN> using Optimizer::computeBACorrespondences(), using "points3DMap" for the 3D points and "wordReferences" for the list of frame references and keypoint positions. To export SIFT descriptors linked to the correct IDs, that function would have to return the corresponding descriptors of the 2D keypoints (maybe by adding a cv::Mat field in "wordReferences" for the descriptor). Ideally, we would do a global bundle adjustment before the export. To get the color of each point, we would have to uncompress the corresponding frame. I'll try to code something today and let you know.
For your edit:
The Map_Node_Word table is now called Feature in recent versions (since 0.13.0). The descriptor in the Word table is the first descriptor seen for the word (in case multiple features link to it). The Map_Node_Word/Feature table contains the descriptor of each feature extracted from the referred frame. Multiple features can be linked to the same word. Each feature has a 2D keypoint and the corresponding 3D point (in ROS coordinates, not image coordinates). These 3D points are not 3D map points like in feature-based visual SLAM approaches; they are just 3D points of that frame.
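The word/feature relation described above (many per-frame feature descriptors sharing one word ID, with the Word table keeping only the first descriptor seen) can be illustrated with a small sketch; descriptors are plain lists here for simplicity, not RTAB-Map's actual cv::Mat storage:

```python
def word_table_descriptors(features):
    """Build the Word-table view from per-frame features.

    features: iterable of (word_id, descriptor), in the order the
    features were seen. Only the first descriptor per word is kept,
    mirroring the behaviour described above.
    """
    table = {}
    for word_id, desc in features:
        # setdefault keeps the existing entry, so later descriptors
        # linked to the same word are ignored.
        table.setdefault(word_id, desc)
    return table
```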
I implemented a first version. Update rtabmap to get the latest commits. When exporting in bundler format, a new checkbox can be checked to export the 3D points and all descriptors. Obviously, to export SIFT descriptors we should set the vocabulary to use SIFT (and uncheck "use odometry features"). I cannot really verify the output files to know if everything is fine (e.g., whether the exported descriptor format called "*.key" is okay), though in MeshLab I can see the 3D points with the cameras.
Thank you for the beer(s)! If you have problems with the export format, or if you get a small example working with the ACG Localizer, let me know.
Right now there could be some duplicates of 3D points, as I just match features between pairs of frames. I'll take a look to see how I can get more than two frames per 3D point.
EDIT: done! Also fixed inverted colors.
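The duplicate-point issue above comes from merging matches only pairwise. One standard way to obtain tracks spanning more than two frames is union-find over (frame, keypoint) observations; this is a generic sketch of that technique, not RTAB-Map's actual implementation:

```python
def build_tracks(matches):
    """Merge pairwise feature matches into multi-frame tracks.

    matches: iterable of ((frame_a, key_a), (frame_b, key_b)) pairs.
    Returns a list of tracks, each a sorted list of (frame, key)
    observations, so one 3D point can list every frame it is seen in.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two observations of every pairwise match.
    for a, b in matches:
        parent[find(a)] = find(b)

    # Group observations by their root to form tracks.
    tracks = {}
    for obs in list(parent):
        tracks.setdefault(find(obs), []).append(obs)
    return [sorted(t) for t in tracks.values()]
```

A chain of matches such as frame0-frame1 and frame1-frame2 then collapses into a single three-view track instead of two duplicated 3D points.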