The video-to-colmap step will populate the scan database with new entries using the right camera parameters, and select a spatially optimal subset of frames from the full video for a first thorough photogrammetry with 1000 pictures.
It will also create several TXT files with lists of file paths:
- `video_frames_for_thorough_scan.txt` : all images used in the first thorough photogrammetry
- `georef.txt` : all images with a GPS position, and their XYZ equivalent, in the same coordinate system as the Lidar file with its centroid subtracted.
And finally, it will divide long videos into chunks, each with a corresponding list of file paths, so that we don't have to deal with overly large sequences (the limit here is 4000 frames).
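The chunking limit can be illustrated with plain coreutils; this is a sketch only (the `frames.txt` list and the `chunk_` prefix are hypothetical names, not files the pipeline produces — only the 4000-frame limit comes from the step above):

```shell
# Simulate a frame list: 10,000 hypothetical frame paths.
seq -f "frames/%06g.jpg" 10000 > frames.txt

# Split it into chunks of at most 4000 frames each, mirroring the
# per-chunk file lists the video-to-colmap step generates.
split -l 4000 -d frames.txt chunk_

# chunk_00 and chunk_01 hold 4000 frames each, chunk_02 the remaining 2000.
wc -l chunk_*
```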
We also recommend that you build your own indexed vocab tree from the database images; this will make the next matching steps faster.
```
colmap vocab_tree_retriever \
--database_path /path/to/scan.db \
--vocab_tree_path /path/to/vocab_tree \
--output_index /path/to/indexed_vocab_tree
```
5. Second COLMAP step : matching. For fewer than 1000 images, you can use exhaustive matching (this will take around 2 hours). If there are too many images, you can use either spatial matching or vocab tree matching.
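For example (paths are placeholders; the vocab tree variant assumes the indexed tree built above):

```
colmap exhaustive_matcher \
--database_path /path/to/scan.db
```

or, for larger image sets:

```
colmap vocab_tree_matcher \
--database_path /path/to/scan.db \
--VocabTreeMatching.vocab_tree_path /path/to/indexed_vocab_tree
```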
If this is the first chunk, simply copy `/path/to/chunk_n_model` to `/path/to/full_video_model`.
Otherwise, merge the chunk into the full model:
```
colmap model_merger \
--input1 /path/to/full_video_model \
--input2 /path/to/chunk_n_model \
--output /path/to/full_video_model
```
At the end of this step, you should have a model with all the (localizable) frames of the videos, plus the other frames that were used for the first thorough photogrammetry.
6. Extract the frame positions from the resulting model
At the end of these per-video tasks, you should have a model at `/path/to/georef_full` with all photogrammetry images plus the localization of video frames at 1 fps, and for each video a TXT file with positions relative to the first geo-registered reconstruction.
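One way to export per-frame poses as text (a sketch; the output path is a placeholder) is COLMAP's model converter, which writes an `images.txt` with one pose per registered frame:

```
colmap model_converter \
--input_path /path/to/georef_full \
--output_path /path/to/georef_full_txt \
--output_type TXT
```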
9. Point cloud densification
```
colmap image_undistorter \
--image_path /path/to/images \
--input_path /path/to/georef_full \
--output_path /path/to/dense \
--output_type COLMAP \
--max_image_size 1000
```
The `max_image_size` option is optional but recommended if you want to save space when dealing with 4K images.
```
colmap patch_match_stereo \
--workspace_path /path/to/dense \
--workspace_format COLMAP \
--PatchMatchStereo.geom_consistency 1
```
```
colmap stereo_fusion \
--workspace_path /path/to/dense \
--workspace_format COLMAP \
--input_type geometric \
--output_path /path/to/georef_dense.ply
```
This will also create a `/path/to/georef_dense.ply.vis` file, which records the frames from which each point is visible.