# Photogrammetry and georegistration tools for Parrot drone videos

This is a set of Python scripts and C++ programs used to construct a depth validation set with a Lidar-generated point cloud.
For a brief recap of what it does, see the [How it works](#how-it-works) section.

## Table of contents

* [Software Dependencies](#software-dependencies)
* [Hardware Dependencies](#hardware-dependencies)
* [How it works](#how-it-works)
* [Step by step guide](#usage)
* [Detailed method with the manoir example](#detailed-method-with-the-manoir-example)


## Software Dependencies

These are the tools used; make sure to install them before running the scripts.

 - [CUDA](https://developer.nvidia.com/cuda-downloads)
 - [OpenCV](https://opencv.org/)
 - [ETH3D Dataset-pipeline](https://github.com/ETH3D/dataset-pipeline)
 - [Pytorch](https://pytorch.org/)
 - [COLMAP](https://colmap.github.io/)
 - [PDrAW from AnafiSDK](https://developer.parrot.com/docs/pdraw/)

Apart from CUDA, which you need to install yourself, you can use the helper script `install_dependencies.sh` to install them on Ubuntu 20.04.

For PDrAW, there should be a `native-wrapper.sh` file that you need to keep track of. It's usually located at `groundsdk/out/pdraw-linux/staging/native-wrapper.sh` (see [here](https://developer.parrot.com/docs/pdraw/installation.html)).

For COLMAP, you will need a vocab tree for feature matching. You can download one at https://demuc.de/colmap/#download . In our tests, we took the 256K version.

## Hardware Dependencies

To recreate the results of the study, you will need the following hardware:
 - Parrot Anafi
 - DJI Matrice 600
 - Velodyne Puck VLP16

Note that for our study, we provided the Anafi drone (\~700€), and the point cloud was created by a private company (\~3500€ for the whole scan process).


# How it works

Here are the key steps of the dataset creation:
See [Detailed method with the manoir example](#detailed-method-with-the-manoir-example) for a concrete example with the options used.

1. Data acquisition on a particular scene
    - Make a photogrammetry flight plan with any drone. You can use e.g. the Anafi with the Pix4D Capture app (it's free). It is important that pictures have GPS info in their EXIF data.
    - Make some natural flights in the same scene, using either a Bebop2 or an Anafi so that the PDrAW tool can be used. In theory, the current scripts can be adapted to any camera with IMU and GPS.
    - Make a Lidar scan of this very scene, and clean the resulting 3D point cloud: this is a crucial part, as the Lidar data will be assumed perfect for the rest of the workflow. You also need to note the projection system used (e.g. `EPSG:2154`) for georegistration. The file will a priori be a `.las` file with float64 values.

2. Convert the `.las` float64 point cloud into a `.ply` float32 one
    - As 3D values are global, x and y will be huge. You need to make the cloud 0-centered by subtracting its centroid from it.
    - The centroid needs to be logged somewhere for future frame registration.
    - This step is done by the script `las2ply.py`, as sketched below.
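
    To make the centering step concrete, here is a minimal sketch of what `las2ply.py` does, assuming the `laspy` (2.x) and `plyfile` packages (the actual script may differ in its details):

    ```python
    import numpy as np
    import laspy                                    # assumed available
    from plyfile import PlyData, PlyElement         # assumed available

    las = laspy.read("/path/to/cloud.las")
    xyz = np.stack([np.asarray(las.x), np.asarray(las.y), np.asarray(las.z)], axis=1)

    centroid = xyz.mean(axis=0)                     # logged for future georegistration
    np.savetxt("/path/to/output_folder/centroid.txt", centroid)

    centered = (xyz - centroid).astype(np.float32)  # values are now small, safe as float32
    vertex = np.array([tuple(p) for p in centered],
                      dtype=[("x", "f4"), ("y", "f4"), ("z", "f4")])
    PlyData([PlyElement.describe(vertex, "vertex")]).write("/path/to/output_folder/cloud.ply")
    ```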

3. Extract optimal frames from the videos for a thorough photogrammetry that will use a mix of Pix4D flight plan pictures and video still frames.
    - The total number of frames must not be too high, to prevent the reconstruction from lasting too long on a single desktop (we recommend between 500 and 1000 images).
    - At the same time, extract if possible information on camera parameters, to identify which video sequences share the same parameters (e.g. 4K videos vs 720p videos, or different levels of zoom).
    - This step is done by the script `videos_to_colmap.py`
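
    The subset selection is essentially a spatial subsampling; the real script also weights orientation and image resolution (see the `--orientation_weight` and `--resolution_weight` parameters below). A minimal sketch of the purely spatial part, with hypothetical data:

    ```python
    import numpy as np

    def farthest_point_sampling(positions, n_samples):
        """Greedy subsampling: repeatedly keep the frame farthest
        from everything already selected."""
        selected = [0]
        dist = np.linalg.norm(positions - positions[0], axis=1)
        for _ in range(n_samples - 1):
            idx = int(dist.argmax())
            selected.append(idx)
            dist = np.minimum(dist, np.linalg.norm(positions - positions[idx], axis=1))
        return selected

    positions = np.random.rand(65000, 3) * 100   # hypothetical frame positions (meters)
    kept = farthest_point_sampling(positions, 1000)
    ```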

4. Georeference your images.
    - For each frame with a *GPS* position, convert it to *XYZ* coordinates in the projection system used by the Lidar point cloud (here, EPSG:2154 was used).
    - Subtract from these coordinates the centroid that was logged when converting the LAS file to PLY.
    - Log image filename and centered *XYZ* position in a file for georegistration of the reconstructed point cloud.
    - This step is also done by the script `videos_to_colmap.py`
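
    As an illustration, the GPS-to-XYZ conversion can be sketched with `pyproj` (hypothetical values; `videos_to_colmap.py` handles this internally):

    ```python
    import numpy as np
    from pyproj import Transformer      # assumed available

    # WGS84 GPS (lat/lon, degrees) -> Lambert-93 (EPSG:2154), metric coordinates
    to_lambert = Transformer.from_crs("epsg:4326", "epsg:2154")

    lat, lon, alt = 48.85, 2.35, 35.0   # hypothetical frame GPS position
    x, y = to_lambert.transform(lat, lon)

    centroid = np.loadtxt("/path/to/output_folder/centroid.txt")
    xyz_centered = np.array([x, y, alt]) - centroid   # same shift as the Lidar cloud
    ```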

5. Generate sky masks for your drone pictures to help the photogrammetry filter out noise during matching
    - Use a neural network to segment the drone pictures and generate masks, so that the black (sky) areas will be ignored.
    - This is done with the script `generate_sky_masks.py`

6. Perform photogrammetry on your pictures
    - The recommended tool is COLMAP, because the further tools will use its output format.
    - You should get a sparse 3D model, exhaustive enough to:
        - Be matched with the Lidar Point cloud
        - Localize other video frames in the reconstruction

7. Shift and scale the reconstructed point cloud to match the Lidar point cloud
    - See [here](https://colmap.github.io/faq.html#geo-registration) for point cloud georegistration with COLMAP.

8. For each video: continue the photogrammetry with video frames at a low fps (we took 1fps)
    - We do this in order to keep the whole mapping time linear.
    - Merge all the resulting models into one full model containing the thorough photogrammetry frames and all the 1fps frames.
    - Finish registering the remaining frames. For RAM reasons, every video is divided into chunks, so that a registered sequence is never more than 4000 frames.
    - Filter the final model at full framerate: remove points with absurd angular and translational acceleration, and interpolate the resulting discarded points (but keep track of them), as sketched below. This is done in the script `filter_colmap_model.py`.
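
    A simplified sketch of the translational half of that acceleration check (the threshold is hypothetical; the angular part and the interpolation are omitted, see `filter_colmap_model.py` for the real logic):

    ```python
    import numpy as np

    def flag_absurd_frames(positions, fps, max_accel=10.0):
        """Flag frames whose translational acceleration exceeds max_accel (m/s^2).
        positions: (N, 3) array of camera centers at full framerate."""
        velocity = np.diff(positions, axis=0) * fps                       # m/s
        accel = np.linalg.norm(np.diff(velocity, axis=0), axis=1) * fps   # m/s^2
        bad = np.zeros(len(positions), dtype=bool)
        bad[1:-1] = accel > max_accel    # flagged frames get interpolated instead
        return bad
    ```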

9. Densify the resulting point cloud with COLMAP (see [here](https://colmap.github.io/tutorial.html#dense-reconstruction))
    - Export a PLY file along with the VIS file with `colmap stereo_fusion`.

10. Match the Lidar point cloud and the full reconstruction point cloud together with ICP.
    - The georegistration of the reconstructed point cloud should be sufficient to get a good starting point.
    - From experience, the best method here is to use CloudCompare, but you can use ETH3D or PCL to do it.
    - The resulting transformation matrix should be stored in a TXT file, the same way CloudCompare proposes to do it.

11. Construct a PLY+VIS file pair based on the Lidar scan
    - The PLY file is basically every Lidar point.
    - The VIS file stores the frames from which each point is visible: we reuse the PLY+VIS pair from step 9, and assume a Lidar point has the same visibility as the closest point in the densified reconstructed cloud, as sketched below.
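
    The nearest-neighbour visibility transfer can be sketched as follows (hypothetical arrays; in this repo the actual work is done by the `pcl_util` tool `CreateVisFile` used below):

    ```python
    import numpy as np
    from scipy.spatial import cKDTree                # assumed available

    dense_xyz = np.random.rand(1000, 3)              # points of georef_dense.ply
    dense_vis = [[0, 1] for _ in range(1000)]        # visible frame indices per point
    lidar_xyz = np.random.rand(5000, 3)              # registered Lidar points

    tree = cKDTree(dense_xyz)
    _, nearest = tree.query(lidar_xyz, k=1)          # closest dense point per Lidar point
    lidar_vis = [dense_vis[i] for i in nearest]      # Lidar point inherits its visibility
    ```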

12. Run the Delaunay mesher tool from COLMAP to construct an occlusion mesh

13. Construct the splats with ETH3D

14. Construct the ground truth depth maps with ETH3D

15. Visualize and convert the resulting dataset to match the format of a better-known dataset, like KITTI.


## Usage

### Running the full script


Structure your input folder so that it looks like this:
```
├── Pictures
│   ├── anafi
│   │   ├── raw
│   │   └── rectilinear
│   └── apn
├── Videos
│   ├── 4K30
│   └── 720p120
└── Lidar
```

You can run the whole script with ```python main_pipeline.py```. If you don't have a Lidar point cloud and want to use the COLMAP reconstructed cloud as ground truth, you can use ```python main_pipeline_no_lidar.py```, which will be very similar, minus the point cloud cleaning and registration steps.

#### Parameters breakdown

All the parameters for `main_pipeline.py` are defined in the file `cli_utils.py`. You will find below a summary:

1. Main options
    * `--input_folder` : Input folder with LAS/PLY point clouds, videos, and images, structured as defined above.
    * `--workspace` : Path to the workspace where COLMAP operations will be done. It needs to be on an SSD; the size needed depends on the video size, but should be at least 20 GB.
    * `--raw_output_folder` : Path to the output folder for raw depth maps. Must be very big, especially with 4K videos: for a 4K 30fps video, count around 60 GB per minute of video.
    * `--converted_output_folder` : Path to the output folder for converted depth maps and visualization. Must be big, but usually smaller than the raw output, because the depth maps are still uncompressed, but downscaled.
    * `--show_steps` : If selected, will make a dry run just to list the steps and their numbers.
    * `--skip_step` : Skip the selected steps. Can be useful when an operation is done manually.
    * `--begin_step` : Skip all steps before this step. Useful when the script failed at some point.
    * `--resume_work` : If selected, will try to skip videos already localized and ground truth already generated.
    * `--inspect_dataset` : If selected, will open a window to inspect the dataset at key steps. See https://github.com/ETH3D/dataset-pipeline#dataset-inspection
    * `--save_space` : If selected, will try to save space in the workspace by only extracting needed frames and removing them as soon as they are no longer needed. Strongly advised.
    * `--vid_ext` : Video extensions to scrape from the input folder. By default, will search for `mp4` and `MP4` files.
    * `--pic_ext` : Same as video extensions, but for images. By default, will search for `jpg`, `JPG`, `png` and `PNG` files.
    * `--raw_ext` : Same as video extensions, but for RAW images. By default, will search for `ARW`, `NEF` and `DNG` files.

2. Executable files
    * `--nw` : Native wrapper location. See https://developer.parrot.com/docs/pdraw/installation.html#run-pdraw
    * `--colmap` : COLMAP exec location. Usually just `colmap` if it has been installed system-wide.
    * `--ffmpeg` : ffmpeg exec location. Usually just `ffmpeg` if it has been installed system-wide.
    * `--eth3d` : ETH3D dataset pipeline exec files folder location. Usually at `dataset-pipeline/build/`.
    * `--pcl_util` : PCL util exec files location. Usually at `pcl_util/build` (source in this repo).
    * `--log` : If set, will output stdout and stderr of these exec files to a log file, which can be read from another terminal with `tail`.

3. Lidar point cloud preparation
    * `--pointcloud_resolution` : If set, will subsample the Lidar point clouds at the chosen resolution.
    * `--SOR` : Statistical Outlier Removal parameters. This accepts 2 arguments : number of nearest neighbours and max relative distance to standard deviation. A sketch of the criterion is given below. See https://pcl.readthedocs.io/projects/tutorials/en/latest/statistical_outlier.html
    * `--registration_method` : Method used for point cloud registration; choose between "simple", "eth3d" and "interactive" ("simple" by default). See Manual step by step : step 12.
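
    A minimal sketch of the SOR criterion mentioned above (the real filtering is done by the PCL-based `pcl_util` tools; names here are illustrative):

    ```python
    import numpy as np
    from scipy.spatial import cKDTree    # assumed available

    def statistical_outlier_removal(xyz, knn=6, std_ratio=1.0):
        """PCL-style SOR: drop points whose mean distance to their knn nearest
        neighbours is more than std_ratio standard deviations above average."""
        tree = cKDTree(xyz)
        dists, _ = tree.query(xyz, k=knn + 1)   # first neighbour is the point itself
        mean_d = dists[:, 1:].mean(axis=1)
        keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
        return xyz[keep]
    ```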

4. Video extractor
    * `--total_frames` : Total number of frames that will be used for the first thorough photogrammetry. 500 by default; keep this number below 1000.
    * `--orientation_weight` : Weight applied to orientation during optimal sampling. Higher means two pictures with the same location but different orientations will be considered further apart.
    * `--resolution_weight` : Same as orientation, but with image size.
    * `--max_sequence_length` : COLMAP needs to load ALL the feature matches to register new frames. As such, some videos are too long to fit in RAM, and we need to divide the video into chunks that will be treated separately and then merged together. This parameter is the maximum number of frames in a chunk. The ideal value is around 500 frames per 1 GB of RAM, regardless of resolution.
    * `--num_neighbours` : Number of frames overlapping between chunks. This is for merging purposes.
    * `--system` : Coordinate system used for GPS; should be the same as the one in the LAS files used.
    * `--lowfps` : Framerate at which videos will be scanned WITH reconstruction. 1fps by default.
    * `--include_lowfps_thorough` : If selected, will include video frames at lowfps for the thorough scan (longer). This can be useful when some videos are not GPS localized (e.g. handheld camera) and are still relevant for the thorough photogrammetry.

5. Photogrammetry
    * `--max_num_matches` : Max number of matches; lower it if you get a GPU memory error.
    * `--vocab_tree` : Path to the vocab tree, which can be downloaded [here](https://demuc.de/colmap/#download)
    * `--multiple_models` : If selected, will let the colmap mapper build multiple models. The biggest one will then be chosen.
    * `--more_sift_features` : If selected, will activate the COLMAP options `--SiftExtraction.domain_size_pooling` and `--SiftExtraction.estimate_affine_shape` during feature extraction. Be careful, this does not use the GPU and is thus very slow. More info : https://colmap.github.io/faq.html#increase-number-of-matches-sparse-3d-points
    * `--add_new_videos` : If selected, will skip the mapping steps to directly register new videos with respect to an already existing COLMAP model.
    * `--stereo_min_depth` : Min depth for PatchMatch Stereo used during point cloud densification
    * `--stereo_max_depth` : Same as min depth but for max depth.

6. Occlusion Mesh
    * `--normals_method` : Method used for normal computation; choose between radius and nearest neighbours.
    * `--normals_radius` : If the radius method is chosen for normals, radius within which other points will be considered neighbours.
    * `--normals_neighbours` : If the nearest neighbours method is chosen, number of neighbours to consider. These could be very close or very far points, but this has a constant complexity.
    * `--mesh_resolution` : Mesh resolution for occlusion, in meters. Higher means coarser. (default 0.2, i.e. 20cm)
    * `--splats` : If selected, will create splats for points in the cloud that are far from the occlusion mesh.
    * `--splat_threshold` : Distance from the occlusion mesh at which a splat will be created for a particular point (default 10cm).
    * `--max_splat_size` : Splat size is defined by the mean distance from a point to its neighbours. You can define a max splat size for isolated points, which would otherwise produce a very large, useless splat. If not set, will be `2.5*splat_threshold`.

### Manual step by step

This will essentially do the same thing as the script, in order to let you change some steps at will.

1. Point cloud preparation

    ```
    python las2ply.py /path/to/cloud.las \
    --output_folder /path/to/output_folder
    ```

    This will save a ply file along with a centroid file
     - `/path/to/output_folder/cloud.ply`
     - `/path/to/output_folder/centroid.txt`


2. Point Cloud Cleaning
    For each ply file:

    ```
    ETH3D/build/PointCloudCleaner \
    --in /path/to/output_folder/cloud.ply \
    --filter 5,10
    ```
    (local outlier removal; it doesn't necessarily remove isolated points)
    or
    ```
    pcl_util/build/CloudSOR \
    --input /path/to/cloud_lidar.ply \
    --output /path/to/cloud_lidar_filtered.ply \
    --knn 5 --std 6
    ```

3. Meshlab Project creation
    ```
    python meshlab_xml_writer.py create \
    --input_models /path/to/cloud1 [../path/to/cloudN] \
    --output_meshlab /path/to/lidar.mlp
    ```

    Optionally, if we have multiple Lidar scans, we can run a registration step with ETH3D:

    ```
    ETH3D/build/ICPScanAligner \
    -i /path/to/lidar.mlp \
    -o /path/to/lidar.mlp \
    --number_of_scales 5
    ```

4. First COLMAP step (divided into two parts): feature extraction for photogrammetry frames

    ```
    python generate_sky_masks.py \
    --img_dir /path/to/images \
    --colmap_img_root /path/to/images \
    --mask_root /path/to/images_mask \
    --batch_size 8
    ```

    ```
    colmap feature_extractor \
    --database_path /path/to/scan.db \
    --image_path /path/to/images \
    --ImageReader.mask_path Path/to/images_mask/ \
    --ImageReader.camera_model RADIAL \
    --ImageReader.single_camera_per_folder 1
    ```

    We don't explicitly need to extract features before adding the video frames, but this will populate the `/path/to/scan.db` file with the photogrammetry pictures and corresponding ids that will be reserved for future versions of the file. Besides, it automatically sets one camera per folder too.

5. Video frame addition to COLMAP db file

    ```
    python videos_to_colmap.py \
    --video_folder /path/to/videos \
    --system epsg:2154 \
    --centroid_path /path/to/centroid.txt \
    --colmap_img_root /path/to/images \
    --nw /path/to/anafi/native-wrapper.sh \
    --fps 1 \
    --total_frames 1000 \
    --save_space \
    --thorough_db /path/to/scan.db
    ```

    The video to colmap step will populate the scan db with new entries with the right camera parameters, and select a spatially optimal subset of frames from the full video for a photogrammetry with 1000 pictures.
    It will also create several txt files with lists of file paths:
     - `video_frames_for_thorough_scan.txt` : all images used in the first thorough photogrammetry
     - `georef.txt` : all images with a GPS position and their XYZ equivalent, in the chosen coordinate system, minus the centroid of the Lidar file.

    The system parameter (here `epsg:2154`) is the one used in the Lidar point cloud. The geolocalized frames will then be localized inside the point cloud, which will help register the COLMAP reconstructed points with the Lidar point cloud. See more info [here](https://en.wikipedia.org/wiki/Spatial_reference_system). It must be compatible with [Proj](https://proj.org).

    And finally, it will divide long videos into chunks with corresponding lists of file paths, so that we don't have to deal with too large sequences (the limit here is 4000 frames), as sketched below. Each chunk will have its list of frames stored in a file `full_chunk_N.txt` inside the video folder.
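
    A minimal sketch of that chunking logic (the overlap corresponds to `--num_neighbours`; names are illustrative):

    ```python
    def chunk_frames(frames, max_len=4000, overlap=10):
        """Split a frame list into overlapping chunks so that each
        COLMAP sequence stays small enough to fit in RAM."""
        chunks, start = [], 0
        while start < len(frames):
            chunks.append(frames[start:start + max_len])
            if start + max_len >= len(frames):
                break
            start += max_len - overlap   # neighbouring chunks share frames for merging
        return chunks
    ```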


6. Second part of the first COLMAP step: feature extraction for the video frames used for the thorough photogrammetry


    ```
    python generate_sky_masks.py \
    --img_dir /path/to/images \
    --colmap_img_root /path/to/images \
    --mask_root /path/to/images_mask \
    --batch_size 8
    ```

    (this is the same command as step 4)

    ```
    colmap feature_extractor \
    --database_path /path/to/scan.db \
    --image_path /path/to/images \
Pinard Clement's avatar
Pinard Clement committed
300
    --image_list_path /path/to/images/video_frames_for_thorough_scan.txt \
    --ImageReader.mask_path /path/to/images_mask/
    ```

    We also recommend that you index the images into your own vocab tree; this will make the next matching steps faster. You can download a vocab tree at https://demuc.de/colmap/#download (we took the 256K version in our tests).

    ```
    colmap vocab_tree_retriever \
    --database_path /path/to/scan.db \
    --vocab_tree_path /path/to/vocab_tree \
    --output_index /path/to/indexed_vocab_tree
    ```

7. Second COLMAP step: matching. For fewer than 1000 images, you can use exhaustive matching (this will take around 2 hours). If there are too many images, you can use either spatial matching or vocab tree matching.

    ```
    colmap exhaustive_matcher \
    --database_path scan.db \
    --SiftMatching.guided_matching 1
    ```
    or
    ```
    colmap spatial_matcher \
    --database_path scan.db \
    --SiftMatching.guided_matching 1
    ```
    or
    ```
    colmap vocab_tree_matcher \
    --database_path scan.db \
    --VocabTreeMatching.vocab_tree_path /path/to/indexed_vocab_tree \
    --SiftMatching.guided_matching 1
    ```

8. Third COLMAP step: thorough mapping.

    ```
    mkdir -p /path/to/thorough/
    colmap mapper --database_path scan.db --output_path /path/to/thorough/ --image_path images
    ```

    This will create multiple models located in folders named `/path/to/thorough/N`, `N` being a number starting from 0. Each model will be in the form of these files:
    ```
    └── thorough
        └── N
            ├── cameras.bin
            ├── images.bin
            ├── points3D.bin
            └── project.ini
    ```

    COLMAP creates multiple models in case the dataset has multiple sets of images that don't overlap. Most of the time, there will be only one model (named `0`). Depending on the frame used for initialization, it can happen that the biggest model is not the first. Here we will assume that it is indeed the first (`0`), but you are expected to change that number if it is not the most complete model COLMAP could construct.

    You can finally run a last bundle adjustment using Ceres, supposedly better than the multicore one used in the mapper (albeit slower):

    ```
    colmap bundle_adjuster \
    --input_path /path/to/thorough/0 \
    --output_path /path/to/thorough/0
    ```

9. Fourth COLMAP step: [georeferencing](https://colmap.github.io/faq.html#geo-registration)

    ```
    mkdir -p /path/to/geo_registered_model
    colmap model_aligner \
    --input_path /path/to/thorough/0/ \
    --output_path /path/to/geo_registered_model \
    --ref_images_path /path/to/images/georef.txt \
    --robust_alignment_max_error 5
    ```

    This model will be the reference model; every further model and frame localization will be done with respect to this one.
    Even if we could, we don't run point cloud registration right now, as the next steps will help us get a more complete point cloud.

10. Video Localization
    All these substeps will populate the db file, which is then used for matching, so you need to make a copy of it for each video.

    1. If the `--save_space` option was used during step 5 when calling the script `videos_to_colmap.py`, you now need to extract all the frames of the video to the same directory where `videos_to_colmap.py` exported the frame subset of this video.

        ```
        ffmpeg \
        -i /path/to/video.mp4 \
        -vsync 0 -qscale:v 2 \
        /path/to/images/videos/dir/<video_name>_%05d.jpg
        ```

    2. Continue mapping with the low fps images, using the sequential matcher.

        ```
        python generate_sky_masks.py \
        --img_dir /path/to/images/videos/dir \
        --colmap_img_root /path/to/images \
        --mask_root /path/to/images_mask \
        --batch_size 8
        ```

        ```
        python add_video_to_db.py \
        --frame_list /path/to/images/videos/dir/lowfps.txt \
        --metadata /path/to/images/videos/dir/metadata.csv \
        --database /path/to/video_scan.db
        ```

        ```
        colmap feature_extractor \
        --database_path /path/to/video_scan.db \
        --image_path /path/to/images \
        --image_list_path /path/to/images/videos/dir/lowfps.txt \
        --ImageReader.mask_path /path/to/images_mask/
        ```

        ```
        colmap sequential_matcher \
        --database_path /path/to/video_scan.db \
        --SequentialMatching.loop_detection 1 \
        --SequentialMatching.vocab_tree_path /path/to/indexed_vocab_tree
        ```

        ```
        colmap mapper \
        --input_path /path/to/geo_registered_model \
        --output_path /path/to/lowfps_model \
        --Mapper.fix_existing_images 1 \
        --database_path /path/to/video_scan.db \
        --image_path /path/to/images
        ```

    3. Re-georeference the model

        This is a tricky part: to ease convergence, the mapper normalizes the model, losing the initial georeferencing.
        To avoid this problem, we merge the model back into the first one. The order between input1 and input2 is important!

        ```
        colmap model_merger \
        --input_path1 /path/to/geo_registered_model \
        --input_path2 /path/to/lowfps_model \
        --output_path /path/to/lowfps_model
        ```

    4. Add the mapped frames to the full model that will be used for Lidar registration

        ```
        colmap model_merger \
        --input_path1 /path/to/geo_registered_model \
        --input_path2 /path/to/lowfps_model \
        --output_path /path/to/georef_full
        ```

        For the next videos, replace input1 with `/path/to/georef_full`, which will incrementally add more and more images to the model.

    5. Register the remaining frames of the videos, without mapping. This is done in chunks in order to avoid RAM problems. Chunks are created during step 5, when calling the script `videos_to_colmap.py`. For each chunk `N`, make a copy of the scan database and do the same operations as above, minus the mapping, which is replaced with image registration.

        ```
        cp /path/to/video_scan.db /path/to/video_scan_chunk_n.db
        ```

        ```
        python add_video_to_db.py \
        --frame_list /path/to/images/videos/dir/full_chunk_n.txt \
        --metadata /path/to/images/videos/dir/metadata.csv \
        --database /path/to/video_scan_chunk_n.db
        ```

        ```
        colmap feature_extractor \
        --database_path /path/to/video_scan_chunk_n.db \
        --image_path /path/to/images \
        --image_list_path /path/to/images/videos/dir/full_chunk_n.txt \
        --ImageReader.mask_path /path/to/images_mask/
        ```

        ```
        colmap sequential_matcher \
        --database_path /path/to/video_scan_chunk_n.db \
        --SequentialMatching.loop_detection 1 \
        --SequentialMatching.vocab_tree_path /path/to/indexed_vocab_tree
        ```

        ```
        colmap image_registrator \
        --database_path /path/to/video_scan_chunk_n.db \
        --input_path /path/to/lowfps_model \
        --output_path /path/to/chunk_n_model
        ```

        (optional bundle adjustment)

        ```
        colmap bundle_adjuster \
        --input_path /path/to/chunk_n_model \
        --output_path /path/to/chunk_n_model \
        --BundleAdjustment.max_num_iterations 10
        ```

        If this is the first chunk, simply copy `/path/to/chunk_n_model` to `/path/to/full_video_model`.
        Otherwise:

        ```
        colmap model_merger \
        --input_path1 /path/to/full_video_model \
        --input_path2 /path/to/chunk_n_model \
        --output_path /path/to/full_video_model
        ```

        At the end of this step, you should have a model with all the (localizable) frames of the videos, plus the other frames that were used for the first thorough photogrammetry.

    6. Extract the frame positions from the resulting model

        ```
        python extract_video_from_model.py \
        --input_model /path/to/full_video_model \
        --output_model /path/to/final_model \
        --metadata_path /path/to/images/videos/dir/metadata.csv \
        --output_format txt
        ```

    7. Filter the image sequence to exclude frames with an absurd acceleration, and interpolate them instead. We keep track of the interpolated frames, which will not be used for depth validation but can be used by depth estimation algorithms that need the odometry of previous frames.
        ```
        python filter_colmap_model.py \
        --input_images_colmap /path/to/final_model/images.txt \
        --output_images_colmap /path/to/final_model/images.txt \
        --metadata /path/to/images/videos/dir/metadata.csv \
        --interpolated_frames_list /path/to/images/videos/dir/interpolated_frames.txt
        ```
    At the end of these per-video tasks, you should have a model at `/path/to/georef_full` with all photogrammetry images plus the localization of video frames at 1fps, and for each video a TXT file with positions with respect to the first geo-registered reconstruction.

11. Point cloud densification

    ```
    colmap image_undistorter \
    --image_path /path/to/images \
    --input_path /path/to/georef_full \
    --output_path /path/to/dense \
    --output_type COLMAP \
    --max_image_size 1000
    ```

    The `max_image_size` option is optional, but recommended if you want to save space when dealing with 4K images.

    ```
    colmap patch_match_stereo \
    --workspace_path /path/to/dense \
    --workspace_format COLMAP \
    --PatchMatchStereo.geom_consistency 1
    ```

    ```
    colmap stereo_fusion \
    --workspace_path /path/to/dense \
    --workspace_format COLMAP \
    --input_type geometric \
    --output_path /path/to/georef_dense.ply
    ```

    This will also create a `/path/to/georef_dense.ply.vis` file, which describes the frames from which each point is visible.

12. Point cloud registration
    Determine the transformation to apply to `/path/to/lidar.mlp` to get to `/path/to/georef_dense.ply`, so that we can have the pose of the cameras with respect to the Lidar.

    Option 1: construct a meshlab project similar to `/path/to/lidar.mlp`, with `/path/to/georef_dense.ply` as the first mesh, and run ETH3D's registration tool:
    ```
    python meshlab_xml_writer.py add \
    --input_models /path/to/georef_dense.ply \
    --start_index 0 \
    --input_meshlab /path/to/lidar.mlp \
    --output_meshlab /path/to/registered.mlp
    ```
    ```
    ETH3D/build/ICPScanAligner \
    -i /path/to/registered.mlp \
    -o /path/to/registered.mlp \
    --number_of_scales 5
    ```

    The second matrix in `/path/to/registered.mlp` will be the matrix transform from `/path/to/lidar.mlp` to `/path/to/georef_dense.ply`.

    Important note: this operation does not adjust scale. Theoretically, if the video frames are GPS localized, this should not be a problem, but it can be one with very large models, where a small scale error will be responsible for large displacement errors locally.

    Option 2: construct a PLY file from the Lidar scans and register the reconstructed cloud with respect to the Lidar, with PCL or CloudCompare. We do it this way (and not from Lidar to reconstructed) because it is usually easier to register the cloud with fewer points with classic ICP. First, convert the meshlab project to a PLY with normals:
    ```
    ETH3D/build/NormalEstimator \
    -i /path/to/lidar.mlp \
    -o /path/to/lidar_with_normals.ply
    ```

    ```
    pcl_util/build/CloudRegistrator \
    --georef /path/to/georef_dense.ply \
    --lidar /path/to/lidar_with_normals.ply \
    --output_matrix /path/to/registration_matrix.txt
    ```

    Note that `/path/to/registration_matrix.txt` stores the inverse of the matrix we want, so you have to invert it and save back the result.
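
    Assuming the file stores a 4x4 homogeneous matrix in plain text, this can be done with a few lines of numpy:

    ```python
    import numpy as np

    M = np.loadtxt("/path/to/registration_matrix.txt").reshape(4, 4)
    np.savetxt("/path/to/registration_matrix.txt", np.linalg.inv(M))   # overwrite with the inverse
    ```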

    Or use CloudCompare: https://www.cloudcompare.org/doc/wiki/index.php?title=Alignment_and_Registration
    Best results were obtained with these consecutive steps:
    - Crop the `/path/to/georef_dense.ply` cloud, otherwise the Octomap will be very inefficient, and the cloud usually has very far outliers. See [Cross section](https://www.cloudcompare.org/doc/wiki/index.php?title=Cross_Section).
    - Apply noise filtering on the cropped cloud. See [Noise filter](https://www.cloudcompare.org/doc/wiki/index.php?title=Noise_filter).
    - (Optional, especially if the frames are GPS localized) Manually apply a rough registration with point pair picking. See [Align](https://www.cloudcompare.org/doc/wiki/index.php?title=Align).
    - Apply fine registration, with a final overlap of 50%, scale adjustment, and farthest point removal enabled. See [ICP](https://www.cloudcompare.org/doc/wiki/index.php?title=ICP).
    - Save the resulting registration matrix.

    For the fine registration part, as said earlier, the aligned cloud is the reconstruction and the reference cloud is the Lidar.

    Finally, apply the registration matrix to `/path/to/lidar.mlp` to get `/path/to/registered.mlp`:

    ```
    python meshlab_xml_writer.py transform \
    --input_meshlab /path/to/lidar.mlp \
    --output_meshlab /path/to/registered.mlp \
    --transform /path/to/registration_matrix.txt \
    --inverse
    ```

13. Occlusion Mesh generation

    Use the COLMAP Delaunay mesher to generate a mesh from PLY + VIS.
    Normally, COLMAP expects the cloud it generated when running the `stereo_fusion` step, but we use the Lidar point cloud instead.

    Get a PLY file for the registered Lidar point cloud:
    ```
    ETH3D/build/NormalEstimator \
    -i /path/to/registered.mlp \
    -o /path/to/lidar_with_normals.ply
    ```

    ```
    pcl_util/build/CreateVisFile \
    --georef_dense /path/to/georef_dense.ply \
    --lidar /path/to/lidar_with_normals.ply \
    --output_cloud /path/to/dense/fused.ply \
    --resolution 0.2
    ```

    It is important to place the resulting point cloud at the root of the COLMAP MVS workspace `/path/to/dense` that was used for generating `/path/to/georef_dense.ply`, and to name it `fused.ply`, because that name is hard-coded in COLMAP's code.
    The file `/path/to/dense/fused.ply.vis` will also be generated. The resolution option is used to reduce the computational load of the next step.

    ```
    colmap delaunay_mesher \
    --input_type dense \
    --input_path /path/to/dense \
    --output_path /path/to/occlusion_mesh.ply
    ```

    Generate splats for Lidar points outside of the occlusion mesh's close range. See https://github.com/ETH3D/dataset-pipeline#splat-creation

    ```
    ETH3D/build/SplatCreator \
    --point_normal_cloud_path /path/to/lidar_with_normals.ply \
    --mesh_path /path/to/occlusion_mesh.ply \
    --output_path /path/to/splats.ply \
    --distance_threshold 0.1 \
    --max_splat_size 0.25
    ```

    The ideal distance threshold is what is considered the close range of the occlusion mesh: it is the distance from which a splat (a small square surface) will be created for a point.

14. Raw Groundtruth generation
    For each video:

    ```
    ETH3D/build/GroundTruthCreator \
    --scan_alignment_path /path/to/registered.mlp \
    --image_base_path /path/to/images \
    --state_path /path/to/final_model \
    --output_folder_path /path/to/raw_GT \
    --occlusion_mesh_path /path/to/occlusion_mesh.ply \
    --occlusion_splats_path /path/to/splats.ply \
    --max_occlusion_depth 200 \
    --write_point_cloud 0 \
    --write_depth_maps 1 \
    --write_occlusion_depth 1 \
    --compress_depth_maps 1
    ```

    This will create, for each video, a folder `/path/to/raw_GT/groundtruth_depth/<video name>/` with compressed files containing depth information. The option `--write_occlusion_depth` will make the folder `/path/to/raw_GT/` much heavier, but is optional; it is used for inspection purposes.

15. Dataset conversion

    For each video:

    ```
    python convert_dataset.py \
    --depth_dir /path/to/raw_GT/groundtruth_depth/<video name>/ \
    --images_root_folder /path/to/images/ \
    --occ_dir /path/to/raw_GT/occlusion_depth/<video name>/ \
    --metadata_path /path/to/images/videos/dir/metadata.csv \
    --dataset_output_dir /path/to/dataset/ \
    --video_output_dir /path/to/vizualisation/ \
    --interpolated_frames_list /path/to/images/videos/dir/interpolated_frames.txt \
    --final_model /path/to/final_model/ \
    --video \
    --downscale 4 \
    --threads 8
    ```

## Detailed method with the "Manoir" example

### Scene presentation

The scene is a manor ("manoir") in the French countryside.
 - Terrain dimensions : 350m x 100m
 - Max altitude : 20m

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/plan1.jpg)


### Lidar Data acquisition

3D Lidar data was captured by a DJI Matrice 600 with a Velodyne VLP-16 on board, with an RTK GPS system.

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/drone1.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/drone2.jpg)

### Photogrammetry images acquisition

For photogrammetry-oriented pictures, we used an Anafi drone with the Pix4D app, which lets us fly one grid and two orbits above the field we wanted to scan. We also used a personal DSLR (Sony alpha-6000) for additional photos.

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/plan2.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/plan3.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/plan4.jpg)

Here is a visualization of the resulting point cloud:

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/pointcloud1.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/pointcloud2.jpg)

### Piloting videos acquisition

We took videos at two different quality settings:
 - 4K, 30fps, very good quality
 - 720p, 120fps, bad quality (but high framerate)

We have 65k frames in total.

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/piloting1.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/piloting2.jpg)

### Optimal video sampling

The first image shows the localization of the videos relative to each other, according to the Anafi metadata (made with the COLMAP GUI).
The second image shows the frames that have been kept in order to stay at 1000 frames with an optimal spatial sampling.

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/optimal_sample1.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/optimal_sample2.jpg)

### Thorough photogrammetry

Thorough photogrammetry was done with 1000 frames. Notice that not all of the area was mapped; it is expected to be completed once we take care of each video.

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/photog1.jpg)

### Video localisation

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/photog2.jpg)

### Dataset inspection

 - First image: black and white drone image
 - Second image: depth map visualization
 - Third image: occlusion depth map

![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/result1.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/result3.jpg)
![h](https://gitlab.ensta.fr/pinard/drone-depth-validation-set/raw/master/images/result3.jpg)

### Resulting video

[![Alt text](https://img.youtube.com/vi/NLIvrzUB9bY/0.jpg)](https://www.youtube.com/watch?v=NLIvrzUB9bY&list=PLMeM2q87QjqjAAbg8RD3F_J5D7RaTMAJj)