Tutorials
=========

1. Data Preparation
-------------------

Two operations can be performed in PyLandslide for data preparation: (1) factor data co-registration (i.e., re-projection and alignment) and (2) Machine Learning data extraction.

Reclassified factor data are provided to PyLandslide in a raster format (i.e., tif) and should have the same spatial resolution. The co-registration operation ensures that the reclassified factor data are in the same projection system and have the same dimensions. All rasters are clipped and aligned to the raster that has the smallest dimensions.

Machine Learning data extraction involves extracting the values of the factors contributing to the occurrence of landslides at past landslide and non-landslide locations. The inputs to this operation are the co-registered raster layers and two point shapefiles: one includes the locations of past landslides and the other includes random locations at which no landslides occurred in the past. The output of the Machine Learning data extraction operation is two CSV files, one containing the features (i.e., the values of factors) and the other the targets (i.e., the corresponding status as landslide or non-landslide).

1.1. Factor data co-registration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As shown below, to perform the factor data co-registration operation, use the `coregister command <https://WRHGroup.github.io/PyLandslide/coregister.html>`__ through the Command Line Interface (CLI), where ``raster_data`` is a folder in which the raw input factor raster data are stored. Sample reclassified factor data are available for Italy in the `Github Repository of PyLandslide <https://github.com/WRHGroup/PyLandslide/tree/main/tests/files/raster_data>`__. It is important to note that factor data need to be reclassified with an integer score ranging from 0 to 100 for each class; PyLandslide **DOES NOT** currently perform reclassification and score estimation.

.. code-block:: console

    PyLandslide coregister -f raster_data

Running the ``coregister`` command creates two sub-folders within the ``raster_data`` folder: ``uint8`` and ``alinged_rasters``. A version of the raw factor data converted to the uint8 type, to reduce file size, is saved under the ``uint8`` folder. The ``alinged_rasters`` folder includes the final co-registered raster layers to be used in the next steps.

1.2. Machine Learning data extraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As shown below, to perform the Machine Learning data extraction operation, use the `mldata command <https://WRHGroup.github.io/PyLandslide/mldata.html>`__ through the Command Line Interface (CLI), where ``1_weight_range_data_preparation.json`` is a JSON (JavaScript Object Notation) text file containing the information needed to run the ``mldata`` command.

.. code-block:: console

    PyLandslide mldata -f 1_weight_range_data_preparation.json

An example JSON file for the ``mldata`` command is available in the `Github Repository of PyLandslide <https://github.com/WRHGroup/PyLandslide/blob/main/tests/files/1_weight_range_data_preparation.json>`__. The JSON file points to the output directory for the operation, the landslide and non-landslide shapefiles, and the locations and names of the factor raster files. Below is an example JSON file. Please follow the exact structure and only change input information where needed.

.. code-block:: json

    {
        "output_directory":"outputs",
        "landslide_locations":"shp_data/landslide_locations.shp",
        "nonlandslide_locations":"shp_data/nonlandslide_locations.shp",
        "factors":[
            {
                "name":"landcover",
                "file":"raster_data/landcover_scores.tif"
            },
            {
                "name":"lithology",
                "file":"raster_data/lithology_scores.tif"
            },
            {
                "name":"rainfall",
                "file":"raster_data/rainfall_historical_chirps_scores.tif"
            },
            {
                "name":"roads",
                "file":"raster_data/roads_scores.tif"
            },
            {
                "name":"slope",
                "file":"raster_data/slope_scores.tif"
            }
        ]
    }

Running the ``mldata`` command results in two CSV files saved in the output directory specified in the JSON file. One file is ``features.csv`` and contains the values of the factors at the landslide and non-landslide locations. The other file is ``targets.csv`` and contains the corresponding status of each location as landslide or non-landslide (1 or 0, respectively).
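To make the extraction step more concrete, the minimal sketch below shows one way such sampling can be done with ``rasterio`` and ``fiona``. It is a conceptual illustration only, not the PyLandslide implementation; the file paths follow the example JSON above, and the helper function ``point_coords`` and the ``status`` column name are introduced here just for the sketch.

.. code-block:: python

    import csv

    import fiona     # reads the point shapefiles
    import rasterio  # reads the co-registered factor rasters

    # Factor rasters to sample (paths follow the example JSON above).
    factor_files = {
        "landcover": "raster_data/landcover_scores.tif",
        "slope": "raster_data/slope_scores.tif",
    }

    def point_coords(shapefile):
        """Return the (x, y) coordinates of all points in a shapefile."""
        with fiona.open(shapefile) as src:
            return [feature["geometry"]["coordinates"] for feature in src]

    feature_rows, target_rows = [], []
    for status, shapefile in [(1, "shp_data/landslide_locations.shp"),
                              (0, "shp_data/nonlandslide_locations.shp")]:
        coords = point_coords(shapefile)
        # Sample band 1 of every factor raster at the point locations.
        sampled = {}
        for name, path in factor_files.items():
            with rasterio.open(path) as src:
                sampled[name] = [values[0] for values in src.sample(coords)]
        for i in range(len(coords)):
            feature_rows.append([sampled[name][i] for name in factor_files])
            target_rows.append([status])

    # Write feature and target tables analogous to features.csv and targets.csv.
    with open("outputs/features.csv", "w", newline="") as f:
        csv.writer(f).writerows([list(factor_files)] + feature_rows)
    with open("outputs/targets.csv", "w", newline="") as f:
        csv.writer(f).writerows([["status"]] + target_rows)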
2. Weight Range Analysis
------------------------

PyLandslide uses the feature and target CSV files generated using the `mldata command <https://WRHGroup.github.io/PyLandslide/mldata.html>`__ to train many Random Forest Regression Machine Learning models to predict whether a location is a landslide or non-landslide location. Models whose performance exceeds a threshold (defined based on overall accuracy in testing and training) are used to calculate feature importance, which represents the contribution of each factor to determining the status of a location. By randomly sampling from the feature and target data, the range of factor weights can be derived.

As shown below, to perform Weight Range Analysis, use the `weightrange command <https://WRHGroup.github.io/PyLandslide/weightrange.html>`__ through the Command Line Interface (CLI), where ``2_weight_range_json_file.json`` is a JSON (JavaScript Object Notation) text file containing the information needed to run the ``weightrange`` command.

.. code-block:: console

    PyLandslide weightrange -f 2_weight_range_json_file.json

An example JSON file for the ``weightrange`` command is available in the `Github Repository of PyLandslide <https://github.com/WRHGroup/PyLandslide/blob/main/tests/files/2_weight_range_json_file.json>`__. The JSON file points to the features CSV file, the targets CSV file, and the output file in which the weight ranges are saved. Furthermore, for the random forest model, the JSON file defines the number of trees, the maximum tree depth, the size of the testing sample, the lower performance cutoff for using a trained model, the number of processing cores to be used when training models, and the number of random sampling iterations used to determine the uncertainty ranges. Please follow the exact structure and only change input information where needed.

.. code-block:: json

    {
        "features_file":"csv_data/features.csv",
        "targets_file":"csv_data/targets.csv",
        "number_trees":15,
        "max_tree_depth":9,
        "size_testing_sample":0.2,
        "number_of_iterations":20000,
        "cores":6,
        "performance_cutoff":0.75,
        "output_file":"outputs/weight_ranges.csv"
    }

Running the ``weightrange`` command results in the output CSV file specified in the JSON file. This file includes the different sets of weights resulting from the Machine Learning models that satisfied the overall accuracy cutoff requirement, together with the overall accuracy value of each satisfactory model iteration.
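To illustrate the idea behind this step, the sketch below trains repeated random forest models with ``scikit-learn``, keeps only those exceeding an accuracy cutoff, and records their normalised feature importances as candidate weights. It is a simplified sketch, not the PyLandslide source: the parameter values mirror the example JSON above, and the exact accuracy definition and sampling scheme used by the package may differ.

.. code-block:: python

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    features = pd.read_csv("csv_data/features.csv")
    targets = pd.read_csv("csv_data/targets.csv").values.ravel()

    kept_weights = []
    for _ in range(200):  # cf. "number_of_iterations" (reduced here for speed)
        x_train, x_test, y_train, y_test = train_test_split(
            features, targets, test_size=0.2)           # cf. "size_testing_sample"
        model = RandomForestRegressor(n_estimators=15,   # cf. "number_trees"
                                      max_depth=9)       # cf. "max_tree_depth"
        model.fit(x_train, y_train)
        # Overall accuracy on the testing sample (predictions rounded to 0/1);
        # this accuracy definition is an assumption made for the sketch.
        accuracy = np.mean(np.round(model.predict(x_test)) == y_test)
        if accuracy >= 0.75:  # cf. "performance_cutoff"
            importances = model.feature_importances_
            kept_weights.append(importances / importances.sum())

    # Each kept model contributes one set of factor weights; the spread across
    # models gives the weight range for each factor.
    weights = pd.DataFrame(kept_weights, columns=features.columns)
    print(weights.min(), weights.max())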
3. Sensitivity Analysis
-----------------------

PyLandslide uses the weight range CSV file generated using the `weightrange command <https://WRHGroup.github.io/PyLandslide/weightrange.html>`__ to perform a sensitivity analysis by randomly sampling from the sets of weights available in the CSV file, performing landslide susceptibility mapping based on each set of weights, and calculating the percentage area under each susceptibility class. The susceptibility classes and the number of sensitivity iterations (or trials) are also provided as inputs.

As shown below, to perform Sensitivity Analysis, use the `sensitivity command <https://WRHGroup.github.io/PyLandslide/sensitivity.html>`__ through the Command Line Interface (CLI), where ``3_sensitivity_json_file_historical_rainfall.json`` is a JSON (JavaScript Object Notation) text file containing the information needed to run the ``sensitivity`` command, and the number following ``-t`` is an integer specifying the number of sensitivity iterations (or trials).

.. code-block:: console

    PyLandslide sensitivity -f 3_sensitivity_json_file_historical_rainfall.json -t 5

An example JSON file for the ``sensitivity`` command is available in the `Github Repository of PyLandslide <https://github.com/WRHGroup/PyLandslide/blob/main/tests/files/3_sensitivity_json_file_historical_rainfall.json>`__. The JSON file points to the output directory for the operation, the weight range CSV file generated using the `weightrange command <https://WRHGroup.github.io/PyLandslide/weightrange.html>`__, the factors and their file locations, and the upper and lower bounds of the landslide susceptibility classes. Please follow the exact structure and only change input information where needed.

.. code-block:: json

    {
        "output_directory":"outputs",
        "weight_csv_sensitivity_file":"outputs/weight_ranges.csv",
        "factors":[
            {
                "name":"landcover",
                "file":"raster_data/landcover_scores.tif"
            },
            {
                "name":"lithology",
                "file":"raster_data/lithology_scores.tif"
            },
            {
                "name":"rainfall",
                "file":"raster_data/rainfall_historical_chirps_scores.tif"
            },
            {
                "name":"roads",
                "file":"raster_data/roads_scores.tif"
            },
            {
                "name":"slope",
                "file":"raster_data/slope_scores.tif"
            }
        ],
        "susceptibility_classes":[
            {
                "name":"very low",
                "class_lower_bound":0.0001,
                "class_upper_bound":19.9999
            },
            {
                "name":"low",
                "class_lower_bound":20,
                "class_upper_bound":39.9999
            },
            {
                "name":"moderate",
                "class_lower_bound":40,
                "class_upper_bound":59.9999
            },
            {
                "name":"high",
                "class_lower_bound":60,
                "class_upper_bound":79.9999
            },
            {
                "name":"extremly high",
                "class_lower_bound":80,
                "class_upper_bound":100
            }
        ]
    }

Running the ``sensitivity`` command results in a CSV file named ``sensitivity_results.csv`` saved in the output directory specified in the JSON file. The CSV file includes, for each sensitivity trial, the weights used and the percentage area under each landslide susceptibility class.
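Conceptually, each sensitivity trial combines the co-registered factor rasters in a weighted overlay using one sampled set of weights, and then summarises the resulting map by susceptibility class. The sketch below illustrates a single trial with ``rasterio`` and ``numpy``; it is an illustration of the idea rather than the package's implementation, and it assumes that the rasters share the same grid, that the weight values shown are hypothetical, and that a value of 0 marks no-data cells.

.. code-block:: python

    import numpy as np
    import rasterio

    # Two of the factor rasters from the example JSON above, with one sampled
    # set of weights (hypothetical values that sum to 1).
    factors = {
        "slope": "raster_data/slope_scores.tif",
        "lithology": "raster_data/lithology_scores.tif",
    }
    weights = {"slope": 0.6, "lithology": 0.4}

    # Weighted overlay: susceptibility = sum over factors of weight * score.
    arrays = {}
    for name, path in factors.items():
        with rasterio.open(path) as src:
            arrays[name] = src.read(1).astype("float64")
    susceptibility = sum(weights[name] * arrays[name] for name in factors)

    # Percentage area per susceptibility class (bounds follow the example JSON).
    classes = {
        "very low": (0.0001, 19.9999),
        "low": (20, 39.9999),
        "moderate": (40, 59.9999),
        "high": (60, 79.9999),
        "extremly high": (80, 100),
    }
    valid = susceptibility > 0  # assumes 0 marks no-data cells
    for name, (lower, upper) in classes.items():
        in_class = (susceptibility[valid] >= lower) & (susceptibility[valid] <= upper)
        print(name, ":", round(float(np.mean(in_class)), 3))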
4. Generating a susceptibility raster layer
--------------------------------------------

After performing a sensitivity analysis, one might want to generate a susceptibility raster layer for a specific sensitivity trial. PyLandslide enables doing this through the `generate command <https://WRHGroup.github.io/PyLandslide/generate.html>`__.

As shown below, the ``generate`` command is used through the Command Line Interface (CLI), where ``3_sensitivity_json_file_historical_rainfall.json`` is the `same JSON (JavaScript Object Notation) text file <https://github.com/WRHGroup/PyLandslide/blob/main/tests/files/3_sensitivity_json_file_historical_rainfall.json>`__ used to run the ``sensitivity`` command, ``sensitivity_results.csv`` is the output CSV file of the sensitivity analysis, and the number following ``-i`` is an integer specifying the index of the sensitivity trial for which a raster layer is to be generated. The ``sensitivity_results.csv`` file must be under a folder named ``outputs`` for PyLandslide to be able to find it.

.. code-block:: console

    PyLandslide generate -f 3_sensitivity_json_file_historical_rainfall.json -c sensitivity_results.csv -i 0

The result of this command is a raster layer saved in the output directory, with the index of the sensitivity trial used as a name suffix.

5. Comparing two susceptibility raster layers
----------------------------------------------

One might need to compare two landslide susceptibility raster layers. This can be performed through the `compare command <https://WRHGroup.github.io/PyLandslide/compare.html>`__ of PyLandslide.

As shown below, the ``compare`` command is used through the Command Line Interface (CLI), where ``3_sensitivity_json_file_historical_rainfall.json`` is the `same JSON (JavaScript Object Notation) text file <https://github.com/WRHGroup/PyLandslide/blob/main/tests/files/3_sensitivity_json_file_historical_rainfall.json>`__ used to run the ``sensitivity`` command, ``layer1.tif`` is the first raster layer, and ``layer2.tif`` is the second raster layer to be compared. Both ``layer1.tif`` and ``layer2.tif`` must be under a folder named ``outputs`` for PyLandslide to be able to find them.

.. code-block:: console

    PyLandslide compare -f 3_sensitivity_json_file_historical_rainfall.json -l1 layer1.tif -l2 layer2.tif

Running ``compare`` prints comparison statistics to the terminal (see the example below) showing the percentage area under each susceptibility class for each layer. Additionally, a raster layer named ``susceptibility_difference.tif`` is generated in the output directory showing ``layer1.tif`` minus ``layer2.tif``.

.. code-block:: console

    layer1--------------------
    very low : 0.219
    low : 0.319
    moderate : 0.17
    high : 0.215
    extremly high : 0.076
    layer2--------------------
    very low : 0.179
    low : 0.325
    moderate : 0.172
    high : 0.228
    extremly high : 0.096
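As a rough illustration of what the difference layer represents, the minimal sketch below reads two susceptibility rasters with ``rasterio`` and writes ``layer1.tif`` minus ``layer2.tif`` to a new file. It is not the PyLandslide implementation; it assumes both layers share the same grid and extent, and it writes to a hypothetical file name so as not to overwrite the tool's own output.

.. code-block:: python

    import numpy as np
    import rasterio

    # Read both susceptibility layers (paths follow the compare example above).
    with rasterio.open("outputs/layer1.tif") as first, \
            rasterio.open("outputs/layer2.tif") as second:
        layer1 = first.read(1).astype("float64")
        layer2 = second.read(1).astype("float64")
        profile = first.profile  # georeferencing metadata reused for the output

    # Cell-by-cell difference: layer1 minus layer2.
    difference = layer1 - layer2
    print("mean difference:", round(float(np.mean(difference)), 3))

    # Write the difference raster (hypothetical output name used for this sketch).
    profile.update(dtype="float64", count=1)
    with rasterio.open("outputs/susceptibility_difference_sketch.tif", "w", **profile) as dst:
        dst.write(difference, 1)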