Tutorials

1. Data Preparation

Two operations can be performed in PyLandslide for data preparation: (1) factor data co-registration (i.e., re-projection and alignment) and (2) Machine Learning data extraction.

Reclassified factor data are provided to PyLandslide in a raster format (i.e., tif) and should have the same spatial resolution. The co-registration operation ensures that the reclassified factor data are in the same projection system and have the same dimensions. All rasters are clipped and aligned with the raster that has the smallest dimensions.

Machine Learning data extraction involves extracting the values of the factors contributing to the occurrence of landslides at past landslide and non-landslide locations. The inputs to this operation are the co-registered raster layers and two point shapefiles. One of the shapefiles includes the locations of past landslides and the other includes random locations at which no landslides occurred in the past. The output of the Machine Learning data extraction operation is two CSV files, one including the features (i.e., the values of factors) and the other the targets (i.e., the corresponding status as landslide or non-landslide).

1.1. Factor data co-registration

As shown below, to perform the factor data co-registration operation, use the coregister command through the Command Line Interface (CLI), where raster_data is a folder in which the raw input factor raster data are stored. Sample reclassified factor data are available for Italy in the GitHub repository of PyLandslide. It is important to note that factor data need to be reclassified with an integer score ranging from 0 to 100 for each class; PyLandslide DOES NOT currently perform reclassification and score estimation.

PyLandslide coregister -f raster_data

Running the coregister command results in two sub-folders within the raster_data folder: uint8 and alinged_rasters. A version of the raw factor data is converted to uint8 type to reduce size and is saved under the uint8 folder. The alinged_rasters folder includes the final co-registered raster layers to be used in the next steps.
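
For illustration, the sketch below shows the kind of re-projection and alignment that co-registration performs, using the rasterio Python library. It is a simplified example and not PyLandslide's internal implementation; the reference and output file names are hypothetical.

import rasterio
from rasterio.enums import Resampling
from rasterio.warp import reproject

# Illustrative sketch only; not PyLandslide's internal code.
# Re-project one factor raster onto the grid of a hypothetical reference raster.
with rasterio.open("raster_data/reference.tif") as ref, \
     rasterio.open("raster_data/slope_scores.tif") as src:
    profile = ref.profile.copy()  # target grid: CRS, transform, width, height
    profile.update(count=1, dtype=src.dtypes[0])
    with rasterio.open("raster_data/slope_scores_aligned.tif", "w", **profile) as dst:
        reproject(
            source=rasterio.band(src, 1),
            destination=rasterio.band(dst, 1),
            src_transform=src.transform,
            src_crs=src.crs,
            dst_transform=profile["transform"],
            dst_crs=profile["crs"],
            resampling=Resampling.nearest,  # nearest neighbour keeps integer scores intact
        )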

1.2. Machine Learning data extraction

As shown below, to perform the Machine Learning data extraction operation, use the mldata command through the Command Line Interface (CLI), where 1_weight_range_data_preparation.json is a JSON (JavaScript Object Notation) text file containing the information needed to run the mldata command.

PyLandslide mldata -f 1_weight_range_data_preparation.json

An example JSON file for the mldata command is available in the GitHub repository of PyLandslide. The JSON file specifies the output directory for the operation, the landslide and non-landslide shapefiles, and the locations and names of the factor raster files. Below is an example JSON file. Please follow the exact structure and only change input information where needed.

{
"output_directory":"outputs",
"landslide_locations":"shp_data/landslide_locations.shp",
"nonlandslide_locations":"shp_data/nonlandslide_locations.shp",
"factors":[
    {
    "name":"landcover",
    "file":"raster_data/landcover_scores.tif"
    },
    {
    "name":"lithology",
    "file":"raster_data/lithology_scores.tif"
    },
    {
    "name":"rainfall",
    "file":"raster_data/rainfall_historical_chirps_scores.tif"
    },
    {
    "name":"roads",
    "file":"raster_data/roads_scores.tif"
    },
    {
    "name":"slope",
    "file":"raster_data/slope_scores.tif"
    }
]
}
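
Before running mldata, it can help to confirm that every path referenced in the JSON file actually exists. The short, optional snippet below does this in Python; it is not part of PyLandslide and simply mirrors the structure of the example JSON above.

import json
import os

# Optional sanity check (not part of PyLandslide): verify the paths in the JSON file.
with open("1_weight_range_data_preparation.json") as f:
    cfg = json.load(f)

paths = [cfg["landslide_locations"], cfg["nonlandslide_locations"]]
paths += [factor["file"] for factor in cfg["factors"]]
missing = [p for p in paths if not os.path.exists(p)]
print("missing inputs:", missing if missing else "none")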

Running the mldata command results in two CSV files saved in the output directory specified in the JSON file. One file is features.csv and contains the values of the factors at landslide and non-landslide locations. The other file is targets.csv and contains the corresponding status in terms of landslide or non-landslide location (1 or 0, respectively).
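
Conceptually, the extraction samples each co-registered factor raster at the point locations of the two shapefiles and writes the results to the two CSV files. The sketch below, assuming geopandas, rasterio, and pandas, shows one way this could be done manually; it is an illustration of the idea, not PyLandslide's implementation, and it assumes the points use the same coordinate reference system as the rasters.

import geopandas as gpd
import pandas as pd
import rasterio

# Illustrative sketch only; not PyLandslide's internal code.
factors = {
    "slope": "raster_data/slope_scores.tif",
    "lithology": "raster_data/lithology_scores.tif",
}

def sample_points(shp_path, label):
    # Sample each factor raster at the point coordinates and attach the 1/0 target.
    points = gpd.read_file(shp_path)
    coords = [(p.x, p.y) for p in points.geometry]
    table = pd.DataFrame()
    for name, path in factors.items():
        with rasterio.open(path) as src:
            table[name] = [value[0] for value in src.sample(coords)]
    table["target"] = label  # 1 = landslide, 0 = non-landslide
    return table

data = pd.concat(
    [sample_points("shp_data/landslide_locations.shp", 1),
     sample_points("shp_data/nonlandslide_locations.shp", 0)],
    ignore_index=True,
)
data.drop(columns="target").to_csv("outputs/features.csv", index=False)
data[["target"]].to_csv("outputs/targets.csv", index=False)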

2. Weight Range Analysis

PyLandslide uses the feature and target CSV files generated using the mldata command to train many Random Forest Regression Machine Learning models to predict whether a location is a landslide or non-landslide location. Models that show a performance above a threshold (defined based on overall accuracy in testing and training) are used to calculate feature importance, which represents the contribution of each factor in determining the status of a location. By randomly sampling from the feature and target data, the range of factor weights can be derived.
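
The snippet below is a simplified sketch of this idea using scikit-learn; it is not PyLandslide's exact implementation. Each iteration draws a random train/test split, fits a Random Forest regressor on the 0/1 targets, computes an overall accuracy by thresholding the predictions at 0.5, and keeps the feature importances only if the accuracy exceeds the cutoff.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Simplified sketch of the weight-range idea; not PyLandslide's exact implementation.
X = pd.read_csv("csv_data/features.csv")
y = pd.read_csv("csv_data/targets.csv").iloc[:, 0]

kept_importances = []
for i in range(100):  # the example JSON below uses 20000 iterations
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=i)
    model = RandomForestRegressor(n_estimators=15, max_depth=9, random_state=i)
    model.fit(X_train, y_train)
    accuracy = np.mean((model.predict(X_test) >= 0.5).astype(int) == y_test)
    if accuracy >= 0.75:  # performance cutoff
        kept_importances.append(model.feature_importances_)

# The spread of importances across kept models gives the weight range per factor.
weights = pd.DataFrame(kept_importances, columns=X.columns)
print(weights.describe())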

As shown below, to perform Weight Range Analysis, use the weightrange command through the Command Line Interface (CLI), where 2_weight_range_json_file.json is a JSON (JavaScript Object Notation) text file containing the information needed to run the weightrange command.

PyLandslide weightrange -f 2_weight_range_json_file.json

An example JSON file for the weightrange command is available in the GitHub repository of PyLandslide. The JSON file specifies the features CSV file, the targets CSV file, and the output file in which weight ranges are saved. Furthermore, for the random forest model, the JSON file defines the number of trees, the maximum tree depth, the size of the testing sample, the lower performance cutoff for using a trained model, the number of processing cores to be used when training models, and the number of random sampling iterations used to determine the uncertainty ranges. Please follow the exact structure and only change input information where needed.

{
    "features_file":"csv_data/features.csv",
    "targets_file":"csv_data/targets.csv",
    "number_trees":15,
    "max_tree_depth":9,
    "size_testing_sample":0.2,
    "number_of_iterations":20000,
    "cores":6,
    "performance_cutoff":0.75,
    "output_file":"outputs/weight_ranges.csv"
}

Running the weightrange command results in a CSV file saved at the output file location specified in the JSON file. The CSV file includes the different sets of weights resulting from the Machine Learning models that satisfied the overall accuracy cutoff requirement. The output CSV file also includes the overall accuracy value of each satisfactory model iteration.
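
As a quick check, the resulting weight ranges can be summarised with pandas, for example by printing the minimum and maximum weight obtained for each factor. The column layout in the snippet below is assumed; check the actual header of your output file.

import pandas as pd

# Assumed column layout: one column per factor plus an accuracy column.
weight_ranges = pd.read_csv("outputs/weight_ranges.csv")
print(weight_ranges.agg(["min", "max"]))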

3. Sensitivity Analysis

PyLandslide uses the weight range CSV file generated using the weightrange command to perform a sensitivity analysis by randomly sampling from the sets of weights available in the CSV file, performing landslide susceptibility mapping based on each set of weights, and calculating the percentage area under each susceptibility class. The susceptibility classes and the number of sensitivity iterations (or trials) are also provided as inputs.
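
For illustration, the sketch below shows what a single sensitivity trial amounts to, assuming numpy and rasterio: the aligned factor rasters are combined as a weighted sum using one sampled set of weights, and the share of pixels falling in each susceptibility class is computed. This is a simplified illustration of the idea, not PyLandslide's code, and the example weights are hypothetical.

import numpy as np
import rasterio

# Illustrative single trial; not PyLandslide's internal code.
factor_files = {
    "slope": "raster_data/slope_scores.tif",
    "lithology": "raster_data/lithology_scores.tif",
}
weights = {"slope": 0.6, "lithology": 0.4}  # hypothetical row sampled from weight_ranges.csv

susceptibility = None
for name, path in factor_files.items():
    with rasterio.open(path) as src:
        band = src.read(1).astype("float32")
    weighted = band * weights[name]
    susceptibility = weighted if susceptibility is None else susceptibility + weighted

classes = {
    "very low": (0.0001, 19.9999),
    "low": (20, 39.9999),
    "moderate": (40, 59.9999),
    "high": (60, 79.9999),
    "extremely high": (80, 100),
}
valid = susceptibility > 0  # ignore zero/no-data pixels
for class_name, (lower, upper) in classes.items():
    share = np.mean((susceptibility[valid] >= lower) & (susceptibility[valid] <= upper))
    print(class_name, round(float(share), 3))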

As shown below, to perform Sensitivity Analysis, use the sensitivity command through the Command Line Interface (CLI), where 3_sensitivity_json_file_historical_rainfall.json is a JSON (JavaScript Object Notation) text file containing the information needed to run the sensitivity command, and the number following -t is an integer specifying the number of sensitivity iterations (or trials).

PyLandslide sensitivity -f 3_sensitivity_json_file_historical_rainfall.json -t 5

An example JSON file for the sensitivity command is available in the GitHub repository of PyLandslide. The JSON file specifies the output directory for the operation, the weight range CSV file generated using the weightrange command, the factors and their locations, and the upper and lower bounds of the landslide susceptibility classes. Please follow the exact structure and only change input information where needed.

{
"output_directory":"outputs",
"weight_csv_sensitivity_file":"outputs/weight_ranges.csv",
"factors":[
    {
    "name":"landcover",
    "file":"raster_data/landcover_scores.tif"
    },
    {
    "name":"lithology",
    "file":"raster_data/lithology_scores.tif"
    },
    {
    "name":"rainfall",
    "file":"raster_data/rainfall_historical_chirps_scores.tif"
    },
    {
    "name":"roads",
    "file":"raster_data/roads_scores.tif"
    },
    {
    "name":"slope",
    "file":"raster_data/slope_scores.tif"
    }
],
"susceptibility_classes":[
    {
    "name":"very low",
    "class_lower_bound":0.0001,
    "class_upper_bound":19.9999
    },
    {
    "name":"low",
    "class_lower_bound":20,
    "class_upper_bound":39.9999
    },
    {
    "name":"moderate",
    "class_lower_bound":40,
    "class_upper_bound":59.9999
    },
    {
    "name":"high",
    "class_lower_bound":60,
    "class_upper_bound":79.9999
    },
    {
    "name":"extremly high",
    "class_lower_bound":80,
    "class_upper_bound":1009
    }
]
}

Running the sensitivity command results in a CSV file named sensitivity_results.csv saved in the output directory specified in the JSON file. The CSV file includes, for each sensitivity trial, the weights used and the percentage area under each landslide susceptibility class.
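
The spread of class areas across trials can then be inspected with pandas, for example using describe(); the column names in the snippet below are assumed to match the susceptibility class names defined in the JSON file.

import pandas as pd

results = pd.read_csv("outputs/sensitivity_results.csv")
print(results.describe())  # min/max/mean percentage area per class across trials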

4. Generating a susceptibility raster layer

After performing a sensitivity analysis, one might want to generate a susceptibility raster layer for a specific sensitivity trial. PyLandslide enables doing this through the generate command.

As shown below, the generate command is used through the Command Line Interface (CLI), where 3_sensitivity_json_file_historical_rainfall.json is the same JSON (JavaScript Object Notation) text file used to run the sensitivity command, sensitivity_results.csv is the output CSV file of the sensitivity analysis, and the number following -i is an integer specifying the index of the sensitivity trial for which a raster layer is to be generated. The sensitivity_results.csv must be under a folder named outputs for PyLandslide to be able to find it.

PyLandslide generate -f 3_sensitivity_json_file_historical_rainfall.json -c sensitivity_results.csv -i 0

The result of this command is a raster layer saved in the output directory, with the index of the sensitivity trial used as a name suffix.
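
The generated layer can be inspected in any GIS software, or quickly plotted in Python as in the sketch below; the file name used here is an assumed example following the trial-index suffix convention, so use the actual name written to your output directory.

import matplotlib.pyplot as plt
import rasterio

# Quick visual check; the file name below is an assumed example.
with rasterio.open("outputs/susceptibility_0.tif") as src:
    plt.imshow(src.read(1))
    plt.colorbar(label="susceptibility score")
    plt.show()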

5. Comparing two susceptibility raster layers

One might need to compare two landslide susceptibility raster layers. This can be performed through the compare command of PyLandslide.

As shown below, the compare command is used through the Command Line Interface (CLI), where 3_sensitivity_json_file_historical_rainfall.json is the same JSON (JavaScript Object Notation) text file used to run the sensitivity command, layer1.tif is the first raster layer and layer2.tif is the second raster layer to be compared. Both layer1.tif and layer2.tif must be under a folder named outputs for PyLandslide to be able to find them.

PyLandslide compare -f 3_sensitivity_json_file_historical_rainfall.json -l1 layer1.tif -l2 layer2.tif

Running compare results in comparison statistics printed on the terminal (see the example below) showing the percentage area under each susceptibility class for each layer. Additionally, a raster layer named susceptibility_difference.tif is generated in the output directory showing layer1.tif minus layer2.tif.

layer1--------------------
very low : 0.219
low : 0.319
moderate : 0.17
high : 0.215
extremely high : 0.076

layer2--------------------
very low : 0.179
low : 0.325
moderate : 0.172
high : 0.228
extremely high : 0.096
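
For reference, the difference layer can also be reproduced manually, as in the sketch below, by subtracting the two susceptibility rasters pixel by pixel with rasterio. This is an illustration rather than the library's own code, it assumes both layers share the same grid, and the output file name is hypothetical.

import rasterio

# Illustrative manual difference; not PyLandslide's internal code.
with rasterio.open("outputs/layer1.tif") as a, rasterio.open("outputs/layer2.tif") as b:
    difference = a.read(1).astype("float32") - b.read(1).astype("float32")
    profile = a.profile.copy()
    profile.update(dtype="float32", count=1)
    with rasterio.open("outputs/layer_difference_manual.tif", "w", **profile) as out:
        out.write(difference, 1)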