Using the U-net convolutional network to map forest types and disturbance in the Atlantic rainforest with very high resolution images

Mapping forest types and tree species at regional scales to provide information for ecologists and forest managers is a new challenge for the remote sensing community. Here, we assess the potential of a U-net convolutional network, a recent deep learning algorithm, to identify and segment (1) natural forests and eucalyptus plantations, and (2) an indicator of forest disturbance, the tree species Cecropia hololeuca, in very high resolution images (0.3 m) from the WorldView-3 satellite in the Brazilian Atlantic rainforest region. The networks for forest types and Cecropia trees were trained with 7611 and 1568 red-green-blue (RGB) images, respectively, and their dense labeled masks. Eighty per cent of the images were used for training and 20% for validation. The U-net network segmented forest types with an overall accuracy > 95% and an intersection over union (IoU) of 0.96. For C. hololeuca, the overall accuracy was 97% and the IoU was 0.86. The predictions were produced over a 1600 km² region using WorldView-3 RGB bands pan-sharpened at 0.3 m. Natural and eucalyptus forests compose 79 and 21% of the region's total forest cover (82 250 ha). Cecropia crowns covered 1% of the natural forest canopy. An index to describe the level of disturbance of the natural forest fragments based on the spatial distribution of Cecropia trees was developed. Our work demonstrates how a deep learning algorithm can support applications such as vegetation, tree species distributions and disturbance mapping on a regional scale.


Introduction
Brazil is among the most biodiverse nations in the world, containing an estimated 20% of the Earth's biodiversity (SSC, 2012). Among Brazilian biomes, the Atlantic rainforest is a global priority for biodiversity conservation due to the abundance of flora and fauna species that are found there (Laurance 2009; Joly et al. 2010; SSC, 2012). This biome has been subjected to major changes, with a reduction to an estimated 12.5% of its original forest cover (INPE, 2013). The remaining Atlantic forest is extremely degraded: over 80% of the fragments are <50 ha, almost half of the forest is <100 m from its edges, the average distance between fragments is large (1440 m), and nature reserves protect only 9% of the remaining forest and 1% of the original forest (Ribeiro et al. 2009). Aside from deforestation, other human-induced processes such as logging and fire cause degradation and thus further loss of ecosystem services provided by the forest.
On the other hand, large areas of this biome are currently recovering from past deforestation, as seen by an increase in tree cover since the year 2000 (Hansen et al. 2013). This increase is mainly driven by the plantation of eucalyptus forests but it also includes a significant proportion of natural regeneration (Silva et al. 2018). These secondary forests cover an estimated 4.7% of the area of the original biome for the Atlantic forest (FAO, 2010). This natural regeneration of abandoned pasturelands can improve the provision of ecosystem services and habitat availability (Strassburg et al. 2016). However, little is currently known about how these forests recover during secondary succession, which adds uncertainty to the estimation of the ecosystem services they provide (World Resources Institute, 2005; Diaz et al. 2006).
This leads to one of the major current challenges for conservation, which is to obtain reliable and accurate information on a large scale to monitor biodiversity, resources and ecosystem services, as well as the human impact on natural ecosystems. Remote sensing is considered key to this effort, mainly because the spatial and temporal resolution of the datasets now enables tracking elements of biodiversity, and also because of the increase in available data and computational capacity (Pettorelli et al. 2014; Turner 2014; He et al. 2015; Kwok 2018). The species occurrence measurement by satellite is among the 10 proposed biodiversity metrics to monitor the progress toward the Aichi Biodiversity Targets (Skidmore et al. 2015) as well as a recommended action to reach millennium goal 7: 'Ensure environmental sustainability' (United Nations, 2005).
Currently, significant efforts are being made to map forest cover and the changes therein, mainly based on Landsat data, with a spatial resolution of 30 m and a 1-year temporal resolution, such as Global Forest Change map (Hansen et al. 2013) and project MapBiomas specifically for Brazil (MapBiomas, 2018). Such resolution has revealed that fragmentation is increasing in all tropical forests (Taubert et al. 2018). However, it is still too coarse to retrieve information regarding species or the distribution of individual trees that can inform on the successional stage, diversity or disturbance levels of these ecosystems, which play key roles in maintaining environmental processes such as the water cycle, soil conservation, carbon sequestration and habitat protection (FAO, 2016).
To estimate disturbances in neotropical forests from satellite images, tree species of the genus Cecropia are good candidates. Cecropia is a widespread and abundant genus in the Neotropics (Franco-Rosselli and Berg 1997; Zalamea et al. 2012). Its abundance is related to disturbance intensity and has been recently shown to be a reliable indicator of forest biomass (Guitet et al. 2018). Cecropia trees have also been used to accurately date disturbances in secondary forests, as the age of the tree can be estimated with simple measurement of the tree node number (Zalamea et al. 2012). The trees of the species Cecropia hololeuca also present remarkable characteristics to support remote sensing methods of disturbance estimation in the Atlantic forest biome; these trees are abundant and can be easily visually detected in very high resolution images due to the morphological and spectral characteristics of their leaves, which are large and bright gray.
The last years have seen the revolution of deep learning for image classification, which began with the introduction of AlexNet in 2012 (Krizhevsky et al. 2012), and deep learning computer vision algorithms have a growing role in the remote sensing field. The main advantage is that these supervised deep convolutional networks take raw data and automatically learn features through training, with minimal prior knowledge about the task (LeCun et al. 1998). For example, in the case of image segmentation, prior information is only given by labeled masks of the objects to recognize in the training images. These deep learning algorithms are used for land cover classification (Arief et al. 2018; Sun et al. 2018), scene classification (Maggiori et al. 2017; Wang et al. 2017; Liu et al. 2018) and object extraction (Xu et al. 2018). One particular type of network used for object extraction, the U-net network, is highly promising as it has been shown to outperform all traditional classification methods (Ronneberger et al. 2015; Huang et al. 2018). The architecture of this network consists of a contracting path to capture context and a symmetric expanding path that enables precise localization (Ronneberger et al. 2015). In remote sensing for ecology, applications of deep learning methods are only beginning. They are few but successful; for example, oil palm tree detection and counting (Li et al. 2017), as well as recognition of tree type (deciduous/evergreen) or species (Onishi and Ise 2018).
The following are the main objectives of this study. First, to assess the capacity of deep learning convolutional networks known as U-net (Ronneberger et al. 2015) to identify and segment (1) forest types (natural vs. plantations) and (2) Cecropia hololeuca, a tree species indicator of forest degradation, in a ~1600 km² region of fragmented Atlantic forest near São Paulo, Brazil. Second, to measure the disturbance within the natural forest fragments with a new disturbance metric based on the individual tree distribution of C. hololeuca. To our knowledge, this is the first time that all adult individuals of a natural tree species are identified and segmented at a regional scale with very high resolution multispectral images.

Study site
This study was undertaken in a region of the Atlantic Forest biome located in São Paulo State, Brazil, and centered at 23°11′43″S and 45°21′50″W, as shown in Figure 1A. This area was selected for the study because it contains several remnants of the Atlantic Forest biome as well as secondary forests at different stages of regeneration and planted eucalyptus forests (Figure 1B). Most of the forest plots from the BIOTA project (Joly et al. 2010) are also included in this region to study the effect of fragmentation of the biome.

WorldView-3 images and preprocessing
The three WorldView-3 images (DigitalGlobe, Inc., Westminster, CO, USA) over the region were acquired on 13 August 2017, at an average off-nadir view angle of 20.1°. The DigitalGlobe catalog IDs of the images were A01001032124B200, A01001032124CE00 and A01001032124D900. These three images were distributed in tiles of 16 384 × 16 384 pixels, which represents 40 tiles for each image. Only 90 tiles without significant cloud cover were retained for the analysis (Figure 1B). The spatial resolution of the bands was 0.3 m for the panchromatic band (464-801 nm) and 1.2 m for the selected multispectral bands: Red (629-689 nm), Green (511-581 nm) and Blue (447-508 nm). All bands were scaled from raw image digital numbers (11 bits) to 0-255 (8 bits). The red-green-blue (RGB) bands were pan-sharpened with the panchromatic band using the method Simple RCS of the Orfeo toolbox addon otbcli_BundleToPerfectSensor (Grizonnet et al. 2017) to create a single high resolution RGB image at a spatial resolution of 0.3 m. We used only RGB bands as the targets were already identifiable with a high confidence in the pan-sharpened RGB image and also for the parsimony of the model. No atmospheric correction was performed. Some thin transparent clouds were still present in the images, and a cloud mask was created manually to remove these pixels from the analysis.
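As an illustration of the bit-depth scaling step, the sketch below maps 11-bit digital numbers to 8 bits. The text does not state which scaling function was used, so the linear full-range mapping and the function name `rescale_11bit_to_8bit` are assumptions made only for this example.

```python
def rescale_11bit_to_8bit(dn):
    """Linearly map an 11-bit digital number (0..2047) to 8 bits (0..255).

    Assumption: a plain linear stretch over the full 11-bit range; the
    paper does not specify the scaling that was actually applied.
    """
    if not 0 <= dn <= 2047:
        raise ValueError("expected an 11-bit value in 0..2047")
    return round(dn * 255 / 2047)


# A few sample digital numbers from a hypothetical band.
samples = [0, 512, 1024, 2047]
scaled = [rescale_11bit_to_8bit(v) for v in samples]
```

In practice a percentile-based stretch is often preferred for display, but whether one was used here cannot be inferred from the text.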

Field data
Natural/planted forests mask
All occurrences of natural forests and eucalyptus plantations were manually delineated in two tiles (16 384 × 16 384 pixels, 4.9152 km per side) of the WorldView-3 images. The natural forest and eucalyptus plantations are easily identifiable in the images, as shown in Figure 2. At this step, 550.3 and 2711.9 ha were delineated as planted and natural forests, respectively. Then, a raster mask coded in RGB was produced with the following values: background [0,0,0], natural forest [254,254,254] and eucalyptus plantation [127,127,127]. Cropping the tiles into images of 256 × 256 pixels resulted in 7611 available images for training, as one of the tiles has a large band of no data on the border, thereby making it impossible to use the entire tile.
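The cropping arithmetic can be checked with a short sketch. It only enumerates a uniform, non-overlapping 256 × 256 grid on a 16 384 × 16 384 tile and records the mask coding given above; the names `patch_origins` and `MASK_CLASSES` are hypothetical and not taken from the authors' code.

```python
TILE_SIZE = 16_384   # tile side in pixels
PATCH_SIZE = 256     # training image side in pixels

# Mask coding described in the text (one value per RGB channel).
MASK_CLASSES = {0: "background", 127: "eucalyptus plantation", 254: "natural forest"}


def patch_origins(tile_size, patch_size):
    """Top-left (row, col) corners of a uniform, non-overlapping patch grid."""
    steps = range(0, tile_size - patch_size + 1, patch_size)
    return [(r, c) for r in steps for c in steps]


origins = patch_origins(TILE_SIZE, PATCH_SIZE)
patches_per_tile = len(origins)          # 64 * 64 patches
patches_two_tiles = 2 * patches_per_tile
```

Two complete tiles would give 8192 patches, so the 7611 images reported are consistent with part of one tile being discarded because of the no-data band.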

Cecropia trees mask
All occurrences of C. hololeuca trees were manually delineated in one tile (16 384 × 16 384 pixels) of the WorldView-3 image. This species is identifiable in the images due to its bright gray leaves, as shown in Figure 3. The delineated sample comprises 2228 polygons for Cecropia trees, where each polygon can represent more than one individual tree. With the delineated polygons, a raster mask coded in RGB was produced with the following values: background [0,0,0] and Cecropia sp. [254,254,254]. Cropping the mask into images of 128 × 128 pixels resulted in 1568 available images for training (images that contained at least one Cecropia tree).

Forest and land cover data
To test if our forest-type classification was consistent with independent datasets of land cover/use maps of the region, we used the map of the MapBiomas project (MapBiomas, 2018), which generates land cover and land use maps using automatic classification processes applied to satellite images. A complete description of the project can be found at http://mapbiomas.org. The MapBiomas map was produced based on the Landsat Data Archive available in the Google Earth Engine platform, encompassing the years from 2000 to the present. The MapBiomas map results from a pixel-based classification based on random forest machine learning to overcome empirical calibration of the input parameters for image classification; details of the methods are given in the manual of the product. Overall accuracy of MapBiomas level 2 product for the Atlantic forest biome is 84.1%, the fraction of the error attributed to the amount of area allocated incorrectly to the classes by the mapping (Allocation Disagreement) is 4.7% and the mismatch allocation to the ratio of class-displacement errors (Area Disagreement) is 11.2% (MapBiomas, 2018).

Architecture
In this study, we used a convolutional network for multiclass image segmentation known as U-net (Ronneberger et al. 2015). This network performs a per-pixel classification, predicting the probability that each pixel belongs to a particular class. The U-net model has recently become a new standard in dense image labeling (Huang et al. 2018). We adapted the U-net architecture from Ronneberger et al. (2015) with half as many filters, since our training set is limited and a smaller number of filters helps in preventing overfitting. Furthermore, we used a three-band RGB image as the input, and adapted the network architecture accordingly (Figure 4). Sigmoid activation functions were used to ensure that output pixel values range between 0 and 1. For the training, we used an input size of 256 × 256 pixels for forest-type segmentation and 128 × 128 pixels for the Cecropia trees segmentation.

Network training
The training samples comprised 1568 images of 128 × 128 pixels for Cecropia trees and 7611 images of 256 × 256 pixels for forest types. The size of 128 × 128 pixels was selected because Cecropia crowns are smaller than 128 pixels in diameter (128 pixels = 38.4 m).
Figure 4. U-net architecture for the forest types segmentation, adapted from Ronneberger et al. (2015). The number of channels is indicated above the cuboids and the vertical numbers indicate the row and column size in pixels.
An image size of 256 × 256 pixels was used for natural forest/plantations to include better contextual information, as the crowns of some large trees can entirely cover an image of 128 × 128 pixels, and also because in plantations, trees are generally planted in lines and this information is more visible with this image size. Each image contained at least one object. The images were extracted from uniform grids of 128 × 128 and 256 × 256 pixels, without any overlap between neighboring images. Eighty per cent of these images were used for training and 20% for validation. During network training, we used a standard stochastic gradient descent optimization. The loss function was designed as a sum of two terms: mutual cross-entropy and Dice coefficient-related loss (Dice 1945; Chollet et al. 2015; Allaire and Chollet 2016). We used the optimizer RMSprop (unpublished, adaptive learning rate method proposed by Geoff Hinton, http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) with an initial learning rate of 1e-4. We trained our network for 100 epochs, where each epoch comprised 78 batches with 16 images per batch. The optimization was stopped when the loss function improvement did not exceed 1e-4. Data augmentation was applied randomly to the input images, including 0/90/180/270° rotations and changes in brightness, saturation and hue, made by converting RGB to brightness-saturation-hue space and modulating the current values by between 95 and 110% for brightness, 95-105% for saturation and 99-101% for hue (as changes in the plant hues are not expected).
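To make the two-term loss concrete, here is a minimal sketch on flattened pixel lists. The text does not give the exact formulation, so the choices below (per-pixel binary cross-entropy, a Dice term written as 1 minus the Dice coefficient, and a smoothing constant of 1) are assumptions, and the function names are hypothetical rather than the authors' Keras code.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient; `smooth` (assumed value) avoids division by zero."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def combined_loss(y_true, y_pred):
    """Sum of the cross-entropy term and a Dice-related term (1 - Dice)."""
    return binary_cross_entropy(y_true, y_pred) + (1.0 - dice_coefficient(y_true, y_pred))


# Toy mask with two object pixels and two background pixels.
mask = [1, 1, 0, 0]
close_prediction = [0.9, 0.8, 0.1, 0.2]
poor_prediction = [0.2, 0.1, 0.9, 0.8]
loss_close = combined_loss(mask, close_prediction)
loss_poor = combined_loss(mask, poor_prediction)
```

The cross-entropy term penalizes each pixel independently, while the Dice term rewards overlap of the whole object, which is one common motivation for combining them when objects occupy a small share of the image.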

Segmentation accuracy assessment
Three performance metrics were computed. First, the overall accuracy was computed as the percentage of correctly classified pixels. Second, the intersection over union (IoU) of the object class was computed as the number of pixels labeled as object in both the prediction and the reference, divided by the number of pixels labeled as object in the prediction or in the reference. Third, the F1 score was computed for each class i as the harmonic average of the precision and recall, F1 = 2 × precision × recall / (precision + recall) (eqn 1), where precision was the ratio of the number of segments classified correctly as i to the number of all segments classified as i (true and false positives) and recall was the ratio of the number of segments classified correctly as i to the total number of segments belonging to class i (true positives and false negatives). This score varies between 0 (lowest value) and 1 (best value).
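The three metrics can be sketched from a confusion count on binary masks. This example works per pixel on flattened 0/1 lists, which is an assumption for illustration (the text describes precision and recall in terms of segments), and the function names are hypothetical.

```python
def confusion_counts(reference, prediction):
    """Count true/false positives and negatives for a binary object class."""
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, prediction):
        if ref and pred:
            tp += 1
        elif pred:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Return 0.0 instead of failing when a class is absent."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(reference, prediction):
    """Overall accuracy, IoU and F1 score of the object class."""
    tp, fp, fn, tn = confusion_counts(reference, prediction)
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + fn + tn),
        "iou": safe_ratio(tp, tp + fp + fn),  # intersection / union
        "f1": safe_ratio(2 * precision * recall, precision + recall),
    }


# Toy example: 8 pixels, 3 labeled as object in the reference.
reference = [1, 1, 1, 0, 0, 0, 0, 0]
prediction = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = segmentation_metrics(reference, prediction)
```

In this toy case two pixels agree as object, one is a false positive and one a false negative, so the union contains four pixels and the IoU is 0.5 while the accuracy is 0.75.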

Prediction
For prediction, each WorldView-3 tile of 16 384 × 16 384 pixels was cropped on a regular grid of 512 × 512 pixels and 64 neighbor pixels were added on each side to create an overlap between the patches. If there was a remaining blank portion (for example, due to the tile border), it was filled by the symmetrical image of the non-blank portion. The predictions of both models (forest-type and Cecropia tree segmentations) were made on these images of 640 × 640 pixels, and the resulting images were cropped to 512 × 512 pixels and merged to reconstitute the 16 384 × 16 384 pixel WV-3 tile. This overlapping method was used to avoid the artifact of prediction on the border, a known problem for the U-net algorithm (Ronneberger et al. 2015). To belong to a given class, the pixel prediction value must be above or equal to 0.5 for the given class.
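The overlap-and-crop scheme can be sketched in one dimension; the same logic applies to rows and columns of a tile. The mirroring convention (reflection that does not repeat the border pixel) and the names `reflect_index`, `padded_window` and `predict_with_overlap` are assumptions for illustration, and `model` stands for any function that returns one prediction per input pixel.

```python
TILE = 16_384
CORE = 512     # size kept from each prediction
MARGIN = 64    # neighbor pixels added on each side

PATCHES_PER_TILE = (TILE // CORE) ** 2   # regular grid of core windows
WINDOW = CORE + 2 * MARGIN               # size actually fed to the network


def reflect_index(i, n):
    """Map an out-of-range index onto 0..n-1 by mirroring at the borders (n >= 2)."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def padded_window(row, start, core, margin):
    """Extract core + 2 * margin pixels, mirroring where the tile border is reached."""
    n = len(row)
    return [row[reflect_index(i, n)] for i in range(start - margin, start + core + margin)]


def predict_with_overlap(row, core, margin, model):
    """Predict on overlapping windows, then keep only the central core of each."""
    merged = []
    for start in range(0, len(row), core):
        prediction = model(padded_window(row, start, core, margin))
        merged.extend(prediction[margin:margin + core])
    return merged[:len(row)]


# Toy run with an identity "model": the merged output should equal the input.
toy_row = list(range(16))
toy_output = predict_with_overlap(toy_row, core=4, margin=2, model=lambda w: list(w))
```

Discarding the margin means every retained pixel was predicted with at least 64 pixels of real or mirrored context on each side, which is the point of the overlap.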

Algorithm
The model was coded in the R language (R Core Team, 2016) with the RStudio interface to Keras (Chollet et al. 2015; Allaire and Chollet 2016) and the TensorFlow backend (Abadi et al. 2015). R code is available upon request. The training of the models took ~2-20 h using a Graphics Processing Unit (GPU), an Nvidia Quadro K6000 with 12 GB of dedicated memory. Prediction using the GPU for a single tile of 16 384 × 16 384 pixels (4.9152 km per side) took approximately 35 min.

Index of disturbance
Here, we define disturbance inside natural forest fragments as the presence of Cecropia trees, which are an indicator of an important past disturbance (Zalamea et al. 2012; Guitet et al. 2018). A pixel inside a fragment is considered as disturbed if it is closer to a Cecropia tree than to the edge of the fragment. To measure the disturbance of the fragments, we developed an index, D_disturbance,f, as shown in eqn 2:

$$D_{disturbance,f} = 1 - \frac{\sum_{i=1}^{nbpixel_f} \min\left(d_{edge,i},\, d_{Cecropia,i},\, D_{max}\right)}{\sum_{i=1}^{nbpixel_f} \min\left(d_{edge,i},\, D_{max}\right)} \quad (2)$$

where f indicates an index of the fragment, nbpixel_f denotes the number of pixels of the fragment f, i denotes an index of the pixel, d_edge,i and d_Cecropia,i denote the distances from pixel i to the fragment edge and to the nearest Cecropia tree, and D_max is the maximum distance of the disturbance to be considered. For a fragment f, D_disturbance,f is computed from the sum of the minimum distance of each pixel to the fragment edge or to a Cecropia tree divided by the sum of the minimum distance of each pixel to the fragment edge. This ratio already ranges between 0 and 1; however, it was subtracted from 1 so that D_disturbance,f equals 0 if there is no disturbance (no pixel is closer to a Cecropia than to the border, no Cecropia in the fragment) and 1 if the fragment is totally disturbed (only Cecropia trees in the fragment). As some Cecropia trees occur naturally within large fragments, mainly after the fall of large trees, we set a threshold D_max to not artificially increase D_disturbance,f in the case of natural disturbance. A distance to the edge or to Cecropia trees above D_max was set to D_max. This is an estimation of the natural distance to disturbance. Here, we used a D_max of 200 m; that is, a pixel at a distance >200 m from the edge or from a Cecropia tree is considered undisturbed. The index D_disturbance,f is designed to account for the spatial distribution of the Cecropia trees (Figure 5).
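The index can be illustrated on a toy raster. This is a brute-force sketch of the verbal definition above, not the authors' implementation: the function name, the use of Euclidean distances between pixel centers, and the treatment of everything outside the raster as non-forest are assumptions made for the example.

```python
import math


def disturbance_index(fragment, cecropia, d_max, pixel_size=1.0):
    """Disturbance index of one fragment from two 0/1 rasters of equal shape.

    fragment: 1 where the pixel belongs to the forest fragment.
    cecropia: 1 where the pixel is segmented as Cecropia.
    d_max, pixel_size: cap on distances and pixel side, in the same unit.
    """
    rows, cols = len(fragment), len(fragment[0])
    inside = [(r, c) for r in range(rows) for c in range(cols) if fragment[r][c]]
    if not inside:
        raise ValueError("fragment mask is empty")
    # Assumption: non-fragment pixels, plus a one-pixel ring around the raster,
    # mark the outside of the fragment when measuring distance to the edge.
    outside = [(r, c) for r in range(-1, rows + 1) for c in range(-1, cols + 1)
               if not (0 <= r < rows and 0 <= c < cols) or not fragment[r][c]]
    trees = [p for p in inside if cecropia[p[0]][p[1]]]

    def capped_distance(pixel, targets):
        """Distance to the nearest target, capped at d_max (d_max if no target)."""
        if not targets:
            return d_max
        return min(d_max, pixel_size * min(math.dist(pixel, q) for q in targets))

    numerator = denominator = 0.0
    for pixel in inside:
        d_edge = capped_distance(pixel, outside)
        d_tree = capped_distance(pixel, trees)
        numerator += min(d_edge, d_tree)   # distance to edge or Cecropia
        denominator += d_edge              # distance to edge only
    return 1.0 - numerator / denominator


# Toy 5 x 5 fragment with a single Cecropia pixel in its center.
toy_fragment = [[1] * 5 for _ in range(5)]
toy_cecropia = [[0] * 5 for _ in range(5)]
toy_cecropia[2][2] = 1
toy_index = disturbance_index(toy_fragment, toy_cecropia, d_max=200.0, pixel_size=0.3)
```

For real fragments with millions of pixels, a distance transform would replace the brute-force nearest-neighbor search, but the ratio being computed is the same.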

Model convergence details and accuracies
Time for convergence was ~20 h for the forest-type model and ~2 h for the Cecropia trees model. The best models were obtained after 25 epochs with eight images per batch for the forest-type model and 28 epochs with 16 images per batch for the Cecropia trees model (Table 1).
For the forest-type segmentation, the overall accuracy and Dice coefficient were 95.40% and 0.96, respectively, as shown in Table 1. The overall accuracy of the Cecropia trees segmentation was 97.09% with a Dice coefficient of 0.86.
The natural forest class showed the best F1-score followed by the eucalyptus plantation class (Table 2). Recall was higher in natural forests than in plantations, thereby indicating a lower rate of false negative. Precision was higher in plantations than in natural forests, thereby indicating a lower rate of false positive. The F1-score of the Cecropia trees segmentation was lower than for forest types with a value of 0.80. For this class, F1-score, precision and recall values were similar (Table 2).
Some labeling errors were identified (Figure 6). For the natural forest, the main errors appeared when the tree cover had a homogeneous spectral response and contained less shade due to a highly closed canopy (Figure 6A and B). For the eucalyptus plantation, the main errors appeared when the plantation structure was not visible and the trees appeared to be randomly located (Figure 6C). In Figure 6D, in the bottom left portion of the image, the plantation lines were visible; on the other hand, where the model made an error (top right portion), the plantation structure has disappeared.
An example of Cecropia trees segmentation result with F1-score and manual segmentation is presented in Figure 7. The crowns of Cecropia trees are small, mainly <20 m. The border of the Cecropia is not sharp in the image and there is a variation between the manual and automatic segmentation (Figure 7). Some Cecropia trees were missed during the production of the training sample (Figure 7B). Even though this segmentation can be considered good by visual interpretation (Figure 7C), low accuracy can result from inaccurate manual delineation and a small size of the object. In Figure 7D, an error of segmentation is due to the artifact (blue color) in one of the WV-3 images.

Regional results
We found that the region had 44.80% of forest cover in 2017 (82 248.52 ha), consisting of 78.68% of natural forest and 21.31% of eucalyptus plantations (Figure 8). In comparison, MapBiomas mapped only 36.94% of forest cover in 2017, which comprised 72.90% of natural forest and 27.10% of eucalyptus plantation. Furthermore, 0.95% of the natural forests are covered by Cecropia trees, which represents a total cover of 612.48 ha.
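The reported shares can be cross-checked with simple arithmetic. The sketch below only recombines the figures quoted above; the variable names are hypothetical.

```python
# Figures reported in the text.
forest_cover_ha = 82_248.52      # total forest cover in 2017
natural_share = 0.7868           # fraction of forest cover that is natural forest
cecropia_cover_ha = 612.48       # total Cecropia crown cover

# Derived quantities.
natural_forest_ha = forest_cover_ha * natural_share
cecropia_percent = 100 * cecropia_cover_ha / natural_forest_ha
```

The natural forest area obtained this way is about 64 713 ha, and the Cecropia cover is about 0.95% of it, in agreement with the values given in the text.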
Based on our estimation of forest types, we found that the gain of tree cover of the region between 2000 and 2016 from the Global Forest Change map mainly consisted of eucalyptus plantation (83.31%) and only 16.69% was natural forest regeneration. Similarly, with the estimation using MapBiomas, the gain of forest cover consisted of 87.64% of eucalyptus plantations and 12.36% of natural regeneration.
The mean tree cover in 2000 from the Global Forest Change map was 71.04% for all forest types, 46.09% for plantations and 76.71% for natural forests. With the MapBiomas forest type classes, the mean tree cover was 77.64% for all forest types, 50.49% for plantation and 86.02% for natural forest.

Forest fragmentation and degradation
We found 2791 fragments of natural forest with an area larger than 1 ha, which represents 93% of all the segmented natural forest (Figure 9). The median size of the fragment was 3.07 ha, the 90th percentile was 22.45 ha and the largest fragment had an area of 15 107.80 ha. This largest fragment is located in the south-east and encompasses the Serra do Mar State Park, which is currently the largest remaining fragment of the Atlantic Forest.
We found that 0.95% of the natural forest fragments were covered by Cecropia trees (612.48 ha). However, the distribution of Cecropia had a high variability between the fragments, as shown by the value of D_disturbance (Figure 9). Fragments with a high disturbance present a D_disturbance close to 1. While the median and mean values of D_disturbance are low, 0.09 and 0.17, respectively, D_disturbance tends to increase with fragment size (Figure 10). The majority of the less disturbed fragments have an area below 10 ha. On the other hand, most of the large fragments have a D_disturbance above 0.2.
Two fragments with contrasting values of D_disturbance are presented in Figure 11. In the fragment shown in Figure 11A, 118 Cecropia trees were identified by the U-net algorithm, thereby indicating important past disturbance. The fragment shown in Figure 11B contains only two small Cecropia trees.

Mapping of forest types
In the studied region, the U-net network identified natural and eucalyptus forests with an overall accuracy above 95%. In comparison, overall accuracies of MapBiomas were 88.45% for natural forests and 77.7% for forest plantations. In our study, the high performance of the U-net model for this segmentation could be explained by the spectral values and textural information of the forest types, as shown in Figure 2, which are identifiable/separable by a human eye on an RGB image. As the region contains a lot of eucalyptus plantations, the training sample contains eucalyptus plantations at different stand ages. For eucalyptus plantations, one of the important features is the planting line. When some eucalyptus trees die or when the plantation is old and the planting lines are no longer visually evident, the model tends to make some errors (Figure 6). For the natural forests, one of the important features for the U-net appears to be the presence of shade, and the model occasionally predicts very large homogeneous crowns without shade as background, likely mistaking them for pasture (Figure 6). At the spatial resolution of 0.3 m, we found ~8 percentage points more forest cover than the forest mapping at 30 m of 2017 from MapBiomas (44.80% vs. 36.94%, respectively). This is expected, as the pixels must have a high tree cover to be classified as forest, and MapBiomas is not able to delineate the border of the forest accurately, due to its 30-m spatial resolution. This was also confirmed by the higher tree cover values for the forested pixels of MapBiomas. It must be noted that the obtained results could be further improved with a larger training set. We acknowledge that the U-net model is not necessarily the best method to produce vegetation/land cover mapping larger than the regional scale. Recent work using mainly Landsat images and non-deep learning methods (Hansen et al. 2013; MapBiomas, 2018) or other deep learning methods (Jia et al. 2017; Kussul et al. 2017; Lyu et al. 2018) also shows high performance and can cover larger areas. However, in the case of a regional study using one or two very high resolution WorldView images and with the objective of mapping objects inside a particular vegetation class, such as in this study, we recommend producing the vegetation class model with U-net. First, because U-net has a demonstrated high performance for very high resolution image segmentation [this study and Huang et al. (2018)]; second, because it enables production of a vegetation mask at very high resolution; and finally, because U-net is relatively easy and convenient to use.
Figure 10. Forest degradation index D_disturbance variation with the fragment area. The tendency curve in red was made with a cubic smooth spline. Fragments with a high disturbance present a D_disturbance close to 1 and when there is no disturbance, close to 0.

Implications for land use analysis
With a WorldView-3 image and a few days of work to prepare the training images, train the model and make predictions, reliable land-use information can be produced by a single person. Such a land-use classification is not available for tropical regions from global maps at this spatial resolution. It is important to map the forest types for this region in order to interpret the large increase of 5-30% of the tree cover observed since the 1990s (FAO, 2010; Hansen et al. 2013; Silva et al. 2018) and to understand its implication in the context of global carbon change and conservation. While having a large amount of new growing forests is positive for the carbon balance, the fact that 80% of these new growing forests are eucalyptus plantations is a negative aspect for biodiversity. When producing the training dataset, the forest borders were easily assessed, as both forest types are dense and the other land-use types are spectrally different (mainly human construction or pasture). For sparser vegetation physiognomy with trees such as those in the Savannah formation, it may be more difficult to produce the training sample and it might be necessary to map all individual trees and see how crown size and density of trees vary across the landscape.

Recognition of C. hololeuca trees
In this study, the first regional distribution map of all individuals of a natural tree species with a multispectral remote sensing image was produced using a U-net network. Only an estimated 0.01% of total area of the Atlantic Forest has been surveyed since 1945 through field studies (1817 ha) (de Lima et al. 2015); here, C. hololeuca has been mapped in 64 713 ha of natural forests, which is ~0.35% of the Atlantic forest remnants. This map of the species C. hololeuca had an overall accuracy of 97%. This high accuracy could be explained by the unique spectral values of the Cecropia leaves and the structure of the crown, as shown in Figure 3, thereby rendering them identifiable even for a human eye in an RGB image, which is not the case for most of the tree species. Furthermore, contrary to more common segmented objects, such as cars, animals or buildings, individuals of a tree species theoretically have the same spectral response, which is linked to both leaf biochemical composition and architecture of the species (Asner et al. 2015; Ferreira et al. 2016). This might help the network to identify the trees. The accuracies of the segment border (F1-score = 0.804) were not high for the following reasons. First, because of the difficulty of accurately delineating the border of the Cecropia trees, which is not sharp. In comparison, the border of a car in an image is easier to distinguish and to accurately delineate. Second, due to the small size of the crown, each pixel contribution is important in relation to the crown size, that is, missing only a few pixels strongly affects the value of the F1-score. In the Neotropics, several other species of the genus Cecropia have bright leaves, including Cecropia telealba in Colombia (Franco-Rosselli and Berg 1997; Kattan and Murcia 2012) and Cecropia telenitida in the central and northern sections of the Andes Mountains (Franco-Rosselli and Berg 1997), and are strong candidates for mapping with the U-net method.

Consideration for tree species mapping
Recent studies on tree counts and species identification with deep learning employ only RGB bands (Li et al. 2017; Onishi and Ise 2018; this paper). For other green vegetation, tree species and large plants, the gain of adding other multispectral bands in the model, bearing in mind that it will add a significant number of parameters to the model and increase computation time, remains to be explored. In comparison with recent object-based image analysis for identifying tree species in forests (Clark et al. 2005; Warner et al. 2006; Feret and Asner 2013; Fassnacht et al. 2016; Ferreira et al. 2016), the main advantage of the U-net method is that no previous individual delineations of tree crowns are needed, while its main disadvantage is that it requires a lot of training images (in our case between 1000 and 10 000). To create the sample for training, it is required to be able to identify the tree species in the image. This implies that the targets have a certain size and a particular spectral response or structure. We also recommend using images with 0.3 m of spatial resolution. In our region, the method will be tested with other species that can be identified in the VHR image, such as bamboos or the critically endangered Araucaria angustifolia. Some tree species could also be mapped during the reproduction period, if the tree is visible from space when flowering, such as the species Tibouchina mutabilis in the Atlantic Forest. C. hololeuca is distributed across the Atlantic forest biome (Franco-Rosselli and Berg 1997), but for a species with a narrower distribution range, we might be able to fully describe its specific climatic envelope with its real occurrence data (He et al. 2015). Furthermore, the large majority of tropical forest tree species are locally very rare, with adult population sizes <1 per hectare (ter Steege et al. 2013, 2015), so that it is difficult to derive population sizes with field surveys based on ~1 ha plots.
For those tree species clearly identifiable in imagery, the coordinates of all adults in a region could be used to guide botanical collections or genetic analyses of parentage, for example. Given that widespread, precise location data for species are generally unobtainable from field studies, applying methods such as the one developed here could have great value in the urgent task of describing and mapping rare, poorly known, and potentially threatened tropical tree species (ter Steege et al. 2015).
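As an illustration of how adult coordinates might be derived from a segmentation mask, the following minimal sketch groups predicted crown pixels into connected components and converts each component centroid to map coordinates. It is not the procedure used in this study: the function name `crown_centroids`, the 4-connectivity rule and the north-up grid described by `origin_x`, `origin_y` and `pixel_size` are all assumptions made for the example.

```python
from collections import deque


def crown_centroids(mask, origin_x, origin_y, pixel_size):
    """Return one (x, y) map coordinate per connected crown in a binary mask.

    mask: list of rows of 0/1 values (1 = pixel predicted as crown).
    origin_x, origin_y: map coordinates of the upper-left corner of the mask.
    pixel_size: ground size of one pixel in map units (for example 0.3 m).
    Assumes a north-up grid (hypothetical setup, not taken from the paper).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            sum_r = sum_c = count = 0
            while queue:
                r, c = queue.popleft()
                sum_r += r
                sum_c += c
                count += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # A pixel centre lies at index + 0.5; y decreases as rows increase.
            x = origin_x + (sum_c / count + 0.5) * pixel_size
            y = origin_y - (sum_r / count + 0.5) * pixel_size
            centroids.append((x, y))
    return centroids
```

Under this simple rule, crowns that touch each other are merged into a single component, so the result should be read as a count of crown patches rather than of individual trees.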

Natural forest fragmentation and disturbance
Although we are still far from recognizing all the species, species such as C. hololeuca, which indicate disturbance and the early successional stage of the forests, can be accurately mapped (Figure 11). Cecropia is a reliable indicator of short-term disturbance, since it may persist in the canopy from half a century to a century depending on the lifetime of the species (Zalamea et al. 2012; Guitet et al. 2018). Mapping these trees could help to assess and understand forest disturbances at regional scale, providing information not accessible by other means (for example, when the disturbance predates the first satellite observations), and this from a single very high resolution image. In the future, it could be interesting to map an end-succession species to improve the understanding of the successional stage of these endangered forest fragments. The index based on the distribution of species in the fragment revealed information that is not accessible from the classical fragment analysis, which is based on size, shape and connectivity of the fragments (Vogt et al. 2007, 2009; Strassburg et al. 2016). For example, here, we show that fragments above 10 ha can present a high degree of disturbance (Figure 10). We have to acknowledge that the index D disturbance works on a fragment scale and does not account for the diversity of forest physiognomy within a fragment; for example, if the fragment contains both old growth forest and secondary forest. The largest fragment in the south-east, as shown in Figure 9, is crossed by roads and other human-made barriers, such as fences, that are not detected below the tree cover; thus, it is still likely that we underestimated forest fragmentation. As fragmentation is increasing over all tropical forests (Taubert et al. 2018), our index D disturbance, in addition to the classical fragmentation analysis, can provide valuable information to guide the conservation of the Atlantic Rain Forest in particular and Neotropical forests in general.

Considerations for the processing
As shown in this study, automatic tropical forest mapping may be feasible in the near future, but there are still limitations from both the hardware and the model. The raster of one band of the full region at 0.3-m spatial resolution contains over 22 billion pixels. Consequently, the images must be processed in smaller tiles, which adds significant computational time to the analysis. Another limitation is that the geographical information of the original image is not preserved when the model is run, because these models are still not designed for georeferenced imagery. The geographical information of the image must be saved before the run and added back after the prediction, which adds computational time as the data have to be written several times to the hard drive. To speed up the prediction step, the model can be run on larger images than those used for training. For example, the predictions were made from 640 × 640 pixel images, which was the maximum size supported by our GPU, while the models were trained with images of a maximum size of 256 × 256 pixels. Finally, the model required a GPU with at least three gigabytes of dedicated memory for both training and testing, which is not yet standard in personal computers. In this study, the model ran on an Nvidia K6000 GPU from 2013 with 12 gigabytes of dedicated memory and a compute capability of 3.5, and faster processing might be expected with newer generations of GPUs.
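The tiling and georeferencing bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the processing chain used in this study: the helper names (`tile_windows`, `window_geotransform`, `predict_mosaic`) and the `predict_tile` callable are hypothetical, the raster is a single band held as nested lists, and the georeferencing is assumed to follow the six-coefficient affine convention used by GDAL.

```python
def tile_windows(height, width, tile=640):
    """Yield (row, col, n_rows, n_cols) windows that cover the raster once."""
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield row, col, min(tile, height - row), min(tile, width - col)


def window_geotransform(geotransform, row, col):
    """Shift an affine geotransform to the upper-left corner of a window.

    geotransform follows the GDAL ordering:
    (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
    where pixel_height is negative for a north-up image.
    """
    x0, dx, rx, y0, ry, dy = geotransform
    return (x0 + col * dx + row * rx, dx, rx,
            y0 + col * ry + row * dy, ry, dy)


def predict_mosaic(image, predict_tile, tile=640):
    """Apply predict_tile to each window and reassemble a full-size output.

    image: list of rows (one band, for simplicity).
    predict_tile: hypothetical callable standing in for the trained network;
    it must return a patch with the same shape as its input.
    """
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for row, col, n_rows, n_cols in tile_windows(height, width, tile):
        patch = [line[col:col + n_cols] for line in image[row:row + n_rows]]
        result = predict_tile(patch)
        # Write the prediction back at the position the patch came from.
        for i in range(n_rows):
            output[row + i][col:col + n_cols] = result[i]
    return output
```

Because the output keeps the pixel grid of the input, the geotransform saved before the run can be attached to the mosaic unchanged, while `window_geotransform` gives the georeferencing of any individual tile. Tiles on the right and bottom edges are smaller than `tile` in this sketch; a real network with a fixed input size would need them padded first.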

Considerations for generalization and scalability
The largest image produced by the WorldView-3 satellite covers 7336 km² (112 × 65.5 km) with the collecting scenario 'Large area collect' (DigitalGlobe, 2019). Because of this limit, covering a larger region will require several WorldView-3 images; and, as WorldView satellites do not cover all the Earth at regular intervals, these images will differ in acquisition date, time of day, and solar and sensor angles. Consequently, the reflectance of targeted objects will vary between the images and new training samples will be necessary. Furthermore, forests can display different features. For example, if in one image forests contain Cecropia trees, and in the other they contain pink-flowering trees, each feature must be presented to the model for it to be able to predict pink trees and Cecropia trees as forest features. New training samples will have to be produced for new images but can be added to the first training set to train only one multiclass model for all images (and not one model per image or per feature). As the U-net model is scale dependent, a model trained at a given spatial resolution is not expected to work at another. Further work is needed to compare U-net performance for tree species and vegetation mapping with other convolutional neural networks that can handle the task of semantic labeling in very high resolution images (He et al. 2017; Maggiori et al. 2017; Huang et al. 2018).

Considerations for atmospheric corrections
Very thin clouds, which are transparent and still carry reflectance information from the ground targets (Bai et al. 2016), might be handled during data augmentation. Further work is needed to analyze and simulate the reflectance of ground targets seen through very thin cloud cover. Other reflectance variations due to sun-view angle effects could also be included in the data augmentation, as some of these effects have been studied for decades, such as the bidirectional reflectance characteristics of vegetation (Ranson et al. 1985). The simulation of atmospheric variations and sun-view angle effects on the reflectance during the data augmentation remains to be explored and is of primary importance for the development of deep learning methods for remote sensing.
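To make the idea of a thin-cloud augmentation concrete, the sketch below mixes each RGB pixel linearly with a uniform bright veil of random opacity. This is only an assumed, deliberately simplistic model chosen for illustration; it has not been validated against real cloud reflectance and is not a step used in this study. The function names, the `max_opacity` default of 0.3 and the 8-bit `cloud_value` of 255 are all hypothetical.

```python
import random


def add_thin_cloud(pixel, opacity, cloud_value=255):
    """Linearly mix one RGB pixel with a uniform bright veil.

    opacity = 0 leaves the pixel unchanged; opacity = 1 returns the veil.
    This linear mixing is an assumption, not a physical cloud model.
    """
    return tuple(round((1 - opacity) * v + opacity * cloud_value)
                 for v in pixel)


def augment_with_haze(image, max_opacity=0.3, rng=random):
    """Return a hazy copy of an RGB image and the opacity that was drawn.

    image: list of rows of (r, g, b) tuples with 8-bit values.
    rng: object with a uniform(a, b) method, e.g. random.Random(seed).
    """
    opacity = rng.uniform(0.0, max_opacity)
    hazy = [[add_thin_cloud(p, opacity) for p in row] for row in image]
    return hazy, opacity
```

The label mask is left untouched by this transformation, since a transparent veil changes the apparent reflectance of a target but not its class. A spatially varying opacity field would be a natural next refinement.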

Conclusions
In this work, we showed that the deep learning algorithm U-net (Ronneberger et al. 2015) has great potential in remote sensing for ecology. U-net convolutional networks accurately segmented natural forest and eucalyptus plantations, and, for the first time, all adult individuals of a natural tree species were mapped at a regional scale with very high resolution multispectral images. The mapping of individual trees gives ecologists and forest managers access to new information on forest ecosystems at a regional scale, such as the disturbance index developed here. The network will be further improved to work with images taken in different seasons, solar/view angles and atmospheric conditions, by increasing the training sample and developing a data augmentation step specific to remote sensing images.