Data Processing Documentation

Overview

Brief description of the data processing objectives and scope. Reminder to adhere to data ownership and usage guidelines.

ACCESS proposal

Dispersal is a key mechanism driving the geographic distributions of species on Earth, but dispersal theory and methods are based primarily on macroorganisms, with microbial dispersal paradigms emerging only recently. In fungi, numerous traits related to dispersal (e.g., spore traits, fruiting body traits, dispersal syndromes) are likely linked to fungal biogeographic patterns, but these hypotheses remain largely untested. We aim to harmonize fungal dispersal trait data with DNA sequence-based taxon occurrence data to test trait-based predictions regarding the dispersal capabilities of fungi across spatial scales. We will also assess the potential for fungal dispersal to buffer against range shifts predicted under global climate change. This work will contribute to our understanding of global fungal biodiversity and ecosystem function, and will aid in predicting outbreaks of fungal diseases in plants and humans. Finally, we will integrate fungal dispersal models with global climate change predictions to assess the potential for fungal range shifts in a changing world.

To accomplish these tasks, most data wrangling and analyses will be performed in R. Much of the eDNA data will be in raw sequence form, which will require advanced computing capabilities to support an efficient bioinformatics pipeline; merging trait-based spore dispersal modeling with climate envelope models will also require substantial computing power. Because this work is novel, we are unsure of our specific requirements, but we are excited to learn more about our GPU and storage needs through benchmarking.

Data Sources

List and describe data sources used, including links to cloud-optimized sources. Highlight permissions and compliance with data ownership guidelines.

CyVerse Discovery Environment

Instructions for setting up and using the CyVerse Discovery Environment for data processing. Tips for cloud-based data access and processing.

Data Processing Steps

Using GDAL VSI

Guidance on using GDAL VSI (Virtual System Interface) file system handlers, such as /vsicurl/, to access and process remote data in place. Example command:

gdal_translate /vsicurl/http://example.com/data.tif output.tif
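
The command above can be generalized with a small shell helper. This is a minimal sketch, not part of GDAL: the function name `vsi_path` is hypothetical, and the URL is the same placeholder used above. It maps an HTTP(S) URL to a `/vsicurl/` path, maps an `s3://` URL to a `/vsis3/` path, and leaves local paths unchanged.

```shell
#!/bin/sh
# vsi_path: hypothetical helper (not part of GDAL) that converts a URL
# into the matching GDAL virtual file system path.
vsi_path() {
  case "$1" in
    http://*|https://*) printf '/vsicurl/%s\n' "$1" ;;          # HTTP(S) source
    s3://*)             printf '/vsis3/%s\n' "${1#s3://}" ;;    # S3 bucket/key
    *)                  printf '%s\n' "$1" ;;                   # local path, unchanged
  esac
}

# Placeholder URL taken from the example command above.
vsi_path "http://example.com/data.tif"
```

Where GDAL is installed, the result can be passed straight to a GDAL tool, for example `gdalinfo "$(vsi_path "$url")"` to inspect metadata without downloading the file.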

Cloud-Optimized Data

Advantages of using cloud-optimized data formats and processing data without downloading. Instructions for such processes.
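
One way to process data without downloading it is to request only a pixel window from a remote raster. The sketch below assumes the source is readable over HTTP and, ideally, stored as a Cloud Optimized GeoTIFF so that GDAL can fetch only the byte ranges it needs. The function name `window_cmd`, the URL, and the output file name are hypothetical; the function only prints the `gdal_translate` command so it can be reviewed before running.

```shell
#!/bin/sh
# window_cmd: hypothetical helper that prints (does not run) a
# gdal_translate command extracting a pixel window from a remote raster.
# Arguments: url xoff yoff xsize ysize output
window_cmd() {
  printf 'gdal_translate -srcwin %s %s %s %s /vsicurl/%s %s\n' \
    "$2" "$3" "$4" "$5" "$1" "$6"
}

# Placeholder URL and output name; 512 x 512 pixel window at the origin.
window_cmd "http://example.com/data.tif" 0 0 512 512 subset.tif
```

Printing the command first keeps the step reproducible: the same line can be logged alongside the output and then executed once it has been checked.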

Data Storage

Information on storing processed data, with guidelines for choosing between the repository and CyVerse Data Store.

Best Practices

Recommendations for efficient and responsible data processing in the cloud. Tips to ensure data integrity and reproducibility.
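
As one possible integrity check, processed files can be recorded in a checksum manifest and verified later. This is a minimal sketch assuming GNU coreutils `sha256sum` is available; the file names `output.txt` and `SHA256SUMS` and the temporary working directory are illustrative only.

```shell
#!/bin/sh
# Assumes GNU coreutils sha256sum; file names here are illustrative.
workdir=$(mktemp -d)

# Stand-in for a processed output file.
printf 'example output\n' > "$workdir/output.txt"

# Record the checksum in a manifest stored next to the data.
( cd "$workdir" && sha256sum output.txt > SHA256SUMS )

# Verify the file against the manifest; exits non-zero on a mismatch.
( cd "$workdir" && sha256sum -c SHA256SUMS )
```

Storing the manifest with the data lets collaborators confirm that a copy in the repository or the CyVerse Data Store matches what was originally produced.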

Challenges and Troubleshooting

Common challenges in data processing and potential solutions. Resources for troubleshooting in the CyVerse Discovery Environment.

Conclusions

Summary of the data processing phase and its outcomes. Reflect on the methods used.

References

Citations of tools, data sources, and other references used in the data processing phase.

Last update: 2025-09-25