Why develop a database of available samples from around the world?

The EMP is an ambitious initiative to refocus microbial ecology; however, such an initiative can only work with the support of the community. At this stage, that support should take the form of helping to develop the GLOBAL ENVIRONMENTAL SAMPLE DATABASE (GESD). This database will hold information about the types of environmental samples that are currently available globally for the EMP. To make this feasible we are developing a submission system, which will outline the exact fields we require to enable maximum comparison of sampled locations. Each sample series will be examined by the EMP community leaders, and the final selection will be made based on access to associated environmental context data and to experimental metadata that comply with the Minimum Information about a Metagenome Sequence (MIMS) and Minimum Information about a Marker Gene Sequence (MIMARKS) standards. The richer the available metadata, the easier it will be to examine the relationships between environment and biology. Specifically, we want to explore environmental parameter space and its role in structuring microbial ecology in ecosystems and, importantly, the feedback role of the microbial community on the environment. This is only possible if we have sufficient parametrization of both the community and the environment; the better the parametrization, the easier our job will be.

Can we obtain information from the metadata alone?

Microbes exist by responding to environmental conditions; i.e., if the temperature changes because the water in a river moves from high altitude to lower altitude, the microbial system will respond. Microbes do not respond to changes in the geopolitical status of their current location unless it has a direct effect on the environmental conditions. Therefore, the metadata collected through the GESD will enable us to map the range and types of environments from which we can explore the Earth’s microbiome. This map can be used to determine what additional samples we need to collect to ‘fill in the gaps’ in our exploration. For example, we may have soil samples from a pH range of 4-6.5 and 7.3-11.2 at a range of different temperatures, nutrient loads, and soil types. To understand the full range of soil microbiota on Earth we would need to explore samples whose pH is below 4, above 11.2, and between 6.5 and 7.3, all at a range of different temperatures, nutrient loads, and soil types.
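The pH example can be worked through mechanically. The sketch below is only an illustration, not part of the GESD: the function name find_gaps and the assumed overall pH scale of 0-14 are our own choices. Given the ranges already covered by available samples, it reports the ranges still to be sampled.

```python
def find_gaps(covered, lower=0.0, upper=14.0):
    """Return the sub-ranges of [lower, upper] not covered by any interval.

    `covered` is a list of (start, end) pairs, e.g. pH ranges for which
    samples already exist. The 0-14 default scale is an assumption.
    """
    gaps = []
    cursor = lower  # everything below `cursor` is already accounted for
    for start, end in sorted(covered):
        if start > cursor:
            gaps.append((cursor, min(start, upper)))
        cursor = max(cursor, end)
        if cursor >= upper:
            break
    if cursor < upper:
        gaps.append((cursor, upper))
    return gaps


# pH ranges from the example in the text: 4-6.5 and 7.3-11.2
print(find_gaps([(4.0, 6.5), (7.3, 11.2)]))
# prints [(0.0, 4.0), (6.5, 7.3), (11.2, 14.0)]
```

The same check would have to be repeated for each additional parameter (temperature, nutrient load, soil type) to find gaps in the combined parameter space.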

One important factor will be time. A biogeographic survey is only as useful as your understanding of the history of the sample you have taken; i.e., is the community you explore representative of that site at that time of year? Where we do not have extensive temporal data, we will need to explore opportunities to obtain samples that expand our understanding of the temporal changes in community structure and their relationship to environmental conditions. The metadata will help us to determine where this is needed.

Submitting sample information to the GESD

Please contact Gail Lesley Ackermann <ackermag@ucsd.edu>, who will coordinate your submission. As an example, the process for submitting metadata about a soil sample collection is described below:

Soil Metadata Guide


Go to http://microbio.me/qiime (the QIIME website) and create an account. This account will work on both the QIIME and EMP websites. Once your account has been created, visit http://microbio.me/emp (the EMP website) and sign in. Click on “Create a new study”, fill in all the fields, then click on “create”. Next, click on “Generate a MIMARKS-compliant metadata template”. This will open a page showing the standard soil sample template fields (highlighted in pink) as well as a list of optional fields. Select the appropriate fields and click on “Generate Templates”. This will generate two files. The sample_template file can be opened in Excel, and you can then transfer your metadata into it as appropriate.

You have the option of adding columns to the template for measurements that are not included in either the standard template or the optional fields. When the metadata sheet is complete, save the file as a tab-delimited .txt file and send it to the submission coordinator named above for validation.
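If your metadata already live in a script rather than in Excel, the tab-delimited file can also be produced programmatically. The following is a minimal sketch using Python's standard csv module; the function name metadata_to_tsv, the sample values, and the extra column name are hypothetical, and the template downloaded from the EMP website remains the authority on which columns are required.

```python
import csv
import io


def metadata_to_tsv(rows, columns):
    """Render metadata rows (a list of dicts) as tab-delimited text.

    `columns` fixes the column order; fields missing from a row are
    written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        lineterminator="\n",
        restval="",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example: two template columns plus one added measurement.
columns = ["Sample_Name", "Description", "my_extra_measurement"]
rows = [
    {"Sample_Name": "plot1.soil", "Description": "Topsoil core"},
    {"Sample_Name": "plot2.soil", "Description": "Topsoil core",
     "my_extra_measurement": "12.5"},
]

with open("sample_template.txt", "w", newline="") as handle:
    handle.write(metadata_to_tsv(rows, columns))
```

Writing through csv.DictWriter rather than joining strings by hand means that any cell containing a tab or a quotation mark is quoted consistently.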



Soil sample template (EMP)

Sample_Name: A unique identifier for the sample.
Anonymized_Name: Usually applies to human subjects; if not available, leave blank.
Description: Description of the sample (what is it?).
Taxon_ID: NCBI taxonomy ID (see the NCBI Taxonomy Browser). Example: 410658 (soil metagenome).
Title: A common name you also use for the sample.
Altitude: The distance between the surface and the point being sampled, in meters. For soil this is usually 0 (different from elevation).
Assigned_from_geo: Enter “y” or “n”. Usually “y” (yes: data assigned from a geographic source).
Collection_date: Examples: 5/23/11, 5/23/11 10:22:14, 5/23/2011, or 20:25:12. A date range may also be specified: 2007-2008 or 02/2011-04/2011.
Country: Choose from the GAZ Ontology. Example: GAZ: Scotland.
Depth: Distance from the surface in meters.
Elevation: The distance of the site from sea level in meters.
Env_Biome; Env_Feature; Env_Matter: Terms from the EnvO ontology (see the Envo Ontology Browser). Biome: description of the site. Feature: description of the feature in the biome where the sample was obtained. Matter: description of the material.
Extracted_dna_available_now: Enter “y” or “n”.
Latitude: Geographical origin of the sample in decimal degrees (WGS84 system), not DMS (see Conversion Tool).
Longitude: Geographical origin of the sample in decimal degrees (WGS84 system), not DMS (see Conversion Tool).
Physical_sample_available_now: Enter “y” or “n”.
Public: Do you want your metadata to be accessible to the public? Enter “y” or “n”.

Other field options have also been defined and can be selected based on the information collected regarding your samples:

agrochem_addition: addition of fertilizers
al_sat: aluminum saturation
al_sat_meth: method for determining al_sat
annual_season_precpt: annual seasonal precipitation in mm
annual_seasonal_temp: mean annual and seasonal temperature in °C
cur_land_use: current land use
cur_vegetation: vegetation classification
cur_vegetation_method: reference or method used to determine vegetation classification
drainage_class: drainage classification from a standard system such as USDA
extreme_event: unusual physical events that may have affected microbial populations
extreme_salinity: measured salinity
salinity_method: reference or method used to determine salinity
fao_class: soil classification from the FAO world reference database for soil resources
fire: historical and/or physical evidence of fire
flooding: historical and/or physical evidence of flooding
heavy_metals: concentration and type
heavy_metal_meth: reference or method used for determining concentrations of heavy metals
horizon_method: reference or method used to determine the horizon
microbial_biomass: organic matter in soil from living organisms smaller than 5-10 µm, in mg C or mg N per kg soil
ph: pH
ph_method: reference or method used in determining pH
soil_type: soil series name
soil_type_method: reference or method used to determine soil type
texture: relative proportion of different grain sizes in a soil, expressed as percentage sand (50 µm-2 mm), silt (2-50 µm), and clay (<2 µm), with textural name optional
texture_meth: reference or method used to determine texture
tot_nitro: total amount of nitrogenous substance
tot_n_meth: reference or method used to determine the total N
tot_org_carb: total organic carbon content
tot_org_c_meth: reference or method used to determine total C
water_content_soil: water content in g/g or cm3/cm3
water_content_soil_method: reference or method used to determine water content

You may also add fields that are not listed here based on other parameters you have measured. Simply add these columns to your sheet, making sure that the “column_name” you select follows the appropriate format.
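Before sending a sheet for validation, a few of the rules stated in the soil sample template above can be checked locally. The sketch below is not the EMP validator: the function name check_sample and the error messages are hypothetical, and it covers only the “y”/“n” fields, a non-empty Sample_Name, and the requirement that latitude and longitude be given in decimal degrees.

```python
YES_NO_FIELDS = (
    "Assigned_from_geo",
    "Extracted_dna_available_now",
    "Physical_sample_available_now",
    "Public",
)
# Largest absolute value allowed for each coordinate, in decimal degrees.
COORD_LIMITS = {"Latitude": 90.0, "Longitude": 180.0}


def check_sample(record):
    """Return a list of problems found in one sample record.

    `record` maps column names to the string values of one row.
    """
    problems = []
    if not record.get("Sample_Name", "").strip():
        problems.append("Sample_Name is empty")
    for field in YES_NO_FIELDS:
        if record.get(field, "").strip().lower() not in ("y", "n"):
            problems.append(f"{field} must be 'y' or 'n'")
    for field, limit in COORD_LIMITS.items():
        try:
            value = float(record.get(field, ""))
        except ValueError:
            # DMS strings such as 55 57 11 N cannot be parsed as a number.
            problems.append(f"{field} is not in decimal degrees")
            continue
        if not abs(value) <= limit:
            problems.append(f"{field} is outside +/-{limit:g} degrees")
    return problems


# Hypothetical record for illustration only.
example = {
    "Sample_Name": "plot1.soil",
    "Assigned_from_geo": "y",
    "Extracted_dna_available_now": "n",
    "Physical_sample_available_now": "y",
    "Public": "y",
    "Latitude": "55.95",
    "Longitude": "-3.19",
}
print(check_sample(example))
```

An empty list means that none of these particular checks failed; the sheet still needs to go through the normal validation step.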