UNCC

UNCC-validate_rkp_bb_rkp_bb.xlsx

Input

Output

Note

Brad's comments

Bob's comments on Brad's comments

Brad's final comments

Fixed?

row_num

 

Might be good to save this

If this is a database ID, then I suggest Aaron save it under datasetRecordID

ok

 

 

 

datasource

All records currently = UNCC; probably should be where you got the data = NCU

Bob: Index Herbariorum does not recognize UNC, only UNCC. These are UNCC collections, so I suggest we use UNCC as both the datasource (an internal BIEN field) and institutionCode (a DwC field).

I suppose UNCC is less confusing than NCU, but technically I got the data from NCU which got it from UNCC.

Prefer to use UNCC

FIXED

accession

Catalog Number

OK, I think. Need to check with Brad regarding our final decisions on this.

This looks good to me. If accessionNumber is provided by the data provider, is unique, and is used for every observation, it should be used as the catalogNumber.

ok

 

 

herbarium

institutionCode

Wrong. Herbarium is where UNCC got the collection. For institution code you want UNCC for all records

I believe Bob is right. `institutionCode` refers to the institution where the specimen is stored, which in this case is UNCC for all specimens. I believe `herbarium` is populated with a value other than UNCC when the specimen was received as a duplicate from another herbarium. I don't think there is any point in storing the other values, as there is no guarantee that another duplicate is stored at that institution. They may simply be forwarding a specimen they don't want; we have no way of knowing. So yes, make all values "UNCC".

omit herbarium

OK

FIXED: We don't currently have separate fields for this in VegBIEN or the analytical DB, so I just mapped it to a placeholder VegCore name.

 

occurrenceID

?? What is this?

This should be populated for all sources. Aaron, please use my recommended formula in the spreadsheet "vegbien_identifiers", which is based on the DwC specification.

 

 

 

family

family_verbatim; AND ScientificName_verbatim (part 1 of 3)

OK

OK

 

 

 

genus

OMITTED

OK

 

 

 

 

species

OMITTED

OK

 

 

 

 

usdarank

OMITTED

OK

 

 

 

 

infrarank

OMITTED

OK

 

 

 

 

SciName

ScientificName_verbatim (part 2 of 3)

ScientificName_verbatim = family + SciName + authors. Not sure why family was concatenated with SciName, nor why the authors were concatenated and then listed a second time as authors. To my thinking, the true verbatim scientificName is just SciName.

I think Aaron's formulation is correct, except for one record ("ZANNICHELLIACEAE (KEUTZ.) CORRELL" should not include the authority). Bob, you weren't part of the entire discussion, but the problem is that DwC defines scientificName as the taxon name PLUS authority. I don't much like the DwC definition either, but we decided to follow DwC. Because we wanted to keep the authority separate from the name, we created a new field, taxonName_verbatim, which contains the lowest-level taxon name applied to the observation, MINUS the authority. That is why family sometimes appears in this column. If the specimen is only identified to order, the field will contain an ordinal name. Etc. Make sense?

Makes no sense to me to always concatenate family + genus + species to render ScientificName_verbatim as a trinomial. Aaron populated this as family + SciName + authors, whereas it should be SciName + authors except for those cases where there is no SciName and there is a family name.

Bob, I think you may be looking at an earlier, incorrect version of the validation extract. Originally, Aaron was dumping to this field the value we send to the TNRS, which includes family prepended. I have since asked him to place family (or a higher taxon) in this field only if no determination below the rank of genus is provided. So you should never see family + genus + species. In the version I am looking at, the name is always a correctly formed taxon name according to the ICBN. Aaron: your current version is correct.

This happens when there is no input taxonName, causing the ranks and author to be concatenated together
.
"should not include the authority": Why shouldn't we send the family's authority to TNRS for parsing?

authors

scientificNameAuthorship_verbatim; AND ScientificName_verbatim (part 3 of 3)

Not sure why this is saved in two places.

I only see it in one place, scientificNameAuthorship_verbatim. We also display scientificNameAuthorship_matched, but that is different: it's what the TNRS found after matching the name to Tropicos.

Perhaps you answered this above, but the field 'authors' in the input is copied into TWO places in the output, first as the final third of ScientificName_verbatim and then as a separate scientificNameAuthorship_verbatim.

OK, this must definitely be a misunderstanding due to your looking at an older version of the validation. Sorry about that. The current validation contains only taxonName_verbatim (without authors) and scientificNameAuthorship_verbatim (the authors only). I would have preferred taxonNameAuthorship_verbatim, but as there was already a DwC name for the latter entity, we had to use it. Basically, DwC's ill-thought-out naming convention has screwed us up, but there's nothing we can do but use a new name if DwC has no unambiguous name for the taxon name without author. Aaron: your current version is correct.
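The split Brad describes (the lowest-level taxon name without authority in taxonName_verbatim, the authors alone in scientificNameAuthorship_verbatim, and family used only when no SciName is supplied) might be sketched as follows. This is only an illustration: the function name, the dict-based record, and the treatment of whitespace-only values as missing are assumptions, not Aaron's actual import code.

```python
def split_verbatim_name(record):
    """Split a UNCC input row into taxon name and authorship.

    `record` is assumed to be a dict keyed by the input column names
    (`family`, `SciName`, `authors`); the function name is hypothetical.
    """
    def clean(column):
        # Treat missing values and whitespace-only strings as absent.
        return (record.get(column) or "").strip() or None

    # Lowest-level name without authority; fall back to family only
    # when no SciName was supplied.
    taxon_name = clean("SciName") or clean("family")
    return {
        "taxonName_verbatim": taxon_name,
        "scientificNameAuthorship_verbatim": clean("authors"),
    }
```

With this sketch, a row that has only `family` populated would carry the family name in taxonName_verbatim, and the authors never appear in the name field.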

collector

recordedBy (part 1 of 4)

OK

 

 

 

 

collector1

recordedBy (part 2 of 4)

OK

 

 

 

 

collector2

recordedBy (part 3 of 4)

OK

 

 

 

 

collector3

recordedBy (part 4 of 4)

OK

 

 

 

 

collectno

recordNumber

OK

 

 

 

 

collmonth

dateCollected (part 2 of 3)

OK

 

 

 

 

collday

dateCollected (part 3 of 3)

OK

 

 

 

 

collyear

dateCollected (part 1 of 3)

OK

 

 

 

 

country

country

OK, but should probably convert into a standard character string

Bob: Aaron's presentation is correct for now. We haven't yet implemented geovalidation for BIEN3. When we do, he will return both the verbatim and scrubbed political division names.

ok

 

 

state

stateProvince

OK, but should probably convert into a standard character string

Bob: Aaron's presentation is correct for now. We haven't yet implemented geovalidation for BIEN3. When we do, he will return both the verbatim and scrubbed political division names.

ok

 

 

County

county

OK

 

 

 

 

Campus

 

Omit

Bob, what does this column mean? If it has anything to do with plants being collected on a campus (and therefore cultivated) we should definitely use it.

Yes, this refers to occurrence on the UNCC campus, but I do not know the meaning of the codes. Presumably C = cultivated

OK, this is important information in that case. Here's what I suggest. Aaron, if the column contains a non-null value, please put text "cultivated, collected on campus" in the column `cultivated_verbatim`.

FIXED
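Brad's rule for `Campus`, together with his matching instruction for the `cultivated` input column (a non-null `Campus` takes precedence), could be expressed roughly like this. It is a sketch under stated assumptions: the function name is hypothetical, blank strings are treated as null, and every non-null code is treated as "cultivated" because the meaning of the individual codes is still unknown.

```python
def cultivated_verbatim(record):
    """Derive `cultivated_verbatim` from the `Campus` and `cultivated` columns.

    Hypothetical helper; `record` is assumed to be a dict keyed by the
    UNCC input column names.
    """
    def has_value(column):
        # Assumption: empty or whitespace-only strings count as NULL.
        value = record.get(column)
        return value is not None and str(value).strip() != ""

    # A non-null Campus code takes precedence over the cultivated code.
    if has_value("Campus"):
        return "cultivated, collected on campus"
    if has_value("cultivated"):
        return "cultivated"
    # Neither column is populated: leave the field NULL rather than 0.
    return None
```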

leaves

 

Omit

Agree

ok

 

 

flower

reproductiveCondition

Just a code that will be meaningless; if we were to keep this we would also want fruit. I suggest we omit.

Aaron, I suggest you get a formula from Bob for how to translate their codes for columns `flower` and `fruit` into the plain English terms "flower", "fruit", and "fertile" (the latter in the case of ferns, gymnosperms). This is definitely important information which we should not discard. And yes, I agree that we do not need to keep the information in the columns `leaves` and `root`.

Flower = AFIJMN, Fruit = 2AHIMOQS, definitions unknown

Bob: should we interpret flowers (or buds) present if flower is not null? And should we interpret fruits present if "fruit" is not null? I would like to capture this phenology information if at all possible.

"I suggest we omit": FIXED

fruit

 

Omit

No. See above.

ok

 

root

 

Omit

Agree.

 

 

 

locality

locality (in part)

OK?

If we could figure out a way to separate the specimen description, I would prefer to append everything from all the `comment` columns pertaining to the locality description in this field. But can't think of a way to do that. Bob, suggestions?

Too inconsistent for a simple answer.

 

 

habitat

locality (in part)

OK??? I might place habitat as yet another comment and not part of locality

We have in general been appending the habitat description to the verbatim locality description. We do this because of the inconsistent practices of herbaria. Some combine them in a single field, and some (like UNCC and ARIZ) keep them in separate fields.

 

 

 

comment1

 

Missing -- need to preserve; not sure variable name

I agree that this information is important, but the four denormalized columns unpredictably mix the locality description with the habitat description and the specimen description. It is crucially important, for both trait data mining and detecting cultivated specimens, that the specimen description be preserved. But I'm at a loss as to what Aaron should do here. Bob, can you suggest an algorithm whereby Aaron could separate the locality description from the specimen description?

Hopelessly inconsistent. I would just place all of habitat and comments1-4 in a general notes field

OK Aaron, dump everything in these fields to another field, and I will modify my script to check that field for the word "cultivated" but nothing else. Is there another DwC field we can use other than occurrenceRemarks? (We need to keep locality descriptions out of that field.)

These are concatenated in occurrenceRemarks
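A minimal sketch of this step, assuming the four comment columns are joined in order and that Brad's check is a simple whole-word, case-insensitive search for "cultivated". The function names, the "; " separator, and the decision to leave `habitat` out (it is mapped to locality in its own row) are assumptions made for illustration.

```python
import re

# Input columns that are concatenated; `habitat` is deliberately left out
# here because it is mapped to locality elsewhere (an assumption).
COMMENT_COLUMNS = ("comment1", "comment2", "comment3", "comment4")


def build_occurrence_remarks(record, sep="; "):
    """Join the non-empty comment columns into one remarks string."""
    parts = [(record.get(column) or "").strip() for column in COMMENT_COLUMNS]
    joined = sep.join(part for part in parts if part)
    return joined or None


def mentions_cultivated(remarks):
    """Return True if the remarks contain the whole word 'cultivated'."""
    if not remarks:
        return False
    return re.search(r"\bcultivated\b", remarks, re.IGNORECASE) is not None
```

Matching on word boundaries means "uncultivated" would not trigger the check; whether that is the desired behavior is a decision for the reviewers.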

comment2

 

Missing -- need to preserve; not sure variable name

 

 

 

comment3

 

Missing -- need to preserve; not sure variable name

 

 

 

comment4

 

Missing -- need to preserve; not sure variable name

 

 

 

loanto

OMITTED

OK

 

 

 

 

inorout

OMITTED

OK

 

 

 

 

sheetno

OMITTED

OK

 

 

 

 

cultivated

cultivated_bien

Translation seems inconsistent. Where do the occasional zeros come from?

Aaron, please see my validation for source "TEX". I suggested we add the column `cultivatedVerbatim` to vegCore and vegBIEN, and display this column in all specimen validation views. As with TEX, the contents of UNCC's column `cultivated` should be dumped to `cultivatedVerbatim`. This means that for the validation extract you display, cultivatedVerbatim would be blank for all records except Accession Numbers 39691 and 19561. It would be helpful to do some minimal translation. Bob, do the meanings of the codes "X" and "Y" differ in your field `CULTIVATED`? `cultivated_bien` is the result of post-import validation by BIEN; it combines parsing of the specimen description with any cultivated flags supplied by the data provider. Aaron, could you remind me why some values are blank (NULL) whereas others are zero? And why isn't Accession number 19561 flagged with a 1?

 

Aaron, if this column contains a non-null value, please put text "cultivated" in column `cultivated_verbatim`, unless you have already populated `cultivated_verbatim` due to non-null value in column `Campus`.

Which of the cultivated values [ACKMNSWXYy?] correspond to true?
.
"Where do the occasional zeros come from?": FIXED: The zeros were being populated as the result of locality parsing, and should have been NULL.

filler

OMITTED

OK

 

 

 

 

 

decimalLatitude

NA

 

 

 

 

 

decimalLongitude

NA

 

 

 

 

 

coordinateUncertaintyInMeters

NA

 

 

 

 

 

elevationInMeters

NA

 

 

 

 

 

identifiedBy

NA

 

 

 

 

 

dateIdentified

NA

 

 

 

 

 

identificationRemarks

NA

 

 

 

 

 

identificationRemarks

Header error. This is familyName_matched.

??? I don't understand Bob. Looks corrected to me. All values for this column are blank in the extract I am looking at.

Looks OK now. Previously the header was showing as a duplicate of identificationRemarks, but perhaps it was just my Excel misbehaving.

 

FIXED

 

taxonName_matched

OK (should families be excluded here? I noticed a case where the cultivar name was dropped upon matching)

Aaron's formulation is correct. taxonMatched contains the lowest taxon name matched, minus the authority. The TNRS doesn't process cultivar names, so they are dumped to `unmatchedTerms` (not displayed in specimen validation extracts)

ok

 

 

 

scientificNameAuthorship_matched

OK

 

 

 

 

 

higherPlantGroup_bien

 

 

 

 

 

 

family

(I noticed a case where familyName_matched was blank and a genus name was captured from the family_verbatim; suggests an error)

Yes, I see that too. Accession number 39691. Genus "ELEAGNUS" was incorrectly placed in field `FAMILY` in the original data. I'm pretty sure there's no way the TNRS would have returned "Eleagnus" as a family. Aaron, is this a parsing error?

!

 

Do we want to pass genera in the family column to TNRS? They are currently filtered out because they are not in the valid families list.
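The filtering behavior described here can be illustrated with a small sketch. The function name, the case-insensitive comparison, and the contents of the valid-families list are all assumptions; the actual list and matching rules used in the import are not shown in this validation.

```python
def family_for_tnrs(family_verbatim, valid_families):
    """Return the verbatim family only if it is in the valid families list.

    Hypothetical helper: values that are not recognized families (for
    example a genus entered in the family column) yield None and so are
    not passed on.
    """
    value = (family_verbatim or "").strip()
    # Assumption: comparison ignores case and surrounding whitespace.
    valid = {name.upper() for name in valid_families}
    return value if value.upper() in valid else None
```

Under this sketch, a value such as "ELEAGNUS" in the family column is dropped unless the list is extended or the value is routed to a genus field instead.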

 

genus

OK

 

 

 

 

 

taxonName

OK

 

 

 

 

 

scientificNameAuthorship

OK

 

 

 

 

 

growthForm

NA??

Aaron: omit this from specimen validations.

 

 

FIXED

 

threatened_bien

source?

Aaron: omit this from specimen validations. Sorry, I should have told you that earlier. Bob: not yet implemented, but we will use IUCN.

 

 

FIXED

 

cultivatedBasis_bien

NA??

Bob: for specimens flagged as cultivated (isCultivatedBien=1) this will contain a standard bit of text describing the reason it was flagged as cultivated (e.g., "Key words found in specimens description", etc.)