UNCC
UNCC-validate_rkp_bb_rkp_bb.xlsx
Input |
Output |
Note |
Brad's comments |
Bob's comments on Brad's comments |
Brad's final comments |
Fixed? |
row_num |
|
Might be good to save this |
If this is a database ID, then I suggest Aaron save it under datasetRecordID |
ok |
|
|
|
datasource |
All records currently = UNCC; probably should be where you got the data = NCU |
Bob: Index Herbariorum does not recognize UNC, only UNCC. These are UNCC collections, so I suggest we use UNCC as both the datasource (an internal BIEN field) and institutionCode (a DwC field). |
I suppose UNCC is less confusing than NCU, but technically I got the data from NCU which got it from UNCC. |
Prefer to use UNCC |
FIXED |
accession |
Catalog Number |
OK, I think. Need to check with Brad regarding our final decisions on this.
This looks good to me. If accessionNumber is provided by the data provider, is unique, and is used for every observation, it should be used as the catalogNumber.
ok |
|
|
herbarium |
institutionCode |
Wrong. Herbarium is where UNCC got the collection. For institution code you want UNCC for all records |
I believe Bob is right. `institutionCode` refers to the institution where the specimen is stored, which in this case is UNCC for all specimens. I believe `herbarium` is populated with a value other than UNCC when the specimen was received as a duplicate from another herbarium. I don't think there is any point in storing the other values, as there is no guarantee that another duplicate is stored at that institution. They may simply be forwarding a specimen they don't want; we have no way of knowing. So yes, make all values "UNCC".
omit herbarium |
OK |
FIXED: We don't currently have separate fields for this in VegBIEN or the analytical DB, so I just mapped it to a placeholder VegCore name. |
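A minimal sketch of the identifier and institution mapping agreed above, assuming records arrive as Python dicts keyed by the UNCC column names. The RENAMED table only restates mappings listed in this review, and CONSTANTS reflects the decision to use "UNCC" for both datasource and institutionCode. The function name map_record is hypothetical, not Aaron's import code, and the placeholder field Aaron mentions for `herbarium` is not modeled.

```python
# Hypothetical sketch: column names come from this review; the code does not
# reproduce the actual BIEN/VegBIEN import scripts.
from typing import Dict, Optional

# Input column -> output field, as listed in the rows of this review.
RENAMED: Dict[str, str] = {
    "accession": "catalogNumber",
    "collectno": "recordNumber",
    "country": "country",
    "state": "stateProvince",
    "County": "county",
}

# Fixed values for every UNCC record (decided above).
CONSTANTS: Dict[str, str] = {
    "datasource": "UNCC",
    "institutionCode": "UNCC",
}


def map_record(raw: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Copy renamed, non-empty input columns and add the UNCC constants.

    Columns not listed in RENAMED (e.g. `herbarium`) are not copied, so
    institutionCode can never pick up another herbarium's code.
    """
    out: Dict[str, str] = dict(CONSTANTS)
    for src, dst in RENAMED.items():
        value = raw.get(src)
        if value is not None and value.strip():
            out[dst] = value.strip()
    return out
```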
|
occurrenceID |
?? What is this? |
This should be populated for all sources. Aaron, please use my recommended formula in the spreadsheet "vegbien_identifiers", which is based on the DwC specification.
|
|
|
family |
family_verbatim; AND ScientificName_verbatim (part 1 of 3) |
OK |
OK |
|
|
|
genus |
OMITTED |
OK |
|
|
|
|
species |
OMITTED |
OK |
|
|
|
|
usdarank |
OMITTED |
OK |
|
|
|
|
infrarank |
OMITTED |
OK |
|
|
|
|
SciName |
ScientificName_verbatim (part 2 of 3) |
ScientificName_verbatim = family + SciName + authors; not sure why family was concatenated with SciName, and then why the authors were concatenated and also listed a second time as authors. The true verbatim scientificName is, to my thinking, just SciName.
I think Aaron's formulation is correct, except for one record ("ZANNICHELLIACEAE (KEUTZ.) CORRELL" should not include the authority). Bob, you weren't part of the entire discussion, but the problem is that DwC defines scientificName as the taxon name PLUS authority. I don't much like the DwC definition either, but we decided to follow DwC. Because we wanted to keep the authority separate from the name, we created a new field, taxonName_verbatim, which contains the lowest-level taxon name applied to the observation, MINUS the authority. That is why family sometimes appears in this column. If the specimen is only identified to order, the field will contain an ordinal name. Etc. Make sense?
It makes no sense to me to always concatenate family + genus + species to render ScientificName_verbatim as a trinomial. Aaron populated this as family + SciName + authors, whereas it should be SciName + authors, except for those cases where there is no SciName and there is a family name.
Bob, I think you may be looking at an earlier, incorrect version of the validation extract. Originally, Aaron was dumping to this field the value we send to the TNRS, which includes family prepended. I have since asked him to place family (or a higher taxon) in this field only if no determination below the rank of genus is provided. So you should never see family + genus + species. In the version I am looking at, the name is always a correctly formed taxon name according to the ICBN. Aaron: your current version is correct.
This happens when there is no input taxonName, causing the ranks and author to be concatenated together |
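The rule Brad describes above can be sketched as follows, assuming the inputs are the raw SciName, family and authors strings: taxonName_verbatim holds SciName when present and falls back to family otherwise, never with the authority attached, while authors go only to scientificNameAuthorship_verbatim. The function names are hypothetical, and fallback to ranks above family is not modeled because the UNCC input has no such column.

```python
from typing import Dict, Optional


def clean(value: Optional[str]) -> Optional[str]:
    """Strip whitespace and treat empty strings as missing."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def name_fields(sci_name: Optional[str], family: Optional[str],
                authors: Optional[str]) -> Dict[str, Optional[str]]:
    """Hypothetical mapping of SciName/family/authors to verbatim name fields."""
    return {
        # Lowest-level name supplied, without the authority.
        "taxonName_verbatim": clean(sci_name) or clean(family),
        # The authority is stored once, in its own field.
        "scientificNameAuthorship_verbatim": clean(authors),
    }
```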
authors |
scientificNameAuthorship_verbatim; AND ScientificName_verbatim (part 3 of 3) |
Not sure why this is saved in two places
I only see it in one place, scientificNameAuthorship_verbatim. We also display scientificNameAuthorship_matched, but that is different: it's what the TNRS found after matching the name to Tropicos. |
Perhaps you answered this above, but the field 'authors' in the input is copied into TWO places in the output, first as the final third of ScientificName_verbatim and then as a separate scientificNameAuthorship_verbatim.
OK, this must definitely be a misunderstanding due to you using an older version of the validation. Sorry about that. The current validation contains only taxonName_verbatim (without authors) and scientificNameAuthorship_verbatim (the authors only). I would have preferred taxonNameAuthorship_verbatim, but as there was already a DwC name for the latter entity, we had to use it. Basically, DwC's ill-thought-out naming convention has screwed us up, but there's nothing we can do but use a new name if DwC has no unambiguous name for the taxon name without author. Aaron: your current version is correct.
|
collector |
recordedBy (part 1 of 4) |
OK |
|
|
|
|
collector1 |
recordedBy (part 2 of 4) |
OK |
|
|
|
|
collector2 |
recordedBy (part 3 of 4) |
OK |
|
|
|
|
collector3 |
recordedBy (part 4 of 4) |
OK |
|
|
|
|
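A sketch of how the four collector columns (collector, collector1, collector2, collector3) could be combined into recordedBy, assuming blank columns are skipped and names are joined with " | ", the list separator Darwin Core recommends for recordedBy. The separator Aaron actually uses is not stated in this review, and recorded_by is a hypothetical name.

```python
from typing import Iterable, Optional


def recorded_by(collectors: Iterable[Optional[str]],
                sep: str = " | ") -> Optional[str]:
    """Join non-blank collector names in their original column order."""
    names = [c.strip() for c in collectors if c is not None and c.strip()]
    return sep.join(names) or None


# Usage: values of collector, collector1, collector2, collector3
print(recorded_by(["A. Smith", "B. Jones", "", None]))  # A. Smith | B. Jones
```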
collectno |
recordNumber |
OK |
|
|
|
|
collmonth |
dateCollected (part 2 of 3) |
OK |
|
|
|
|
collday |
dateCollected (part 3 of 3) |
OK |
|
|
|
|
collyear |
dateCollected (part 1 of 3) |
OK |
|
|
|
|
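dateCollected is assembled from collyear (part 1), collmonth (part 2) and collday (part 3). A minimal sketch, assuming the target is an ISO 8601 date reduced to whatever precision the source supplies (a day without a month is dropped). The function name is hypothetical, and the sketch does not check that the date exists on the calendar.

```python
from typing import Optional, Union

Part = Optional[Union[int, str]]


def _as_int(value: Part) -> Optional[int]:
    """Convert a numeric column value to int; blanks become None."""
    if value is None or str(value).strip() == "":
        return None
    return int(str(value).strip())


def date_collected(year: Part, month: Part = None, day: Part = None) -> Optional[str]:
    """Build YYYY, YYYY-MM or YYYY-MM-DD from the three date columns."""
    y, m, d = _as_int(year), _as_int(month), _as_int(day)
    if y is None:
        return None
    if m is None:
        return f"{y:04d}"
    if d is None:
        return f"{y:04d}-{m:02d}"
    return f"{y:04d}-{m:02d}-{d:02d}"
```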
country |
country |
OK, but should probably convert into a standard character string |
Bob: Aaron's presentation is correct for now. We haven't yet implemented geovalidation for BIEN3. When we do, he will return both the verbatim and scrubbed political division names. |
ok |
|
|
state |
stateProvince |
OK, but should probably convert into a standard character string |
Bob: Aaron's presentation is correct for now. We haven't yet implemented geovalidation for BIEN3. When we do, he will return both the verbatim and scrubbed political division names. |
ok |
|
|
County |
county |
OK |
|
|
|
|
Campus |
|
Omit |
Bob, what does this column mean? If it has anything to do with plants being collected on a campus (and therefore cultivated) we should definitely use it. |
Yes, this refers to occurrence on the UNCC campus, but I do not know the meaning of the codes. Presumably C = cultivated |
OK, this is important information in that case. Here's what I suggest. Aaron, if the column contains a non-null value, please put text "cultivated, collected on campus" in the column `cultivated_verbatim`. |
FIXED |
leaves |
|
Omit |
Agree |
ok |
|
|
flower |
reproductiveCondition
Just a code that will be meaningless; if we were to keep this we would also want fruit. I suggest we omit. |
Aaron, I suggest you get a formula from Bob for how to translate their codes for columns `flower` and `fruit` into the plain English terms "flower", "fruit", and "fertile" (the latter in the case of ferns, gymnosperms). This is definitely important information which we should not discard. And yes, I agree that we do not need to keep the information in the columns `leaves` and `root`. |
Flower = AFIJMN, Fruit = 2AHIMOQS, definitions unknown |
Bob: should we interpret flowers (or buds) present if flower is not null? And should we interpret fruits present if "fruit" is not null? I would like to capture this phenology information if at all possible. |
"I suggest we omit": FIXED |
fruit |
|
Omit |
No. See above. |
ok |
|
|
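If Bob confirms Brad's proposed reading (any non-null flower code means flowers or buds present, any non-null fruit code means fruits present), the translation into plain English could look like the sketch below. This is only an assumption: the meanings of the codes AFIJMN and 2AHIMOQS are unknown, the " | " separator is a guess, and the "fertile" case for ferns and gymnosperms would need taxonomic information that this sketch does not use.

```python
from typing import Optional


def reproductive_condition(flower: Optional[str], fruit: Optional[str]) -> Optional[str]:
    """Hypothetical translation of UNCC `flower`/`fruit` codes to plain terms.

    Only presence is used; the individual code letters are ignored because
    their definitions are unknown.
    """
    terms = []
    if flower is not None and flower.strip():
        terms.append("flower")
    if fruit is not None and fruit.strip():
        terms.append("fruit")
    return " | ".join(terms) or None
```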
root |
|
Omit |
Agree. |
|
|
|
locality |
locality (in part) |
OK? |
If we could figure out a way to separate the specimen description, I would prefer to append everything from all the `comment` columns pertaining to the locality description in this field. But I can't think of a way to do that. Bob, suggestions?
Too inconsistent for a simple answer
|
|
habitat |
locality (in part) |
OK??? I might place habitat as yet another comment and not part of locality |
We have in general been appending the habitat description to the verbatim locality description. We do this because of the inconsistent practices of herbaria: some combine them in a single field, and some (like UNCC and ARIZ) keep them in separate fields.
|
|
|
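Appending habitat to the verbatim locality, as described above, might be sketched like this. The "; " separator and the function name are assumptions; blank parts are skipped so no dangling separator is produced.

```python
from typing import Optional


def build_locality(locality: Optional[str], habitat: Optional[str],
                   sep: str = "; ") -> Optional[str]:
    """Return locality followed by habitat, skipping blank parts."""
    parts = [p.strip() for p in (locality, habitat) if p is not None and p.strip()]
    return sep.join(parts) or None
```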
comment1 |
|
Missing -- need to preserve; not sure of the variable name
I agree that this information is important, but the four denormalized columns unpredictably mix locality description with habitat description and specimen description. It is crucially important, both for trait data mining and for detecting cultivated specimens, that the specimen description be preserved. But I'm at a loss as to what Aaron should do here. Bob, can you suggest an algorithm whereby Aaron could separate the locality description from the specimen description?
Hopelessly inconsistent. I would just place all of habitat and comments1-4 in a general notes field |
OK Aaron, dump everything in these fields to another field, and I will modify my script to check that field for the word cultivated but nothing else. Is there another DwC field we can use other than occurrenceRemarks (we need to keep locality descriptions out of that field). |
These are concatenated in occurrenceRemarks |
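Aaron reports that comment1 through comment4 are concatenated into occurrenceRemarks, and Brad plans to scan that field for the word "cultivated". A sketch of both steps follows; the separator, the whole-word, case-insensitive regular expression and the function names are assumptions, not Brad's actual script.

```python
import re
from typing import Iterable, Optional

# Whole-word match, so e.g. "uncultivated" is not flagged.
CULTIVATED_WORD = re.compile(r"\bcultivated\b", re.IGNORECASE)


def occurrence_remarks(comments: Iterable[Optional[str]],
                       sep: str = "; ") -> Optional[str]:
    """Concatenate the non-blank comment columns in order."""
    parts = [c.strip() for c in comments if c is not None and c.strip()]
    return sep.join(parts) or None


def mentions_cultivated(remarks: Optional[str]) -> bool:
    """True if the remarks contain the word 'cultivated'."""
    return remarks is not None and CULTIVATED_WORD.search(remarks) is not None
```

A plain keyword match also flags phrases such as "not cultivated", so hits may still need review.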
comment2 |
|
Missing -- need to preserve; not sure of the variable name
|
|
|
|
comment3 |
|
Missing -- need to preserve; not sure of the variable name
|
|
|
|
comment4 |
|
Missing -- need to preserve; not sure of the variable name
|
|
|
|
loanto |
OMITTED |
OK |
|
|
|
|
inorout |
OMITTED |
OK |
|
|
|
|
sheetno |
OMITTED |
OK |
|
|
|
|
cultivated |
cultivated_bien |
Translation seems inconsistent. Where do the occasional zeros come from?
Aaron, please see my validation for source "TEX". I suggested we add the column `cultivatedVerbatim` to VegCore and VegBIEN, and display this column in all specimen validation views. As with TEX, the contents of UNCC's column `cultivated` should be dumped to `cultivatedVerbatim`. This means that for the validation extract you display, cultivatedVerbatim would be blank for all records except Accession Numbers 39691 and 19561. It would be helpful to do some minimal translation. Bob, do the meanings of the codes "X" and "Y" differ in your field `CULTIVATED`? `cultivated_bien` contains the results of post-import validation by BIEN; it combines parsing of the specimen description with any cultivated flags supplied by the data provider. Aaron, could you remind me why some values are blank (NULL) whereas others are zero? And why isn't Accession Number 19561 flagged with a 1?
|
Aaron, if this column contains a non-null value, please put text "cultivated" in column `cultivated_verbatim`, unless you have already populated `cultivated_verbatim` due to non-null value in column `Campus`. |
Which of the cultivated values (ACKMNSWXYy) correspond to true?
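The precedence Brad describes for `cultivated_verbatim` (the Campus text wins; otherwise any non-null `cultivated` code becomes "cultivated") can be sketched as follows. Treating every non-null code as true is an assumption until the question about the codes ACKMNSWXYy is answered; the function name is hypothetical.

```python
from typing import Optional


def cultivated_verbatim(campus: Optional[str], cultivated: Optional[str]) -> Optional[str]:
    """Hypothetical derivation of cultivated_verbatim from two UNCC columns."""
    if campus is not None and campus.strip():
        return "cultivated, collected on campus"
    if cultivated is not None and cultivated.strip():
        # Any code counts for now; which codes mean "true" is still open.
        return "cultivated"
    return None
```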
filler |
OMITTED |
OK |
|
|
|
|
|
decimalLatitude |
NA |
|
|
|
|
|
decimalLongitude |
NA |
|
|
|
|
|
coordinateUncertaintyInMeters |
NA |
|
|
|
|
|
elevationInMeters |
NA |
|
|
|
|
|
identifiedBy |
NA |
|
|
|
|
|
dateIdentified |
NA |
|
|
|
|
|
identificationRemarks |
NA |
|
|
|
|
|
identificationRemarks |
Header error. This is familyName_matched. |
??? I don't understand, Bob. Looks correct to me. All values for this column are blank in the extract I am looking at.
Looks OK now. Previously the header was showing as a duplicate of identificationRemarks, but perhaps it was just Excel misbehaving.
|
FIXED |
|
taxonName_matched |
OK (should families be excluded here? I noticed a case where the cultivar name was dropped upon matching)
Aaron's formulation is correct. taxonMatched contains the lowest taxon name matched, minus the authority. The TNRS doesn't process cultivar names, so they are dumped to `unmatchedTerms` (not displayed in specimen validation extracts) |
ok |
|
|
|
scientificNameAuthorship_matched |
OK |
|
|
|
|
|
higherPlantGroup_bien |
|
|
|
|
|
|
family |
(I noticed a case where familyName_matched was blank and a genus name was captured from family_verbatim; this suggests an error)
Yes, I see that too. Accession number 39691. Genus "ELEAGNUS" was incorrectly placed in field `FAMILY` in the original data. I'm pretty sure there's no way the TNRS would have returned "Eleagnus" as a family. Aaron, is this a parsing error? |
! |
|
Do we want to pass genera in the family column to the TNRS? They are currently filtered out because they are not in the valid families list.
|
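The filtering Aaron describes (values in `family` are sent to the TNRS only if they appear in a list of valid families) could look like the sketch below. The contents and casing of the valid families list, the normalization step and the function name are assumptions; the example shows why "ELEAGNUS", a genus entered in `FAMILY`, never reaches the TNRS as a family.

```python
from typing import Optional, Set


def family_for_tnrs(family_verbatim: Optional[str],
                    valid_families: Set[str]) -> Optional[str]:
    """Return the family name if it is in the valid list, else None.

    Input is normalized to initial capital only (e.g. "FAGACEAE" ->
    "Fagaceae") before lookup; this normalization is an assumption.
    """
    if family_verbatim is None:
        return None
    name = family_verbatim.strip().capitalize()
    return name if name in valid_families else None


# Hypothetical two-entry list, for illustration only.
VALID = {"Fagaceae", "Elaeagnaceae"}
print(family_for_tnrs("FAGACEAE", VALID))  # Fagaceae
print(family_for_tnrs("ELEAGNUS", VALID))  # None
```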
genus |
OK |
|
|
|
|
|
taxonName |
OK |
|
|
|
|
|
scientificNameAuthorship |
OK |
|
|
|
|
|
growthForm |
NA?? |
Aaron: omit this from specimen validations. |
|
|
FIXED |
|
threatened_bien |
source? |
Aaron: omit this from specimen validations. Sorry, I should have told you that earlier. Bob: not yet implemented, but we will use IUCN.
|
|
FIXED |
|
cultivatedBasis_bien |
NA?? |
Bob: for specimens flagged as cultivated (isCultivatedBien=1) this will contain a standard bit of text describing the reason it was flagged as cultivated (e.g., "Key words found in specimens description", etc.) |
|
|
|