boulder onestop presentation

netCDF to ISO XML workflow Thomas Jaensch, Silver Spring

Upload: thomas-jaensch

Post on 16-Jan-2017


TRANSCRIPT

Page 1: Boulder OneStop presentation

netCDF to ISO XML workflow

Thomas Jaensch, Silver Spring

Page 2: Boulder OneStop presentation

Steps from netCDF to ISO XML

1. Run ncdump -x to extract the metadata from the netCDF file and output XML (NcML) instead of CDL

2. Append additional data not included in the NcML output to the .ncml file so it can be included in the ISO metadata later on, such as file name, file size, path to the data files on the network/WAF, browse graphic info, etc.

3. Run an XSLT transform to write the data extracted from the .ncml file to an ISO XML file

4. Run additional XSLT transform(s) to add further information, e.g. collection-level keywords, to the granule metadata
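The steps above can be sketched in Go roughly as follows. This is a minimal illustration and not the nc2iso script itself: the helper names (ncdumpCmd, appendAttributes, xsltCmd), the attribute names file_name and file_size, and the stylesheet name ncml2iso.xsl are all assumptions made for the example. The sketch only builds the ncdump and xsltproc commands rather than running them, and it assumes the NcML root element closes with a plain </netcdf> tag.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os/exec"
	"strings"
)

// attribute is a hypothetical name/value pair to add to the NcML.
type attribute struct{ Name, Value string }

// ncdumpCmd builds step 1: "ncdump -x" prints the file's metadata as NcML.
func ncdumpCmd(ncPath string) *exec.Cmd {
	return exec.Command("ncdump", "-x", ncPath)
}

// appendAttributes covers step 2: it inserts extra <attribute> elements
// just before the closing </netcdf> tag of an NcML document.
func appendAttributes(ncml string, extras []attribute) (string, error) {
	idx := strings.LastIndex(ncml, "</netcdf>")
	if idx < 0 {
		return "", fmt.Errorf("no closing </netcdf> tag found")
	}
	var b strings.Builder
	b.WriteString(ncml[:idx])
	for _, a := range extras {
		b.WriteString(`  <attribute name="`)
		xml.EscapeText(&b, []byte(a.Name)) // escape &, <, >, quotes
		b.WriteString(`" value="`)
		xml.EscapeText(&b, []byte(a.Value))
		b.WriteString("\"/>\n")
	}
	b.WriteString(ncml[idx:])
	return b.String(), nil
}

// xsltCmd builds steps 3 and 4: xsltproc applies a stylesheet to an
// input file and writes the result to outPath.
func xsltCmd(stylesheet, inPath, outPath string) *exec.Cmd {
	return exec.Command("xsltproc", "-o", outPath, stylesheet, inPath)
}

func main() {
	// Stand-in for the output of "ncdump -x demo.nc".
	ncml := "<netcdf>\n  <attribute name=\"title\" value=\"demo\"/>\n</netcdf>\n"

	out, err := appendAttributes(ncml, []attribute{
		{"file_name", "demo.nc"},
		{"file_size", "1024"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)

	// Show the external commands the workflow would run.
	fmt.Println(ncdumpCmd("demo.nc").Args)
	fmt.Println(xsltCmd("ncml2iso.xsl", "demo.ncml", "demo.xml").Args)
}
```

To actually run a step, call Output() or Run() on the returned command; both require ncdump and xsltproc to be installed and on the PATH.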

Page 3: Boulder OneStop presentation

Bash scripts to create ISO XML from netCDF

Main nc2iso script that performs the steps described in the previous slide

Page 4: Boulder OneStop presentation

Bash scripts to create ISO XML from netCDF

Helper script that runs the main nc2iso script

Page 5: Boulder OneStop presentation

Not horrible but...

- Sequential code with a tendency to become cryptic and messy in no time

- Not easy to understand someone else’s code

- Not easy to scale and extend

- Hard to test

Page 6: Boulder OneStop presentation

Hello Go!

Page 7: Boulder OneStop presentation

Why Go?

- It’s fast

- Scales well

- Automated code formatting

- Compiled binaries run anywhere

- Great standard library

- Testing framework baked into the language

- Concurrency baked into the language

- Well documented

- Go playground to test stuff in the browser while developing

- Makes it easy to write well documented, easy-to-reason-about code

Page 8: Boulder OneStop presentation

Code organization

- Code is organized into loosely coupled functions that can be placed wherever it makes sense, since Go is a compiled language and the compiler doesn’t care about declaration order (for the most part)

- Easy to extend functionality by adding/removing functions that do specific things without breaking the whole thing

Page 9: Boulder OneStop presentation

Concurrency

- Run multiple processes at the same time and cut down on overall runtime

- Would (theoretically) scale indefinitely if there weren’t hardware limitations
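As an illustration of the idea, here is a minimal worker-pool sketch using goroutines, Go's lightweight concurrent tasks (these are not separate operating system processes). The function name processAll, the worker count, and the placeholder work function are assumptions for the example; in a real workflow the work function would run the netCDF to ISO XML steps for one file.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// processAll runs work on every file using at most `workers` goroutines
// and returns the results in the same order as the input.
func processAll(files []string, workers int, work func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(files))
	jobs := make(chan int)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so no two
			// goroutines ever write to the same results slot.
			for i := range jobs {
				results[i] = work(files[i])
			}
		}()
	}

	for i := range files {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	files := []string{"a.nc", "b.nc", "c.nc", "d.nc"}

	// Placeholder work: derive an output name instead of converting.
	convert := func(f string) string {
		return strings.TrimSuffix(f, ".nc") + ".xml"
	}

	for _, r := range processAll(files, 2, convert) {
		fmt.Println(r)
	}
}
```

Capping the number of workers keeps the batch within the hardware limits mentioned above, for example the number of CPU cores or what the disk can sustain.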

Page 10: Boulder OneStop presentation

Testing

Package testing provides support for automated testing of Go packages.
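A small table-driven test gives a feel for how this looks in practice. The function isoName is a hypothetical helper invented for this example; it is not part of the workflow described in these slides. To keep the sketch in a single listing, the function and its test appear together; in a real project the Test function would live in a file whose name ends in _test.go and be run with "go test".

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"testing"
)

// isoName is a hypothetical helper: it maps the path of a netCDF file
// to the file name of the ISO XML record generated for it.
func isoName(ncPath string) string {
	base := filepath.Base(ncPath)
	return strings.TrimSuffix(base, filepath.Ext(base)) + ".xml"
}

// TestISOName is a table-driven test: each case pairs an input with the
// expected output, and t.Errorf reports every mismatch.
func TestISOName(t *testing.T) {
	cases := []struct{ in, want string }{
		{"woa13.nc", "woa13.xml"},
		{"/data/cman/buoy.nc", "buoy.xml"},
		{"noext", "noext.xml"},
	}
	for _, c := range cases {
		if got := isoName(c.in); got != c.want {
			t.Errorf("isoName(%q) = %q, want %q", c.in, got, c.want)
		}
	}
}

func main() {
	fmt.Println(isoName("/data/cman/buoy.nc"))
}
```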

Page 11: Boulder OneStop presentation

OneStop Datasets I’m working on in Silver Spring

Granule metadata processes are in place for the following datasets I’m working on

- World Ocean Atlas 2013 (about 700 file level granules), batch-processing time for all files about 5 minutes (after editing XSLT or adding additional info to NcML)

- C-MAN (NDBC Coastal-Marine Automated Network and moored weather buoys), about 10000 file level granules and counting, batch-processing time for all files about 10 minutes

- CO-OPS (Center for Operational Oceanographic Products and Services), about 10000 file level granules and counting, batch-processing time for all files about 10 minutes

- Quality-Controlled Underway Oceanographic and Meteorological Data from the Center for Ocean-Atmospheric Predictions Center (COAPS) - Shipboard Automated Meteorological and Oceanographic System (SAMOS), about 110000 file level granules, batch-processing time for all files about 10 hours

Page 12: Boulder OneStop presentation

XML Linkchecker cmd tool

WHAT DOES IT DO?

Checks all http:// and https:// links in the XML files in the directory the tool is run in, reports the server responses received for the checked links, and logs the failed responses (everything other than "200 OK") to a linkchecker_bad_links_log file in the current working directory

PREREQUISITES

The program uses Bash and cURL commands under the hood and will not work if Bash and/or cURL are not installed in the environment it's run in

HOW TO RUN IT?

- Drop the binary suitable for your system (Mac or Linux) into the folder with the XML files you want to check for broken links

- Open a shell (PuTTY, Mac terminal, whatever) and navigate to the folder where you dropped the linkchecker binary

- Run "chmod +x linkchecker" to make the binary executable

- Run "./linkchecker" and if it works you should be able to see what it's doing in your shell and end up with a linkchecker_bad_links_log file in the current working directory after it's done
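The core idea of the link check can be sketched in a few lines of Go. Note that this sketch uses Go's net/http package rather than the Bash and cURL calls the actual tool relies on, so it illustrates the technique and does not reproduce the tool. The function names, the regular expression, and the example.com URLs are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"time"
)

// linkPattern matches http:// and https:// URLs up to the next
// whitespace, quote, or angle bracket. It is a simplification: entities
// such as &amp; inside a URL are not decoded.
var linkPattern = regexp.MustCompile(`https?://[^\s"'<>]+`)

// extractLinks returns the unique links found in text, in first-seen order.
func extractLinks(text string) []string {
	seen := make(map[string]bool)
	var links []string
	for _, l := range linkPattern.FindAllString(text, -1) {
		if !seen[l] {
			seen[l] = true
			links = append(links, l)
		}
	}
	return links
}

// checkLink requests the URL and returns the response status line,
// e.g. "200 OK", or the error text if the request itself failed.
func checkLink(client *http.Client, url string) string {
	resp, err := client.Get(url)
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	return resp.Status
}

func main() {
	// Stand-in for the contents of one XML file.
	xmlText := `<gmd:URL>https://www.example.com/data</gmd:URL>`

	client := &http.Client{Timeout: 5 * time.Second}
	for _, link := range extractLinks(xmlText) {
		status := checkLink(client, link)
		if status != "200 OK" {
			fmt.Println("BAD", link, status) // would go to the bad-links log
		} else {
			fmt.Println("OK ", link)
		}
	}
}
```

One behavioral difference to keep in mind: client.Get follows redirects by default, so a redirected link reports the status of its final destination rather than the redirect itself.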

Page 13: Boulder OneStop presentation

Etc.

Tools I use at NOAA

Bash, Git, Go, Linux, Mac, Oxygen, PuTTY, Sublime Text, xsltproc, XML, XPath, XSLT, Windows

Goals

Further improve my workflows and programs, collaborate more and learn about other developers’ workflows

Wishes

Shared GitHub account for better code collaboration