This vignette is based on readJDX version 0.6.4.
This vignette describes a taxonomy of JCAMP-DX files from the point of view of the readJDX package. It is based on reading and re-reading the cited references, as well as on inspection of files found in the wild and files shared with the author. No single publication includes all the needed information, as the standard has evolved over time. This viewpoint may not perfectly reflect the actual standards and may include errors. Nevertheless, this is how the author currently views the structure of the files, especially the aspects that determine how the files should be processed. An overview of the program flow is included as well.
At the most basic level, JCAMP-DX files are either simple or compound. Compound files contain more than one spectrum or type of information (McDonald and Wilks 1988, sec. 3.3.2). readJDX does not process compound files. However, a utility function, splitMultiblockDX, is provided that splits such files into separate files, which can then be processed with readJDX.
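The splitting idea can be sketched in a few lines of Python. This is not taken from splitMultiblockDX, because readJDX is written in R. It is a minimal illustration that assumes each block opens with ##TITLE= and closes with ##END=, and that an outer link block wraps the data blocks. The function name split_blocks is hypothetical.

```python
def split_blocks(lines):
    """Return the nested data blocks of a compound JCAMP-DX file as lists of lines.

    Hypothetical helper. Assumes every block opens with ##TITLE= and closes with
    ##END=, and that the outermost (link) block wraps the data blocks. Labels are
    matched literally; real files may vary in case and spacing.
    """
    blocks = []
    depth = 0      # 1 = inside the link block, 2 = inside a data block
    current = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("##TITLE="):
            depth += 1
            if depth == 2:
                current = []  # a new data block starts here
        if depth >= 2:
            current.append(line)
        # "##END NTUPLES=" does not match "##END=", so nested NTUPLES are safe
        if stripped.startswith("##END="):
            if depth == 2:
                blocks.append(current)
            depth -= 1
    return blocks
```

Each returned list could then be written to its own file and read as a simple file.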
Simple files contain the data for one experiment (McDonald and Wilks 1988, sec. 3.3.1). This experiment could be a single spectrum. However, it can also be a more complex experiment, in which case the NTUPLES tag is used. Examples of more complex experiments are NMR spectra in which both the real and imaginary points are reported, 2D NMR spectra, and LC-MS or GC-MS results.
A single spectrum could be any basic spectroscopic experiment: IR, processed NMR, UV-Vis, MS, Raman, ESR, CD, etc.
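The (X++(Y..Y)) format, used with the ##XYDATA= label, assumes equal spacing along the x-axis. The notation means that each line begins with an x-value, which is followed by as many y-values as will fit on the line (McDonald and Wilks 1988, sec. 5.1.1 & 6.4.1). Several checks are built into this format. For example, the x-value at the start of each line can be compared with the value expected from the x-axis parameters (such as ##FIRSTX and ##DELTAX). In DIF-compressed data, the last y-value of each line is also repeated at the start of the next line as a y-value check.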
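The Python sketch below makes the line layout and the x-check concrete. It assumes uncompressed AFFN numbers separated by spaces, XFACTOR and YFACTOR equal to 1, and known FIRSTX and DELTAX values. It does not handle the compressed ASDF forms (SQZ, DIF, DUP), so it cannot perform the DIF y-value check. parse_xydata_affn is a hypothetical name, not a readJDX function.

```python
def parse_xydata_affn(lines, firstx, deltax, rel_tol=1e-6):
    """Expand uncompressed AFFN (X++(Y..Y)) lines into a list of (x, y) tuples.

    Hypothetical helper. Assumes space-separated numbers, XFACTOR = YFACTOR = 1,
    and equally spaced x-values starting at firstx with step deltax.
    """
    points = []
    index = 0  # number of y-values read so far; determines each point's x-value
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        expected_x = firstx + index * deltax
        line_x = float(fields[0])
        # x-check: the leading x-value must agree with the running point count
        if abs(line_x - expected_x) > rel_tol * max(1.0, abs(expected_x)):
            raise ValueError(f"x-check failed: line starts at {line_x}, expected {expected_x}")
        for y_text in fields[1:]:
            points.append((firstx + index * deltax, float(y_text)))
            index += 1
    return points
```

For example, parse_xydata_affn(["100 1 2 3", "103 4 5"], 100.0, 1.0) returns five points with x running from 100.0 to 104.0. If the second line started with 105, a ValueError would be raised. The value of rel_tol is an arbitrary choice. Files that round the leading x-values would need a looser tolerance.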
The (XY..XY) format used with the ##XYPOINTS= label assumes arbitrary spacing along the x-axis, but the data is essentially continuous in nature (McDonald and Wilks 1988, sec. 6.4.2). This variable list is handled the same way as the next one, since both come down to processing a series of x,y points.
The (XY..XY) format used with the ##PEAK TABLE= label also assumes arbitrary spacing along the x-axis. It implies that some curation of the data has occurred, e.g. only peaks above a threshold are reported (McDonald and Wilks 1988, sec. 6.4.3). The data is fundamentally discontinuous. This variable list is handled the same way as the previous one, since both come down to processing a series of x,y points.
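Both variable lists reduce to a series of x,y pairs, so one parser can serve both. The following Python sketch assumes AFFN data written as x,y pairs separated by whitespace or semicolons (for example 450.0, 10; 460.5,20), with optional trailing $$ comments. It handles only (XY..XY) pairs. parse_xy_pairs is a hypothetical name.

```python
import re

# One AFFN number: optional sign, digits with an optional decimal point, optional exponent.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# One x,y pair; whitespace may appear on either side of the comma.
_PAIR = re.compile(rf"({_NUM})\s*,\s*({_NUM})")


def parse_xy_pairs(lines):
    """Return the (x, y) pairs found in AFFN (XY..XY) data lines, in order.

    Hypothetical helper; works the same for XYPOINTS and PEAK TABLE data.
    """
    points = []
    for line in lines:
        text = line.split("$$", 1)[0]  # drop any trailing JCAMP-DX comment
        points.extend((float(x), float(y)) for x, y in _PAIR.findall(text))
    return points
```

Because the regular expression looks only for comma-joined pairs, the exact separators between pairs do not matter. For the same reason, the sketch does not detect malformed lines.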
A processed 1D NMR spectrum composed of the real data points would be reported in one of the formats for simple files. NTUPLES is used when both the real and imaginary data are reported (Davies and Lampen 1993, sec. 7).
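Once the real and imaginary pages have been read, combining them is a point-by-point pairing. This sketch assumes each page is an ordinary list of y-values on a shared x-axis. Here is a minimal Python illustration; the function name is hypothetical.

```python
def combine_real_imaginary(real, imag):
    """Pair real and imaginary ordinates into complex values, point by point.

    Hypothetical helper. Assumes both pages share the same x-axis and length.
    """
    if len(real) != len(imag):
        raise ValueError("real and imaginary pages have different lengths")
    return [complex(r, i) for r, i in zip(real, imag)]


spectrum = combine_real_imaginary([3.0, 0.0], [4.0, 1.0])
magnitude = [abs(z) for z in spectrum]  # magnitude of each complex point
```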
The author is unaware of official documentation of the format used for 2D NMR data, though unofficial documents can be found on the web. The format is analogous to the spectral series used for LC-MS data, among others:
##NTUPLES= nD NMR SPECTRUM
.
.
.
##PAGE= F1= x
##FIRST= x, y, z
##DATA TABLE= (F2++(Y..Y)), PROFILE
[followed by x, y1, y2, y3... data in AFFN or ASDF]
.
.
.
##PAGE= F1= x
##FIRST= x, y, z
##DATA TABLE= (F2++(Y..Y)), PROFILE
[followed by x, y1, y2, y3... data in AFFN or ASDF]
.
.
.
##PAGE= F1= x
##FIRST= x, y, z
##DATA TABLE= (F2++(Y..Y)), PROFILE
[followed by x, y1, y2, y3... data in AFFN or ASDF]
##END NTUPLES= nD NMR SPECTRUM
##END=
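The paging structure can be illustrated with a short Python sketch. It records the F1 value of each page and collects that page's y-values into a row, so the list of rows forms the 2D intensity matrix. It assumes ##PAGE= lines of the form ##PAGE= F1= value and uncompressed AFFN data lines (no ASDF). It skips the F2 abscissa on each line without checking it. nmr2d_pages_to_matrix is a hypothetical name.

```python
def nmr2d_pages_to_matrix(lines):
    """Collect a 2D NMR NTUPLES block into F1 values and rows of intensities.

    Hypothetical helper. Assumes pages open with "##PAGE= F1= <value>" and that
    data lines are uncompressed AFFN (F2++(Y..Y)) lines: "f2 y1 y2 ...".
    """
    f1_values = []  # one F1 value per page
    rows = []       # one list of y-values per page
    for line in lines:
        s = line.split("$$", 1)[0].strip()  # drop comments and surrounding space
        if s.startswith("##PAGE="):
            f1_values.append(float(s.split("=")[-1]))  # "##PAGE= F1= 4.0" -> 4.0
            rows.append([])
        elif not s or s.startswith("##") or not rows:
            continue  # blank lines, other labels, or header lines before page 1
        else:
            fields = s.split()
            # The first field is the F2 abscissa; the rest are intensities.
            rows[-1].extend(float(v) for v in fields[1:])
    return f1_values, rows
```

Stacking the rows gives the intensity matrix, with F1 along one axis and F2 along the other. A fuller implementation would also check each row against ##FIRST= and the F2 axis parameters.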
For LC-MS and GC-MS data, the NTUPLES tag is used to report a page for each time point, with the spectral data in (XY..XY) format (referred to as a “spectral series” in Lampen et al. 1994, sec. 5.3.3). Processing these files requires dealing with the paging structure, but each page can be processed like an (XY..XY) data set.
##NTUPLES= MASS SPECTRUM
.
.
.
##PAGE= T=1
##NPOINTS= x
##DATA TABLE= (XY..XY), PEAKS
[followed by x, y data in AFFN]
.
.
.
##PAGE= T=2
##NPOINTS= x
##DATA TABLE= (XY..XY), PEAKS
[followed by x, y data in AFFN]
.
.
.
##PAGE= T=n
##NPOINTS= x
##DATA TABLE= (XY..XY), PEAKS
[followed by x, y data in AFFN]
##END NTUPLES= MASS SPECTRUM
##END=
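The Python sketch below follows this page-by-page approach. It assumes each page begins with a ##PAGE= line and contains AFFN (XY..XY) data. It keeps the page label as text (for example T=1) and compares the number of pairs found with ##NPOINTS= when that label is present. read_spectral_series is a hypothetical name, not part of readJDX.

```python
import re

# One AFFN number and one comma-joined x,y pair.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
_PAIR = re.compile(rf"({_NUM})\s*,\s*({_NUM})")


def read_spectral_series(lines):
    """Return {page label: [(x, y), ...]} for an NTUPLES spectral series.

    Hypothetical helper. Assumes AFFN (XY..XY) data on each page.
    """
    pages = {}
    npoints = {}
    label = None  # label of the page currently being read
    for line in lines:
        s = line.split("$$", 1)[0].strip()  # drop comments
        if s.startswith("##PAGE="):
            label = s[len("##PAGE="):].strip()
            pages[label] = []
        elif s.startswith("##END NTUPLES="):
            label = None
        elif label is None:
            continue  # header lines before the first page, or after the last
        elif s.startswith("##NPOINTS="):
            npoints[label] = int(s[len("##NPOINTS="):])
        elif s.startswith("##"):
            continue  # other labels, e.g. ##DATA TABLE=
        else:
            pages[label].extend((float(x), float(y)) for x, y in _PAIR.findall(s))
    # Use the declared point counts as a consistency check on each page.
    for page, expected in npoints.items():
        if len(pages[page]) != expected:
            raise ValueError(f"page {page}: expected {expected} points, found {len(pages[page])}")
    return pages
```

Each page comes back as an ordinary list of x,y points, which is why a spectral series can reuse the (XY..XY) logic once the pages are separated.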
readJDX is coded in such a way that it should be easy to add features. Contributions to improve or expand the package, including pull requests, are always welcome! Figure @ref(fig:PF) shows the overall flow of the function calls. Only a couple of these functions are exported, so take a look at the source code for documentation. Be sure to check out the MiniDIFDUP_1 and MiniDIFDUP_2 vignettes for additional information about the JCAMP-DX file structure and how readJDX functions extract the data.
Professor Emeritus of Chemistry & Biochemistry, DePauw University, Greencastle, IN, USA.