Sean C. Crosby

Sampling for Maximum Dissimilarity

Creating a subsample of data that covers your N-dimensional space

Sampling

Data can be sampled in many ways, depending on the goal. Most often, uniform random sampling is used to collect a small subset for preliminary analysis, which reduces computation and provides rapid insight. There are, however, other reasons to sample data, such as creating a representative sample set that covers the data range. A particular example arises in the earth sciences when the goal is to model weather or ocean conditions across the range of possible forcing.
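To make the idea of a sample that "covers the data range" concrete, here is a minimal sketch of one common greedy approach (farthest-point selection). The excerpt does not say which algorithm the post uses, so treat this as an assumption; the function name `max_dissimilarity_sample` and its parameters are hypothetical. It also assumes every dimension has already been scaled to a comparable range, since plain Euclidean distance is used as the dissimilarity measure.

```python
import math


def max_dissimilarity_sample(points, n_samples, seed_index=0):
    """Greedy farthest-point selection (an assumed, illustrative method).

    points: sequence of equal-length numeric tuples (one per observation).
    Returns the indices of the selected points, in selection order.
    """
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and len(points)")

    selected = [seed_index]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[seed_index]) for p in points]

    while len(selected) < n_samples:
        # Pick the point that is farthest from everything selected so far.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d

    return selected
```

Unlike uniform random sampling, each pick here depends on the points already chosen, so sparse corners of the space are reached early. The cost is roughly the number of points times the number of samples, which is usually acceptable for a few hundred samples.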

Making wind forecast corrections with ML

Using multi-layer perceptrons to create non-linear forecast corrections

The idea

Recent skill improvements in meteorological and ocean forecasts have come from incorporating observations into model predictions. Numerical model development, by means of improved physical representations, is in many cases suffering from diminishing returns. While ensemble approaches (many models) are useful and provide uncertainty estimates, the assimilation of ever-increasing amounts of data into models is needed. The folks over at Google’s DeepMind more recently showed that deep neural networks could be used to accurately predict short-term (up to 90 minutes) precipitation events using prior radar observations (Nature article).

Can we visualize decision space for different classifiers?

Visualizing decision space from simple to complex classifiers

Support Vector Machines (SVMs) are seemingly derived from an intuitive concept: drawing a decision boundary with the widest margin (aka gutter, street, etc.). While this only really applies to the linear problem in which the data are indeed separable, I find it particularly helpful in visualizing the decision space of the model. Unlike with a random forest or multi-layer neural network, it is easy to picture the model space. While not a novel idea in the slightest, this provoked me to consider the decision space of several classification algorithms, hopefully to gain insight into other techniques and, with this insight, select appropriate methods for future questions.

Dominant wind field patterns from complex empirical orthogonal functions

Using complex EOFs (aka PCA) to find common wind patterns

Today I was wondering whether I could make some sense of a rather large 60-year hindcast of winds in Washington around the Salish Sea. Could I see what the typical wind patterns in the region are? How can I manage this with a rather hefty 1+ GB set of spatial wind predictions? This is a good opportunity to explore empirical orthogonal functions (EOFs), also known as principal component analysis (PCA).
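As a rough illustration of what a complex EOF analysis involves, the sketch below combines the two wind components into one complex field, removes the time mean, and takes a singular value decomposition. It is a minimal example under stated assumptions, not the post's actual code: the function name `complex_eofs`, the `(n_times, n_points)` array layout, and the choice of NumPy's SVD are all assumptions, and it presumes the data fit in memory.

```python
import numpy as np


def complex_eofs(u, v, n_modes=3):
    """Complex EOFs of a wind field (illustrative sketch).

    u, v: real arrays of shape (n_times, n_points) holding the eastward
    and northward wind components (this layout is an assumption).
    Returns (patterns, pcs, variance_fraction) for the leading modes.
    """
    # Represent each wind vector as a single complex number u + i*v.
    w = np.asarray(u) + 1j * np.asarray(v)

    # Remove the time mean at every grid point to get anomalies.
    anomalies = w - w.mean(axis=0)

    # SVD: anomalies = left @ diag(singular_values) @ patterns
    left, singular_values, patterns = np.linalg.svd(
        anomalies, full_matrices=False
    )

    # Fraction of total variance explained by each mode.
    variance_fraction = singular_values**2 / np.sum(singular_values**2)

    # Time series (principal components) scaled by the singular values,
    # so that pcs @ patterns reconstructs the anomalies.
    pcs = left[:, :n_modes] * singular_values[:n_modes]

    return patterns[:n_modes], pcs, variance_fraction[:n_modes]
```

Each row of `patterns` is a complex spatial map whose magnitude and angle give the relative speed and direction of the wind pattern at each point, while the matching column of `pcs` describes how that pattern strengthens and rotates over time.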