In space, finding the facts we don’t know we know

By Moriba Jah | January 2021

Back in 2002, then-Defense Secretary Donald Rumsfeld famously waxed bureaucratic at a Defense Department briefing: “As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say, we know there are some things we do not know. But there are also unknown unknowns. The ones we don’t know we don’t know.” Well, beyond a funny sound bite, Rumsfeld missed a major category: the Unknown Knowns, which in his phraseology would be “things we do not know that we know.”

To know something, you have to measure it. So, read the first word in each pair as what you are aware of and the last word as what you have measured. A known known is something you are aware of that has been measured. A known unknown is something you are aware of that has not been measured. An unknown unknown cannot be known, by definition, because you are unaware of it and have not measured it. Again, anything that is not measured cannot be known. This leaves us with the unknown knowns: things we’ve measured without realizing it. Unveiling these hidden knowns amounts to the holy grail of big data science and analytics. Finding them requires fusing data from multiple sources to create and exploit what data scientists call mutual information, meaning knowledge that can be divined only by combining information housed in discrete data sets, thus bringing to our awareness things that we may have unknowingly measured. View this as mapping from the unknown knowns to the known knowns.

Let’s take a brief step back and underscore the fact that data exists everywhere in the universe. For example, we’re in an environment saturated by radio and other signals. Just because we are not aware of them doesn’t mean they’re not there. We don’t care about all data. There are specific things we wish to know, and what determines whether the data in our environment is relevant to them is the question we ask of it. Once we pose a question, we can quantify how much information that data contains about the thing we wish to know. It may indeed be zero.

This is where big data has a role. Let’s assume that the information content is zero for what we wish to know in any specific source of data. However, by creating a big data problem, that is, by aggregating massive quantities of data from disparate sources, we create an opportunity to discover something that is measured only in the mutual information of this multiple-source data set. For example, I may have lots of data about solar flux activity, a separate set of data on satellite locations in multiple orbital regions, a separate set of data on hardware that some of these satellites may be equipped with, and finally a separate set of data on satellite failures or anomalies. Having aggregated and curated this multisource data set, I might ask, “Is there a causal relationship between space environment phenomena and satellite hardware loss, disruption or degradation?” No single source of these data can answer this question because the answer is contained only in the mutual information content of this multisource data set. Linking these disparate data sets transforms an unknown unknown to an unknown known. Asking a relevant question of this mutual information found in the multisource data set enables me to transform this unknown known to a known known.
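A toy calculation can make this concrete. The sketch below is an illustration under invented assumptions, not real space data: two binary sources, hypothetically named `a` and `b`, and an outcome `y` that is deliberately constructed to be 1 only when the two sources disagree. The helper `mutual_information` is an illustrative name, and it estimates the standard information-theoretic quantity in bits from observed pairs.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Mutual information I(X;Y) in bits, estimated from (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical toy records: every combination of two binary sources, equally likely.
# The outcome is artificial by design: it is 1 only when the sources disagree (XOR).
records = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]

mi_a = mutual_information([(a, y) for a, b, y in records])
mi_b = mutual_information([(b, y) for a, b, y in records])
mi_fused = mutual_information([((a, b), y) for a, b, y in records])

print(f"source a alone: {mi_a:.3f} bits")
print(f"source b alone: {mi_b:.3f} bits")
print(f"a and b fused:  {mi_fused:.3f} bits")
```

In this constructed case each source on its own carries zero bits about the outcome, while the fused pair carries one full bit. Real data sets will rarely be this clean, but the pattern is the one described above: the answer lives only in the combination.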

To realistically create an exploitable mutual information landscape, I need to perform data engineering, modeling and curation. In essence, I need to develop and maintain a digital library along with a data dictionary that describes these data, defines their meaning, orients them in their proper scales and frames of reference, and makes them semantically and even scientifically consistent so that they can be meaningfully queried. A user should be empowered to query this aggregated data and receive knowledge as a consequence. The goal must be successful decision intelligence, which is the ability to understand, use and manage information in such a way that leads to desired outcomes.
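As a minimal sketch of what a data dictionary buys you, consider two made-up sources that report the same quantity in different units. Everything here is hypothetical and chosen only for illustration: the `FieldSpec` structure, the field names, the units, the frames and the satellite identifier. The point is simply that a shared definition lets records from different sources be compared on equal terms before anyone queries them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    description: str
    unit: str   # canonical unit every source is converted to
    frame: str  # reference frame or time scale the field is expressed in


# Hypothetical data dictionary: names, units and frames are illustrative only.
DATA_DICTIONARY = {
    "altitude": FieldSpec("Height above Earth's surface", "km", "geodetic"),
    "epoch": FieldSpec("Time of observation", "ISO 8601", "UTC"),
}

# How many of each source unit make up one canonical unit of the field.
UNITS_PER_CANONICAL = {("altitude", "km"): 1.0, ("altitude", "m"): 1000.0}


def normalize(field, value, unit):
    """Convert a value into the unit the data dictionary declares for the field."""
    spec = DATA_DICTIONARY[field]
    return value / UNITS_PER_CANONICAL[(field, unit)], spec.unit


# Two made-up sources describing the same satellite in different units.
source_a = {"sat_id": "SAT-1", "altitude": (550.0, "km")}
source_b = {"sat_id": "SAT-1", "altitude": (550_000.0, "m")}

alt_a, unit_a = normalize("altitude", *source_a["altitude"])
alt_b, unit_b = normalize("altitude", *source_b["altitude"])
print(alt_a, unit_a, alt_b, unit_b)
```

A production library would also need to handle frame transformations, time scales and provenance, none of which this sketch attempts.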

We don’t have that capability, at least in the U.S. space community, because of a misperception. Most people mistake having lots of data to curate and manage for having a “big data” problem. The real big data problem is the challenge of fusing lots of data from disparate sources.

Once a big data process is established for the space domain, satellite operators and legislators will have the knowledge required to meet the plethora of safety, security and sustainability demands placed on space activities. Unknown unknowns will be turned into unknown knowns through data aggregation and fusion, and then into known knowns. Until then, I shall continue to be a decision intelligence evangelist.

About Moriba Jah

Moriba Jah is an astrodynamicist, space environmentalist and associate professor of aerospace engineering and engineering mechanics at the University of Texas at Austin. An AIAA fellow and MacArthur fellow, he’s also chief scientist of startup Privateer Space.