The increase in data from sensors and edge devices has exposed existing trust frameworks as obsolete. Our research focuses on open, low-level protocol primitives that can support trust operations at scale. Our aim is to catalyze the development of domain-specific applications that can guarantee the integrity of observations at the physical-digital border.

Data Fraud and Fumbling

Scientific progress rests upon trust in observations. But traditional methods of maintaining trust through institutional supervision are failing at scale. Researchers face increasing challenges from both data fumbling and data fraud in a context of extremely complex data pipelines and untraceable AI operations.

  • Non-conforming or untrustworthy timestamps interfere with aligning observations in time.
  • Human error cannot be traced back to its original source.
  • Sophisticated AI-assisted data fraud causes epistemic pollution.

Upstream / Downstream

Where data is very valuable — for instance, in clinical research trials for new drugs, or in managing medical images — human experts define data trust workflows and manage recovery and forensics. But in many domains, including data from IoT sensors, data assurance processes are complex, confusing, and difficult to update and implement. Everybody does their own thing, to the extent they can afford it.

Our research asks: what is the minimal amount of additional data that can be added to a record at the moment of its collection to securely tie it to the time the observation was made and to the instrument that recorded it? If we can define a protocol that does this work at minimal computational cost, it becomes easier to automate trust assurance higher in the stack.
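As a rough illustration of the idea (our own sketch, not the protocol itself), one way a device could attach a small envelope to each observation is shown below: a hash of the raw record, a capture time, and a signature made with the instrument's private key. The field names and the choice of SHA-256 and Ed25519 are assumptions made for the example.

    # Illustrative sketch only: one possible minimal per-record envelope.
    # Assumes the instrument holds an Ed25519 key; all names are hypothetical.
    import json, time, hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    def seal_observation(raw_record: bytes, device_key: Ed25519PrivateKey) -> dict:
        digest = hashlib.sha256(raw_record).hexdigest()   # commit to the record's content
        captured_at = time.time_ns()                      # capture time, as claimed by the device clock
        payload = f"{digest}|{captured_at}".encode()
        signature = device_key.sign(payload)              # tie content + time to this instrument
        return {
            "sha256": digest,
            "captured_at_ns": captured_at,
            "signature": signature.hex(),
            "public_key": device_key.public_key().public_bytes_raw().hex(),
        }

    key = Ed25519PrivateKey.generate()
    print(json.dumps(seal_observation(b'{"temp_c": 21.4}', key), indent=2))

The envelope adds a few hundred bytes and one signature operation per record, which is the kind of "minimal additional data" and "minimum computational cost" we have in mind.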

Basic Questions

A protocol that could do the job we’ve defined for it would have to answer the following questions:

  • What instrument made the observations?
  • What time series do they belong to?
  • Who controls access to the instrument and time series?
  • When were the observations made?
  • Is the time series complete and uncorrupted?

Cryptographic Commitments

Our approach uses an interlocking set of cryptographic commitments; a sketch of how they might fit together appears after the list.

  • Temporal anchoring
  • Instrument identity
  • Time series sequentiality
  • Immutability
  • Custody
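
To make the interplay concrete, here is a minimal sketch (our own illustration, not the protocol specification) of how those commitments could interlock: each record carries a timestamp, a signature from the instrument's key, and the hash of the previous record, so that reordering, deletion, or tampering breaks the chain. Custody is represented, crudely, by who holds the signing key. All class and field names are hypothetical.

    # Illustrative sketch: a hash-chained, signed time series. Names are hypothetical.
    import hashlib, time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    class SealedSeries:
        def __init__(self, device_key: Ed25519PrivateKey, series_id: str):
            self.key = device_key          # custody: whoever holds this key controls the series
            self.series_id = series_id
            self.prev_hash = hashlib.sha256(series_id.encode()).hexdigest()  # genesis link
            self.records = []

        def append(self, raw_record: bytes) -> dict:
            entry = {
                "series_id": self.series_id,                       # which time series this belongs to
                "prev_hash": self.prev_hash,                       # sequentiality
                "sha256": hashlib.sha256(raw_record).hexdigest(),  # immutability of content
                "captured_at_ns": time.time_ns(),                  # temporal anchoring (device clock)
            }
            canonical = "|".join(str(entry[k]) for k in sorted(entry)).encode()
            entry["signature"] = self.key.sign(canonical).hex()    # instrument identity
            self.prev_hash = hashlib.sha256(canonical).hexdigest() # link consumed by the next record
            self.records.append(entry)
            return entry

    series = SealedSeries(Ed25519PrivateKey.generate(), "greenhouse-7/temp")
    series.append(b'{"temp_c": 21.4}')
    series.append(b'{"temp_c": 21.6}')

A verifier holding only the instrument's public key can replay the chain and detect missing, reordered, or altered records, which addresses several of the basic questions above at once.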

We take advantage of decades of work that have made basic cryptographic techniques a standard part of the software development toolkit. The hard part has been developing a representation of the observational process that is both sufficiently formal to support automation and complete enough to be useful across many domains of research practice.

Temporal Density

To give you a sense of this difficulty: in our exploration of research workflows we’ve seen that real-world “temporal density” of records ranges across sixteen orders of magnitude. That is, from one observation every few years in a longitudinal health study to something like 100,000,000 observations per second at the Large Hadron Collider. We’ve enjoyed taking on the challenge of designing a system that is plausible across the full range. The actual limits on performance are something we are currently investigating.
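For the curious, the sixteen-orders-of-magnitude figure is just the ratio of those two rates; a back-of-envelope check, taking "every few years" as roughly one observation per three years:

    # Back-of-envelope check on the range of temporal densities.
    import math

    slow = 1 / (3 * 365.25 * 24 * 3600)    # ~1 observation per 3 years, in Hz (about 1.1e-8)
    fast = 1e8                             # ~1e8 observations per second at the LHC
    print(round(math.log10(fast / slow)))  # -> 16 orders of magnitude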

Theory of Change

Our theory of change is simple: if the protocol works, it will prove its worth quickly by catching errors in ordinary research life. We hope to have something early collaborators can test soon. If you’d like to try it, please get in touch.