For more than two decades, we and our collaborators have developed language models from the ground up, along with the scientific data engineering pipelines that surround them, including data visualization, interpretation and human-in-the-loop curation, to support research in space biosciences. In our view, language models are at the leading edge of today’s AI and ML efforts and sit at the heart of many innovations and breakthroughs in computational biology. Our ultimate goal is to quicken the pace of generating new hypotheses and novel discoveries.

Specifically, we aim initially to advance space biosciences research in the following areas: ‘dynamic reciprocity’; the extracellular matrix (ECM) and blood microenvironment as they relate to the brain ECM; exosomal cargo analysis; dormant tumor cells (DTCs); atrogenes; cardiolipin (CL); and genes and proteins related to human aging, including OPA1 and NME4, along with the NDPK family of proteins. That family includes NDPK-A, which is involved in Non-Homologous End Joining (NHEJ), a primary pathway for repairing DNA double-strand breaks (DSBs). Such breaks can be caused by Galactic Cosmic Rays (GCRs) and by high-charge, high-energy heavy ions (HZE particles).

Our other language modeling efforts in space biosciences include models that compare protein and RNA sequences against new or foreign sequences for similarity, producing relationship networks between sequences.
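
To make the idea of a sequence relationship network concrete, here is a minimal sketch in Python. It is not our language-model method: it swaps in a much simpler technique, Jaccard similarity over overlapping k-mers, purely for illustration. The function names, the k-mer length and the similarity threshold are all assumptions.

```python
from itertools import combinations


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a sequence (empty if len < k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared k-mers divided by all distinct k-mers."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_network(
    sequences: dict[str, str], k: int = 3, threshold: float = 0.3
) -> dict[tuple[str, str], float]:
    """Build an edge list linking every pair of sequences whose
    k-mer Jaccard similarity meets the (assumed) threshold."""
    profiles = {name: kmers(seq, k) for name, seq in sequences.items()}
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[a], profiles[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Toy RNA/DNA-like strings, invented for this example only.
toy = {"seq_a": "ACGTACGT", "seq_b": "ACGTACGA", "seq_c": "TTTTTTTT"}
network = similarity_network(toy)
```

Each key in the returned dictionary is an edge between two sequence names, and the value is its similarity score. A learned model would replace `kmers` and `jaccard` with embeddings and a distance measure, while the thresholded edge list would keep the same shape.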

Ultimately, we expect innovations and discoveries in space biosciences to yield products and services for all industries and, more importantly, personalized medicine for all humankind.

Immutable On-chain Security

Datasets represent a large attack surface in the data security of AI and ML pipelines. Knowing the origin of data sources, calculations, algorithms and timestamps, along with any subsequent changes, remains critical for today’s scientific data engineering pipelines.

Data Provenance, Governance, Lineage

Data provenance, governance and lineage are supported by Ethereum/ERC20 blockchain technology and are indispensable to the security of pharmaceutical research: knowing where your data comes from and how reliable it is matters particularly in bioscience research. We provide a Data Provenance Pipeline (DPP) hash, which establishes immutable tracking for every dataset.
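
As an illustration of how a provenance hash can make changes detectable, here is a minimal sketch of a generic hash chain in Python. It is not the DPP hash itself, whose format is not described here. The function names and metadata fields are hypothetical, and the step of anchoring the final hash on-chain is omitted.

```python
import hashlib
import json


def record_hash(data: bytes, metadata: dict, prev_hash: str = "") -> str:
    """Hash one dataset version together with its metadata and the
    previous record's hash, so each record commits to its history."""
    h = hashlib.sha256()
    h.update(prev_hash.encode("utf-8"))
    h.update(hashlib.sha256(data).digest())
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    h.update(canonical.encode("utf-8"))
    return h.hexdigest()


def build_chain(versions: list[tuple[bytes, dict]]) -> list[str]:
    """Return one hex digest per version, each chained to the one before."""
    chain, prev = [], ""
    for data, metadata in versions:
        prev = record_hash(data, metadata, prev)
        chain.append(prev)
    return chain


def verify_chain(versions: list[tuple[bytes, dict]], chain: list[str]) -> bool:
    """Recompute the chain and compare it with the stored digests."""
    return build_chain(versions) == chain


# Hypothetical dataset versions and metadata fields, for illustration only.
versions = [
    (b"gene,score\nOPA1,0.91\n", {"source": "example-lab", "step": "ingest"}),
    (b"gene,score\nOPA1,0.93\n", {"source": "example-lab", "step": "recalibrate"}),
]
chain = build_chain(versions)
```

Because each digest covers the previous one, altering any earlier dataset version or its metadata changes every later digest, so `verify_chain` fails. Publishing only the latest digest to a blockchain would then be enough to commit to the full history.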

Request a dataset with custom features

Our datasets, derived from language modeling, are customizable and can be used by most research efforts in biosciences and in other industries such as materials, chemicals and finance. We also provide REST API access to many pre-packaged, high-value datasets out of the box.

Contact us to learn more