Encode

The Encode section of the BioLM Console provides access to models that embed biological sequences, transforming proteins, antibodies, or nucleic acids into numerical vector representations.

These embeddings capture structural, evolutionary, and functional features that can be used in downstream tasks such as clustering, similarity search, classification, or input to custom machine learning models.
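As an illustration of the similarity-search use case, the sketch below ranks a few embedding vectors by cosine similarity to a query vector. The vectors are made-up three-dimensional toy values, not output from any BioLM model, and the function names are chosen here purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Sort (name, score) pairs from most to least similar to the query."""
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings (hypothetical values, for illustration only).
toy_embeddings = {
    "seq_a": [0.9, 0.1, 0.0],
    "seq_b": [0.0, 1.0, 0.0],
    "seq_c": [-1.0, 0.0, 0.0],
}
query_embedding = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_embedding, toy_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Real embeddings have hundreds of dimensions (the Length column in the model table), but the same calculation applies unchanged.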

Each model entry in the table displays:

  • Model Name – the encoder model’s identifier.
  • Description – a short summary of what the model represents and how embeddings are generated.
  • Length – the dimensionality of the resulting embedding vector.
  • Parameters – the approximate number of trainable parameters in the model.
  • Status – whether the model is currently active, controlled by the Cold Start toggle.

Each row also includes a View API Docs button for programmatic access and usage examples.

Browsing Models

The Encode catalog presents available embedding models in a sortable table view.

You can use the Tags dropdown to filter by type (e.g. “protein,” “antibody,” or “DNA”) or search directly for a model by name.

Example Model Types

Some of the commonly used encoder models include:

  • AbLang-2 – antibody-specific language model that generates embeddings optimized to reduce germline bias.
  • BioLMTox-2 – sequence-based encoder for predicting protein toxicity potential across multiple domains of life.
  • DNABERT-2 – transformer model for generating DNA embeddings using byte-pair encoding.
  • ESM-2 150M / ESM-2 35M – protein language models for embedding protein sequences and inferring residue-level structure and relationships.
  • AntiFold – structure-guided antibody model supporting embedding of optimized CDR sequences.

Using a Model

  1. Select a model row in the table.

    Click View API Docs to open detailed documentation, including:

    • Input format (FASTA, sequence string, or JSON payloads)
    • Output structure (embedding vector dimensions and example outputs)
    • Example API and SDK calls
  2. Optionally toggle Cold Start if you want to pre-initialize the model before calling it from the API.
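Each model's API documentation defines its exact request format. As a rough sketch of preparing input, the following parses FASTA text and wraps the sequences in a JSON payload. The payload shape (an `items` list with a `sequence` field) is an assumption made for illustration, not a confirmed BioLM schema; check the model's API documentation for the real field names before sending anything.

```python
import json


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_payload(records):
    """Serialize sequences to a JSON string."""
    # ASSUMPTION: "items" and "sequence" are placeholder keys,
    # not a documented schema.
    return json.dumps({"items": [{"sequence": seq} for _, seq in records]})


fasta_text = """>example_1
MKTAYIAK
QRQISFVK
>example_2
ACDEFGHIK
"""

records = parse_fasta(fasta_text)
payload = build_payload(records)
print(payload)
```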

Typical Use Cases

  • Generating embeddings for protein or DNA sequences to use as ML features
  • Comparing or clustering related sequences
  • Screening or annotating sequences for functional or structural similarity
  • Using embeddings as inputs to downstream predictive or generative workflows
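To make the "embeddings as ML features" use case concrete, here is a minimal nearest-centroid classifier in plain Python. It is a sketch only: the two-dimensional vectors and the labels `class_x` and `class_y` are invented for illustration, and a real workflow would more likely pass model-generated embeddings to an established ML library.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimensionality")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(labelled_vectors):
    """Compute one centroid per label from a {label: [vectors]} mapping."""
    return {label: mean_vector(vecs) for label, vecs in labelled_vectors.items()}


def predict(centroids, vector):
    """Return the label whose centroid lies closest to the vector."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))


# Toy training data (hypothetical embeddings and labels).
training = {
    "class_x": [[1.0, 0.0], [0.8, 0.2]],
    "class_y": [[0.0, 1.0], [0.2, 0.8]],
}

centroids = fit_centroids(training)
predicted = predict(centroids, [0.7, 0.1])
print(predicted)
```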

Notes

  • All embedding models can be accessed via the web console or the BioLM SDK.
  • Embeddings are typically returned as arrays of floating-point values (NumPy or JSON format).
  • Dimensionality and model parameter count vary across models; higher-parameter models generally provide richer contextual features.
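
Because embeddings may arrive as JSON, a small validation step can catch malformed or unexpectedly sized vectors before further processing. The sketch below assumes a response of the form `{"embedding": [...]}`; that key name is hypothetical, and the actual output structure is described in each model's API documentation.

```python
import json
import math


def load_embedding(json_text, expected_dim=None):
    """Parse a JSON response and return the embedding as a list of floats."""
    data = json.loads(json_text)
    # ASSUMPTION: "embedding" is a placeholder key, not a documented field name.
    values = data["embedding"]
    if not isinstance(values, list) or not values:
        raise ValueError("embedding must be a non-empty list")
    vector = []
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric embedding value: {value!r}")
        vector.append(float(value))
    if not all(math.isfinite(v) for v in vector):
        raise ValueError("embedding contains NaN or infinite values")
    if expected_dim is not None and len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    return vector


response_text = '{"embedding": [0.12, -0.5, 1, 0.0]}'
vector = load_embedding(response_text, expected_dim=4)
print(vector)
```

If NumPy is installed, `numpy.asarray(vector, dtype=numpy.float32)` converts the validated list into an array for numerical work.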