WeMol

结构预测

Structure Prediction

根据预测的目标推荐使用相应的模块，包括蛋白质结构预测、抗体-抗原复合物结构预测、蛋白-小分子复合物结构预测、完整抗体IgG结构预测、环肽结构预测和RNA结构预测等。推荐使用AlphaFold3 like结构预测模型Protenix、Boltz-1、Chai-1、HelixFold3等，这些模型在常规的蛋白质结构、小分子配体、核酸分子（包括DNA和RNA）的预测精度上与AF3相当。

模块/流程名称	描述
Structure Prediction (Protenix)	AlphaFold3 like结构预测模型，基于字节跳动AML AI4Science团队的Protenix模型
Structure Prediction (Helixfold3)	AlphaFold3 like结构预测模型，基于百度螺旋桨PaddleHelix团队的HelixFold3模型
Structure Prediction (Chai-1)	AlphaFold3 like结构预测模型，基于Chai Discovery, Inc.公司的Chai-1模型
Structure Prediction (Boltz-1)	AlphaFold3 like结构预测模型，基于MIT麻省理工的Boltz-1模型
Multi-Model Structure Prediction流程	集成了3款AF3-like模型（Protenix、Boltz-1、Chai-1），一次调用多个模型进行结构预测
Protein Structure Prediction (AlphaFold2.3.2)	推荐用于蛋白质结构预测、抗体-抗原复合物结构预测
Protein Structure Prediction (ESMFold)	推荐用于抗体可变区单体结构预测
Protein Structure Prediction (RaptorX-Single)	单序列的蛋白质结构预测
Immune Protein Structure Prediction	免疫蛋白结构预测
Biomolecular Structure Prediction (RFAA)	蛋白-小分子复合物结构预测
IgG Modeling	完整抗体IgG结构预测
Cyclic Peptide Structure Prediction	环肽结构预测
RNA Secondary Structure Prediction	RNA二级结构预测
RNA 3D Structure Prediction	RNA三级结构预测

教程

结构预测介绍文档

Choose the module that matches your prediction target: protein structure prediction, antibody-antigen complex structure prediction, protein-small molecule complex structure prediction, full antibody IgG structure prediction, cyclic peptide structure prediction, RNA structure prediction, and so on. The AlphaFold3-like structure prediction models Protenix, Boltz-1, Chai-1, and HelixFold3 are recommended. These models achieve prediction accuracy comparable to AF3 for conventional protein structures, small molecule ligands, and nucleic acid molecules (including DNA and RNA).

Module/Workflow Name	Description
Structure Prediction (Protenix)	An AlphaFold3-like structure prediction model based on the Protenix model developed by ByteDance’s AML AI4Science team.
Structure Prediction (HelixFold3)	An AlphaFold3-like structure prediction model based on the HelixFold3 model developed by Baidu’s PaddleHelix team.
Structure Prediction (Chai-1)	An AlphaFold3-like structure prediction model based on the Chai-1 model developed by Chai Discovery, Inc.
Structure Prediction (Boltz-1)	An AlphaFold3-like structure prediction model based on the Boltz-1 model developed by MIT (Massachusetts Institute of Technology).
Multi-Model Structure Prediction Workflow	Integrates three AF3-like models (Protenix, Boltz-1, Chai-1) and runs multiple models for structure prediction in a single call.
Protein Structure Prediction (AlphaFold2.3.2)	Recommended for protein structure prediction, antibody-antigen complex structure prediction
Protein Structure Prediction (ESMFold)	Recommended for antibody variable region monomer structure prediction
Protein Structure Prediction (RaptorX-Single)	Single-sequence protein structure prediction
Immune Protein Structure Prediction	Immune protein structure prediction
Biomolecular Structure Prediction (RFAA)	Protein-small molecule complex structure prediction
IgG Modeling	Full antibody IgG structure prediction
Cyclic Peptide Structure Prediction	Cyclic peptide structure prediction
RNA Secondary Structure Prediction	RNA secondary structure prediction
RNA 3D Structure Prediction	RNA tertiary structure prediction

Tutorial

Structure Prediction Introduction Document

结构比对

Structure Alignment

RMSD (Root Mean Square Deviation) 和 DockQ 都是评估分子结构相似性和对接模型质量的指标，但它们的应用范围和考量因素有所不同。

RMSD (Root Mean Square Deviation)

定义： RMSD 是衡量两个叠加的分子结构之间原子位置平均偏差的量度。它通过计算对应原子（通常是主链原子，如 Cα 原子，或所有重原子）在三维空间中的距离平方的均值，再开平方根得到。

应用场景：

蛋白质结构比较： 广泛用于比较两个或多个蛋白质结构之间的相似性，例如评估预测结构与实验结构的一致性，或者比较不同构象下的蛋白质结构。
分子动力学模拟： 用于监测模拟过程中分子结构随时间的稳定性，计算结构相对于初始构象的偏差。
小分子对接： 在小分子与蛋白质对接中，可以用来衡量预测的配体结合姿态与已知实验姿态的相似性。

特点：

单位是距离（Ångström）： RMSD 值越小，表示结构越相似。
对全局结构敏感： 即使少量原子的大偏差也会导致 RMSD 值显著增加。
不区分界面和非界面区域： RMSD 是对整个结构或指定原子集的整体比较，不特别关注分子间的相互作用界面。

DockQ

定义： DockQ 是一个专门用于评估蛋白质-蛋白质对接模型质量的连续性指标，范围在 [0,1] 之间。它结合了多个衡量对接质量的关键因素，以提供一个更全面、更接近 CAPRI (Critical Assessment of PRediction of Interactions) 评估标准的单一分数。

组成部分： DockQ 综合了以下几个关键指标：

界面 RMSD (iRMSD)： 衡量预测界面原子与真实界面原子之间的 RMSD。它只关注相互作用区域的结构偏差。
配体 RMSD (LRMSD)： 衡量配体分子（例如，对接中的一个小蛋白质）与真实配体位置的 RMSD。
天然接触分数 (Fnat)： 衡量预测模型中与真实复合物中相同接触点的比例。

计算方式： DockQ 并非简单的线性组合，而是通过对这些组分进行非线性变换和组合得出的，旨在更好地重现 CAPRI 的质量分类（Incorrect, Acceptable, Medium, High）。

应用场景：

蛋白质-蛋白质对接： 主要用于评估蛋白质-蛋白质对接预测模型的质量，判断预测的结合姿态是否准确。
对接方法开发和比较： 作为标准度量，用于评估不同对接算法的性能。

特点：

连续分数（0-1）： DockQ 值越高，表示对接模型质量越好（0代表质量差，1代表完美）。
关注界面质量： DockQ 特别强调界面区域的准确性，这对于评估蛋白质-蛋白质相互作用至关重要。
综合性指标： 它不仅仅是几何上的相似性，还考虑了接触点的正确性，因此比单纯的 RMSD 更能反映对接的生物学意义。
与 CAPRI 评估相关： DockQ 的设计初衷是为了更好地反映 CAPRI 比赛中使用的复杂评估标准，从而实现对接模型质量的标准化和可解释性比较。

RMSD 和 DockQ 的主要区别总结

特征	RMSD (Root Mean Square Deviation)	DockQ
应用范围	广泛用于各种分子结构比较（蛋白质、小分子、构象变化）	主要用于蛋白质-蛋白质对接模型质量评估
评估目标	衡量两个结构之间原子位置的几何相似性	衡量蛋白质-蛋白质对接模型在界面区域的准确性
考量因素	仅考虑原子位置的几何偏差	综合考虑界面 RMSD、配体 RMSD 和天然接触分数
结果形式	距离单位（Å），越小越好	0-1 之间的连续分数，越大越好
侧重点	全局或局部结构相似性	蛋白质相互作用界面的准确性和生物学相关性
与对接关系	可以作为对接评估的一个组成部分（如iRMSD, LRMSD）	专门为蛋白质对接设计，整合了多个对接相关指标

简而言之，RMSD 是一个更通用的几何相似性度量，可以用于各种分子结构比较。而 DockQ 则是一个专门为蛋白质-蛋白质对接模型设计的高度集成的质量评估指标，它更全面地反映了对接的生物学相关性和准确性，因为它综合了界面几何精度和关键相互作用的正确性。在评估蛋白质-蛋白质对接时，DockQ 通常被认为是更优选和更具代表性的指标。

RMSD (Root Mean Square Deviation) and DockQ are both metrics used to evaluate molecular structure similarity and docking model quality, but they differ in their range of applications and considerations.

RMSD (Root Mean Square Deviation)

Definition: RMSD is a measure of the average deviation in atomic positions between two superimposed molecular structures. It is calculated by taking the square root of the mean of the squared distances between corresponding atoms (typically backbone atoms, such as Cα atoms, or all heavy atoms) in three-dimensional space.

Applications:

Protein Structure Comparison: Widely used to compare the similarity between two or more protein structures, such as assessing the consistency between predicted and experimental structures, or comparing protein structures in different conformations.
Molecular Dynamics Simulations: Used to monitor the stability of molecular structures over time during simulations, calculating deviations relative to the initial conformation.
Small Molecule Docking: In small molecule-protein docking, it can be used to assess the similarity between predicted ligand binding poses and known experimental poses.

Characteristics:

Measured in Distance (Ångström): The smaller the RMSD value, the more similar the structures are.
Sensitive to Global Structure: Even a small number of atoms with large deviations can significantly increase the RMSD value.
Does Not Distinguish Interface and Non-interface Regions: RMSD is a global comparison of the entire structure or a specified set of atoms, without special focus on interaction interfaces.
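
The RMSD definition above can be sketched in a few lines. This is a minimal illustration, not the implementation used by any WeMol module: it assumes the two coordinate lists are already superimposed and list corresponding atoms in the same order, and the function name `rmsd` and the toy coordinates are made up for the example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD (in the input unit, e.g. Å) between two superimposed atom lists.

    Assumes atom i in coords_a corresponds to atom i in coords_b and that
    the structures have already been superimposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Mean of squared distances, then square root.
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical four-atom example: one model is uniformly shifted by 0.1,
# the other matches perfectly except for a single atom displaced by 6.0.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
model_shifted = [(0.0, 0.1, 0.0), (1.5, 0.1, 0.0), (3.0, 0.1, 0.0), (4.5, 0.1, 0.0)]
model_outlier = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 6.0, 0.0)]

print(round(rmsd(reference, model_shifted), 3))
print(round(rmsd(reference, model_outlier), 3))
```

The second model has three atoms in exactly the right place, yet its RMSD is far larger than that of the uniformly shifted model, which illustrates why a few badly placed atoms can dominate the value.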

DockQ

Definition: DockQ is a continuous score in the range [0, 1], designed specifically to evaluate the quality of protein-protein docking models. It combines several key measures of docking quality into a single score that closely tracks the CAPRI (Critical Assessment of PRediction of Interactions) evaluation criteria.

Components: DockQ integrates the following key metrics:

Interface RMSD (iRMSD): Measures the RMSD between predicted interface atoms and true interface atoms, focusing only on structural deviations in the interaction region.
Ligand RMSD (LRMSD): Measures the RMSD of the ligand (in protein-protein docking, conventionally the smaller partner) relative to its true position after the receptors have been superimposed.
Fraction of Native Contacts (Fnat): Measures the fraction of the contacts in the true (native) complex that are reproduced in the predicted model.

Calculation Method: DockQ is not a simple linear combination but is derived through nonlinear transformations and combinations of these components, aiming to better reproduce CAPRI’s quality classifications (Incorrect, Acceptable, Medium, High).
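
As a rough sketch of that nonlinear combination: in the published DockQ definition, each RMSD term is first mapped into (0, 1] with an inverse-square scaling and the result is averaged with Fnat. The scaling constants (1.5 Å for iRMSD, 8.5 Å for LRMSD) and the class cutoffs (0.23, 0.49, 0.80) below come from that published definition rather than from this page, so treat them as assumptions to verify against the DockQ tool you actually run. The function names are hypothetical.

```python
def scaled_rmsd(rmsd_value: float, d: float) -> float:
    """Map an RMSD (Å) to (0, 1]; 0 Å gives 1.0, large values approach 0."""
    return 1.0 / (1.0 + (rmsd_value / d) ** 2)


def dockq(fnat: float, irmsd: float, lrmsd: float,
          d_irmsd: float = 1.5, d_lrmsd: float = 8.5) -> float:
    """Average of Fnat and the two scaled RMSD terms (assumed constants)."""
    return (fnat + scaled_rmsd(irmsd, d_irmsd) + scaled_rmsd(lrmsd, d_lrmsd)) / 3.0


def capri_class(score: float) -> str:
    """Map a DockQ score to a CAPRI-style quality class (assumed cutoffs)."""
    if score >= 0.80:
        return "High"
    if score >= 0.49:
        return "Medium"
    if score >= 0.23:
        return "Acceptable"
    return "Incorrect"


# Hypothetical model: half of the native contacts recovered,
# interface RMSD of 1.5 Å and ligand RMSD of 8.5 Å.
score = dockq(fnat=0.5, irmsd=1.5, lrmsd=8.5)
print(score, capri_class(score))
```

Because the RMSD terms saturate, a very large LRMSD cannot push the score below zero, and a model can still reach a non-trivial score through Fnat and iRMSD alone.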

Applications:

Protein-Protein Docking: Mainly used to assess the quality of protein-protein docking prediction models, determining whether the predicted binding poses are accurate.
Docking Method Development and Comparison: Used as a standard measure to evaluate the performance of different docking algorithms.

Characteristics:

Continuous Score (0-1): The higher the DockQ value, the better the quality of the docking model (0 indicates poor quality, 1 indicates perfect quality).
Focus on Interface Quality: DockQ emphasizes the accuracy of the interface region, which is crucial for evaluating protein-protein interactions.
Comprehensive Metric: It considers not only geometric similarity but also the correctness of contact points, making it more reflective of the biological significance of docking compared to RMSD alone.
Related to CAPRI Evaluation: DockQ is designed to better reflect the complex evaluation standards used in the CAPRI competition, enabling standardized and interpretable comparisons of docking model quality.

Summary of Main Differences Between RMSD and DockQ

Feature	RMSD (Root Mean Square Deviation)	DockQ
Scope of Application	Widely used for various molecular structure comparisons (proteins, small molecules, conformational changes)	Primarily used for evaluating the quality of protein-protein docking models
Evaluation Target	Measures geometric similarity of atomic positions between two structures	Measures the accuracy of the interface region in protein-protein docking models
Considered Factors	Considers only geometric deviations of atomic positions	Integrates interface RMSD, ligand RMSD, and fraction of native contacts
Result Format	Distance unit (Å), smaller is better	Continuous score between 0-1, higher is better
Focus	Global or local structural similarity	Accuracy and biological relevance of protein interaction interfaces
Relation to Docking	Can be a component of docking evaluation (e.g., iRMSD, LRMSD)	Specifically designed for protein docking, integrating multiple docking-related metrics

In short, RMSD is a general-purpose measure of geometric similarity that applies to many kinds of molecular structure comparison. DockQ is an integrated quality metric designed specifically for protein-protein docking models; because it combines interface geometric accuracy with the correctness of key contacts, it reflects the accuracy and biological relevance of a docking result more fully. When evaluating protein-protein docking, DockQ is generally considered the preferred and more representative metric.

免疫原性

Immunogenicity

免疫原性预测已经历多个版本迭代，目前应用版本为：WeADApt v4.1.0, AlphaMHC v3.0 beta, AlphaMHC v2.0。

同时也可以从WeSeq中提交预测：WeSeq->Immunogenicity，界面更友好（推荐v4）。

免疫原性介绍文档
Immunogenicity prediction has gone through multiple iterations; the versions currently in use are WeADApt v4.1.0, AlphaMHC v3.0 beta, and AlphaMHC v2.0.

You can also submit predictions from WeSeq: WeSeq->Immunogenicity, which offers a more user-friendly interface (v4 recommended).

Immunogenicity Introduction Document
稳定性

Stability
热稳定性与蛋白的折叠自由能正相关，可能影响表达、纯度、PK等，优化方式包括基于物理的能量计算和ML/AI模型。
- 优化抗体稳定性，可使用Antibody Stability Optimization v3.1或Antibody Stability Optimization v3.0 plus MD。
  抗体稳定性优化流程介绍文档
- 优化蛋白稳定性，可使用Protein Stability Optimization v3.1或Protein Stability Optimization v3.0 plus MD。
  蛋白稳定性优化流程介绍文档
- 预测蛋白质的绝对稳定性，可使用Absolute Folding Stability。
  蛋白绝对稳定性预测介绍文档
- 预测蛋白稳定性相对结合自由能，可使用Protein FEP。
- 基于ThermoMPNN模型预测蛋白质单点突变的稳定性变化，可使用Mutation Energy of Stability (ThermoMPNN)。
- 基于序列预测蛋白中潜在的PTM位点，可使用PTM Hotspot by Sequence。建议在WeSeq中进行分析：WeSeq->PTM。
- 基于结构预测蛋白中潜在的PTM位点，可使用PTM Hotspot by Structure。
- 基于ESMIF逆折叠模型，预测能提升结构稳定性的单点或多点突变，可使用Structure Evolution。
Thermal stability is positively correlated with the folding free energy of proteins, which may affect expression, purity, pharmacokinetics (PK), etc. Optimization methods include physics-based energy calculations and ML/AI models.
- To optimize antibody stability, you can use Antibody Stability Optimization v3.1 or Antibody Stability Optimization v3.0 plus MD.
  Antibody Stability Optimization Process Introduction Document
- To optimize protein stability, you can use Protein Stability Optimization v3.1 or Protein Stability Optimization v3.0 plus MD.
  Protein Stability Optimization Process Introduction Document
- To predict the absolute stability of proteins, you can use Absolute Folding Stability.
  Absolute Folding Stability Prediction Introduction Document
- To predict the relative binding free energy of protein stability, you can use Protein FEP.
- To predict the stability changes of protein single-point mutations based on the ThermoMPNN model, you can use Mutation Energy of Stability (ThermoMPNN).
- To predict potential PTM sites in proteins based on sequence, you can use PTM Hotspot by Sequence. It is recommended to perform the analysis in WeSeq: WeSeq -> PTM.
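- To predict potential PTM sites in proteins based on structure, you can use PTM Hotspot by Structure.
- To predict single-point or multi-point mutations that improve structural stability based on the ESMIF inverse folding model, you can use Structure Evolution.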
可开发性

Developability
可开发性包括蛋白表面patch分析、理化性质计算（含pI）、TAP原则、PTM（基于序列）、基于结构的异构化预测、断裂位点预测等。
- 成药性一键综合评价
  - 进行成药性一键综合评价，可以使用抗体可开发性预测流程，Antibody Developability Properties v4或Antibody Developability Properties v3。
  - 同时，也可以在WeSeq中进行抗体可开发性预测分析，WeSeq->Developability->Antibody General Evaluation。
- Patch分析
  - 建议从WeView中运行：WeView->Analysis->Patch。Patch分析介绍文档
- PTM预测
  - 基于序列的PTM预测，建议直接在WeSeq运行：WeSeq->PTM。PTM预测介绍文档
  - 基于结构的PTM预测，可以直接在模块中运行PTM Hotspot by Structure。
- 抗体成药性预测（TAP）
  - 可以直接在模块中运行Therapeutic Antibody Profiler。TAP介绍文档
- 溶解度预测
  - 可以直接在模块中运行，Solubility Score，Solubility Score (CamSol)。溶解度预测介绍文档
- 聚集度预测
  - 可以直接在模块中运行，Aggregation Score。聚集度预测介绍文档
Developability includes protein surface patch analysis, physicochemical property calculations (including pI), TAP principles, PTM (based on sequence), structure-based isomerization prediction, cleavage site prediction, etc.

Antibody Developability Properties
- For a one-click comprehensive developability evaluation, you can use the antibody developability prediction workflows Antibody Developability Properties v4 or Antibody Developability Properties v3.
- Additionally, you can perform antibody developability prediction analysis in WeSeq: WeSeq -> Developability -> Antibody General Evaluation.
Patch Analysis
- It is recommended to run from WeView: WeView -> Analysis -> Patch. Patch Analysis Introduction Document
PTM Prediction
- For sequence-based PTM prediction, it is recommended to run directly in WeSeq: WeSeq -> PTM. PTM Prediction Introduction Document
- For structure-based PTM prediction, you can run the PTM Hotspot by Structure module directly.
Antibody Developability Prediction (TAP)
- You can run the Therapeutic Antibody Profiler module directly. TAP Introduction Document
Solubility Prediction
- You can run the Solubility Score or Solubility Score (CamSol) module directly. Solubility Prediction Introduction Document
Aggregation Prediction
- You can run the Aggregation Score module directly. Aggregation Prediction Introduction Document
序列分析

Sequence Analysis
序列分析包括序列编号、多序列比对、测序数据分析、频率分析、序列突变等。
- 序列编号
  进行抗体序列编号，建议在WeSeq中运行：WeSeq->Number。序列编号介绍文档
- 多序列比对
  进行多序列比对，建议在WeSeq中运行：WeSeq->Align。多序列比对介绍文档
- 测序数据分析
  进行测序数据分析，可以使用NGS Analysis。NGS Analysis介绍文档
- 频率分析
  进行频率分析，建议在WeSeq运行，WeSeq->Frequency。频率分析介绍文档
- 序列突变
  进行序列突变，建议在WeSeq中操作：WeSeq->Edit->Batch Mutate。或者使用Sequence Mutation模块。
Sequence analysis includes sequence numbering, multiple sequence alignment, sequencing data analysis, frequency analysis, and sequence mutations.

Sequence Numbering

For antibody sequence numbering, it is recommended to run in WeSeq: WeSeq -> Number. Sequence Numbering Introduction Document

Multiple Sequence Alignment

For multiple sequence alignment, it is recommended to run in WeSeq: WeSeq -> Align. Multiple Sequence Alignment Introduction Document

Sequencing Data Analysis

For sequencing data analysis, you can use NGS Analysis. NGS Analysis Introduction Document

Frequency Analysis

For frequency analysis, it is recommended to run in WeSeq: WeSeq -> Frequency. Frequency Analysis Introduction Document

Sequence Mutation

For sequence mutation, it is recommended to operate in WeSeq: WeSeq -> Edit -> Batch Mutate. Alternatively, you can use the Sequence Mutation module.
专利分析

Patent Analysis
专利分析包括专利抗体CDR序列搜索、专利序列提取、专利图片OCR。专利分析介绍文档
- 进行专利抗体CDR序列搜索，可以应用Patent BLAST。
- 从专利文本文件或专利序列图片OCR提取专利序列，可以应用Patent Sequence Listing。
  
Patent analysis includes searching for antibody CDR sequences in patents, extracting patent sequences, and performing OCR on patent images. Patent Analysis Introduction Document

Patent Antibody CDR Sequence Search

To search for antibody CDR sequences in patents, you can use Patent BLAST.

Extracting Patent Sequences

To extract sequences from patent text files or perform OCR on patent sequence images, you can use Patent Sequence Listing.
蛋白设计

Protein Design
- 从头结构生成
  进行蛋白结构从头生成，可以应用Protein Design (RFDiffusion)。RFDiffusion介绍文档
- 基于主链结构设计序列（逆折叠）
  - ProteinMPNN，建议从WeSeq中运行：WeSeq->Design->ProteinMPNN。ProteinMPNN介绍文档
  - ABACUS-R，可以使用Protein Design (ABACUS-R)。
  - RFDesign，可以使用Protein Design (RFDesign)。RFDesign介绍文档。
  - ESMIF逆折叠模型，可使用Structure Evolution。
De Novo Protein Structure Generation

To perform de novo protein structure generation, you can use Protein Design (RFDiffusion). RFDiffusion Introduction Document

Sequence Design Based on Backbone Structure (Inverse Folding)
1. ProteinMPNN
  - It is recommended to run ProteinMPNN in WeSeq: WeSeq -> Design -> ProteinMPNN. ProteinMPNN Introduction Document
2. ABACUS-R
  - You can use Protein Design (ABACUS-R).
3. RFDesign
  - You can use Protein Design (RFDesign). RFDesign Introduction Document
4. ESMIF Inverse Folding Model
  - You can use Structure Evolution.
抗体设计

Antibody Design
- RFAntibody
  是基于RFAntibody（抗体微调版RFdiffusion）的抗体从头设计。Antibody Design (RFAntibody)。
- MEAN模型
  基于MEAN模型实现的抗体设计，该模型采用多通道等变图注意力网络，可用于设计CDR的一维序列和三维结构。Antibody Design (MEAN)。
- DiffAb模型
  基于扩散概率模型和等变神经网络的抗体设计，可针对特定抗原结构生成抗体，也可基于抗体-抗原复合物结构进行抗体结构和序列的优化。Antibody Design (DiffAb)。
- RFAntibody
  De novo antibody design based on RFAntibody, a version of RFdiffusion fine-tuned for antibodies. Use Antibody Design (RFAntibody).
- MEAN Model
  Antibody design based on the MEAN model, which uses a multi-channel equivariant graph attention network and can design both the one-dimensional sequence and the three-dimensional structure of CDRs. Use Antibody Design (MEAN).
- DiffAb Model
  Antibody design based on diffusion probabilistic models and equivariant neural networks. It can generate antibodies for a given antigen structure and can also optimize antibody structure and sequence starting from an antibody-antigen complex structure. Use Antibody Design (DiffAb).
酶

Enzyme
多肽

Peptide
多肽分析包括线性肽/环肽结构预测、多肽对接筛选、线性肽/环肽设计、信号肽预测。
- 进行线性肽结构预测，可以应用Peptide Structure Generation。
- 进行环肽结构预测，可以应用Cyclic Peptide Structure Prediction。
- 进行基于受体蛋白的多肽对接筛选，可以应用Peptide VS。
- 进行环肽设计，可以应用Cyclic Peptide Design。
- 进行线性肽设计，可以应用Receptor-Based Peptide Design。
- 进行信号肽预测，可以应用Signal Peptide Prediction。
Peptide analysis includes linear/cyclic peptide structure prediction, peptide docking screening, linear/cyclic peptide design, and signal peptide prediction.
- For linear peptide structure prediction, you can use Peptide Structure Generation.
- For cyclic peptide structure prediction, you can use Cyclic Peptide Structure Prediction.
- For peptide docking screening based on receptor proteins, you can use Peptide VS.
- For cyclic peptide design, you can use Cyclic Peptide Design.
- For linear peptide design, you can use Receptor-Based Peptide Design.
- For signal peptide prediction, you can use Signal Peptide Prediction.
核酸

DNA/RNA
包括密码子优化、CDS优化、UTR优化等。
- 密码子优化，可使用Codon Optimization。
- CDS及UTR优化，可使用mRNA Optimization (AlphaRNA)。
- UTR优化，可使用mRNA 5’UTRs optimization。
Nucleic acid tools include codon optimization, CDS optimization, UTR optimization, etc.
- For codon optimization, you can use Codon Optimization.
- For CDS and UTR optimization, you can use mRNA Optimization (AlphaRNA).
- For UTR optimization, you can use mRNA 5’UTRs optimization.
靶点鉴定

Target Identification
靶点鉴定包括疾病相关靶点提取以及小分子靶点预测模块。靶点鉴定介绍文档
- 疾病相关靶点提取，可使用Target Prioritization (OpenTargets)，可以提取疾病相关的靶点，支持多种相关性打分。
- 小分子靶点预测，可使用Target Prediction (FastTargetPred)，基于二维相似度的小分子靶点预测模块，活性分子及靶点数据来源于ChEMBL数据库。
- 指定基因在肿瘤和正常组织表达情况的检索，可使用Tumor Gene Expression (TCGA)，可统计并绘制肿瘤细胞、肿瘤组织、正常组织等的基因表达差异，帮助药物靶点选择、研发立项和决策。
Target identification includes disease-related target extraction and small molecule target prediction modules. Target Identification Introduction Document
- For disease-related target extraction, you can use Target Prioritization (OpenTargets), which extracts disease-related targets and supports multiple relevance scores.
- For small molecule target prediction, you can use Target Prediction (FastTargetPred), a small molecule target prediction module based on 2D similarity, with active molecule and target data sourced from the ChEMBL database.
- To search for the expression of specified genes in tumor and normal tissues, you can use Tumor Gene Expression (TCGA). This tool can statistically analyze and visualize the differences in gene expression among tumor cells, tumor tissues, and normal tissues, aiding in drug target selection, research and development project initiation, and decision-making.
小分子生成

Molecule Generation
小分子生成是从头设计全新分子的过程，可以基于多种AI架构生成类药分子，也可以基于靶点，骨架、活性分子生成衍生物或者相似分子。分子生成介绍文档
- REINVENT4
  应用De novo Generation (REINVENT4)模块运行计算。基于阿斯利康开源的REINVENT4算法用于小分子全新生成的模块。支持多种分子生成方式：Reinvent - 从头开始创造新分子，Libinvent - 修饰一个骨架，Linkinvent - 设计两个片段之间的linker，Mol2Mol - 在用户定义的相似度范围内优化分子。
- Scaffold Constrained Generation
  应用Scaffold Constrained Generation模块运行计算。骨架限制的生成模型，可以限制骨架，指定优化部位，特异性的生成全新分子库。
- Moses
  应用De novo Generation (Moses)模块运行计算。随机类药分子生成模型，基于多种主流的分子生成模型，包括字符级循环神经网络，变分自编码器，以及对抗自编码器的分子生成模块。
Small-molecule generation is the process of designing entirely new molecules from scratch. It can generate drug-like molecules with a variety of AI architectures, or generate derivatives or similar molecules starting from a target, a scaffold, or an active molecule. Molecular Generation Introduction Document
- REINVENT4
  Use the De novo Generation (REINVENT4) module for computation. This module is based on AstraZeneca’s open-source REINVENT4 algorithm for the de novo generation of small molecules. It supports various molecular generation methods: Reinvent - create new molecules from scratch, Libinvent - modify a scaffold, Linkinvent - design a linker between two fragments, Mol2Mol - optimize molecules within a user-defined similarity range.
- Scaffold Constrained Generation
  Use the Scaffold Constrained Generation module for computation. This scaffold-constrained generative model lets you fix the scaffold and specify the positions to optimize, producing a focused library of new molecules.
- Moses
  Use the De novo Generation (Moses) module for computation. This module generates random drug-like molecules using several mainstream generative models, including character-level recurrent neural networks, variational autoencoders, and adversarial autoencoders.
虚拟筛选

Virtual Screening
虚拟筛选根据配体或受体结构，对小分子化合物进行筛选，预测可能的活性分子，大大提高化合物药物发现进程，缩减药物发现费用。

基于受体的方法
- 虚拟筛选流程 Cascaded Virtual Screening，基于配体和基于受体整合的虚拟筛选流程。
基于配体方法
- 性质过滤
  - Property Filter，根据性质过滤，包括基于相似度（2D和3D）,基于性质过滤，基于结构聚类，基于对接等。Property Filter介绍文档
  - PAINS Filter，过滤PAINS片段。PAINS Filter介绍文档
- 结构搜索
  - Substructure Search，从化合物库中查找含有特定子结构片段的化合物。
  - 2D Similarity Search，从化合物库中查找出与查询分子二维相似的化合物。二维形状搜索介绍文档
- 三维形状搜索
  - AlphaShape，从化合物库中查找出与查询分子三维形状相似的化合物，私有库需要结合3D Conf Generation (AlphaConf)模块生成筛选库的分子构象。三维形状搜索介绍文档
- 结构聚类
  - Diverse Subset，通过分子聚类，挑选具有代表性的结构多样性的分子子集，常用于基于其他筛选手段得到的分子库进行进一步挑选，减小筛选hits的数量。
Virtual screening filters small-molecule compounds based on ligand or receptor structure to predict likely active molecules, which can significantly accelerate drug discovery and reduce its cost.

Receptor-Based Methods
1. Virtual Screening Workflow
  - Cascaded Virtual Screening: An integrated virtual screening workflow that combines ligand-based and receptor-based methods.
Ligand-Based Methods
1. Property Filtering
  - Property Filter: Filters compounds by several criteria, including similarity (2D and 3D), physicochemical properties, structural clustering, and docking. Property Filter Introduction Document
  - PAINS Filter: Filters out compounds containing PAINS fragments. PAINS Filter Introduction Document
2. Structure Search
  - Substructure Search: Searches for compounds containing specific substructures within a compound library.
  - 2D Similarity Search: Finds compounds in a library that are 2D similar to the query molecule. 2D Similarity Search Introduction Document
3. 3D Shape Search
  - AlphaShape: Searches for compounds in a library that have a 3D shape similar to the query molecule. For private libraries, it needs to be combined with the 3D Conf Generation (AlphaConf) module to generate molecular conformations for screening. 3D Shape Search Introduction Document
4. Structure Clustering
  - Diverse Subset: Selects a representative subset of structurally diverse molecules through clustering. This is often used to further refine a set of hits obtained from other screening methods, reducing the number of hits.
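
To make the idea behind a 2D similarity search concrete, the sketch below ranks library compounds by Tanimoto similarity to a query. It is only an illustration: this page does not state which fingerprint or similarity metric the 2D Similarity Search module uses, so the Tanimoto coefficient, the threshold, and the toy fingerprints (sets of "on" bit indices) are all assumptions, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def similarity_search(query: Set[int], library: Dict[str, Set[int]],
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy fingerprints: each set holds the indices of the bits that are "on".
library = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 9},
    "cmpd_C": {7, 8},
}
query = {1, 2, 3, 5}

hits = similarity_search(query, library, threshold=0.5)
print(hits)
```

In practice the bit sets would come from a cheminformatics toolkit rather than being written by hand; the ranking and thresholding logic stays the same.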
分子性质

Molecular Property
分子性质包括小分子的理化性质以及药代动力学（ADMET）性质。
- 理化性质计算，建议使用Descriptors (RDKit)。理化性质计算介绍文档
- 化合物可合成性评估，建议使用Synthetic Accessibility Score。化合物可合成性评估介绍文档
- P450代谢位点预测，建议使用Metabolism Site Prediction。P450代谢位点预测介绍文档
- 毒效片段识别，建议使用Toxic Fragment Identification。毒效片段识别介绍文档
- ADMET性质预测，建议使用ADMET Prediction。ADMET介绍文档
Molecular properties include the physicochemical properties and pharmacokinetic (ADMET) properties of small molecules.
- Physicochemical Properties Calculation, recommended tool: Descriptors (RDKit). Physicochemical Properties Calculation Documentation
- Synthetic Accessibility Evaluation, recommended tool: Synthetic Accessibility Score. Synthetic Accessibility Evaluation Documentation
- P450 Metabolism Site Prediction, recommended tool: Metabolism Site Prediction. P450 Metabolism Site Prediction Documentation
- Toxic Fragment Identification, recommended tool: Toxic Fragment Identification. Toxic Fragment Identification Documentation
- ADMET Properties Prediction, recommended tool: ADMET Prediction. ADMET Prediction Documentation
分子对接

Docking
分子对接是研究相互作用的重要工具，包括蛋白-小分子，蛋白-蛋白对接。
- 蛋白-小分子对接
  - AutoDock-GPU，建议从WeView中运行：WeView->Docking。基于GPU加速的AutoDock的分子对接工具。AutoDock-GPU对接介绍文档
  - Molecular Docking (SMINA)，基于Autodock Vina分支SMINA的分子对接工具。SMINA对接介绍文档
  - Molecular Docking (DOCK)，基于Dock6的分子对接工具。DOCK对接介绍文档
  - Molecular Docking (DiffDock)，基于扩散生成模型的对接工具。DiffDock对接介绍文档
- 蛋白-蛋白/核酸对接
  - Protein Docking (HDOCK)，支持蛋白-蛋白，蛋白-DNA/RNA对接，支持限制位点对接。
  - Protein Docking (FRODOCK)，除了常规蛋白-蛋白，还支持抗体抗原模式对接，支持限制位点对接。
  - Antibody-Antigen Docking (HADDOCK)，支持抗体抗原模式对接。
Molecular docking is an important tool for studying interactions, including protein-small molecule and protein-protein docking.
- Protein-Small Molecule Docking
  - AutoDock-GPU, recommended to run from WeView: WeView->Docking. A GPU-accelerated molecular docking tool based on AutoDock. AutoDock-GPU Docking Documentation
  - Molecular Docking (SMINA), a molecular docking tool based on the AutoDock Vina branch SMINA. SMINA Docking Documentation
  - Molecular Docking (DOCK), a molecular docking tool based on Dock6. DOCK Docking Documentation
  - Molecular Docking (DiffDock), a docking tool based on diffusion generative models. DiffDock Docking Documentation
- Protein-Protein/Nucleic Acid Docking
  - Protein Docking (HDOCK), supports protein-protein, protein-DNA/RNA docking, and site-specific docking.
  - Protein Docking (FRODOCK), supports not only conventional protein-protein docking but also antibody-antigen mode docking and site-specific docking.
  - Antibody-Antigen Docking (HADDOCK), supports antibody-antigen mode docking.
格式转换

Format Conversion
分子格式转换工具，包括不同格式文件转换、氨基酸字母格式转换等。
- 基于Open Babel的分子文件格式转换工具，可以使用Format Conversion (Open Babel)。
- 基于RDKit的分子文件格式转换工具，可以使用Format Conversion (RDKit)。
- 小分子构象生成工具3D Conf Generation (AlphaConf)，可将生成的二进制构象压缩文件AC.GZ转为便于查看的SDF文件。
- 氨基酸格式转换工具，可使用3-letter AA Conversion，将氨基酸缩写三字母格式转换为单字母格式。
Molecular format conversion tools, including conversion of different format files, amino acid letter format conversion, etc.
- For molecular file format conversion based on Open Babel, you can use Format Conversion (Open Babel).
- For molecular file format conversion based on RDKit, you can use Format Conversion (RDKit).
- For small molecule conformation generation, use the tool 3D Conf Generation (AlphaConf), which can convert the generated binary conformation compressed file AC.GZ to an SDF file for easy viewing.
- For amino acid format conversion, you can use 3-letter AA Conversion to convert amino acid abbreviations from three-letter format to one-letter format.
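
For readers who want to see what a three-letter to one-letter conversion involves, here is a minimal sketch covering the 20 standard amino acids. It is not the 3-letter AA Conversion module itself: the accepted input format (residues separated by hyphens or whitespace) and the use of `X` for unrecognized codes are assumptions made for this example.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(sequence: str, unknown: str = "X") -> str:
    """Convert e.g. 'Ala-Gly-Ser' or 'ALA GLY SER' to 'AGS'.

    Residues may be separated by hyphens or whitespace (an assumption);
    codes not in the table are replaced by `unknown`.
    """
    tokens = sequence.replace("-", " ").split()
    return "".join(THREE_TO_ONE.get(token.upper(), unknown) for token in tokens)


print(three_to_one("Ala-Gly-Ser"))
print(three_to_one("MET LYS xyz"))
```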

结构处理

Structure Preparation

对PDB结构文件进行处理，包括去除杂质、补全缺失原子或残基、加氢、修改链名或残基编号等。

模块/流程名称	描述
Structure Preparation	首选，支持提取链，去除杂质，补全缺失原子、残基，以及蛋白氨基酸残基的质子化判断以及加氢等操作
Structure Minimization	结构优化模块，支持氢原子优化、氨基酸侧链优化、整体优化三种方式
PDB ReNumbering	针对蛋白PDB文件中残基重新编号的工具模块，指定残基开始编号序号，同时支持抗体kabat，imgt以及chothia的重编号
PDB Mutation	用于突变PDB格式的蛋白质结构并返回突变后的结构

教程

模拟结构处理介绍文档

Processing of PDB structure files includes removing impurities, completing missing atoms or residues, adding hydrogen atoms, modifying chain names or residue numbers, etc.

Module/Workflow Name	Description
Structure Preparation	Preferred option, supports chain extraction, impurity removal, completion of missing atoms and residues, and operations like protonation judgment and hydrogen addition for protein amino acid residues
Structure Minimization	Structure optimization module, supports three methods: hydrogen atom optimization, amino acid side chain optimization, and overall optimization
PDB ReNumbering	Tool module for renumbering residues in protein PDB files, specifying the starting number for residues, and supporting renumbering according to Kabat, IMGT, and Chothia schemes
PDB Mutation	Used for mutating protein structures in PDB format and returning the mutated structure

Tutorial

Introduction to Simulated Structure Processing Documentation

轨迹分析

Trajectory Analysis

轨迹分析对分子动力学模拟后产生的轨迹进行结构分析，观察研究对象在模拟过程中的动态变化。

模块/流程名称	描述
MD Trajectory	可根据起始帧数、结束帧数以及间隔帧数对平衡模拟进行轨迹提取，并将其转换为GRO或者PDB格式文件
MD RMS	体系结构稳定性分析模块,包括RMSD、RMSF的计算
MD Hbond	轨迹氢键分析工具
MD Distance	轨迹距离分析工具，输出指定原子、残基之间动态距离变化
MD Clustering	轨迹聚类分析工具
MD PCA	轨迹主成分分析工具
MD Gyration	回旋半径分析工具
MD SASA	计算指定组别的溶剂可及表面积

教程

分子动力学介绍文档

Trajectory analysis involves structural analysis of the trajectories generated from molecular dynamics simulations to observe the dynamic changes of the study object during the simulation process.

Module/Workflow Name	Description
MD Trajectory	Extracts trajectories from equilibrium simulations based on start frame, end frame, and interval frame, and converts them to GRO or PDB format files
MD RMS	System structure stability analysis module, including calculations of RMSD and RMSF
MD Hbond	Trajectory hydrogen bond analysis tool
MD Distance	Trajectory distance analysis tool, outputs dynamic distance changes between specified atoms or residues
MD Clustering	Trajectory clustering analysis tool
MD PCA	Trajectory principal component analysis tool
MD Gyration	Radius of gyration analysis tool
MD SASA	Calculates the solvent accessible surface area of specified groups
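
As an illustration of the kind of quantity these trajectory tools report, the sketch below computes a mass-weighted radius of gyration for one frame. It is a minimal, assumption-laden example and not the MD Gyration module's code: the function name, the input layout (parallel lists of coordinates and masses), and the toy two-atom frame are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration of one frame, in the input length unit."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and equally long")
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    )
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy frame: two equal-mass atoms placed symmetrically about the origin.
frame = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(radius_of_gyration(frame, masses=[12.0, 12.0]))
```

Applying the same function to every extracted frame gives a time series whose drift or stability indicates whether the molecule is expanding, compacting, or holding its overall shape during the simulation.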

Tutorial

Molecular Dynamics Introduction Documentation

结合自由能

Binding Free Energy

结合自由能计算是预测分子间结合强弱的重要方法。

模块/流程名称	描述
MMPBSA	计算受体与配体之间的结合自由能，并且提供能量分解数据等数据
Alanine Scan (MMPBSA)	计算丙氨酸突变后的结合自由能，并且提供能量分解数据
MMPBSA of One Protein/DNA Structure	计算一帧蛋白-蛋白复合物/蛋白-核酸复合物结构的结合自由能流程
MMPBSA of One Protein-Ligand Structure	计算一帧蛋白-小分子结构的结合自由能流程
PPI Binding Energy (Graphomer)	蛋白-蛋白复合物结合能模块，基于图transformer模型预测蛋白-蛋白结合亲和力
PPI Binding Energy & Contacts	蛋白-蛋白复合物结合能与相互作用分析模块，基于界面接触特征预测蛋白-蛋白结合亲和力

教程

分子动力学介绍文档

Binding free energy calculation is an important method for predicting how strongly molecules bind to each other.

Module/Workflow Name	Description
MMPBSA	Calculates the binding free energy between receptor and ligand, and provides energy decomposition data
Alanine Scan (MMPBSA)	Calculates the binding free energy after alanine mutation, and provides energy decomposition data
MMPBSA of One Protein/DNA Structure	Workflow for calculating the binding free energy of a single protein-protein or protein-nucleic acid complex structure
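MMPBSA of One Protein-Ligand Structure	Workflow for calculating the binding free energy of a single protein-small molecule structure
PPI Binding Energy (Graphomer)	Protein-protein complex binding energy module; predicts protein-protein binding affinity with a graph transformer model
PPI Binding Energy & Contacts	Protein-protein complex binding energy and interaction analysis module; predicts protein-protein binding affinity from interface contact features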

Tutorial

Molecular Dynamics Introduction Documentation
