Title: | Maximum Likelihood Estimation of Relatedness using EM Algorithm |
---|---|
Description: | Inference of relatedness coefficients from a bi-allelic genotype matrix using a Maximum Likelihood estimation, Laporte, F., Charcosset, A. and Mary-Huard, T. (2017) <doi:10.1111/biom.12634>. |
Authors: | Fabien Laporte, Tristan Mary-Huard |
Maintainer: | Fabien Laporte <[email protected]> |
License: | AGPL-3 |
Version: | 2.0 |
Built: | 2025-02-22 05:39:56 UTC |
Source: | https://github.com/cran/Relatedness |
This package infers the relatedness distribution coefficients for all pairs of individuals in a set from their genotypes, provided in a bi-allelic genotype matrix. The main function is 'RelCoef', which infers those coefficients. Its arguments are a genotype matrix for the individuals and a frequency matrix that gives the allelic frequency at each marker in each population. Alternatively, a parental genotype matrix and a crossing matrix can be used. Additional information about structure membership can also be provided via a ParentPop vector (for more details see the help of 'RelCoef'). The core computation is written in C, so make sure compiled code can be run on your system.
    require('Relatedness')
    data(Genotype)
    data(Frequencies)
    data(Cross)
    RelCoef(IndividualGenom=matrix(0,ncol=0,nrow=0), ParentalLineGenom=Genotype,
            Freq=Frequencies, Crossing=Cross, ParentPop=rep(1,20),
            Phased=TRUE, NbCores=2)
The crossing matrix for the example.
data("Cross")
The format is: int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
    data(Cross)
    head(Cross)
A list of relatedness coefficients obtained with the example of RelCoef.
data("Delta")
The format is: List of 15
    $ Delta1 : num [1:4, 1:4] 0.1872 0 0 0 0.0199 ...
    $ Delta2 : num [1:4, 1:4] 0.00 0.00 0.00 0.00 3.55e-05 ...
    $ Delta3 : num [1:4, 1:4] 0 0 0 0 0.0472 ...
    $ Delta4 : num [1:4, 1:4] 0 0 0 0 0.0322 ...
    $ Delta5 : num [1:4, 1:4] 0 0 0 0 0.0871 ...
    $ Delta6 : num [1:4, 1:4] 0 0 0 0 0.0395 ...
    $ Delta7 : num [1:4, 1:4] 0 0 0 0 0.0429 ...
    $ Delta8 : num [1:4, 1:4] 0 0 0 0 0.0386 ...
    $ Delta9 : num [1:4, 1:4] 0.8128 0 0 0 0.0202 ...
    $ Delta10: num [1:4, 1:4] 0 0 0 0 0.000731 ...
    $ Delta11: num [1:4, 1:4] 0 0 0 0 0.028 ...
    $ Delta12: num [1:4, 1:4] 0 0 0 0 0.0849 ...
    $ Delta13: num [1:4, 1:4] 0 0 0 0 0.0174 ...
    $ Delta14: num [1:4, 1:4] 0 0 0 0 0.0437 ...
    $ Delta15: num [1:4, 1:4] 0 0 0 0 0.498 ...
    data(Delta)
    print(Delta$Delta7)
The allele frequencies matrix for the example with 5000 markers and one population.
data("Frequencies")
The format is: num [1:5000, 1:2] 0.268 0.786 0.804 0.238 0.235 ...
    data(Frequencies)
    head(Frequencies)
The parental line genotype matrix (ParentalLineGenom) for the example, with 10 parental lines genotyped at 5000 markers.
data("Genotype")
The format is: num [1:5000, 1:10] 0 1 0 0 0 1 1 0 1 1 ...
    data(Genotype)
    head(Genotype)
This function performs Maximum Likelihood estimation for the relatedness coefficients between individuals based on a bi-allelic genotype matrix. Alternatively, a parental genotype matrix and a crossing matrix can be used. In that case information about structure can also be taken into account via a ParentPop vector.
    RelCoef(IndividualGenom = matrix(0, nrow=0, ncol=0),
            ParentalLineGenom = matrix(0, nrow=0, ncol=0),
            Freq = matrix(0, nrow=0, ncol=0),
            Crossing = matrix(0, nrow=0, ncol=0),
            ParentPop = rep(0,0), Combination = list(),
            Phased = FALSE, Details = FALSE,
            NbInit = 5, Prec = 10^(-4), NbCores = NULL)
Argument | Description |
---|---|
IndividualGenom | Genotype matrix of individuals. Each individual is described by 2 columns. Each row corresponds to a marker. Entries of matrix IndividualGenom should be either 0 or 1. Either IndividualGenom or ParentalLineGenom has to be provided. |
ParentalLineGenom | Genotype matrix of parental lines. Each parental line is described by one column, with rows corresponding to markers. Entries of ParentalLineGenom should be either 0 or 1. |
Freq | Allelic frequencies for allele 1 at each marker and for all populations (one column per population, one row per marker). |
Crossing | Required when argument ParentalLineGenom is provided. A 2-column matrix where each row corresponds to a crossing between 2 parents. Parents should be numbered according to their order of appearance in the ParentalLineGenom matrix. |
ParentPop | Only available if ParentalLineGenom is provided. A vector of numbers corresponding to population membership for the parental lines. |
Combination | If provided, a list of vectors, each with two components. The jth vector contains the number of the first hybrid and the number of the second hybrid of the jth pair to study. |
Phased | A Boolean with value TRUE if observations are phased. |
Details | A Boolean variable. If TRUE, the relatedness mode graph is displayed. |
NbInit | Number of initial values for the EM algorithm. |
Prec | Convergence precision parameter for the EM algorithm. |
NbCores | Number of cores used by the algorithm (default is the number of cores available minus one). Only available for Linux and Mac. |
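The shapes described for IndividualGenom, Freq and Combination can be illustrated with a small base-R sketch. The data are simulated and purely hypothetical, the frequency values are arbitrary, and the helper `cols_of` assumes that the two columns of an individual are adjacent, which the argument description does not state explicitly. No package function is called here.

```r
set.seed(1)
n_markers <- 6
n_ind <- 3

# Hypothetical allele-1 frequencies: one row per marker, one column per
# population (a single population here).
Freq <- matrix(runif(n_markers, min = 0.1, max = 0.9), ncol = 1)

# Two columns per individual, one row per marker, entries 0 or 1.
# prob is recycled down each column, so row m is drawn with Freq[m, 1].
IndividualGenom <- matrix(
  rbinom(n_markers * 2 * n_ind, size = 1, prob = Freq[, 1]),
  nrow = n_markers, ncol = 2 * n_ind
)

# Assumption: individual i occupies columns 2*i - 1 and 2*i.
cols_of <- function(i) c(2 * i - 1, 2 * i)

# One length-2 vector per pair of individuals to study.
Combination <- list(c(1, 2), c(1, 3))

# Basic shape checks that mirror the argument descriptions.
stopifnot(
  all(IndividualGenom %in% c(0, 1)),
  ncol(IndividualGenom) == 2 * n_ind,
  nrow(Freq) == nrow(IndividualGenom),
  all(lengths(Combination) == 2)
)
```

Objects built this way have the layout expected by the corresponding arguments; whether they are meaningful input depends on the real data.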
Argument IndividualGenom should be used if the available data consist of genotypic information only. By default the data are assumed to be unphased and the function returns 9 relatedness coefficients. If data are phased, use argument Phased = TRUE to obtain the 15 relatedness coefficients. Note that in that case the ordering of the 2 columns per individual in IndividualGenom does matter. Alternatively, if the genotyped individuals are hybrids resulting from the crossing of parental lines (or combinations of parental gametes), it is possible to provide a ParentalLineGenom and a Crossing matrix directly. Additionally, the population membership of the parents can be provided via argument ParentPop. Whatever the arguments used to enter the genotypic data, the allelic frequencies of the markers have to be provided using argument Freq. Arguments NbInit and Prec are tuning parameters for the EM algorithm used for likelihood maximization.
By default, relatedness coefficients are returned for all pairs of genotyped individuals (or hybrids). In that case the function returns a list of matrices, each corresponding to a specific relatedness coefficient (details about relatedness coefficients can be obtained by displaying the relatedness mode graph with argument Details). Element (i,j) of matrix k corresponds to the kth estimated relatedness coefficient for the pair of individuals i and j. Alternatively, if a list of pairs is specified with argument Combination, the function returns a list of vectors (each vector corresponding to a relatedness coefficient). In that case element i of vector k corresponds to the kth relatedness coefficient of the ith pair specified in Combination.
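To make the indexing of the default return value concrete, here is a sketch on a mock object. The list below is not output of RelCoef: it is filled with random values, rescaled so that the coefficients of each pair sum to one, and only mimics the documented structure (a list of matrices named Delta1, Delta2, ..., as in the Delta data set).

```r
set.seed(2)
n_ind <- 3
n_coef <- 9  # number of coefficients reported for unphased data

# Mock values: random numbers rescaled so that, for each pair (i, j),
# the n_coef coefficients sum to 1.
raw <- array(runif(n_ind * n_ind * n_coef), dim = c(n_ind, n_ind, n_coef))
totals <- apply(raw, c(1, 2), sum)
Delta <- lapply(seq_len(n_coef), function(k) raw[, , k] / totals)
names(Delta) <- paste0("Delta", seq_len(n_coef))

# Element (i, j) of matrix k: the kth coefficient for individuals i and j.
third_coef_pair_12 <- Delta$Delta3[1, 2]

# All coefficients of one pair, collected across the list.
pair_12 <- sapply(Delta, function(m) m[1, 2])
```

Here `pair_12` is a named numeric vector with one entry per coefficient, which is often a more convenient view when a single pair is of interest.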
In the absence of population structure, some relatedness coefficients are not identifiable. Since an EM algorithm is run for each pair of individuals, the procedure can be time-consuming for large panels.
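Because one EM run is needed per pair, the number of runs grows quadratically with the panel size. The sketch below counts the distinct pairs for a hypothetical panel of 200 individuals and builds a Combination list restricted to a few individuals of interest; the identifiers in `subset_ids` are arbitrary.

```r
# Number of distinct pairs among 200 individuals.
n_ind <- 200
n_pairs <- choose(n_ind, 2)

# Restrict the analysis to a handful of individuals (hypothetical indices).
subset_ids <- c(1, 5, 9, 12)

# combn() with simplify = FALSE returns a list of length-2 vectors,
# which is the layout described for the Combination argument.
Combination <- combn(subset_ids, 2, simplify = FALSE)
```

Passing such a list through Combination limits the computation to the listed pairs instead of the whole panel.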
Fabien Laporte, 'UMR Genetique Quantitative et Evolution' INRA France.
    require('Relatedness')
    data(Genotype)
    data(Frequencies)
    data(Cross)
    RelatednessCoefficient <- RelCoef(IndividualGenom=matrix(0,ncol=0,nrow=0),
                                      ParentalLineGenom=Genotype,
                                      Freq=Frequencies, Crossing=Cross,
                                      ParentPop=rep(1,8), Phased=TRUE, NbCores=2)
    print(RelatednessCoefficient$Delta3)
This function performs Maximum Likelihood estimation for the relatedness coefficients between lines based on a bi-allelic genotype matrix.
    RelCoefLine(LineGenom = matrix(0,nrow=0,ncol=0),
                Freq = matrix(0,nrow=0,ncol=0),
                LinePop = rep(0,0), Combination = NULL,
                NbInit = 5, Prec = 10^(-4), NbCores = NULL)
Argument | Description |
---|---|
LineGenom | Genotype matrix of lines. Each line is described by 1 column. Each row corresponds to a marker. Entries of matrix LineGenom should be either 0 or 1. |
Freq | Allelic frequencies for allele 1 at each marker and for all populations (one column per population, one row per marker). |
LinePop | A vector of numbers corresponding to population membership for the lines. |
Combination | If provided, a list of vectors, each with two components. The jth vector contains the number of the first line and the number of the second line of the jth pair to study. |
NbInit | Number of initial values for the EM algorithm. |
Prec | Convergence precision parameter for the EM algorithm. |
NbCores | Number of cores used by the algorithm (default is the number of cores available minus one). Only available for Linux and Mac. |
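As an illustration of how LineGenom, Freq and LinePop fit together, the following base-R sketch simulates a toy data set with two populations. All values are hypothetical, and the sketch assumes that the numbers in LinePop index the columns of Freq, which the argument descriptions suggest but do not state outright.

```r
set.seed(3)
n_markers <- 8
n_pop <- 2

# Hypothetical allele-1 frequencies: one row per marker, one column per population.
Freq <- matrix(runif(n_markers * n_pop, min = 0.1, max = 0.9),
               nrow = n_markers, ncol = n_pop)

# Population membership of each line (illustrative values).
LinePop <- c(1, 1, 2, 2)

# One column per line: each line carries allele 1 at a marker with the
# frequency of its own population (assumes LinePop indexes columns of Freq).
LineGenom <- sapply(LinePop, function(p) {
  rbinom(n_markers, size = 1, prob = Freq[, p])
})

# Shape checks that mirror the argument descriptions.
stopifnot(
  all(LineGenom %in% c(0, 1)),
  nrow(LineGenom) == nrow(Freq),
  length(LinePop) == ncol(LineGenom),
  max(LinePop) <= ncol(Freq)
)
```

Checks of this kind are cheap and catch mismatched dimensions before a long estimation run is started.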
By default, relatedness coefficients are returned for all pairs of genotyped lines. In that case the function returns a matrix corresponding to the Simple Relatedness Coefficient, i.e. for each pair of lines, the probability that the two lines are related. Element (i,j) of the matrix corresponds to the estimated relatedness coefficient for the pair of lines i and j. Alternatively, if a list of pairs is specified with argument Combination, the function returns a list of coefficients, one per pair. In that case element i of the list corresponds to the estimated relatedness coefficient of the ith pair specified in Combination.
Fabien Laporte, 'UMR Genetique Quantitative et Evolution' INRA France.
    require('Relatedness')
    data(Genotype)
    data(Frequencies)
    data(Cross)
    SimpleRelatedness <- RelCoefLine(LineGenom=Genotype, Freq=Frequencies,
                                     LinePop=rep(1,8), NbCores=2)
    print(SimpleRelatedness)
Compute any relatedness synthetic criterion based on a linear combination of the relatedness coefficients.
    RelComb(Combination, Delta,
            Crossing = matrix(0, nrow = 0, ncol = 0),
            ParentPop = rep(0, 0), ShowIdentifiable = TRUE)
Argument | Description |
---|---|
Combination | A vector, with length identical to the length of |
Delta | A list of matrices, each corresponding to a specific relatedness coefficient. Element (i,j) of matrix k corresponds to the kth estimated relatedness coefficient for the pair of individuals i and j. |
Crossing | A 2-column matrix where each row corresponds to a crossing between 2 parents. Parents should be numbered. |
ParentPop | A vector of numbers corresponding to population membership for the parental lines. |
ShowIdentifiable | A Boolean describing whether the combination should be displayed only for identifiable cases. Default value is TRUE. |
The function can be applied to a list of relatedness coefficients, as produced by the RelCoef function, to compute any synthetic criterion based on a linear combination of the relatedness coefficients, for all pairs. The additional arguments Crossing and ParentPop are required if they were used in the RelCoef function to obtain the relatedness coefficients. The function automatically checks the identifiability of the combination to be evaluated. Several classical genetic criteria are implemented by default and can be computed using the Csuros argument. Alternatively, the user can provide a vector of coefficients to be applied through the Combination argument.
If identifiability is satisfied for all pairs of individuals, all criteria are computed and returned in a matrix. If identifiability is not guaranteed for all pairs, the function returns a matrix with NA entries for the potentially non-estimable pairs. This default behavior can be bypassed if required by setting ShowIdentifiable to FALSE.
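The idea of a linear combination of relatedness coefficients with NA entries for non-estimable pairs can be sketched in base R. This is not the internal code of RelComb: the three matrices, the weights and the `identifiable` mask below are all made up for illustration, and the real identifiability check is performed by the package.

```r
# Mock list of three coefficient matrices for two individuals.
Delta <- list(
  Delta1 = matrix(c(0.2, 0.1, 0.1, 0.3), nrow = 2),
  Delta2 = matrix(c(0.4, 0.2, 0.2, 0.2), nrow = 2),
  Delta3 = matrix(c(0.4, 0.7, 0.7, 0.5), nrow = 2)
)

# Hypothetical weights, one per coefficient.
weights <- c(1, 0.5, 0)

# Weighted sum of the matrices: entry (i, j) is sum_k weights[k] * Delta[[k]][i, j].
criterion <- Reduce(`+`, Map(function(w, m) w * m, weights, Delta))

# Hypothetical mask: TRUE where the combination is taken to be identifiable.
identifiable <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)

# Mimic the documented default: NA for potentially non-estimable pairs.
masked <- criterion
masked[!identifiable] <- NA
```

Keeping both `criterion` and `masked` parallels the choice offered by ShowIdentifiable between the raw combination and the version with NA entries.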
Fabien Laporte, 'UMR Genetique Quantitative et Evolution' INRA France.
Csuros M (2014) Non-identifiability of identity coefficients at biallelic loci. Theoretical Population Biology 92: 22-24.
    require('Relatedness')
    data(Delta)
    RelatednessComb <- RelComb(Combination='simple relatedness', Delta,
                               ShowIdentifiable = TRUE)
    print(RelatednessComb)