<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "https://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.1" specific-use="sps-1.9" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">rbz</journal-id>
			<journal-title-group>
				<journal-title>Revista Brasileira de Zootecnia</journal-title>
				<abbrev-journal-title abbrev-type="publisher">R. Bras. Zootec.</abbrev-journal-title>
			</journal-title-group>
			<issn pub-type="ppub">1516-3598</issn>
			<issn pub-type="epub">1806-9290</issn>
			<publisher>
				<publisher-name>Sociedade Brasileira de Zootecnia</publisher-name>
			</publisher>
		</journal-meta>
		<article-meta>
			<article-id pub-id-type="other">00404</article-id>
			<article-id pub-id-type="doi">10.37496/rbz5120210131</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Breeding and genetics</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>pedSimulate – An R package for simulating pedigree, genetic merit, phenotype, and genotype data</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<contrib-id contrib-id-type="orcid">0000-0003-0339-5442</contrib-id>
					<name>
						<surname>Nilforooshan</surname>
						<given-names>Mohammad Ali</given-names>
					</name>
					<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
					<xref ref-type="corresp" rid="c01"><sup>*</sup></xref>
				</contrib>
			</contrib-group>
			<aff id="aff1">
				<label>1</label>
				<institution content-type="orgname">Livestock Improvement Corporation</institution>
				<addr-line>
					<named-content content-type="city">Newstead</named-content>
				</addr-line>
				<country country="NZ">New Zealand</country>
				<institution content-type="original">Livestock Improvement Corporation, Newstead, New Zealand.</institution>
			</aff>
			<author-notes>
				<corresp id="c01">
					<label>*</label> Corresponding author: <email>mohammad.nilforooshan@lic.co.nz</email>
				</corresp>
				<fn fn-type="conflict">
					<p>Conflict of Interest</p>
					<p>The author declares no conflict of interest.</p>
				</fn>
				<fn fn-type="con">
					<p>Author Contributions</p>
					<p>Conceptualization: M.A. Nilforooshan. Data curation: M.A. Nilforooshan. Formal analysis: M.A. Nilforooshan. Investigation: M.A. Nilforooshan. Methodology: M.A. Nilforooshan. Resources: M.A. Nilforooshan. Software: M.A. Nilforooshan. Writing-original draft: M.A. Nilforooshan. Writing-review &amp; editing: M.A. Nilforooshan.</p>
				</fn>
			</author-notes>
			<pub-date date-type="pub" publication-format="electronic">
				<day>23</day>
				<month>09</month>
				<year>2022</year>
			</pub-date>
			<pub-date date-type="collection" publication-format="electronic">
				<year>2022</year>
			</pub-date>
			<volume>51</volume>
			<elocation-id>e20210131</elocation-id>
			<history>
				<date date-type="received">
					<day>8</day>
					<month>07</month>
					<year>2021</year>
				</date>
				<date date-type="accepted">
					<day>28</day>
					<month>05</month>
					<year>2022</year>
				</date>
			</history>
			<permissions>
				<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/" xml:lang="en">
					<license-p> This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. </license-p>
				</license>
			</permissions>
			<abstract>
				<title>ABSTRACT</title>
				<p>This study aimed to introduce R package pedSimulate, which was built to simulate pedigree, genetic merit, phenotype, and genotype data. These are amongst the most important data types that animal breeders and quantitative geneticists deal with. Twenty pedigrees with ten generations were simulated applying different combinations of three parameters: genetic variance (10 vs. 20), proportion of males selected (10 vs. 20%), and the pattern for selecting females (random, positively, or negatively based on own phenotype or parent average). Males were selected positively based on parent average. Consequently, assortative mating was applied to the pedigrees in which females were positively selected based on their own phenotype or parent average. Disassortative mating was applied to the pedigrees in which females were selected negatively based on phenotype or parent averages. Genetic gain and response to selection over generations were positive for all the pedigrees due to high selection intensity on males, mating each male with multiple females, and moderate to high heritability (0.25 and 0.40 for genetic variances 10 and 20, and the residual variance of 30). Genetic variance showed a slightly increasing trend over generations by assortative mating and lower selection intensity on males. Selection intensity on females was the same in all the pedigrees. This study provided examples of how R package pedSimulate can be adopted for pedigree, genetic merit, phenotype, and genotype data simulation in animal breeding studies. By using different functions and combining different parameters for their arguments, many scenarios can be simulated by R package pedSimulate.</p>
			</abstract>
			<kwd-group xml:lang="en">
				<kwd>assortative</kwd>
				<kwd>disassortative</kwd>
				<kwd>random</kwd>
				<kwd>selection</kwd>
				<kwd>simulation</kwd>
			</kwd-group>
			<counts>
				<fig-count count="2"/>
				<table-count count="2"/>
				<equation-count count="0"/>
				<ref-count count="13"/>
			</counts>
		</article-meta>
	</front>
	<body>
		<sec sec-type="intro">
			<title>1. Introduction</title>
			<p>When dealing with real-world problems, using real data is preferred. However, due to data ownership and regulations, researchers do not always have access to these data. On the other hand, testing specific hypotheses requires specific data, usually simulated. The reason for this is that, unlike real data, which is influenced by many effects, many of them uncontrolled and unknown, simulated data provide a controlled environment suitable for testing the hypothesis.</p>
			<p>Pedigree and phenotype data are critical information in animal breeding and genetics. For decades, this information has been used to estimate genetic merit of animals using the BLUP methodology (<xref ref-type="bibr" rid="B6">Henderson et al., 1959</xref>). In the last decade, the livestock breeding industry has been revolutionized by the use of genomic information. New methods have incorporated pedigree, phenotype, and genotype information in genetic evaluations (e.g., <xref ref-type="bibr" rid="B1">Aguilar et al., 2010</xref>; <xref ref-type="bibr" rid="B3">Fernando et al., 2014</xref>).</p>
			<p>Only a limited number of free and open-source software packages simulate pedigree, phenotype, and genotype data. The existing packages were designed and developed for different purposes and are not readily adaptable to the needs of general users. Consequently, each package has its specific functionalities and, depending on what the user needs to achieve, one is preferred over another. Some of the packages that are published or available in an official public repository are: GenoSim (<xref ref-type="bibr" rid="B7">Jorjani, 2009</xref>), written in Fortran90/95; PedigreeSim (<xref ref-type="bibr" rid="B12">Voorrips and Maliepaard, 2012</xref>), written in Java; and R packages SimRVPedigree (<xref ref-type="bibr" rid="B9">Nieuwoudt and Graham, 2020</xref>) and AlphaSimR (<xref ref-type="bibr" rid="B4">Gaynor et al., 2021</xref>).</p>
			<p>R package pedSimulate (<xref ref-type="bibr" rid="B10">Nilforooshan, 2022a</xref>) is a convenient tool for simulating pedigree, genetic merit, phenotype, and genotype data. It provides various parameters (function arguments) for tuning the simulation, including the population size of the founder animals, additive genetic and residual variances, litter size, mortality rate, generation overlap for males and females, selection intensity for males and females, selection criteria for males and females, and allele frequency and mutation rate at each genetic marker.</p>
			<p>The objective of this study was to introduce R package pedSimulate and its functionalities. Examples of random, different assortative, and disassortative matings, different selection patterns, and various selection intensities for males and females were presented, and the results of the different scenarios were compared. Functions for genotype simulation and tracing close matings (full-sibs, half-sibs, and parent-progeny) in the population were presented with small examples.</p>
		</sec>
		<sec sec-type="materials|methods">
			<title>2. Material and Methods</title>
			<sec>
				<title>2.1. Simulated data</title>
				<p>R package pedSimulate (<xref ref-type="bibr" rid="B10">Nilforooshan, 2022a</xref>) was used in this study. The primary purpose of R package pedSimulate is simulating pedigree, (true) genetic merit, phenotype, and genotype data. It is equipped with seven functions. The function “simulatePed” is used for non-genotype data simulation. It starts with a base population (F0) of an even, user-defined (UD) size, which is equally divided into males and females. No premature mortality, selection, or non-random mating is imposed on F0. Users can define the number of generations to be simulated after F0. Other UD parameters are the genetic and residual variances (V<sub>A</sub>, V<sub>E</sub>) and litter size, which is equal to 1 by default. The F0 population produces the F1 generation. The remaining UD/default parameters are imposed from F1 onwards (default values are used unless changed by the user, as described in the package user manual (<xref ref-type="bibr" rid="B10">Nilforooshan, 2022a</xref>)). Mortality is imposed in two stages, pre- and post-maturity. An individual that dies before maturity has no chance of reproduction. Post-maturity mortality is imposed via generation overlap, which can be set to different values for the two sexes. After imposing mortality (premature mortality and generation overlaps for males and females), the available males and females (selection candidates) undergo different patterns (random and non-random) and rates of selection. Then, the selected males and females are mated either in the same order in which they were selected or in a different order (e.g., random). Different combinations of selection and mating patterns produce various forms of random, assortative, and disassortative mating.</p>
				<p>The simulated pedigree has nine columns: Animal ID, Sire ID, Dam ID, Sex, Generation, Parent Average (PA), Mendelian Sampling (MS), Environmental plus residual effects (E), and Phenotype (P). The PA of an individual is the average of its parents’ true genetic merit, and the true genetic merit (TBV) of an individual equals PA + MS. As F0 individuals have no parent information, their PA is considered zero. For F0, the MS term is drawn from an N(0, IV<sub>A</sub>) distribution rather than an N(0, ½IV<sub>A</sub>) distribution due to V<sub>PA</sub> = 0. The MS variance is half the additive genetic variance in the base population (V<sub>MS</sub> = ½V<sub>A</sub>; <xref ref-type="bibr" rid="B2">Bijma and Rutten, 2002</xref>). The E term is drawn from an N(0, IV<sub>E</sub>) distribution, in which V<sub>E</sub> is assumed to be constant across generations. The phenotype (P) of an individual is calculated in two steps: P = PA + MS + E and P = P + <italic>μ</italic>, in which <italic>μ</italic> = –2Min(P). Users can rebase and rescale phenotypes to the desired mean (<italic>μ</italic><sub>Ṕ</sub>) and standard deviation (<italic>σ</italic><sub>Ṕ</sub>) using the conversion formula Ṕ = ((P – <italic>μ</italic><sub>P</sub>)<italic>σ</italic><sub>Ṕ</sub>/<italic>σ</italic><sub>P</sub>) + <italic>μ</italic><sub>Ṕ</sub>.</p>
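				<p>The two-step phenotype calculation and the rescaling formula above can be illustrated with a minimal sketch, written here in Python rather than taken from the package. The function names (simulate_phenotypes, rescale), the target mean and standard deviation (100 and 15), the seed, and the use of the sample standard deviation for <italic>σ</italic><sub>P</sub> are assumptions made only for this example.</p>
				<p>
					<code language="python"><![CDATA[
```python
import random
import statistics


def simulate_phenotypes(pa, ms, ve, rng):
    """Step 1: P = PA + MS + E. Step 2: P = P + mu, with mu = -2 * min(P)."""
    sd_e = ve ** 0.5
    raw = [a + m + rng.gauss(0.0, sd_e) for a, m in zip(pa, ms)]
    mu = -2.0 * min(raw)
    return [p + mu for p in raw]


def rescale(p, new_mean, new_sd):
    """Rebase and rescale: P' = (P - mean_P) * new_sd / sd_P + new_mean."""
    mean_p = statistics.fmean(p)
    sd_p = statistics.stdev(p)  # assumption: sample standard deviation
    return [(x - mean_p) * new_sd / sd_p + new_mean for x in p]


rng = random.Random(1)      # arbitrary seed, for reproducibility only
n = 100                     # F0 size used in this study
va, ve = 10.0, 30.0         # variances used in this study
pa = [0.0] * n              # F0 animals have PA = 0
ms = [rng.gauss(0.0, va ** 0.5) for _ in range(n)]  # F0: MS ~ N(0, V_A)

p = simulate_phenotypes(pa, ms, ve, rng)
p_new = rescale(p, new_mean=100.0, new_sd=15.0)  # hypothetical targets

print(round(statistics.fmean(p_new), 6), round(statistics.stdev(p_new), 6))
```
]]></code>
				</p>
				<p>Because <italic>μ</italic> = –2Min(P), the smallest shifted phenotype equals –Min(P), which is positive whenever the smallest unshifted phenotype is negative.</p>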
				<p>As the simulatePed function does not include genotype information, the MS term is simply sampled from a normal distribution. A better practice of determining MS, rather than simulating it, is calculating it via simulated QTL and linkage disequilibrium (e.g., <xref ref-type="bibr" rid="B8">Jorjani et al., 1997</xref>). This way, the MS term is a direct function of the sampled alleles in the gametic phase and their effect.</p>
				<p>Five selection (ordering) and mating patterns [random, positively or negatively based on P (P, −P), positively or negatively based on PA (PA, −PA)] are considered, which can differ or be the same between males and females and between selection and mating. The PA and −PA options are considered for simulating selection in favor or against a trait early in life, before own or progeny phenotypes are observed.</p>
				<p>Twenty pedigrees were simulated (<xref ref-type="bibr" rid="B11">Nilforooshan, 2022b</xref>) with different combinations of V<sub>A</sub> (10 and 20), proportion of males selected (10% and 20%), and selection patterns of females (random, P, −P, PA, and −PA) (<xref ref-type="table" rid="t1">Table 1</xref>). Other parameters were the same across the pedigrees. No inputs were provided to the arguments defining the mating order of females and males (“f.order” and “m.order”). Consequently, by default, the mating order was the same as the selection order. For each pedigree, nine generations were simulated following a simulated F0 of 100 individuals, with litter size = 2, V<sub>E</sub> = 30, (premature) mortality rate = 0 (default), and a generation overlap of 1. In each generation, 80% of females were selected, and males were selected (10% or 20%) and ordered for mating based on PA.</p>
				<p>
					<table-wrap id="t1">
						<label>Table 1</label>
						<caption>
							<title>Differences in the parameters used to simulate the pedigrees (PED1-20)</title>
						</caption>
						<table frame="hsides" rules="groups">
							<colgroup width="20%">
								<col/>
								<col/>
								<col/>
								<col/>
								<col/>
							</colgroup>
							<thead>
								<tr>
									<th align="left" rowspan="3" style="font-weight:normal"><italic>fsel</italic></th>
									<th colspan="2" style="font-weight:normal">V<sub>A</sub> = 10</th>
									<th colspan="2" style="font-weight:normal">V<sub>A</sub> = 20</th>
								</tr>
								<tr>
									<th colspan="2" style="font-weight:normal">
										<hr/>
									</th>
									<th colspan="2" style="font-weight:normal">
										<hr/>
									</th>
								</tr>
								<tr>
									<th style="font-weight:normal"><italic>m%</italic> = 10</th>
									<th style="font-weight:normal"><italic>m%</italic> = 20</th>
									<th style="font-weight:normal"><italic>m%</italic> = 10</th>
									<th style="font-weight:normal"><italic>m%</italic> = 20</th>
								</tr>
							</thead>
							<tbody>
								<tr>
									<td>R</td>
									<td align="center">PED1</td>
									<td align="center">PED2</td>
									<td align="center">PED3</td>
									<td align="center">PED4</td>
								</tr>
								<tr>
									<td>P</td>
									<td align="center">PED5</td>
									<td align="center">PED6</td>
									<td align="center">PED7</td>
									<td align="center">PED8</td>
								</tr>
								<tr>
									<td>−P</td>
									<td align="center">PED9</td>
									<td align="center">PED10</td>
									<td align="center">PED11</td>
									<td align="center">PED12</td>
								</tr>
								<tr>
									<td>PA</td>
									<td align="center">PED13</td>
									<td align="center">PED14</td>
									<td align="center">PED15</td>
									<td align="center">PED16</td>
								</tr>
								<tr>
									<td>−PA</td>
									<td align="center">PED17</td>
									<td align="center">PED18</td>
									<td align="center">PED19</td>
									<td align="center">PED20</td>
								</tr>
							</tbody>
						</table>
						<table-wrap-foot>
							<fn id="TFN1">
								<p><italic>fsel</italic> - selection pattern for females; V<sub>A</sub> - additive genetic variance; <italic>m%</italic> - proportion of males selected; R - random; P - positively based on phenotype; −P - negatively based on phenotype; PA - positively based on parent average; −PA - negatively based on parent average.</p>
							</fn>
						</table-wrap-foot>
					</table-wrap>
				</p>
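				<p>The factorial layout of Table 1 and the heritabilities implied by the variances can be reproduced with a short sketch. It is written in Python for illustration and is not the code used in this study; the dictionary keys and the ASCII labels "-P" and "-PA" are assumptions of the example.</p>
				<p>
					<code language="python"><![CDATA[
```python
from itertools import product

FSEL = ("R", "P", "-P", "PA", "-PA")  # female selection patterns (Table 1 rows)
VA = (10, 20)                         # additive genetic variances
M_PCT = (10, 20)                      # percentage of males selected
VE = 30                               # residual variance, same for all pedigrees


def heritability(va, ve):
    """h2 = V_A / (V_A + V_E)."""
    return va / (va + ve)


# PED1..PED20 in the order of Table 1: rows are fsel, columns are (V_A, m%).
design = {
    f"PED{i}": {"fsel": fsel, "va": va, "m_pct": m, "h2": heritability(va, VE)}
    for i, (fsel, va, m) in enumerate(product(FSEL, VA, M_PCT), start=1)
}

for name in ("PED1", "PED15", "PED18"):
    print(name, design[name])
```
]]></code>
				</p>
				<p>With V<sub>E</sub> = 30, the function returns heritabilities of 0.25 and 0.40 for V<sub>A</sub> = 10 and 20, respectively.</p>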
				<p>The following R code was used to simulate pedigrees. The simulated pedigrees and the code to analyze the data are available in a public data repository (<xref ref-type="bibr" rid="B11">Nilforooshan, 2022b</xref>).</p>

				<p>
					<inline-formula><inline-graphic xlink:href="1806-9290-rbz-51-e20210131-ii01.tif"/></inline-formula>
				</p>
				<p>For a more realistic situation, the user may consider setting some pedigree and phenotype data to missing.</p>
			</sec>
			<sec>
				<title>2.2. Analyses</title>
				<p>The simulated pedigrees were analyzed for the trends of genetic gain, response to selection, and genetic variance. The variance of the true genetic merits was considered as an indicator of the genetic variance. It was calculated for different generations and pedigrees. The average (phenotypic) response to selection, average genetic gain, and slope of the trend in genetic variance over generations were calculated starting from F1 (i.e., response to selection from F1 to F2) because in all the pedigrees, the F1 generation resulted from random mating and no selection in F0.</p>
			</sec>
			<sec>
				<title>2.3. Package functions</title>
				<sec>
					<title>2.3.1. simulatePed</title>
					<p>This function was described in subsection 2.1. Should reproducible simulations be needed, this function contains the optional argument “seed”, which receives a numeric value as input.</p>
				</sec>
				<sec>
					<title>2.3.2. appendPed</title>
					<p>Researchers might be interested in simulating future generations of an existing pedigree. Function “appendPed” is similar to function “simulatePed”, with the difference that it starts from an existing pedigree. Simulation strategies are limitless, and no single software package can accommodate all the possibilities. If a researcher is interested in simulating selection and mating patterns not covered by the package, the simulation can be done generation by generation using function “appendPed” and applying the selection and mating patterns of interest. For example, the following R code reads the first pedigree (out of the 20 simulated pedigrees) and appends a generation to it with all the default options (litter size of 1, no premature mortality, no generation overlap, no selection, and random mating).</p>
					<p>
						<inline-formula><inline-graphic xlink:href="1806-9290-rbz-51-e20210131-ii02.tif"/></inline-formula>
					</p>
					<p>To simulate a pedigree with selection and mating patterns not provided by the package, such as selection and mating based on the estimated breeding value, the function “appendPed” should be used instead of function “simulatePed” by following these steps:</p>
					<list list-type="order">
						<list-item>
							<p>Start with an existing pedigree in the same format as a pedigree object created by the function “simulatePed” or “appendPed”.</p>
						</list-item>
						<list-item>
							<p>Calculate TBV = PA + MS.</p>
						</list-item>
						<list-item>
							<p>Estimate EBV using BLUP (no repeated records).</p>
						</list-item>
						<list-item>
							<p>Replace values in the PA column with the EBV values estimated in the previous step.</p>
						</list-item>
						<list-item>
							<p>Replace values in the MS column with TBV – EBV.</p>
						</list-item>
						<list-item>
							<p>Simulate a generation with selection based on PA.</p>
						</list-item>
						<list-item>
							<p>Repeat the steps above for as many generations as needed.</p>
						</list-item>
					</list>
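					<p>Steps 2, 4, and 5 above amount to a simple bookkeeping operation, sketched below in Python for illustration. The function name swap_in_ebv and the numeric values are hypothetical; in particular, the EBV values are placeholders and are not the result of a BLUP analysis, which is outside the scope of this sketch.</p>
					<p>
						<code language="python"><![CDATA[
```python
def swap_in_ebv(pa, ms, ebv):
    """Replace PA by EBV and MS by TBV - EBV, where TBV = PA + MS.

    After the swap, PA + MS still equals TBV, so true genetic merit is
    unchanged while selection "based on PA" acts on EBV.
    """
    tbv = [a + m for a, m in zip(pa, ms)]
    new_pa = list(ebv)
    new_ms = [t - e for t, e in zip(tbv, ebv)]
    return new_pa, new_ms


# Hypothetical values for three animals (not simulated by the package).
pa = [0.0, 1.5, -0.5]
ms = [2.0, -1.0, 0.25]
ebv = [1.2, 0.1, -0.4]  # placeholder EBV; a real analysis would use BLUP

new_pa, new_ms = swap_in_ebv(pa, ms, ebv)
print(new_pa, new_ms)
```
]]></code>
					</p>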
					<p>Should reproducible simulations be needed, this function contains the optional argument “seed”, which receives a numeric value as input.</p>
				</sec>
				<sec>
					<title>2.3.3. simulateGen</title>
					<p>This function simulates diploid and bi-allelic marker genotypes for a given pedigree. Allele frequencies and mutation rates at different marker loci are considered (default “mut.rate = 0” for no mutation). The function assumes that parents appear before progeny in the pedigree. For animals with both parents unknown, a genotype is simulated based on the UD allele frequencies. For animals with only one parent known, the following steps are taken: first, a gamete is simulated for the unknown parent; second, a gamete is produced from the genotype of the known parent; lastly, the gametes are combined into the genotype of the progeny. Mutation rate can be different from one marker to another, and it works by randomly changing an allele to its alternative. No linkage disequilibrium is modelled between markers. Should reproducible simulations be needed, this function contains the optional argument “seed”, which receives a numeric value as input.</p>
					<p>The following example simulates six marker genotypes with allele frequencies from 0.1 to 0.6 by 0.1 and no mutation for a pedigree of eight individuals. The first three columns of the pedigree (animal, sire, and dam, with no restriction on the column names) are used, and missing parents are coded 0.</p>
					<p>
						<inline-formula><inline-graphic xlink:href="1806-9290-rbz-51-e20210131-ii03.tif"/></inline-formula>
					</p>
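					<p>The gamete-based logic described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation in the package: alleles are coded 0 and 1, the allele frequency refers to the allele coded 1, mutation is applied only to gametes transmitted by known parents, and the four-animal pedigree is hypothetical.</p>
					<p>
						<code language="python"><![CDATA[
```python
import random


def founder_gamete(freqs, rng):
    """Sample one allele per marker from the allele frequencies."""
    return [1 if rng.random() < f else 0 for f in freqs]


def parent_gamete(genotype, rng):
    """Pick one of the two alleles at each marker, independently (no linkage)."""
    return [pair[rng.randrange(2)] for pair in genotype]


def mutate(gamete, mut_rate, rng):
    """Flip an allele to its alternative with the marker-specific rate."""
    return [1 - a if rng.random() < r else a for a, r in zip(gamete, mut_rate)]


def simulate_genotypes(ped, freqs, mut_rate, seed=None):
    """ped: (animal, sire, dam) rows, parents before progeny, 0 = unknown."""
    rng = random.Random(seed)
    geno = {}
    for animal, sire, dam in ped:
        gametes = []
        for parent in (sire, dam):
            if parent == 0:
                gametes.append(founder_gamete(freqs, rng))
            else:
                g = parent_gamete(geno[parent], rng)
                gametes.append(mutate(g, mut_rate, rng))  # assumption
        geno[animal] = list(zip(*gametes))  # (paternal, maternal) per marker
    return geno


def allele_counts(genotype):
    """Number of copies (0, 1, or 2) of the allele coded 1 at each marker."""
    return [a + b for a, b in genotype]


ped = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 3, 0)]  # hypothetical pedigree
freqs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]              # six markers, as in the text
geno = simulate_genotypes(ped, freqs, [0.0] * 6, seed=7)
for animal in geno:
    print(animal, allele_counts(geno[animal]))
```
]]></code>
					</p>
					<p>Animal 4 in this sketch has only one known parent, so one of its gametes is sampled from the allele frequencies and the other from the genotype of animal 3.</p>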
				</sec>
				<sec>
					<title>2.3.4. appendGen</title>
					<p>This function simulates genotypes for a pedigree appended to an existing pedigree that already has genotypes. For example, consider the sample pedigree with eight individuals and their genotypes from subsection 2.3.3. For an appended pedigree of individuals 9 and 10, in which 9 has both parents unknown, and 10 has 9 and 7 as sire and dam, genotypes are simulated for individuals 9-10 and appended to the genotypes of individuals 1-8.</p>
					<p>
						<inline-formula><inline-graphic xlink:href="1806-9290-rbz-51-e20210131-ii04.tif"/></inline-formula>
					</p>
					<p>Should reproducible simulations be needed, this function contains the optional argument “seed”, which receives a numeric value as input.</p>
					<p>Depending on the trait and the assumptions, users might need to simulate marker effects differently regarding the number of markers with major effects, the proportion of the phenotypic variance explained by the markers, and the distribution of marker effects. The package does not simulate marker effects. Where genomic selection is simulated, the user needs to assign (allele substitution) effects to the simulated markers. For genomic selection, generations undergoing genomic selection need to be simulated one at a time followed by selecting candidates based on their marker breeding value. Here, a genomic selection scenario is briefly explained.</p>
					<list list-type="order">
						<list-item>
							<p>Simulate the first generation(s), in which genomic selection is not practiced, using the “simulatePed” function. Any selection in these generations would translate, after simulating their genotypes and marker effects, to a form of genomic selection that is likely to be random.</p>
						</list-item>
						<list-item>
							<p>Simulate genotypes for the pedigree simulated in the previous step, using the “simulateGen” function.</p>
						</list-item>
						<list-item>
							<p>Simulate marker effects.</p>
						</list-item>
						<list-item>
							<p>Calculate marker breeding values (G) given genotypes and marker effects.</p>
						</list-item>
						<list-item>
							<p>If markers do not completely explain the genetic variance, simulate residual polygenic effect (Δ). Draw Δ values from a N(0, A(V<sub>A</sub> – V<sub>G</sub>)) distribution, in which A is the numerator relationship matrix, and <inline-formula>
									<mml:math>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mi>P</mml:mi>
										</mml:msub>
										<mml:mo>=</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mrow>
												<mml:mi>P</mml:mi>
												<mml:mi>A</mml:mi>
											</mml:mrow>
										</mml:msub>
										<mml:mo>+</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mrow>
												<mml:mi>M</mml:mi>
												<mml:mi>S</mml:mi>
											</mml:mrow>
										</mml:msub>
										<mml:mo>+</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mi>E</mml:mi>
										</mml:msub>
										<mml:mo>=</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mi>A</mml:mi>
										</mml:msub>
										<mml:mo>+</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mi>E</mml:mi>
										</mml:msub>
										<mml:mo>=</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mi>G</mml:mi>
										</mml:msub>
										<mml:mo>+</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mrow>
												<mml:mi>Δ</mml:mi>
											</mml:mrow>
										</mml:msub>
										<mml:mo>+</mml:mo>
										<mml:msub>
											<mml:mi>V</mml:mi>
											<mml:mi>E</mml:mi>
										</mml:msub>
									</mml:math>
								</inline-formula>.</p>
						</list-item>
						<list-item>
							<p>Replace values in PA and MS columns with G and Δ values, respectively (the definitions of PA and MS columns are changed).</p>
						</list-item>
						<list-item>
							<p>Re-calculate column P (P = PA + MS + E).</p>
						</list-item>
						<list-item>
							<p>Simulate a new generation using “appendPed” function followed by genomic selection on the selection candidates (selection on PA (containing G)).</p>
						</list-item>
						<list-item>
							<p>Simulate new genotypes for the new generation, using “appendGen” function.</p>
						</list-item>
						<list-item>
							<p>Repeat steps 4-7 for the new generation only.</p>
						</list-item>
						<list-item>
							<p>Repeat steps 8-10 for as many generations as desired.</p>
						</list-item>
						<list-item>
							<p>Add <italic>μ</italic> (e.g., –2Min(P)) to column P.</p>
						</list-item>
					</list>
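					<p>Steps 4 and 5 of the scenario above can be illustrated with a minimal Python sketch. It assumes that genotypes are coded as allele counts (0, 1, or 2) and are not centered, and that marker effects are allele substitution effects; the function names and the numeric values are hypothetical, and sampling Δ with the numerator relationship matrix is not shown.</p>
					<p>
						<code language="python"><![CDATA[
```python
def marker_breeding_values(genotypes, effects):
    """G for each animal: sum over markers of allele count times effect."""
    return [sum(g * a for g, a in zip(row, effects)) for row in genotypes]


def residual_polygenic_variance(va, vg):
    """V_delta = V_A - V_G, the variance not explained by the markers."""
    if vg > va:
        raise ValueError("marker variance cannot exceed the genetic variance")
    return va - vg


# Hypothetical genotypes (rows = animals, columns = markers) and effects.
genotypes = [
    [0, 1, 2],
    [2, 2, 0],
]
effects = [0.5, -1.0, 0.25]

g_values = marker_breeding_values(genotypes, effects)
v_delta = residual_polygenic_variance(va=10.0, vg=6.0)  # hypothetical V_G
print(g_values, v_delta)
```
]]></code>
					</p>
					<p>When the markers explain the whole genetic variance (V<sub>G</sub> = V<sub>A</sub>), the residual polygenic variance is zero and step 5 can be skipped.</p>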
				</sec>
				<sec>
					<title>2.3.5. Functions for finding close matings</title>
					<p>Given a pedigree data frame with three columns for animal, sire, and dam IDs, functions “fs_mate_finder”, “hs_mate_finder”, and “pp_mate_finder” report full-sib, half-sib, and parent-progeny matings in the pedigree. Considering the sample pedigree with eight individuals from subsection 2.3.3, the examples are:</p>
					<p>
						<inline-formula><inline-graphic xlink:href="1806-9290-rbz-51-e20210131-ii05.tif"/></inline-formula>
					</p>
				</sec>
			</sec>
		</sec>
		<sec sec-type="results">
			<title>3. Results</title>
			<p>The simulated pedigrees had an average size of 5,124.4 individuals. Pedigrees 11 and 2 had the largest (5,718) and the smallest (4,456) sizes, respectively. Except for the parents of F0, no information was missing.</p>
			<p>Pedigrees 15 and 18 showed the highest and the lowest slopes of genetic trend, respectively (<xref ref-type="fig" rid="f01">Figure 1</xref>). These pedigrees differed in the direction in which females were selected and ordered for mating based on PA (positively vs. negatively), genetic variance (20 vs. 10), and proportion of males selected (10% vs. 20%). Assortative mating, higher genetic variance, and higher selection intensity on males (same for females across all the pedigrees) increased the slope of the genetic trend. Selection based on PA was more effective than selection based on P. All the pedigrees showed a positive genetic trend due to the high-intensity selection of males, positively based on PA.</p>
			<p>
				<fig id="f01">
					<label>Figure 1</label>
					<caption>
						<title>Genetic trend over generations for the simulated pedigrees (PED1-20).</title>
					</caption>
					<graphic xlink:href="1806-9290-rbz-51-e20210131-gf01.tif"/>
					<attrib>Average genetic gain from generation 1 to 9 is given at the top-left corner of each plot.</attrib>
				</fig>
			</p>
			<p>The trends of genetic (true genetic merit) variance over generations are shown in <xref ref-type="fig" rid="f02">Figure 2</xref>, in which the slope of the trend from F1 to F9 is presented within each pertaining plot (matings were random in F0 for all pedigrees). Genetic variances for the pedigrees simulated with V<sub>A</sub> = 20 were almost twice as large as those for the pedigrees simulated with V<sub>A</sub> = 10. No clear trend was observed in response to the proportion of males selected and selection patterns in females. However, a lower rate of male selection and assortative matings showed tendencies toward increasing the genetic variance.</p>
			<p>
				<fig id="f02">
					<label>Figure 2</label>
					<caption>
						<title>Genetic (true breeding value) variance trends over generations for the simulated pedigrees (PED1-20).</title>
					</caption>
					<graphic xlink:href="1806-9290-rbz-51-e20210131-gf02.tif"/>
					<attrib>The slope of each trend from generation 1 to 9 is given at the top-left corner of each plot.</attrib>
				</fig>
			</p>
			<p>Generally, the patterns for the average response to selection over generations (starting from F1) (<xref ref-type="table" rid="t2">Table 2</xref>) were similar to those observed in <xref ref-type="fig" rid="f01">Figure 1</xref>. Higher genetic variance, higher selection intensity on males (same for females across all the pedigrees), and assortative mating increased the response to selection. Compared with assortative and random matings, response to selection was reduced by disassortative matings.</p>
			<p>
				<table-wrap id="t2">
					<label>Table 2</label>
					<caption>
						<title>Average response to selection over nine generations for the simulated pedigrees<sup>1</sup></title>
					</caption>
					<table frame="hsides" rules="groups">
						<colgroup width="20%">
							<col/>
							<col/>
							<col/>
							<col/>
							<col/>
						</colgroup>
						<thead>
							<tr>
								<th align="left" rowspan="3" style="font-weight:normal"><italic>fsel</italic></th>
								<th colspan="2" style="font-weight:normal">V<sub>A</sub> = 10</th>
								<th colspan="2" style="font-weight:normal">V<sub>A</sub> = 20</th>
							</tr>
							<tr>
								<th colspan="2" style="font-weight:normal">
									<hr/>
								</th>
								<th colspan="2" style="font-weight:normal">
									<hr/>
								</th>
							</tr>
							<tr>
								<th style="font-weight:normal"><italic>m%</italic> = 10</th>
								<th style="font-weight:normal"><italic>m%</italic> = 20</th>
								<th style="font-weight:normal"><italic>m%</italic> = 10</th>
								<th style="font-weight:normal"><italic>m%</italic> = 20</th>
							</tr>
						</thead>
						<tbody>
							<tr>
								<td>R</td>
								<td align="center">0.981</td>
								<td align="center">0.925</td>
								<td align="center">1.885</td>
								<td align="center">1.351</td>
							</tr>
							<tr>
								<td>P</td>
								<td align="center">1.734</td>
								<td align="center">1.520</td>
								<td align="center">2.490</td>
								<td align="center">1.992</td>
							</tr>
							<tr>
								<td>−P</td>
								<td align="center">0.750</td>
								<td align="center">0.753</td>
								<td align="center">1.149</td>
								<td align="center">1.001</td>
							</tr>
							<tr>
								<td>PA</td>
								<td align="center">1.654</td>
								<td align="center">1.339</td>
								<td align="center">2.521</td>
								<td align="center">2.466</td>
							</tr>
							<tr>
								<td>−PA</td>
								<td align="center">0.839</td>
								<td align="center">0.715</td>
								<td align="center">1.076</td>
								<td align="center">0.985</td>
							</tr>
						</tbody>
					</table>
					<table-wrap-foot>
						<fn id="TFN2">
							<p><sup>1</sup> The values correspond to the pedigree numbers in Table 1.</p>
						</fn>
						<fn id="TFN3">
							<p><italic>fsel</italic> - selection pattern for females; V<sub>A</sub> - additive genetic variance; <italic>m%</italic> - proportion of males selected; R - random; P - positively based on phenotype; −P - negatively based on phenotype; PA - positively based on parent average; −PA - negatively based on parent average.</p>
						</fn>
					</table-wrap-foot>
				</table-wrap>
			</p>
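			<p>One plausible way to obtain an average of this kind is to average the generation-to-generation changes in mean true genetic merit. The sketch below assumes exactly that; the function name <monospace>average_response</monospace> and the ten generation means are hypothetical illustrations, not the code or the values behind <xref ref-type="table" rid="t2">Table 2</xref>.</p>
			<code language="python"><![CDATA[
```python
from statistics import mean


def average_response(generation_means):
    """Average generation-to-generation change in mean genetic merit."""
    if len(generation_means) < 2:
        raise ValueError("need at least two generation means")
    # Response in a generation = change in the mean relative to the previous one
    responses = [later - earlier
                 for earlier, later in zip(generation_means, generation_means[1:])]
    return mean(responses)


# Hypothetical mean true genetic merit for ten successive generations
# (illustrative numbers only, NOT values from the article)
hypothetical_means = [0.0, 1.0, 2.5, 3.5, 4.0, 5.5, 6.0, 7.5, 8.0, 9.0]
print(average_response(hypothetical_means))
```
]]></code>
			<p>Because the successive differences telescope, this average equals the difference between the last and the first generation mean divided by the number of intervals, so it is insensitive to fluctuations in the intermediate generations.</p>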
		</sec>
		<sec sec-type="discussion">
			<title>4. Discussion</title>
			<p>In all the simulated pedigrees, males were selected and ordered (for mating) based on their PA. Selection intensities on males were high, and each male was mated to several females (8 or 4, when 10% or 20% of males were selected, respectively). Together, these conditions ensured genetic gain (<xref ref-type="fig" rid="f01">Figure 1</xref>) and a positive response to selection (<xref ref-type="table" rid="t2">Table 2</xref>) even with disassortative mating, in which females were selected and ordered for mating based on −P or −PA. Selection on P is equivalent to selection on PA + MS + E. Relative to selection on PA, the MS term increases the accuracy of selection (higher genetic gain), whereas the E term reduces it. Theoretically, as long as V<sub>MS</sub> &lt; V<sub>E</sub> (i.e., V<sub>A</sub> &lt; 2V<sub>E</sub>), selection on PA results in more genetic gain than selection on P. Assortative mating increased the genetic gain and response to selection (<xref ref-type="fig" rid="f01">Figure 1</xref> and <xref ref-type="table" rid="t2">Table 2</xref>), especially with higher genetic variance and higher selection intensity on males (selection intensity on females was the same across all the pedigrees).</p>
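			<p>The decomposition P = PA + MS + E can be explored with a small stand-alone simulation. The sketch below does not use pedSimulate; it assumes unselected, randomly mated parents (so that the variances of PA and MS are both V<sub>A</sub>/2), a zero phenotypic mean, and hypothetical variances V<sub>A</sub> = 10 and V<sub>E</sub> = 30. Under these assumptions, the expected correlations with true genetic merit are about 0.71 for PA and 0.50 for P. The simulated pedigrees in this study were under selection, which changes these variances, so the numbers are illustrative only.</p>
			<code language="python"><![CDATA[
```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def selection_accuracies(va, ve, n=20000, seed=42):
    """Empirical correlation of PA and of P with true genetic merit A.

    Assumes unselected, randomly mated parents (var(PA) = var(MS) = va / 2)
    and a zero-mean phenotype P = PA + MS + E.
    """
    rng = random.Random(seed)
    pa_values, p_values, a_values = [], [], []
    for _ in range(n):
        sire = rng.gauss(0.0, sqrt(va))
        dam = rng.gauss(0.0, sqrt(va))
        pa = 0.5 * (sire + dam)            # parent average
        ms = rng.gauss(0.0, sqrt(va / 2))  # Mendelian sampling deviation
        e = rng.gauss(0.0, sqrt(ve))       # environmental deviation
        a = pa + ms                        # true genetic merit
        pa_values.append(pa)
        p_values.append(a + e)
        a_values.append(a)
    return pearson(pa_values, a_values), pearson(p_values, a_values)


# Hypothetical variances, chosen only for illustration
acc_pa, acc_p = selection_accuracies(va=10.0, ve=30.0)
print(f"accuracy of PA: {acc_pa:.3f}, accuracy of P: {acc_p:.3f}")
```
]]></code>
			<p>Changing <monospace>va</monospace> and <monospace>ve</monospace> shows how the balance between the information added by MS and the noise added by E shifts with the variance components.</p>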
			<p>In all the pedigrees, the differences between F0 and F1 were random and not due to any selection or mating strategy, because the F0 individuals contributed randomly and equally to F1. The variance of true genetic merit fluctuated strongly in the first few generations (<xref ref-type="fig" rid="f02">Figure 2</xref>) because of the small number of parents in those generations.</p>
			<p>Compared with the disassortative mating scenarios, the assortative mating scenarios showed an increasing trend in the genetic variance (<xref ref-type="fig" rid="f02">Figure 2</xref>). This was in agreement with previous studies (<xref ref-type="bibr" rid="B13">Wright, 1921</xref>; <xref ref-type="bibr" rid="B8">Jorjani et al., 1997</xref>; <xref ref-type="bibr" rid="B5">Hayashi, 1998</xref>), in which assortative mating was suggested to increase heritability and additive genetic variance by increasing linkage disequilibrium.</p>
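			<p>The direction of this effect can be reproduced with a minimal infinitesimal-model sketch that is independent of pedSimulate. It assumes no selection, one son and one daughter per mating, a constant Mendelian sampling variance of half the base genetic variance (inbreeding is ignored), and mates paired by phenotypic rank. The function name <monospace>simulate_variance</monospace> and all parameter values are hypothetical.</p>
			<code language="python"><![CDATA[
```python
import random
import statistics


def simulate_variance(mating, n_pairs=2000, n_gen=5, va0=10.0, ve=20.0, seed=1):
    """Variance of true genetic merit per generation under one mating design.

    mating: "random", "assortative", or "disassortative" (mates paired by
    phenotypic rank). No selection; each pair leaves one son and one daughter.
    """
    if mating not in ("random", "assortative", "disassortative"):
        raise ValueError(f"unknown mating design: {mating}")
    rng = random.Random(seed)
    sd_a0, sd_ms, sd_e = va0 ** 0.5, (va0 / 2) ** 0.5, ve ** 0.5

    def with_phenotype(bv):
        # Store (genetic merit, phenotype); phenotype adds an environmental deviation
        return (bv, bv + rng.gauss(0.0, sd_e))

    sires = [with_phenotype(rng.gauss(0.0, sd_a0)) for _ in range(n_pairs)]
    dams = [with_phenotype(rng.gauss(0.0, sd_a0)) for _ in range(n_pairs)]
    variances = [statistics.pvariance([a for a, _ in sires + dams])]

    for _ in range(n_gen):
        if mating == "random":
            rng.shuffle(sires)
            rng.shuffle(dams)
        else:
            # Rank both sexes on phenotype; reverse the dams for disassortative mating
            sires.sort(key=lambda ind: ind[1])
            dams.sort(key=lambda ind: ind[1], reverse=(mating == "disassortative"))
        new_sires, new_dams = [], []
        for (a_s, _), (a_d, _) in zip(sires, dams):
            pa = 0.5 * (a_s + a_d)  # parent average
            new_sires.append(with_phenotype(pa + rng.gauss(0.0, sd_ms)))
            new_dams.append(with_phenotype(pa + rng.gauss(0.0, sd_ms)))
        sires, dams = new_sires, new_dams
        variances.append(statistics.pvariance([a for a, _ in sires + dams]))
    return variances


for design in ("assortative", "random", "disassortative"):
    print(design, [round(v, 2) for v in simulate_variance(design)])
```
]]></code>
			<p>In this sketch, the only source of the divergence is the covariance between the genetic merits of mates: a positive covariance inflates the variance of the parent average, and a negative covariance deflates it, while the within-family variance stays fixed.</p>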
		</sec>
		<sec sec-type="conclusions">
			<title>5. Conclusions</title>
			<p>Free and open-source software for simulating pedigree, phenotype, and genotype data is in short supply. Most of the available simulation software is too specialized and is neither flexible nor user-friendly. Although advanced, such software may not deliver what most users need: a general-purpose, user-friendly, and flexible tool. The R package pedSimulate was developed to fill this gap by simulating pedigree, genetic merit, phenotype, and genotype data. The package is developed in R, a programming language popular among students, scientists, and researchers. It provides several functions with a range of arguments, so that many different scenarios can be simulated by combining parameters. Because the package is open source, researchers can modify the source code if their simulations have specific needs not covered by the package. pedSimulate is a user-friendly tool for general-purpose data simulations in animal breeding and genetics.</p>
		</sec>
	</body>
	<back>
		<ack>
			<title>Acknowledgments</title>
			<p>The author thanks Livestock Improvement Corporation (Hamilton, New Zealand) for the financial support for the article publication fee.</p>
		</ack>
		<ref-list>
			<title>References</title>
			<ref id="B1">
				<mixed-citation>Aguilar, I.; Misztal, I.; Johnson, D. L.; Legarra, A.; Tsuruta, S. and Lawlor, T. J. 2010. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science 93:743-752. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3168/jds.2009-2730">https://doi.org/10.3168/jds.2009-2730</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Aguilar</surname>
							<given-names>I.</given-names>
						</name>
						<name>
							<surname>Misztal</surname>
							<given-names>I.</given-names>
						</name>
						<name>
							<surname>Johnson</surname>
							<given-names>D. L.</given-names>
						</name>
						<name>
							<surname>Legarra</surname>
							<given-names>A.</given-names>
						</name>
						<name>
							<surname>Tsuruta</surname>
							<given-names>S.</given-names>
						</name>
						<name>
							<surname>Lawlor</surname>
							<given-names>T. J.</given-names>
						</name>
					</person-group>
					<year>2010</year>
					<article-title>Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score</article-title>
					<source>Journal of Dairy Science</source>
					<volume>93</volume>
					<fpage>743</fpage>
					<lpage>752</lpage>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3168/jds.2009-2730">https://doi.org/10.3168/jds.2009-2730</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B2">
				<mixed-citation>Bijma, P. and Rutten, M. 2002. Lecture notes for the SelAction workshop. Available at: <ext-link ext-link-type="uri" xlink:href="https://www.wur.nl/en/Research-Results/Chair-groups/Animal-Sciences/Animal-Breeding-and-Genomics-Group/Research/Software.htm">https://www.wur.nl/en/Research-Results/Chair-groups/Animal-Sciences/Animal-Breeding-and-Genomics-Group/Research/Software.htm</ext-link>. Accessed on: June 30, 2021.</mixed-citation>
				<element-citation publication-type="webpage">
					<person-group person-group-type="author">
						<name>
							<surname>Bijma</surname>
							<given-names>P.</given-names>
						</name>
						<name>
							<surname>Rutten</surname>
							<given-names>M.</given-names>
						</name>
					</person-group>
					<year>2002</year>
					<source>Lecture notes for the SelAction workshop</source>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://www.wur.nl/en/Research-Results/Chair-groups/Animal-Sciences/Animal-Breeding-and-Genomics-Group/Research/Software.htm">https://www.wur.nl/en/Research-Results/Chair-groups/Animal-Sciences/Animal-Breeding-and-Genomics-Group/Research/Software.htm</ext-link>
					</comment>
					<date-in-citation content-type="access-date">Accessed on: June 30, 2021</date-in-citation>
				</element-citation>
			</ref>
			<ref id="B3">
				<mixed-citation>Fernando, R. L.; Dekkers, J. C. M. and Garrick, D. J. 2014. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genetics Selection Evolution 46:50. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/1297-9686-46-50">https://doi.org/10.1186/1297-9686-46-50</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Fernando</surname>
							<given-names>R. L.</given-names>
						</name>
						<name>
							<surname>Dekkers</surname>
							<given-names>J. C. M.</given-names>
						</name>
						<name>
							<surname>Garrick</surname>
							<given-names>D. J.</given-names>
						</name>
					</person-group>
					<year>2014</year>
					<article-title>A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses</article-title>
					<source>Genetics Selection Evolution</source>
					<volume>46</volume>
					<size units="pages">50</size>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/1297-9686-46-50">https://doi.org/10.1186/1297-9686-46-50</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B4">
				<mixed-citation>Gaynor, R. C.; Gorjanc, G. and Hickey, J. M. 2021. AlphaSimR: an R package for breeding program simulations. G3 Genes|Genomes|Genetics 11:jkaa017. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/g3journal/jkaa017">https://doi.org/10.1093/g3journal/jkaa017</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Gaynor</surname>
							<given-names>R. C.</given-names>
						</name>
						<name>
							<surname>Gorjanc</surname>
							<given-names>G.</given-names>
						</name>
						<name>
							<surname>Hickey</surname>
							<given-names>J. M.</given-names>
						</name>
					</person-group>
					<year>2021</year>
					<article-title>AlphaSimR: an R package for breeding program simulations</article-title>
					<source>G3 Genes|Genomes|Genetics</source>
					<volume>11</volume>
					<comment>jkaa017</comment>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/g3journal/jkaa017">https://doi.org/10.1093/g3journal/jkaa017</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B5">
				<mixed-citation>Hayashi, T. 1998. Genetic variance under assortative mating in the infinitesimal model. Genes &amp; Genetic Systems 73:397-405. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1266/ggs.73.397">https://doi.org/10.1266/ggs.73.397</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Hayashi</surname>
							<given-names>T.</given-names>
						</name>
					</person-group>
					<year>1998</year>
					<article-title>Genetic variance under assortative mating in the infinitesimal model</article-title>
					<source>Genes &amp; Genetic Systems</source>
					<volume>73</volume>
					<fpage>397</fpage>
					<lpage>405</lpage>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1266/ggs.73.397">https://doi.org/10.1266/ggs.73.397</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B6">
				<mixed-citation>Henderson, C. R.; Kempthorne, O.; Searle, S. R. and von Krosigk, C. M. 1959. The estimation of environmental and genetic trends from records subject to culling. Biometrics 15:192-218. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.2307/2527669">https://doi.org/10.2307/2527669</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Henderson</surname>
							<given-names>C. R.</given-names>
						</name>
						<name>
							<surname>Kempthorne</surname>
							<given-names>O.</given-names>
						</name>
						<name>
							<surname>Searle</surname>
							<given-names>S. R.</given-names>
						</name>
						<name>
							<surname>von Krosigk</surname>
							<given-names>C. M.</given-names>
						</name>
					</person-group>
					<year>1959</year>
					<article-title>The estimation of environmental and genetic trends from records subject to culling</article-title>
					<source>Biometrics</source>
					<volume>15</volume>
					<fpage>192</fpage>
					<lpage>218</lpage>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.2307/2527669">https://doi.org/10.2307/2527669</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B7">
				<mixed-citation>Jorjani, H. 2009. A general genomics simulation program. Interbull Bulletin 40:202-206.</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Jorjani</surname>
							<given-names>H.</given-names>
						</name>
					</person-group>
					<year>2009</year>
					<article-title>A general genomics simulation program</article-title>
					<source>Interbull Bulletin</source>
					<volume>40</volume>
					<fpage>202</fpage>
					<lpage>206</lpage>
				</element-citation>
			</ref>
			<ref id="B8">
				<mixed-citation>Jorjani, H.; Engström, G.; Strandberg, E. and Liljedahl, L.-E. 1997. Genetic studies of assortative mating—a simulation study. II. Assortative mating in unselected populations. Acta Agriculturae Scandinavica, Section A — Animal Science 47:74-81. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/09064709709362373">https://doi.org/10.1080/09064709709362373</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Jorjani</surname>
							<given-names>H.</given-names>
						</name>
						<name>
							<surname>Engström</surname>
							<given-names>G.</given-names>
						</name>
						<name>
							<surname>Strandberg</surname>
							<given-names>E.</given-names>
						</name>
						<name>
							<surname>Liljedahl</surname>
							<given-names>L.-E.</given-names>
						</name>
					</person-group>
					<year>1997</year>
					<article-title>Genetic studies of assortative mating—a simulation study. II. Assortative mating in unselected populations</article-title>
					<source>Acta Agriculturae Scandinavica, Section A — Animal Science</source>
					<volume>47</volume>
					<fpage>74</fpage>
					<lpage>81</lpage>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/09064709709362373">https://doi.org/10.1080/09064709709362373</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B9">
				<mixed-citation>Nieuwoudt, C. and Graham, J. 2020. SimRVPedigree: Simulate Pedigrees Ascertained for a Rare Disease. version 0.4.4. Available at: <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package=SimRVPedigree">https://cran.r-project.org/package=SimRVPedigree</ext-link>. Accessed on: May 1, 2022.</mixed-citation>
				<element-citation publication-type="webpage">
					<person-group person-group-type="author">
						<name>
							<surname>Nieuwoudt</surname>
							<given-names>C.</given-names>
						</name>
						<name>
							<surname>Graham</surname>
							<given-names>J.</given-names>
						</name>
					</person-group>
					<year>2020</year>
					<source>SimRVPedigree: Simulate Pedigrees Ascertained for a Rare Disease. version 0.4.4</source>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package=SimRVPedigree">https://cran.r-project.org/package=SimRVPedigree</ext-link>
					</comment>
					<date-in-citation content-type="access-date">Accessed on: May 1, 2022</date-in-citation>
				</element-citation>
			</ref>
			<ref id="B10">
				<mixed-citation>Nilforooshan, M. A. 2022a. pedSimulate: Pedigree, Genetic Merit, Phenotype, and Genotype Simulation. version 1.3.2. Available at: <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package=pedSimulate">https://cran.r-project.org/package=pedSimulate</ext-link>. Accessed on: May 1, 2022.</mixed-citation>
				<element-citation publication-type="webpage">
					<person-group person-group-type="author">
						<name>
							<surname>Nilforooshan</surname>
							<given-names>M. A.</given-names>
						</name>
					</person-group>
					<year>2022a</year>
					<source>pedSimulate: Pedigree, Genetic Merit, Phenotype, and Genotype Simulation. version 1.3.2</source>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package=pedSimulate">https://cran.r-project.org/package=pedSimulate</ext-link>
					</comment>
					<date-in-citation content-type="access-date">Accessed on: May 1, 2022</date-in-citation>
				</element-citation>
			</ref>
			<ref id="B11">
				<mixed-citation>Nilforooshan, M. A. 2022b. Twenty simulated pedigrees with different combinations of three parameters using R package pedSimulate. Mendeley Data, v2. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17632/c4pv8w8pmp.2">https://doi.org/10.17632/c4pv8w8pmp.2</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Nilforooshan</surname>
							<given-names>M. A.</given-names>
						</name>
					</person-group>
					<year>2022b</year>
					<article-title>Twenty simulated pedigrees with different combinations of three parameters using R package pedSimulate</article-title>
					<source>Mendeley Data</source>
					<volume>2</volume>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17632/c4pv8w8pmp.2">https://doi.org/10.17632/c4pv8w8pmp.2</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B12">
				<mixed-citation>Voorrips, R. E. and Maliepaard, C. A. 2012. The simulation of meiosis in diploid and tetraploid organisms using various genetic models. BMC Bioinformatics 13:248. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/1471-2105-13-248">https://doi.org/10.1186/1471-2105-13-248</ext-link>
				</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Voorrips</surname>
							<given-names>R. E.</given-names>
						</name>
						<name>
							<surname>Maliepaard</surname>
							<given-names>C. A.</given-names>
						</name>
					</person-group>
					<year>2012</year>
					<article-title>The simulation of meiosis in diploid and tetraploid organisms using various genetic models</article-title>
					<source>BMC Bioinformatics</source>
					<volume>13</volume>
					<size units="pages">248</size>
					<comment>
						<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/1471-2105-13-248">https://doi.org/10.1186/1471-2105-13-248</ext-link>
					</comment>
				</element-citation>
			</ref>
			<ref id="B13">
				<mixed-citation>Wright, S. 1921. Systems of mating. III. Assortative mating based on somatic resemblance. Genetics 6:144-161.</mixed-citation>
				<element-citation publication-type="journal">
					<person-group person-group-type="author">
						<name>
							<surname>Wright</surname>
							<given-names>S.</given-names>
						</name>
					</person-group>
					<year>1921</year>
					<article-title>Systems of mating. III. Assortative mating based on somatic resemblance</article-title>
					<source>Genetics</source>
					<volume>6</volume>
					<fpage>144</fpage>
					<lpage>161</lpage>
				</element-citation>
			</ref>
		</ref-list>
	</back>
</article>