Microarray analysis. Microarray experiments were performed as dual-color hybridizations

sets available on ImmunoDB, aligned them with FSA, and generated profile HMMs with HMMER 3.0. We also included an alignment of the NIM domain from Nimrod-like protein, which is missing from ImmunoDB. We then searched each of these 25 immune-related profile HMMs against all Nasonia proteins using hmmsearch. We corrected E-values to account for the 25 separate searches, and then retained all hits with corrected E-values ≤ 1. In the few cases where multiple searches hit the same protein, we retained the hit with the smallest E-value. Combining all three sources of evidence, we identified a total of 497 immune genes based on homology, including 32 from previous computational screens for antimicrobial peptides, 106 based on D. melanogaster orthologs, and 361 based on profile HMMs. The full list, including expression and functional assignments, is available at

non-insect arthropod genomes, and 5 non-arthropod outgroup genomes. For the blastp search, we first soft-masked the BLAST database using SEG, and then used standard blastp with an E-value cutoff of 0.001. Where a gene has multiple isoforms, we include all of them in the search but then collapse hits across isoforms, keeping the hits with the lowest E-values. We define the phylogenetic stratum of a gene as the deepest node at which at least one detectable homolog is present, where a detectable homolog is a blastp hit whose alignment covers at least 50% of the shorter protein with at least 30% positives. This measure allows for gappiness in the pattern of homology, so a Nasonia protein with a hit to a single non-arthropod genome would be defined as “Metazoan” even if it were absent from all arthropod genomes other than Nasonia. This biases our results towards assuming deeper origins of genes than may be the case, but we consider this property desirable as it makes our analyses conservative.
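The stratum definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stratum names and their ordering, the `Hit` record, and the function names are hypothetical, and the sketch assumes that "30% positives" is measured relative to the alignment length. Hits are assumed to have already passed the blastp E-value cutoff.

```python
from dataclasses import dataclass

# Hypothetical strata, ordered from shallowest (youngest) to deepest (oldest).
STRATA = ["Nasonia", "Hymenoptera", "Insecta", "Arthropoda", "Metazoa"]


@dataclass
class Hit:
    query: str        # Nasonia gene (hits already collapsed across isoforms)
    stratum: str      # stratum that the subject genome belongs to
    align_len: int    # alignment length in residues
    positives: int    # positive-scoring positions in the alignment
    query_len: int    # length of the query protein
    subject_len: int  # length of the subject protein


def is_detectable(hit: Hit) -> bool:
    """Alignment covers >= 50% of the shorter protein with >= 30% positives."""
    shorter = min(hit.query_len, hit.subject_len)
    covers = hit.align_len >= 0.5 * shorter
    # Assumption: percent positives is computed over the alignment length.
    similar = hit.positives >= 0.3 * hit.align_len
    return covers and similar


def assign_strata(hits, genes):
    """Map each gene to the deepest stratum with at least one detectable homolog."""
    depth = {name: i for i, name in enumerate(STRATA)}
    # Genes with no detectable homolog elsewhere stay in the shallowest stratum.
    result = {gene: STRATA[0] for gene in genes}
    for hit in hits:
        if hit.query in result and is_detectable(hit):
            if depth[hit.stratum] > depth[result[hit.query]]:
                result[hit.query] = hit.stratum
    return result


example_hits = [
    Hit("g1", "Hymenoptera", 180, 150, 200, 210),
    Hit("g1", "Metazoa", 120, 50, 200, 220),  # single deep hit is sufficient
    Hit("g2", "Insecta", 40, 30, 200, 100),   # covers < 50% of shorter protein
]
strata = assign_strata(example_hits, ["g1", "g2"])
```

Because only the deepest qualifying hit matters, a gene with one detectable non-arthropod homolog is placed in the deepest stratum regardless of gaps in the intermediate strata, which mirrors the conservative bias described in the text.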
We also define paralogs as pairs of Nasonia genes with reciprocal blastp hits that have E-values ≤ 0.0001, at least 30% positives, an alignment length of at least 50% of the longer protein, and an E-value more significant than the most significant non-Hymenopteran hit to the same protein. To identify the members of a given paralogous family, we use a graph-based algorithm implemented in Perl using the Graph module, as follows. First, we convert blast hits to a graph by creating an edge between any pair of genes that are reciprocally connected by blast hits as defined above. Next, we extract the connected components of this overall graph. For each connected component with more than 2 members, we search for bridges (edges that, if removed, would increase the number of connected components) and remove the one with the lowest bitscore, repeating until either no more bridges exist or all subgraphs are of size 1 or 2. We then define a gene family as the members of each subgraph.

Signal peptide presence. We define the presence of a signal peptide for each gene in the Nasonia proteome using signalp with the following options: -f short -s notm -t euk -c 70 -M 10. We analyze all isoforms of each protein, and consider a gene as having a signal peptide if any isoform has evidence for a signal peptide.

Characterizing the properties of proteins encoded by highly induced genes. To understand the properties of highly induced genes encoding short proteins, we used several tools designed to characterize different protein properties. We computed net charge and hydrophobicity ba