Since building the pipeline to analyze the alleles of an individual, I’ve been waiting anxiously to put it to the test and see whether any variables need to be adjusted. Of the 80 samples that we sent out to be sequenced, however, only 29 were successful. This means that rather than being halfway through the total of 160 samples that were collected, we are a little under one fifth of the way through (29 of 160, or about 18%). Despite this, analysis has begun on the samples we have. Because so few samples were successfully sequenced, we will be unable to draw any meaningful conclusions at this stage.
Preliminary analysis suggests that most individuals carry around 3 MHC alleles. As the data was received less than a week ago, analysis of individuals is still ongoing, and nothing can yet be said about the population as a whole. Testing with these new individuals has led to minor tweaking of the variables used to trim the hierarchical clustering, as visual analysis did not quite agree with the expected results. This blog will be updated once final results are received.
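To give a feel for what "trimming" a hierarchical clustering involves, here is a minimal sketch in plain Python. It is not the actual pipeline: the distance measure (fraction of mismatched positions between aligned reads), the average-linkage rule, the `cutoff` parameter, and the toy reads are all assumptions made purely for illustration. The idea is that clusters are merged from the bottom up, and merging stops once the closest remaining pair is further apart than the cutoff; each surviving cluster is then treated as one candidate allele.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(c1, c2, reads):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(hamming_fraction(reads[i], reads[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_reads(reads, cutoff):
    """Agglomerative clustering that stops merging at the given cutoff.

    Returns a list of clusters, each a list of indices into `reads`.
    `cutoff` is the (hypothetical) tuning variable: a larger value
    merges more reads together and so yields fewer candidate alleles.
    """
    clusters = [[i] for i in range(len(reads))]
    while len(clusters) > 1:
        # Find the closest pair of clusters (a < b is guaranteed by combinations).
        dist, a, b = min(
            (average_linkage(clusters[a], clusters[b], reads), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if dist > cutoff:
            break  # everything left is too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy, made-up reads (not real MHC data).
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGTACGGAC", "TTGTACGGAT", "ACGAACCTGG"]

for cutoff in (0.2, 0.38):
    print(cutoff, cluster_reads(reads, cutoff))
```

On this toy data a cutoff of 0.2 leaves three clusters, while raising it to 0.38 merges two of them and leaves only two. That sensitivity is why the cutoff needs to be checked against what the data look like by eye.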
Further work is needed not only to amplify the second half of the samples, but also to retry the samples that failed to sequence. Once these are done, further minor tweaking of the pipeline may be required as the new sequences are passed through. For the most part, however, this part of the project is done, and running sequences through should be trivial. To get a better feel for what different alleles do to the structure of the MHC protein, 3D modeling will be done to reveal what conformational changes different residues can cause. I hope to be able to work not only on getting all of the sequencing to work, but also on the 3D modeling of the proteins.
This project has taught me a great deal about bioinformatics and higher mathematical tools for analysis that I would never have learned about in my regular curriculum. By taking the time to truly understand how each of these tools and algorithms works, rather than just following directions for their use, I’ve gained a great appreciation for the complexities of these technologies. Many of the problems involved in this kind of analysis are what are known as NP-hard problems: no efficient exact algorithm is known for them, so once the data being analyzed is large enough, finding an exact solution could take longer than the lifetime of the universe. Creative solutions are therefore needed to find shortcuts that arrive at close-to-optimal answers, and it is through understanding the algorithms currently in use that new ones may be discovered. Hopefully someday soon I will be contributing new ways of solving these problems to the growing frontier in this field.