
 

William L. Welbourn, Jr., M.Sc., PKP, GKIHS, ISLP

 

Curriculum Vitae


 

 


 

 

“The problems we face cannot be solved at the same level of thinking we were at when we created them” – Albert Einstein

"Who we are today is a reflection of our actions from the past" - Dr. David Chen, Professor California State University, Fullerton
"Greatness is 1% final product, and 99% passion in the pursuit of completing it" - William Welbourn, Jr.
"Mathematics is the queen of the sciences and number theory is the queen of mathematics" - Carl Friedrich Gauss
"The excitement that a gambler feels when making a bet is equal to the amount he might win times the probability of winning it" - Blaise Pascal

Screenshot of my maxT Permutation Algorithm Being Executed Upon my Desktop Cluster of GPUs (Cochran-Armitage Linear Trend Test Employed)

 

MATHEMATICS/STATISTICS/GENETICS BACKGROUND

COMPUTATIONAL ALGORITHM DEVELOPED 3/22/2008

PHD QUALIFYING EXAM NOTES

 

 

Some information about myself:

  • Affiliations
    • Member, Utah Chapter of the American Statistical Association (ASA)
    • Member, Mathematical Association of America (MAA)
    • Member, USC Chapter of the Honor Society of Phi Kappa Phi (PKP)
    • Member, USU Chapter of the Golden Key International Honour Society (GKIHS)
    • International Scholar Laureate Program (ISLP) of GKIHS

 

  • Candidate, Doctor of Philosophy Program in Statistics (Emphasis: Statistical Genetics), August 2007 – December 2011 (Projected)
    • Utah State University
    • USU Grade Point Average (GPA): 4.0/4.0
    • Teaching Assistant, Math 1050: College Algebra, August 2007 –  May 2008
    • Passed PhD Qualifying Exam, May 2008
    • Teaching Assistant, Stat 2000: Statistical Methods, August 2008 –  May 2009
    • SAS/R Lab Instructor, Stat 5100/5200: Linear Regression and Time Series/Experimental Designs, August 2009 – December 2009
    • Recitation Leader, Stat 3000 (x3 Sections): Statistics for Scientists, January 2010 – May 2010
    • Instructor, Stat 3000: Statistics for Scientists, Summer 2010 and Summer 2011
    • Research Assistant, January 2009 – May 2009: Database Management of a genetic data warehouse (287 participants; 660,918 SNPs; ~ 8 gigabytes)
    • Research Assistant, June 2009 – July 2009: Data Management of a genetic data warehouse (11 participants; 1,145,510 SNPs; ~ 3 gigabytes)
    • Research Assistant, February 2010 – : Analyzing GWAS data set (2,035 participants; 769,672 SNPs; ~ 6 gigabytes of text files)
    • Elected to the USU Chapter of Golden Key International Honour Society, February 2009
    • Research Writing Award, Department of Mathematics and Statistics, April 17, 2009
    • Nominated to the International Scholar Laureate Program of GKIHS, December 2009
    • Passed PhD Comprehensive Exam, March 2010
    • Technology Expert Award, Department of Mathematics and Statistics, April 19, 2010
    • Successfully Defended Doctoral Dissertation Proposal and Advanced to Candidacy, May 3, 2011

 

 

  • Bachelor of Arts in Mathematics with an emphasis in Probability and Statistics, January 2001

 

  • Summited Mount Whitney (EL. 14,497.61 FT.), Sierra Nevada Mountain Range
    • August 23, 1996 – Along with my father, William Welbourn
    • August 6, 1998 – Along with my Aunt, Patti Welbourn; her friend, Cindy; my brother-in-law, Rob; and my friend, Donovan
    • July 26, 1999 – Along with my Aunt, Patti Welbourn
  • Summited Half Dome (EL. 8,842 FT.), Yosemite National Park
    • July 22, 2000 – Along with my Aunt and Uncle, Patti and Eddie Welbourn

 

  • Computer System, Built 1/21/2009:
    • Motherboard:  Foxconn BloodRage, Intel X58 Chipset
    • CPU:  Intel Core i7 920 2.66GHZ 8MB L3 Cache 4.8GT/s QPI Quad Core, Supporting Hyper-Threading (i.e., handles 8+ threads simultaneously)
    • Memory:  OCZ 3GB DDR3 1600MHZ
    • Hard Drives:  Seagate SATA II  4 x 250GB RAID 0; 120GB Corsair SSD
    • OS:  Windows XP Home/Windows Vista Home Premium/Windows 7 RC - Desktop (Windows Vista, Business Edition - Laptop) – 32bit
    • Graphics:  EVGA NVIDIA GeForce GTX 470 Fermi 1.2GB (x2) in SLI mode - Cluster of GPUs (May 2010)
    • Monitor:  ASUS 24" LCD
    • Sound:  SoundBlaster X-Fi Fatal1ty Pro Series
    • Whetstone Benchmarks show a 180% increase in integer/floating point arithmetic over my prior Intel Core 2 Duo E6750 System

 

 

Past Academic News

Undergraduate

  • May 26, 2000 - Received my first-ever perfect score (overall course score) in a mathematics course (Math 350A - Advanced Calculus).  As a result, I was the top-ranked undergraduate in the course.

  • May 26, 2000 - Ranked #1 in Math 375 - Discrete Dynamical Systems and Chaos with an overall course score of 96%.

  • January 2, 2001 - Conferral Date for the Bachelor of Arts Degree in Mathematics.

 

Graduate (Master of Science)

  • March 2004 - Admitted to the Master's Program in Biostatistics at the Los Angeles Campus of the University of California System.  Concurrently, admitted to the Master's Program in Biostatistics at the University of Southern California.

  • May 2004 - Began my Graduate Studies at USC.

  • May 2005 - Began my Master's Thesis Research - Topic: Ordered Subset Analysis (OSA).

  • April 2006 - Successfully defended my Master's Thesis and completed the requirements for the Degree, Master of Science in Biostatistics.

  • May 10, 2006 - Elected to the USC Chapter of the Honor Society of Phi Kappa Phi.

  • May 12, 2006 - Conferral Date for the Master of Science Degree in Biostatistics.

 

Graduate (Doctoral)

  • March 2007 - Admitted to the Doctor of Philosophy Program in Mathematics, Statistics Option, at Utah State University and the University of Nevada, Las Vegas.  Each university's offer included a Graduate Assistantship Award.

  • August 2007 - Began my Graduate Studies at USU.  Teaching Section 11 of Math 1050, College Algebra.

  • February 2008 - Revising MS Thesis and subsequent papers for publication.

  • May 2008 - Overall Course Score of 100.5% in Mathematical Statistics II (Stat 6720).

  • May 2008 - Passed the Mathematical Statistics PhD Qualifying Exam.

  • December 1, 2008 - Elected to the USU Chapter of the Golden Key International Honour Society.

Research Interests

  • GPU Statistical Computing; GPU clustering.
  • CUDA paper for maxT MHT procedure – 12/2009 Abstract
  • CUDA paper for clustering GPUs – 3/2010 Abstract
  • Dissertation Abstract/TOC – 10/2009  (MAIN OSA C CODE)
  • Utilize Ordered Subset Analysis (OSA) in a Genome-wide Association Study (GWAS), to locate and suggest genes which express in the same amino acid transcription/translation process (i.e., locate genes which are correlated).
  • Assign appropriate correlation magnitudes to the pairwise (and higher order) gene associations from OSA use.
  • Working on the development of a permutation-based multiple hypothesis testing (MHT) procedure, analogous to maxT/minP.  However, the novel algorithm will account for OSA-determined gene-gene correlations.
  • Adapting OSA to a binary phenotype.  This would enable OSA to be used in place of traditional logistic regression modeling, in suggesting gene-environment interaction.
    • Integration of permutation testing, particularly the use of the tilted hypergeometric distribution (GHG) for testing nontraditional null hypotheses related to the odds ratio.
    • Suggesting the use of Exact (permutation) Tests (as opposed to asymptotic tests) for GWAS MHT.  This is a crucial notion, since within MHT, statistical significance is achieved “way out” in the tail of the null distribution.  Hence, assuming an asymptotic sampling distribution under the null could result in serious p-value calculation errors.
  • Major Master's Thesis Extension Document (Potential Book/The basis for my PhD Dissertation – paper currently stands at 194 pages) - Description, as of 1/4/2009:
    • Chapter 1 - Overview of what Ordered Subset Analysis (OSA) is all about, and a few examples of where it could be applied
    • Chapter 2 - Some of the underlying theory of OSA, including how one would draw inference from the empirical p-value.
    • Chapter 3 - Computational efficiency issues.  Topics include a theorem to reduce p-value computations, algorithms to search for the subset which provides maximum statistical evidence for gene-environment interaction, several algorithms for the implementation of power analyses, and maximizing computer power (i.e., parallel processing, utilizing multi-core CPU technology) in the implementation of the algorithms.
    • Chapter 4 – Numerical justification of the assumptions referenced in chapters 2 and 3.  Benchmarking Theorem 8 of the paper.
    • Chapter 5 – Simulation and Implementation of OSA into an R GUI
    • I am in the process of implementing the OSA methodology into the C programming environment (spent all of December 2008 doing this).
    • I plan to look into extending the OSA methodology to GLMs other than linear regression, particularly logistic regression.
    • With regard to computational aspects, I plan to integrate database management (via the SQL querying language) into the framework.  This should reduce computation time substantially compared with "flat (text) file" storage techniques.
    • Integrate the OSA statistical methodology into a novel R package.
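
For readers unfamiliar with the maxT idea mentioned in the list above, the following is a minimal sketch of a single-step, permutation-based maxT adjustment. It is not the CUDA implementation or the novel OSA-aware procedure described on this page. The function names, the toy data layout (one list of 0/1/2 genotype codes per SNP, one 0/1 case status per participant), and the (count + 1)/(permutations + 1) estimator are all assumptions made for illustration. The statistic is N times the squared correlation between genotype code and case status, used here as a simple linear-trend statistic.

```python
import random


def trend_stat(genotypes, status):
    """N * r^2 between genotype codes (0/1/2) and case status (0/1)."""
    n = len(status)
    mean_g = sum(genotypes) / n
    mean_s = sum(status) / n
    sxy = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, status))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((s - mean_s) ** 2 for s in status)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no evidence of trend
    return n * sxy * sxy / (sxx * syy)


def maxt_adjusted_pvalues(snps, status, n_perm=1000, seed=0):
    """Single-step maxT: compare each observed statistic with the
    permutation distribution of the maximum statistic over all SNPs."""
    rng = random.Random(seed)
    observed = [trend_stat(g, status) for g in snps]
    exceed = [0] * len(snps)
    permuted = list(status)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # permute phenotype labels, keep genotypes fixed
        max_stat = max(trend_stat(g, permuted) for g in snps)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Assumed estimator: add one to numerator and denominator.
    return [(count + 1) / (n_perm + 1) for count in exceed]


if __name__ == "__main__":
    status = [1] * 10 + [0] * 10
    snp_a = [2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
    snp_b = [0, 1, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 2, 1, 0, 1, 2, 0, 1, 2]
    print(maxt_adjusted_pvalues([snp_a, snp_b], status))
```

Because every permutation keeps the genotype columns intact and only shuffles the phenotype, the correlation among SNPs is preserved in the null distribution of the maximum, which is the property that makes maxT attractive for correlated markers.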

My primary interest in statistics lies within distribution theory.  My Master's Thesis, an Ordered Subset Analysis investigation (hence the "osa" in osastatistician.com), entailed computationally burdensome algorithms, and during the course of the project I was able to prove an interesting result (Paper #1 above) regarding the Central Student's-t Distributions.  Namely, given any positive real value, x, and two positive integers, n1 and n2, such that n1<n2, the value of the cumulative distribution function (at the given value of x) for the Student's-t distribution with n1 degrees of freedom is less than that for n2 degrees of freedom.  This result was an extremely important efficiency improvement for this project, as a mere 4.7% of total data throughput was seen.  The Thesis essentially entails three unique and fresh distributions for Gene-Environment Interaction Association Studies - The OSA distribution, the CEO Null Distribution, and the COO Null Distribution.
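
The ordering of the Student's-t cumulative distribution functions stated above can be checked numerically. The sketch below is only a spot check, not the proof from the paper: it integrates the Student's-t density with composite Simpson's rule and confirms that the CDF at a few positive x values increases with the degrees of freedom. The function names, the choice of 1,000 subintervals, and the tested x values are assumptions.

```python
import math


def t_pdf(t, df):
    """Density of the central Student's-t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def t_cdf(x, df):
    """CDF at x >= 0, using symmetry: F(x) = 1/2 + integral of the density on [0, x]."""
    return 0.5 + simpson(lambda t: t_pdf(t, df), 0.0, x)


def cdf_increases_with_df(x, max_df=50):
    """True if F_1(x) < F_2(x) < ... < F_max_df(x) at the given x > 0."""
    values = [t_cdf(x, df) for df in range(1, max_df + 1)]
    return all(lo < hi for lo, hi in zip(values, values[1:]))


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 4.0):
        print(x, cdf_increases_with_df(x))
```

One way such an ordering can save work: if the test statistic is fixed and only the degrees of freedom change, the p-values are ordered too, so some comparisons can be decided without evaluating every CDF. Whether that is how the Thesis uses the result is not stated here.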

I continue to work on extending the Thesis.  As mentioned in the body of the paper, a source of error for the OSA investigation lies in the number of random orderings sampled for each of the CEO and COO Null Distributions.  Namely, we chose to sample 10,000 random orderings, of the possible n!-1 random orderings, where n=200.  To say this is a minute sampling of random orderings for each of these Null Distributions is an understatement.  However, even with the choice of 10,000 random orderings, approximately six months of computational time was required to obtain full data results.  This was due to approximately two trillion simple linear regression analyses, each requiring a p-value calculation (based on the Central Student's-t Distributions).  As it turns out, the p-value calculation is the "bottleneck" in the OSA algorithm, as far as computational timing is concerned.  My current research is the development of a new mathematical/statistical methodology to avoid this bottleneck (at least to a certain extent).  The methodology is rather simple and elegant, and preliminary analysis suggests it requires approximately 80% less time for the OSA/CEO Null Distribution empirical p-value ascertainment, based on sampling 10,000 random orderings for the CEO Null Distribution.  The methodology has two parts: (1) 191 Student's-t tables created in the R (version 2.1.1) software environment, one table for each of the degrees of freedom of the Student's-t test statistics encountered under the OSA assumptions chosen for the Thesis, and (2) a linear interpolation algorithm that uses the values from these tables to approximate the true p-value for a given test statistic.

Now, one could argue that we have just contradicted what we set out to do, and introduced another source of error to the OSA.  We set out to improve the computational timing so that the number of random orderings chosen could be increased, thus decreasing the assumed error for the OSA.  However, preliminary statistical analyses (based on Kaplan-Meier Survival Methods) suggest very strong evidence that the interpolation does not introduce error into the OSA investigation.  A most intriguing observation is that the differential misclassification bias introduced to the Empirical P-Value Estimate by the interpolation algorithm can be shown to be unidirectional.  The overall result which I intend to show here is that this new methodology is a welcome addition to the already unique OSA Methodology I have investigated (and introduced to the Statistical World) and, because of the significant increase in computational efficiency, opens new avenues of OSA enhancement.

My most recent research endeavors:

(1) The Order Statistics for the Standard Normal Distribution.  This project is designed to add credibility to the Order Statistic Simulation Methodology of my MS Thesis Appendices.  I continue to find interesting results as the work unfolds.  Keep coming back to this page for updates to this paper.

(2) An elegant result regarding the Central Student's-t Distributions.  This result enhances the computational efficiency for the OSA.

Over the course of September and October 2006, I kept a brief Journal of my thoughts regarding the OSA expansion.  So many thoughts had filled my mind in such a short period of time that I documented them in Journal Form.  These are notions which I hope to investigate in the near future.

An ongoing project, which I began back in 2000, entails stochastic processes and Dynamical Systems.  The motivation for this project stemmed from Dr. Mario Martelli, Professor Emeritus of CSUF.  Essentially, the project entails a group of sailors confined to an island with a fixed supply of food, a stack of bananas.  The sailors, one by one, approach the stack of bananas and take "their fair share."  The complexity of the problem originates in the "stochastic thief action" of a monkey on the island - As the sailors take their bananas, there is a certain probability that the monkey will take some banana(s) from the stack.  This project has the potential to be applied to medicine, where the sailors are human cells, the bananas are fuel for the human cells, and the monkey is a virus or bacterium.

During the Summer months of 1999, I investigated how the statistical tables in textbooks are derived.  I used numerical integration techniques, such as Simpson's Rule and the Composite Boole's Rule, to generate the Standard Normal, Chi-Square, Student's-t, and Snedecor-F statistical tables.  This research is documented in "notebook format," and a journal of the works is kept in my files.  Since the writing of this technical document, I have investigated the Chi-Square, Student's-t, and Snedecor-F distributions in much more detail.  This research has yielded reduction formulas for the cumulative distribution functions of these distributions.
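
As an illustration of how a textbook table can be generated by numerical integration, here is a minimal sketch that builds Standard Normal CDF values with the Composite Boole's Rule. It is not a reproduction of the 1999 notebook; the function names, the number of subintervals, and the table layout are assumptions.

```python
import math


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def composite_boole(f, a, b, n=80):
    """Composite Boole's rule on [a, b]; n must be a multiple of 4."""
    if n % 4 != 0:
        raise ValueError("n must be a multiple of 4")
    h = (b - a) / n
    total = 0.0
    for k in range(0, n, 4):
        x0 = a + k * h
        total += (7 * f(x0) + 32 * f(x0 + h) + 12 * f(x0 + 2 * h)
                  + 32 * f(x0 + 3 * h) + 7 * f(x0 + 4 * h))
    return 2.0 * h / 45.0 * total


def normal_cdf(z):
    """CDF for z >= 0, using symmetry: 1/2 + integral of the density on [0, z]."""
    return 0.5 + composite_boole(normal_pdf, 0.0, z)


def normal_table(z_max=3.0, step=0.5):
    """Rows of (z, CDF) in the style of a textbook table."""
    n = int(round(z_max / step))
    return [(i * step, normal_cdf(i * step)) for i in range(n + 1)]


if __name__ == "__main__":
    for z, p in normal_table():
        print(f"{z:4.2f}  {p:.6f}")
```

The same integrator can be pointed at the Chi-Square, Student's-t, or Snedecor-F densities to produce their tables, with the integration limits adjusted to each distribution's support.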

 

Links to Other Pages

Hobbies and interests
Probability Challenge Problems
A Special Outcome From My Math 375 Group Project
Math 375 Project Pictures

The History of Mathematics

  


Page Last Updated June 1, 2011