"The problems we face cannot be solved
at the same level of thinking we were at when we created them"- Albert Einstein
"Who we are today is a reflection of our actions from the
past"- Dr. David Chen, Professor California State University, Fullerton
"Greatness is 1% final product, and 99% passion in the pursuit of
completing it"- William Welbourn, Jr.
"Mathematics is the queen of the sciences and number theory is the queen
of mathematics"- Carl Friedrich Gauss
"The excitement that a gambler feels when making a bet is equal to the
amount he might win times the probability of winning it"- Blaise Pascal
Beamer Class for professional looking
presentations Example
Past Academic News
Undergraduate
·May 26, 2000- Received my first ever perfect score (overall
course score) in a mathematics course (Math 350A- Advanced
Calculus). As a result I was the top ranked undergraduate in the
course.
·May 26, 2000- Ranked #1 in Math 375- Discrete Dynamical Systems and
Chaos with an overall course score of 96%.
·January 2, 2001-
Conferral Date for the Bachelor of Arts Degree in Mathematics.
Graduate (Master of
Science)
·March 2004 -
Admitted to the Master's Program in Biostatistics at the Los Angeles Campus of the University of California System. Concurrently, admitted to the Master's Program in
Biostatistics at the University of Southern California.
·May 2004 - Began
my Graduate Studies at USC.
·May 2005 - Began
my Master's Thesis Research - Topic: Ordered Subset Analysis (OSA).
·April 2006 -
Successfully defended my Master's Thesis and completed the requirements for the
Degree, Master of Science in Biostatistics.
·May 10, 2006 - Elected to the USC Chapter of the Honor Society of Phi Kappa Phi.
·May 12, 2006 - Conferral Date for the Master of Science Degree in Biostatistics.
Graduate (Doctoral)
·March 2007 - Admitted to the Doctor of Philosophy Program in Mathematics,
Statistics Option, at Utah State University
and the University of Nevada, Las Vegas. Each university's offer
included a Graduate Assistantship Award.
·August 2007 -
Began my Graduate Studies at USU.
Teaching Section 11 of Math 1050, College Algebra.
·February 2008 -
Revising MS Thesis and subsequent papers for publication.
·May 2008 -
Overall Course Score of 100.5% in Mathematical Statistics II (Stat 6720).
·May 2008 - Passed
the Mathematical Statistics PhD Qualifying Exam.
·December 1, 2008 - Elected to the USU Chapter of the Golden Key International Honour
Society.
Research Interests
GPU
Statistical Computing; GPU clustering.
CUDA paper for maxT MHT procedure 12/2009 Abstract
Utilize Ordered Subset
Analysis (OSA) in a Genome-wide Association Study (GWAS), to locate and
suggest genes which express in the same amino acid
transcription/translation process (i.e., locate genes which are
correlated).
Assign appropriate
correlation magnitudes to the pairwise (and
higher order) gene associations from OSA use.
Working on the development
of a permutation-based multiple hypothesis
testing (MHT) procedure, analogous to maxT/minP. However, the novel algorithm will account
for OSA-determined gene-gene correlations.
Adapting OSA to a binary
phenotype. This would enable OSA to
be used in place of traditional logistic regression modeling in
suggesting gene-environment interaction.
Integration of
permutation testing, particularly the use of the tilted hypergeometric distribution (GHG), for testing
nontraditional null hypotheses related to the odds ratio.
Suggesting the use of
Exact (permutation) Tests (as opposed to asymptotic tests) for GWAS
MHT. This is a crucial notion, since
within MHT, statistical significance is achieved
far out in the tail of the null distribution. Hence, the assumption of an asymptotic
sampling distribution under the null could
result in serious p-value calculation errors.
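The last point can be illustrated with a toy comparison. The sketch below is not the OSA or GWAS setting; it assumes a simple one-sided binomial test (null probability 1/2) purely to show how an exact tail probability and its normal approximation drift apart far out in the tail. The function names are hypothetical.

```python
import math


def exact_upper_tail(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    favourable = sum(math.comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials


def normal_upper_tail(successes, trials):
    """Normal approximation to the same tail, with continuity correction."""
    mean = trials / 2
    sd = math.sqrt(trials) / 2
    z = (successes - 0.5 - mean) / sd
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    trials = 100
    for successes in (60, 70, 80):
        exact = exact_upper_tail(successes, trials)
        approx = normal_upper_tail(successes, trials)
        print(successes, exact, approx, approx / exact)
```

Near the center of the distribution the two values agree closely. At the significance levels typical of multiple testing corrections, the ratio between them is no longer close to 1.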
Major Master's Thesis Extension
Document (Potential Book/The basis for my PhD Dissertation paper
currently stands at 194 pages) - Description, as of 1/4/2009:
Chapter 1 - Overview
of what Ordered Subset Analysis (OSA) is all about, and a few examples of
where it could be applied
Chapter 2 - Some of
the underlying theory of OSA, including how one would draw inference from
the empirical p-value.
Chapter 3 -
Computational efficiency issues. Topics include a theorem to reduce
p-value computations, algorithms to search for the subset which provides
maximum statistical evidence for gene-environment interaction, several
algorithms for the implementation of power analyses, and maximizing
computer power (i.e., parallel processing, utilizing multi-core CPU
technology) in the implementation of the algorithms.
Chapter 4 - Numerical
justification of the assumptions referenced in Chapters 2 and 3. Benchmarking Theorem 8 of the paper.
Chapter 5 -
Simulation and implementation of OSA in an R GUI
I am in the process
of implementing the OSA methodology into the C programming environment
(spent all of December 2008 doing this).
I plan to look into
extending the OSA methodology to GLMs other than linear regression,
particularly logistic regression.
With regard to
computational aspects, I plan to integrate database management (via the
SQL querying language) into the framework. This should reduce
computational timing substantially, over the use of "flat (text)
file" storage techniques.
Integrate the OSA
statistical methodology into a novel R package.
My primary interest in statistics lies within distribution theory. My Master's Thesis, an Ordered
Subset Analysis investigation (hence the "osa" in osastatistician.com), entailed computationally
burdensome algorithms, and during the course of the project I was able to prove
an interesting result (Paper #1 above) regarding the Central Student's-t
Distributions. Namely, given any positive real value, x, and two positive
integers, n1 and n2, such that n1<n2, then the value for the cumulative
distribution function (at the given value of x) for the Student's-t
distribution with n1 degrees of freedom is less than that for n2 degrees of
freedom. This result was an extremely important efficiency improvement
for this project, as a mere 4.7% of total data throughput was seen. The
Thesis essentially entails three unique and fresh distributions for
Gene-Environment Interaction Association Studies - The OSA distribution, the
CEO Null Distribution, and the COO Null
Distribution.
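The Student's-t result stated above can be checked numerically. The following is a minimal sketch, not the proof from the paper: it evaluates the Student's-t cumulative distribution function by composite Simpson integration (an implementation choice made here, using only the Python standard library) and confirms that, for a fixed positive x, the value increases with the degrees of freedom. All function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of the central Student's-t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(x, df, panels=2000):
    """P(T <= x) for x > 0, using composite Simpson's rule on [0, x]."""
    h = x / panels  # panels must be even
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric about zero, so P(T <= 0) = 1/2.
    return 0.5 + total * h / 3


def cdf_increases_with_df(x, max_df=30):
    """True if the CDF at x is strictly increasing in df = 1, ..., max_df."""
    values = [t_cdf(x, df) for df in range(1, max_df + 1)]
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    for x in (0.5, 2.0, 5.0):
        print(x, cdf_increases_with_df(x))
```

A check of this kind only covers the values tried. The efficiency gain described above relies on the result holding for every positive x and every pair n1<n2, which is what the proof establishes.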
I continue to work on extending the Thesis. As mentioned in the body
of the paper, a source of error for the OSA investigation lies in the
assumption of the number of random orderings chosen for each of the CEO and COO
Null Distributions. Namely, we chose to sample 10,000 random orderings,
of the possible n!-1 random orderings, where n=200. To say this is a
minute sampling of random orderings for each of these Null Distributions is an
understatement. However, even with the choice of 10,000 random orderings,
approximately six months of computational time was required to obtain full data
results. This was due to approximately two trillion simple linear
regression analyses, each requiring a p-value
calculation (based on the Central Student's-t Distributions). As it turns
out, the p-value calculation is the "bottleneck" in the OSA
algorithm, as far as computational timing issues are concerned. My
current research is the development of a new mathematical/statistical
methodology to avoid this bottleneck (at least to a certain extent). The
methodology is rather simple and elegant, and preliminary analysis
suggests this methodology requires approximately 80% less time for the OSA/CEO
Null Distribution empirical p-value ascertainment, based on sampling 10,000
random orderings for the CEO Null Distribution. This methodology entails
two parts: (1) 191 Student's-t tables were created from the R (version 2.1.1)
software environment, one table for each of the degrees of freedom Student's-t
test statistics encountered for the OSA assumptions chosen for the Thesis, and
(2) Essentially, a linear interpolation algorithm, using the values from these
Student's-t tables. The linear interpolation approximates the true
p-value for a given test statistic. Now, one could argue that we have
contradicted what we set out to do by introducing another source of error into the OSA.
We set out to improve the computational timing, where the end result is to
increase the number of random orderings chosen, thereby
decreasing the assumed error for the OSA. However, preliminary
statistical analyses (based on Kaplan-Meier Survival Methods) very
strongly suggest that the interpolation does not introduce error into the OSA
investigation. A most intriguing
observation is that the differential
misclassification bias introduced to the Empirical P-Value Estimate by the
interpolation algorithm can be shown to be unidirectional. The overall result
which I intend to show here is that this new methodology is a welcome addition
to the already unique OSA Methodology I have investigated (and introduced to
the Statistical World), and, because of the significant increase in
computational efficiency, opens new avenues of OSA enhancement.
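The two-part table-and-interpolation idea can be sketched as follows. This is an illustration under stated assumptions, not the Thesis implementation: the tables here are computed in Python by Simpson integration rather than exported from R, p-values are taken as one-sided upper-tail probabilities for non-negative test statistics, and the grid limits, grid spacing, and function names are all hypothetical.

```python
import bisect
import math


def t_pdf(t, df):
    """Density of the central Student's-t distribution."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, panels=400):
    """Directly computed P(T > t) for t >= 0 (composite Simpson on [0, t])."""
    if t == 0:
        return 0.5
    h = t / panels
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def build_table(df, t_max=8.0, step=0.05):
    """Part 1: precompute one table of upper-tail probabilities for a given df."""
    n = int(round(t_max / step))
    grid = [i * step for i in range(n + 1)]
    tails = [t_upper_tail(t, df) for t in grid]
    return grid, tails


def interpolated_tail(t, grid, tails):
    """Part 2: approximate P(T > t), t >= 0, by linear interpolation in the table."""
    if t >= grid[-1]:
        return tails[-1]
    j = bisect.bisect_right(grid, t)  # grid[j - 1] <= t < grid[j]
    t0, t1 = grid[j - 1], grid[j]
    weight = (t - t0) / (t1 - t0)
    return tails[j - 1] + weight * (tails[j] - tails[j - 1])


if __name__ == "__main__":
    grid, tails = build_table(df=10)
    for t in (0.525, 1.275, 2.525, 4.025):
        print(t, t_upper_tail(t, 10), interpolated_tail(t, grid, tails))
```

In this sketch the interpolated value never falls below the directly computed one, because the upper-tail probability is a convex function of t for positive t and a chord of a convex function lies on or above it. Whether this is the mechanism behind the unidirectional bias described above is an assumption; it may depend on how the Thesis defines the p-value.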
My most recent research endeavors:
(1) The Order Statistics for
the Standard Normal Distribution. This project is designed to add
credibility to the Order Statistic Simulation Methodology of my MS Thesis
Appendices. I continue to find interesting results as the work unfolds...
Keep coming back to this page for updates to this paper.
(2) An elegant result regarding the Central Student's-t Distributions.
This result enhances the computational efficiency for the OSA.
Over the course of the months of September and October 2006, I kept a brief Journal of my thoughts
regarding the OSA expansion. So many thoughts filled my mind in such
a short period of time that I documented them in
Journal Form. These are notions which I hope to investigate in the near
future.
An ongoing project
which I began back in 2000, entails stochastic
processes and Dynamical Systems. The motivation for this project
stemmed from Dr. Mario Martelli, Professor Emeritus
of CSUF. Essentially, the project entails a group of sailors confined to
an island with a fixed supply of food, a stack of bananas. The sailors,
one by one, approach the stack of bananas and take "their fair share."
The complexity of the problem originates in the "stochastic thief
action" of a monkey on the island - As the sailors take their bananas,
there is a certain probability that the monkey will take some banana(s) from
the stack. This project has the potential to be applied to medicine,
where the sailors are human cells, the bananas are fuel for the human cells,
and the monkey is a virus or bacterium.
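A small simulation gives a feel for the process. The rules below are assumptions made for illustration only, since the description above does not fix them: a sailor's "fair share" is taken to be the remaining stack divided evenly among the sailors who have not yet taken their turn (rounded down), and after each sailor's turn the monkey steals exactly one banana with probability `p_steal`. The function name and parameters are hypothetical.

```python
import random


def simulate(sailors, bananas, p_steal, rng):
    """Run one pass of the sailors over the stack.

    Returns (shares, stolen, remaining): the bananas taken by each sailor,
    the number taken by the monkey, and whatever is left on the stack.
    """
    remaining = bananas
    shares = []
    stolen = 0
    for turn in range(sailors):
        # Assumed rule: split what is left evenly among sailors still waiting.
        share = remaining // (sailors - turn)
        shares.append(share)
        remaining -= share
        # Assumed rule: the monkey steals one banana with probability p_steal.
        if remaining > 0 and rng.random() < p_steal:
            remaining -= 1
            stolen += 1
    return shares, stolen, remaining


if __name__ == "__main__":
    rng = random.Random(2000)  # fixed seed so a run can be repeated
    shares, stolen, remaining = simulate(sailors=5, bananas=100, p_steal=0.5, rng=rng)
    print(shares, stolen, remaining)
```

With `p_steal` set to 0 every sailor receives the same share. With a positive probability, later sailors tend to receive less, which is where the dynamics become interesting.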
During the Summer months of 1999, I investigated how the statistical tables
in textbooks are derived. I used numerical integration techniques, such
as Simpson's Rule and the Composite Boole's Rule,
to generate the Standard Normal, Chi-Square, Student's-t, and Snedecor-F statistical tables. This research is
documented in "notebook format," and a journal of the works is
kept in my files. Since the writing of this technical document, I have
further investigated the Chi-Square, Student's-t, and Snedecor-F
distributions in much more detail. What has resulted from this research is
the ascertainment of reduction
formulas for the cumulative distribution functions for these distributions.
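As an illustration of the table-generation idea, the sketch below rebuilds a few entries of a Standard Normal table with composite Simpson's Rule. It is a reconstruction under assumptions, not the original notebook work: only Simpson's Rule is shown (Boole's Rule is omitted), and the panel count, table layout, and function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def simpson(f, a, b, panels=200):
    """Composite Simpson's Rule for the integral of f over [a, b] (panels even)."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def normal_cdf(z):
    """P(Z <= z), using symmetry about zero plus the integral from 0 to z."""
    return 0.5 + simpson(normal_pdf, 0.0, z)


def normal_table(z_max=3.0, step=0.1):
    """List of (z, P(Z <= z)) rows for z = 0, step, 2*step, ..., z_max."""
    rows = int(round(z_max / step))
    return [(round(i * step, 10), normal_cdf(i * step)) for i in range(rows + 1)]


if __name__ == "__main__":
    for z, p in normal_table():
        print(f"{z:4.1f}  {p:.4f}")
```

The same integrator can be pointed at the Chi-Square, Student's-t, or Snedecor-F densities by swapping out `normal_pdf` and the integration limits.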