DataMineit Tackles Big Data using SAS®
Why wait over 21.5 hours for Proc SurveySelect when DataMineit bootstraps in under 80 seconds?* (download .pdf brochure)
NEW! Even faster, proprietary versions of OPDY and OPDN, the fastest SAS® algorithms published in peer-reviewed statistics journals for conducting bootstraps, permutation tests, and sampling with and without replacement (download publications by J.D. Opdyke, Senior Managing Director, DataMineit, LLC, at http://www.DataMineIt.com/DMI_publications.htm).
NEW RESULTS: the SAS-based OPDY and OPDN algorithms are OVER 5 ORDERS OF MAGNITUDE FASTER THAN STATA and OVER ONE ORDER OF MAGNITUDE FASTER THAN MATLAB. Contact J.D. Opdyke at JDOpdyke@DataMineit.com for additional details.
• FAST: Orders of Magnitude Faster than SAS® Procs
OPDY_Boot_FT1 and OPDN_Perm_FT1 are modular, compiled SAS® macros that run exactly as OPDY and OPDN do, but even faster. On large datasets, which is the only time that speed and scalability matter, OPDY_Boot_FT1 executes bootstraps over 990x faster than the relevant "built-in" SAS® procedure (Proc SurveySelect). Similarly, OPDN_Perm_FT1 executes permutation tests over 530x faster than Proc SurveySelect, over 400x faster than Proc NPAR1WAY (which crashes on datasets/strata less than a tenth the size of those OPDN_Perm_FT1 can process), and over 5,970x faster than Proc Multtest (that is over 7 days vs. under 2 minutes).
• AFFORDABLE: Only Base SAS® is Required
• SCALABLE: Linear Runtime
Both OPDY_Boot_FT1 and OPDN_Perm_FT1 are truly scalable: their time complexity is linear, which is not the case for the relevant SAS® procedures.
• ROBUST: Theoretically Unlimited Dataset Size
The storage complexity of the algorithm (memory only, no I/O) is linear in the number of records in the largest stratum, not in the size of the dataset, so the algorithm can handle a theoretically unlimited dataset size with any number of strata. The SAS® Procs either crash or become prohibitively slow as dataset/strata sizes increase.
• GENERALIZABLE: Multivariate Regression
Both algorithms are highly generalizable. DataMineIt can modify OPDN_Perm_FT1 to conduct permutation tests using any sample statistic, and, for multivariate regression, DataMineIt has modified versions of OPDY_Boot_FT1 available to users for performing bootstraps on econometric models.
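To illustrate the storage claim above (memory linear in the largest stratum rather than in the whole dataset), here is a minimal Python sketch of a stratified bootstrap of the mean. It is not the OPDY algorithm and is not SAS code. The function name `stratified_bootstrap_means`, the `(stratum, value)` record layout, and the requirement that records arrive sorted by stratum are all assumptions made for this illustration.

```python
import random
from itertools import groupby
from operator import itemgetter
from statistics import fmean


def stratified_bootstrap_means(records, n_boot=200, seed=0):
    """Bootstrap the mean within each stratum.

    records: iterable of (stratum, value) pairs, assumed sorted by stratum
             (hypothetical layout chosen for this sketch).
    Returns {stratum: list of n_boot bootstrap means}.
    """
    rng = random.Random(seed)
    results = {}
    # groupby streams the input, so only one stratum's values are
    # materialized at a time: memory scales with the largest stratum.
    for stratum, group in groupby(records, key=itemgetter(0)):
        values = [value for _, value in group]
        n = len(values)
        results[stratum] = [
            fmean(rng.choices(values, k=n))  # sample n values with replacement
            for _ in range(n_boot)
        ]
    return results


if __name__ == "__main__":
    data = [("A", 1.0), ("A", 2.0), ("A", 3.0), ("B", 10.0), ("B", 12.0)]
    boot = stratified_bootstrap_means(data, n_boot=100)
    for stratum, means in boot.items():
        print(stratum, min(means), max(means))
```

Because `records` can be any iterable, the same function works on a generator that reads rows from disk, which is where the per-stratum memory bound matters.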
CONTACT: Please contact J.D. Opdyke, Senior Managing Director, DataMineit, LLC,
Finance/Market Risk Management Statistical Software
Sharpe ratios are ubiquitous in financial analysis, and funds are ranked by Sharpe ratio the world over. Yet these rankings are never accompanied by p-values or confidence intervals indicating whether an observed difference between two funds' ratios reflects a true difference in performance rather than random sampling error. Being able to state, with 95% or 99% statistical confidence, that one fund's Sharpe ratio is larger than another's would be highly valuable in any risk-adjusted performance assessment, whether through fund rankings or any of a myriad of other approaches. Previous tests comparing Sharpe ratios were either complex and computationally intensive or relied on restrictive and highly unrealistic assumptions about the financial returns data being analyzed. The statistical tests presented in the Excel spreadsheets and SAS programs below relax these constraints and are the first to provide such tests, fully automated, on easily usable and universal platforms. Financial analysts can thus determine whether one fund's risk-adjusted performance is truly larger than another's, with statistical significance.
Sharpe Ratios:
- Opdyke, J.D., (2006), Comparing Sharpe Ratios: So where are the p-values? - preprint
- SAS Program (email for 1-time password) - p-values from Sharpe Ratio comparisons and Mutual Fund Rankings (.pdf results)
- Excel Workbook (.xls - 1.4MB) (email for 1-time password) - p-values from Sharpe Ratio comparisons and Mutual Fund Rankings
- JSM2006 PowerPoint Presentation
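As a rough illustration of what a confidence interval on the difference between two funds' Sharpe ratios looks like, the Python sketch below uses a paired percentile bootstrap. This is a swapped-in, generic technique, not the test from the paper or the SAS/Excel programs listed above, and it assumes i.i.d. returns, which is one of the restrictive assumptions those tests are said to relax. The function names, the zero risk-free rate default, and the simulated returns are all hypothetical.

```python
import random
from statistics import fmean, stdev


def sharpe(returns, risk_free=0.0):
    """Sample Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free for r in returns]
    return fmean(excess) / stdev(excess)  # fails if all returns are identical


def sharpe_diff_ci(fund_a, fund_b, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap CI for sharpe(fund_a) - sharpe(fund_b).

    Periods are resampled jointly so that the correlation between the
    two funds is preserved. Assumes i.i.d. returns (illustration only).
    """
    if len(fund_a) != len(fund_b):
        raise ValueError("both funds need returns for the same periods")
    rng = random.Random(seed)
    n = len(fund_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample periods
        a = [fund_a[i] for i in idx]
        b = [fund_b[i] for i in idx]
        diffs.append(sharpe(a) - sharpe(b))
    diffs.sort()
    lower = diffs[int((1 - level) / 2 * n_boot)]
    upper = diffs[int((1 + level) / 2 * n_boot) - 1]
    return sharpe(fund_a) - sharpe(fund_b), lower, upper


if __name__ == "__main__":
    sim = random.Random(1)
    fund_b = [sim.gauss(0.0, 0.01) for _ in range(120)]  # simulated returns
    fund_a = [r + 0.02 for r in fund_b]                  # same risk, higher mean
    print(sharpe_diff_ci(fund_a, fund_b))
```

If the interval excludes zero, the observed difference is unlikely to be sampling error alone under the stated i.i.d. assumption; real returns data would call for a method that handles serial correlation and non-normality.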
Permutation Test Statistical Software (download .pdf summary)
Permutation tests are increasingly the statistical test of choice when using data to answer business and research questions across a very wide range of circumstances: wherever data samples are used to address hypotheses. This is because permutation tests require minimal assumptions about the data being examined, yet often have statistical power equal to, and sometimes even greater than, that of their parametric counterparts, which require stronger and sometimes untenable assumptions about the data. And unlike many parametric and other nonparametric tests, the results of permutation tests (the p-values) are unbiased. Several statistical software vendors offer products with permutation test capabilities, but they are limited: none can perform permutation tests within reasonable timeframes when samples are not very small and many tests are required. These products have prohibitive runtimes under these conditions (if they run at all) because the steps required to carry out a permutation test are computationally intensive.
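The steps of a two-sample permutation test, and the reason they are computationally intensive, can be sketched in a few lines of Python. This is a generic Monte Carlo permutation test on the difference in means, written for illustration only; it is not any vendor's implementation, and the function name and defaults are assumptions.

```python
import random
from statistics import fmean


def permutation_test(x, y, n_perm=9999, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns an estimated p-value. The observed arrangement is counted as
    one of the permutations, so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(fmean(x) - fmean(y))
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        stat = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if stat >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    treated = [12.1, 14.3, 13.8, 15.0, 14.1, 13.3]
    control = [11.0, 12.2, 11.7, 12.9, 11.4, 12.0]
    print(permutation_test(treated, control))
```

Each test reshuffles and re-summarizes the pooled data thousands of times, so the cost grows with both the sample size and the number of permutations, and it is multiplied again by the number of tests being run.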
DataMineIt’s solution to the computational demands of permutation tests is PermuteIt™, statistical software that performs fast, two-sample permutation tests when one sample is large or both are moderately sized and many permutation tests must be performed (e.g. most multiple-comparisons situations). PermuteIt™ has been benchmarked against the available commercial alternatives (see table below or .pdf), and its runtimes are often more than an order of magnitude faster under these conditions. When thousands of tests are being performed and an hour’s runtime can easily become ten, twenty, or thirty hours, this can make the difference between meeting deadlines and missing them. The disparity is magnified further when, as is the rule rather than the exception, analyses or reports must be rerun because of revised data, reprocessed input datasets, or any of the countless other issues that arise when working with large volumes of data.
But PermuteIt™ not only provides the speed that makes the appropriate application of permutation tests possible where other software fails; it also provides increased precision in the estimated p-values. PermuteIt™ uses a combination of algorithms that, wherever possible, provide exact p-values based on full enumeration. When exact inference is not possible, PermuteIt™ can, at the user’s request, efficiently attain variance reduction by increasing the number of permutation samples drawn if the confidence interval around the estimated p-value contains the predetermined critical p-value of the test. This provides a larger number of unambiguous test results in less time by avoiding wasteful sampling. Some of the unique and powerful features of PermuteIt™ include:
· the availability to the user of a wide range of test statistics for performing permutation tests on continuous, count, and binary data, including: pooled-variance t-test; separate-variance Behrens-Fisher t-test and joint tests for scale and location coefficients using nonparametric combination methodology; permutation scale test; Brownie et al. “modified” t-test; skew-adjusted “modified” t-test; Cochran-Armitage test; exact inference; Poisson normal-approximate test; Fisher’s exact test; Freeman-Tukey double arcsine test
· extremely fast exact inference (no confidence intervals, just exact p-values) for most count data and high-frequency continuous data, often several orders of magnitude faster than the most widely available commercial software (see table below or .pdf)
· the availability to the user of a wide range of multiple testing procedures, including: Bonferroni, Sidak, Stepdown Bonferroni, Stepdown Sidak, Stepdown Bonferroni and Stepdown Sidak for discrete distributions, Hochberg Stepup, FDR, Dunnett’s one-step (for MCC under ANOVA assumptions), Single-step Permutation, Stepdown Permutation, Single-step and Stepdown Permutation for discrete distributions, Permutation-style adjustment of permutation p-values
· efficient variance reduction under conventional Monte Carlo via self-adjusting permutation sampling when confidence intervals contain the predetermined critical value of the test
· fast, efficient, and automatic generation of all pairwise comparisons
· shortest confidence intervals under conventional Monte Carlo via a new sampling optimization technique (see Opdyke, Journal of Modern Applied Statistical Methods, Vol. 2, No. 1, May, 2003, and related conference presentations, .pps)
· fast permutation-style p-value adjustments for multiple comparisons (the code is actually designed to provide an additional speed premium for these resampling-based multiple comparisons adjustments; see table below or .pdf)
· simultaneous permutation testing and permutation-style p-value adjustment, although for relatively few tests at a time (this capability is not even provided as a preprogrammed option with any other software currently on the market)
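The self-adjusting sampling idea in the list above (draw more permutation samples only while the confidence interval around the estimated p-value still contains the critical value) can be sketched as follows. This is a generic illustration, not PermuteIt™'s algorithm: the Wilson score interval, the 99% interval level, the batch size, and the cap on total permutations are all assumptions chosen for the example.

```python
import math
import random
from statistics import fmean


def wilson_interval(k, n, z=2.576):
    """Wilson score interval for a binomial proportion k/n (z=2.576 ~ 99%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def adaptive_permutation_test(x, y, alpha=0.05, batch=1000,
                              max_perm=100_000, seed=0):
    """Permutation test on the difference in means with adaptive sampling.

    Draws permutations in batches and stops as soon as the interval
    around the estimated p-value excludes alpha, or max_perm is reached.
    Returns (p_estimate, (lower, upper), permutations_drawn).
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(fmean(x) - fmean(y))
    extreme = drawn = 0
    while drawn < max_perm:
        for _ in range(batch):
            rng.shuffle(pooled)
            if abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:])) >= observed:
                extreme += 1
        drawn += batch
        lower, upper = wilson_interval(extreme, drawn)
        if upper < alpha or lower > alpha:
            break  # result is unambiguous; further sampling would be wasted
    return extreme / drawn, (lower, upper), drawn


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [21, 22, 23, 24, 25, 26, 27, 28]
    print(adaptive_permutation_test(a, b))
```

Clear-cut comparisons stop after the first batch, while borderline ones keep sampling, so the sampling budget is spent only where it changes the conclusion.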
DataMineIt has designed, benchmarked, and thoroughly tested the premier permutation test software on the statistical software market for moderate sample sizes and many tests. To learn more about how PermuteIt™ can be used for your enterprise, and to obtain a demo version, please contact its author, J.D. Opdyke, Senior Managing Director, DataMineit, LLC, at JDOpdyke@DataMineit.com. Please include your name, relevant contact information (email address, phone number, etc.), and background (company, title, etc.).