DataMineIt Services - When You Need to Know


(©Copyright 2003 J.D. Opdyke. All Rights Reserved. See our website usage policy)

Services

More and better data exist today than ever before, and tremendous increases in computing speed and power now allow organizations to utilize this plethora of information. However, effectively and efficiently translating raw information from "Big Data" into critical business knowledge requires the careful application of sophisticated data mining techniques, in which DataMineIt professionals have extensive experience and training. Our clients typically have complex business problems for which we provide actionable solutions by applying statistical, econometric, and algorithmic methods to what are often massive amounts of data. Our analyses are empirically driven, carefully documented, extremely thorough and accurate, and theoretically defensible and methodologically sound. Examples of the types of questions we answer for our clients are included below, and our ability to successfully address them, based on a thorough understanding of our clients’ business needs and priorities, is why firms and organizations turn to DataMineIt “when they need to know.”

Business Questions/Problems DataMineIt Solves for Clients
- When managing operational risk for large and mid-sized banks, how can I use robust statistics to obtain stable, reliable estimates of the parameters of the heavy-tailed loss severity distribution (even when it is truncated), arguably the largest driver of aggregate losses? And how can I do this while not only satisfying Basel II capital estimation regulatory requirements, but also exceeding them and helping to define them?
- For Venture Capital and other alternative asset classes, how can I simultaneously increase returns while decreasing (downside) risk? How can I not only shift the likely distribution of returns upwards by utilizing sophisticated investment algorithms, but also change the shape of the distribution to minimize risk by utilizing rigorous, yet complementary, risk management techniques?
- For large credit card rewards points programs, how can I reliably predict the points redemption behavior of program members for the purposes of i) reliably booking the massive financial liabilities associated with unredeemed points; and ii) performing data-driven business analyses to identify, target, and promote profitable redemption patterns? Which are the most sophisticated, robust, yet runtime-efficient econometric models that must be used to accurately capture both the timing and magnitude of points redemptions?
- How can I most accurately predict credit delinquencies, as well as forecast the probability of default at any point during the life of a loan? Given a loan's probability of default, is it more profitable than the next best alternative, based on "loss-given-default" models? What are the most efficient and effective ways to incorporate time-varying effects into these econometric models?
- Which products should be marketed to which customers – and in what combinations and at what prices – in order to maximize profit (not just sales revenues)? Which customers should be targeted for customer retention programs, and how can brand loyalty most effectively be leveraged?
- How can I optimize the runtime of reports that need to be generated regularly under tight deadlines, but are run off of large databases and/or rely on the results of computationally intensive statistical tests?
- How can I most efficiently and effectively link large numbers of datasets of varying sizes, and how can I optimize the performance of on-line analytical processing (OLAP) queries on such a relational database system? How can I now use this previously stand-alone data to establish, quantify, understand, and leverage causal relationships between important business variables?
- Which statistical tests are most powerful for (i.e. most likely to detect errors in) quality control systems while still controlling and minimizing the level of false positives? And what are the underlying causes of differences in quality?
- How can I strategically structure sales and other contracts to maximize profit?
- Are the statistical and econometric methods employed by an expert witness theoretically and empirically valid, defensible, and verifiable in the context of a specific arbitration or court case?
- In a volatile market, how can the accuracy of sales forecasts be increased while maintaining confidence intervals that are robust to the effects of rare occurrences?
- Which statistical sampling methods will minimize sample size requirements (and thus, costs) without sacrificing statistical power or control over the size of the test, and without making unsupportable and possibly misleading distributional assumptions about the data?

We have managed small and large projects, as well as components of extremely large projects, spanning a broad range of industries, some of which are listed below:

CORPORATE BANKING & OPERATIONAL RISK
- For operational risk management for a Large International Bank (high-net-worth individuals and institutional investors):
  - per Basel II's capital estimation guidelines, researched, tested, and developed robust statistical alternatives to maximum likelihood estimation for more stable and reliable severity distribution parameter estimation (heavy-tailed distributions, truncated and shifted)
  - designed and gave technical statistical presentations for internal and external use to inform and develop regulatory strategy
  - effectively incorporated multivariate regression approaches to mitigate heterogeneity within units of measure
VENTURE CAPITAL
- As the Director of Quantitative Strategies of a venture capital firm for three years:
  - developed from scratch and implemented the firm’s portfolio selection investment algorithms using a unique, proprietary dataset containing tens of thousands of exit-related financing rounds spanning 20 years
  - as the sole model developer, wrote over 400K lines of SAS code
  - made detailed presentations of model performance to all sizeable potential investors (including the three largest institutional investors as of 4/20/10 prior to their commitment to invest in the fund)
MORTGAGE BANKING & CREDIT RISK
- For a Fortune 100 financial institution:
  - performed econometric modeling of the redemption behavior of rewards points (representing over $1 billion USD in financial liabilities) using recurrent events and survival analysis models
  - automated model selection procedures for logistic regressions, turning hours of manual coding intervention into seconds of automated runtime
  - derived and implemented an original, statistically-driven algorithm that obtains the mathematically optimal solution to automated re-aggregation of “thin data” segments -- this ensured reliable, robust estimation of complex statistical measures on large-data production runs
  - single-handedly developed, designed, and delivered formal presentations of new methodologies and empirical findings to senior management
- For a Fortune 50 financial institution:
  - performed econometric modeling of credit risk and delinquency behavior using a wide range of proportional and non-proportional hazards, time series, count-data, and logistic regression models
  - increased the speed of established company SAS® macros by orders of magnitude (from over a week to 90 minutes), making possible essential analyses that previously were runtime prohibitive
  - wrote original, advanced, statistical SAS® Macros that not only are faster than SAS®'s own pre-compiled procedures (SAS® PROCs), but also generate statistics more powerful than those generated by the client's own SAS® Macros
  - analyzed complex credit class rules and quantified the impacts of proposed improvements to them
  - developed, designed, and delivered formal presentations of findings to senior management
TELECOMMUNICATIONS
- Wrote and implemented permutation test statistical software for an RBOC to satisfy the operations support services (OSS) performance measurement regulatory requirements of multiple state public service commissions and §271 of the Telecommunications Act of 1996. Code is five times faster than pre-compiled code written by another highly regarded statistical software firm consulted on the project
- Conducted a comprehensive statistical analysis and data audit of the retail and resale markets of an RBOC to satisfy state and federal (§271 of the Telecommunications Act of 1996) regulatory requirements
  - selected and implemented rigorous parametric and nonparametric statistical methods for parity testing on the full range of §271-related OSS performance measure data (close to a thousand performance measures)
  - hired and managed a team of consultants during the initial phase of data/statistical parity analysis
  - developed a statistical algorithm for a mandated, computationally intensive statistical test which cut computer runtime from well over a week to hours – the program is over 80 times faster than a competing consulting firm’s attempt at implementing the same test
  - wrote a statistical affidavit detailing the appropriate implementation of permutation tests within the context of OSS parity testing
  - employed a range of regression techniques to perform root cause statistical analyses to determine causes of disparate service provision to CLEC customers
  - wrote the technical appendix of a statistical affidavit filed with multiple state public service commissions
  - managed the implementation of strict quality control guidelines verifying the integrity of data and statistical test results for the entire database, reporting, and analysis system
- Performed a cost estimation for an RBOC of ISP traffic relative to CLEC local exchange service revenue
- Determined and implemented a range of statistical sampling methods for an RBOC potentially facing large fines regarding its call-monitoring practices
- Employed a range of parametric and nonparametric statistical sampling and testing methods for two RBOCs requiring the implementation of a performance measure sampling methodology
QUANTITATIVE MARKETING / ADVERTISING
- For a large marketing firm, conducted econometric time-series / event-study modeling to estimate the concurrent effects of various types of advertising spending on customer patronage
PRICING / RETAIL
- Developed non-linear price elasticity models based on years of detailed sales data for a global manufacturer and distributor.
- Managed the data analysis component of a comprehensive long-term pricing strategy project for a multibillion dollar global professional services firm. Identified, cleaned, and merged internal financial and client data from numerous database systems to perform modeling for price prediction. Methods used include hedonic regression, neural nets, and the application of resampling methods to tobit models. For each model, constructed GUI interfaces that accept project and client characteristics as input, and as output, predict prices (with user-defined confidence intervals) to aid managers and partners in pricing their projects.
- Managed the data mining component of a comprehensive product, customer, and pricing analysis for the largest privately owned retail organization in the country. Developed and implemented a data warehouse system linking point-of-sale data (half a billion records annually) with databases across multiple systems (store, merchandise, store account, and department) to perform: a) multivariate customer segmentation utilizing various classification algorithms; b) econometric modeling of purchasing behavior; and c) sales margin, price point, geographic, competitor, departmental, and product class analyses.
- Managed the data mining component of a comprehensive product, customer, and pricing analysis for a national retail department store. Performed multivariate customer segmentation utilizing various classification algorithms, as well as competitor, price point, profit margin, geographic, departmental and product class analyses.
- For an audit of a retail manufacturer, estimated total dollars correctly invoiced by designing and implementing parametric and nonparametric stratified bootstrap algorithms applied to ratio estimators
- Critiqued an opposing expert’s time-series – cross-section econometric analysis of an event study in retail litigation with alleged damages of over half a billion dollars
LITIGATION / REGULATION
- For Big 4 and economic consulting firms, on large litigations (e.g. $0.4b), developed econometric models (TSCS, ARMA, non-proportional hazards) for event studies / price estimation (airlines, software, mutual funds); presented expert testimony in federal court arbitration; developed nonlinear price elasticity demand models.
- Managed an evaluation of statistical sampling techniques of a Department of Justice audit of a large city’s administration of federally funded programs
- Performed demographic analyses in support of smoking-related tobacco class action litigation
- Performed statistical analyses for an electric utility estimating incremental benefits of improved metering accuracy from generation to transmission, and transmission to distribution
- For several large law firms, performed statistical programming for a number of large antitrust litigation and merger cases requiring large database construction for predatory pricing and price elasticity analyses, the measurement of market concentration, market definition, and the calculation of various measures of market power
TRANSPORTATION
- Directed the applied econometric analysis of a large ridership forecast project for a multi-billion dollar airport access monorail system in New York City
PHARMACEUTICALS
- Conducted the statistical analysis that served as the foundation for a capitated price contract between an international pharmaceutical company and a national managed-care organization
TRANSFER PRICING
- Utilized a wide range of innovative empirical methods in comparables analyses evaluating non-market transactions in numerous transfer pricing studies.

In addition to our wide-ranging industry experience, DataMineIt’s analytical toolkit is broad and deep. We have over twenty years of experience applying a range of statistical and econometric methods, with software that enables us to tackle complex business problems and develop methodologically rigorous and defensible solutions that are based on the data, not on vague or unproven management theories. We have a decade of expert experience with most of the major modules of SAS®, and have custom-tailored most of the data solutions described above using SAS® software. However, we also have experience deploying many other statistical and scientific computing packages, often in conjunction with SAS®, including S-Plus®, Mathematica®, Statistica®, Limdep®, Gauss®, SPSS®, and C++. Below is a partial list of methods we turn to when developing and implementing data solutions for our clients.

Multivariate regression (constrained linear, logit, probit, tobit)
Monte Carlo simulations, bootstrapping, permutation tests, jackknife
Survival analyses, recurrent events, proportional and non-proportional hazards models
Statistical sampling (stratified, cluster, random, systematic) and power analyses
Classification Algorithms – CART, CHAID, QUEST, Hybrid Methods
ANOVA, MANOVA, and Multiple Comparisons
Time-series cross-section and panel data
Neural networks
Nonparametric and robust statistical testing
Categorical data analysis (exact tests, contingency tables)
Nonparametric smoothing: splines, kernels, local regression, k-NN
Empirical Likelihood