

 

     

About DMI

 

DataMineit, LLC is a consultancy providing advanced statistical and econometric modeling, risk analytics, and algorithm development primarily to the banking, finance, and consulting sectors.  We have extensive experience and training utilizing sophisticated analytic tools to provide actionable solutions to the often complex business problems that our clients face.

 

All of our work is either directly or indirectly empirical – we are data experts who often work with very large datasets when translating raw information into critical business knowledge.  We always provide thorough documentation of all the data cleaning, manipulation, and statistical and econometric programming that we perform, as well as the theoretical underpinnings supporting the latter. 

 

Our more than twenty years of experience with statistical and data-solution software allows us to identify and leverage both algorithmic and data-structure efficiencies when developing, for example, a statistical quality control system, a large relational database system, or both.  We rely on this experience to complete projects under very tight deadlines for our clients, who know the results we provide are always thorough, accurate, and methodologically sound.  We also rely on it to produce statistical algorithms and code, in SAS® and other languages, that are orders of magnitude faster than SAS®'s own pre-compiled procedures (SAS PROCs), enabling critical analyses that otherwise would be runtime prohibitive in SAS® or any other statistical software package.  Examples of the types of questions we answer for our clients include:

  • Business Questions/Problems DataMineit Solves for Clients

    • When managing operational risk for large and mid-sized banks, how can I use robust statistics to obtain stable, reliable estimates of the parameters of the heavy-tailed loss severity distribution (even when it is truncated), arguably the largest driver of aggregate losses?  And how can I do this while not only satisfying Basel II capital estimation regulatory requirements, but also exceeding and helping to define them?  (A hedged illustration of severity estimation under truncation follows this list.)

    • For Venture Capital and other alternative asset classes, how can I simultaneously increase returns while decreasing (downside) risk?  How can I not only shift the likely distribution of returns upwards by utilizing sophisticated investment algorithms, but also change the shape of the distribution to minimize risk by utilizing rigorous, yet complementary, risk management techniques?

    • For large credit card rewards points programs, how can I reliably predict the points redemption behavior of program members for the purposes of i) reliably booking the massive financial liabilities associated with unredeemed points, and ii) performing data-driven business analyses to identify, target, and promote profitable redemption patterns?  Which econometric models are sophisticated and robust, yet runtime-efficient enough, to accurately capture both the timing and magnitude of points redemptions?

    • How can I most accurately predict credit delinquencies, as well as forecast the probability of default at any point during the life of a loan?  Given a loan's probability of default, is it more profitable than the next best alternative, based on "loss-given-default" models?  What are the most efficient and effective ways to incorporate time-varying effects into these econometric models?

    • Which products should be marketed to which customers – and in what combinations and at what prices – in order to maximize profit (not just sales revenues)?  Which customers should be targeted for customer retention programs, and how can brand loyalty most effectively be leveraged?
       

    • How can I optimize the runtime of reports that must be generated regularly under tight deadlines, but are run against large databases and/or rely on the results of computationally intensive statistical tests?
       

    • How can I most efficiently and effectively link large numbers of datasets of varying sizes, and how can I optimize the performance of on-line analytical processing (OLAP) queries on such a relational database system?  How can I now use this previously stand-alone data to establish, quantify, understand, and leverage causal relationships between important business variables?
       

    • Which statistical tests are most powerful for quality control systems (i.e., most likely to detect errors) while still controlling and minimizing the rate of false positives?  And what are the underlying causes of differences in quality?
       

    • How can I strategically structure sales and other contracts to maximize profit?
       

    • Are the statistical and econometric methods employed by an expert witness theoretically and empirically valid, defensible, and verifiable in the context of a specific arbitration or court case?
       

    • In a volatile market, how can the accuracy of sales forecasts be increased while maintaining confidence intervals that are robust to the effects of rare occurrences?
       

    • Which statistical sampling methods will minimize sample size requirements (and thus, costs) without sacrificing statistical power, control over the size of the test, or making unsupportable and possibly misleading distributional assumptions about the data?
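
A hedged aside on the first question above: the sketch below, written in Python rather than DataMineit's production SAS® code, shows the baseline problem behind the operational-risk question, namely maximum likelihood estimation of a lognormal severity distribution when losses are observed only above a reporting threshold (left truncation).  All data, thresholds, and starting values are hypothetical; the robust alternatives described above would replace this likelihood objective with a more outlier-resistant one.

```python
# Illustrative only: baseline truncated-MLE fit for a lognormal severity
# distribution; data, threshold, and starting values are hypothetical.
import numpy as np
from scipy import optimize, stats

def truncated_lognormal_nll(params, losses, threshold):
    """Negative log-likelihood of a lognormal left-truncated at `threshold`."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    log_pdf = stats.lognorm.logpdf(losses, s=sigma, scale=np.exp(mu))
    # Truncation correction: condition on the loss exceeding the threshold.
    log_sf = stats.lognorm.logsf(threshold, s=sigma, scale=np.exp(mu))
    return -(log_pdf.sum() - losses.size * log_sf)

rng = np.random.default_rng(0)
threshold = 10_000.0  # hypothetical reporting threshold
raw = stats.lognorm.rvs(s=2.0, scale=np.exp(9.0), size=20_000, random_state=rng)
losses = raw[raw > threshold]  # only losses above the threshold are recorded

fit = optimize.minimize(truncated_lognormal_nll, x0=np.array([8.0, 1.5]),
                        args=(losses, threshold), method="Nelder-Mead")
print("estimated (mu, sigma):", fit.x)  # should be near (9.0, 2.0)
```

Omitting the truncation correction (the `logsf` term) biases both parameter estimates, which is one reason severity estimation under truncation is delicate enough to motivate the robust alternatives discussed above.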

 

DataMineit is able to address these types of questions regularly and successfully because a) we listen carefully to our clients to ensure that we have a thorough and accurate understanding of their immediate, mid-range, and long-term business needs and priorities, and b) we provide the unique combination of firm characteristics described below.  Both secure DataMineit’s standing as a premier provider of statistical data mining and data warehousing for industry, consulting, and research, and ensure that businesses and organizations will continue to turn to DataMineit “when they need to know.”

 

  • Exceptional Training and Broad Industry Experience
     

    • DataMineit professionals have earned advanced degrees and additional academic training, and have received fellowships and honors, from top universities including Harvard University, Yale University, and the Massachusetts Institute of Technology.  Our previous professional experience includes research, statistical and econometric analysis, and senior management positions at the National Bureau of Economic Research, Harvard University, Charles River Associates, and the Economic Consulting Group at Andersen, LLP.  We have over twenty years of experience providing rigorous statistical data mining, econometric analysis, and algorithm development of the highest quality to firms and organizations in a wide range of industries.  Below are some examples of the project management experience of the DataMineit professional staff:
       

    • CORPORATE BANKING & OPERATIONAL RISK
       

      • For operational risk management for a Large International Bank (high-net-worth individuals and institutional investors):
         

      • per Basel II's capital estimation guidelines, researched, tested, and developed robust statistical alternatives to maximum likelihood estimation for more stable and reliable severity distribution parameter estimation (heavy-tailed distributions, truncated and shifted).
         

      • designed and delivered technical statistical presentations, for internal and external use, to inform and develop regulatory strategy
         

      • effectively incorporated multivariate regression approaches to mitigate heterogeneity within units of measure
         

    • VENTURE CAPITAL
       

      • As the Director of Quantitative Strategies of a venture capital firm for three years:
         

      • developed from scratch and implemented the firm’s portfolio selection investment algorithms using a unique, proprietary dataset containing tens of thousands of exit-related financing rounds spanning 20 years
         

      • as the sole model developer, wrote over 400K lines of SAS code
         

      • made detailed presentations of model performance to all sizeable potential investors (including the three largest institutional investors as of 4/20/10) prior to their commitment to invest in the fund
         

    • MORTGAGE BANKING & CREDIT RISK
       

      • For a Fortune 100 financial institution:
         

      • performed econometric modeling of the redemption behavior of rewards points (representing over $1 billion USD in financial liabilities) using recurrent events and survival analysis models (a hedged illustration appears at the end of this subsection)
         

      • automated model selection procedures for logistic regressions, turning hours of manual coding intervention into seconds of automated runtime
         

      • derived and implemented an original, statistically-driven algorithm that obtains the mathematically optimal solution to automated re-aggregation of “thin data” segments -- this ensured reliable, robust estimation of complex statistical measures on large-data production runs
         

      • single-handedly developed, designed, and delivered formal presentations of new methodologies and empirical findings to senior management
         

      • For a Fortune 50 financial institution:
         

      • performed econometric modeling of credit risk and delinquency behavior using a wide range of proportional and non-proportional hazards, time series, count-data, and logistic regression models
         

      • increased the speed of established company SAS® macros by orders of magnitude (from over a week to 90 minutes), making possible essential analyses that previously were runtime prohibitive
         

      • wrote original, advanced statistical SAS® macros that are not only faster than SAS®'s own pre-compiled procedures (SAS® PROCs), but also generate statistics more powerful than those produced by the client's own SAS® macros
         

      • analyzed complex credit class rules and quantified the impacts of proposed improvements to them
         

      • developed, designed, and delivered formal presentations of findings to senior management
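
As a hedged illustration of the survival-analysis work mentioned in the first bullet of this subsection (not the engagement's actual model or data), the sketch below fits a Cox proportional hazards model to synthetic time-to-redemption data using the open-source Python package `lifelines`; all field names and values are hypothetical.

```python
# Illustrative only: synthetic data, hypothetical fields; uses the
# open-source `lifelines` package, not the engagement's actual model.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 5_000
df = pd.DataFrame({
    "balance_k": rng.gamma(shape=2.0, scale=5.0, size=n),  # points balance, thousands
    "tenure_months": rng.integers(1, 120, size=n).astype(float),
})
# Synthetic time-to-redemption: larger balances redeem sooner on average.
df["months_to_redeem"] = rng.exponential(scale=24.0, size=n) / (1.0 + df["balance_k"] / 20.0)
df["redeemed"] = (df["months_to_redeem"] < 36.0).astype(int)  # administrative censoring
df.loc[df["redeemed"] == 0, "months_to_redeem"] = 36.0

cph = CoxPHFitter()
cph.fit(df, duration_col="months_to_redeem", event_col="redeemed")
cph.print_summary()  # balance_k should show a hazard ratio above 1
```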
         

    • TELECOMMUNICATIONS
       

      • Wrote and implemented permutation test statistical software for an RBOC to satisfy the operations support services (OSS) performance measurement regulatory requirements of multiple state public service commissions and §271 of the Telecommunications Act of 1996.  The code is five times faster than pre-compiled code written by another highly regarded statistical software firm consulted on the project (a simplified permutation-test sketch appears at the end of this subsection)
         

      • Conducted a comprehensive statistical analysis and data audit of the retail and resale markets of an RBOC to satisfy state and federal (§271 of the Telecommunications Act of 1996) regulatory requirements:
         

      • selected and implemented rigorous parametric and nonparametric statistical methods for parity testing on the full range of §271-related OSS performance measure data (close to a thousand performance measures)
         

      • hired and managed a team of consultants during the initial phase of data/statistical parity analysis
         

      • developed a statistical algorithm for a mandated, computationally intensive statistical test that cut computer runtime from well over a week to hours; the program is over 80 times faster than a competing consulting firm’s attempt at implementing the same test
         

      • wrote a statistical affidavit detailing the appropriate implementation of permutation tests within the context of OSS parity testing
         

      • employed a range of regression techniques to perform root cause statistical analyses to determine causes of disparate service provision to CLEC customers
         

      • wrote the technical appendix of a statistical affidavit filed with multiple state public service commissions
         

      • managed the implementation of strict quality control guidelines verifying the integrity of data and statistical test results for the entire database, reporting, and analysis system
         

      • Performed a cost estimation of ISP traffic relative to CLEC local exchange service revenue for an RBOC
         

      • Determined and implemented a range of statistical sampling methods for an RBOC potentially facing large fines regarding its call-monitoring practices
         

      • Employed a range of parametric and nonparametric statistical sampling and testing methods for two RBOCs requiring the implementation of a performance measure sampling methodology
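
The following is a minimal sketch, in Python rather than the production SAS® code described above, of a two-sample permutation test of the kind used in OSS parity testing: the observed ILEC/CLEC difference in means is compared against its permutation distribution under the null hypothesis of parity.  The data and sample sizes are invented for illustration.

```python
# Illustrative only: data and sample sizes are invented; the production
# implementation described above was written in SAS and is proprietary.
import numpy as np

def permutation_test(ilec, clec, n_perm=10_000, seed=0):
    """One-sided permutation p-value for H0: parity (equal distributions)."""
    rng = np.random.default_rng(seed)
    observed = clec.mean() - ilec.mean()   # positive if CLEC service is worse
    pooled = np.concatenate([ilec, clec])
    n_ilec = ilec.size
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                # random relabeling under H0
        diff = pooled[n_ilec:].mean() - pooled[:n_ilec].mean()
        exceed += diff >= observed
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(42)
ilec = rng.exponential(scale=5.0, size=2_000)  # e.g., repair-time hours
clec = rng.exponential(scale=5.5, size=150)
print("one-sided p-value:", permutation_test(ilec, clec))
```

The loop over permutations is what makes such tests computationally intensive at scale, which is why algorithmic efficiencies of the kind described above matter so much in production.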
         

    • QUANTITATIVE MARKETING / ADVERTISING
       

      • For a large marketing firm, conducted econometric time-series / event-study modeling to estimate the concurrent effects of various types of advertising spending on customer patronage
         

    • PRICING / RETAIL
       

      • Developed non-linear price elasticity models based on years of detailed sales data for a global manufacturer and distributor.
         

      • Managed the data analysis component of a comprehensive long-term pricing strategy project for a multibillion dollar global professional services firm.  Identified, cleaned, and merged internal financial and client data from numerous database systems to perform modeling for price prediction.  Methods used include hedonic regression, neural nets, and the application of resampling methods to tobit models.  For each model, constructed GUI interfaces that accept project and client characteristics as input, and as output, predict prices (with user-defined confidence intervals) to aid managers and partners in pricing their projects.
         

      • Managed the data mining component of a comprehensive product, customer, and pricing analysis for the largest privately owned retail organization in the country.  Developed and implemented a data warehouse system linking point-of-sale data (half a billion records annually) with databases across multiple systems (store, merchandise, store account, and department) to perform: a) multivariate customer segmentation utilizing various classification algorithms; b) econometric modeling of purchasing behavior; and c) sales margin, price point, geographic, competitor, departmental, and product class analyses.
         

      • Managed the data mining component of a comprehensive product, customer, and pricing analysis for a national retail department store.  Performed multivariate customer segmentation utilizing various classification algorithms, as well as competitor, price point, profit margin, geographic, departmental and product class analyses.
         

      • For an audit of a retail manufacturer, estimated total dollars correctly invoiced by designing and implementing parametric and nonparametric stratified bootstrap algorithms applied to ratio estimators, as sketched below
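
A hedged sketch of the technique just mentioned: a stratified bootstrap applied to a ratio estimator of dollars correctly invoiced.  The strata, error rates, and sample sizes below are invented for illustration; the actual audit methodology was more elaborate.

```python
# Illustrative only: strata, error rates, and sizes are invented.
import numpy as np

rng = np.random.default_rng(7)

def make_stratum(n, error_rate):
    """Synthetic (invoiced, audited) dollar amounts for one stratum."""
    invoiced = rng.lognormal(mean=6.0, sigma=1.0, size=n)
    audited = invoiced * (1.0 - error_rate * rng.random(n))
    return invoiced, audited

strata = [make_stratum(400, 0.02), make_stratum(150, 0.10)]

def ratio_estimate(samples):
    """R = total audited dollars / total invoiced dollars."""
    return (sum(a.sum() for _, a in samples) /
            sum(i.sum() for i, _ in samples))

boot = []
for _ in range(5_000):
    resampled = []
    for invoiced, audited in strata:      # resample pairs within each stratum
        idx = rng.integers(0, invoiced.size, size=invoiced.size)
        resampled.append((invoiced[idx], audited[idx]))
    boot.append(ratio_estimate(resampled))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"ratio estimate {ratio_estimate(strata):.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```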
         


    • LITIGATION / REGULATION
       

      • For Big 4 and economic consulting firms, on large litigations (e.g., $0.4 billion), developed econometric models (TSCS, ARMA, non-proportional hazards) for event studies and price estimation (airlines, software, mutual funds); presented expert testimony in federal court arbitration; and developed nonlinear price elasticity demand models.
         

      • Managed an evaluation of the statistical sampling techniques used in a Department of Justice audit of a large city’s administration of federally funded programs
         

      • Performed demographic analyses in support of smoking-related tobacco class action litigation
         

      • Performed statistical analyses for an electric utility estimating incremental benefits of improved metering accuracy from generation to transmission, and transmission to distribution
         

      • For several large law firms, performed statistical programming for a number of large antitrust litigation and merger cases requiring large database construction for predatory pricing and price elasticity analyses, the measurement of market concentration, market definition, and the calculation of various measures of market power
         

      • Critiqued an opposing expert’s time-series – cross-section econometric analysis of an event study in retail litigation with alleged damages of over half a billion dollars
         

    • TRANSPORTATION
       

      • Directed the applied econometric analysis of a large ridership forecast project for a multi-billion dollar airport access monorail system in New York City
         

    • PHARMACEUTICALS
       

      • Conducted the statistical analysis that served as the foundation for a capitated price contract between an international pharmaceutical company and a national managed-care organization
         

    • TRANSFER PRICING
       

      • Utilized a wide range of innovative empirical methods in comparables analyses evaluating non-market transactions in numerous transfer pricing studies.
         

     

  • Substantive Leadership  

     

    • DataMineit professionals have published award-winning papers on statistical computation and optimization techniques (see the Publications page) that provide concrete solutions to urgent business problems.  Even though technological advances have increased computing speed and power dramatically, efficient statistical code is needed now more than ever: newly developed statistical methodologies, with their own computationally intensive demands, are more than keeping pace with these advances.  DataMineit is a leader in the race to develop rigorous analytic business solutions that are proactively based on the most advanced and proven methodologies, rather than reactively driven by technological constraints (see our conference presentations).
       

     

  • Client-Driven Innovation  

     

    • DataMineit professionals are absolutely driven and unconditionally focused on implementing the best solutions for our clients, even if what is required has never been done before.  Among the DataMineit professional staff is the sole inventor of patent-pending statistical code that first enabled the implementation of the methodologically most appropriate, but computationally intensive, statistical tests for the state and federally mandated OSS performance reporting of two of the largest telecommunications firms in the world.  The tests previously had not been implemented on the full range of required data, and the only other attempt to date by a competing consulting firm had yielded an unusable algorithm that was over 75 times slower.  The other firm had concluded that an expensive computer upgrade was inevitable; a DataMineit professional instead developed an innovative analytic method that solved the business problem with existing resources.
       

    • The efficient code based on this method remains five times faster than a later attempt by another highly respected statistical software firm to implement the same statistical tests on the same voluminous datasets.  It is currently used by the two largest of the four remaining Regional Bell Operating Companies, and is just one example of how DataMineit professionals have provided unrivaled, analytically rigorous, yet actionable solutions to often time-critical business problems.

     


  • Methodological Breadth and Depth 

     

    • DataMineit’s analytical toolkit is broad and deep.  We have over twenty years of experience with a range of statistical and econometric methods and software that enables us to tackle complex business problems and develop methodologically rigorous, defensible solutions based on the data, not on vague or unproven management theories.  We have been expert users of most of the major SAS® modules for more than a decade, and have custom-tailored most of the data solutions described above using SAS® software.  We also have experience deploying many other statistical and scientific computing packages, often in conjunction with SAS®, including S-Plus®, Mathematica®, Statistica®, Limdep®, Gauss®, SPSS®, and C++.  Below is a partial list of the methods we turn to when developing and implementing data solutions for our clients.
       

    • Multivariate regression (constrained linear, logit, probit, tobit)

    • Monte Carlo simulations, bootstrapping, permutation tests, jackknife

    • Survival analyses, recurrent events, proportional and non-proportional hazards models

    • Statistical sampling (stratified, cluster, random, systematic) and power analyses (see the sample-size sketch following this list)

    • Classification algorithms: CART, CHAID, QUEST, hybrid methods

    • ANOVA, MANOVA, and multiple comparisons

    • Time-series cross-section and panel data

    • Neural networks

    • Nonparametric and robust statistical testing

    • Categorical data analysis (exact tests, contingency tables)

    • Nonparametric smoothing: splines, kernels, local regression, k-NN

    • Empirical likelihood
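
As one small, hedged example of the power analyses listed above (a Python sketch using the open-source statsmodels package; the effect sizes are illustrative), the following computes the minimum per-group sample size for a two-sample t-test at 5% significance and 80% power:

```python
# Illustrative only: minimum sample size per group for a two-sample t-test
# to detect a given standardized effect at alpha = 0.05 with 80% power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.2, 0.3, 0.5):  # small / moderate / large effects
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.80)
    print(f"effect size {effect_size}: about {n:.0f} observations per group")
```

Smaller detectable effects require disproportionately larger samples, which is why the choice of sampling design directly drives data-collection costs.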
       

  • Professional Integrity
     

    • DataMineit’s unwavering commitment to professional objectivity, independence, and the highest standards of empirically based consulting and research continues to build its reputation as a firm of the highest professional integrity.  Our deliverable is the completeness, accuracy, professional independence, and methodological soundness of our data analysis and data solutions, which are never compromised for short-term gain.  This commitment permeates our firm culture and our entire approach to consulting and to serving our clients, whose interests we always put first.