More and better data exist
today than ever before, and tremendous increases in computing speed and power
now allow organizations to utilize this plethora of information. However,
effectively and efficiently translating raw information from "Big Data" into critical
business knowledge requires the careful application of the sophisticated data
mining techniques with which
DataMineit professionals have extensive
experience and training. Our clients typically have complex business
problems for which we provide actionable solutions by applying statistical, econometric,
and algorithmic methods to what are often massive amounts of data. Our analyses are
empirically driven, carefully documented, extremely thorough and accurate, and
theoretically defensible and methodologically sound. Examples of the types
of questions we answer for our clients are included below, and our ability to
successfully address them, based on a thorough understanding of our clients’
business needs and priorities, is why firms and organizations turn to
DataMineit “when they need to know.”
We have managed small and large projects, as well as components of extremely large
projects, spanning a broad range of industries, some of which are listed below:
Operational risk management for a large international bank (high-net-worth
individuals and institutional investors):
per Basel II's capital estimation guidelines,
researched, tested, and developed robust statistical alternatives to maximum
likelihood estimation for more stable and reliable severity distribution
parameter estimation (heavy-tailed distributions, truncated and shifted).
designed and gave technical statistical
presentations for internal and external use to inform and develop regulatory
incorporated multivariate regression approaches to mitigate heterogeneity
within units of measure
As the Director
of Quantitative Strategies of a venture capital firm for three years:
developed from
scratch and implemented the firm’s portfolio selection investment algorithms using a unique, proprietary dataset
containing tens of thousands of exit-related financing rounds spanning 20
as the sole model developer, wrote over 400K lines of SAS® code
detailed presentations of model performance to all sizeable potential
investors (including the three largest institutional investors as of 4/20/10)
prior to their commitment to invest in the fund
For a Fortune 100 financial institution:
econometric modeling of the redemption behavior of rewards points
(representing over $1 billion USD in financial liabilities) using recurrent
events and survival analysis models
automated model selection procedures for logistic regressions, turning hours of manual
coding intervention into seconds of automated runtime
derived and implemented an original, statistically driven algorithm that obtains the
mathematically optimal solution to automated re-aggregation of “thin data”
segments, ensuring reliable, robust estimation of complex statistical
measures on large-data production runs
single-handedly developed, designed, and delivered formal presentations of
new methodologies and empirical findings to senior management
For a Fortune 50 financial institution:
econometric modeling of credit risk and delinquency behavior using a wide
range of proportional and non-proportional hazards, time series, count-data,
and logistic regression models
increased the speed of established company SAS® macros by orders of magnitude
(from over a week to 90 minutes), making possible essential analyses that were
previously runtime-prohibitive
wrote original, advanced statistical SAS® macros that not only are faster than
SAS®'s own pre-compiled procedures (SAS® PROCs), but also generate statistics more
powerful than those generated by the client's own SAS® macros
complex credit class rules and quantified the impacts of proposed
improvements to them
designed and delivered formal presentations of findings to senior management
Wrote and implemented permutation test statistical software for an RBOC to
satisfy the operations support systems (OSS) performance measurement regulatory
requirements of multiple state public service commissions and
§271 of the Telecommunications
Act of 1996. Code is five times faster than pre-compiled code written by
another highly regarded statistical software firm consulted on the project
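For readers unfamiliar with the technique, the following is a minimal sketch of a two-sample permutation test. It is an illustration only, not the software described above: it is written in Python rather than the language used on the project, and the function name `permutation_test`, the difference-in-means statistic, and the sample values are all assumptions chosen for the example.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=10000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Hypothetical helper for illustration. Returns an estimated p-value.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    n_b = len(sample_b)
    pooled = list(sample_a) + list(sample_b)

    # Observed absolute difference in means between the two groups.
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / n_b)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        perm_a = pooled[:n_a]
        perm_b = pooled[n_a:]
        stat = abs(sum(perm_a) / n_a - sum(perm_b) / n_b)
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical service intervals (in hours) for two customer groups.
group_1 = [4.1, 3.8, 5.0, 4.4, 4.9, 3.7]
group_2 = [5.2, 6.1, 4.8, 5.9, 6.4, 5.5]
print(permutation_test(group_1, group_2, n_permutations=2000))
```

The test makes no distributional assumption: it asks only how often a random relabeling of the pooled observations produces a difference at least as large as the one observed.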
Conducted a comprehensive statistical analysis and data audit of the retail and
resale markets of an RBOC to satisfy state and federal (§271
of the Telecommunications Act of 1996) regulatory requirements
selected and implemented rigorous parametric and nonparametric statistical
methods for parity testing on the full range of
§271-related OSS performance
measure data (close to a thousand performance measures)
hired and managed a team of consultants during initial phase of data/statistical
parity analysis
developed a statistical algorithm for a mandated, computationally intensive
statistical test which cut computer runtime from well over a week to hours;
the program is over 80 times faster than a competing consulting firm’s attempt at
implementing the same test
wrote a statistical affidavit detailing the appropriate implementation of
permutation tests within the context of OSS parity testing
employed a range of regression techniques to perform root cause
statistical analyses to determine causes of disparate service provision to CLEC
wrote the technical appendix of a statistical affidavit filed with
multiple state public service commissions
managed the implementation of strict quality control guidelines
verifying the integrity of data and statistical test results for the entire
database, reporting, and analysis system
a cost estimation for an RBOC of ISP traffic relative to CLEC local exchange
service revenue
Determined and implemented a range of
statistical sampling methods for an RBOC potentially facing large fines
regarding its call-monitoring practices
Employed a
range of parametric and nonparametric statistical sampling and testing methods
for two RBOCs requiring the implementation of a performance measure sampling
Developed non-linear price elasticity models based on years of detailed sales
data for a global manufacturer and distributor.
Managed the data analysis component of a comprehensive long-term
pricing strategy project for a multibillion-dollar global professional services
firm. Identified, cleaned, and merged internal financial and client data from
numerous database systems to perform modeling for price prediction. Methods
used include hedonic regression, neural nets, and the application of resampling
methods to tobit models. For each model, constructed graphical interfaces that accept
project and client characteristics as input and return predicted prices (with
user-defined confidence intervals) as output to aid managers and partners in pricing their
Managed the data mining component of a comprehensive product,
customer, and pricing analysis for the largest privately owned retail
organization in the country. Developed and implemented a data warehouse system
linking point-of-sale data (half a billion records annually) with databases
across multiple systems (store, merchandise, store account, and department) to
perform: a) multivariate customer segmentation utilizing various classification
algorithms; b) econometric modeling of purchasing behavior; and c) sales margin,
price point, geographic, competitor, departmental, and product class analyses.
Managed the data mining component of a comprehensive product, customer, and pricing
analysis for a national retail department store. Performed multivariate
customer segmentation utilizing various classification algorithms, as well as
competitor, price point, profit margin, geographic, departmental and product
class analyses.
For an audit of a retail manufacturer, estimated total dollars
correctly invoiced by designing and implementing parametric and nonparametric
stratified bootstrap algorithms applied to ratio estimators
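As a rough illustration of the nonparametric variant of this approach, the sketch below bootstraps a separate ratio estimator of a total within strata and reports a percentile interval. It is written in Python for illustration and is not the audit code itself: the function names, the dictionary layout (`book_total`, `sample`), the percentile interval, and all figures are assumptions, and it presumes positive invoiced amounts in every stratum sample.

```python
import random


def ratio_total(strata):
    """Separate ratio estimate of the total correctly invoiced dollars.

    Each stratum holds a known population invoiced total ("book_total") and
    a sample of (invoiced, correct) dollar pairs. Hypothetical layout.
    """
    total = 0.0
    for stratum in strata:
        sum_invoiced = sum(x for x, _ in stratum["sample"])
        sum_correct = sum(y for _, y in stratum["sample"])
        total += stratum["book_total"] * sum_correct / sum_invoiced
    return total


def stratified_bootstrap_ci(strata, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling independently per stratum."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = []
        for stratum in strata:
            sample = stratum["sample"]
            # Draw with replacement, keeping each stratum's sample size fixed.
            draw = [rng.choice(sample) for _ in sample]
            resampled.append({"book_total": stratum["book_total"], "sample": draw})
        estimates.append(ratio_total(resampled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return ratio_total(strata), lower, upper


# Hypothetical audit sample: two strata of (invoiced, correct) amounts.
audit = [
    {"book_total": 250000.0,
     "sample": [(120.0, 120.0), (80.0, 75.0), (200.0, 200.0), (95.0, 90.0)]},
    {"book_total": 900000.0,
     "sample": [(1500.0, 1400.0), (2200.0, 2200.0), (1800.0, 1750.0)]},
]
print(stratified_bootstrap_ci(audit))
```

Resampling within each stratum, rather than across the pooled sample, preserves the stratified design in every bootstrap replicate. A parametric version would instead draw from a distribution fitted to each stratum.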
Critiqued an opposing expert’s time-series cross-section
econometric analysis of an event study in retail litigation with alleged damages
of over half a billion dollars
For Big 4 and economic consulting firms, on large litigations (e.g., $0.4 billion),
developed econometric models (TSCS, ARMA, non-proportional hazards) for event
studies / price estimation (airlines, software, mutual funds); presented expert
testimony in federal court arbitration; developed nonlinear price elasticity
demand models.
Managed an evaluation of statistical sampling techniques of a
Department of Justice audit of a large city’s administration of federally funded
Performed demographic analyses in support of smoking-related
tobacco class action litigation
Performed statistical analyses for an electric utility estimating
incremental benefits of improved metering accuracy from generation to
transmission, and transmission to distribution
For several large law firms, performed statistical programming for
a number of large antitrust litigation and merger cases requiring large database
construction for predatory pricing and price elasticity analyses, the
measurement of market concentration, market definition, and the calculation of
various measures of market power
In addition to our wide-ranging industry experience, DataMineit’s analytical
toolkit is broad and deep.
We have over twenty years of experience utilizing a range of statistical and econometric
methods with software that enable us to tackle complex business problems and
develop methodologically rigorous and defensible solutions that are based on the
data, not on vague or unproven management theories. We have decade-long expertise
in most of the major modules of SAS®, and have custom-tailored
most of the data solutions described above using SAS® software. However, we also
have experience deploying many other statistical and scientific computing
packages, often in conjunction with SAS®, including
SPSS®, and C++. Below is a partial list of methods we turn to when
developing and implementing data solutions for our clients.
Multivariate regression (constrained linear, logit, probit, tobit)
Monte Carlo simulations, bootstrapping, permutation tests, jackknife
Survival analyses, recurrent events, proportional and non-proportional hazards models
Statistical sampling (stratified, cluster, random, systematic) and power analyses
Classification algorithms: CART, CHAID, QUEST, hybrid methods
ANOVA, MANOVA, and multiple comparisons
Time-series cross-section and panel data
Neural networks
Nonparametric and robust statistical
Categorical data analysis (exact tests, contingency tables)
Nonparametric smoothing: splines, kernels, local regression, k-NN
Empirical likelihood