
Attribution, Recursive, and Predictive Modeling

Mar 14, 2010 | Advertising & Marketing

Every marketer wants to understand where an end user’s first “touch” with a company’s advertising originated, and to track or even predict how many “touches” it took, and where, to generate a conversion. Then budget can be allocated in a statistically sensible manner.
There are a number of reasons why I call these soft sciences, by which I mean part science, part art, and part magic. First and foremost, the cookie-level technologies haven’t been developed, let alone standardized so that they collect data in the same way. As an aside, web analytics packages, typically Omniture or Coremetrics, each take a different approach to tracking. Marketers who have adopted this type of marketing modeling are often disappointed to find that they still have to explain budget allocations in terms of “confidence” and probable “significance” levels.
Companies are expecting a little more accuracy than that.
And, of course, there’ll be conflicts within the organization between display (what to do with post-impression attribution?), email, and search.
To me there are several reasons why mathematical modeling for interactive marketing is currently in vogue, and the way of the future:
* The home equity, easy credit days are over. The zeitgeist has changed enough that even people with money are much more prudent. Companies therefore can’t count on a volume of new guests coming to their site every month looking to buy. As a result, many companies are trying to make amends for past inattention to customers and detail by getting the most out of every dollar. Companies get used to earning a certain margin, and execs get used to a certain bonus.
* What I refer to as the “Sacred High Priest” Syndrome. Priests, lawyers, and financial bankers, to name just a few, have created a language so obtuse, and processes so complex, that one has no choice but to turn to them. They are the gatekeepers to “heaven,” both literally and metaphorically. Why would marketers be any different? Maybe someday we’ll all have to be mathematicians, first and foremost, to hold down our jobs.
* It will actually work someday. The internet is in essence a giant tracking sphere that can be made to act like a calculator. Mathematical marketing models are here to stay as part of the natural evolution of the web.
Here’s a compendium I put together of mathematical models that are or could be applied to interactive marketing –
Regression analysis – In statistics, regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent (observed result) variable and one or more (manipulated) independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables — that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
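As a minimal sketch of the simplest case described above (one independent variable, estimating the conditional mean), the following fits a straight line by ordinary least squares. The function name `fit_line` and the spend and conversion figures are made up for illustration; they do not come from any real campaign.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one independent variable.

    Returns (intercept, slope) of the regression function
    y = intercept + slope * x.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: weekly ad spend (in $1,000s) and conversions.
spend = [1, 2, 3, 4, 5]
conversions = [12, 14, 16, 18, 20]

intercept, slope = fit_line(spend, conversions)
print(f"conversions = {intercept:.1f} + {slope:.1f} * spend")
```

The slope is the estimated change in the dependent variable (conversions) for a one-unit change in the independent variable (spend). Real marketing data would have several independent variables and noise around the line, which is where the “confidence” and “significance” levels come in.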
Recursion – In mathematics and computer science, it is a method of defining functions in which the function being defined is applied within its own definition. The term is also used more generally to describe a process of repeating objects in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion.
Recursion formula – A formula for determining the next term of a sequence from one or more of the preceding terms.
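To make the last two definitions concrete, here is a small sketch using the Fibonacci sequence, chosen only because it is a familiar example and not because it has anything to do with marketing. `fib` is defined in terms of itself (recursion), while `next_term` is a recursion formula that produces the next term from the two preceding ones. Both names are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Recursive definition: fib is applied within its own definition."""
    if n < 2:
        return n  # base cases stop the recursion
    return fib(n - 1) + fib(n - 2)

def next_term(prev, prev_prev):
    """Recursion formula: the next term from the two preceding terms."""
    return prev + prev_prev

# Build the first ten terms by applying the formula repeatedly.
terms = [0, 1]
while len(terms) < 10:
    terms.append(next_term(terms[-1], terms[-2]))

print(terms)
print([fib(i) for i in range(10)])
```

Both lines print the same ten numbers; the difference is only in whether the sequence is described top-down (the function calls itself) or bottom-up (each term is built from earlier ones).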
Recursive model – A model in which the current value of one set of variables determines the current value of another, whereas previous (or lagged) values of the latter determine the current values of the former. A series of independent models to deal with causal chains.
Predictive analytics – Encompasses a variety of techniques from statistics, data mining and game theory that analyze current and historical facts to make predictions about future events. In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. One of the most well-known applications is credit scoring, which is used throughout financial services. Scoring models process a customer’s credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time. Predictive analytics are also used in insurance, telecommunications, retail, travel, healthcare, pharmaceuticals and other fields.
Predictive modeling – The process by which a model is created or chosen to try to best predict the probability of an outcome. In many cases the model is chosen on the basis of detection theory to try to guess the probability of a signal given a set amount of input data; for example, given an email, determining how likely it is to be spam. Models can use one or more classifiers in trying to determine the probability of a set of data belonging to another set, say spam or ‘ham’.
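The spam-or-ham example can be sketched with a naive Bayes classifier, which is one common classifier for this job and is used here purely as an illustration; the definition above does not prescribe any particular technique. The four training messages and the function names are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def spam_probability(text, word_counts, label_counts):
    """Naive Bayes with add-one smoothing; returns P(spam | text)."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_msgs = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_msgs)  # prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["spam"] / sum(weights.values())

# Hypothetical training set, far too small for real use.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, label_counts = train(training)

p_spammy = spam_probability("free money", word_counts, label_counts)
p_hammy = spam_probability("monday meeting", word_counts, label_counts)
print(round(p_spammy, 2), round(p_hammy, 2))
```

The same shape applies to conversion prediction: replace words with touchpoints and “spam” with “converted,” and the model outputs a probability rather than a yes-or-no answer.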
Structural equation modeling – There are weaknesses in this model that you should be aware of, and it is not currently in use as a marketing model but may well be some day. It may seem odd to begin with a warning, but the popular misuse and misinterpretation of Structural Equation Modeling is so widespread that users should be aware of some of the issues involved before they begin. (Please note that this warning is overly brief.) A number of these issues also apply to Confirmatory Factor Analysis. While Structural Equation Modeling has been popular in recent years to test the degree of fit between a proposed structural model and the emergent structure of the data, the perceived superiority of the technique is waning. Aside from the fact that the results of Structural Equation Modeling are often poorly reported, the conclusions drawn typically do not acknowledge the limitations of the technique. The most obvious, and in some ways the most critical, issue is that of incorrectly inferring a particular configuration of causal relationships from correlational data. This mistake can be illustrated with the simplest of all structural examples – that of 2 variables (variable A and B). If we ignore the additional complexity of latent structure, the number of possible causal structures is 4. Clearly, the number of possible models grows exponentially as the number of variables grows. In this example, the 4 possible causal models are:
A causes B;
B causes A;
A and B cause each other (a reciprocal model, which the structural equation literature calls nonrecursive);
Finally, A and B are unrelated.
If A and B are indeed significantly correlated, it is likely that the first 3 models will be supported by significant fit statistics. If this is the case, what has been proven? Which of the 3 supported models is the correct model? What makes matters worse is that we have not even conclusively ruled out the last model. It is still possible that the correlation between A and B was spurious. To reinforce a maxim that most people know, but fail to apply to Structural Equation Modeling – you cannot determine causation from correlation. Yet in most cases, researchers test only one or two of the myriad potential models, poorly report their results, and then proclaim confirmation of their model (implying the exclusion of all other possible models). So what is the value of Structural Equation Modeling? If large correlational datasets are already available, and a large range of plausible models is assessed, the results can be valuable in conceiving an experimental study that can test the proposed causal relationships.
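One way to see why correlation cannot pick between “A causes B” and “B causes A” is that the correlation coefficient is symmetric: swapping the two variables gives exactly the same number. The short sketch below computes Pearson’s r both ways on made-up data; the values and the `pearson` helper are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical observations of variables A and B.
a = [1, 2, 3, 4, 5, 6]
b = [2, 1, 4, 3, 6, 5]

r_ab = pearson(a, b)  # "A explains B"
r_ba = pearson(b, a)  # "B explains A"
print(r_ab, r_ba)
```

Because the two values are identical, the data alone give no reason to prefer one causal direction over the other; that preference has to come from an experiment or from outside knowledge.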
How does a company determine which model is the right model for it?
First click?
Last click?
Weighted attribution?
Equal attribution?
Cascading attribution?
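For readers who have not met these terms, here is a rough sketch of how four of the rules above split credit for a single conversion across the touches that preceded it. The function names, the example path, and the weights are all assumptions made for illustration. Cascading attribution is left out because its exact rule is not defined here.

```python
def first_click(path):
    """All credit goes to the first touch."""
    return {path[0]: 1.0}

def last_click(path):
    """All credit goes to the last touch before conversion."""
    return {path[-1]: 1.0}

def equal_attribution(path):
    """Every touch gets the same share of the credit."""
    share = 1.0 / len(path)
    credit = {}
    for channel in path:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit

def weighted_attribution(path, weights):
    """Each touch gets credit in proportion to its weight."""
    if len(weights) != len(path):
        raise ValueError("need exactly one weight per touch")
    total = sum(weights)
    credit = {}
    for channel, weight in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + weight / total
    return credit

# Hypothetical path to one conversion, in order of exposure.
path = ["display", "email", "search", "search"]

print(first_click(path))
print(last_click(path))
print(equal_attribution(path))
# Assumed weights that favor the first and last touches.
print(weighted_attribution(path, [4, 1, 1, 4]))
```

The same path yields four different answers, which is the heart of the internal conflict between display, email, and search: each channel will favor the rule that credits it most.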
Is there a simple solution?
No. Not yet. Technology vendors such as ClearSaleing and TagMan provide a universal tracking code that can be dropped onto your site and will identify all your other marketing pixels with the same unique code so that the data matching can be done more efficiently. A tool like TagMan also manages your pixels away from the site, so tag changes no longer need IT resources.
There are also ad serving tools that allow for path-to-conversion analysis. Both Atlas and Doubleclick have moved in this direction, but their solutions require all data to flow through their systems.
Agencies are approaching this, or eventually will, from the dashboard perspective. Analytics teams will accept that clients have historical tagging in place and then work to collect the data from those legacy systems, combining it to reach a desired outcome with a degree of statistical confidence. The data can be presented through our online analytics system and then downloaded into an Excel spreadsheet.
Akamai is a content distribution network that works with busy websites and ad servers globally to ensure that end users have a quick and positive experience. They already have a site’s content running through their servers and are already placing their domain cookies, so they could conceivably be seeing enough data to attribute across channels some day.
But again, in my opinion, the sane course of action today is the same course of action you took yesterday: optimize each channel to its fullest.
Please note that some of the information in this article came from reference material.