Topics for Discussion

One of my goals for this year is to ‘take the pulse’ of the hydrologic community on topics of interest – academic and practical.  I will maintain some topics for discussion here.  Please feel free to post, or read ahead before I visit you!

What are the most promising frontiers in hydrologic measurements?

What are the limits to incorporating uncertainty into hydrologic consulting?

What are the key impediments to using hydrologic model results for decision making?


 

Conversations about decision making … a collection of interesting insights and questions/challenges that people offered during the year.

 

12/5/16 – Zachary Stanko – USGS, San Diego

How do we make the argument to clients that we should make multiple models?

The way that we see it is this.  Imagine that you are talking to a client and you can tell them, honestly, that you could build ten different plausible models of a system – and that we can’t really tell you which one is correct (if any – probably none).  Give them the choice – we can develop each model partially to give them an idea of what they predict and then let them tell us which ones are important to them; or, we can pick a model, really tune it, but tell them nothing about the other models.  In our experience, all clients have opted for the first case.  They have all wanted to be ‘in’ on the decision.  But, I’d love to hear other ideas for how to present this approach!

12/5/16 – Zachary Stanko – USGS, San Diego

Can we adopt any tools from actuarial science to come up with better ways to give useful answers to clients?

I am sure that your intuition is correct and that we can.  My concern would be that we may not have the data needed to feed their tools.  In general, we have relatively few hydrogeologic studies (compared, for instance, to statistics on human health or longevity or car accidents).  Those studies that we have are not standardized and usually very data limited.  So, as with our limits due to data paucity, I think that we will be limited in our ability to adopt actuarial models.  Having said that, I would not be at all surprised if those fields have developed very useful approaches for expressing and explaining uncertainty and for using uncertainty measures to one’s advantage – that would be of real interest to many, I’m sure!

12/5/16 – Eric Reichard – USGS, San Diego

Many of the models that we create are very complex and require a lot of time to create and to run.  It seems daunting to try to create multiple versions of these models.  Is that practical?

Sometimes, no … and I don’t come down on either side of the simple versus complex model debate in our community.  Personally, I think that a model should be just as complex as necessary to answer the questions being asked – and no more complex.  I also think that we could do more exploration of which complexities are really necessary – which actually make a difference in the context of the decisions that we are trying to support.  If a complex element can be eliminated with little or no impact on the predictions of interest, then we should eliminate it.  But, I think that it is also difficult to support the idea that ALL models of ALL systems should be simple.  For some systems, the complex elements are the most important (e.g. fractures).  I think that the most important thing is that we ‘own’ our decision to make models large and complex.  This is not cost free – in the context of my talk, it comes at the cost of considering multiple competing models.  My personal feeling is that this trade-off cannot be justified for many applications.  But, each modeler has to make that decision (and defend it).  The only thing that I think is not supportable is to make large complex models ‘just because’ that is what you can do well or that is what differentiates you from other modelers.  Perhaps it is a real overstatement, but the mark of a real craftsman is the person who chooses the right tool for the job, not the one who can make any tool work for any job.

12/5/16 – Eric Reichard – USGS, San Diego

The SIGMA program is going to require models to be formulated for a large number of basins, many of which have never been modeled before.  Can you offer any ideas about how this process could be streamlined, or at least made more effective?

I did attend the SIGMA conference in Davis in February and I found it very encouraging.  There seemed to be some movement towards adopting a common modeling framework – which has pros and cons, of course.  One thing that I didn’t hear was whether there would be efforts to require that models be shared once developed.  I think that this would be a great idea – to make the model building process more transparent, to provide versions of models – not just the best fitting model – so that others could learn from them.  It would be fantastic if we could also implement ideas like formal identification of uncertainties in assumptions (see Luk Peeters for this) and intentionally looking ahead to model predictions during the calibration process (see Jeremy White for this).  Given that there is both diversity across the basins and, presumably, a high degree of similarity among some basins, I think that this could be an amazing opportunity to develop approaches for community model ENSEMBLE building.  It seems that one of the biggest challenges may be the curse of plenty – there is so much work to be done that everyone will be involved and there may not be much effort placed on coordination of efforts.  But, I hope I’m wrong about that!

Peter Quinlan added a comment here – in terms of SIGMA, I think that investigating the uncertainty is an absolutely critical part of the effort.  Important decisions, which affect people’s livelihoods, will be made based on these models.  So, we have to give people better information so that they are making really informed decisions.  Too much is at stake to rely on a single model, which is likely to be wrong, as the basis for decisions.

12/5/16 – Claire Kouba – Dudek, San Diego

I liked the idea of points of divergence of storylines.  This made me think of using ‘A Tale of Two Cities’ as a way to explain how stories (or analyses) can turn on a very small event or a small amount of added data.  I think that this could be very true in some cases, but in other cases it may be seen as selling something rather than conducting objective science.  Do you have any examples to describe what you mean more completely?

Firstly, I love the idea of using A Tale of Two Cities … I’ll have to find a way to work that in!  I am also sensitive to the idea that we have to avoid the appearance of doing ‘advocacy science’.  But … only to a point.  I think that the problem with being an advocate for an outcome is that it pushes us to ignore some data or to ignore other equally plausible explanations or descriptions of a system so that we can support a favored predicted outcome.  On the other hand, in the context of forming an ensemble of models, I think that advocacy can actually be useful.  The key is that we should (in my opinion) form models that intentionally represent the concerns (or preferences) of stakeholders as members of the ensemble.  This is the best way to address, and perhaps alleviate, these concerns.  If we have multiple stakeholders, or a single stakeholder with multiple concerns, then we will need a larger and more diverse ensemble.  This formation of models that represent what a stakeholder believes, or at least fears, to be true is what I mean by reaching back to the point of divergence in describing a physical system.  But, it has to be explained clearly and it requires involved conversations about concerns and what would be required to address them at the beginning of a study!

12/5/16 – Scott Paulinski – USGS, San Diego

Is it ever possible to have too many questions to answer?

I don’t think so.  At least as I see it, when we form our model ensemble we should be including models that we think are the best representations of the system (ideally multiple competing versions).  But, we should also include plausible models that represent stakeholders’ concerns (or preferences, in some cases).  If we have multiple stakeholders, or multiple questions, we would simply add some models to represent each stakeholder.  I think that we can do this quite flexibly if we don’t require that the membership of the ensemble also represents the model likelihood.  We want the models to consider important differences, not all possible differences.

11/18/16 – Markus Disse – Technical University of Munich, Germany

Is it possible to assign probabilities to the models that underlie the plume image that you showed?  If I understand correctly, you have chosen some threshold to categorize a model as acceptable, but could it be more quantitative?

I think that one could do that – essentially, calculating model likelihoods based on goodness of fit to the data and then plotting either a cumulative likelihood that a concentration will exceed some value at a location, or plotting the likelihood weighted predicted concentration in space.  But, we are trying to explore approaches that do not, necessarily, produce enough models to be statistically rigorous for calculating probabilities.  Rather, we want to support ‘categorical’ risk levels.  This does require some threshold for model acceptability, which is a place that subjectivity can intrude.  But, this level of acceptable data fit to indicate plausibility is hard to define absolutely – I think that it should also be the subject of discussion with a decision maker.  At a minimum, they should have input into how implausible a model should be before they feel comfortable ignoring it.  All of that said, I think that you have put your finger on one of the key things that has to be discovered if we are to use scientific models to support decision making under uncertainty – how unlikely should a scenario be before you feel comfortable assuming that it won’t occur?!
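To make the mechanics concrete, here is a minimal Python sketch of one way the weighting and the plausibility screen could look.  The Gaussian error model, the factor-of-two acceptability threshold, and all of the numbers are illustrative assumptions, not the scheme used for the plume study:

```python
# A minimal sketch, not the scheme used in the study: turn goodness of fit into
# model weights and a categorical plausibility screen.  The error model, the
# factor-of-two threshold, and all numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_models, n_obs = 50, 20
observed = rng.normal(10.0, 1.0, n_obs)                         # field observations
simulated = observed + rng.normal(0.0, 1.5, (n_models, n_obs))  # each model's fit

# Gaussian likelihood weights, assuming independent measurement errors with sigma.
sigma = 1.5
loglike = -0.5 * np.sum(((simulated - observed) / sigma) ** 2, axis=1)
weights = np.exp(loglike - loglike.max())   # shift by the max for numerical stability
weights /= weights.sum()

# Categorical screen: keep any model whose misfit is within a factor of the best.
rmse = np.sqrt(np.mean((simulated - observed) ** 2, axis=1))
plausible = rmse <= 2.0 * rmse.min()        # the factor of 2 is a subjective choice

# Predicted concentration at a compliance point, one value per model (synthetic here).
predicted_conc = rng.lognormal(mean=1.0, sigma=0.5, size=n_models)
threshold = 3.0

# Likelihood-weighted exceedance probability vs. a simple count over plausible models.
p_weighted = np.sum(weights * (predicted_conc > threshold))
n_exceed = np.sum(plausible & (predicted_conc > threshold))

print(f"{plausible.sum()} of {n_models} models pass the plausibility screen")
print(f"likelihood-weighted exceedance probability: {p_weighted:.2f}")
print(f"plausible models predicting exceedance: {n_exceed}")
```

The contrast between the last two printed numbers is the contrast in the answer above: a single weighted probability versus a count of plausible models that predict the outcome of concern.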

11/18/16 – Markus Disse – Technical University of Munich, Germany

Averaging of models is very common – it is often shown to lead to these Gaussian uncertainties around the maximum likelihood – isn’t the wide use of this approach an indication that it is useful?

As with everything … it depends.  Let’s take an election between two candidates as an example.  There is a black and white outcome – one or the other wins.  We can take many different polls and simply take the average to determine the maximum likelihood winner.  But, we recognize that some polls are better than others (based on accuracy of previous forecasts).  So, we can weight the polls based on their likelihoods.  But, even this approach relies on the errors in the various polls ‘cancelling out’ errors in other polls.  What if we believe that some polls have a unique and possibly valid way of looking at the data and what if these polls make very different predictions?  If that outcome is of real concern to us, shouldn’t we consider that poll disproportionately?  Perhaps not if our interest is in predicting the outcome of the election.  But, what if our job is to try to design a campaign to win the election?  Then this poll, even if it has a relatively low chance of being correct, should be considered very heavily.  If taking action based on this poll requires a lot of effort, then we should identify discriminatory data that has the chance of testing the validity of this poll against other polls.  Perhaps, if the Clinton campaign had done this, they would have recognized some common failures in the data collection underlying the recent US presidential polls, for example.  We are making this analogy to hydrogeology.  If you only try to build the most likely, most commonly believed model, you will never see these surprises coming.  Even model averaging can’t necessarily help you, because if I tell you that there is a 2% chance of an outcome, you will hear that it is all but impossible.  But, in contrast, if I tell you that there is a plausible storyline (model) that would end up in a bad place for you, you may be more tempted to support data collection to test that model. 

11/18/16 – Eric Seagren – Michigan Technological University

Would you do the same study, in the same way, if you had been hired by the property holder to the north of the traveling plume?

This leads to a fascinating discussion, I think.  Of course, one of the charges laid against consultants is that they are too willing to tell the story that their client wants to hear.  I think that posing the job as – build the best model that you can of this system and then be prepared to defend it – is really encouraging advocacy science.  But, I believe that if we rephrase the question as – try to envision the pathways of failure of the system, build an ensemble of models to represent them, and then collect data to test the riskiest models – then we align the interests of multiple parties.  In the case of the traveling plume, both parties are interested in testing whether the plume will leave the site.  In the mine/rice case, the parties have different specific interests.  But, we can hope to promote collaboration by seeking data that addresses both of their concerns.

11/16/16 – audience member, Ljubljana, Slovenia

I think that people lose sight of the fact that a model is just a tool.

I agree entirely.  But, the question is – a tool to do what?  We often think that models are tools to test hypotheses.  But, if we only have one model, how can we do this effectively?  We can predict what we would expect to measure anywhere and at any time.  If we measure something different, then we may conclude that the model is incorrect.  But, if the model agrees with the new data, does that increase our faith in the model?  How do we know that the same outcome wouldn’t have been predicted by many other models?  To use a terrible analogy, just because we can drive a screw into wood with a hammer doesn’t mean that we wouldn’t prefer to find a screwdriver!  So, I agree that a model is a tool to help people to make better decisions and to identify discriminatory data – but, an ensemble of models is an even better tool for both!

11/16/16 – Andrej Lapanje – Geological Survey of Slovenia

Can you only use your approaches if you have really simple models?

To some degree, yes.  But, I think that the question may be better the other way around.  Are the proposed reasons for building multiple models, both for insight and for communication, of primary importance?  If so, then this may be another reason to prefer simpler models.  In some cases, like reactive transport with biodegradation in fractured media, the processes are inherently complex.  Perhaps a simple model will never be sufficient.  In other cases, the scale is so large – perhaps the entire US – that even a simple model will be incredibly computationally expensive.  For these cases, multiple models are not practical.  But, in many other cases, I think that we default to building complex models because we can.  Our models always seem to take the same time to run, even though our computational resources continually improve!  This is a choice – a choice to use our resources for examining complexity at the expense of model diversity.  The message of this talk is, simply, that we should think carefully before making that choice.  Perhaps a place to start is to reduce your model complexity and calibration efforts to 1/3 so that you can build three models!

11/14/16 – Daniele Bocchiola – Politecnico di Milano

What about the constraint of time for a project?  Do decision makers actually like this idea given that it requires more work?

You are correct, we often have two related constraints, time and money.  Doing some modeling up front may shift some time and funds to modeling from other activities.  But, our main contention is that because data collection is often so expensive, any savings there can make up for some increased modeling. In particular, if we can avoid collecting data that we should have known was likely not to be informative, we will end up better off in the long run.  In many cases, I think that we can also improve the timeliness of our work by thinking more carefully and critically about the hardest and most expensive work – data collection.  That is, if we can collect more informative data the first time, we may save a repeat trip to the field.  Many, actually almost all, clients have been happy with this approach.  The exceptions are – if the time is simply too short to do modeling, or if the client doesn’t really care about using the model and just wants the cheapest way to satisfy a requirement.  I think that decision makers like the approach because this is how we explain it.  Imagine that our current knowledge of the system is limited enough that there are ten models that we cannot choose between right now.  They have a choice.  We can tell them something about all of them, or a lot about only one of them (one that we choose).  From that context, the multimodel analysis seems pretty attractive to many decision makers.

11/14/16 – Daniele Bocchiola – Politecnico di Milano

I think if I were a member of the IPCC it would be hard for me to sleep at night, knowing that people are making such expensive and important decisions with IPCC models.  But, on the other hand, I think that they have done an amazing job of showing how scientists can work together and communicate complicated ideas.

I agree on both counts.  It would be very interesting to examine the work of the IPCC to see if we can learn some general lessons from them about how to do science on smaller projects.  At the same time, I think that we can also make use of Eagleman’s insights.  That is, can we build specific, detailed competing stories that we can compare in our minds as part of our natural decision making process?  I think that this is especially important for smaller projects, where we may have some hope of developing five stories that really capture the range of plausible outcomes in the future.

11/14/16 – Alberto Guadagnini – Politecnico di Milano

What do you do if different models require different types of data for calibration?  A related, but more complicated, question is – what if we don’t currently have a way to measure some of the properties that we would like to measure?  As an example, I am curious about measurements that may be able to test the concept of a dual permeability medium?

I think that we only strictly need to worry about using a common data set if we want a ‘fair’ measure of the likelihood.  That is, this only matters if we want to form a quantitative prediction PDF using multiple models. To be honest, I am not sure that we can ever really do this because of our limited examination of model structural uncertainties.  If we can agree to give up on this goal, then I think that it relaxes the need for common data support and, perhaps, even avoids questions of model complexity comparability.  We are really just trying to find models that are plausible and ‘important’ by some stakeholder’s definition.

On the question of considering measurements that don’t yet exist, I think that this is one of the real benefits of DIRECT.  We can use our hydrologic model results to feed into an instrument forward model and then ask the question – would these data be discriminatory IF we could collect them?  If not, then we shouldn’t spend the effort to build them!  But, if the new measurements are the best bet for discrimination, then we should make the effort.  In some circles, remote sensing I think, these are called OSSE (Observation System Simulation Experiments) analyses. 
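For anyone curious what such a check might look like in code, here is a minimal OSSE-style sketch.  The ‘instrument’ (a noisy spatial average), the synthetic ensemble, and the separation score are all invented for illustration:

```python
# A minimal OSSE-style sketch, assuming a hypothetical instrument whose forward
# model is a noisy spatial average of simulated water content.  The question is
# whether the not-yet-collected measurement would separate two groups of models.
import numpy as np

rng = np.random.default_rng(1)

n_models, n_cells = 40, 100
# Synthetic ensemble of simulated water-content fields (stand-in for real model output).
theta = rng.uniform(0.1, 0.4, (n_models, n_cells))
group = rng.integers(0, 2, n_models).astype(bool)   # e.g. models of concern vs. the rest
theta[group, :60] += 0.08                            # the groups differ over part of the domain

def instrument_forward(field, footprint=slice(0, 60), noise_sd=0.02, rng=rng):
    """Hypothetical sensor: averages the field over its footprint and adds noise."""
    return field[footprint].mean() + rng.normal(0.0, noise_sd)

# Simulate what each ensemble member says the instrument would read.
readings = np.array([instrument_forward(theta[i]) for i in range(n_models)])

# Crude separation score: distance between group means in units of pooled spread.
gap = abs(readings[group].mean() - readings[~group].mean())
spread = 0.5 * (readings[group].std() + readings[~group].std())
print(f"separation score: {gap / spread:.1f}  (larger means the data would discriminate)")
```

If the score is small, the instrument is probably not worth building for this purpose; if it is large, the proposed measurement is a good bet.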

The dual permeability measurement question is trickier.  Here, I think that the main difficulty is that some of the ‘states’ are not actually physically real!  So, in that case, we cannot measure them, right?  I think that this is different than saying that some states are spatial averages of actual quantities – that is the sort of thing that we can handle with OSSEs, to some degree.

11/14/16 – Daniele Bocchiola – Politecnico di Milano

What about value of long term monitoring, for example in the lifetime cost of infrastructure?

We haven’t really thought about this enough.  It is a difficult thing to do, as a scientific analysis – mostly because there are so many possible monitoring plans to consider and we won’t get feedback for some time.  But, I think that we should at least try some ad hoc analyses to consider some alternative monitoring schemes.  (It isn’t like us, as scientists, to give up on an analysis because it is difficult, right?)  Again, I think that this is a great example of something that would benefit from developing a limited number of competing stories, say about lifetime costs of alternatives with and without monitoring, to help people assess the likely value of having (or the cost of not having) a monitoring system.

11/11/16 – Zhao Chen, Karlsruhe Institute of Technology, Germany

How do you design an ensemble of models?  Do you use an approach based on increased complexity?  Do you need to consider the computational demand of forming many models when deciding on the number of models and their individual complexities?  Also, just to clarify, why do you consider the ‘bad’ models?  Aren’t they poor representations of the real world?

The formation of a good model ensemble is, in my opinion, one of the most interesting things that we have to do as we move forward in hydrogeology.  It is very interesting how different these questions are: What do you think is the best model to use to represent this system?  Which set of models best represents what you do and do not know about this system?  I think that the latter requires much more creative thinking and, therefore, is more likely to yield useful results and novel scientific insights.  I think that some form of complexity increase could be very useful, as could a model tree construction approach, like I showed.  Honestly, I think that this is such uncharted ground that any new ideas are likely to make a contribution!  In the end, we will likely have to balance accuracy, uncertainty, and computational demand (cost/time).  The last metric changes (decreases, generally) all the time, but it cannot be overlooked.  This offers another domain for optimization – I can’t wait to see what smart people come up with in this area!  As for the ‘poor’ models – I was not clear in what I meant.  By ‘poor’ or ‘bad’ I don’t mean that they have a bad fit to data.  I mean that they predict a ‘bad’ outcome in the opinion of one or more stakeholders.  They should still be plausibly good, based on likelihood.  But, I do think that we should balance goodness of fit with importance of prediction as part of the calibration process – rather than relying entirely on goodness of fit, as we tend to do now.
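To illustrate that last point, here is a toy sketch of letting a prediction of interest enter the calibration objective alongside the data misfit.  The linear ‘model’, the weighting factor alpha, and the head threshold are all assumptions made up for this example – it is a sketch of the idea, not a recommended recipe:

```python
# A minimal sketch of letting the prediction of interest enter the calibration
# objective, rather than fitting data alone.  All model details, the threshold,
# and the weighting factor are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

observed_heads = np.array([10.2, 9.8, 9.1, 8.7])
x_obs = np.array([0.0, 100.0, 200.0, 300.0])
x_pred = 450.0          # location of the prediction a stakeholder cares about
h_threshold = 8.0       # head below this would be a 'bad' outcome for them

def model(params, x):
    """Toy linear-gradient 'model' standing in for a real groundwater model."""
    h0, grad = params
    return h0 - grad * x

def objective(params, alpha=0.1):
    misfit = np.sum((model(params, x_obs) - observed_heads) ** 2)
    # Reward parameter sets that fit the data AND approach the outcome of concern,
    # so the calibrated ensemble includes plausible 'bad outcome' members.
    pull_toward_concern = (model(params, x_pred) - h_threshold) ** 2
    return misfit + alpha * pull_toward_concern

best = minimize(objective, x0=[10.0, 0.005], method="Nelder-Mead")
print("calibrated parameters:", best.x)
print("predicted head at stakeholder location:", model(best.x, x_pred))
```

The extra term only nudges the search toward plausible parameter sets that approach the outcome of concern, so that those members end up represented in the ensemble rather than discarded for fitting slightly worse.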

11/11/16 – Nico Goldscheider – Karlsruhe Institute of Technology

You showed an example with many models.  Did you only use one type of software – and which software did  you use?  Secondly, does this represent ‘different’ models?

Tim Bayley did the modeling and, if I remember correctly, he used MODFLOW with MODPATH.  You are right, too – we used the same software packages for all of the simulations.  So, in some sense, this really wasn’t an ensemble of different models.  But, I tend to be more liberal in what I define as a ‘different’ model.  In some sense, if I only change one parameter, but it causes my model to fit my data while making different predictions, then I prefer to think of that as a different model.  Ultimately, though, we would want to explore more ‘fundamental’ differences – in part because they are more likely to have profound effects on predictions and process understanding. But, I am comfortable with an incremental approach.  It is much better to include some additional models with some added complexity than to rely on one model.  It is even better if we examine different boundary conditions, or geologic structures, or process descriptions, or even underlying mathematical representations.  The real challenge ahead is to try to develop guidelines, or at least intuition, to help to predict which differences are the most important differences to consider for any given application.

11/10/16 – Anneli Guthke – Universities of Tübingen and Stuttgart, Germany

(Anneli explained something very interesting during a conversation in the afternoon following my lecture.)  We should consider an error model as another degree of freedom in our models.  By acknowledging the existence of conceptual uncertainty and the need for multiple models, we admit that no individual model will ever be perfect and that each model suffers from structural errors. Hence, the discrepancy between model predictions and data cannot be explained by measurement error alone. That is, while it may be correct to assume that our measurement errors are not correlated – only due to random operator and instrument variability – it is not reasonable to assume that errors in our model predictions are uncorrelated.  In fact, they are almost certainly correlated because they relate to structural deficits of the model that typically result in systematic biases as opposed to random errors.  Because of this, it may be misleading to calibrate our model simply by seeking the maximum agreement with data. We should find a way to isolate the model error from the uncorrelated part that can be attributed to data error. Only then, our routines for model calibration, comparison and averaging will be formally valid and will provide more honest results.

I have to admit that I am still working my way through this.  But, it sounds fascinating.  I especially like the idea that it might be used to assess very different models subject to the same data.  I’ll look forward to reading more of Anneli’s papers on this (and related) topics!

11/10/16 – audience member – University of Stuttgart

Many of the models that we produce are really similar to each other.  This doesn’t seem very efficient.

I agree entirely!  I think that much of the work that we do (including having to run many, many models) to produce quantitative prediction PDFs is, essentially, wasted.  I don’t know how to do it – but, I think that we need to develop a reduced model ensemble that covers prediction outcomes with plausible models without this wasteful repetition.  At some level, this requires that we have some measure of ‘model similarity’ that we can use to exclude like models from consideration during model selection efforts.  But, I am not sure how to do that.  I don’t think that it should be in parameter space, or even, perhaps, in a space of model structural descriptors.  I think that the model closeness has to have something to do with the predictions that models make that will be used for other science or for decision support.  Until we can find clever ways to do this, we are, to some degree, stuck with producing a lot of repetitious models because we are not smart enough not to!
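One possible starting point, sketched below with synthetic numbers, is to measure model similarity only in the space of the decision-relevant predictions, cluster there, and keep one representative per cluster.  The use of k-means and the two predictions chosen are assumptions for illustration only:

```python
# A minimal sketch, assuming 'model similarity' is measured only in the space of
# the predictions used for decision support: cluster the ensemble there and keep
# one representative per cluster to thin out near-duplicate models.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(2)

n_models = 200
# Each model summarized by two decision-relevant predictions (synthetic values):
# e.g. peak off-site concentration and arrival time.
predictions = np.column_stack([
    rng.lognormal(1.0, 0.6, n_models),     # peak concentration
    rng.normal(15.0, 4.0, n_models),       # arrival time in years
])

# Standardize so both predictions count equally, then cluster.
z = (predictions - predictions.mean(axis=0)) / predictions.std(axis=0)
centroids, labels = kmeans2(z, k=8, seed=3, minit="++")

# Keep the member of each cluster closest to its centroid as the representative.
representatives = []
for k in range(8):
    members = np.flatnonzero(labels == k)
    if members.size == 0:
        continue
    d = np.linalg.norm(z[members] - centroids[k], axis=1)
    representatives.append(members[d.argmin()])

print(f"thinned ensemble: {len(representatives)} representatives of {n_models} models")
```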

11/10/16 – Sergey Oladyshkin – University of Stuttgart

You use the term ‘models’ – but, are they really different as you have defined them for your problem?

This can be a very important question – at a minimum, it is an important definition to agree upon.  Different fields of study have different definitions of what constitutes a model – where is the boundary between a new model and variations of a model.  But, for our purposes, we usually don’t care about this difference.  We prefer to use the term ‘model realizations’.  In this sense, if I only change one parameter, but it results in an acceptable fit to the data and importantly different predictions of interest – then, I want to keep track of this description of the system.  In some cases, the goal is really to determine which ‘model’ is correct (this can be applied to any definition of model) across all of its realizations.  But, I think that this is actually quite rare – especially in hydrogeology.  Even our ‘pure’ scientific endeavors are usually, really, input to other fields of study.  They don’t, typically, care why we see some range of hydrogeologic states or fluxes – they need to know what we think they will be, when they will occur, and, often, whether they will exceed some thresholds that will lead to important outcomes in their linked systems.

11/10/16 – Wolfgang Nowak – University of Stuttgart, Germany

Understand – I am a strong proponent of multimodel analyses!  But, let me ask a question as a devil’s advocate.  Can you really, as an individual, propose different models?  Or, do you need to hire a team of individual consultancies?  If so, won’t that be prohibitively expensive?  Does your proposed way not shift money from rather cheap computer power for detailed calibration to more expensive costs for modellers’ manpower?

At some level, you are absolutely correct.  I think that the best examples of multimodel analysis have done this – the IPCC and both the European and US nuclear waste facility investigations.  But, these are big, expensive, important projects that warrant this level of expense.  For smaller projects, which is the vast majority of the work that we will do as hydrogeologists, the overhead and establishment or mobilization costs make it prohibitive to hire multiple consultancies.  Now, that could be changed, given enough incentive.  In fact, I have proposed a scheme to city governments that would spend ¼ of the budget for an initial model proposal stage, split among multiple consultants.  They would all contribute their models and then, based on the quality of their ensemble, a winner would be chosen to move forward – with the advantage of having everyone’s models in place!  So far, no takers.  The other alternative is to find ways to encourage ourselves to be more flexible and creative.  This should start in school, frankly.  But, we can also take advantage of approaches like ‘pre-mortem’ analyses (imagining looking back and discussing why your project has failed) and the use of hired devil’s advocates.  I don’t pretend that the switch to the approach that we are recommending would be simple.  But, I am convinced that it could offer tremendous benefits, both for science and practice.

11/10/16 – Martin Beck – University of Stuttgart

How does DIRECT work in a legal context?  Is the effort worthwhile for these applications?

We think (and we have heard from a few lawyers) that this is quite promising.  I have a link on the blog to documents provided by Paul Hsieh, who describes how this approach contributed to the decisions regarding the amount of oil spilled in the BP case.  In general, we think that it has benefit because of the structure of the legal system (at least in the US).  There, if a hydrogeologist is called as an expert witness, the objective of the opposing counsel is to cast doubt on their work.  If you have a very well defined system and you have a lot of local experience, this may not be an issue – your judgement is better than anyone else’s.  But, I think that it is more commonly the case that there are significant uncertainties – major decisions that you had to make with insufficient information.  When this is the case, it would seem that you would be in a much stronger position if you can say that, yes, we examined models that made a different assumption, and we can describe what they predict.  The alternative is only to say, no, we didn’t try those models because, in our judgement, we knew that they were wrong without ever testing them.  Again, this may be acceptable for some assumptions, but not for all of them!

11/10/16 – audience member – University of Stuttgart

Most typical approaches use minimal brain power (to develop a model) and a lot of computer power (to calibrate and assess uncertainty).  Your approach is advocating more human work and less computer work.  But, computer time is getting cheaper with time and human work is not.  So, does your approach make economic sense?

Very interesting way to cast the problem!  You are right, of course – we are suggesting that we trade human work for computer work.  But, only at the first level of analysis.  What we are saying, more fundamentally, is that human work is required to combine relatively soft knowledge (e.g. geology, basic physical principles) with hard data (state and flux observations, maybe some property measurements).  This is necessary to do two things, demonstrated in the following.  Imagine that we rely entirely on computer work to make a set of hydrogeologic predictions.  At least at this point in history, we will end up with a lot of models that do a great job of fitting the data that we have, but that are profoundly wrong.  (They are right for the wrong reasons.)  Some of these models will make very important (but completely wrong) predictions that may waste our attention and human effort.  To avoid this, we need to filter the good fitting models for ‘reasonableness’ – which requires human work.  In some cases, the physically or geologically unreasonable models may pass our first inspection.  If they make important predictions, then we need to collect data to test them.  If they are wrong, then we will expend even more human work to collect and interpret data to eliminate them.  But, they were wrong all along and we should have known that – so, again, an overreliance on computer thinking costs us more human work.  Essentially, all that we are saying is that an improved mix of human and computer work, early in a project, could reduce both our total human and total computer work.  Then we can use our extra computer time to automatically write pop songs, and our extra human time to listen to them on a beach.

11/10/16 – Giulia Giannelli – University of Stuttgart

I had a summer job with a hydrogeologic consulting company that tried to determine the history of a contaminant plume to determine who was likely to be responsible for clean up.  But, in this case, we can’t ‘go back’ and collect any more data.  Does DIRECT still work in this case?

I think that it does with only a slight modification.  For the plume migration case that I showed, we have the advantage of being able to measure what we are interested in predicting … the spatial distribution of the contaminant through time.  In your case, as you said, you can’t measure the history of the plume.  But, you can still look for discriminatory data.  The data may be less directly related to the concentration distribution – maybe head values across the field would discriminate models that suggest that company A is or is not responsible.  Maybe the later migration of the plume, or the distribution of another constituent would be discriminatory.  We can still use our ensemble of models to search for these discriminatory data and then we can update the model likelihoods with the new data.  In many ways, we are acting like a detective – trying to piece together evidence to test different theories of a crime (that has already occurred) and looking for information that we can collect today to test what happened previously. 

11/8/16 – Olaf Cirpka – University of Tübingen, Germany

This seems to be very related to scenario testing.  But, that approach is usually limited to a few models of concern.  Your approach seems to deal with a lot of models.  Can people afford this?  What if they want to do something like this for reactive transport, say arsenic models on the scale of Bangladesh?  This is impossible, especially when we consider that geochemists may have many different opinions about the fundamental processes and the related models.

There is no doubt that you are correct.  In reality, we cannot run full, complex models with a lot of process uncertainty at large scales.  But, I would question the value of building any one model under these conditions.  In this case, we can only be sure of one thing … that our model will be wrong.  OK, maybe two things, our model will be wrong and we won’t have considered any viable alternatives.  So, I think that on the one hand, our approach would suggest that this is an inappropriate goal for modeling.  We should reduce the scale, if nothing else, to make sure that we can examine the most important uncertainties.  If we don’t, then we, as hydrogeologists, are making some pretty poorly constrained decisions that could have very important consequences.  Further, we won’t be the ones taking responsibility for those consequences.  On the other hand, our approach argues for MORE complexity than simple scenario testing.  At least as I understand it, this requires a hydrogeologist to propose a small number of combinations of structures and parameters and boundary conditions to represent different expected conditions.  (Often these are coupled with end point descriptions of human actions – e.g. business as usual, or aggressive conservation.)  I think that this is a useful way to consider the effects of human actions on well understood systems.  But, if you want to consider the impacts of our uncertainty regarding those systems, we need to do some amount of searching through model space.  In the end, I think that you were right to cast it as a question of what we can afford.  But, this has two sides to it.  We cannot afford to do all of the tests that we might want to do (real or numerical).  But, neither can we afford to make decisions as scientists that lead to predictions that don’t examine plausible outcomes that are driving people’s decisions.  If we do, then science won’t be seen as relevant when important decisions are being made.  (Not to overdramatize the point!)

11/4/16 – Arthur Petersen – University College London

If we think about something like climate models, which have major impacts in the future, don’t we run into the problem that we cannot find something to measure that can discriminate among them?  Further, how can we convince the climate change community to operate differently?

Our premise is that models that predict different future outcomes will also have some measurable differences in the nearer term.  There is no law that states that this is sure to be the case, of course.  But, if we do not believe this, then there is little reason to collect any data!  If we do start from this premise, then we are simply developing tools that can allow us to assess different proposed observations on the basis of how likely they are to be discriminatory.  The essential point is that if a measurement will be useful to constrain a model, then that model must be able to predict a value of that observation.  If this is the case, then we can examine the expected differences in any proposed measurement from our model ensemble.  This allows us to find discriminatory data.  As to changing the approach of the climate community?  First, that is well above my pay grade.  Second, I’m not sure that they have to change entirely.  But, I do think that it would be worthwhile to add some element of this approach to the discussion.  Personally, I would like to know what constitutes those 5% of models that are most optimistic?  Is each dependent on some unlikely assumptions?  Do they assume some disconnection of processes that cannot be defended?  Similarly for the best case models.  I don’t think that we mean to say that if we ‘ran the world’ into the future 20 times, one of those times might be just fine.  I think that we know more than that and I would like to have it explained to me in a digestible way.  I think that many people, especially nonscientists, may agree!

11/4/16 – Mario Schirmer – EAWAG, Zurich, Switzerland

The plume study just had one stakeholder, right?  What if we also asked the neighboring land owners which models are of greatest concern to them?

In this case, interestingly, both of these stakeholders may have the same concern – that mass will leave the site!  The key point is that the perception is that one side wants the opposite ANSWER to the question than the other.  Or, at least, they want all assumptions to be made to support the opposite answer.  Here, we can align their interests because we say that the first step is to try to build as many plausible models as we can, with an emphasis on those that suggest that mass will leave the site.  Then, our objective is to test these models against all other models – not to prove or disprove, but to collect the data that has the best chance of testing these models in either sense, pro or con.  In other cases, the stakeholders will have very different preferences.  One example is the ‘mining and rice’ example.  In that case, the mine wants large drawdown at the mine site while the locals want low drawdown at their rice paddies.  In this case, we need to find data that can test models that predict either of these outcomes against all other models.  So, conceptually, it is the same for one or multiple stakeholders!

11/4/16 – audience member – EAWAG, Zurich, Switzerland

How are these ideas accepted in legal contexts?

I think that it has real promise – in particular because it insulates scientists from being forced to make too-strong statements about models that are, by their nature, uncertain.  If we had to defend any decision that we made in constructing a model, I think that we would happily opt for approaches that allow us to examine these key assumptions.

11/3/16 – Niklas Linde – UNIL, Lausanne, Switzerland

(The following is distilled from conversations, not a question after my talk.)  Most of the techniques that we use to quantify prediction uncertainty should really be highly qualified as, ‘Appropriate given the many assumptions that we have made for the specific scenario described.’  That is – even if we use formal Bayesian analyses, we cannot escape the influence of our priors.  The potential risk of not being explicit about these influences is that we can overstate our confidence in our uncertainties.  My students and I are trying to tackle problems related to this, especially those related to extracting information from geophysical surveys.  But, I think that this has implications for many issues within and beyond hydrogeology.

After meeting with Niklas and his students, I have a much more solid grounding for some of the arguments that I have been making qualitatively.  I am really looking forward to seeing some of the work that they are doing in press soon!

11/3/16 – Josh Larsen, University of Queensland

You mention using 8000 models for your transport example.  But, are the models really fundamentally different?  Or, are they slight variations of the same model?  If they are actually different, can you combine models with different levels of complexity in a rigorous way?

I think that this offers some flexibility to approach this as a pure scientist, an applied scientist, or something in between.  Ideally, we would fully explore model space, including considerations of fundamentally different approaches to describing processes, different numerical or analytical schemes, etc.  In practice, it may be a stretch to even consider multiple parameter sets for a single model.  To me, the most important thing is to avoid using a single model.  The more time and budget available, the more fundamental uncertainty should be considered.  But, as we increase the scope of our investigation, I think that we should make use of some professional judgement.  In particular for applied cases, we should focus on those differences among models that we could imagine would lead to ‘important’ differences in predictions.  I’m not convinced that we will ever reach the point that we can fully describe model structural uncertainty – so, perhaps it is a fool’s errand to try.  But, I do think that we should make efforts to expand our modeling horizons.  I think that this relates to the second part of your question, too.  I agree that it can be difficult to combine very different models in a single, quantitative description of prediction uncertainty.  (Essentially, I think, it is hard to define a ‘fair’ quantitative weighting of models with different levels of complexity.)  What is more important, to me, is that we will never really know how completely we have explored the range of plausible models.  As a result, no matter how clever we are and how much work we do to quantify prediction uncertainty, I think that we are always strongly limited in how confident we can be that we have captured the true prediction uncertainty.  So, I think that there is real value in developing multiple competing models, regardless of how quantitatively comparable they are, as a way to provide a comprehensible, qualitative description of our uncertainty grounded in a discrete set of models.

11/3/16 – Natalie Ceperly, UNIL, Lausanne

In keeping with the nature of your talk, I would like to ask a different kind of question.  Can you comment on how your approach might be used in a case like the ongoing Standing Rock Reservation conflict?  Specifically, it seems that water quality is being used as the main argument to rally people around the cause; but, it is not clear that water quality is really the central issue, and I do not think that perfect engineering of the pipeline to prevent water contamination would satisfy the protests.

Thanks for this … it is great to get a really different question!  I think that this is an interesting case.  Perhaps I will take a bit of a change of direction from your specific question and try to contrast this case with the general case of fracking and water quality.  Both cases do, ultimately, concern water quality.  Of course, they involve much more, too – property values, natural ecosystem health, traditional land uses and valuations.  But, they both have some element of a potential threat to water supply. What can we do, as hydrologists, to help people decide whether they should fight these developments?  We contend that what we can do is to listen to the concerns that are driving the decisions in both cases, as they relate to hydrology.  For the Standing Rock, the concern is the possibility that a pipeline leak will contaminate drinking water supplies.  For fracking, the concern is that drilling activities (including return water, proppants, etc.) will affect drinking water supplies.  In each case, we can take a standard approach, which is to try to define a ‘current best’ description of the hydrologic system and then consider specific scenarios to determine the risk to the water supply.  We help them to understand the risk that they face by ensuring that we are considering a range of release scenarios with associated risks of these releases.  This is a good first step.  In the case of the Standing Rock, I would guess that this analysis would show that the risk of a leak over an unconfined aquifer is too high to ignore.  So, I think that it would add fuel to the fire of fighting the pipeline.  But, what about the fracking case?  There, I could imagine that the ‘most likely’ subsurface model might suggest that fracking is not likely to cause significant pollution.  (My understanding is that this is the current consensus, but I could be wrong about that impression!)  If this is the case, I don’t think that this would be enough to satisfy most people who oppose fracking on environmental grounds.  Rather, I would expect that they may realize that the most likely model is not necessarily a good representation of reality.  Further, if they ask us this question directly, I don’t think that we can honestly claim otherwise!  So, what should we do?  I think that we should also build other models – specifically, models that we cannot discount as plausible, but that would lead to contamination if they are true.  Then, we should try to find data that can test these models against other, nonpolluting, models.  The key discussion to have is to ask stakeholders what evidence they would need to change their mind about the risks that they face.  Some people will say that there is no amount of scientific evidence that could change their mind.  Honestly, I think that we have to agree that we cannot help them with our science and focus on people who are open to scientific input.  For those people, we have some chance of focusing our scientific studies on reducing the uncertainties that may change their mind.  Still, in these cases, we have to retain our objectivity – we need to be clear from the outset that any measurements that we make are selected so that they could EITHER increase or decrease their concerns.  The only thing that we are doing is to try to reduce the chances that we will do work that will not change their mind about the risks that they face.  In the end, I think that this is the best way to advance our science while also demonstrating the value of science to society at large.

10/28/16 – audience member – Kyoto, Japan

How do we form a model ensemble in a way that avoids bias?

This is most important if we want to produce predictions with statistically valid uncertainties.  I am increasingly convinced that this is not possible given our lack of formal treatment of model structural uncertainty.  Furthermore, this carries a huge computational burden.  I think that we need to find ways to develop ensembles that represent the range of plausible outcomes.  This means that we give up on the goal of producing quantitative uncertainties.  But, it also relaxes some requirements for forming ensembles.  The emphasis is then on producing the ‘most different’ plausible models, which should help to guard against bias.

10/28/16 – Masaru Mizoguchi – Tokyo University, Japan

Following on your element of storytelling – I have spent a lot of time working on the problems related to Fukushima over the past five years.  In this case, there are many facts and many people telling different stories.  The problem is – even if we make a good, scientifically informed decision, many non-scientific people don’t believe the scientific basis of the decision. Maybe this is the most serious problem in science and technology caused by the Fukushima Daiichi nuclear disaster in Japan. So, I think that it is also important that we find better ways to communicate our science to the public to support decisions that have been made using science.

The difficulty is that we, as scientists, have spent a lot of time developing our ‘scientific senses’.  It is a lot to ask a non-scientist to understand what might seem apparent to us.  So, part of our responsibility is to find clear ways to represent both what we know and what we don’t know to non-scientists. But, we also have to recognize that these people may have ways of seeing the world that are also valid and that are foreign to us.  I think that the only way to get past these communication barriers is to have guided focused conversations before starting on a project.  At a minimum, we need to try to understand what really matters to people.  Then we have to explain what we plan to study to address these concerns.  We should also use models to explain the specific expected value of a study BEFORE we collect the data. Finally, we need to revisit the initial questions openly and honestly at the end of the project.  What did we learn? How do we think that will be useful? Were there any surprises and what did we learn from them?  Why didn’t we anticipate these things? How are they better off than before we did our investigation? What is the next thing to examine, and why?  I think that you are entirely correct. No matter how good our work is from a scientific perspective, it won’t change people’s thoughts and actions unless we explain it to them clearly. We must always think about ‘science communication’ after decision making based on data collection and models.  (Mizo also provided the following references 1 and 2)

10/28/16 – Audience Member – Kyoto, Japan

How did you choose the locations for the nine observations in your example?

The basic approach is always the same, regardless of the application.  We form an ensemble of models.  Each one can predict what any proposed future measurement would be if that model was exactly true.  We then subdivide the models into different groups – based on prediction differences, related cost differences, or underlying differences in their physics.  We are looking for observations that cluster similarly to one of these model subdivisions.  In this case, measuring this thing is as close to equivalent to measuring the outcome itself as we can get.  Colin’s paper presents the mathematics of how we define this discriminatory power of different data.
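As a rough illustration of that kind of screening (and not Colin’s actual formulation), one could rank candidate observations by how little the predicted values overlap between the model groups.  Everything in this sketch is synthetic:

```python
# A minimal sketch, not Kikuchi's actual formulation: rank candidate observation
# locations by how cleanly each one separates two subdivisions of the ensemble,
# using a simple histogram-overlap index.  All arrays here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(4)

n_models, n_candidates = 60, 25
group = rng.integers(0, 2, n_models).astype(bool)   # e.g. exceed / do not exceed

# predicted[i, j] = value model i predicts for candidate observation j.
predicted = rng.normal(0.0, 1.0, (n_models, n_candidates))
predicted[group, :10] += 2.0     # the groups disagree most at the first 10 locations

def overlap_index(a, b, bins=20):
    """Shared area of two normalized histograms: 0 = fully distinct, 1 = identical."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    width = (hi - lo) / bins
    return np.sum(np.minimum(ha, hb)) * width

discrimination = np.array([
    1.0 - overlap_index(predicted[group, j], predicted[~group, j])
    for j in range(n_candidates)
])

best = np.argsort(discrimination)[::-1][:5]
print("most discriminatory candidate observations:", best)
```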

10/28/16 – Idowu Olusegun – Kyoto University, Japan

Model development depends on data accuracy and availability.  What do we do, for instance, in parts of Africa where these basic data are lacking?

This has to do with the stages of model development that I mentioned.  If we truly have no data, then we cannot begin to form a hypothesis.  In this case, we need to try to collect the data that will get us to the stage of developing categorical models.  I haven’t thought of how to use DIRECT to do this – but, I would like to!  If we have enough data to build categorical models, then we can look for discriminatory data in this context.  Eventually, we will have enough data to propose quantitative models.  Then, we can follow the approach that I described for refining the model ensemble based on predictions of interest.

10/26/16 – Ken Kawamoto – Saitama University, Japan

Sometimes it can be unclear who the actual decision maker is. What do we do in these cases?

You are right about that!  I think that this is one example of a trap that we fall into as scientists.  When we are presented with a challenge – a problem that has to be solved – we can tend to think of it in terms of our own expertise.  We translate the question from – what can be done? – to, what can you do?  Then we become very focused on the details of what we want to do to advance our science.  Unfortunately, when we come back to the person who asked for help, we often find that we forgot (or never really understood) the problem!  I think that the only solution for this is to involve stakeholders in the design of the scientific study from the beginning. We need to restate the problem – asking how what we are investigating will affect the decision making process, say how it relates to what we do not yet know, then state how we will address the problem through our proposed investigation.  I expect that it will often become apparent which groups care most about the question – often, at least in the US, these are the people who really DRIVE the decision even if they don’t officially MAKE the decision.  If the stakeholders don’t tell us that they will not be the ones making the decision as we go through this process, at least we can know that we made our best effort to find out!

10/26/16 – Yosuke Matsuda – Mie University, Japan

How does your approach relate to global climate change discussions?

I find that when you present a range of predictions based on thousands of models to a nonscientist, they are tempted to see what they want to see in the ensemble and feel that their belief is justified.  Or, they see the huge extent of predictions and simply conclude that we don’t know anything.  I think it is better to show a limited number of models that cover the range.  Then, if one model represents their preferred storyline, we can attempt to associate their general concern with this scientific model.  Then, if we can test this model and call it into question, we may have more opportunity to change their mind.

10/26/16 – Tats Sakamoto – Mie University, Japan

As scientists, we are trained to seek the truth about a situation.  So, it can be difficult to defend the idea of doing science in a way that seeks to just answer questions from a practical standpoint.

I agree – this is a natural tension that we face as scientists.  It is made worse by the fact that we can be quite judgmental about what we consider to be ‘real’ or ‘pure’ or ‘big’ science.  Unfortunately, many of these grand universal truths that we have found only apply in isolation – in clean, simple systems.  But, the real world is complex, heterogeneous, and affected by many interacting processes.  Personally, I think that the bigger challenge that we face is how to propose multiple explanations for observations in complex settings. This really requires that we fight against many natural human tendencies, including the desire to have one clear and unchallenged internal narrative to understand the world.  So, yes, I agree that this tension that you point out is real and important.  But, for me, I think that we may be mistaken in thinking that the right role for a scientist is to seek the truth.  I think it is, rather, to pose the questions.  Maybe engineers can then take the lead in figuring out the specific answers for specific conditions.

10/24/16 – audience member – Tokyo, Japan

How do you make a decision if you have multiple models that make different predictions?  In particular, what do you do if you have multiple different groups who are involved in the decision making process?

This is a difficult question – but, it is also probably the most important thing to consider.  This really falls under the RECT part of DIRECT, which I did not cover in this talk.  But, here is a brief version.  Every decision maker has a ‘style’ of decision making.  That is, they see risk differently and they balance risk and reward differently.  Although we tend to depict decision makers as ‘rational actors’ who make full use of probability prediction curves – I don’t think that this is often a true description of the decision making process.  For one thing, it requires that we be able to provide decision makers with an accurate projection of the probabilities of different outcomes.  To be honest, I don’t think that even the best of our analyses provide this.  In particular, because we don’t generally explore model space as completely and rigorously as parameter space, I think that our prediction PDFs actually reflect the uncertainty of ‘what we have considered’, which is fundamentally different than the uncertainty of ‘what will occur’.  In addition, a full quantitative risk analysis requires that a decision maker can link every predicted outcome (or combinations of predicted outcomes) to a projected cost.  This likely takes more effort than our analysis!  So, what are we left with?  I think that what is important is that we try to provide a scientific analysis of the plausibility of different possible outcomes – those outcomes that control the decisions of one or more groups.  We cannot pretend that this represents ‘the uncertainty of the system’.  Rather, it is a functional, objective, scientific examination of stakeholder concerns.  This requires that we have clear discussions with stakeholders at the beginning of a project to identify the key predictions and the thresholds of concern for different groups.  Then, we need to change our model building philosophy to make efforts to build these models of concern.  Finally, we need to work with stakeholders to determine what it would take, scientifically, to change their mind about the plausibility of the outcomes that concern them.  This is a ‘softer’ approach than we have been trying to take of late.  But, in my opinion, it is more supportable given the state of our science and it has more chance to see science used by decision makers.  Finally, it makes it clear that we, as scientists, are not advocating for any decision – it is the responsibility of the decision maker to make decisions as informed by science and by other relevant factors.

10/24/16 – Taku Nishimura – Tokyo University, Japan

Were the nine wells that you show for the transport example already in place, or did you choose where to put them?  In other words, did you define these as discriminatory observations and, if so, how?

This is the heart of the DI part of DIRECT.  The simplest way to visualize it is to imagine that we have proposed only ten different model realizations (some combination of different models and different parameter values).  If all of our models say that an action that we are proposing (in this case, to not build a treatment plant) is a good decision (because the plume will not leave the property), then we can feel pretty confident in that decision.  But, if two of the models predict off-site migration and eight predict no offsite migration, then what is important for decision support is to test the two polluting models AGAINST the eight nonpolluting models.  To do this, we need to find a good proxy, or discriminatory data, that would be measurably different if polluting or nonpolluting models are true.  So, we predict what we would measure at any given location using all of our models.  Then we compare how different the predicted values are for all of the polluting versus all of the nonpolluting models.  If their predicted values are distinct, then the proposed observation is discriminatory.  If there is a high degree of overlap in the predicted values, then the measurements are not discriminatory. Colin Kikuchi’s paper shows how we can do this quantitatively.  For the case that I showed, and for many practical cases, we couldn’t afford to choose one well, measure the concentration, update the models, then choose another well.  That isn’t practical for a real field campaign.  So, we used a short cut to choose multiple discriminatory points – we chose the most discriminatory points with the stipulation that they had to be some minimum distance from other selected points.  Colin’s paper addresses this, more difficult, question, too.  He examines how we can choose points that are both discriminatory and have minimal expected redundancy of information.
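
To make the bookkeeping of that last step a little more concrete, here is a minimal sketch (in Python) of what ‘pick the most discriminatory candidate points, subject to a minimum spacing’ could look like.  The separation measure and all of the names are placeholders of my own – this is not Colin’s actual metric, just the shape of the idea:

```python
import numpy as np

def discrimination_score(pred_a, pred_b):
    # Crude separation measure between two groups of predicted values at one
    # candidate location: distance between group means scaled by pooled spread.
    spread = np.std(pred_a) + np.std(pred_b) + 1e-12
    return abs(np.mean(pred_a) - np.mean(pred_b)) / spread

def select_discriminatory_points(locations, preds_group_a, preds_group_b,
                                 n_points=3, min_spacing=50.0):
    # locations: (n_loc, 2) candidate well coordinates
    # preds_group_a: (n_models_a, n_loc) predictions from, e.g., the 'polluting' models
    # preds_group_b: (n_models_b, n_loc) predictions from the remaining models
    scores = np.array([discrimination_score(preds_group_a[:, j], preds_group_b[:, j])
                       for j in range(locations.shape[0])])
    chosen = []
    for j in np.argsort(scores)[::-1]:  # consider the best-scoring locations first
        far_enough = all(np.linalg.norm(locations[j] - locations[k]) >= min_spacing
                         for k in chosen)
        if far_enough:
            chosen.append(j)
        if len(chosen) == n_points:
            break
    return chosen, scores
```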

10/24/16 – Satoshi Izumoto – Tokyo University

When can you use statistical models and when do you need a physical model?

This is a bit of a hot topic.  What seems to be catching on is some version of surrogate models.  These can be statistical or correlation-based or neural networks.  I don’t pretend to have expertise in this area, but I can tell you my impression from what I have heard.  I think that all of these approaches work well – and many or all can be used in a multiple model framework – but, largely when we are modeling things that are within the conditions used for calibration.  For example, if we collected measurements of a plume through time and tried to reconstruct the source conditions, then these models may work really well.  But, if we want to train models based on hindcasting and use them for forecasting … where will the plume go in the future?  Then I think that we are asking for trouble.  The only way around this, I think, is to develop a number of forward numerical models – an ensemble – and then use this ensemble to train faster running surrogate models.  I think that idea is ‘in the air’ these days!

10/24/16 – Toshiko Komatsu – Saitama University, Japan

It is not clear to me how your results for the contamination example relate to a specific decision that the client had to make.  Can you clarify?

Actually, you are right – I didn’t make that clear!  Hard to believe that almost 6000 other audience members have not pointed that out!  What I should have said was that the decision that had to be made was, first, do they have to plan to build a clean up facility and, second, if so, what are the expected characteristics of the plume (maximum concentration, time of arrival, duration of plume treatment)?  So, we examined the first question and found that, somewhat surprisingly, it does NOT look like they will have to plan for a cleanup.  Of course, they may not be convinced by what we have done so far.  So, the next step would be to try to think of some other plausible models that WOULD predict a plume that needs to be cleaned up. If we (or others) cannot come up with one, then we would strengthen our recommendation.  If we CAN come up with a polluting model, then we should look for discriminatory data to test it against all of our other plausible models.  Ultimately, it may turn out that polluting models become the most likely.  In that case, we would need to design a treatment facility that is most likely to treat the ensemble of predicted plumes.  Or, we would need to design the best balance of performance and cost given the prediction uncertainty.  Does that make more sense?  I will definitely work that into the rest of my talks.  Thanks!!

10/21/16 – Mitsuyoshi Ikeda – Nagasaki, Japan – Japanese Association of Hydrogeologists

I appreciate your use of poetry in a scientific talk.  I will return this with a question inspired by Shakespeare.  I have over 30 years of experience in geologic modeling.  I have found that the relationship between cost and improved understanding is not linear – there are times of very steep learning for relatively little cost and other times of little learning at high cost.  So, how can we find the informative conditions?  To measure or not to measure, that is the question!

Fantastic!  I will never think of my work again without referencing this idea!  I think that you have exactly captured what we aim to do.  That is – can we find measurements that are most likely to be able to change our mind?  In this regard, measurements that are predicted to be the same by all of our plausible models are certain to have little value – a shallow slope.  Those that have different expected values among all of our models are likely to be in an area of steep slope.  But, those measurements that are discriminatory between models with ‘bad’ outcomes and all other models are especially useful.  They have the greatest likelihood to change our minds about something that we think is important!  There is another consideration.  What about those measurements that completely change our mind about how a system works?  In my opinion, these are a matter of good fortune.  We must always make as many measurements as we can afford to increase the chances of finding these surprises.  But, while we hope for good fortune, we should plan for practical benefits.  As Shakespeare said, ‘Good luck is often with a man who does not include it in his plans.’

10/14/16 – Mahadev Bhat – Florida International University

If you use model averaging, don’t you lose the ability to learn from the models, as you pointed out?

Correct.  Our suggestion is that we should produce the ensemble of models that can be used to help to answer questions.  Some users will want to use these in a quantitative way – usually some form of model averaging.  But, the example that we showed offers an alternative approach.  For the transport case, we searched for cases that led to a negative outcome (mass leaving the site).  Then we tried to find data that could test those models.  We were lucky – we were able to discount those models with our first set of observations.  In other cases, probably the norm, we could reduce the likelihood of those models, but we would need more data to test them sufficiently.

10/14/16 – Mahadev Bhat – Florida International University

How realistic is it to find discriminatory data that will isolate one model?

I should make it clear – the goal is not to isolate one model, or certainly not one realization of one model.  Rather, we subdivide the ensemble into two or more groups.  Then we look for data that can discriminate among those groups.  It does mean that we may not – in fact, we probably won’t – choose a single model in the end.  But, the hope is that the ensemble of models is informative in concert.  It’s not always possible, but we have had good success so far!

10/12/16 – Terry Griffin – University of South Florida

Is it really important that you use different models, or can you just have different realizations of one model?

I think that the simplest way to say it is that we want to explore the plausible range of descriptions of the system, paying closest attention to those that potentially have the biggest impact.  It isn’t necessarily the case that we have to explore a large number of models or realizations – we just need to explore the right ones.  In some cases, the most important uncertainty will reside in the models, in others the parameters, in others the external forcings.  It will require judgement to try to determine which uncertainties are most important – and we will make mistakes doing this.  But, it is still better than making an a priori decision to only explore one element of uncertainty (e.g. parameters, or scenarios described by boundary conditions).

10/11/16 – Rafael Munoz-Carpena – University of Florida

I think that we have to be cautious no matter what approach we take to defining uncertainty, and in particular model ensembles, because, in the end, our models aren’t really different.  Aren’t they all based on the same basic physics, even the same mathematical representations of the physics, approaches to numerical solution, etc?

You are absolutely correct.  Or, at least, I completely agree with you!  What we are proposing is only a small step in the direction that you are pointing out.  Ultimately, it would be great to be able to explore the ‘deeper’ shortcomings of our scientific models.  But, at least as a start, I think that it is important for us to examine those things that we already know that we do not know!  Even doing this – exploring parameter uncertainty, geologic structure uncertainty, boundary condition uncertainty, etc – is already taxing our computational resources.  The most common response is that we don’t consider it at all (in practice) or we limit our investigation to the relatively easy aspect of parameter uncertainty (academia).  I think that we need to start thinking creatively about how to propose reduced model ensembles that capture the important range of models for specific applications (practical or scientific), but that essentially give up on quantitative uncertainties.  Largely, I believe this because of the point that you make!  Still, if we can start down the path of more efficient model ensemble definition I think that it could allow us more flexibility to examine the deeper model uncertainties that you point out.

10/10/16 – Jim Gross – Florida Defenders of the Environment

I have worked on many different regional water supply plans that utilized groundwater flow models to assess the sustainable limits of groundwater supplies.  In developing the “District Water Supply Plan 2010” for the St. Johns River Water Management District, we sought to address the uncertainty associated with critical water supply constraints known as minimum flows and levels (MFLs).  By this time there was a lot of concern about whether the water management district’s models were dependable for planning purposes.  We sought to address the question of what do we know and how do we know it.  To address model uncertainty, we utilized PEST software to come up with multiple realizations of the groundwater flow models that met all calibration criteria.  The principal parameter we allowed PEST to vary, within prescribed limits, was vertical conductance.  We chose vertical conductance because it was the aquifer parameter that was least well known.  Then we ran the entire family of models to see whether the MFLs were met.  As I recall, we came up with approximately 100 different models that met calibration criteria.  We then reported the results from all of the different models and characterized the MFLs by how many of the models found that they were not met.  I seem to recall we used three categories of results to assign qualitative uncertainty:  high certainty – MFLs not met in greater than 2/3 of the models; intermediate certainty – MFLs not met in greater than 1/3 of the models but less than 2/3 of the models; low certainty – MFLs not met in fewer than 1/3 of the models.

I like this approach and no one does this better than John.  But, from what I hear, even he is moving away from a high dimensional inversion approach.  Ultimately, the concern is that our prediction PDFs don’t really capture the uncertainty of outcomes – especially if we don’t examine model structural uncertainty.  Essentially, our PDFs describe our ignorance much more than they describe the likelihood that something will occur.  So, we have to caution users to view them semi-quantitatively.  If this is the case, then it seems that we should be able to focus our efforts on defining qualitative risk estimates directly.  I think that this could be more efficient, more useful, and more honest.
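
Just to restate the categorization Jim describes above in a compact form, here is a minimal sketch (in Python); the thresholds come straight from his comment, but the function name and category labels are only illustrative:

```python
def mfl_certainty(n_not_met, n_models):
    # Fraction of calibrated model realizations in which the MFL is not met,
    # mapped to the qualitative categories described in the comment above.
    frac = n_not_met / n_models
    if frac > 2 / 3:
        return "high certainty that the MFL is not met"
    if frac > 1 / 3:
        return "intermediate certainty"
    return "low certainty"

# e.g. mfl_certainty(70, 100) -> "high certainty that the MFL is not met"
```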

10/10/16 – audience member – University of Florida

In the end, how can you choose which model to use if you have spent your time trying to uncover multiple plausible models?

I like the Lincoln analogy here.  If your advisors have different bases for their opinions, how do you choose which one to listen to?  Well, if they come to the same conclusion from different places, then you are in really good shape!  But, if they disagree, then you would ask why they disagree.  Perhaps they have a way of processing and presenting information that could convince others to change their minds.  Or, perhaps they could be convinced if some key assumptions that are necessary for their conclusion can be falsified.  But, sometimes, you will not find agreement.  Then you need to weigh the plausibilities of their arguments and the importance of their conclusions.  It doesn’t matter which advisor you choose to follow, it matters that they have all, collectively, informed you more completely.  This is a perfect analogy to what we are recommending regarding the use of scientific models as advisors.

10/6/16 – John Molson – Laval University, Quebec, Canada

As someone who has spent a lot of time calibrating models, and living with the frustrations of calibrating models, I wonder if there is some benefit that we can take from the runs that are less successful.  It seems that there is some information that we are throwing away.

I have to admit that, listening back to my recording, I didn’t fully appreciate your question.  At the time, I picked up on the idea that we could benefit from considering those runs that don’t calibrate as well, but that lead to ‘differently important’ outcomes.  But, I think that your question was more subtle and insightful.  It does seem that we should be looking to extract as much information as possible from the runs that we make the effort to complete.  I wonder if they could be used to train surrogate models, or to look for important model ‘signatures’, or simply to improve our understanding of the full range of plausible outcomes.  I haven’t thought this through fully enough, I’ll admit … but, I think that your instinct is correct.  We are throwing away a lot of useful information in our current calibration approach!

10/6/16 – audience member – University of Laval, Quebec

Thank you so much for your presentation. Do you think that using statistical modeling approaches is more effective in simulating a hydrogeological case in comparison to deterministic models?

Personally, I have a preference for physical models.  But, I can also see the great promise of statistical models.  My sense, perhaps unjustified, is that statistical models work best when considering one of two things.  First, processes that are simple enough to allow for the use of very fast models.  Second, processes that are so complex and poorly understood that we don’t even know where to start with building deterministic models!  In this case, too, we need to have very fast models so that we can explore many of them.  For us, although we don’t know everything, there are some things that we know with some confidence.  It seems somehow wasteful not to somehow include process knowledge in our model development.  Furthermore, even if we adopt statistical models, we will likely have to provide some level of physical explanation of the models (and/or the results) before we (or anyone else) can have any confidence in the results.  I think that it will be fascinating to see how scientists figure out how to merge process and statistical models in the future to leverage the insights of the former and the speed and out-of-the-box exploration of the latter.

10/6/16 – Marc Laurencelle – INRS, Quebec

Are all of the tested models calibrated with an equal effort before comparing them with the likelihood?

This is a very fair question.  We have to find some way to balance the effort that is required to calibrate models with the wasted effort that could be spent on considering poor models.  I don’t know how this could be done efficiently, but it would be very useful to find efficient ways to ‘approximately’ calibrate models – perhaps even with a user-defined threshold.  One approach, included in PEST, is to calibrate a model and then produce variations by altering less sensitive parameters.  But, I have a feeling that we could actually rethink the way that we explore model- and parameter-space to focus on finding models that balance goodness of fit with diversity.  This would allow for ‘fair’ consideration of many models without the cost of calibrating all models to too-high a degree first.
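
As one possible reading of that ‘approximately calibrate with a user-defined threshold’ idea, here is a minimal sketch (in Python).  The misfit measure, the threshold, and the model.run() interface are all assumptions of mine, not an established workflow:

```python
import numpy as np

def behavioral_ensemble(candidate_models, observations, misfit_threshold):
    # Keep any candidate model whose root-mean-square misfit to the observations
    # falls below a user-defined threshold, rather than polishing each model to
    # its own optimum.  model.run() is an assumed interface that returns an
    # array of simulated values matching the observations.
    kept = []
    for model in candidate_models:
        simulated = np.asarray(model.run())
        rmse = np.sqrt(np.mean((simulated - np.asarray(observations)) ** 2))
        if rmse <= misfit_threshold:
            kept.append((model, rmse))
    return kept
```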

10/6/16 – audience member – University of Laval, Quebec

How does the overall cost of this approach compare to standard approaches?

The key savings is that we avoid the standard approach of measure, calibrate, wait until the model fails, repeat.  This allows us to find some efficiencies in avoiding the collection of non-informative data and in building models earlier in the decision process.  That is, we get the advantage of many models at once, rather than only learning from one at a time.  Ultimately, the advantage is that we give the decision makers more context, and more granular context, to make more robust decisions.  But, to make this practical, we need to automate some fraction of the model construction process the same way that we have automated the model calibration process.

10/6/16 – audience member – University of Laval, Quebec

What constitutes two ‘different’ models?

I think that this is a fascinating question.  What constitutes ‘distance’ in model space?  If we can make some progress in determining this, then I think that we should be able to balance model diversity with model accuracy.  Perhaps this can also help to guide us to more productive searching of model space.  My personal feeling is that this distance should be based on predictions of importance.  It is particularly interesting to think about how we can define distance for discrete models as opposed to continuous parameters.  But, I would be very interested to hear other ideas for defining model distance!
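
For what it’s worth, here is a minimal sketch (in Python) of one way that a prediction-based model distance could be written down.  The weighting and the choice of predictions of importance are entirely up for debate – this is just one possible definition, not an established metric:

```python
import numpy as np

def model_distance(preds_a, preds_b, weights=None):
    # Weighted difference between two models' predictions of importance
    # (e.g., peak concentration, arrival time, mass flux at a boundary).
    preds_a = np.asarray(preds_a, dtype=float)
    preds_b = np.asarray(preds_b, dtype=float)
    w = np.ones_like(preds_a) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * (preds_a - preds_b) ** 2)))
```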

10/6/16 – audience member – University of Laval, Quebec

Sometimes our limitation for practical problems is more related to time than even money.  Does this approach really work if you are time limited?

This is a reasonable question.  I accept that there are some projects that are very time limited.  In these cases, or if the problem is very simple, then we may be justified in just pursuing one model.  But, I think that it is more often the case that we are just less COMFORTABLE with the idea of exploring model uncertainty.  I don’t think that it should, necessarily, take more time to explore model uncertainty than it now takes to explore parameter uncertainty, given how efficient we have become at the latter.  For those cases (which I believe to be the majority) for which model uncertainty leads to the most important uncertainties, the efficiency argument would have to be very, very strong to outweigh the benefit of exploring the more fundamental structural uncertainties in the system.

10/6/16 – Louis-Charles Boutin, Matrix Solutions, Quebec

Do you stop calibration efforts when you reach a minimum acceptable measurement objective function threshold?

We haven’t figured that out.  It seems like it would be a really good idea – although it raises the difficult question of defining an acceptable threshold a priori.  So far, we have just limited the effort that we spend on calibration.  But, that is also quite subjective.  To answer your question – I don’t know the best way to do this, but I think it is something that we should all be considering!

10/6/16 – Marc and Patrick (online), Laval University, Quebec

Should you use the same amount of effort to calibrate all of your models?

This really speaks to how you will use the models in the end.  What I have trouble with is imagining how to be ‘fair’ among disparate models.  Should we spend more effort calibrating more complicated models?  How do we account for the impacts of our initial parameter values or the parameter ranges?  In the end, I think that all of these questions only matter if we want to make quantitative use of the prediction probabilities.  I am not yet convinced that these have much meaning, so I wouldn’t suggest that we spend so much time trying to create them.  But, this means that we have to find a defensible and objective way of determining and communicating relative probabilities – especially those that are not based on a ‘full and fair’ examination of all possible models.  I think that this is a fascinating question that reaches well beyond hydrogeology!

10/6/16 – Sarah Alloisio, Fresh Water Solutions, Quebec

Do you think that neural network models can replace physically based models in your approach?

I am hesitant to say yes to this – I am a bit too much of a physicist to believe that physics doesn’t add value!  In particular, I think that we have to apply some level of ‘reason’ at some point in the process.  We can either do that ‘up front’ by building physical models, or we have to do it afterwards to ‘check’ the plausibility of the models that are produced by less physically based models.  I do, however, think that these tools can be a great help in breaking from our tunnel vision once we have built a model.  Perhaps we can find a way to use NNs or other simple, non-physical models to convince us to look for other possible explanations for data.

10/4/16 – David Lapen – Agriculture Canada, Ottawa

I especially like the part of your talk about how multiple models can be used and how simple ‘wisdom of the crowd’ approaches can be misleading.  I have seen similar things in source attribution and infection risks for pathogens.  There is often very limited ‘hard’ data, so we need to rely on many models (hypotheses, opinions).  We can do this by gathering experts to elicit their professional opinions about pathogenic risks.  But, what do we do if more than 50% of the experts say that a pathogen poses little risk, but some significant fraction disagrees?  It could be irresponsible to ignore this potential threat.  But, at what level of expert concern do we take action?

I love these kinds of connections!  My gut feeling is that this is the norm for most scientifically-informed decisions.  Unanimity is unlikely for all but the simplest, or most highly engineered, systems.  Should we, as scientists, focus on figuring out the ‘most likely correct’ description of a system?  Is it OK that we just push off the uncertainties and associated risks to ‘decision makers’?  Even if this is OK, can we hide behind statistics, or should we offer some specific, plausible alternative descriptions of a system?  Clearly, I think that the latter is more honest and more useful.  If nothing else, it can point the way to what we have to do next as scientists to address the most important risks.  I think it also has a real chance of integrating science into the decision making process from the beginning – maybe forming better interactions between scientists and decision makers and the general public.  OK, I’ll step down off of my soapbox now … but, I do appreciate your comment and I’d enjoy hearing more ideas in the future as they arise!

10/4/16 – Mike Melaney, Ottawa, Canada

I have worked in some capacities that are driven by strict regulations – for instance, a projected mass flux or concentration at a property boundary is a yes/no indicator for development.  Often, these models are based on parameters that are by no means relevant for the specific site.  Would your approach allow for a different way to apply and analyze these highly variable parameters which may result in the enforcement of these regulations?

I haven’t been asked anything like that, yet!  I do think that multimodel analysis could help here.  This is one idea … you would define a range of parameters for any location, better covering the possible local conditions.  Then, you could structure a regulatory framework as follows:  run all of the models.  If they all say that you are ‘within code’, then you are approved.  If none say you are within code, you are likely not going to pass muster.  But, if two of ten say there is a problem, then you could pursue a pre-specified set of tests to challenge whether the parameters behind those failing models are plausible.  I imagine that you could allow for the same flexibility on the ‘opposed’ side, too.  I’d like to follow up with you on this … I think it could be a nice way to allow for structured and limited additional science to support a decision.  Thanks!
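
Here is a minimal sketch (in Python) of the screening logic I have in mind – the function name, the flux limit, and the fraction that triggers further testing are all made up for illustration, not drawn from any regulation:

```python
def regulatory_screen(models, predict_flux, flux_limit, concern_fraction=0.2):
    # models: the ensemble of plausible models for the site
    # predict_flux: assumed callable returning the predicted boundary flux
    #               (or concentration) for one model
    # flux_limit: the regulatory threshold
    violations = sum(1 for m in models if predict_flux(m) > flux_limit)
    frac = violations / len(models)
    if violations == 0:
        return "approved: all models are within code"
    if frac == 1.0:
        return "denied: no model is within code"
    if frac <= concern_fraction:
        return "conditional: run the pre-specified tests that challenge the violating models"
    return "further review: a substantial fraction of models predict a violation"
```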

10/3/16 – Kevin Mumford – Queens University

There may be an increasing tendency for some decision makers to listen to the loudest voice in the room and to ignore scientific (or expert) advice.  Do you have any advice for emerging professionals about how to better communicate science to combat this trend?

That is a tough battle.  I’ll offer an optimistic answer, as it is too easy to think of pessimistic responses to this question!  I think that it is tempting to want to fight back with a kind of full-throated scientific advocacy.  Some people are good at this and probably make a real difference.  But, I think that there is a reason that scientists are more commonly seen as guarded, cautious, and underspoken.  That is – through the practice of the scientific process we learn over and over again that all knowledge is partial and likely to change, sometimes radically, in real time.  So, it is hard for us to adopt the easy overconfidence that characterizes many of the loudest voices in the room.  (Unless that other voice belongs to another scientist … then the claws come out!)  But, it is clear that the general public, by which I mean people who have not spent time practicing science, tends to misunderstand our usual mode of presentation.  They often see our qualified answers and embrace of uncertainty as an unwillingness to be definite, as evasiveness.  Even when we try to present statistics that quantify our uncertainty (which is actually an act of decisiveness), the results are not explained in a way that resonates with the public.  I think that we need to find ways to communicate what we know and what we do not know more clearly.  For me, this has to do with presenting discrete competing storylines that cover our uncertainty, but also demonstrate that, in most cases, we know a lot.  Take the example that I give in the talk related to solute transport.  We could produce a single ‘best’ model and state unequivocally that the plume would evolve in some predicted way.  But, we know it is likely to be wrong.  Alternatively, we could produce one million models that differ in their structure and parameters, and then present some ‘most representative’ model.  We could then show a fuzzy region around this model’s prediction to represent our uncertainty.  But, this sends the message that anything that lies within the uncertainty cloud is possible.  Without some subtlety of interpretation, people will think that anything within the cloud is equally likely.  In other words, we haven’t given them any results that will help them to understand what we don’t know.  One alternative would be to produce 10 plumes, some that represent the most likely case and others that cover the range of predicted outcomes (some good, some bad).  For each of these plumes, we could describe the conditions that have to be true for this plume to occur.  Then, considering all of the plumes and their required conditions, we could describe what we want to do to test whether one or more of the plumes of concern is likely to represent reality.  In many cases, and with some changes in our thinking, we may even be able to identify many of these models of concern without having to build the million initial models.  On the one hand, we will have given up our ability to produce quantitative uncertainties on our predictions.  Of course, I am not convinced that we ever do a credible job of producing these uncertainties, given our inability to search model space, so it may not be such a great loss.  On the other hand, we produce a series of competing plausible hypotheses that we can use to guide debate towards opposing, but scientifically defensible views.  OK, this has wandered far from your question.
But, I would summarize as follows: I don’t think that it is in the best interest of science for scientists to try to out-shout the loudest voices in the room; but, I do think that we need to abandon our timid, apparently vague approach to communication and replace it with a style that emphasizes what we know, clarifies what we don’t know, and provides easily understood reasons for any proposed scientific investigations to support decision making.  If we do this well, we can develop a kind of scientific jiu jitsu that harnesses the passions of differing opinions to help us to formulate more diverse model ensembles!  (OK, that was geeky even for me.)

10/3/16 – Kyle Stephenson, Kingston, ON

If scientists can better communicate scientific uncertainty to decision makers, courts, etc. do you think that the legal and business worlds would accept the reality of scientific uncertainty such that it can be incorporated into agreements, settlements, decisions etc?

Great question!  The honest answer is that it is probably above my pay grade to predict this!  But, I’ll tell you what I think about it … in fact, what motivates my efforts in this area.  I think that the approach that we are taking now is not working and is not likely to work.  We have, for too long, thought that if we could just produce ‘better’ science, or more accurate descriptions of uncertainty, then people will eventually accept science as the best basis for decision making.  This sets up an unreasonable expectation of non-scientists.  So, what can we do instead?  To me, it makes sense to start from what people think, believe, or worry about, and to try to move from there towards a scientific view.  To make this as efficient as possible, we should ask what is concerning people (driving their decision making) and then determine (with them) if there is something that we could test scientifically to change their mind TOWARDS a more scientifically based view.  This would give us the opportunity to explain why we can formulate a scientific model that represents their concerns, but that it is only one of multiple plausible models at this point (thereby introducing the idea of scientific uncertainty).  Once we make this connection, we should be in a better position to do science in a way that can actually affect decision making.  What is also interesting in your question is whether this process can be codified.  I have just started thinking about this … but, I think that it is possible.  Specifically, I think that we could propose regulations based on multiple models (or model realizations).  If all (or some high fraction) of the models would disallow some action (or impose some fine), then the rule would be enacted.  But, the regulated person or organization could challenge this by paying for a prescribed set of additional tests that could be used to challenge some (or all) of the models used for regulation.  It sounds clunky – and, as described, it probably would be – but I think that there is something useful in there!  Would be interested to hear your thoughts and experiences!

10/3/16 – Alan Chou – Queens University

I was involved in a problem during the design of a dam.  There were real questions about the undrained shear strength of the underlying material. This led to two options for construction. There was a deep polarization in the independent technical review board about the choice. How can you make a decision when there are uncertainties and strong differences of opinion even amongst technical experts?

I’ll start with a cop out.  I am very careful to say that we, as scientists, shouldn’t think of ourselves as making decisions.  I think that it is important for us to maintain that separation, so that we can honestly claim to be impartial in conducting our science.  More often, we have a mixture of science and advocacy.  That is, each side of an argument hires its own experts.  We can and do attempt to remain impartial.  But, especially if we find ourselves in a condition in which we are encouraged to build a single model, we end up having to make difficult choices.  More often than not, we try to convince ourselves that we are making the impartially best choice.  But, what do we do in those many situations where that choice is not clear?  I think it is actually better to open ourselves up to the opportunity to make both choices.  So, to eventually try to answer your question, I would say that the best thing that we can do as scientists is to offer a platform that both sides, or multiple sides, can use to test their concerns in a quantitative and objective way.  Then, with this common framework, we can ask both sides the same question.  What would it take to change your mind about your position?  It is quite likely that, for many problems that we address, one or both sides will say that no matter what scientific evidence we produce they will not change their mind.  This can be a sign that they are not rational or willing to compromise.  But, it may also simply indicate that the risks, no matter how unlikely, are too costly for them to accept.  If this is the case, then there is a real question as to the potential benefit of doing the science.  I think it is much better to learn this before we start on a project than to discover it after we’ve completed the work.  After all, there should be plenty of problems that need our help.

10/3/16 – Adrian Mellage – University of Waterloo, Canada

Is it ever defensible to construct only one model?

As much as I would like to be able to say no, I think that there probably are times that this is appropriate. In fact, Graham Fogg commented after my talk that he wasn’t convinced that we need multiple models to assess water supply in closed basins.  From his perspective, the physics is pretty well understood and the uncertainties that we have are unlikely to have important ramifications for water management.  For these conditions (and based on Graham’s extensive knowledge and insight!) I would buy the idea that a single model may be sufficient.  Then, effort can be focused on model calibration to the immeasurable details that characterize a real system.  But, I think that this is the exception in science, rather than the rule.  Actually, I think that this sounds more applicable to many engineering problems – especially those for which the environment can be constructed and controlled to ensure that the accepted analyses apply.  To personalize this, I would suggest that anyone working with models of real sites try to estimate how often they have been surprised by something that they, probably, should have known.  In other words, how often have they produced a model, calibrated it, and then had it predict something incorrectly?  When this has happened, were they able (often with relatively little effort) to come up with an alternative explanation that did accommodate the new data and the old?  The more often this seems to happen for the problems that you are addressing, the less defensible it is to provide a single model in the first place.

10/3/16 – Bernie Keuper – Queens University

How do you explain to politicians and lay people this idea that you cannot create a perfectly accurate model of the hydrogeologic system?  More importantly, how do you do this in a way that they will still value your findings?

Those are two different species!  But, for lay people, I think that they can understand the idea that you can tell multiple plausible stories and that you need more information to figure out which one is (most) correct. For business people, they have told me that they are actually quite comfortable making decisions under uncertainty because all of their decisions are made under uncertainty.  As for politicians, what I am learning, is that the mistakes that they cannot afford to make are the ones that they should have known better than to have made.  This is really what we are trying to avoid.  As for the black swans that we really didn’t see coming?  I’m not sure that anything can really help!

10/3/16 – Bernie Keuper – Queens University

Following on the dam question, what do you say if someone stands up and asks why you didn’t just take the lower risk option if you had scientific uncertainties?

I would first offer a big cop out and say that I don’t make decisions as a scientist. But, beyond that, I think that the risk that we face in always choosing the zero risk option is that we miss opportunities. Furthermore, very often, we are choosing between two or more options, all of which have some non-zero risk. So we are really trying to help people to make better risk-based decisions under uncertainty.

10/3/16 – audience member, Queens University

How many models should we build?  Is it possible to build too many models?

That is a fascinating question and one that I have not been asked before. In fact, if I think about it, there is such a thing as too many models.  I think that, both for communication and to better interface with other scientists, what we really want is a concise set of different models that cover the important pathways to failure (or whichever other metric is most important to those who will ‘consume’ our science).  We want to be able to put these models in some relative likelihood context.  But, much of the effort that we spend to try to quantify model (and, therefore, prediction) likelihoods may be better spent examining ‘plausible outliers’.

9/30/16 – Emil Frind, University of Waterloo

Your plume changes direction just before reaching the property boundary. Does this not concern you?

Absolutely!  In fact, I haven’t thought of describing it this way (but I will from here on) – if this was based on a single model, no matter how good that model might be, would you believe it?!  I wouldn’t.  I would be thinking of all of the reasons that it may be wrong.  That, in essence, is what we are trying to do proactively.  We can say that we built 4000 model realizations, based on eight different underlying models, and NONE of them says that the mass will leave the site.  From the plot I showed, we can say that the worst case has a plume that is swept along the boundary (there may be others that simply don’t reach the boundary).  I think that this is much more convincing.  But, even more importantly, I think that it is much more specific.  If someone still disbelieves the result, they would be encouraged to propose other mechanisms or property distributions that would cause the plume to leave the site.  If none of these proposals cause the plume to leave the property and are deemed to be plausible (acceptable fit to existing data), then there is no need to collect more data.  But, if some new models do result in outcomes of concern, then our job is to identify discriminatory data that can test these models against all other plausible models that have been proposed.

9/30/16 – Igor Markelov – University of Waterloo

How do you decide how many models to construct?

Ultimately, this is the $1 million question!  The only answer that I have with some certainty (although see my response to Adrian Mellage!) is that the wrong answer is ‘one’.  Perhaps I am too enamored of the idea of multiple models by now.  But, I really cannot see how you can decide how to move forward on a project if you only have one hypothesis.  (That is, if you only have one model.)  Beyond that, I think that the minimum number of models that you should build should be dictated by the number of fundamental uncertainties that you have about how the system works.  One major question may be covered with two well considered models.  Two major uncertainties?  Then it depends on whether you think that you know how these uncertainties interact.  Using the binary model tree approach that I showed in one of my slides, you might expect that four models could cover two uncertainties.  This may be overkill if some of the ‘overlap’ models are not likely to be importantly different.  But, it may be far too few models if you think that there may be poorly defined threshold behaviors or, worst case, if transitioning between model conceptualizations may not lead to monotonic changes in outcomes of interest!  So, as a guide, I would say that you should look at your project and try to assess whether you think that the major uncertainties are conceptual or related to details (e.g. parameter values, some boundary condition values, etc).  If the former, maybe commit ¾ of your modeling time and budget to creating competing models.  If the latter, maybe commit ½ of your time to parameter estimation and communication of uncertainties.  Not a very satisfying answer … but, it is a step away from the single model trap!
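
As a small illustration of the binary model tree idea, here is a minimal sketch (in Python) that enumerates candidate models as combinations of competing alternatives for each major uncertainty; the uncertainties and alternatives listed are hypothetical:

```python
from itertools import product

# Hypothetical major conceptual uncertainties, each with competing alternatives.
uncertainties = {
    "recharge": ["diffuse", "focused"],
    "fault_zone": ["barrier", "conduit"],
}

# Each combination of alternatives defines one candidate conceptual model:
# two binary uncertainties -> four models, three -> eight, and so on.
candidate_models = [dict(zip(uncertainties, combo))
                    for combo in product(*uncertainties.values())]

for model in candidate_models:
    print(model)
```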

9/30/16 – Jim Roy, Environment and Climate Change Canada

I like the ideas that you presented because it seems that they can help you to go back in a structured way and consider whether things that you wrote off as noise because they didn’t fit your current hypothesis were actually important.

Great point!  In fact, I think that you should be required to keep track of these write offs!!  So far I have been thinking about identifying potential errors in model construction.  But, I think it is a really clever idea to use your ‘outliers’ to help you to formulate alternative hypotheses.  Thanks!

9/27/16 – Erick Burns, USGS – GSA

I like the practical example, and I think that it is important to remember that often we don’t necessarily have to spend a lot of time, money, and effort in getting the “best” answer.  I agree that if we can bound the problem with reasonable confidence, that’s all it takes in many cases.  Recently, we are doing a lot of work on groundwater temperature, and biologists and ecologists, who are interested in the effects of groundwater inflow on the temperature of streams, have been asking us if we can help them understand how groundwater temperatures will change in response climate or anthropogenic change.  In response, we’ve developed some new analytic solutions and tools to help them understand system vulnerability and timing of response so they know when and what to go measure.  The tools aren’t perfect, but they provide rapid assessment and insight.

That is fantastic!  What is funny is that in all of my recent talks I have described this example almost exactly as an example of how DIRECT could be applied to pure scientific problems!  (The temperature of water inflow to a salmon redd, to be precise.)  I really just cut it out today in the interest of time!  It is great to hear that you are already doing this … I’d love to follow up and hear more details!

9/27/16 – audience member – GSA

How have regulators received these ideas?

That is really the key question.  To rephrase it slightly, where should we start injecting these ideas?  So far, my current and former students have had a lot of success proposing these ideas as consultants.  I’ve actually been a bit surprised, and very encouraged, to hear that almost every company that has been pitched this idea has bought into it.  In meetings that I have had as part of the Darcy tour, I have also been very happy with the interest expressed by mid-level regulators – those people tasked with actually performing oversight and assessing environmental impact statements.  The idea that there is a path that could eliminate or at least reduce the competing models scenario is music to their ears.  But, they are not necessarily in a position to require that multiple models be considered.  So, the key is to get the ear of upper level regulators and, perhaps, policy makers.  Again, I have had some very positive feedback – not least of which was from Governor Hickenlooper of Colorado.  But, the challenge will be finding a way to inject the idea of limited, targeted multi-model analysis with the specific aim of supporting decision making into our political process.  I think that the key leverage is this … DIRECT protects us against making mistakes that we shouldn’t have made – mistakes that we made because of assumptions we made early in the process – my sense is that those are the mistakes from which you cannot recover! Stay tuned for that one! 

9/27/16 – audience member – GSA

Are you saying that the ensemble, or model averaging, approach is not good for any applications?

I think that it is useful for some applications, but not as many as we may think.  I think that it is useful if decisions are to be made based on the most likely outcome.  But, as I’ve presented here, I think that this is not usually the case for practical applications.  I don’t even think that it is that useful for purely scientific applications – rather, I think that we are always interested in knowing about lower probability, higher consequence (for decisions or scientific inference) outcomes.  The other occasion that I could see this approach being useful is for full-blown risk assessment.  In these cases, we could use the entire prediction PDF to assign an expected cost (or benefit) of an action.  But, again, I have concerns.  In this case, my concern is that the best that we can do in developing a prediction PDF is to quantify WHAT WE DON’T KNOW about the outcome.  As I understand it, risk-based decision making should be based on the probabilities of different outcomes – my guess is that this PDF is rarely the same as the best prediction PDF that we could produce.
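
For reference, the ‘full-blown risk assessment’ use of a prediction PDF boils down to a probability-weighted cost.  Here is a minimal sketch (in Python) with made-up numbers – and with the caveat, argued above, that the probabilities may describe what we have considered rather than what will occur:

```python
def expected_cost(outcome_probs, outcome_costs):
    # Probability-weighted cost over a discretized set of predicted outcomes.
    assert abs(sum(outcome_probs) - 1.0) < 1e-6, "probabilities should sum to 1"
    return sum(p * c for p, c in zip(outcome_probs, outcome_costs))

# e.g. expected_cost([0.7, 0.2, 0.1], [0.0, 1.0e6, 5.0e6]) -> 700000.0
```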

9/27/16 – Tim Grundl U WI, Milwaukee  – GSA

It seems to me that if you went to a client and said, ‘We have thousands of models, none of which we really think are true’ then they would be nudged toward saying that you haven’t really done your job!

This is a completely valid concern and I can imagine just such a situation.  But, there are a few points that I would offer.  First, the key to avoiding this is to communicate clearly what you have done and why.  The point is that NO ONE can come up with a single correct model of the system.  This is evidenced by two things: first, if you ask 10 consultants to develop models, you’ll get 10 different ‘best’ models; second, if a consultant makes a ‘best’ model for you, you can be guaranteed that they will be making another ‘best’ model as soon as you have the budget to hire them to do it.  So, the extreme alternatives are: business as usual – make a single model and, essentially, pretend that it is correct (but, perhaps add some disclaiming text to disavow responsibility if it isn’t); or, make many, many models in the hopes that you can capture the statistics of your ignorance.  Unfortunately, the latter approach is likely to be quite expensive and quite disheartening.  So, we are very interested in a middle path – proposing a limited number of models that produce ‘importantly different’ predictions.  Then find as many ‘plausible’ realizations of those models as the remaining modeling budget will allow.  Then, be sure to tell the client that you have built a number of models, any one of which has a comparable chance of being correct, and that you can use them to tell them what is most likely to be correct, what other specific outcomes cannot be excluded based on what we know now, and which data should be collected to test the models that lead to unacceptable outcomes.  To go further in this vein, I think that we are underestimating how comfortable people are in considering competing narratives when making a range of decisions.

9/27/16 – Andrew Fisher, UC Santa Cruz – GSA

The example that you showed is from climate modeling.  I think that this approach has been used widely to address differences between results from different models in the climate modeling community – generate multiple predictions from different models, then average the ensemble of results. Is this what you are saying and, if so, do you find this as unsettling as I do?

Yes … and yes!  Firstly, I will say that I keep waiting for someone who is more experienced in this area to correct me … to tell me that my simple-minded understanding of how this averaging is done is incorrect.  But, it hasn’t happened yet!  So, at this point, this is my best understanding of how this averaging is done.  Furthermore, yes!  I find this quite unsettling.  At the simplest level, it seems strange that we would average many models to come up with a prediction that NONE of them may have made and use that as our best representation of the truth.  Furthermore, I think that it is a real opportunity lost to not consider those models that produce outcomes of concern (or outcomes of great benefit, if that is your thing) and to specifically search for discriminatory data to test these models against other hypotheses (models).  Thanks for letting me know that I am not alone in my concerns about this!

9/27/16 – Amanda Schulz, Syracuse University – GSA

Where does data fit into the model?  Do you build the models first and then ingest the data?  Or, do you collect data first and then build the models?

This is the intent of the model building exercise in the talk.  It is a bit subtle, but I really think that it is true that if you have essentially no data, you have to just collect something.  Now, I think it would be cool to think about what those data should be – I’m sure we would learn a lot by trying to do that!  But, let’s imagine that we have that threshold amount of data.  From what I have seen, we can use that data to define some broad categories for our systems – water/energy limited, high/low gradient, thick/thin sediments.  At this point, I would definitely support Cliff Voss’ approach to building very simple models.  Essentially, if we can trust our categorization, then what is the range of general behaviors that we might expect, as represented by a range of plausible simple models.  But, if we want to make predictions that will support decisions, then these models may be too general – their predictions might just be too uncertain.  So, the next steps are to look at our range of predictions (linked with specific models) and use the models to find discriminatory data to test ‘models of interest’ or ‘models of concern’ against all other models.  From that point forward, we are continually collecting discriminatory data and modifying our model ensemble.  If we have developed a good ensemble – meaning that it covers the range of plausible models with importantly different predictions – then most data will just cause us to change our relative belief (likelihood) among the models.  This is a form of data assimilation, I think.  But, every once in a while we will collect data that really changes our view of the system in an important way.  Then, we may need to add models to the ensemble – or we may even need to throw it away and begin again.  In Stephen Jay Gould’s terms, we would have a ‘punctuation’ – an evolutionary jump.  Like biological systems, and unlike our current approach to model development, we would only make this decision based on the ‘evolutionary pressure’ of new data.

9/27/16 – Greg Foster, Retired Geologist, Jackson, MI – GSA

I’m going to give a talk that is very similar to this on WE morning – it’s called ‘You can lead a horse to water, but you can’t make it drink’ – about religion and science.

I’d love to hear your talk!  If you don’t mind sharing your slides, that’d be great.  I am very interested to hear if there are some internal narratives that cannot be nudged by science.  More importantly, I’d really like to explore how we can decide if it isn’t worth trying to apply science or, if it is, how we can do it most effectively.

9/27/16 – Andrew Wolfsberg, LANL – GSA

Our high level regulators are on board with these ideas.  We are being required to consider uncertainties in our models, and their consequences, and to engage stakeholders in the process.  So, just a comment to say that there are some promising movements in the regulatory circles!

That is really great to hear.  It isn’t surprising to me to know that DOE is leading on this … you have the intellectual and computational firepower to make these kinds of approaches a reality.  It would be fun to catch up to hear what you are doing in this area!

9/27/16 – D. Wilz, Johns Hopkins University – GSA

The Chesapeake Bay program has been building a watershed model – they are using this approach to draw from an ensemble of models to make predictions for regulation.  You might be interested in following up with them.

I would love to!  Would you mind putting me in touch with someone involved in the project?

9/27/16 – D. Wilz, Johns Hopkins University – GSA

Have you worked with random forests for the ideas that you presented?  It seems like this approach to building sets of related models and then butchering and reformulating them could be really valuable.  Do you know if anyone has done this with hydrologic models?

I have thought about it and I had a great conversation with a colleague who is now working in banking doing this very thing.  But, I can’t say that I’ve had the time to do anything concrete.  As I understand it, what we did for the contamination problem could be classified as a model tree (although I am not sure how the parameter realizations fit in).  If we repeated this for a different concept of how solutes transport at the site, we would be starting to form a forest of model trees.  So, the short answer is that I agree that this sounds really exciting – and I would love to find someone that knows more about these approaches than I do to try to do something.  Are you that person by any chance?!

9/27/16 – audience member – GSA

How do you find clients who are willing to pay for even more models – especially if they already find the cost of one model to be too high?

This is a very important practical point.  What I didn’t mention is that we did the multimodel analysis shown here as revenue neutral.  That is – we proposed to do the work following the state of the practice – build our best model and run parameter estimation tools to quantify uncertainty.  Then we later proposed to do the multimodel approach that I showed for the same budget.  We could do that because we put less effort into calibration; we abandoned finding the absolute peak.  Looking forward, we can improve on this if we develop tools that allow us to do for model construction what we now do for parameter estimation.  This will require that we conceive of models as lying on a continuum, rather than as discrete things.  Then we will have some hope of using some of the clever tools developed for calibration to explore model space.  Finally, even without these advances, I think that we need to spend more time on client education.  In many cases – likely excluding those cases in which a client is only having you build a model because they are required to do so – clients want model results to help them make predictions to use as the basis for decisions.  The key step is to have the discussion about the cost of the model being wrong.  My guess is that you will discover that this cost is asymmetric – overprediction of something has a different cost or importance than underprediction.  The idea behind DIRECT is to try to give the client some insight into how to examine those model outcomes that have the greatest potential impact.  Then you are in the position to tell them that by doing this you can also save them from paying to install wells that, in retrospect, you should have known would not be useful for answering their questions.  Finally, you can tell them that this approach will extend the useful life of the models that you provide, make future model updates less expensive, and protect them against legal costs related to the perception (or lawyer-assisted narrative) that they cherry-picked assumptions in a way that would benefit themselves.  I think that could be a pretty good pitch, if you can get their ear to deliver it!

9/21/16 – Per Moldrup – Aalborg University

I like the approach that you have described.  But, don’t you still face the problem that, if your budget is fixed, spending money on modeling means that you don’t have money to spend on something else? How do you defend that it is worth spending that money on the modeling?

This really is the key question from a practical standpoint.  For me, I think that there are four reasons that DIRECT can be defended, even if it comes at the cost of some measurements on a fixed budget.  First, we will save money on data collection by avoiding collection of data that we should have known would not be informative.  Second, we can help to avoid costly mistakes that we should have anticipated – which saves both money and political capital!  Third, by building multiple models early in the process, rather than serially, we have use of more models for longer.  I think that this makes sense simply in terms of extracting as much value from the model building effort as possible.  Finally, and perhaps most importantly, if the process allows us to communicate what we do and don’t know in a clearer and more useful way, I think that we will build faith in the scientific method over time.  At least in the US, I think that this is becoming critically important!

9/20/16 – Troels Vilhelmsen, Aarhus University, Denmark

In my experience, decision-makers often really only want one number. They only want to know the most likely outcome without uncertainties. Should we, as professionals, insist on trying to educate them about uncertainties? Or, should we give them the best prediction if that is what they want?

I am a bit conflicted on this.  I think that it is our responsibility to try to make it clear what we know and what we do not know when we present scientific findings.  To me, this adds strength to those things that we feel that we do know confidently by not mixing them with those things that we do not.  The trick is to do this in a way that doesn’t leave a user feeling that science ‘knows nothing’.  Through discussions over this year, I am becoming more convinced that we have to move away from the approach of trying to present results with ‘quantitative’ uncertainties.  Firstly, I think that it takes too much computational effort to do this in a defensible way.  Second, I don’t think that we ever really capture the complete uncertainty because we have no way of sampling in model structure space.  Third, even if we could provide this level of detail, I’m not sure that people would use it (at least, not often).  So, I think that we need to find a compromise – a way to sample the range of scientifically plausible uncertainties (story lines) without attempting to do this exhaustively.  I think that this requires careful and thoughtful discussions with decision makers at the beginning of a study (and repeatedly throughout the study).  I imagine that this will often involve exactly what you have asked about … an initial discussion to describe why it is important to consider predictions beyond the ‘most likely’ (more accurately, the ‘currently most likely given the limited data that we have’).  So, long answer: I think that instructing decision makers and encouraging them to consider useful uncertainties will make our work better and more useful, and we should strive to do this wherever possible!

9/20/16 – Lone Klinkby, COWI – Vejle

What is the cost of this compared to more common approaches?

This is one of the most interesting things to me.  I think, ultimately, the question is – are we better off having one detailed, highly calibrated model (which is still wrong), or can we develop many less well calibrated models, each of which is wrong, but that, when considered jointly, give us a better view of reality?  I think that the latter is the case.  So, we have to save some effort in calibration to allow for building more models.  We also have to build tools that allow us to build multiple models efficiently.  Finally, we need to develop our intuition not to try to find THE right model, but to try to define a collection of disparate models.  On top of this, I strongly believe that we will see such savings through more effective data collection that we may actually come out ahead on balance.

9/20/16 – Niels Bøie Christensen, Aarhus University

I heard Ty’s Darcy lecture when he was in Denmark a few months ago, and we had an interesting discussion over coffee afterwards. The thoughts that follow were provoked by the Darcy talk and our discussion.

When a scientist/contractor meets a client/politician/administrator for a discussion about a potential collaboration on a real-world issue that needs to be solved, it is very often a clash of worlds and the parties may have very different agendas. And, quite often, part of having different agendas is keeping them secret from the other party. An example: some public authority needs to find out about the severity of a pollution situation, but there is a definite, though not explicitly voiced, wish that the result will be that there is no need to worry. This may save the public authority a lot of money and save the political level (perhaps leaning heavily on the administrators) embarrassment and possible law suits. The scientist knows this, but also knows better than to show that she knows, to avoid spoiling the good atmosphere, and to avoid being confronted with an ethical dilemma: on the one hand, the scientist cannot accept a situation where a specific result is required, but on the other hand, whether in private business or in a research institution, she really needs the job and the money that comes with the project.

The situation is not necessarily that bad all the time. Over the past 15+ years, all major groundwater resources in Denmark have been mapped under the Danish National Groundwater Program with the aim of understanding the groundwater resource, thereby enabling management of the resource in a sustainable way. The mapping has been done by covering more than half the country in a dense web of geophysical measurements with various methods and building geological and hydrogeological models of the aquifers and their vulnerability. This mapping effort has been possible through close collaboration among research institutions, contractors, and national and local authorities in an open dialogue based on trust.

I agree completely with Ty’s tenet that project planning between a contractor and a client should focus on clarifying the most crucial question that the client hopes to be able to answer through the investigations, what the options are regarding the investigation, and to what extent the needs of the client are likely to be met. This, of course, requires an open and truthful exchange in a relationship based on trust, and how far you might get in achieving this ideal depends totally on the level of trust between the parties and on the professional skills of both parties.

In the best of all worlds, the scientist/contractor must be creative and be able to choose the right methods and approaches that will provide the client with the best possible basis for decisions. What Ty suggests in his lecture is that it is not sufficient to deliver one final, ‘best’ model to the client. In any situation, there are numerous equivalent models that will all be consistent with the collected data, and it is important to convince the client that this is as good as it gets and to explain to the client that the answers to the initial questions must necessarily come as a probability of various scenarios and that the client will have to be able to relate to that and make decisions in the presence of uncertainty. Also, in case the client wishes to extend the survey, the contractor should point to those additional investigations that are most likely to invalidate as many of the equivalent models as possible.

It is quite obvious that the above ‘best of all worlds’ scenario is often a far cry from the actual situations. The contractor might not have the flexibility or the creativity to enter into an optimal dialogue, and the client might not be able to understand or appreciate what a good contractor can offer, and as we all know, there are many suboptimal projects being conducted around the world – to put it in a polite way. Below I will list a few of the cases that I have come to know, mostly second hand, I should say, but from trustworthy people.

Contracting is business, and both owner and employees depend on being able to sell projects to clients. Either unknowingly, because of lack of competence, or knowingly, because of economic pressure, the contractor might claim that she can do the job when in fact she can’t.  On the other hand, the client might not have sufficient scientific education or insight to be able to define the correct terms for the survey. A client might demand that the contractor deliver results that are correct with a likelihood of at least 80%. Except in very rare cases of exceedingly simple questions, that is completely unrealistic, especially for hydrological models, considering the often ridiculously sparse database for the modeling. However, any contractor is also faced with the dilemma that if you don’t accept those terms and instead try to convince the client that they are completely unrealistic and that the conditions only testify to the client’s lack of scientific savvy, there is always another contractor who has no qualms. The result is that even deeply professional contractors will often accept the terms, knowing that they are unrealistic, to make sure that some good science is done. In the final reports, the results can always be masked in such a way that the client (lacking savvy) will not see that the conditions are not truly met.

Another situation that I have met is that the contractor actually delivers really good, high resolution results, but that the client is not able to handle the degree of complexity that is offered. Results can be ‘too good’ it seems.

The final point that I will make is that in most of the situations I have encountered, the client is not really able to relate in a meaningful way to the presence of uncertainty, i.e. the client has no scientifically based methodology of making decisions in the presence of uncertainty. This is quite understandable, really; there seems to be no leeway for uncertainty or for the inevitable errors that will be made in public administration. Politicians go to great lengths – and they have become REALLY good at it over the past 30 years – to be as invulnerable as possible to the inevitable attacks from the general public, and in particular the press, that will follow after mistakes and errors. When did you last see a politician admit an error and explain that errors are inevitable? As far as I can see, the capability to handle uncertainty resides only with very few entities: the oil companies (they make their profit on very small margins), the insurance companies (this is what they base their business on) and the military (who will lose if they can’t make the best decisions in uncertain situations). I’m a geophysicist myself, and I clearly remember the struggle and the educational effort that was required in the 1970’s to make engineers accept that drilling was not the only answer to everything and that geophysics might provide insight and save money in the long run. To a large extent we succeeded – at least in the parts of the world that I am familiar with – but I think we are in the same situation now regarding uncertainty: nobody wants it; nobody understands it; nobody can handle it wisely. A huge effort is needed on all levels to educate ourselves, our students, and our clients about how to estimate uncertainty and how to handle uncertainty wisely.

9/20/16 – Niels Bøie Christensen, Aarhus University

I think it’s a very good idea to turn the process around: ask the question; build the model; collect the data. But, that puts a heavy emphasis on defining the question clearly at the outset, and on having a solid geologic understanding of the system.  In the end, there are three groups of people who can manage uncertainty: oil companies; insurance companies; and the military. Beyond these groups, I think that we as scientists have some responsibility to educate people about uncertainty. Even then, the real problem lies in whether people are able to make decisions when given quantitative uncertainty.

I could not agree more that the ‘hard’ problem is helping people to act on information to make better decisions.  It is a bit of a dodge, but I think that this is beyond the job description of physical scientists … more in the realm of psychology, perhaps!  But, I do think that we can help by making sure that we are providing the kind of information that is most likely to be useful to decision makers.  It seems hard to argue that we can do this if we don’t start by asking what they want to know.  Then, we can work with them to translate any questions or concerns that they have into scientific questions or hypotheses to be tested.  Today, this step usually takes place in secret, maybe in the silence of a scientist’s mind!  No wonder people are confused and distrustful when we present results based on a model that seems to have little bearing on the questions that they are asking!  If we can do a better job of co-defining the role of science in decision support, then we need to build the bridge to the concept of multiple models.  I think that this can be done easily with analogies to every other facet of life in which we make decisions.  After all, how common is it that we absolutely know what will happen if we take some action?  I think that the norm is that we weigh multiple possible reactions to our actions – we just need to make this link for the sciences, too.  I really believe that, if we do these two things, we will find that it places no limitations on the ‘science’ that we do, but greatly increases the public acceptance of and reliance on science.  As to your three exemplars of decision making – this is quite interesting.  I think that insurance companies make the best attempts to make quantitative predictions of outcome likelihoods.  But, this only really works if there is competition in the insurance industry; otherwise, they are incentivized to overestimate risks and the costs of those risks.  In other words, they are risk averse because they pay the cost of underestimating risk and reap the benefit of overestimating it!  In speaking with reservoir engineers, I learned that they, too, have an interesting and risk-averse preference.  Specifically, they put a heavy weight on risks that cannot be reversed once they occur.  Anything that they feel that they can engineer their way out of, they can accept.  So, they are on the lookout for low probability, high cost outcomes, too!  Finally, the military.  It would be very interesting to have a chance to talk to military strategists!  Not having had the chance to do so, I will add my own imaginations here.  I expect that the most difficult part of their job is determining what constitutes a risk or a benefit.  As civilians, we cannot imagine the loss of life of our own troops as an acceptable risk; they have to do this all the time.  Similarly, we have difficulty imagining the ‘value’ of recapturing a town, for example.  But, in the calculus of military strategists, this must have some perceived value that justifies the risks and costs.  I would be absolutely fascinated to know how quantitative these decisions are – or, are they largely based on perception, argument, and experience?!

9/17/16 – audience member – Austin, TX – Texas Water Development Board Meeting

Can you comment on the tendency that often arises to build a model that can do everything?

This is a really interesting question.  I had a conversation with some modelers in USGS district offices who made a case that they are charged with building models before their use is known.  When I mentioned this at the headquarters in Reston I got pretty quick and complete pushback.  Their opinion was that models must start with the definition of the problem to be solved!  So, there is a tension here!  Perhaps one compromise is to propose multiple possible uses for models and try to build an ensemble of models that can be used in concert to deal with all of these considerations and more.  Then, of course, as new questions are asked, new models should be ADDED TO the ensemble rather than replacing everything that came before.

9/17/16 – Andy Weinberg – Texas Water Development Board, Austin, TX

To use your mountain climbing analogy – can we think of building models that are base camps?  Then we can develop detailed models from these bases as questions arise?

That is an intriguing idea … and a nice extension of the analogy!  I think that this is an excellent way to think of the process of building multiple models – maybe more along the line of random forests than model trees.  I’m going to use that moving forward!

9/17/16 – Audience member – Austin, TX – Texas Water Development Board Meeting

We have had to work for some time to convince people of the value of modeling for hydrologic analysis.  Won’t it be a hard sell to convince them that we not only need one model, but we actually need several?

It is really a challenge of communication.  The truth is that we always run multiple models – whether it is deciding what to wear or how to get to work or building complex hydrologic models.  I just think that, somewhere along the way, we made a strange choice.  We decided that for the complex problems that we deal with, we should only produce a single model.  Maybe this came out of a romantic image of science as ‘seeking THE truth’.  But, I think that if we are honest we have to admit that most of what we do isn’t capital-S science.  In fact, the applied science that we do is just a more logical and structured form of human thought.  In that light, it is almost irresponsible NOT to build multiple models.  The trick is to get this point across without undermining faith in hydrologic (or other) modeling.  I think that it would be worthwhile to work on an approach to redefining modeling as a community so that we do this in a clear, useful, and somewhat coordinated way.

9/16/16 – Mike Young – Bureau of Economic Geology, UT Austin

How do you protect against decision-maker bias?

Actually, I don’t think that is our job.  Now, that is too simple a statement, of course.  Even though we are scientists, we are also citizens.  There will be times that we feel that we need to advocate for positions.  But, as scientists, I think that our best role is to advise, and I think that we can do that more effectively if we relate our science as directly as possible to the concerns that are driving decision making.  This requires that we listen to decision makers to identify their biases – that is, what is driving their decisions.  Then we can try to determine whether those concerns are plausible (that is, whether we can build models that represent them).  If we cannot, then we explain why.  If we can, then we ask the decision makers to let us know what would change their minds about their concerns.  Then we propose the best data to collect to test the models that represent their concerns.  Maybe it is too naïve, but it seems like that might actually end up as a path to give science a bigger role in decision making.

9/16/16 – Bayani Cardenas – UT Austin

How does your approach relate to data assimilation?

To the degree that I really understand data assimilation, I think that DIRECT makes allowance for it.  Here is how I see DA – data are collected continuously.  As they are, we modify the predictions of our models to be some average of the model’s predictions and the observations.  At some point, if our model predictions are weighted too little (because they aren’t accurate enough), or if we reach some fixed interval that we have set in advance, we redefine our model.  DIRECT is essentially the same, with some subtle differences.  First, we use an ensemble of models – chosen to represent a range of importantly different outcomes.  We update the weights of the models as we collect data – but, we could also make use of data-driven predictions in the mix.  If we collect data that doesn’t agree with any of our models – then it is definitely time to update the ensemble!  The difference, though, is that we use this same ensemble of models to identify discriminatory data.  I don’t know how that can be achieved with data assimilation alone … but, I’d be very open to hearing about it if it can!

9/16/16 – Rich Ketcham – UT Austin

How do you determine the appropriate level of model complexity to include in your ensemble?  Or, perhaps, if you had an order of magnitude more computing power, would you run more models, more realizations, or both?

This is one of the ‘hot questions’ in hydrogeology!  Usually, we are comparing the benefit of being able to construct multiple simple models (or to do more parameter estimation or uncertainty analysis on simple models) versus the ability to consider more, potentially important, structures through the use of complex models.  The simple answer in this case is that it depends – the complexity shouldn’t exceed the complexity of the system.  More importantly, we should only include complexity that is relevant for the decisions that we will use the model to support.  I think that the best way to do this (and to resolve the larger model complexity question) is to do some of both.  Include both simple and complex models in the ensemble.  Eliminate those models that don’t make significantly different predictions.  Use the rest of the models to identify discriminatory data.  But, without this context it is hard to decide which models (complex, simple, analytical, numerical, etc.) should or should not be considered.  I really like your follow up question!  Personally, I would always want to run more models rather than more realizations.  I think that we can apply more intelligence there – more judgement in deciding which models to include.  I think that parameter estimation is quite well developed along the line of ‘using a big hammer’ – perhaps it isn’t the right fight to try to change that!

9/16/16 – Colin McNeece – UT Austin

How do you balance your multimodel approach with the more targeted use of simple models that characterized Darcy’s approach?

That is a very clever question!  But, I think that it suggests that I misrepresented multimodel approaches.  Essentially, as I see it, there are two opposite camps – the ‘build as many models as possible and see what shakes out’ folks and the ‘work hard to find THE right model’ folks.  Each has limitations – computational demand and the possibility of proposing a lot of really bad models on the former side; myopia and overcomplexification on the latter.  I think that we need to find a happy middle that avoids relying on a single model, but also avoids construction of a lot of silly models.  To do this effectively, we should focus on those simple, physically based approaches that Darcy used.  We need to think critically about what we know, and what we don’t.  Then we need to build intuition so that we can identify which models are worth building and considering – which ones have the potential to change our mind (or the mind of a stakeholder) if they turn out to be right.  Does that answer your question?

9/16/16 – Jeremy White, USGS, Texas Water Science Center

(During a small group discussion at the Texas Water Development Board, in response to the idea of including model importance in the selection of models.)

We (at the USGS) have started to encourage modelers to develop the forward model at the same time that we do model calibration to historic data.

To me, this is one of those fantastic, short statements that really captures something important.  As I understand it, this simply means that when they run models for calibration, they extend the simulation to the future time at which predictions of interest will be made.  By making this apparently simple change in procedure, they open up the possibility of examining the impacts of (apparently) small decisions regarding model calibration on important predictions.  This is a great idea – I hope they don’t mind if I advertise it in future talks!
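
As a rough illustration of what I take ‘developing the forward model alongside calibration’ to mean – every calibration run is simply carried past the history-matching period so the prediction of interest is recorded next to the fit – here is a minimal sketch.  The toy model, parameter names, and numbers below are my own invention, not the USGS workflow.

```python
# A minimal sketch: carry the forward (prediction) run along with every
# calibration candidate, so we can see how calibration choices move the
# prediction of interest. The 'model' here is a toy stand-in.
import numpy as np

def run_model(params, t_end):
    """Toy surrogate for a forward model: simulated heads through time."""
    t = np.arange(t_end)
    return params["h0"] - params["rate"] * t

obs_t, obs_heads = np.arange(10), 100 - 0.5 * np.arange(10)  # historic record
forecast_t = 50                                              # prediction time

candidates = [{"h0": 100, "rate": r} for r in (0.3, 0.5, 0.7)]

for p in candidates:
    sim = run_model(p, forecast_t + 1)          # run PAST the calibration period
    misfit = np.sqrt(np.mean((sim[obs_t] - obs_heads) ** 2))
    prediction = sim[forecast_t]                # the prediction of interest
    print(f"rate={p['rate']}: RMSE={misfit:.2f}, head at t=50: {prediction:.1f}")
```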

9/15/16 – Binayak Mohanty – Texas A&M University

To extend the previous question – what if we have been measuring groundwater levels in an area for 100 years?  Can you use your method to suggest new observations that we could collect that would complement existing data?

I think that you have hit on a fundamental question.  To be valuable, any new measurement must have two characteristics.  First, it must contain information.  Second, it must not be redundant with existing observations.  Our primary point is that we can only assess information content in the context of how well an observation discriminates among models.  Unfortunately, we usually don’t know which model is right – if we did, we wouldn’t need new measurements!  So, we can’t determine whether data discriminate the right model from other models.  The next best thing is to try to identify data that discriminate ‘models of interest’ from other models.  That is what we do in DIRECT.  It turns out that we can also predict whether proposed observations are likely to be correlated with existing observations – again in the context of the predictions of models of concern and other models.  So – a long answer to your short question – I think that something like DIRECT is an ideal way to predict whether a new measurement is likely to add value to existing data in the context of specific decisions to be made.
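
Here is a minimal sketch of the redundancy half of that argument: if a proposed observation and an existing one are strongly correlated across the ensemble’s predictions, the new one is unlikely to add much.  The ensemble values and the 0.9 threshold below are fabricated for illustration.

```python
# A minimal sketch of screening a proposed measurement for redundancy with an
# existing one, using correlation across ensemble predictions.
import numpy as np

rng = np.random.default_rng(0)
n_models = 200

# Each model in the ensemble predicts both the existing and the proposed
# observation (fabricated here so the two are strongly related).
existing_pred = rng.normal(10.0, 2.0, n_models)
proposed_pred = 0.9 * existing_pred + rng.normal(0.0, 0.3, n_models)

r = np.corrcoef(existing_pred, proposed_pred)[0, 1]
print(f"ensemble correlation = {r:.2f}")
if abs(r) > 0.9:
    print("Proposed observation is largely redundant with the existing record.")
else:
    print("Proposed observation may add independent information.")
```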

9/15/16 – audience member, Texas A&M University

In consulting, we tend to have data that we will monitor through time, and use that to continually test that we have correct models. This allows us to make projections into the future.

That makes perfect sense. The only problem that I have is that we tend to build and revise models serially.  That is, when we find that our models don’t match new data, we modify the model and throw away the old model. I think it would be much more efficient if we were to formulate multiple models at the outset and use them throughout a project.  In this way, we are formulating hypotheses and counter-hypotheses.  This allows us to identify discriminatory data and to provide concrete explanations of our prediction uncertainty.

9/15/16 – audience member, Texas A&M University

A more fundamental problem is that our basic conceptualizations of soil function and pedology are not included in our hydrologic models. I think this is a much more important place to put effort than building many more models.

I agree with your point. I think that pedology is one of several examples of things that we do not capture fully in our models. The question is, how do we move forward practically? I think that the most efficient approach is to propose models of those things that we do not know, and to test whether or not these alternative models make a difference in the predictions that matter.  Of course, the point can be made that we are limited in proposing alternative models without some level of fundamental research.

9/15/16 – audience member, Texas A&M University

I work in wastewater treatment. We have a case where we have two models. One calibrates under storm conditions. The other calibrates during low flow. How do we use your approach to know which model is correct, or how to move forward in making decisions?

Firstly, I think that this is a classic case of evidence for an evolutionary jump. That is, neither of your models really explains the data adequately. I wonder if there is a way to use insights from your first two models to propose alternative models?  That would be interesting to pursue. Beyond that, I think that the first question is: do these models make fundamentally different predictions about something that would be used to support decision making? If not, then perhaps there is no real need to develop new models. If so, then perhaps we can learn something about the ways in which they disagree.  Lastly, I think we need to explore how to use multiple models to support decision-making. We still tend to want to settle on a single model that explains everything as the basis for decision support.

9/15/16 – Ralph Wurbs, Texas A&M University

(During a discussion on the use of models for water management.)  Sometimes, the model IS the reality.  By that, I mean that if all parties agree to be managed by the predictions of a model (for example in defining a flood plain), then those predictions are the actual values that should be used for management and planning, regardless of whether the model’s predictions are completely correct.  More specifically, models are an important aspect of reality (not just a simplified representation of reality) in the context of regulatory agencies making decisions based on model results.  The amount of water available to a water right permit applicant in Texas is set by the TCEQ WAM model.  A floodplain delineation that restricts a developer from building a house at a certain location is based on the results of HEC-HMS and HEC-RAS.  However, I am not sure about the definition of consensus model.  Floodplains delineated with HEC-HMS and HEC-RAS are employed to regulate floodplains without the consensus of developers or anyone else. Likewise, the TCEQ denies water right permit applications based on WAM results regardless of agreement or lack thereof from permit applicants.

This is a really good point.  In my talk, I take a pretty hard jab at consensus modeling, essentially because I don’t think that it has any better chance of developing a correct model than any single modeler would.  But, your statement is true in a more important way – the consensus does allow people to have something that they can use for planning.  If future rules or values are based on a constantly shifting model, then it might really stifle investment and hamper civic planning.  Of course, this has to be balanced with the damage that can be caused by using a consensus model that is clearly incorrect.  In this case, people are likely to lose faith in a model that is consistently and clearly wrong.  I think it would be very interesting to examine how this balance should be made for different groups with different decision processes.

9/14/16 – Tom Bjorklund – Houston Geological Society

Is there a similarity between your work and what they are doing with climate science?  Also, how can you determine the validity of models to include in your ensemble?

I do think that there are similarities.  But, I also think that there is a major difference.  That is, for climate change, as I understand it, most of the uncertainty has to do with stresses on the system.  There are clearly uncertainties related to how elements of the system interact; but, anthropogenic effects are major drivers.  (Notice the differences among scenarios compared with the differences among models for a single scenario.)  For hydrogeology, we still need to focus on ‘internal’ uncertainties.  In other words, we can add value if we can predict how the subsurface will moderate (or amplify) climate changes.  Perhaps this isn’t as big a difference as I imagine.  But, I do think that it makes it easier to imagine measurements that could be made to resolve the key uncertainties.  As to the validity of the models, I’m not sure that we can ever verify this.  I think that we are on stronger footing if we build physically-based models rather than correlative models.  There is also an argument to be made that we can be more certain that simple models are realistic than we can for overly complex models.  But, that is an open debate!  In the end, I think that one of the benefits of DIRECT is that we can propose models and then test them against competing models.  Ultimately, this is the only way that we can address model validity.

9/14/16 – Lori Green, Hess  – Houston Geological Society

Do you make use of sensitivity analysis in your approach?

So far, we haven’t made specific use of sensitivity analyses, in part because they tend to be applied more often to parameter estimation – and we are aiming to reduce the effort on this front.  On the other hand, if we can find ways to describe model structure as continuous, too, then we could apply sensitivity analysis approaches to this aspect of uncertainty as well.  Having said that, I know that there has been a lot of good work done in developing thoughtful sensitivity analyses – it seems a shame not to put them to good use for DIRECT!

9/1/16 – Perth – Sarah Bourke, University of Western Australia

What do you do in the case where you do not have a specific question posed before you build the model?

It is interesting how often this comes up – there seems to be a real division within hydrogeology, with some people insisting that all models must be developed to address specific questions and others saying that the model should simply be the most accurate representation of the system so that it can be used for any question. (At least, any question that is covered by the processes included in the model.)  I come down on the former side.  I think that there are too many decisions that we make, with minimal supporting evidence, to think that we can generate a single accurate model of our systems.  Rather, I think that we need to construct several models – ideally for multiple questions considered from multiple perspectives.  I think that this is the best way to ensure that we have a broad and representative model ensemble and that we cover both what we do and what we don’t know in the context of the tough choices that we have to make when building a model.

8/29/16 – Penny Wurm – CDU, Darwin

I am an ecologist, but I recently took part in training in a social science methodology known as Realist Evaluation (RE) and Realist Research (developed by Pawson and Tilley, 1997) to apply to a project on integrated water management. RE treats public programs – e.g., posting police officers in schools (or, in our case, constructing irrigation infrastructure in remote communities) – as hypotheses (or models) about how things will work. This approach moves away from mean results, e.g., a 60% reduction in young people getting into trouble with police (or 60% of household livelihoods improved), to asking ‘what about the other 40%?’. The basics of this RE approach are the questions: what works, for whom, in what context, and via what mechanisms? Consequently, there may be several programs required to solve a problem. This struck me as very similar to your approach of creating multiple models around a problem that may capture different pathways/scenarios around the one hydrological system.

That is a really cool connection!  Firstly, I would really like to follow up on the RE idea – and I’ll get back to you once I have.  The difference may be that we are tasked with imagining the different outcomes before they happen.  But, once we make that connection – our models as stories versus actual documented outcomes as stories – I think that the similarities are very promising.  In our case, I think that we have to present multiple plausible stories that express both what we know and what we don’t know.  Then, with these narratives in mind, people can decide if these different models are importantly different to them or not.  If they aren’t, then we can move forward without doing more hydrogeology.  But, if the differences are important, then we can explain more clearly why we need to do more work and what work we would propose to do.  Ultimately, the reason to do this is to describe the processes at work in the system and to find data that can help us to test competing ideas about how these processes interact.  But, this has to be grounded in the different impacts of these potential outcomes – perhaps better viewed through RE approaches!

8/29/16 – Audience Member – Darwin

Isn’t there a danger of bias when you are setting up your model ensemble?

There is.  But, I would say that the biggest danger comes when we decide to propose only one model.  In a sense, I think that you could say that bias only really arises when you exclude ideas due to incomplete analysis.  We think that the only way that you can guard against this is to intentionally include many competing models of the system.  Having said this, the key to formulating a good ensemble is to identify the importantly different biases that you could propose (and support with the existing data) and to make sure to include them in your set of models.

8/29/16 – Steve Tickell – Darwin

What was the proxy for the real world case that is analogous to doubles for the dice rolling?

It isn’t as clear, is it?  Essentially, we are asking whether there is anything that we could measure today that would be a good predictor of what will happen in 200 years.  To determine this, we gather all of the models and break them into two groups – one that says that mass will leave the site and the rest that say that mass won’t leave the site.  Then we can predict anything that we could measure – any value, at any location, at any time.  Then we look at all of those predicted, or proposed, measurements and we try to find something that we expect to be measurably different between the groups.  The more difference we predict there will be between the groups, the more that the measurement can act as a proxy for the basis that we used for dividing the models.
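
A minimal sketch of that search, with invented numbers: label each model by the long-term outcome of concern, then rank candidate measurements by how strongly their predicted values separate the two groups.  The separation score used here – a group-mean difference scaled by the pooled spread – is just one simple choice among many.

```python
# A minimal sketch of the 'proxy' search: split the ensemble by the long-term
# prediction of concern, then rank candidate measurements by how well they are
# expected to separate the two groups. All numbers are fabricated.
import numpy as np

rng = np.random.default_rng(1)
n_models = 100
mass_leaves = rng.random(n_models) < 0.4          # outcome label per model

# predicted values of 3 candidate measurements (rows: models, cols: candidates)
candidates = np.column_stack([
    rng.normal(5.0, 1.0, n_models),                                # uninformative
    np.where(mass_leaves, 8.0, 4.0) + rng.normal(0, 1, n_models),  # informative
    np.where(mass_leaves, 6.0, 5.5) + rng.normal(0, 1, n_models),  # weakly so
])

def separation(values, labels):
    """Difference of group means scaled by pooled spread (larger = better proxy)."""
    a, b = values[labels], values[~labels]
    pooled = np.sqrt(0.5 * (a.var() + b.var()))
    return abs(a.mean() - b.mean()) / pooled

scores = [separation(candidates[:, j], mass_leaves) for j in range(3)]
print("separation scores:", np.round(scores, 2))
print("best proxy measurement: candidate", int(np.argmax(scores)))
```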

8/29/16 – Dale Cobban – Northern Territory Government, Darwin

All of the ‘good’ models that we can construct actually follow the same basic physics.  So, are the different models in your ensemble really ‘different’?

This is a great philosophical question.  In many (although not all) cases, we may believe that we understand the physics well enough that we could even say that we could use the same basic (e.g. MODFLOW) model.  But, even in these cases, we have geologic uncertainty to consider, and boundary conditions (or scenarios) that are not well known.  Even if these could be considered certain, we still have uncertainties regarding parameter values and objective functions, leading us to formulate multiple models (or multiple realizations of a model) that differ from one another.  The critical job that we face is to make sure that we are addressing the uncertainty that we actually face – whichever of these levels represents our problem.  Then, whatever the level of uncertainty, if the goal of data collection is to reduce that uncertainty, then we should look for data that can test hypotheses against counter-hypotheses.  I don’t think that is possible if you only have one model (one hypothesis) that you are putting forward.

8/29/16 – Dale Cobban – Northern Territory Government, Darwin

I would like to push back on your basic premise.  As I see it, I am a modeler, a scientist.  My job is to develop the best, most objective, most complete representation of the system possible.  I shouldn’t consider how people will use the model – in fact, I don’t even want to know that.  I agree that the model should evolve through time and with improved understanding.  But, the goal is to develop a scientific model.  I think that this is the only way that science can contribute pure scientific understanding to problem solving. Speaking from a regulatory context, there is a temptation on the part of decision makers to try and get the model to generate the answers they want to hear.

I’m really glad that you pushed back!  I’ll push back, too … and then invite you to reply.  To some degree, I would agree with you.  But, and this is a big but, only if we are confident that our models are very nearly complete and correct.  Firstly, I think that is extremely rare (again, not impossible) in hydrogeologic analyses.  Our systems are generally too complex and our data generally too sparse to have that level of certainty.  If our models are that certain, then we shouldn’t claim that we need any more data – they’re done!  If our models are uncertain, and if our budgets are limited, then we have to decide what parts of the uncertainty are most important to address.  I think that if we try to make models that are generally good, then they will be specifically bad when we use them.  Not only that, but that just leaves us to make somewhat arbitrary and poorly explained choices about what we happen to think makes a model ‘good’.  Based on all of these complications, I have come to a simple conclusion.  People want us to help them to make decisions by injecting scientific understanding.  The best way to do that is to try to improve our understanding in the parts of the system that have the most impact on the outcomes that matter to people.  Further, the only way that we can decide which data have the greatest likelihood of testing our hypothesis is to find data that would discriminate between our hypothesis and other counter hypotheses.  This requires that we form at least two models and preferably as many models as we need to describe our uncertainty adequately.

Dale replied online: Agreed – I grapple with this every day. We have large basin-scale models, in excess of 100,000 km2 in extent, where data are *very* sparse. Developing a model to answer pre-asked questions brings the danger of tailoring the model to meet expectations that change; then other expectations are expressed, or, even once the model findings are presented, the people change their minds again. Which is why we have moved to an approach in which the modelling tool is the best reflection of the real world, using everything we can throw at it, with the hope that whatever questions are asked subsequently can be reasonably accommodated. As for uncertainty – managers don’t want to hear about uncertainty! It can paralyze decisions on one hand; I’ve had managers use uncertainty as a reason to discard model findings completely – ‘oh well, in that case we don’t even need a model, I’ll just decide’. Others don’t want to hear the word uncertainty and develop an unhealthy dependency – ‘the model says … therefore it’s true’. I’ve caught them at it when they think I’m not around, after hours of explanations when I thought I’d convinced them. Who would be a modeller?! That said, it’s a great way to try to understand large, complex aquifer systems – and a great sense of achievement when we get a close match with new observations.

8/25/16 – Eric Lawrey and Aaron Smith, AIMS, Townsville

We are just getting into modeling the Great Barrier Reef.  There is a hot discussion going on about how to decide which data to collect next.  I think that you have some really nice ways to describe complex topics in a way that anyone could understand them.

Great – I’d love to join in the conversation if that would be of interest.  We have done some work in data downsampling – trying to figure out if existing data are actually helping us or if they are largely redundant.  As I presented, we are also looking at forecasting which data will be useful, even extending that into data sets rather than single measurements.  Let’s follow up!  Also, please feel free to grab and use the slides from my references page!

8/25/16 – Eric Lawrey and Aaron Smith, AIMS, Townsville

How can we build the framework to construct, run, and then harvest the output from models?

Agreed … it is really worth thinking about what you will do with the output before you run multiple models!  In hydrogeology there have been some nice developments, including flopy and Martyn Clark’s work on automated model building.  More generally, some related work has been developed for different fields by this group in North Carolina.  These references may be worth looking at to see if they would apply to your problems!  Having said all of that, I think more and more that we need to inject more reason into our analyses.  Do we really need to run 100,000 models just to convince ourselves which models are most likely?  Can’t we use some insight to find a smaller number of models that are all plausible but different from one another?
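
For what it is worth, here is a minimal, generic sketch of a construct–run–harvest loop.  The model ‘runner’ is a placeholder – in groundwater work it might wrap a flopy/MODFLOW model – and the configuration names and output file are invented for illustration.

```python
# A minimal, generic sketch of a 'construct, run, harvest' loop over an
# ensemble of model configurations. Nothing here is tied to a real package.
import csv

def build_and_run(config):
    """Stand-in for building a model from a configuration and running it."""
    # pretend the prediction scales with recharge and inversely with K
    return {"name": config["name"],
            "prediction": config["recharge"] / config["K"]}

configs = [
    {"name": "low_K",  "K": 1.0, "recharge": 0.2},
    {"name": "base",   "K": 5.0, "recharge": 0.2},
    {"name": "high_R", "K": 5.0, "recharge": 0.4},
]

results = [build_and_run(c) for c in configs]

# harvest: write one tidy row per model so downstream analysis stays simple
with open("ensemble_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "prediction"])
    writer.writeheader()
    writer.writerows(results)
```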

8/25/16 – Eric Lawrey and Aaron Smith, AIMS, Townsville

One problem that we have is that some measurements on the reef are very difficult and/or very expensive to collect.  How can we address that aspect of monitoring design?  This could be really important as they redesign Reef monitoring.  Should we continue the long-running data collection to take advantage of the baseline, which can overcome issues of low signal to noise?  Or, should we add new, discriminatory data?  Can you include those considerations, too?

Absolutely, but it becomes a bit more difficult.  Essentially, this is the problem that we deal with when trying to identify multiple new measurements to add simultaneously.  The alternative (which is pretty unrealistic) is to say that we will collect one data point, then update our model, then collect another data point ….  The problem with these efforts is the vast number of combinations of measurements that you can propose.  (This quickly swamps considerations of parameter estimation’s computational requirements.)  Accepting that we have to deal with these issues, cost is not so hard to add.  Basically, for each set of observations that we propose, we also assess the cost of that set.  Then we can produce a cost:benefit, or Pareto trade-off analysis of different measurement sets.  So, it isn’t too hard, it’s just hard to do it efficiently!
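
A minimal sketch of that cost:benefit screening, with invented costs and benefit scores: keep only the measurement sets that are not dominated, i.e. those for which no other set is both cheaper and more discriminatory.

```python
# A minimal sketch of a Pareto screening of candidate measurement sets.
# Costs and 'benefit' (discriminatory power) scores are made-up placeholders.
candidate_sets = [
    {"name": "2 shallow wells",    "cost": 20, "benefit": 0.30},
    {"name": "geophysics survey",  "cost": 35, "benefit": 0.55},
    {"name": "1 deep well",        "cost": 50, "benefit": 0.60},
    {"name": "deep well + survey", "cost": 85, "benefit": 0.58},
]

def pareto_front(sets):
    """Keep sets for which no other set is both cheaper and more beneficial."""
    front = []
    for s in sets:
        dominated = any(o["cost"] <= s["cost"] and o["benefit"] >= s["benefit"]
                        and o is not s for o in sets)
        if not dominated:
            front.append(s)
    return sorted(front, key=lambda s: s["cost"])

for s in pareto_front(candidate_sets):
    print(f'{s["name"]}: cost {s["cost"]}, benefit {s["benefit"]}')
```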

8/25/16 – Don Pollock – Townsville

I’m an agronomist leading an extension team in the wet tropics on sugar cane.  Our issue is that there is an overwhelming body of WQ models that indicate causation and a link between agriculture and impacts on the reef.  I would be interested to know if there is a way that we can investigate whether these models are sufficiently diverse.  My interest would be to make the models’ interpretation more plausible to the array of land managers.  I am not a modeller.

I would love to follow up with you about this.  To me, this is the most important thing that we can change about how we use models to help answer questions.  Too often, we have a model, or even multiple models that are largely similar, and we use them to describe a system.  Then we are surprised when we discover something new that overturns these ‘established’ models.  There is no way to protect against this perfectly – our scientific knowledge is always limited.  But, I think that it is critical that we try to look for these surprises.  In my mind, the best way to do that is to chart a model as it is developed (or even after it is complete) and identify the choices that were made in building the model.  Invariably, we will find many choices that we made, that we can’t fully justify.  The important step is to then make the opposite choice (or, at least, a different choice) and see if we can still make the model fit the data.  If we do this for several key decisions, we will have at least a basic model ensemble that defines both what we know and what we don’t.

8/25/16 – Diana O’Donnell, Townsville

How would we sell the idea to government on one side and the community on the other that we should build five less-good models rather than one good model?  Of course, we could do much more if we could balance the efforts on planning and prioritization studies with the scientific investigations that support them!  I think that this would end up with much better integration of the science with the decisions.

Ultimately, I don’t think that we should be as confident in the one ‘good’ model.  Further, we have to be clear on the benefits of the multimodel analysis.  Firstly, I think that we should try to do this approach for the same cost as developing one well calibrated model.  I think that we can do this based on the savings of not doing as much calibration, of not collecting non-informative data, and of not having to build a brand new model each time that we gather new data!  As for the scoping studies, it is a shame if they are not focused on defining the part of the problem that is really driving stakeholders’ concerns.  If we do that first, then build the models, and only then collect data, I think that we could actually have a bigger impact for less money.  Let’s follow up – I’d love to have a conversation about this!

8/25/16 – Angela Bush, Townsville, Queensland, Australia

Do you need to have the PDFs, which  are based on the ensemble of models, to choose data?  It seems that this could require an unreasonable amount of computation before you can make any decisions about what to measure.

I’m really glad that you asked this … I haven’t been asked this question in this way to date.  I hear a key point in your question … at some point, you just have to make the leap and measure something!  I couldn’t agree more.  But, on the other hand, I think that we could do a BIT more thinking before we collect data in most cases.  This is consistent with a general line of thinking that has emerged during this lecture series.  Basically, I have come to believe that it is a bit of a fool’s errand to try to formulate quantitative measures of our prediction uncertainties.  Firstly, I don’t think that they are really what risk managers need – they need probabilities of outcomes; the best that we can give them is probabilities that we KNOW the outcomes, which are subtly, but importantly, different.  Secondly, I don’t think that they are ever used quantitatively.  I think that managers use categorical estimates of risk for groundwater problems – no way, not likely, maybe, likely, definite.  So, why not try to develop a smaller number of models that cover the range of plausible model space and assign them qualitative likelihoods?  Then, we can say something like – there are a couple of models that say that your rice paddy could go dry; the most likely of those models only ranks as ‘not likely, but possible’; is it worth it to you to spend more money to try to test them further?  If we collected data that said that those models were in the ‘no way’ or ‘maybe’ categories, would it change your decision?  If we can take this approach, I think that we can do a reasonable amount of modeling and then give stakeholders an understandable basis on which to decide whether we should continue.
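
Something like the following is all I have in mind – a minimal sketch that maps the weight carried by the ‘paddy goes dry’ models onto qualitative categories.  The category boundaries and the 18% weight are arbitrary choices for illustration, not a recommended standard.

```python
# A minimal sketch of turning aggregate ensemble weight for an outcome into a
# qualitative likelihood statement. Boundaries are arbitrary illustrations.
def qualitative_likelihood(probability):
    if probability < 0.05:
        return "no way"
    if probability < 0.25:
        return "not likely"
    if probability < 0.60:
        return "maybe"
    if probability < 0.90:
        return "likely"
    return "definite"

# e.g., the models in which the rice paddy goes dry carry 18% of the total weight
weight_of_dry_paddy_models = 0.18
print("'paddy goes dry' outcome:", qualitative_likelihood(weight_of_dry_paddy_models))
```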

8/25/16 – Keith Bristow – Townsville

I think that many of the questions that face scientists can be answered with very simple models – really just energy and/or mass (water) balance.  Do we need to consider multiple models or stakeholder preferences in these cases?

This is a good point.  On the face of it, this is an argument in favor of simpler, more defensible models.  I agree entirely … unless, of course, the processes or structures that control the outcomes are actually complex.  Then, modeling them will require model complexity.  But, on a deeper level, your question points to something fundamental about science.  There are, ultimately, truths in nature.  Regardless of our political ambitions or economic valuations, some representations of the world are ‘more correct’ than others.  I think that the challenge that we face lies in developing multiple plausible models that honor these basic truths, while acknowledging those things that we don’t know.  In many cases, this will require us to put our foot down and say that some models are simply scientifically indefensible.

8/23/16 – Jim Stanley, Brisbane

I’m a third-year Earth Science student; we spend a lot of time on the technical aspects of modeling.  The talk gave me a good perspective on why we model.

Great!  I’m glad that it was enjoyable and I hope that it continues to resonate as you learn more about modeling.  Never be afraid to push back and ask how model assumptions are justified.  Feel free to throw me under the bus if your instructor gets upset.

8/23/16 – Jerome Arunakumaren – KBR, Brisbane

What should we do if our model is very well calibrated for short-term predictions, but diverges for longer-term predictions?  In particular, most of our models are developed for EIS studies, and these models are not well suited to modeling higher stresses on the system, like high-volume extraction.  As a result, I think that we may not be producing models that really describe, for example, the connection of aquifers during pumping.

This is a good question because it points out an important decision to be made.  Basically, it is this.  If you can demonstrate that your model makes reliably good predictions within certain limits (say, of time before the prediction), then by all means use it.  But, if it were me, I’d be pushing the edge of where the model starts to fail.  I’d look at some competing models, all of which behave well within the limit, but which disagree at the edge.  My guess is that this examination will be the most productive for identifying discriminatory data, which will protect your model against some surprises down the road!

8/23/16 – Jerome Arunakumaren – KBR, Brisbane

Should we spend time on model complexity or diversity?

The simple answer is, yes.  But, if you have a limited budget … I’m guessing you do … then I would suggest that you start with multiple simple models and only add complexity as you find that they, in their entirety, don’t bound your data.  But, even as you build more complex models, I’d keep the simple ones in your ensemble!  The other way to go, which may be a bit safer, is to go forward with what you are doing for model building.  But, once you have your model, go back and try to identify the key decisions that you made that you cannot justify fully.  Then try to intuit which of these might have the biggest implications and see if you can build models with the ‘opposite’ assumption and still make them calibrate.  This may be a shortcut to forming an initial diverse ensemble.
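
A minimal sketch of the bookkeeping behind that shortcut: log the decisions you could not fully justify, each with a plausible alternative, and enumerate the combinations as candidate ensemble members.  The decisions and options listed are invented examples, not a prescription.

```python
# A minimal sketch of building a small, diverse ensemble by flipping the
# model-building decisions that can't be fully justified. The decisions and
# options below are made up; the point is the bookkeeping, not the geology.
from itertools import product

# decisions logged during 'usual' model building, each with the choice made
# plus one or more plausible alternatives
decisions = {
    "fault_is_sealing":    [True, False],
    "recharge_mechanism":  ["diffuse", "focused"],
    "aquitard_continuous": [True, False],
}

# every combination is a candidate alternative conceptualization
variants = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
print(f"{len(variants)} candidate models, e.g.:")
for v in variants[:3]:
    print(" ", v)
# In practice we would only keep the variants that can still be made to fit
# the data, and that change a prediction of interest.
```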

Jerome replied – Most of the groundwater models have been developed during the predevelopment phase for the EIS.  These models normally do not consider how the model parameters (i.e. storativity, vertical leakage, etc.) should be varied with the pumping stress applied to the model during the development phase in order to match the observed water levels.  For example, groundwater extraction in aquifers with low transmissivity creates a deep and narrow cone of depression around the well.  This creates a strong hydraulic gradient across the adjacent aquifers and allows vertical leakage around the well.  The models developed during the EIS have coarse grid spacing and high horizontal-to-vertical anisotropy values and cannot be used to adequately investigate these impacts.  During the EIS process, not enough information is available to calibrate model parameters such as storativity and vertical leakage adequately.

8/19/16 – Matt Chamberlain – CSIRO, Hobart

How do you ‘sell’ the idea of multimodel analysis, especially given that it will likely cost more than single model approaches?

Cost is always an issue.  But, I don’t think it is a matter, necessarily, of explaining higher costs.  Rather, different costs.  My sense is that what we propose is to spend more money up front defining the range of what we do and do not know by formulating ensembles of plausible models.  This should save money that might be ‘wasted’ on data that we should have known would not be discriminatory.  Also, as we collect data, it should offer savings in model re-development.  More often than not, new data will only change the relative likelihoods of models.  It will only be those times that we collect really ‘surprising’ data that we would propose to overhaul the model ensemble.

8/19/16 – Rob Virtue – GHD, Hobart, Tasmania

(This is in opposite format – I asked the question and Rob provided the response!)

Ty – I have had some modelers express concern that developing multiple conceptualizations of geologic structure would be too difficult to implement.  In particular, that generating new model grids for each conceptualization would not be feasible.  Have you had experience that could add to this?

Rob – We have found that GMS is quite flexible for handling this because the conceptualization is separate from the grid development.  In principle, I think that you could define a set of geologic structures to examine and then feed them through GMS without too much difficulty.  Perhaps the biggest challenge lies in dealing with the output of so many models in a way that it will make sense to a client.

8/19/16 – Shawn Hood – University of Tasmania, Hobart

How can you produce the ensemble of models that you are suggesting?  (I should note that I am coming to this from a geologic perspective, in particular metals resources.)

Great!  I’d love to follow up to hear how our ideas might (or might not) translate to your field!  To answer your question, the idea that I like at this time is as follows.  Go ahead and follow your ‘usual’ approach to building the best model that you can – but, do two things.  First, keep a list of the decisions that you make along the way, especially those decisions that you feel are not fully supported.  The second thing is to not obsess over perfecting that model.  Get it to the point that it is ‘acceptably good’ for your field.  Then go back and look at the series of decisions.  Think about which ones might have the biggest impacts in the context of some specific decisions that you may have to make (e.g. mineral reserves, ability to mine, mine stability).  Try to make choices that might be ‘worst’ for these outcomes (it works best if you consider several decisions of interest, I think).  The goal is to develop a range of different, but plausible models.  Let me know if that makes sense to you!

Shawn followed up:  In the metals sector, most conflict faced during resource modelling is between the technical staff (field geologists, resource modelling geologists, etc.) and engineers or [often non-technical] managers.  Geologists are usually working to produce the most ‘realistic’ model possible. Such models change often, as new information becomes available. On the other hand, managers usually promote or attack models within the context of corporate fiscal planning or quarterly bonus schemes. Additionally, engineers become frustrated as ore blocks shift or change value, losing confidence each time they do so.  I have worked in situations where I have been part of a team that created multiple models representing a variety of hypotheses, to the horror of our managers! Approaches for informing the public’s narratives, as you outlined in your Darcy lecture, could aptly be applied within companies.

8/19/16 – Pat Quilty – University of Tasmania, Hobart

The difficult thing can be that when you have a model that works, for example a recipe to make a cake, it can be a challenge to decide to make a change.  Especially when that recipe (or model) has been tested across many conditions.  How do you make the case that a new recipe might be worth trying?

I really like that analogy. But, I’ll take one exception to it.  That is … I don’t think that a cake recipe qualifies as a model in our sense.  That is – it is a set of steps to take, but it is not predictive beyond what we have already experienced.  It would be more of a ‘model’ of making a cake if we had to predict how to make a cake on Mars, for instance.  In that case, we could explain what might be different and how that might make our current recipe inadequate.  Essentially, I think that this is the process that we don’t follow often enough – thinking about what could go wrong and proposing multiple possible, plausible pathways to disaster.  Then we can figure out what to test to guard against a fallen cake on Mars!

8/17/16 – Vincent Puech – Adelaide

Have you worked on fracking (coal seam gas) problems, where the opinions on both sides are so entrenched and so strongly held?

I have not, personally.  But, I think that this is a perfect target for an approach like DIRECT.  The important thing in that case, in my mind, is to describe what is driving the concerns of stakeholders (for example, leakage of proppants into overlying aquifers).  Once defined, we should make every effort to develop plausible models that support this concern. These models would be useful members of the model ensemble as ‘representatives’ for these stakeholders.  If we simply can’t find any, then we need to say that – but, it is a much stronger statement than, ‘my best model says that won’t happen.’  If we can find such models, then we can work with stakeholders to see these as proxies for their concerns – then we can actively look for data to test these models (their concerns) against all other plausible representations of the system.

8/17/16 – Vincent Puech – Adelaide

How do you balance model complexity and model diversity?

This is a hot debate in hydrogeology.  I like the idea of simplifying models.  But, some questions actually require a degree of complexity.  One approach is to build your first model.  Ideally, make a list of assumptions along the way.  Review them and choose those that are most poorly supported and most likely to be important.  Then build a set of models that take the ‘other’ assumption at each branch.  This is one way to start to identify models that are different from each other, span ‘outcome space’, and represent different stakeholders’ concerns.  At the same time, it avoids the shotgun approach of building a lot of models, most of which are very similar to one another.  In short, we need to replace brawn (lots of computation) with brain (thinking about which different models are important).

8/17/16 – Huade Guan – Adelaide

I wonder if DIRECT has some relationship to the “cross validation” concept in geostatistics. For all existing data points, we can run the models with one point left out. We can then examine the distribution of model estimates at each of these left-out points, and collect those points with a larger range of model disagreement. These points are then selected as the critical ones to determine which model(s) is the best. I wonder if this is consistent with DIRECT. I can see some difference from using NSE to compare the model performance, which is based on the model estimates of all observation points (not the selected ones with larger disagreement between models), but wonder if this difference is significant for model selection. Relevant to this, using DIRECT to guide down-sampling is a very good and useful direction.

Absolutely – this is consistent for the problem of downsampling.  Pat Reed, at Cornell, does great work in this area.  There is a major complication, though … we are often interested in combinations of observations to remove (or keep).  This ends up requiring a lot of computation because of the combinatorial nature of choosing sets.  Our emphasis is on which measurements to add to the system.  As a result, we can’t remove measurements and recalibrate because we don’t have the measurements, yet.  In place of this, we are looking for discriminatory measurements – those with the greatest likelihood of separating one subgroup of models from others.
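
To make Huade’s leave-one-out idea concrete, here is a minimal sketch (in Python, with hypothetical names – each ‘model’ is assumed to be a callable that is refit on the retained points and then estimates the held-out location).  It simply ranks observation points by how much the models disagree there, which is the spirit of looking for discriminatory locations:

```python
import numpy as np

def loo_disagreement(models, X, y):
    """For each observation, rerun every model with that point held out and
    record the spread of the models' estimates at the held-out location.

    models : list of callables, model(X_train, y_train, x_new) -> estimate
    X, y   : numpy arrays of observation locations and values
    Returns the disagreement (max - min across models) at each point.
    """
    n = len(y)
    spread = np.zeros(n)
    for i in range(n):
        keep = np.arange(n) != i              # leave point i out
        estimates = [m(X[keep], y[keep], X[i]) for m in models]
        spread[i] = max(estimates) - min(estimates)
    return spread

# Points where the models disagree most are candidate discriminatory data:
# ranked = np.argsort(loo_disagreement(models, X, y))[::-1]
```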

8/17/16 – Huade Guan – Adelaide

Is there a reason that your Darcy Lecture is not to be delivered in Asia (except for Japan), such as China and India? I am a little familiar with the situation in China. Your lectures would be well received in China, given the rapid growth in the research mass and the need to address serious environmental problems. It is probably too late to arrange anything for you. But it would be good to bring this to NGWA attention, as well as the counterparts in Asian countries. If funding is the only issue, it might be solved by getting local funding support. And it is easy to find multiple stops for Darcy Lectures in China. I am sure NGWA already has some connections to the groundwater research community in China.

I think that this is a major, but manageable problem.  The problem is the timing of placing requests – I know that I received requests from China, India, and Russia, but after the closing date for requests.  Basically, we need to get the word out more effectively to scientists in China, India, and other countries with active hydrogeologic communities, but who are not as plugged in to the Darcy Lecture series.  They need to be ready to submit requests and we need to encourage them to do so.

8/17/16 – Luk Peeters – CSIRO, Adelaide

We are often asked to build models without being given a specific question to address.  Is there a way to build generally useful models?

Avoid this temptation where possible, or, at a minimum, try to anticipate multiple possible questions that might be of interest in the future.  But, more seriously, I would always try to avoid the temptation to stop with one model.  I think that the idea that you described – documenting the decisions that you have made and describing their supporting evidence – is a great one.  If you don’t have specific questions in mind, then I think that you should prioritize those decisions that are least well supported and that you expect will have some impact on the model predictions.  If you do have a question, or questions, then you should also consider those unsupported choices that, if made differently, could have bad effects for some stakeholders.  Note also – Luk presented a very clear explanation of an idea that I was wrestling with – he proposes keeping a list of assumptions – and their bases for support – during model construction.  Then, you can revisit them after you have calibrated your model as a useful way to generate competing plausible models.  This has since become ingrained in my thinking!  Thanks!!

8/15/16 – Audience Member – Melbourne

Can you also include consideration of multiple driving scenarios?

Absolutely.  I think that it is important to cover both our structural uncertainty and uncertainties regarding the possible applied stresses to a system.  In some ways, though, I think that our main job as hydrogeologists is the former.  How can we do the best job possible of defining how the system is likely to respond to stresses?  Or, more precisely, can we test hypotheses that make predictions of concern in this regard?  Then we need to work with others who can provide scenarios regarding external stresses so that we can put more realistic limits on predictions of hydrogeologic responses to these scenarios.

8/15/16 – Ben Moore – Melbourne

I am often in a team where I am the only physical scientist.  It has been very interesting to realize that they each have their own narrative in mind.  We can tend to disengage if they won’t ‘come on board’ with our scientific opinion or story.  But, I think that you are right that we need to be aware of what each group brings to the table and then think about how your science can help.

I’d be very interested to hear more about these experiences.  This has been my assumption – that we can actually learn things from nontechnical people that would bear directly on how we do our physical science.  The most obvious influence might be that they can help us to define which predictions (and at what level) are most important for decision support.  But, I think that there are other, more subtle and maybe more important, contributions that they can make in forcing us to think of competing models.  That is, how can we push our assumptions to support their concerns – personalized worst-case assumptions – as a way to try to bring their concerns into the most scientific (and therefore testable) format possible.  Also, they may provide ‘soft’ observations of system behavior that could improve our models and that can help to ensure against unknown unknowns to which we might be blind if we ignore these observations.

8/15/16 – Kate Dowsley – Jacobs, Melbourne

How do you get multiple sides to collaborate and describe their primary interests?

The key is to make it clear that you will advocate for both sides.  But that you need to know what matters to them before you start. I do think that these social science aspects are the hardest part of the process.  But, they are the most important.  Anything that I hear about that works, I’ll share!

8/15/16 – Kate Dowsley – Jacobs, Melbourne

In climate modeling, they do scenario neutral modeling to figure out what we could not handle and then work back from there.  Is that related to what you do?

I do think that it is related. But, there are a couple of differences.  First, I think that it is important to include ‘worst case’ outcomes.  But, I think that they need to be one of many models.  Ultimately, the idea is to build a series of plausible stories so that we have a framework for people to discuss their concerns in a scientific context.  The second difference has to do with the types of uncertainties that we consider.  As hydrogeologists, I feel that our main contribution is to understand (and predict) how the physical system responds to external forcings.  Climate feeds those uncertainties to us.  So, in many cases, the uncertainty that people care about is climatic (or land use / behavioral).  Our job is to try to determine if the physical system will moderate those future forcings, or be dominated by them.

8/15/16 – Kate Dowsley – Jacobs, Melbourne

How do you sell this to clients … saying that we’ll build a lot of models first and only then collect data – especially if clients don’t want to hear about risk?

I think that a lot of clients may THINK that they don’t want to hear about risk.  But, I wonder whether, if we have the RIGHT conversations with them, they might actually recognize that risk is driving their decisions.  I think that this is especially true if the project will have some lifetime, needing future revisions.  Finally, I think that you can make a case that this is an especially robust approach for problems that will face litigation – because you can protect yourself (and your client) against having to make a series of difficult-to-defend decisions!

8/15/16 – Rikito Gresswell – CDM Smith, Melbourne

What direction is the industry headed – toward simple, parsimonious models, or highly parameterized models?  In some ways, we find that the former is easier for regulators to understand.  Also, there is an understandable reluctance in the consulting industry to ‘admit’ that we have uncertainty because of risks that clients/communities may perceive model results as not useful.

As hydrogeophysicists, we found that we could interpret a lot of complexity, producing nice cover images, but that our results were not used by hydrogeologists.  It has been a hard road for us to give up the complexity that we CAN infer in favor of thinking about using geophysics to define a range of plausible simpler models.  I think that hydrogeology is in the same place now.  The challenge that now exists is to think about how to best describe our uncertainty as it is embodied in multiple, simple models – no one of which fully describes the system.  I think that if we work on this, it can be a much simpler and more complete way to describe uncertainty – based on discrete sets of competing models.  But, we need to think about how to do this effectively.  Ultimately, we need to develop tools that help us to develop models (like those that help with parameter estimation), but we need to balance this with careful thinking to identify importantly different models as efficiently as possible.

8/15/16 – audience member, Melbourne

How can you be more efficient in finding ‘implausible’ models?

You could.  But, it runs the risk of trying to discount stakeholders’ concerns.  I think it is more powerful to try to find plausible models that DO represent their concerns.  In this way, first, we are ‘on their side’.  Second, it leads us to think more broadly about models.  Third, if we can convince them that we have a model that represents their concerns, then we can apply science (and discriminatory data) to test their concerns.  This is also related to the idea of looking at ‘extreme’ models – there is a nice short paper about this on the blog.

8/15/16 – Heath Pawley – Golder, Melbourne

It seems that you are making the case that stakeholder engagement is integral?  Was this your initial aim, or has this emphasis evolved as you’ve given the talk?

Our aim was to make science more useful.  To find a way to have people take up all of the work that goes into developing models.  It evolved somewhat naturally that this led us to think that this means that we have to start with the problem and then build models that address the important questions. But, almost every element of my thinking on this has evolved during the tour.  I am really amazed by what people are doing in our field – and heartened, in some ways, that we are all facing the same issues!

8/11/16 – Audience member – Canberra

Are we still limited by funds as regards the number of runs that we can do?

To some degree, yes.  But, I think that we should be able to run fewer models and still cover the range of important models if we are more thoughtful in our model creation.  Essentially, I think that this will rely on developing tools to help build models, as we have for model calibration.  But, I also think that we need to forego the idea of developing quantitative measures of prediction uncertainty.  In other words, we don’t need to generate 10000 very similar models near the maximum likelihood model to know that it is our current favorite model.  We would be much better off generating fewer models that span model space.  The challenge is how to use these models to support decision making.

8/11/16 – Audience member – Canberra

Should we intentionally examine models under greater stress to gain more insight?

No question.  Many of our systems are nonlinear (and potentially irreversible), at least in their responses to extremes.  So, it is important that we capture data during extreme events.  I think that this points to two needs: first, long term monitoring; and second, rapid response modeling that can be done in potentially hazardous conditions.  There is another, related question that is closer to our own work.  That is, should we apply greater stresses to systems to better test their responses?  For example, should we make the investment in high rate, long term pumping tests?  Given the cost and potential impacts of these tests, I think that it is critical to do something like DIRECT to assess whether the results are likely to be discriminatory before we collect those data.  That is … model the long term pumping test before you conduct it and see if it actually tests models of concern against other models.

8/11/16 – Audience member, Canberra

Have you looked at the trade off between fewer, local points with greater precision versus greater coverage at lower precision (e.g. geophysics)?

We haven’t done this explicitly.  We have looked at the value of airborne EM for constraining hydrogeologic predictions.  Separately, we’ve looked at the advantages of different combinations of local observations.  But, it is a very good idea to think of how to weigh the discriminatory value of an entire geophysical survey, for example.  We’ll have to think about that!

8/11/16 – Ken Lawrie – GA, Canberra

Can you identify keystone processes as an efficient and effective way to build model ensembles?

I really like this concept.  I do think that it is critical to think clearly about possible processes that could be at play and that could be of particular importance to predictions of interest.  This speaks to one thing that we are really emphasizing – spending more time thinking during the multi-model development phase.  I think that the difficulty comes in determining which processes will be most important.  To me, that is the power of building multiple models – checking and improving our insight, especially when we are building more complex models.  The other advantage of actually building out the models is that we can use them to try to find discriminatory data by finding measurements that act as proxies for future outcomes of concern.  Regardless, I think that the idea of identifying keystone processes and using them to build an efficient set of importantly different models is key.

8/11/16 – Ken Lawrie – GA, Canberra

Can you comment on the relative contributions of conceptual models versus numerical models?

I think that both have important roles.  I strongly believe that the most important step of model building is conceptualization.  More specifically, it is the creative act of developing multiple, importantly different models of a system.  This is the only way that we can try to guard against falling in love with our model and becoming blind to other plausible explanations of the data.  But, if we want to try to improve data collection efficiency and efficacy, that is hard to do with conceptual models alone.  To my way of thinking, we need to build predictive (analytical or numerical) models based on multiple conceptual models so that we can identify data that are most likely to be discriminatory.  But, I would be very open to hearing ideas of how to do at least some of this analysis using simpler, even conceptual models!

8/11/16 – Sarah Marshall – GA, Canberra

Have you considered the cost of observations in your selection criteria?  What about model complexity costs?

This is actually two very good questions.  The first has to do with the cost of data.  We are absolutely trying to consider this – looking at how to identify sets of observations that, taken together, are most discriminatory.  When you do this, you need to start considering whether individual measurements contain too much redundant information.  You also need to consider many combinations of observations, which leads to very high computational demands.  But, we are working on it!  The second question relates to the cost of models.  This lies at the heart of some hot debates in hydrogeology!  Many are promoting the exclusive use of simple models – both because they are more conceptually defensible and because their short run times allow for more consideration of competing models.  On the other hand, some processes are controlled by complexities (e.g. the role of continuous fractures in contaminant transport).  In these cases, it doesn’t matter how many simple models you construct, you will miss the most important processes.  I think that the key concept is to build model complexity in stages (see the paper on the blog by NAME that offers one example).  Ideally, we would start with the simplest defensible model that we can propose.  Then we would list possible complexities and their anticipated potential impacts on the predictions of interest.  We would then balance the likely importance with the cost of implementing the complexity to decide the order of addition.  This is clearly a pretty subjective approach – but, I don’t think that is necessarily a bad thing.  In fact, this kind of subjectivity encourages thinking and discussion about how processes interact at our sites.  The key point is that it shouldn’t always stop with qualitative discussions – we should build out some competing models to test and improve our insight.
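
As an aside on the computational burden of choosing sets of observations: the number of possible sets grows combinatorially, so exhaustive search quickly becomes impractical and a greedy heuristic is one common workaround.  Here is a rough sketch, with hypothetical names, assuming we summarize each candidate observation by what the ‘models of concern’ and the ‘other’ models predict it will be.  The set score here is deliberately simple; a more realistic one would also down-weight observations that carry redundant information, which is exactly when greedy and exhaustive answers start to differ:

```python
from itertools import combinations
import numpy as np

def set_score(obs_ids, pred_concern, pred_other):
    """Hypothetical discriminatory score for a SET of observations:
    the distance between what the models of concern and the other models
    predict, taken jointly over the chosen observations.  Scoring sets
    (rather than single points) is what makes the problem combinatorial."""
    c = np.array([pred_concern[o] for o in obs_ids])
    o = np.array([pred_other[o] for o in obs_ids])
    return np.linalg.norm(c - o)

def best_set_exhaustive(candidates, k, pred_concern, pred_other):
    # Cost grows as C(n, k) - fine for small problems, hopeless for large ones.
    return max(combinations(candidates, k),
               key=lambda s: set_score(s, pred_concern, pred_other))

def best_set_greedy(candidates, k, pred_concern, pred_other):
    # Cheaper heuristic: grow the set one observation at a time.
    chosen, remaining = [], list(candidates)
    for _ in range(k):
        nxt = max(remaining, key=lambda o: set_score(chosen + [o],
                                                     pred_concern, pred_other))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen
```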

8/11/16 – David Lescinsky – GA, Canberra

Can this approach be used to build generic models that can serve as a basis for models to address a range of questions?  That is, will there be some advantage in building a suite of models that are midway up the mountain?  Then, could you identify which data are most useful for building the entire suite of models?

Great idea.  In this regard, we should be looking for data that somehow increases the differences among models.  In all honesty, I don’t know how to do that.  Perhaps it lies in finding data that may promote the consideration of less-likely conceptualizations.  But, that is clearly risky as the data are most likely to support the more likely models.  It seems to me that improved model ensemble building lies more in developing tools that encourage us to think creatively and to identify models that are most different from one another in terms of the prediction or predictions of interest.  But, I will follow up with you on the idea of finding data to help with this!

8/11/16 – Eaman Lai – Canberra

Would your approach differ between questions asking for equality (e.g. ‘where will the plume be at this time?’) and questions of inequality (e.g. ‘will the plume cross this line by this time?’)?

I haven’t thought about this.  It is relatively easy to define subsets of models if we apply clear prediction boundaries.  It is less clear, but not impossible, if we consider a continuum of outcomes.  More important, perhaps, is asking stakeholders to place value on the range of outcomes and then asking them to categorize ‘pain points’ in the valuations.  These should really be the basis for ensemble segregation, more so than model predictions per se.  It may be that we still cannot find ‘good’ and ‘bad’ outcomes.  Then we may have to think about some way to consider data that can discriminate among a spectrum of model groups.  But, I think that this will be harder to communicate to stakeholders in the end.

8/11/16 – Malcolm Sambridge – ANU, Canberra

As a preface, hydrology is not my field – I work in seismology and geophysical inversion.  So, I wonder, within hydrogeology, where do the likelihoods come from?  Essentially, do you use any criteria to determine which models are reasonable or plausible?  Perhaps you are using the term for what we call a misfit measure?

Most of the measures that we use are based on goodness of fit to data and model complexity.  To my way of understanding, we refer to these as likelihoods when we normalize them to sum to one.  But, I would not be at all surprised to find that I am mistaken!  I also know that there are many such measures that we use and there is considerable disagreement over their relative benefits.  In some ways, we are trying to avoid these specific arguments by focusing on more qualitative considerations of likelihood.  That is, we are really interested in finding models that have high importance and some measure of likelihood that is too high for some stakeholders.  Then, we can try to find data that have the possibility of reducing this likelihood until it is acceptably low (or it becomes unacceptably high).  We think that this avoids academic battles about which measure to use and it also opens up some interesting possibilities related to the use of plausibility measures, rather than likelihoods, and it may allow for ‘softer’ measures to be used, such as model behaviors and consistency with first-person observations.  But, I’d be very interested to hear what you think from the perspective of your field!

8/10/16 – audience member, Sydney IAH

In a multimodel framework, is there still room for sensitivity analysis?

Yes.  I think that it can help us to develop measures of model difference, which can make ensemble construction more efficient.  But, to be most useful, we need to find ways to apply them in model space as comfortably as we do in parameter space.  I think that requires that we find ways to think of all model descriptors as continuous functions.

8/10/16 – audience member, Sydney IAH

How do you strike the balance between model complexity and model diversity within a fixed budget?

This really is the key question – both practically and scientifically.  I think that the answer lies in why we model in the first place.  To date, we have come to believe that our task is to build the best representation of the system – to find the minimum of the objective function.  We are proposing that this is a false goal.  We need to describe many plausible models, seen from multiple perspectives, so that we know what we know and what we don’t know.  (Or, at least, what we know that we don’t know!) So, I think that the balance should shift toward conceptualization and that we should take up the slack by building tools that allow us to construct competing models as effortlessly as possible.

8/10/16 – Lucy Marshall, UNSW

You mentioned automated model building platforms, like those that Martyn Clark is building at UCAR.  We had an interesting experience with these.  Essentially, the structure was too restrictive – we really needed to spend more time thinking about the conceptualization rather than relying on the automated model building.  That is, we needed to consider processes that were not included in our initial model.

I couldn’t agree more.  I think that automated model building tools are a great advance because they can remove the overhead involved in translating different conceptualizations into predictive models.  But, I think that the most important step is to spend more time considering as many plausible explanations of the data that we have rather than trying to identify the right model too early.

8/10/16 – audience member, Sydney IAH

This is one of the best Darcy talks I’ve heard.  OK – my question – How much of my budget should I spend on data and how much on modeling?  In my experience, especially in areas where we have little or no data, the budget runs out on modeling before we collect any data.

In this, I like Cliff Voss’s approach – or the approach of Henk Haitjema (see the blog for relevant papers).  We should start with simple models and only add complexity as we can justify its need. The only difference that I would add is that we should be building multiple models at each stage.  We should examine whether they make the same or importantly different predictions.  It may be that we need to make measurements to differentiate among these simple models first.  Then we can propose the next level of complexity, looking for importantly different models.  But, we should only add more data if we have a motivating reason to do so.  So, I think that desktop studies are not all bad.  But, they need to be focused on uncovering what we don’t know well enough rather than attempting to build a representative model of the system with little or no data.

8/10/16 – audience member, Sydney IAH

To enact what you propose, do we need to train hydrogeologists in logic and advanced statistics?

Yes, and emphatically no.  I think that some training in logic, including decision making, would go a long way for any scientist – especially those that will interact with nonscientists.  But, personally, I think that we have done ourselves a disservice by delving too deeply into statistics.  For me, anyway, it can be hard to make sense of what it means that a prediction has a 5% chance of saying something versus a 20% chance of saying something else.  This is especially true because we like to have everything follow a bell curve, so we seem to have equal probabilities of very high and very low outcomes.  It seems much more meaningful to me to represent categories of plausibility and populate them with discrete descriptions.  Then we can say to a stakeholder that, yes, your concerns are worth considering.  But, for them to come true, all of these specific conditions have to exist.  Then, we can make a case for testing those conditions to try to help to address their concerns.  But, I appreciate the spirit of your statement.  We need to be much more conversant in dealing with uncertainty.  This probably does mean that we have to understand the underlying maths, so that we can simplify the analyses in ways that are appropriate and that we can explain to nonexperts.

8/10/16 – audience member, Sydney IAH

How do you develop a small number of models to deal with uncertainty?

I think that we need to strike a balance.  I do think that there is value in developing what we believe to be the ‘best’ model of a system.  But, I don’t think that we have to obsess over making it perfect (especially given data limits).  This should be balanced by developing multiple models that are designed to examine important outcomes (e.g. early breakthrough, excessive drawdown, or reduced baseflow, as the case may be).  The key is to try to do the best job possible to represent all of these conceptualizations – rather than setting out to disprove some concerns.  Basically, I think that it will help us to have our science taken up by different stakeholders if we can make the case honestly that we are trying to develop a scientific representation of their narrative that can be used to test their concerns.

8/1/16 – Julieth Galdames – University of Concepcion, Chile

I think that we often misrepresent ecosystem services when we translate them to economic value.  Is there a way to use DIRECT to support the idea that natural systems also have some right to access to water?

I really like this line of thinking.  In fact, I think that it can be a real problem to translate ecological services to monetary values.  I can see the motivation – we think that people don’t understand the economic value of ecosystems, so it can be a way to raise awareness. But, on the other hand, what if the ecosystem service value is low relative to other economic values of water?  Are we willing to say that in those cases it is OK to ignore the ecological impacts of an activity?  If not, then I would prefer to think of ecological impacts as separate.  We can imagine forming a utility curve for the environment that has a completely different y-axis, one that just represents preference.  We will still have low probability, high risk outcomes that may drive stakeholders’ decisions.  Using DIRECT, we can relate these to models and then look for discriminatory data to give us a better understanding of the likely risks of these bad outcomes.  This can give citizens a more accurate basis for considering these risks, independent of the dollar value assigned to them – simply based on their preferences in the environmental realm.

8/1/16 – Maria Alejandra Leit –University of Concepcion, Chile

How can we deal with extreme events better?

I know that there are several researchers who are trying to develop more reliable statistical descriptions that capture low probability events more completely.  I think that where we come in is thinking about how to use these descriptions.  Basically, the goal is to propose models that are plausible, but improbable, so that – if they are very important – we can test their actual likelihood in our system.  Part of this is external to the system.  We need help in defining extreme weather events that may impact our system.  But, at the same time, we need to do a better job of defining the poorly characterized aspects of our system that will control the response of the system to these extreme external events.  This is where we are focused with DIRECT.  We want to encourage people to develop alternative, plausible descriptions of a system that may amplify (or at least not diminish) the effects of extreme events.  Once we have them described, as models, we can look for something that we could measure under non-extreme events to determine if these important models are more likely than we may think given our current data.  So, to summarize, I think that DIRECT is a good addition to the treatment of extreme events, but it requires input to know which extreme conditions to consider.

8/1/16 – Bhekumuzi Sifuba – University of Concepcion, Chile

What would we like to learn from social scientists to improve the use of DIRECT?

If I could sit down with a social scientist and pick their brain, I would want to know two things.  First, do you have ways to uncover what the low probability, high risk concerns are that are driving people’s decisions?  Second, we are proposing that we can increase scientific uptake by stakeholders by making good-faith efforts to build models that support their concerns.  But, the key is that we have to find a way to convince stakeholders to allow these models to act as proxies for the stories in their heads.  I would really like to know if there is research into how to actually do this!  Does it really work?  What can we, as scientists, do to make this more effective?

8/1/16 – Jose Luis Arumi – University of Concepcion, Chile

When you think of how we do hydrogeology, we build a model and calibrate it, then we collect more data.  Often, the new data doesn’t fit our model.  So, we reconceptualize and build a new model.  It seems that you are suggesting that we do the reconceptualization first.  But, how do we know what to consider without new data to push us off of our model?  Put another way, whenever we collect new data we are ‘pushed’ to reconceptualize our system.

This is a fantastic point.  You are, of course, correct.  We have it within ourselves to come up with new conceptual models – we do it all the time.  What is hard, what we need to learn how to do, is to be creative earlier in the process.  I can think of two things that we can do.  First, we can think of different outcomes of interest and then try to think of how we could build models that might come up with these outcomes – that could be a way of finding low probability, high risk models.  Second, we can play critic to our models.  Pretend that we are competitors who want to take over the project, or lawyers who want to undermine our testimony.  If we can learn to fall out of love with our models, I think we should be the most effective critics.  As to the last point – what is interesting to me is to ask, why do we have to wait to be pushed?  Why can’t we propose multiple competing conceptualizations earlier in the process?

8/1/16 – Alejandra Steh – University of Concepcion, Chile

How can we build the sets of models that you describe?

This is the hardest thing to imagine, I think.  But, we have some options.  The simplest, if we are already doing automated parameter estimation, is to make use of many of the parameter realizations.  We definitely shouldn’t throw them away, and I think that we are underutilizing them if we just use them to define statistical outcomes.  The next step is to explore multiple objective functions, much like they do for the GLUE methodology.  Essentially, our choice of objective function is often quite arbitrary, so we may be able to generate more plausible models by exploring multiple objective functions.  Then we move to treating geology as a continuum, rather than one discrete description, maybe using TPROGS for this.  Similarly, we need to think of boundary conditions as a continuum, rather than discrete scenarios.  All of these things can be done by modifying the tools that we have constructed for parameter estimation so that they: 1) search model space, not just parameter space; and 2) they look for multiple plausible models rather than focusing on finding one best model.  The hard part, the part that I think will always require human ingenuity, is to develop multiple conceptualizations of a problem.  Personally, I think that this is where modelers should be spending their time because it is the most likely to capture less obvious descriptions that may turn out to be correct!
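
To illustrate the ‘multiple objective functions’ idea, here is a rough sketch along the lines of GLUE-style behavioural screening (the function names, the two example objectives, and the quantile cutoff are all placeholders, and run_model stands in for whatever forward model you already have).  The point is that we keep every realization that is acceptable under at least one objective, rather than only the single best fit:

```python
import numpy as np

def rmse(sim, obs):
    return np.sqrt(np.mean((sim - obs) ** 2))

def peak_error(sim, obs):
    return abs(sim.max() - obs.max())

def behavioural_ensemble(realizations, run_model, obs,
                         objectives=(rmse, peak_error), quantile=0.5):
    """Keep any parameter realization that is acceptable under AT LEAST ONE
    objective function, rather than the single realization that minimizes
    one arbitrary objective (a GLUE-like 'behavioural' screen)."""
    scores = {f.__name__: [] for f in objectives}
    for theta in realizations:
        sim = run_model(theta)
        for f in objectives:
            scores[f.__name__].append(f(sim, obs))
    keep = set()
    for name, vals in scores.items():
        cutoff = np.quantile(vals, quantile)   # acceptability threshold per objective
        keep |= {i for i, v in enumerate(vals) if v <= cutoff}
    return [realizations[i] for i in sorted(keep)]
```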

7/28/16 – Francisco Meza– Universidad Catolica, Chile

You suggested that we spend more time on conceptualization and less on calibration.  But, if you do that, how can you weight your models?  How can you decide on which model to use?

Practically, we are somewhat stuck with the use of data to determine the relative likelihood of models.  Goodness of fit, perhaps added to some penalty for model complexity, is the most objective measure that we have.  (As a side note, the goodness of fit is often based on a highly arbitrary objective function.  Discussions of the GLUE method deal with this.) But, we can use this in two ways.  The simplest is to apply some threshold of likelihood – essentially saying that we want to examine whether we can reduce the relative likelihood of some subset of models against other models.  The results will make us more or less confident that the models of concern may be true.  But, we may not try to make quantitative use of the model likelihoods – say for a full-blown risk analysis.  The other approach assumes that the likelihoods that we calculate based on our ensemble, including multiple realizations of each model to cover parameter space, give a real measure of the likelihoods of the associated model predictions.  I am increasingly less comfortable with this idea – basically, because I think that there are too many differences between our PDF of predicted outcomes and the PDF of outcomes that we would really want to use to make decisions.  In particular, our PDF often reflects our uncertainty about the physical system, whereas the true PDF of possible futures is based on the actual (unknown) physical system.  However, if we decide that our PDF is the best that we will be able to produce (which it probably is), then we can use the model likelihoods based on goodness of fit to the data in a quantitative sense.
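
To show what the ‘threshold’ use of likelihoods might look like in practice, here is a toy example (the model ids and numbers are made up): we sum the normalized likelihoods of the models of concern and simply track whether that share drops below some level that stakeholders consider acceptable as discriminatory data are added.

```python
def group_likelihood(likelihoods, concern_ids):
    """Summed relative likelihood of the 'models of concern' subset.
    likelihoods : dict mapping model id -> likelihood (normalized to sum to 1)
    concern_ids : ids of the models whose predictions worry some stakeholder
    """
    return sum(likelihoods[m] for m in concern_ids)

# Hypothetical before/after comparison as discriminatory data are collected:
likelihoods_before = {"A": 0.45, "B": 0.25, "C": 0.30}
likelihoods_after = {"A": 0.60, "B": 0.30, "C": 0.10}
print(group_likelihood(likelihoods_before, ["C"]))   # 0.30
print(group_likelihood(likelihoods_after, ["C"]))    # 0.10
```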

7/28/16 – Saskia Roels – Universidad de Chile

Wouldn’t the order of constructing several models first (before data collection) bias our data collection approach towards a preferable scenario?

We would need to guard against this possibility intentionally.  That is, if we are not careful to construct a diverse model ensemble, then we will end up selecting data that discriminate among models that are, essentially, the same.  Interestingly, I think that this is what we do unconsciously in our current state of practice!  That is, when we go out to the field to collect data, I think that we do have a model in mind, even if it is only a general, conceptual model.  Guided by this model, I think that we look for times and places to collect data that are likely to ‘show something’.  But, if we only have one model in mind, we can only be seeking data that would CONFIRM that model!  The only way to identify data that are likely to be both discriminatory and useful is to base the choice of data on identifying measurable things that could test models of concern AGAINST other models.  That is what we are trying to promote.

7/28/16 – audience member – Universidad Catolica, Chile

Have you tried to validate the approaches that you are proposing?

I can only think of two ways to test the approach.  One is through synthetic studies, when we know the right answer and can determine how well we can define it given different data sets.  We have shown that DIRECT works very well for these conditions; but, they are limited.  One major limitation is that we usually (implicitly or explicitly) assume that we have included the right model (although not with the right parameters) in our ensemble.  We have more limited experience with data downsampling studies.  But, I am very keen to try these.  For one experiment that we examined, we showed that you could infer hydraulic function of the vadose zone with 1/100th of the data that was collected using standard approaches.  That is, rather than monitoring at regularly spaced depths and times, you can identify the depths (and to some degree the times) that would be most useful for monitoring before you ran the experiment.  This was based on not knowing which model was correct and only knowing ranges of parameter values.  But, one of my goals for next year is to find someone to work with on a data downsampling experiment.

7/28/16 – audience member – University of Chile

The models that you present are physically based and have limited data.  Would your approach work for data-driven models, something like neural networks?

Interestingly, I think that the multi-model approach that underlies DIRECT is used widely in big data applications, such as in banks looking to identify good credit risks.  So, in some ways, these approaches can replace physical models with data driven models.  What is more interesting to me is that in some of these cases they will come up with many plausible models – but, they still need to craft a story that ‘makes sense’ of the model.  Human managers still need to have a context (a story) that can make sense of the model and the data.  Finally, I think that the importance weighting that we propose in DIRECT is perfectly applicable to other model types.  Still, if we end up with multiple ANNs that can predict the data that we have, we can prioritize testing those models that make differently important predictions.

7/28/16 – Saskia Roels – Universidad Catolica, Chile

Isn’t our usual approach to propose a model and then use parameter estimation to fit the data to the model?

I think that is our usual approach.  But, I think that this really opens us up to, maybe even encourages, confirmation bias.  I think that the only way to avoid this is to ensure that we have other models to act as counter hypotheses.  Then, we can look for data that could discount one model, while supporting another.  Otherwise, in a way, it is too difficult to find data that are likely to discount any model or subset of our models.  I think that we can go a step further, too.  That is, we look at our ensemble of models and decide on a subset that have particularly important predictions that are different than the predictions of the other models.  Then we choose to test those models (and their associated assumptions and predictions) because they are, in some sense, more important.

7/28/16 – Felipe Fuentes – SERNAGEOMIN, Chile

It could be seen that model building and data collection are sequential in time.  When an agency puts out a call, the data are often seen as fixed – you have the data, now build a model and answer our questions.  Could you make an argument that spending more time on model building in advance instead of collecting data could be better?

I agree with your idea, with a couple of small qualifications.  I really believe that the tendency that we have to want to collect data at the onset of a project, then build a model, then support decisions is inefficient.  This is especially true in hydrogeology because our data can be very expensive to collect and/or analyze.  So, I think that we should absolutely spend more time in modeling up front – especially in conceptualizing alternative possibilities for our system.  I am confident that this will lead to more efficient, more targeted data collection.  But, I don’t think that it necessarily means that we will spend more money on modeling and less on model calibration after data collection.  Rather, we would just change the timing of when we spend our modeling budget, with more spent early in the project and less spent on model calibration.  But, all of this requires that we have at least enough data to support model conceptualization.  I would not recommend building lots of uninformed models.  We also have to recognize that there are often practical costs associated with measurement network design.  For instance, mobilization costs may eliminate the possibility of repeated data collection / model / data collection steps.  Similarly, if we want time series data, we can limit their usefulness if we spend too much time modeling before collecting data.  At this point, I am just pushing for a better balance, with pre-screening of data through multi-model analysis as a way to improve our practice, from data collection through decision support.

7/25/16 – David Evans – Flosolutions, Lima, Peru

I enjoyed your talk – I think it is breaking new ground in the field.  But, it wasn’t completely clear to me how you assign model likelihoods.  Could you clarify?  As presented, it seems that the method is quite reliant on the robustness of the data set, which isn’t always possible in complex settings.

These are both very valid points.  First, I need to make sure not to skip over the definition of likelihood.  Of course, as with anything, there are many ways to assign model likelihoods.  The simplest is what we do somewhat naturally – the model with the lowest RMSE compared to the data is the best.  We just put numbers on it.  Essentially, if you calculate 1/SE for each model (SE being the squared error, perhaps weighting observations by their uncertainty, if you know it), then the higher this number is, the better the model.  To convert these to likelihoods (which must add to one), you just divide by the sum of the 1/SE values over all of the models.  Some other approaches penalize models for complexity (number of parameters), but the gist is the same.  As to the second point, I agree … but, I think it is kind of the point of DIRECT.  If we follow some procedure for model calibration based on goodness of fit to the data, then we are living with the limits of our data.  The idea of DIRECT is to admit this and to try to make use of it.  Basically, with little (or low quality) data, we should be able to fit many models to our data acceptably well.  That just defines our uncertainty given the complexity of the system, and the limits of our data.  With DIRECT, we suggest that you consider the implications of all of those acceptable models.  If some have ‘worse’ outcomes, then focus your data collection on testing them against all other models.  The way to do this, is to find measurement times/locations/types at which your ‘bad’ models predict you will measure something different than the ‘other’ models.  These are discriminatory data.  So, yes, you are limited by your lack of data.  But, rather than just accept (or ignore) that, we think you can try to do something about it!
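
To put numbers on the recipe above, here is a minimal sketch (the function name is just for illustration): each model’s squared error is inverted and then normalized so that the likelihoods sum to one.

```python
import numpy as np

def model_likelihoods(squared_errors):
    """Convert each model's (possibly uncertainty-weighted) squared error
    into a likelihood: higher 1/SE means a better fit, and dividing by the
    sum makes the likelihoods add to one across the ensemble."""
    se = np.asarray(squared_errors, dtype=float)
    inv = 1.0 / se
    return inv / inv.sum()

# e.g. three models with squared errors 2.0, 4.0 and 8.0:
# model_likelihoods([2.0, 4.0, 8.0]) -> [0.571..., 0.286..., 0.143...]
```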

7/21/16 – Micha Silver – Ben Gurion University of the Negev, Israel

When dealing with sparse data, I wonder if some data affect interpolation schemes more than others, making them discriminatory.

I am not sure about this one.  I am sure that interpolation schemes can be leveraged strongly when data are sparse.  But, I don’t know them well enough to know how you could predict the influence of data to select discriminatory data before collection.  It does raise the interesting question of trying different interpolation schemes as a version of multi-model analysis, though!

7/21/16 – Micha Silver – Ben Gurion University of the Negev, Israel

Is there a different term for discriminatory data that none of the models predict?  (It turns out that this wasn’t, exactly, Micha’s question.  But, I liked my misinterpretation!)

I have been thinking of discriminatory data as, specifically, data that is predicted to be different among the models.  In particular, it is predicted to discriminate between models of concern and all other models.  My feeling is that these are the only data that you can PLAN to collect.  Of course, some of the most important observations are those that cause a major shift in our thinking – essentially those that are not predicted by any of the models.  I think that this can be viewed in the evolutionary model development context that I proposed.  That is, as described by Gould, the development of a model ensemble should follow punctuated equilibrium.  Most data (that which I describe as discriminatory) will lead to shifts in the relative likelihoods of models in the ensemble.  The data that you describe would lead to ‘evolutionary jumps’.  I still believe that the best way to be prepared for these jumps is to propose and maintain as diverse an ensemble as possible.  But, we cannot guard against Bredehoeft’s ‘surprises’ that may catch us off guard as we learn more about hydrogeology or about a specific site.  Essentially, DIRECT is designed to help us to avoid mistakes that we should have known better than to make.  But, nothing can really guard us against our fundamental lack of understanding!

7/21/16 – Naftali Lazarovitch – Ben Gurion University of the Negev, Israel

Can you use DIRECT for experimental design?

I have to say that I am surprised that I have never considered this.  But, I think that there is no reason that you cannot.  Basically, instead of seeking observations that are most different, you could search for applied conditions that (together with the right measurements) would lead to the largest discriminatory index.  Great idea!
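
As a rough illustration of what that search could look like, here is a small Python sketch.  Everything in it is hypothetical (the predicted heads, the two candidate stresses, and the use of a simple mean separation as the ‘discriminatory index’); it only shows the pattern of scoring every combination of applied condition and measurement location by how strongly it separates the models of concern from the rest of the ensemble.

import numpy as np

# Hypothetical predicted heads (m) from three models at four candidate
# observation points under two candidate applied stresses.
# Shape: (n_models, n_conditions, n_locations)
predictions = np.array([
    [[10.2, 9.8, 11.0, 10.5], [9.0, 8.7, 9.9, 9.4]],   # model 1
    [[10.3, 9.9, 11.1, 10.4], [9.1, 8.8, 10.0, 9.3]],  # model 2
    [[10.2, 9.7, 12.4, 10.6], [9.0, 8.6, 11.5, 9.5]],  # model 3 (of concern)
])
of_concern = np.array([False, False, True])

# Separation between models of concern and the rest of the ensemble for
# every (condition, location) pair: a simple discriminatory index.
separation = np.abs(predictions[of_concern].mean(axis=0)
                    - predictions[~of_concern].mean(axis=0))

condition, location = np.unravel_index(np.argmax(separation), separation.shape)
print("Most discriminatory design: condition", condition, "location", location)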

7/20/16 – Alex Furman – Technion, Israel Institute of Technology

It seems that you would want to build your model ensemble in a way that it explores model diversity.  Couldn’t that be achieved by using something like a genetic algorithm to explore both continuous and discrete variables representing model structure, boundary conditions, etc.?

Absolutely!  I think that this is the future of modeling, if we want to do something like DIRECT.  Perhaps drawing on the work of NAME at NCAR, we need to have modular tools to construct models.  This will allow us to explore model space both more completely and more efficiently.  The idea of using something like a GA is a very good one, because unlike many parameter estimation tools, we would not be as interested in finding the best model, but rather in exploring the full range of plausible models.
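
To show what I mean by the mixed encoding, here is a toy Python sketch of a genetic algorithm over a genome with one discrete structural gene and two continuous parameter genes.  The structures, parameter ranges, and especially the fitness function are placeholders; in practice the fitness would combine a plausibility score (fit to data) with some reward for being different from the rest of the population, rather than the arbitrary function used here.

import random

STRUCTURES = ["layered", "faulted", "karst"]   # discrete gene: conceptual structure

def random_genome():
    # Mixed genome: one discrete choice plus two continuous parameters.
    return {"structure": random.choice(STRUCTURES),
            "logK": random.uniform(-6.0, -3.0),
            "recharge": random.uniform(50.0, 300.0)}

def fitness(g):
    # Placeholder score; a real version would evaluate model plausibility
    # and, ideally, reward diversity within the current population.
    return -abs(g["logK"] + 4.5) - 0.001 * abs(g["recharge"] - 150.0)

def crossover(a, b):
    # Each gene is inherited from one parent or the other.
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(g):
    g = dict(g)
    if random.random() < 0.2:
        g["structure"] = random.choice(STRUCTURES)
    g["logK"] += random.gauss(0.0, 0.1)
    g["recharge"] += random.gauss(0.0, 10.0)
    return g

population = [random_genome() for _ in range(40)]
for generation in range(25):
    population.sort(key=fitness, reverse=True)
    parents = population[:20]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children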

7/21/16 – audience member – Ben Gurion University, Israel

Have you tested DIRECT on problems that have large data sets that have already been collected?

In a limited way, yes.  We analyzed one very rich data set related to infiltration, measured under field conditions, and showed that we could have demonstrated (before the experiment) that over 90% of the data would be either redundant or not useful for answering the question posed.  But, we are actively looking for other opportunities to test DIRECT ‘in retrospect’!

7/12/16 – Audience member – University of the Western Cape, South Africa

How can we decide if our model is good enough?

Easy … we’re out of money!  But, really, I think that we can offer another approach.  Once we have our ensemble, the question should be – do our plausible models make importantly different predictions related to predictions of concern?  If not, then it is unlikely that spending more money to calibrate the models further will be worthwhile.  If so, then we can justify further improvement in the sense that we would want to try to increase or decrease the likelihoods of models of concern by collecting additional data.

7/12/16 – Yongxin Xu – University of the Western Cape, South Africa

How long will it take for something like this to become part of the practice?

It can be used, in a relatively simple form, now.  But, it will likely take a while to develop model building tools that will really allow us to explore model space appropriately.  I think that it will take considerably more time for the culture of model building to change!

7/12/16 Kes Murray – University of the Western Cape, South Africa

Do you think that the methods that you described may be used for fracking studies in South Africa?  I’m not sure how familiar you are with the “fracking situation” in South Africa (it has become a rather touchy subject, especially where local communities are concerned, which has allowed for a lot of emotional bias to creep in…), but essentially the situation is quite complex due to the extensive network of dolerite dykes which intrude and extrude through the Karoo shales. What I was thinking about was whether your process of multiple models could be applied in this situation, particularly to the concern of “will the deep residual fracking fluid rise into ‘shallower’ (typically anything from 20 mbgl to about 350 mbgl) groundwater aquifers”. I’m not sure whether it would have to be site specific (probably the most scientifically accurate) or whether a general approach could be taken; however, either way it seems like it could be a fascinating study! Any thoughts?

The short answer is, yes … I think that multimodel analysis is critical for complex issues like fracking.  (It seems that all places with fracking and any freedom of communication face the same issues.  Locals at once may see immediate benefits; but, they are certain to pay the long term costs!). The difficulty can be that if the system is TOO complex and our data are TOO limited, we can feel that we cannot say anything scientifically.  Personally, I think that it is better to admit this than to pretend that our one best (but poor) model is reliable.  But, we need to find a balance that keeps science as part of the conversation – both to inject objectivity, often to find solutions, and always to ensure that uncertainties are properly voiced.  What I am coming to believe is that we can often do the most good by trying our best to formulate scientifically defensible models that COULD predict bad outcomes.  If we can’t do this, no matter how hard we try, then we can say that the science doesn’t predict damage (to the best of our knowledge).  If we CAN identify such models, then we have something to test.  Not only that, the stakeholders should see that we are working hard to try to help them and that we are actually listening to their concerns.  Maybe it is a bit optimistic … but, I think it represents the very best that science can be!

7/12/16 – Candice Lasher-Scheepers – University of the Western Cape

In South Africa, like many places, the major limitation on our model-based analyses is financial.  It can be a challenge to fund even one model; is it feasible to suggest multimodel analysis?

I think that this is the first critical part of ‘listening to the question’!  If the problem is simple enough, or if the cost of the consequences is low enough, it may not even warrant one model.  In other cases, a single, simple model may suffice – maybe even just to visualize the problem.  But, in many cases, if one well calibrated model is justified, I would say it is hard NOT to justify multiple models – even if they are not as well calibrated.  One way to explain this to clients is this – we can build our best model based on the data that we have now.  But, when we collect more data, that model will likely change.  If we commit to one model completely, now, it will be more expensive to build a new model later. On the other hand, if we do a good job of proposing multiple plausible models now, we can continually refine each model slightly, at lower cost, as we collect data.  In the long term, this will give you more and better information throughout the project, lowering your risk, and it will save money.  In the future, I hope that we will build tools that will help us to build ‘trees’ of related models.  Essentially, we would branch whenever we face a tough decision during model construction – naturally leading to many models that differ in explicable ways.  For now, though, we have to rely on the insight of hydrologists, maybe teams of hydrologists, to propose competing models.  This requires more complete training so that modellers are not ‘just modellers’ and everyone feels comfortable discussing models.  But, I think it will lead to a much more robust and impactful practice of hydrogeology!

Candice replied: “In the future, I hope that we will build tools that will help us to build ‘trees’ of related models.”  I like this; please keep us posted on any progress made in this regard. Modelling is such an amazing and powerful tool, but costs are just too high. Although initial costs are high, it could potentially reduce costs on future groundwater issues occurring in a municipality. In SA we need to start by changing the mindsets of our municipal managers. I hope we can get to this point really fast.

7/12/16 – Audience member (professor) – University of the Western Cape, South Africa

If I have built an ensemble of models based on previously experienced events, how can I use my ensemble to assess the risks of extreme events?

In some ways, we can’t.  But, on the other hand, we will almost certainly be better off if we have developed multiple models than if we have only one.  Unless the extreme event really overthrows our understanding of the underlying processes or the system response, the hope is that we have developed a wide enough range of models to include important responses to extreme events.  Then, if, for instance, we are interested in determining whether plans that we have are robust, we can start by determining whether any of our disparate models say that they aren’t.  If all of them say that they are robust, we can gain a lot of confidence.  If some of the models indicate concern, they need to be tested against the other models.  DIRECT is designed to help us to find measurements that could be made under less-than-extreme conditions that can discriminate these model groups based on their projected responses to extreme conditions.

7/6/16 – Audience member – University of Pretoria, South Africa

Is your goal to formulate a statistical representation of risk?

In general, no; we try to test discrete models that represent stakeholders’ concerns.  (In response to a follow up.)  But, we could (and probably should) represent the plume that we show in the transport example as probabilities of contamination reaching a different location.  The only limitation to that is that this would require that our ensemble be a faithful representation of the probability of an outcome.  In general, I don’t think that we have enough understanding of the uncertainties underlying a system to actually make predictions with quantitative uncertainties.  In other words, I think that we often confuse OUR uncertainty with THE uncertainty about a future outcome.  They are very different things!

7/6/16 – Matthys Dippenaar – University of Pretoria, South Africa

Practically, how do you run so many models?

We write shell codes, usually in MATLAB, to define the model structure and parameter values, run the model, and harvest the results.  For hydrologic models, this is often relatively straightforward.  The difficult thing becomes figuring out what to vary and how to turn it into continuous variables that can be explored efficiently.
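
The pattern of those shell codes is roughly like the sketch below (written in Python rather than MATLAB for brevity).  The model executable name, the input-file format, and the parameter ranges are all hypothetical; the point is simply the loop of writing inputs, running the model, and harvesting the prediction of interest for every variant.

import itertools
import subprocess
import numpy as np

# Hypothetical structural choices and parameter ranges to sweep over.
structures = ["layered", "faulted"]          # discrete model variants
K_values = np.logspace(-6, -4, 5)            # hydraulic conductivity (m/s)
recharge_values = np.linspace(50, 300, 4)    # recharge (mm/yr)

results = []
for struct, K, rch in itertools.product(structures, K_values, recharge_values):
    # Write an input file for this variant (file format is hypothetical).
    with open("input.txt", "w") as f:
        f.write(f"structure {struct}\nK {K}\nrecharge {rch}\n")

    # Run a (hypothetical) external model executable.
    subprocess.run(["./run_model", "input.txt", "output.txt"], check=True)

    # Harvest the prediction of interest from the output file.
    prediction = float(open("output.txt").read().split()[-1])
    results.append({"structure": struct, "K": K, "recharge": rch,
                    "prediction": prediction})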

7/4/16 – Audience member – Orleans University, France

What is the role of model benchmarking?

I may not understand what you mean by benchmarking.  But, if it is describing the idea of running many models to eliminate outliers, then I would say that it is the opposite of what we should be doing.  That is, we should not assume that models should agree in order to be useful.  Rather, we should be looking specifically for models that disagree, especially regarding important predictions.  Then we can use the model ensemble to identify measurable proxies to discriminate among those ‘models of concern’ and the rest of the ensemble.

7/4/16 – Audience member – Orleans University, France

In the end, how do we decide on which is the best model?

I think that the biggest change that we are suggesting is that we should never come up with THE best model.  Rather, I think that we should embrace the idea that our goal is to develop an ensemble of competing models.  The key is how to form these models, how to use them most effectively, and how to communicate the scientific findings in the context of multiple models.  That is what we are starting to explore.

7/4/16 – Audience member – Orleans University, France

Can you only use models?  What about expert opinion that cannot be encapsulated in a hydrogeologic model?

This is really difficult to answer.  Essentially, it is hard for me to picture what that expertise would be.  Or, maybe a better way to say it is that we ultimately need to develop a model to test something scientifically.  That is, we can only test a hypothesis based on its ability to make testable predictions.  But, maybe the best way to think of it is that our job, as scientists, is to try to translate this general knowledge into an associated model.  That is, can we come up with a plausible, scientific hypothesis that supports your insight and that can then be tested as a proxy for it?

7/4/16 – Audience member – Orleans University, France

Don’t we have to be concerned about the influence of many bad models being included in the ensemble?

I think that this is a real problem if we want to develop quantitative probabilities of predicted outcomes based on a model ensemble.  Then, including a lot of models with poor fits (unless we use some threshold to exclude models) will reduce the likelihood of more representative models. What we are suggesting is a balance between model likelihood and model importance.  Especially if a group already has a model (or its associated outcome) in mind, it is important to test that model against all other models to address their concerns.

6/23/16 – Jesus Carrera – Barcelona, Spain

Does the utility function describe the actual risk or the perceived risk, and does it make a difference which one?

To my way of thinking, it only addresses the perceived risk.  In fact, I would go further and say that the real risk is almost never really known.  Perhaps for some systems, if we really can explore model space well enough, we could come up with quantitative prediction uncertainties that could be used for decision support.  But, I don’t have much faith in that for most real problems.  Rather, I think that the best that we can do is to ask someone what concerns are really driving their decisions.  Then we can do our best to come up with plausible models that would lead to those outcomes of concern.  Once we have these, then we have something that is testable that can act as a proxy for their more general concerns.  Thinking about it, this is probably related to the idea of ‘nudging’ as described by Thaler and others.  Essentially, rather than seeing our role as scientists to come up with THE answer about how a system functions, we can act as objective investigators of whether some hypotheses of concern are plausible or if they can be discounted.

6/23/16 – Audience Member – Barcelona, Spain

Which risk distribution should be used to make decisions?  Might this change with time?

In my opinion, this is not our place to say.  Maybe that is a bit of a cop out … but, I think that what we should do is to listen to the decision maker (perhaps with the help of social scientists) to understand which of our predictions are most important for decision support.  Once we know this, we can focus our work to try to answer these aspects of the hydrogeologic question.  I have no doubt that the priorities of the decision makers will change in time.  This does open interesting questions about how to design data collection using DIRECT for multiple decisions.  We are working on this – and so far it seems, as you might expect, that some outcomes can be supported by the same data, while others require almost independent data collection schemes.

6/23/16 – Audience Member – Barcelona, Spain

How can we decide how to represent each component of a model when all of them have so many possible descriptions?

That is actually the opposite form of a question that I have been asked before – which is, how can I possibly form more than one model?  I like your question because I think it points to an important thing that we need to develop as part of our model building culture.  Essentially, your question boils down to, ‘How can we figure out what the important aspects of a model will be?’  The short answer is … I don’t know.  But, perhaps a more useful answer is that I think that the best approach must be based on finding ‘importantly different’ models.  That is, if we can explore one aspect of a model and show that it does not have a significant impact on the prediction(s) of interest, then we can decide not to explore that aspect of the model.  For now, I think that we only need to develop many models because we are not very smart about finding models that are importantly different than each other.  So, instead, we develop many models and throw most of them away.

6/21/16 – Guy Vasseur – UPMC, Montpellier, France.

The talk does not address inversion, is there a reason that it isn’t included?

I think that the problem with inversion is that it has become too focused on finding the global minimum in the objective function.  There are competing methods that go to the other extreme, saying that there are many equifinal models.  But, we have not made full use of the ideas that have underlain inverse methods to find many models that are measurably different than each other, but that are all plausible.

6/21/16 – Olivier Barreteau – Montpellier, France.

How can you implement adaptive management if decisions must be made at one time, early in the process?

You are right to question this … I didn’t describe it well.  What I should say is that there are types of adaptive management that could be designed with DIRECT.  For example, we may propose an initial treatment that has some chance of success, but some models predict that it will fail.  We can use the ensemble to identify what we should measure to best test the effectiveness of a treatment.  In addition, this could point to alternative possible descriptions and secondary treatments that should be ready if we observe this failure.

6/21/16 – Philippe Pezard – CNRS, Geosciences Montpellier, France.

In general, the best mathematicians in Earth sciences are in seismics, with rich data sets, and in hydrogeology, where pressure data are somewhat the opposite: scalar and integrative of the penetrated geological system. Therefore, hydrogeology might be viewed as a difficult exercise aiming at reconstructing information from this scalar and integrative dataset, equivalent to asking many people with thick glasses to give an average description of what you look like when each of them sees you from a different angle.

This is an interesting way to view hydrogeology.  I think that you are right, to a degree.  But, I think that this is much better for making general descriptions than for dealing with specific questions.  For your example, what if the goal was to determine if I was one of 20 known criminals?  Is the average picture (or the answer to the diffuse question) good enough or even useful?  (Response – or, maybe we need better glasses!)  Absolutely!  But, I think that the first step toward this is figuring out what we would LIKE to be able to measure and then to look for approaches to provide these data.

6/21/16 – Rutger de Wit – University of Montpellier, France.

When I interact with a hydrogeologist, I often need simple models that I can use to drive my more complex biogeochemical models.  How would this work for your DIRECT approach?

I think it is an ideal example of how DIRECT could be used for purely scientific studies.  As I imagine it, you would say to a hydrologist that you are interested in predictions of flow into a lagoon, for example.  You could conduct a sensitivity analysis with your models to determine if there are critical flow thresholds that are of particular importance to your system.  This would be your utility function – defining which flow levels are most important to predict accurately.  With the specific flows identified, and the ranges of definition that are most important, a hydrologist can use DIRECT to determine which proposed measurements are most likely to discriminate among models for making these specific predictions.  Whether this requires simpler or more complex models depends on the system – which processes are important – and your knowledge of the system – what can you actually support with what you do or do not know?

Rutger replied – Yes, indeed, the dynamics of water flows from catchments into coastal systems is key. Exchange flows between the coastal systems and the adjacent sea are also very important because these flows tend both to salinize and to dilute the water in the coastal system. Sometimes, with luck, you can use salinity as a conservative tracer and, in that case, often do not need very detailed measurements of the exchange flows (which is sometimes difficult, as the tidal flows often do not fully mix with the coastal water due to, e.g., the piston effect).  Finally, I would highlight that for biogeochemical processes it is crucially important to consider the material (both dissolved and suspended matter) that is carried into the coastal systems by the water flows. As a result, we crucially need information on the dynamics of these concentrations in the water flows; the information on the water flows itself is not sufficient.

6/21/16 – audience member – UPMC, Montpellier, France.

To compare models, do they have to have the same parameter space?

For me, this is one of the potential advantages of not separating ‘models’ and ‘realizations’. In the end, what we care about is the combination of processes, structures, and parameters that could end up with bad outcomes.  Then, once we have them defined, we can test them as separate story lines.  In this sense, it doesn’t matter as much if two story lines differ in their structure or their parameters – we are interested in testing the combinations of these elements against reality by finding discriminatory data.

6/21/16 – Yvan Caballero – BRGM, Montpellier, France.

How can you use DIRECT with lumped models?

This is something that I haven’t quite figured out yet, as someone who hasn’t done a lot of work with these models.

6/21/16 – Philippe Pezard – CNRS, Geosciences Montpellier, France.

Democracy can be seen as the opposite of science.  (Clarified to say that science operates to a certain point by consensus, and large advances in science often come from an individual – or a group of individuals – contradicting elements of this consensus.)

Wow, what a concept!  I think that you are right again, and I take your point.  Much of my talk focuses on identifying ways that we can do science to answer specific questions (or, better, aspects of questions) and I also suggest that we can find consensus in the science that should be done.  But, I think that this is different than defining scientific truths by consensus.  To some degree, of course, this is what we do because we are social animals who find comfort in communal support.  But, the mark of science is that it is true regardless of what we believe!  If I could restate what I am proposing it is this – almost every problem that we address has vast and numerous uncertainties … we make choices (conscious or otherwise) regarding which aspects of the uncertainty that we will address.  I think that it is a fair merger of societal need and scientific rigor to allow society to help to identify the important areas of uncertainty and for science to apply its best, most objective tools to addressing those uncertainties.  I would even go so far as to say that we SHOULD engage members of the public to help to define scientific questions as well as counter-hypotheses of concern when developing our scientific objectives.

6/17/2016 – Alexandre Pryet – ENSEGID, Bordeaux, France

Is there some danger in proposing models with low likelihood because it could introduce something as possible even if it really isn’t?  Further, I think that this happens both when decision makers have loss aversion and when they are too prone to make overly positive decisions (for example, politicians).

This is also related to the actual meaning of the likelihood measures that we have.  I agree that we can mislead stakeholders if our ensemble is constructed poorly such that it overstates the probability of a ‘bad’ model.  (Or, for some decision makers, a ‘good’ model.)  Especially given that people do a relatively poor job with very small probabilities, we may need to think carefully (and with professional judgement) regarding what level of likelihoods can be considered to be meaningful.  Regardless of this, it does not seem appropriate that we, as scientists, should bias the information (or models) that we show the decision makers.  We should inform them, but the decision has to lie with the person that takes the responsibility for the decision that is made.

6/17/2016 – Olivier Atteia – ENSEGID, Bordeaux, France

How do we choose which models to generate?

I think that this is the most challenging question that we face.  We need to develop approaches to develop DIFFERENT models.  The question is how to do this in a way that keeps a lot of model diversity – that is, several conceptually different models – while avoiding making a lot of models that are essentially equivalent.  The major constraint is that this has to be done while keeping all of the models well calibrated, leading to the parallel development of several models. On the other hand, if we want our models to be statistically representative, then we have different restrictions on which models should be in our ensemble.

6/17/2016 – Alexandre Pryet – ENSEGID, Bordeaux, France

One problem with the DIRECT approach is that it relies on decision makers to make good choices based on a range of possible models with likelihoods.  Isn’t there a danger that by presenting low likelihood models some decision makers will choose whichever model fits their interests, regardless of model likelihood? Might we be better off only showing models that are highly likely, or, in some cases, that promote a common good as an outcome?

I think that this is a tempting idea, but one that we have to try to avoid.  Ultimately, the only thing that we, as scientists, have to offer that is different and valuable is that we are objective.  In fact, I think that one of the goals of DIRECT is to make us MORE objective – to force ourselves to challenge our favorite theories or models.  I do think that it is fine to ask non-scientists (actually, to demand that all interested and affected parties) help to define the crucial uncertainties to address.  But, then I think that it is our responsibility to communicate all that we find as honestly and as comprehensibly as possible.

6/14/16 – Ben Abbott – University of Rennes

Is there an inherent value in being quantitative or trying to be objective when all our questions and tools are infused with subjectivity? Is there a risk in creating a sense of objectivity that could lead to overconfidence or blindness about the limits of the methods? That’s not to say that all methods are equally good, but are quantitativeness and objectivity useful ends in themselves?

6/14/16 – Ben Abbott – University of Rennes

Is it possible that likelihood weighting, based on past performance, is not appropriate if you are aiming to address stakeholder concerns that may not conform to the maximum likelihood prediction?  That is – just because a model (or an expert) has performed well in predicting relatively common outcomes (e.g. those things that are often used for calibration) it does not mean that they will be better qualified to predict lower probability but higher cost outcomes. How do you weight when there isn’t any appropriate baseline or reasonable equivalent dataset to evaluate against?

I’m torn on this one.  On the one hand, I think that the role of science is to provide objectivity.  So, some objective measure of performance (as a combination of the ability to match past data and future, discriminatory data) is necessary.  But, I do see your point.  I think that the way that DIRECT deals with this is by projecting models out to future time horizons.  Then we identify models that have concerning outcomes for stakeholder groups.  Then we try to CHANGE the likelihoods of these models by collecting discriminatory data.  I’m not sure that the actual value has much quantitative meaning – for the reasons that you point out.  But, I do think that there is value in testing models of concern.

6/14/16 – Ben Abbott – University of Rennes

From a long-term monitoring perspective, are there tradeoffs with trying to be more quantitative about where to sample to optimally answer the question at hand, since research questions will surely change and so many discoveries happen as happy accidents due to some peculiarity of the sampling design?

No doubt.  I would not suggest that we should over-rely on DIRECT or any other measurement optimization scheme.  I am just advocating that we use it in balance with our current, largely ad hoc, approach. I think that what is interesting is to think of how we could promote ‘creative happy accidents’.  It is impractical to make expensive measurements with no basis.  It is also unsupportable to choose measurements to confirm a model (which is inevitable if we have only one model).  Personally, I expect that even if we look for discriminatory measurements, we will still be surprised often.  That is, NONE of our models will match the data that were supposed to support some of them over others!

6/14/16 – Philippe Davy – University of Rennes

You describe an approach based on developing many competing models. Would your approach accept models with statistical outcomes  (either to represent a probabilistic description of the system, or to describe stochastic processes)? That is, how would you apply DIRECT to models for which the parameter values are defined as distributions rather than discrete values?

To be honest, I hadn’t really considered this fully.  I have been focused on the idea of each realization of each model being a separate entity.  But, I guess that it would also be possible to consider each model to be a distribution, given inherent uncertainty of parameter values.  (Or structure, in the case of fractures.)  You could still look for measurements that discriminate among models, in light of their individual uncertainties.  To some degree, Colin did this in his paper that is available on the References page.  But, my expectation is that as the parameter uncertainty grows, it will become increasingly difficult to find any discriminatory measurements.  I guess the question is … does this indicate that the problem is hopeless?  It would be good to test this on some real cases!

6/14/2016 – Audience Member – University of Rennes, France

How do you generate the preference curves?  For example, you may be able to formulate 1000 models that say that hydraulic fracturing is safe.  But, some people will never accept it.  On the other hand, if we have two stakeholders with very different influence (say a mine and a local community), then it can be impossible to find a global optimal solution that doesn’t overweight the more powerful group.

These are two important questions.  In the first case, if you are describing a condition for which a stakeholder will not be influenced by science, then it may not be a place to spend your effort doing science!  The important thing is to formulate the utility curves at the beginning of the process to try to determine if this is the case or, if not, what aspects of the problem may be amenable to scientific influence.  I think that this relates to the concept of ‘nudging’ in advertising.  (As I understand it at this point, it is trying to find the influential information that a person or group is most likely to be receptive to and using that to convince them, rather than starting so far from their understanding that you turn them off.)  The second question is equally important and is a key aspect of DIRECT.  Personally, I think it is a mistake to try to express all utility functions in common units (for example, currency).  I think that the real value of the curves is to identify relative values of different outcomes for each stakeholder.  Then, we can try to conduct scientific investigations that will address each group’s concerns, ideally finding data that can serve multiple purposes for greater efficiency.

6/14/16 – Aditya Bandopadhyay – University of Rennes

I enjoyed your talk, in particular the reference to Lincoln’s diverse cabinet.  There is an Indian story, which is fictionalized in the movie named “Hirak Rajar Deshe” (translates to: the kingdom of the Diamond King), where all the ministers who are kept on are sycophants and, consequently, the kingdom is reduced to being ruled in a very authoritarian manner (the king closes down schools because knowledge leads to well informed opinions — something not in favour of the ways of the king). Consequently, the king is overthrown by the protagonists and free will (knowledge) is restored. The undertone of the movie does indicate that decision making must involve members not with wholehearted and illogical agreement with the central authority, but must involve free and unbiased opinions for best results — a clear inference from your talk today.  A clip can be seen here: https://www.youtube.com/watch?v=4Ytiqf8wMrg.

Fantastic!  Given how well read people used to be, I wonder if Lincoln’s thinking was influenced by this or other similar stories from the past?  I guess that the take home message is that we, too, should learn from these old examples.  Just because our tools for predicting the future are more advanced (or, at least, produce higher resolution images) doesn’t mean that we shouldn’t view them with skepticism!

6/14/2016 – Jean-Raynald de Dreuzy and Jean Marçais – University of Rennes

It seems that stakeholders judge the value of science in part based on how well it can address their specific concerns.  This means that models may have to handle very unlikely conditions and still converge.  In this sense, do you think that model robustness is as important as model accuracy for decision support?

(I should note that I am answering all of these questions long after they were asked.  Their ideas have become fully integrated into my talk and largely changed the direction of my thinking.)  This is a very important insight.  I have been thinking that model ‘importance’ should be considered.  But, you are making me wonder if it should actually outweigh model likelihood!

6/14/2016 – Jean-Raynald de Dreuzy and Jean Marçais – University of Rennes

We are interested in the combined use of a small number of numerical models for extrapolation and deep neural networks for interpolation.  In this sense, deep NNs are those that attempt to represent complex relationships in an integral fashion.  The result is an NN with many layers that represents complex relationships that would otherwise be represented by numerical models.  We think that these tend to be quite robust – meaning that any layer could be perturbed but the NN would still be able to make good predictions. This could be an advantage for rapid assimilation of a lot of data.  But, we wonder if they could be used to generate the multiple models that you are describing as the basis for DIRECT.

I think that this is a great idea!  I could imagine that we could focus our time and energy on building ‘importantly different’ conceptual and numerical models and then rely on NN’s or some other tool to explore the ‘inter-model’ space more efficiently.  Nice!

6/14/2016 – Tanguy Le Borgne – University of Rennes, France

How confident are you in the choice of observation points for the second case study – is it completely automatic or is there some level of judgement required in identifying sampling points?

I am not a naturally confident person!  But, I am confident that DIRECT is an improvement over intuitive or qualitative approaches to choosing observation locations.  A more difficult part of the question is whether we can choose multiple observations at one point.  This becomes much more computationally demanding!  We are working on this (Colin’s paper shows how this can be done and a colleague in Denmark is following on work by Wolfgang Nowak’s group to use genetic algorithms and Pareto techniques to make this more efficient) – but, in the end I think that this will require some more thinking about how to efficiently search ‘observation space’.  One other thought on this, though … is that it is probably a mistake to expect that we can capture all of our ‘soft’ expertise in our models.  DIRECT only considers the ‘knowledge’ that is in our model ensemble.  So, to some degree, we may always want to leave room for some professional judgement in the design of monitoring networks!

6/14/2016 – Jean-Raynald de Dreuzy – University of Rennes, France

How do we deal with the case where we have a lot of data?  Does this actually make it harder to generate multiple models?

I think that this specific problem may be best solved by considering a range of objective functions so that you weight different data differently.  This could move you toward identifying models that stress different behaviors.  But, so far the most interesting approach that I have heard is the idea that you are exploring with using neural networks as model surrogates.  Depending upon what you find, it may be that deep neural networks can actually be used to ‘force’ us to explore different conceptualizations.  That seems intriguing to me.

6/14/2016 – Jean-Raynald de Dreuzy and Jean Marçais – University of Rennes

There has been some discussion of ‘nudging’, which comes from advertising, as a model for scientists to contribute to public conversations.  Have you thought of this approach as it relates to communicating uncertainty for decision support?

This has really affected my thinking.  I now believe that the most important thing that we can do to increase the impact of our science is to try to start some of our models as close as possible to stakeholders’ concerns.  Even if they have low likelihood – it should be easier to convince them to change their way of thinking if we start close to their internal story and nudge them towards the scientifically most-probable story.  I think that this is a pretty profound insight!

6/7/2016 – audience member – University of Bristol, England

How do you balance the increased information of multiple model approaches against the increased computational cost?

This is the crux of the matter, and there is no simple answer.  I think that the key lies in efficient model (ensemble) building.  That is, we don’t get more information because we have MORE models, but because we have DIFFERENT (plausible) models.  Unfortunately, in the absence of insight, we can only find different models by sampling a lot of models.  We have been exploring the use of clustering techniques to reduce large model ensembles while maintaining ensemble diversity.  But, this isn’t very efficient.  What we need is improved approaches to sample model space (something like Latin Hypercube sampling of models, perhaps).  The perverse thing is that our parameter estimation approaches, which are aimed at finding the very best models, end up spending much or most of their effort sampling with high density in the region of the maximum likelihood model.  In the end, I think that some of our expectation that additional effort in calibration has little value is due to the fact that we are splitting hairs when finding the very best model.  I suspect that we would see much more value for the same effort spent in seeking model diversity.
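
As a toy illustration of the clustering idea, the sketch below (in Python, assuming scikit-learn is available) clusters a hypothetical ensemble in the space of two predictions of interest and keeps one representative model per cluster.  The ensemble size, the predictions, and the number of clusters are all made up.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical ensemble: 500 models, each summarized by two predictions
# of interest (e.g., drawdown at a well and discharge to a stream).
predictions = rng.normal(size=(500, 2))

# Cluster the ensemble in prediction space, then keep the model closest
# to each cluster center as its representative.
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(predictions)
representatives = []
for k in range(20):
    members = np.where(kmeans.labels_ == k)[0]
    distances = np.linalg.norm(predictions[members] - kmeans.cluster_centers_[k], axis=1)
    representatives.append(members[np.argmin(distances)])

print("Reduced ensemble (model indices):", sorted(representatives))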

6/7/2016 – Rafael Rosolem – University of Bristol, England

How can we ensure the sustainable and healthy future for field activities and research in hydrology?  That is, it is becoming very hard to secure funding for long-term, high impact experiments.  Rather, we need to build a series of short-term projects (1-3 years), each with its own self-contained importance, or projected impact. But, we all know that actual investigation of underlying processes in the field is laborious and time-consuming.

I agree entirely.  In fact, I have been a bit concerned that my message comes across as some form of – we should model more INSTEAD of measuring.  Let me clarify – I completely recognize that measurement is the ultimate scientific activity.  All that I would contend is that measurement without clear consideration of WHY we are measuring is just rolling dice.  We may end up with something interesting, but it is possible that we won’t.  Further, it is likely that we won’t end up with data that informs our initial question.  So, I would say – by all means, we should expand funding for community monitoring efforts that provide long term, multi-faceted data collection.  These data are the bedrock of science. But, given our limited budgets, we should clearly pose some hypotheses AND counterhypotheses that are likely to be tested by the data before we commit the funds to collecting them.

6/7/2016 – Rafael Rosolem – University of Bristol, England

You show two local examples.  How can you apply DIRECT to global scale models?  What are the challenges to upscaling?

I think that there are similarities between this question and the question of more versus less complex models – it comes down to run time.  Both, to me, are a question of balancing two aspects of model parsimony – model complexity and number of models.  It is interesting to me that we tend to come down so heavily on the side of having low parsimony in terms of complexity and ultimate parsimony in terms of number of models (one!).  I think that we should strive for a better balance.  I think that the key challenge is to find models that are fundamentally different than each other but still plausible.  If we can do this, we can gain some benefit of model diversity without saddling ourselves with impossible computational demand.

6/7/2016 – Fiona Whitaker – University of Bristol, England

Is there any way to judge models based on their concepts without having to propose realizations that would be compared to data?

This is a hard, but important question.  To me, it boils down to whether we can inject experience or professional insight at the conceptual level without relying on data.  Of course, we do this all the time.  In fact, I think that we may do it too enthusiastically, which ultimately leads to the formation of a single acceptable model.  On the other hand, being able to reject some models as aphysical saves an immense amount of time that would be spent testing models that are clearly false.  I wonder if there is a way to test model responses to a sequence of stresses to assess the general plausibility of models.  If we could do this repeatably and objectively, I think it could be a great addition to model building!

6/7/2016 – Rafael Rosolem – University of Bristol, England

DIRECT is focused on selecting data to test models of concern.  But, many stakeholders’ concerns will be related to extreme, possibly even unrealistic, models or drivers.  It will be important that a model can converge under these conditions, or we face the problem of not being able to test a stakeholder’s concerns – thereby not helping them to make a more informed decision.  In a way, this is related to an interesting EOS article recently that discussed the concept of using extreme conditions to test model validity.  I wonder if there is a way to combine these ideas to eliminate some models from the ensemble before even collecting more data?

(I have since read this article and included it on my references page.)  This is a really interesting idea.  I think that it is consistent with DIRECT as follows.  We focus on the importance of models for making specific predictions.  So, if we develop an ensemble of models, it would be very reasonable to project them out to extreme conditions – even if they are not well calibrated for these conditions.  Then we would identify those that lead to outcomes of concern and test them against data that could be collected under less extreme conditions.  In fact, I should think about this some more – it may be a key component of ‘importance ranking’ of models!

6/7/2016 – Barney Dobson – University of Bristol, England

You discuss the problem of confirmation bias as it relates to developing models.  But, aren’t we also subject to confirmation bias when we have to make multiple competing objective decisions at the same time?  In fact, by defining the important outcomes first and using those to select data, we are almost intentionally injecting bias into the scientific process.  How do we deal with this?

(Listening back to your question, I realize that I didn’t answer it at all!  Here is another attempt.)  That is a very clever way of looking at this … and maybe a bit troubling!  I think that you are right.  But, I think it is OK.  Basically, DIRECT recognizes that bias is an inescapable part of the decision making process.  But, I think that our usual approach, as scientists, is to say that this bias is inherently wrong, so we will not consider it and we will just do ‘objective’ science.  With DIRECT, we turn things around and say, if you are going to be driven by your bias anyway, perhaps it is better for us to do objective science that addresses your biases.  The trick is that we still have to do our science objectively.  (In fact, I think that multimodel approaches are MORE objective because they do not make hard assumptions about unknown processes or properties.)  I hope that makes some sense!

5/31/2016 – P Johnston– Trinity College Dublin, Ireland

I have a different interpretation of the reason that we construct scientific models.  Ultimately, the value of our models lies mostly in the conceptual model – especially in the public acceptance of the underlying concepts.  But, the conceptualization must be tested by realizations.  Is this inconsistent with DIRECT?

No, I think it is entirely consistent.  I think that we just try to build on this to go two steps further.  You can think of it this way.  For any set of facts (scientific or otherwise) you can build many plausible stories (models).  Each stakeholder will have a storyline in mind that leads to some conclusions.  Often, they make a decision based on a small set of these conclusions, those that represent the greatest perceived risk (combination of probability and cost).  As scientists, we often try to develop our own ‘best story’ to fit the data.  We feel confident if our model fits the data (although it is a bit unfair, because we have ‘calibrated’ it to fit the data) and if stakeholders’ models don’t fit the data.  But, when we present our findings this way – in competition with the public – we don’t have much uptake.  So, one alternative is to build many plausible models and to actually try to build some models that are as consistent with stakeholders’ ideas as possible.  If we can, then we have a model that represents their concerns and is scientifically based.  Then we have the opportunity to test their model against other models with discriminatory data.  The goal is not to disprove their model – rather, it is to test it.  Then we must explain how and why we are testing their model and help them to see the model as a faithful proxy of their story that is also founded in science.  I think that this is the best way to help non-scientists to see the value of science, to see that it can be applied in their interest, rather than placing science in conflict with other considerations that influence (and often dominate) decision making.

5/30/16 – Vasily Demyanov – Heriot-Watt, Scotland

I think that there are many uncertainties associated with the utility functions.  For one thing, they may vary with time.  They may also be subject to various external controlling factors (media, price, market expectations, etc.).  How do you deal with these uncertainties?

I am becoming more and more skeptical that we can … quantitatively.  I imagine that people are actually quite resistant to having their preferences turned into a plot!  Rather, I am beginning to think that we need to develop surrogate stories that try to represent their concerns, rather than trying to quantify them as such.  (Note: this response was written in August 2016!)

5/27/16 – Peter Bauer Gottwein – Danish Technical University, Copenhagen

You have shown DIRECT as it applies to selecting measurement locations.  But, could it also be used to consider different data types like geophysics?  For example, it could be argued in the example that you showed that if you want to know if the anticline plunges, then you should use some geophysics to map it!

Absolutely!  We can consider the likely value of indirect measurements, but it requires that we use coupled hydrogeophysical approaches to translate each model’s predictions to measurable geophysical signals that can be compared with measurements.  What I would say is that, for the case study shown, we showed that DIRECT could be used to determine whether or not the plunging anticline is important for the prediction of interest, and which concentration measurements could define its structure, in one integrated framework.  But, we could add geophysics (including multiple models of petrophysical relationships) as candidate measurements.  Eventually, we would also want to be able to consider the relative costs of different measurement types to allow for a full cost:benefit analysis of survey designs.

5/27/2016 – Mike Butts – DHI, Copenhagen

What was the selling point for the groups that did take DIRECT on?

Generally, the best selling point is that we can provide more meaningful uncertainty.  That is, we provide competing descriptions of the system, many of which are plausible, that embody different possible conditions that they may face.  With this kind of description, a decision maker can actually consider different scenarios and ask a scientist to address the specific concerns embodied in those scenarios.  My sense is that this is closer to the way that people actually make decisions, so it is more informative.

5/27/2016 – Mike Butts – DHI, Copenhagen

How could DIRECT be used in a case like the Nile Delta, where three different countries have three very different models driving decision making?

In this case, the interesting thing is that we have models that are almost designed to be different.  So, we can have fairly high trust in outcomes that the models agree upon.  Furthermore, we can look for areas in which the models disagree as opportunities to learn about the system through discriminatory data.

5/27/16 – Peter Bauer Gottwein – Danish Technical University, Copenhagen

Can you think of this as a multi objective optimization approach?

I hadn’t thought of that, but you are probably right.  The question is, what is the tradeoff?  There are a couple of ways that I could see this.  One is that we can think of identifying data that support the important questions for multiple stakeholders.  In fact, we are looking at that with Troels and Steen in Aarhus with regard to selecting additional well data to reduce the uncertainty of multiple stakeholder questions.  This is based more directly on the variance reduction approach that Fienen and Nowak have developed.  The other tradeoff, which is more interesting to me in some ways, is how we trade off model likelihood and model importance.  There is still some subjectivity involved in deciding the combination of (low) probability and (high) risk that drives decisions.  This defines the subset of models to be discriminated, which eventually controls the selection of additional data.  I’ll have to think more carefully about this!

5/27/2016 – Kerim Martinez – COWI, Copenhagen

There may be some bias in where the talk is being given – that is, the talk is being given in developed countries, but many of the important stakeholders are in developing countries.  Why weren’t there more groups in developing countries?

Ultimately, it is a problem with the structure of the lecture series.  Groups have to know about the series and know that they can invite a speaker.  Also, some years the talk is highly technical and may not be suitable for a general audience.  I knew that I wanted to deliver a talk that could benefit a broad group.  But, although I tried, I couldn’t get invitations to more exotic locations!  I do think it is a bit of a shame, because I think that the message of my talk to stakeholders is that they should be ready to push back against scientific results.  They should ask scientists to provide their counter hypotheses (essentially multiple models) because, after all, science should be about testing a hypothesis against viable counter hypotheses, right?

5/27/2016 – Philip Binnig – Danish Technical University, Copenhagen

How do you deal with the unknowable unknowns?

DIRECT doesn’t guard against this – if it isn’t in your model ensemble, you don’t consider it!  What DIRECT does is to emphasize the importance of spending more time in the conceptualization phase.  It also records the decisions that we have made in constructing our ensemble – that is, we at least know whether we considered (and rejected) an idea!  This helps to avoid making ‘avoidable’ mistakes.

5/27/2016 – Peter Steen Mikkelsen – Danish Technical University

Has DIRECT been taken up and, if so, by whom?

So far, we have about seven cases.  For the most part, these have been customers that have problems that are complex enough to require multimodel analysis and who recognize that they are going to have to revisit the modeling effort.  So, they are willing to pay higher up front costs with the realization that the ongoing costs are lower.  That is, we don’t need to recalibrate a model whenever we collect new data – we simply shift the model likelihoods as we extend each model’s fit statistic.

5/27/2016 – Audience Member – Danish Technical University, Copenhagen

If you are using DIRECT to quantify recharge, how can you test models if you don’t know the recharge value?  Further, doesn’t it require that you make your models more complex to better define recharge processes?

I hadn’t thought of it this way, but I think that it actually points to the strength of DIRECT.  Essentially, we are proposing that you can build multiple competing models, with different representations of recharge, for instance, and then determine if they actually matter for your specific question.  It may be that the analysis shows that recharge is not a key driver to the question at hand.  In that case, you can use a simpler recharge representation rather than spending time and money to better define recharge.  However, if recharge is critical, then you can identify which conceptualizations have more important implications and then design data collection efforts to test the validity of these conceptualizations.

5/26/16 – Giuliano di Baldassarre – Uppsala University, Sweden

If our knowledge is too limited, shouldn’t we make decisions without a model? Also, building on the nice reference you made in your talk about biodiversity, I wonder: if a system changes significantly, we might need a completely new model, the way in which new species come in.  How can you deal with this?

I would say that we ALWAYS use a model.  It may not be a formal model, but if we take my definition of a model as a specific set of simplifying assumptions about a system, we are always in that position – even if it is just conceptual.  I think that your point is perhaps more that we should guard against applying more complexity than we can support with our data.  I would agree, and this is a point of much conversation in the hydrologic literature right now!  But, the counter argument is that sometimes complexity matters!  How do we know?  I would say that one way to examine this is to develop models with and without the complexity in question and see if they make importantly different predictions of interest.  If (and only if) they do, we can then use DIRECT to figure out if there are discriminatory data to test whether the complexity exists in the system.  As for the species competition … I really like this idea!  I think that it makes the point that we have to be on the look out for data that are not captured by ANY of our models.  This won’t show up in the likelihood measures … it will add misfit to all of the models and ‘come out in the wash’.  We need to be open to the idea that there are paradigm challenging data that could be collected!  (I should note that I am adding this post script more than two months after this talk – the idea of punctuated equilibrium came out of this discussion.  I now recognize in all talks that MOST data will lead to minor changes in the likelihoods of models.  But, some data will cause a radical change in our thinking and may require that we rethink the model ensemble entirely!)

5/26/16 – Kevin Bishop – Uppsala University, Sweden

How has DIRECT been taken up by the community?

That is the key question, in the end!  In terms of response to the talk, so far I would say that the most positive responses have been from regulators.  I think it is because they have to deal with the ‘dueling models’ issue and see this as a possible path forward through these conflicts.  The most skeptical response has been from consultants.  I think that this stems from a misperception that building (or explaining) 8000 models will require 8000 times the work (and no change in budget).  In many cases it will require more work up front, to generate an ensemble of models.  But, this depends in part on whether we can develop tools to help to construct variations of related models.  I think that this is certainly possible, but in its infancy in hydrology.  The task of generating multiple conceptual (or geologic or structural) models will require a cultural change and an associated change in our educational offerings.  This may be an even longer term effort, but I think it will become increasingly necessary.  In a practical sense, my former student, Colin Kikuchi, and current student, Tim Bayley, are definitely finding some success (and some challenges) in pitching the concepts to clients as consultants.  That is particularly promising to me!

5/26/16 – Jean-Marc Mayotte – Uppsala University, Sweden

Do you need to apply a threshold likelihood to avoid having too many bad models?

I think that is really important if we are using the ensemble to represent our uncertainty and/or to make decisions based on the ensemble statistics. In other words, if we 'believe' the likelihoods in any quantitative sense, then polluting our ensemble with lots of bad models can lead to nonsensical results. The key question is how and when to eliminate models. We might want to eliminate models based on a threshold. But, we also advocate using multiple likelihood measures as a way to generate a larger ensemble. The problem with these filters is that they may reduce model diversity. One approach may be to cluster models on outcomes of interest. Then, if we apply thresholds to eliminate models, we could also build in exceptions to ensure that we keep some representatives of models with possible outcomes of concern.
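
As one possible illustration of that last idea, here is a hedged sketch that prunes an ensemble by a likelihood threshold but retains at least one representative of every outcome cluster; the threshold, cluster count, and function names are assumptions for illustration only:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def prune_ensemble(weights, predictions, threshold=0.01, n_clusters=4):
    """weights: (n_models,) normalized likelihoods.
    predictions: (n_models, n_outcomes) predictions of interest.
    Returns indices of the models to keep."""
    keep = weights >= threshold
    # cluster all models on their predicted outcomes of interest
    _, labels = kmeans2(predictions.astype(float), n_clusters, minit='++')
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        if not keep[members].any():
            # retain the most likely member of an otherwise-excluded cluster
            keep[members[np.argmax(weights[members])]] = True
    return np.where(keep)[0]
```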

5/26/16 – Sven Halldin – Uppsala University

You need to be clear about the difference between models and realizations of the same model.

I should note that I am writing this response more than two months after the question.  For the first 50 or so versions of the talk, I stuck to the idea that this difference was not important.  In part, I still believe that.  With respect to the predictions of interest, I don’t think that it matters if outcomes of concern are generated by different models or different realizations of the same model.  Either can be tested against new data.  But, I have come to understand that this is too confusing.  I now think that you are right that it is important to distinguish between models and realizations – new versions of the talk do this explicitly.  I think that this difference becomes more important, and more interesting, at the point of knowledge discovery. When we try to understand why different conceptualizations lead to different predictions it becomes important to view each model as different and encompassing many realizations.

5/26/16 – Carl Bolster – USDA-ARS, Bowling Green, KY

Do you think that there is any room for the standard approach of calibrating a single model to answer a question and to use traditional statistical inference methods to estimate the model’s uncertainty, or are you suggesting a more universal paradigm shift toward using multiple models?

I think that there is room for both.  In some cases, our fundamental uncertainties are relatively low.  For example, Graham Fogg may say that a single model approach can be applied to many or most water availability models at the basin scale.  But, as our fundamental uncertainty increases, I think that our need for multimodel approaches increases, too.  In a way, we can talk about this as balancing two types of model parsimony – in terms of the complexity of each model and in terms of the number of models.  In many cases, we have gone to a highly unbalanced situation of building highly complex representations (low parsimony), but only building one version (high parsimony).  Similarly, calls to use only simple models, but many of them, shift the balance to the opposite extreme.  In the end, we need to develop our ability to produce many distinct, rival models so that we can choose the right balance of model and ensemble parsimony.

5/25/16 – Arvid Bring, Stockholm University

When you showed the old dice I thought immediately that they might be INTENTIONALLY loaded. Does DIRECT handle this consideration?

Absolutely! In fact, I think it is the key point to be made!! In many cases, we make decisions based, for whatever reason, on overestimation of the likelihood of bad outcomes. Science may be the only way to reliably test these misperceptions. The real goal of DIRECT is to discover the outcomes that will drive decisions and then to find data that can specifically test these outcomes against all other outcomes (more literally, the underlying models for these outcomes). I need to remember to make this point explicitly. Thanks!

5/25/16 – Lea Levi – Stockholm University

What advice would you give a person who is new to modeling?

What a great question!  First, I would urge them not to get too caught up in the details of modeling too quickly.  Spend a long time working with simple models to understand how changes that they make are expressed as changes in model predictions.  Once they have a feel for the end results of making changes to a model, they will be in a good position to ‘translate’ their thoughts into models.  At that point, they should try to forget about modeling for a while.  They should really spend their time practicing model conceptualization.  The tasks should be: 1) try to describe a system as simply as possible; 2) then try to come up with a competing description; and 3) try to come up with a third description that has some specific outcome.  This is very hard to do and is best done with other modelers – ideally some with experience.  I hope that this helps!

5/25/16 – John Livsey, Stockholm International Water Institute

I have seen examples of risk averse decision making. But, I have also seen some people, especially those prone to short-term thinking, who make decisions based partially on an overly optimistic interpretation of probabilities. Would DIRECT address them, too?

I have not thought about that, but it is a great point! I'm not sure that those people WANT to be dissuaded by science. But, the same approach could be used to CHALLENGE this kind of wishful thinking!

5/20/16 – Volker Ermert – University of Bonn

I am wondering if weather forecasts do a good job of using ensemble forecasts in the way that you are suggesting.  Similarly, I am also studying the spread of malaria in Africa. Listening to your talk, I realized that I was only using three malaria models for my research.  In particular, I was trying to calibrate the Liverpool Malaria Model and you are right – it was hard to decide which parameter setting was best; the objective function surface was a 'flat mountain'.  Can you comment on the application of DIRECT to these fields?

Thank you very much for your interest!  To me, DIRECT is a completely general approach that could be applied to almost any question that will be informed by science.  OK, maybe that is a bit of an overstatement.  But, I do think that we have some interesting things to contribute.  First, I should say that I am not a meteorologist or a virologist, so I may well be misrepresenting what is already done in those fields.  But, I will take a shot anyway.  As for meteorology – I would be very interested to hear if the ensemble techniques are used to identify particularly ‘bad’ storm tracks and then if those models are used to find discriminatory data to test these particular models against other models.  I haven’t heard that this is done.  Similarly, and probably more importantly, for disease propagation.  This seems like a very complex process, with models that require the analyst to make many difficult decisions.  The basic idea that we are proposing is that you should try to avoid making these decisions … instead using them to identify ‘branch points’ that can generate a new line of models.  This leads to a tree of related models that can be used both for DIRECT and for the type of knowledge generation that is being promoted in big data analyses.  I would really like to hear if you can make use of multimodel approaches in your research … stay in touch!

5/16/16 – Sergio Brudasca – Schlumberger

Are you concerned that DIRECT could be used to support fracking or other controversial activities because it requires one to prove harm?

I’m glad you asked that, so that I can clear up a misunderstanding.  The idea that I have for DIRECT is that it can be used to test any and all concerns.  One of those may be the environmental impacts of fracking.  The idea is that a stakeholder group would raise this concern.  Then we, as hydrogeologic modelers, would do our best to find a model (or models) that would lead to the concerns of the stakeholders.  Then, we could test those models against other models.  The idea is not to require that harm be proven.  But, rather, to give stakeholders a chance to test their concerns against the best scientific knowledge.  It could be that the tests support the model that predicts harm!  That would give more weight to the con arguments.  But, if the models predicting harm are found to be less sound than alternatives, it should give stakeholders some confidence that their concerns are less likely to come true.  Does that ring true for you?

5/16/16 – Michael Tso – Lancaster University

Hydrology is an applied science that answers questions. To do so, we will need discrimination-inference. But to do so, prior information plays a huge role. Take your dice example: it takes quite a lot of thinking, or going around in circles looking at probability curves, to conclude that doubles are a useful indicator of two fair, matched dice. Sometimes bringing in the helpful prior information is what takes the most time!

I agree entirely!  I think that identifying, or at least proposing, discriminatory data is a place that requires a lot of creative thinking and use of soft data.  But, on the other hand, the model ensemble can help us here.  Essentially, we can use our models to project outcomes and divide them into models of concern and other models.  Then we can use the same models to propose a wide range of possible measurements.  We can then use data exploration tools, like clustering, for example, to see which potential measurements cluster similarly with the models of concern.  Those are the discriminatory measurements!
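
As a rough illustration of that idea, here is a hedged sketch that scores each candidate measurement by how well it separates the models of concern from the rest of the ensemble; the scoring choice and names are assumptions, not the published discrimination-inference formulation:

```python
import numpy as np

def discriminatory_score(sim_obs, of_concern):
    """sim_obs: (n_models, n_candidates) value each model predicts for every
    candidate measurement; of_concern: boolean mask of models whose outcomes
    worry us.  Score = separation of the two groups' predicted values,
    relative to their spread (higher = more discriminatory)."""
    a, b = sim_obs[of_concern], sim_obs[~of_concern]
    spread = 0.5 * (a.std(axis=0) + b.std(axis=0)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / spread

# ranking: the highest-scoring candidates best separate the models of concern
# scores = discriminatory_score(sim_obs, of_concern)
# ranked = np.argsort(scores)[::-1]
```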

5/16/16 – Nicole Archer – British Geological Survey

Doesn’t your focus on data, especially as a means to develop and test models, diminish the role of experience?  If we become modellers who eschew data until it is justified, don’t we risk losing the opportunity to develop experience?

First, I should admit that I am responding to this much later than it was asked.  I have had more than two months to think of a better answer than I probably gave on the day.  But, with that caveat, please consider this idea.  I agree entirely.  There has to be a role for experience, for intuition, frankly for creativity.  But, I have come to believe that the best place for this is in model development.  The most difficult change that we are proposing is that modellers should spend much more time in the creative phase of proposing alternative models.  I can think of no better vehicle for proposing alternatives than through experience.  The more you know, the more you should be able to think ‘outside the black box’.  With that said, I still see the role of hard (and expensive) data as a test of models.  I cannot help but think that models can only be tested against other models.  Further, once you have the models, it seems only logical to attempt to determine if the proposed measurement has the capability to test models against one another.  Does this seem like a way to accommodate both experience and planned observations?

5/16/16 – Michael Tso – Lancaster University

I think that uncertainty or expected cost can only be reduced in a relative sense because there is no possible way that we can know how much uncertainty is out there. By bringing in more information, we may (i) know more about the questions, (ii) discover more that we don't know, or (iii) realize that the information tells us pretty much what we already knew. I think an important aspect of future research is not just reducing and quantifying uncertainty, but establishing practical (i.e. relative but realistic) thresholds at which we can conclude that the uncertainty reduction in a particular analysis is sufficient for the hydrological problem of interest.

This is another really insightful point.  I would say the same thing slightly differently.  The best that we can do is to define prediction uncertainty based on those things that we have considered.  This may include scenarios (which are usually discrete), model structures (which are usually limited), and parameter values (which are often explored extensively).  But, this uncertainty is NOT the same as the actual uncertainty that we would like to know to make risk-based decisions.  In fact, our ultimate goal can only be to reduce the uncertainty about the system – the uncertainty about future stresses will likely always remain.  One possible approach is to say that our structural uncertainty should be lower than some fraction of that related to future stresses.  But, to be honest, both of these quantities will always be poorly defined.  Personally, I prefer to think of DIRECT as a tool to help us to avoid making mistakes that we should have been able to avoid.  This will have to be coupled with methods that encourage us to consider a wider range of uncertainties in our analyses.

5/13/16 – Parker Wittman – Aspect Consulting, Seattle

It may be worth considering other theoretical and applied fields.  I wonder if particle physicists are more comfortable with the idea of unresolvable uncertainty than geologically minded people?  On the other hand, the FAA requires manufacturers to think of all possible eventualities in testing new planes or even new parts for planes.  Maybe that is a place to look!

Fantastic ideas!  I haven’t had a chance to follow up.  But, I will … for sure.  What is interesting is that the former is inherently uncertain, but not very applied.  The latter is highly applied, but, in some ways, more controlled.  I think that you have described a great set of bookends, between which the reality for hydrogeology probably lies!

5/13/16 – Seann McClure – Aspect Consulting

I wonder if it would be possible to use NSMC, or something like it, to construct models?

John's tools, like NSMC, are fantastic.  Often, I think that there is much more to them than we know!  In general, I think that we need to develop tools to explore model space the way that existing tools explore parameter space.  NSMC is one of those.  The trick, of course, is how to conceptualize model structure (and boundary conditions, and processes) as continuous variables that can be explored automatically!

5/13/16 – Glen Wallace – PgG – Pacific Groundwater Group, Seattle, WA

Wouldn't well A be better for testing the entire ensemble?  That is, isn't there some value in checking to make sure that your entire ensemble of models isn't wrong?

At first blush, yes.  If none of our models match A, then we have learned something profound.  But, we could learn the same, maybe more, if none match well B! The added advantage of monitoring at B is that we will learn something even if our ensemble does encompass the real system.  Regardless, I like the idea that you point out that data can have both minor and major effects on model likelihood assessment.  I think that this is in line with Gould’s idea of punctuated equilibrium.  Most stresses to a population cause very small changes in the gene pool (small movement of the models’ spheres), but some drastic environmental changes (concept rocking new data) cause rapid evolutionary change (model reconceptualization).

5/13/16 – Glen Wallace – PgG – Pacific Groundwater Group, Seattle, WA

Your transport example points to the idea that big effects should be considered when building the ensemble.  For example, we should have considered the dipping anticline as a first order feature likely to influence groundwater flow. (The structural geology would allow a reasonable projection of the feature, and model iterations could have considered the uncertainty in the projection).

That's true.  The trick is – how to determine which effects are 'big' unless we ask in the context of the question and the models!  Essentially, what we are proposing is to build a set of plausible models that can do just that.  You can project each to the time/location of decision/impact.  Then segregate the models based on the severity of their projections.  Then, and I would contend only then, can you decide what you could measure to discriminate 'important' models (or structures, or boundary conditions, etc) from others.

5/13/16 – Peter Schwartzman – PgG – Pacific Groundwater Group, Seattle, WA

What if you have a large number of groups … can you ever find cooperative measurements?

The example shown, with two stakeholders and a clearly shared optimal measurement, is pretty ideal!  Realizing that this kind of perfect overlap is unlikely in messy, real world cases, Colin and I worked on ways to quantify the discriminatory index for each group.  With these in hand, we can use a Pareto (or other trade-off) approach to find sets of measurements that are most likely to satisfy all groups while minimizing total measurement costs.
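
To make the trade-off step concrete, here is a hedged sketch assuming we already have a discriminatory index for every candidate measurement under each group's concerns; the dominance test is a generic Pareto filter, not Colin's specific formulation:

```python
import numpy as np

def pareto_candidates(di):
    """di: (n_candidates, n_groups) discriminatory index, higher is better.
    Returns indices of candidates not dominated by any other candidate."""
    keep = np.ones(len(di), dtype=bool)
    for i, row in enumerate(di):
        dominates_i = np.all(di >= row, axis=1) & np.any(di > row, axis=1)
        keep[i] = not dominates_i.any()
    return np.where(keep)[0]

# measurement cost can be folded in as one more column (negated, so that
# cheaper is 'better') before applying the same dominance test
```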

5/13/16 – Koshlan Mayer-Blackwell – PgG – Pacific Groundwater Group, Seattle, WA

Perhaps DIRECT could be useful for insurance companies, where quantifying risks of model failure can be more important than making a prediction.

Great idea!  I have been wondering where the ‘in’ might be for this more extensive analysis.  One possibility, as you suggest, is a client that has a financial interest in quantifying risk.  Honestly, I think that many clients do – but, they either don’t want to consider it, or (more and more frequently) the decision maker knows that they won’t be in office long enough to see the consequences, so they are infected with short-term thinking!  The other possibility that I have considered is through the regulatory agencies.  If companies were required to quantify risk through development of multiple, competing models, then I am quite sure that consultants would be willing and able (even happy) to comply!  At base, though, the adoption of DIRECT will require a recognition of the value of guarding against risk – especially risk that could have been foreseen with more complete analysis.

5/13/16 – Koshlan Mayer-Blackwell – PgG – Pacific Groundwater Group, Seattle, WA

How do you avoid over-weighting models due to spatially concentrated existing data?   That is, does/could DIRECT’s maximum likelihood approach account for potentially spatially redundant information?

I hadn’t thought about that!  I think that, in the DIRECT context, we are always looking for the next discriminatory data point.  So, if there is a concentration of existing data in one region, then more data there is unlikely to be selected.  But, I would have to check that!  I think that the more profound part of this comes in with model development.  It is interesting to me that we can have a strong belief in a model conceptualization, but be willing to throw it out if we are forced to do so by new data that contradicts it.  At the same time, without those data in hand, we find it almost impossible to think of competing conceptualizations.  I think that this is related to your question, too – how can we develop the ability to generate new models in the absence of data (or the absence of data in some areas in a model)?  This seems to be the big question that we have to face to make any of this work!

5/13/16 – Koshlan Mayer-Blackwell – PgG – Pacific Groundwater Group, Seattle, WA

When competing models have the potential to produce winners or losers, among stakeholders,  can you ensure a ‘fair’ likelihood weighting? That is, can ensemble weighting be ‘gamed’  – how do you ensure that they aren’t?

The likelihood measures have to be objective – even better, several different measures should be used simultaneously.  That said, you do have to trust some agent to act objectively in defining the measures.  To me, this trust will never develop if modelers are tasked with coming up with a single best representation of a system.  Essentially, they are encouraged to game likelihoods to ‘prove’ that their model is correct.  It would be a very different game if modelers were tasked with coming up with multiple plausible models.  Then, their incentive would be to come up with likelihood measures that support multiple models.  I think that would be inherently insulated against advocacy.

5/13/16 – Dan Matlock – PgG – Pacific Groundwater Group, Seattle, WA

Some clients need AN answer.  What do you do in these cases, where multiple models might not be appropriate?

Maybe we just need to think of better ways to phrase answers related to multiple models.  I can see that it is easier to process the results of one model.  But, if we know that any one model is almost certain to be wrong, how can we justify this faith?  I think it is incumbent on us to be more honest about what we do and don't know.  Essentially, we should be making a case for multiple plausible descriptions of the system, presenting these to a client, and saying that our current state of knowledge does not allow us to determine which is most likely to be correct.  They can consider the range of models and decide if the range of their predictions is consistent enough to allow them to make decisions.  Or, they can identify some of the models as critical for testing.  Then we can test that subset against the rest of the models.  I fully recognize that this may seem like an academic fantasy at this point.  But, I think it could greatly improve the real impact of the work that we do!

5/13/16 – Janet Knox – PgG – Pacific Groundwater Group, Seattle, WA

What if you discover a process that was not considered in your model ensemble?

This is an excellent point!  I think that the problem with our usual approach to developing models is that this would require us to start over again … develop a new model and work to calibrate it to the data.  With DIRECT, we would ADD models to the ensemble to test whether those with the added effect are important.  If they are important – with respect to predictions of concern – then we can look for data that can test them against the rest of the models that we have developed. If they aren't important (that is, if their predictions are not importantly different from those of the other models), then we don't have to spend time and money testing them. If nothing else, this means that we haven't wasted all the time that we spent building the previous models!

5/11/16 – Gordon Grant, US Forest Service, Pacific Northwest Research Station

I really like the merging of the utility functions and the predictive uncertainty.  I think that gives a unique insight into the decision making process.

Thanks!  I really like this approach, too.  I have become convinced (convinced myself?) that this really does collapse many different decision making styles into a single continuum representation.  Now it is just a matter of getting people to test it in the wild!

5/5/16 – Bart Nijssen – University of Washington

I like the idea of forming multiple base models and then varying them to generate a model ensemble.  In a way, each could be thought of as a story line to describe an interpretation of the system.  How do you ensure that the initial models are different and that they cover the range of possible stories?

I think that this is the biggest challenge that we face regardless of the approach that we take.  But, it is much more clearly displayed with the approach I am describing.  At some level, I think that this requires two things: better training in model conceptualization; and increased emphasis on developing competing hypotheses.  Ultimately, this is how we should be spending our time … leaving anything that can be automated to automatic approaches like parameter estimation routines.   I think that the only way that we will ultimately form robust, representative model ensembles is to make it a priority, to build tools (SUMMA?) that help us to build ensembles in structured ways, and to build tools that relieve us of the more mundane aspects of model building and calibration.  To reiterate, we need to do what we are doing for parameter estimation for the examination of model structure and external forcings.

5/4/16 – Carie-Ann Lau – Simon Fraser University

I work in geohazards, so our models are somewhat different than the models that you describe.  Do you think that you could take some of the same approaches for statistical risk-based models?

I haven't thought enough about this to give a good answer.  What I would say right now is that it would be great to try!  Ultimately, I think that the important thing is to be able to connect outcomes of concern to specific, testable models.  I haven't worked out how a probabilistic model (one that propagates uncertainties directly) can be used to make specific, testable predictions.  But, I think that is just a lack of knowledge on my part.  Maybe, if you are interested in thinking about this question, we could follow up in the months ahead?

4/26/16 – NGWA – Colorado Governor John Hickenlooper

(In response to the question, “We, as scientists, are comfortable with the knowledge that our findings include uncertainty.  But, anti-scientific elements of society present their ideas with absolute certainty.  As a politician, is it helpful for scientists to express uncertainty, or does it fuel anti-scientific arguments?”)

It is critical that scientists report uncertainty.  In fact, they need to find ways to do this more clearly and consistently.  As a society we need to accept that all decisions are probabilistic.  What is interesting is that when I speak to business people – whether they are with the chamber of commerce or if they run small or large businesses – they get this idea.  Even if they are choosing an ad campaign, they know that there isn't certainty regarding which will be effective.  It is only in the most public parts of public debate that there is a pretense that decisions can be made without uncertainty.  So, to answer your question, yes – it is very important that scientists report their uncertainty to support better, more informed decision making.

4/26/16 – Liang Guo – Independent Consultant

Can DIRECT be used to assess data quality?

Interestingly, I haven’t thought about the impacts of a few ‘bad’ data points.  There is a real danger in developing models to chase bad data – especially because we are trying to examine outlying models.  I think that the advantage that we have in the DIRECT framework is that we are always testing these models, if they lead to outcomes of concern.  So, in a sense, we would be testing the bad data if they lead us to construct bad (and important) models.  But, it is worth checking this specifically!

4/26/16 – Ravichand Paturi – University of Alaska, Fairbanks

Can DIRECT support real time measurement optimization?

I think that it can.  Essentially, we develop an ensemble of models and assess their misfit to all existing data.  That is a lot of up-front calculation.  As new data are added, it should be relatively fast to update the model likelihoods by adding the misfits to the new data to the objective function.  The real challenge lies in doing the discrimination-inference (DI) quickly.  Especially if we are considering selecting multiple measurements – then the combinations explode!  I like the idea of using clustering techniques to reduce the number of models that have to be examined for DI.  I would also think of doing DI for single observations and then considering the correlations among identified discriminatory points.  So, thinking about your question now, I would probably say yes, maybe, with some improvements!
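
One hedged sketch of that last idea: score observations individually, then select several while skipping candidates whose simulated values are strongly correlated with ones already chosen.  The correlation cutoff and names are assumptions for illustration:

```python
import numpy as np

def greedy_select(sim_obs, scores, n_pick=3, max_corr=0.9):
    """sim_obs: (n_models, n_candidates) simulated values of each candidate;
    scores: (n_candidates,) single-observation discriminatory scores.
    Picks high-scoring candidates while skipping any whose simulated values
    are strongly correlated with candidates already chosen."""
    corr = np.corrcoef(sim_obs, rowvar=False)   # candidate-by-candidate
    chosen = []
    for c in np.argsort(scores)[::-1]:
        if all(abs(corr[c, p]) < max_corr for p in chosen):
            chosen.append(c)
        if len(chosen) == n_pick:
            break
    return chosen
```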

4/26/16 – Henk Haitjema – Editor, Ground Water

I like the idea of using progressive model complexification to build understanding.  In the end, the simpler the model, the MORE powerful it is in a legal context.  If you can make your point with a back of the envelope calculation, you will win the argument.

I like this idea very much.  I can imagine this coming together with other ideas that have come up during my tour as follows.  A simple model would form the basis of a representation of a system.  At each point that more complexity is needed (assuming that this is accompanied with uncertainty regarding formulations and structures) the model space would ‘branch’.  Ultimately, this would lead to a tree of related models.  If there is uncertainty about the simple model, this could be repeated to establish a ‘forest’ of models.  This could then take advantage of tools that have been developed for machine learning analyses.

4/26/16 – Michael Frind – Independent Consultant, Calgary

If you have a constant data stream, do you need to recalibrate all of the models in your ensemble continuously?

No … in fact, this is why multimodeling approaches can avoid ‘model freeze’.  The idea is that you simply update the model likelihoods, which may just result in a subtle change in the weight given to each model.  (Or, if the data are revolutionary, to a jump, as in punctuated equilibrium in evolution!)  In fact, it is even better than that.  Given that each data point is an independent element in the summation of the objective function, you only have to add the new elements that relate to the new observations and renormalize across all of the models.
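
A minimal sketch of that updating step, assuming sum-of-squares objective functions and a simple Gaussian likelihood (the class and its names are illustrative assumptions only):

```python
import numpy as np

class EnsembleLikelihood:
    """Running likelihood bookkeeping for a fixed model ensemble."""
    def __init__(self, n_models, sigma=1.0):
        self.sse = np.zeros(n_models)   # running objective function per model
        self.sigma = sigma

    def update(self, simulated_new, observed_new):
        """simulated_new: (n_models, n_new) values each model predicts for the
        new observations; observed_new: (n_new,) measured values.
        Only the new misfit terms are added; no model is recalibrated."""
        self.sse += ((simulated_new - observed_new) ** 2).sum(axis=1)
        log_like = -0.5 * self.sse / self.sigma**2
        w = np.exp(log_like - log_like.max())
        return w / w.sum()              # renormalize across all models
```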

4/15/16 – Dave Hyndman – Michigan State University

We have been working with methods to characterize the detailed hydrogeologic structure of heterogeneous materials, and fill these structures  with stochastic representations of heterogeneous hydraulic conductivity.  For many of the applications that we have tried, this structure tends to be very important to capture – especially when considering solute transport.  Given the infinite number of possible detailed structures, how can you apply DIRECT in an inferential mode to these data?

That is a very tough one.  I think that the simple answer is that you probably can’t.  At least not as described.  But, I think that you could do something a bit more abstract.  For instance, you could run many models with different structural characteristics (e.g. degree of connectivity, transition probabilities, etc) and look for which characteristics are most important for your outcome of concern.  Then you could design detailed characterization efforts to capture these characteristics.  This may have more to do with how to adaptively design your field characterization than whether or not to collect detailed data.  But, I think that the outcome is similar.  Ultimately, we can either try to develop tools that can help guide our data collection efforts, or we have to rely on ad hoc approaches (or accept high costs due to oversampling).  This does deserve more thought, though … let me get back to you!

4/15/16 – Austin Parish – Michigan State University

Are there guidelines for developing different models to populate an ensemble?

To my knowledge, no.  But I think that we need them. Ultimately, I think that this creative aspect of proposing models should be the real task of modelers.  Then, as we have done for parameter estimation, we should develop routines to fill models in between these proposed models.  I also like the ideas that Martyn Clark is developing that formulate a structure for developing models.  As I understand it, models will be developed by starting with a model (or multiple base models) and developing a tree of related models.  Each design decision is a branch point, leading to a family of models that have clear relationships among them.  One great thing about this is that it allows us to figure out which decisions lead to outcomes of concern.  This kind of knowledge generation is standard in machine learning and other model-free approaches.  Eventually, we need to do something like this, too.
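
As a toy illustration of that branching idea, every path through a small set of branch points defines one member of a related model family; the decision names below are purely hypothetical and stand in for real conceptual choices:

```python
from itertools import product

# each design decision is a branch point
branch_points = {
    'recharge':  ['uniform', 'spatially_distributed'],
    'structure': ['layered', 'faulted'],
    'boundary':  ['fixed_head', 'specified_flux'],
}

# every combination of choices is one member of the model family, and its
# path through the tree records exactly how it differs from its siblings
model_family = [dict(zip(branch_points, choices))
                for choices in product(*branch_points.values())]
print(len(model_family))   # 2 x 2 x 2 = 8 related models
```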

4/15/16 – Tyrone Rooney – Michigan State University

Your talk is focused on the value of models to support decision making and to guide data collection.  One question that I always have is, how do you properly credit data collectors?

I am a measurement guy at heart, so I understand your concern!  I think DIRECT allows modelers to appreciate the value of good quality measurements.  By requiring modelers to consider, explicitly, the impact of measurements on model likelihood, prediction uncertainty, and decision, we can start to bridge the (artificial) gap in our discipline.  But, appreciation and acknowledgement are different things … the latter will require a cultural change!

4/15/16 – Warren Wood – MSU

Regarding your concern that this type of analysis is not getting into consulting practice, perhaps it would be appropriate for the USGS to lead the charge.  It was John Bredehoeft, in the mid-1960s, who brought numerical modeling to the USGS amid much kicking and screaming.  It eventually took hold in consulting practice.  I wonder if the USGS could push the idea of multimodel analysis now; it seems like a natural for many USGS problems.

I agree.  In my mind, the USGS has been the leading force in high-end applied modeling.  Andy Leaf and Mike Fienen are current great examples, and they may be just the people to push these ideas forward.  I have been very encouraged by my conversations with people across the USGS – many noted on the blog.  I will definitely follow up on the possibility of working with GS folks to put some things in motion!

4/13/16 – Peter Riemersma – GVSU

I think that it is important to emphasize that DIRECT requires (or allows) a modeler to spend more time and effort on conceptualization.

I have modified the talk to reflect this … it is a key point.  To me, the key to effective multi-model analysis is that modelers devote their efforts to the creative aspects of model building.  I don’t think that this occurs in the calibration phase – I think that automated tools are ideal for exploring the space efficiently and completely.  I think that we need to redefine the modeling task as trying to come up with as many different but plausible models as we can to represent systems.  Then the ensemble can act as a whole to define what we do and don’t know!

4/8/16 – Tony Daus – GSI, Environmental

When should we bring in the lawyers?  (As a follow up, Tony provided a document showing a case where MC analysis was used in the penalty phase.  The article includes a fantastic explanation of why multimodel analysis is consistent with a legal framework!)

This is a must read.  It is great to see that probabilistic analyses are entering the legal domain.  Moreover, this provides a very solid argument for them – we already expect probabilistic measures of other evidence, such as DNA analysis or fingerprints or eyewitness testimony – why shouldn't we hold expert witnesses to the same expectations?!  (Note, too, that Paul Hsieh sent me a report from the BP oil spill trial that also shows the use of MC analysis in the penalty phase … the document is on the references page.)

4/7/16 – Brett Sanders – UC Irvine

It seems like the approach of using multiple models, utility, and likelihood is a rising tide.  Is this inevitable?  Should we be training students in this way of thinking, or is it too early to do that?

This is one of the things that I am hoping to learn this year.  How far has the idea penetrated?  Are other fields using this approach and are they far ahead of us?   Personally, I think that the approach is inevitable.  Eventually, we have to recognize that no single model is ever good enough to be reliable.  We can only make robust predictions for decision making under uncertainty using multiple models.  But, this will require a cultural change in the way that we define the purpose of modeling.  That probably won’t happen quickly!

4/7/16 – Brett Sanders – UC Irvine

We are at the beginning of a project that is focused on communicating science to different groups that use risk information.  We present different groups (planners, emergency workers, etc) with different maps and ask which they would find most useful.  How does this relate to DIRECT?

This sounds fantastic!  It relates to the definition of utility functions.  I represent them as continuous curves, which is the easiest way to see how they could be used for both probabilistic and non-probabilistic decision support.  But, they could be categorical – what are the bad outcomes that will drive your decision?  What are the hydrologic predictions that relate to these outcomes?  How do we test the models that lead to these predictions against all other models of the system?  I’ll look forward to hearing what you learn about stakeholder valuation through your study!

4/5/16 – Rick Felling – Deputy Administrator, NV State Engineer's Office

Our office makes water rights decisions using groundwater flow models on a regular basis.  Often there are competing models showing different results.  In such cases, we must make a decision on which model has more merit and rely on the results of that model.  Our preference is for models that are prepared by unbiased third parties.  Alternatively, we have also observed that competing models that have similar conceptual approaches often have similar results.  This situation increases the confidence in both models and simplifies decision making.

I think that the use of an unbiased third party to assess model quality makes a lot of sense.  But, I would not be surprised to find many cases for which multiple models are equally good, making it hard to choose one.  Another alternative is to use a consensus approach to model building.  I think that this is one of the best ways forward to develop a single model that will have buy in from all parties.  I know that the USGS is really trying to rely on this approach wherever possible, too.  I only have two concerns.  First, I worry that the consensus may dissolve when the predictions start rolling in that are unfavorable to one or more sides.  Second, this is still, ultimately, a single model approach.  It has the advantage that, by stripping away details from each model, it tends to a simpler model.  But, it still neglects to explore the lower likelihood, higher risk outcomes that drive decisions.  For that reason, I feel like consensus modeling is an improvement on dueling models, but it doesn’t serve the purposes that DIRECT was designed to address.

4/4/16 – Hydrologist, Reno NV

A friend and colleague related this story to me … I was speaking with a leading hydrologist who also works in geomorphology.  I commented that geomorphologists seem to be much more constructive in their conversations than hydrologic modellers.  Without hesitation, they said, ‘it’s because they go to the field together!’

On the one hand, this is a comment about how relationships form in the field.  On the other, I think that there is something profound in this observation that relates to my talk.  Basically, I think that field people are immersed in the complexities of the natural system, which tempers their feeling that they really ‘know’ anything with certainty. In contrast, people who devote most of their time to developing models of systems can become overconfident in their level of system understanding.  Short of requiring people to sit in the field while building models, it would be good to find ways to impart that openness to fundamental uncertainty that surrounds you in the field into the office-bound model building process!

4/1/16 – Judy Detchon – my mother-in-law

Some people asked about the cost of running multiple models, what about the time required?

Great question!  Others have asked about the cost of running multiple models and the complexity of dealing with the output of a model ensemble.  But, you are right, overly complex models can simply take too long to run to be practical in a multimodel framework.  The key challenge is HOW to simplify the models appropriately!

4/1/16 -Chris Castro – Univ of Arizona

How do we apply DIRECT to short and long term atmospheric predictions?

During El Dia, one of the atmospheric science student presentations introduced the idea of Monte Carlo analyses to better define the likely track of a storm.  I think that this is a great example of how DIRECT could be applied.  If we could do this in real time, it may point to measurements that could be made that would do the best job of refining storm predictions, with an emphasis on testing the ‘most problematic’ predictions.  As for longer term applications, I am really interested in the idea of subjecting multiple (e.g. groundwater) models to multiple climate predictions.  I think that this could lead to more robust estimates of the possible impacts of climate change under the combined uncertainty of climate and subsurface representation.

4/1/16 -Tomas Goode – MWH Global

How can we sell this idea to clients who are not accustomed to paying for multiple models and may be concerned about high costs associated with the approach?

I imagine that it would differ with every client; and some may never buy it.  Actually, some probably shouldn’t – some problems can be answered sufficiently with simple models or, in fact, no models at all!  But, if the question that is being addressed faces significant uncertainties about hydrologic responses and if different responses have significantly different potential costs (or suggest different actions), then I think that you could ‘sell’ DIRECT as the best way to quantify their exposure and, specifically, to determine whether and to what degree additional hydrogeologic investigations could reduce their risk.  In particular, you can show that some portion of their exposure is DUE TO the hydrologic uncertainty and, therefore, there is value in addressing it.

4/1/16 -Hydrogeologic consultant

I recently proposed the use of time-lapse resistivity to monitor the installation of a grout curtain in a fractured rock region near the core of a new embankment dam we were constructing.  The proposal was denied internally because it called into question standard construction practices, and may have indicated to our client that we were uncertain regarding the construction methods proposed.  I recognize some of the logic in this, but this type of thinking is generational.  Breaking in a new standard of practice will take time.

I think that this is a very general problem and it speaks to the way that we communicate uncertainty.  To move forward, we really need to find a way to describe uncertainty in accessible ways that demonstrate that uncertainty does not equal lack of knowledge.  I think that there must be a sports analogy.  At the beginning of the season, any team could win the Superbowl.  Halfway through the season, we are a bit more certain which team will win, but we could still be surprised.  As the season progresses, we start to eliminate some teams, but we are still uncertain.  But, this uncertainty is SMALLER and DIFFERENT than the uncertainty at the beginning of the season.  Hmmm … still not good, but there may be something there!

4/1/16 – Tim Lahmers, University of Arizona

In the atmospheric sciences discipline, executing an atmospheric model that can physically resolve important processes (e.g. organized convection) can be computationally expensive. This is particularly true for executing Regional Climate Model (RCM) simulations, where a model simulation needs to encompass a period that is several decades (and sometimes over a century) long. In these cases, it may be feasible to only run a few model simulations. For other atmospheric science applications, like weather forecasting, running multiple models is more tractable. My question then is “what do you do if using a multi model approach to solve a problem is not feasible?”.

This is a choice that we have to make that will be application dependent.  In some cases, it may be justified to run a single, complex model that requires all of your computing resources (time, capacity, budget).  In other cases, it will be a clear choice to run many less complex models at the cost of detail and calibration effort.  My gut feeling is that we would choose the many model approach most often if we had both choices available to us.  But, so far, we have devoted our efforts to building more complex models at the expense of developing support for multimodel analyses.

4/1/16 – Jim Leenhouts – USGS AZ Water Center

I can think of several instances that we have developed models and later had to address stakeholders’ concerns through revision.  It could well have been simpler and more efficient to have developed an ensemble of models that we could expand to include later concerns, if necessary.  Worth thinking about!

It is an interesting observation, and one that I think would apply to most people who develop models for others to use.  To me, the key point is that we are committing (time, resources, reputation) to a single model that we know to be limited (or flawed, or over simplified, or just wrong).  Not only do we commit to it by discarding all other models, we over commit by spending all of our resources to ‘perfect’ that model (i.e. fit it to the data that we have at the time).  I can equally imagine that a consultant would often benefit by having multiple models, thereby ensuring against a job being taken by a rival consultant who casts doubt on assumptions that they made in developing their single model.  Equivalently, in a legal setting, it seems that it would be easier to discredit any single model that an expert witness puts forward – one questionable decision may cast doubt on the entire model.  But, how do you criticize an analysis that demonstrates that it has considered a wider range of possible representations?

4/1/16 – Paul Hsieh, USGS

You emphasize the need to form multiple models to support a more robust analysis.  It seems that there should be a measure of difference that you could apply to the models to ensure that you don’t end up with many models that are essentially equivalent due to parameter correlation.  Can you comment on how different the models need to be and how you can achieve that difference?

This is a really important point that hasn’t come up yet!  At one level, I would say that model equivalence may fall out naturally as DIRECT is applied.  That is – if two models are truly equivalent then they will make the same predictions, so they will always be grouped together in terms of their predictions of concern.  At the next level, I think that there could be implications for the model likelihoods if multiple equivalent models are included in the ensemble – essentially overweighting the model by including its clones.  Even more fundamentally, including equivalent models may give us the feeling that we have developed a broad model ensemble when we have really only ‘varied the hair and makeup’ of one model.  I haven’t thought much about how to ensure that models are truly different.  One simple approach may be to examine the predictions of interest and to have a healthy skepticism if they are too similar.  This would, at least, suggest that we should be looking more carefully for more models to include.  Ultimately, I have a feeling that these fundamental differences will come into the ensemble only if we shift our focus away from the calibration phase to the (creative) model proposal phase of model building.  I could imagine a future when most of the work of modelers is devoted to proposing alternative models that differ in important and meaningful ways.  Then, much as we do today for parameter estimation, we will rely on software to do the laborious task of constructing realizations of models that span these human-proposed ‘pilot points in model space’.
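
As a first, hedged pass at that 'healthy skepticism', one could simply flag models whose predictions of interest are nearly indistinguishable; the tolerance and names below are illustrative assumptions:

```python
import numpy as np

def near_equivalent_pairs(predictions, tol=0.05):
    """predictions: (n_models, n_predictions) predictions of interest.
    Flags pairs of models whose predictions differ by less than tol (after
    scaling by the ensemble spread) – possible 'clones' of one model."""
    scale = predictions.std(axis=0) + 1e-12
    pairs = []
    for i in range(len(predictions)):
        for j in range(i + 1, len(predictions)):
            diff = np.abs(predictions[i] - predictions[j]) / scale
            if diff.max() < tol:
                pairs.append((i, j))
    return pairs
```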

3/31/16 – Stephen Moysey, Clemson, SC

I like the approach to multimodel analysis and utility, rather than insisting on dollar values.  This seems very similar to Value of Information (VOI), in particular, the work of Jef Caers in his book “Modeling Uncertainty in the Earth Sciences”.  Can you comment on how DIRECT is different?

First, I have to admit to some ignorance, here.  I have been familiar with VOI, but only as Andrew Binley used it for ERT spatial sensitivity analysis.  On your suggestion, I read the chapter on VOI in Jef's book.  It really is excellent … very clearly stated!  I think that DIRECT is, clearly, a version of VOI.  Its value, in my eyes, is that it can support different decision styles – not strictly probabilistic, that it bridges the 'full Monty' treatment of VOI and the simple deterministic approaches that still characterize our practice, and that it provides for conceptually simple ways to implement the tenets underpinning VOI.  Nonetheless, I will absolutely link to Jef and his book.  As I read more I am sure he will become one of my listed major influences.

3/30/2016 – Todd Rasmussen – University of Georgia

Some people have said that models are only useful for hindcasting.  In this context, is a collection of models any better than one?

Ahh … a philosophical question!  It has been interesting to address the ‘all models are wrong’ idea.  At one extreme, it seems to limit us if we don’t at least aim to make near-perfect models.  On the other hand, we can be paralyzed if we really believe that all of our models are wrong.  For me, developing an ensemble of models is a good compromise.  It is still to be expected that no single model will be correct.  The next best thing is to hope that our ensemble at least spans the ‘truth’.  That, too, will probably be untrue in many cases.  But, at least we will have documented what we DID consider.  If nothing else, it may help us to avoid making mistakes that we should have foreseen!

3/30/2016 – David Radcliffe – University of Georgia

We have been using SWATCUP SUFI2 for model calibration and recognizing equifinality in this process, but we then use only one model (the best fit) for scenarios.  I’m not sure why we haven’t (part of the problem is figuring out how to incorporate the equifinality from our calibration in our scenario runs using SUFI2 — maybe with the “No observation” tool) but we’ll give it a try!

Great!  It is interesting that we tend to separate the calibration phase and the predictions for decision support.  For whatever reason, we are often more comfortable settling on a single model or a model average to use for decision making.  Let me know what you find out and please feel free to get in touch to discuss as you move forward!

3/30/2016 – Adam Milewski – University of Georgia

Is it enough to rely on the parameter variations from automated calibration?  Or, do you need to generate fundamentally different models?

I think that this is the most difficult question related to DIRECT.  The short answer is that the models should be as fundamentally different as possible to cover as much of the space of plausibility as possible.  But, this can be difficult (or too expensive) to achieve.  At a minimum, we should make use of all of the parameter sets that we propose when calibrating models.  I think that the second level of analysis should be variation of the objective function (this is similar to the variation of likelihood measures used in GLUE).  Then we get into changes in the boundary conditions – often referred to as scenario testing.  Finally, we have to consider the really hard parts – structural and conceptual differences.  Ultimately, I think that our goal should be to use modelers to generate these conceptual and structural differences … to do the creative part of the process.  Then we should develop tools (like PEST) to do for the other sources of uncertainty what we can now do for parameter uncertainty.  I think that some people are working on this (David Lesmes at DOE, Martyn Clark at NCAR, ….).  This would be a huge advance for model-based analyses!
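
To illustrate the second level (varying the likelihood measure, in the spirit of GLUE), here is a hedged sketch that keeps any model judged behavioral under at least one of several measures – a rough way to preserve ensemble diversity.  The threshold and the example measures are assumptions for illustration:

```python
import numpy as np

def behavioral_union(residuals, measures, threshold=0.1):
    """residuals: (n_models, n_obs) simulated minus observed for each model;
    measures: list of functions, each mapping residuals -> (n_models,) scores
    scaled so that higher is better.  A model is kept if ANY measure rates it
    as behavioral, which tends to preserve a more diverse ensemble."""
    keep = np.zeros(len(residuals), dtype=bool)
    for measure in measures:
        keep |= measure(residuals) >= threshold
    return np.where(keep)[0]

# example measures (illustrative only): one based on RMSE, one on mean
# absolute error; both rescaled so that higher is better
rmse_based = lambda r: 1.0 / (1.0 + np.sqrt((r**2).mean(axis=1)))
mae_based  = lambda r: 1.0 / (1.0 + np.abs(r).mean(axis=1))
```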

3/30/2016 – Adam Milewski – University of Georgia

If the goal is to construct numerous models in an attempt to find any or all that meet a certain acceptable criteria, does the type of input data used matter for those models?  For example, as a hydrogeologist who incorporates satellite remote sensing estimates into hydrologic models to offset the lack of field data, a common hurdle that I and others face is the absence of more accurate field data.  Can DIRECT be applied to this condition?

This is an increasingly important question as remotely sensed data are incorporated more commonly into distributed models.  My basic response would be that DIRECT can be applied at any stage of a project to identify the NEXT data to collect.  What is interesting to me is that it can at least seem to be more difficult to form competing models once we have more data available.  But, this is my impression from the ‘outside’ … I’d be curious to know if you still find that there are significant uncertainties even given a wealth of RS data.  (I am assuming yes, given your stated wish for more/better ground truth data!)  So, this was a bit of a rambling answer, but to summarize, I think that DIRECT (or something like it) could be a very good way to assess WHICH ground truth data is likely to be most valuable for constraining important model predictions, even if the models are constrained by RS data.

3/30/2016 – Aaron Thompson – University of Georgia

Many people make decisions based on expertise of people they trust, even if that “expertise” is only marginal to the topic at hand. For instance… ‘my brother is a hydrologist, and he says ….’. How does your presented approach intersect with this common way people make decisions?

I think that this is more common than we may think!  Not necessarily a brother, but a consultant, or professor, or popular writer … someone captures the trust of a stakeholder group and their ideas are hard to dislodge.  I think that perhaps the ONLY way to deal with this is by providing direct comparisons of their expert's predictions with those of other models.  I don't think we'll get too far by criticizing their expert's credentials or motivations.  But, we may be able to make some headway if we develop an open forum for models to compete.  Let the best models be recognized and the less good models, well, be given less weight.

3/30/16 – Leigh Askew Elkins, University of Georgia, Athens, GA

The following is a brief summary of a conversation that Adam Milewski and I had with Leigh, who works at the Carl Vinson Institute of Government.

One need for many decision-makers at the state and local level is an introduction to fundamental subsurface hydrologic processes.  This includes the connections between groundwater and surface water and the spatial extents and time frames of hydrologic processes. Discussions of scientific uncertainty are generally beyond the limits of time that they can devote to any given topic.

In general, policy makers understand the important connection between water resources and economic growth and development. (Ty – this relates to the concept of utility that I discuss.  For example, translating hydrologic outcomes to jobs or other measures of economic development is necessary to quantify those outcomes for lawmakers.)

Farmers, as noted by one during a Regional Water Council meeting, have been dealing with climate change and many hydrologic processes (e.g. groundwater – surface water connection) directly and for some time.  As a result, they are quite sophisticated and may drive future adoption of new science in Georgia.

Of necessity, regulation is applied based on political boundaries, not the absolute distance from a stream or well or source of contamination.  This can lead to inconsistencies in zoning from a purely scientific standpoint.  This is especially true when impacts that cross state boundaries affect water management.

Thanks, Leigh!  I am particularly impressed by the effort of the Legislative Academy.  This effort to inform legislators (not staff, the legislators) on scientific issues could be a model for other states.

3/28/2016 – Gregory Payne – University of West Georgia

I deal with cladistics and organism classification models in some of my classes and your comments reminded me of a quote that I present in one of my courses … “Even with a consistent method, the best tree need not be the correct tree.” R. Raff et al. 1977. Ann Rev. Ecol. Syst. 25:  351-375. Your comments concerning hydrogeology related to “data are sparse and models are incomplete, but it’s what we have so how do we use it” rings very true in organismal systematics. There are lots of models out there that show various relationships.  Chances are that none of those models are actually correct, but it may be the best model based on the data available at that point in time.  My question is – the work that you present seems to have some similarity to the approaches to identifying and producing vaccines.  Have you looked at this field?

It is great to hear these connections!  I have had the sense that the ideas underlying DIRECT were related to other fields … in fact, I have assumed that someone has already done all of this somewhere!  I haven’t looked closely at the design of drug trials, but it is now on my list!

3/28/2016 – Jim Mayer – University of West Georgia

Your description of DIRECT reminds me of the procedures used for medical trials.  Have you seen any comparisons with these procedures and hydrologic analyses?

This is a really interesting point.  Drug trials may be the most carefully designed experiments!  They generally have very little data, are working with very complicated systems, and rely on using statistical inference to extend the results to a larger population.  I think that this represents an extreme that could be very instructive for hydrologic analyses!  Just some of the things that we could adopt are double-blind studies (separation of data collection and analysis), communication of results in statistical terms, and intentional design of experiments.  To be honest, DIRECT is a highly simplified version of this … it really represents a first step that I think that we could pull off as a profession.  I’m not sure how far we will ever be able (to afford) to go down the road of the medical trial!

3/16/2016 – Tom Nolan, USGS, Reston, VA

I like the idea of using multiple models.  In the case of numerical models, this is probably most easily achieved by changing parameter values – as you described ‘models’.  For example, different parameter sets can be explored while maintaining the model in a calibrated state by slightly relaxing objective function fit criteria. We are doing something a bit different for the NAWQA project.  We are using machine learning models such as artificial neural networks and adjusting meta-parameters (e.g. the number of hidden layer nodes) to explore the bias-variance trade-off.  Interestingly, we are finding that more complex models (increasing hidden layer nodes in this case) often lead to worse predictions for new data.  I think that this is consistent with what you are suggesting – do you agree?

Absolutely!  In fact, I think that it would be a huge advance if we could develop continuous versions of conceptual/structural models for numerical analyses.  If we could spend our time defining ‘poles of attraction’ for concepts and geologic structures and then allow automated tools (like PEST for structure) to explore these model spaces, then I think that we would really be on the path to developing highly useful model ensembles for DIRECT and for many other multi-model approaches!
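
(For anyone curious, here is a toy sketch of the bias-variance pattern Tom describes, assuming scikit-learn is available; the data and variable names are entirely synthetic.  The point is only that training fit and predictive skill for held-out data need not move together as hidden-layer size grows.)

```python
# A toy illustration (assuming scikit-learn is available) of the bias-variance
# pattern described above: increasing hidden-layer size can improve the fit to
# training data while degrading predictions for held-out data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))          # synthetic predictor (e.g., a recharge index)
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)  # synthetic noisy response

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_nodes in (2, 8, 32, 128):
    model = MLPRegressor(hidden_layer_sizes=(n_nodes,), max_iter=5000, random_state=0)
    model.fit(X_train, y_train)
    print(f"{n_nodes:4d} nodes | train R2 = {model.score(X_train, y_train):.2f}"
          f" | test R2 = {model.score(X_test, y_test):.2f}")
```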

3/16/16 – Alden Provost – USGS Reston

Might the term ‘likelihood’ in your talk be misleading?  When evaluating multiple models, we often treat the process like a police lineup, with the aim being to choose the most likely suspect.  But most likely in what sense?  Not the most likely to be “true,” since none of the models is a “true” representation of reality. Your talk seems to suggest that our approach to using multiple models should be more like assembling a composite sketch based on the testimony of multiple witnesses.  Another, related, way to think of this is that maybe the “likelihood” of a given model could be interpreted as something like “credibility” (rather than probability)?  Nobody’s recollection is perfect, but some witnesses’ testimony provides more useful and accurate information than others’.  The trick is to identify the “more credible” witnesses and give their testimony more weight.

Cool ideas!  I guess that the term should be ‘more likely’, right?  Or, another way to look at it is that we shouldn’t really have the likelihoods add to one!  I think that the real point is that we should think of the likelihoods as relative in most of our discussions.  But, when it really becomes important is when we try to calculate expected costs using likelihoods.  It really highlights the point that model likelihoods are not the same as ‘likelihood of an outcome’ … as much as we would like to make the connection!
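
(A tiny sketch of that last point, with invented model names, weights, and costs: likelihoods can be carried as relative weights and normalized only when a risk-weighted expected cost is needed – and even then, the result is conditional on the ensemble, not a true outcome probability.)

```python
# A small sketch of the point above: likelihoods treated as *relative* weights,
# normalized only when we need a risk-weighted expected cost. The model names,
# weights, and costs are invented for illustration.
relative_likelihood = {"model_A": 1.0, "model_B": 0.6, "model_C": 0.1}
cost_if_model_true  = {"model_A": 0.0, "model_B": 2.0e6, "model_C": 9.0e6}  # consequence in $

total = sum(relative_likelihood.values())
normalized = {m: w / total for m, w in relative_likelihood.items()}

expected_cost = sum(normalized[m] * cost_if_model_true[m] for m in normalized)
print(f"Expected cost: ${expected_cost:,.0f}")
# Note: this is an expected cost *given the ensemble*, not the true probability
# of any outcome - the ensemble almost certainly does not span reality.
```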

3/14/2016 – Karilyn Heisen, CDM-Smith, Boston, MA

You present an example that shows how you can propose new observations to discriminate among models in your ensemble.  But, there are even more uncertainties that are not represented in your ensemble.  Can you use the results as a guide to areas in which you should focus new model development and conceptualization?

I really like this idea!  This brings together two aspects of DIRECT that I hadn’t married before.  On the one hand, we have focused on how the method can be used to guide data collection and decision making for practical problems.  In these cases, the ensemble is subdivided based on the consequences of predictions of interest (through utility functions).  We have also investigated how DIRECT can be used for purely scientific investigations – in these cases, the ensemble is subdivided based on underlying hypotheses, such as ‘what is the primary process controlling recharge?’ or ‘what is the dip and extent of this fault?’.  What you suggest could start with the practical – identifying those models that lead to a consequence of concern – and then further subdivide those models based on underlying structure or process.  I think that this could definitely help to point the way to areas that warrant further consideration when adding models to the ensemble!

3/10/2016 – David Lesmes, DOE, Germantown, MD

Your presentation focuses on practical applications of DIRECT, guided by utility functions.  Could DIRECT be applied to purely scientific questions?  (Follow up) We are developing an open, interoperable, and modular computing platform to develop community codes for hydrologic analyses.  It seems that something like this would be useful for the ideas that you are presenting.  Could you comment on that?

I’m glad that you asked about the purely scientific applications.  I generally focus on the practical because it is simpler to explain.  But, Colin’s paper explains how the tools that are applied here are directly applicable to purely scientific problems.  Essentially, in both cases, the modeler has to decide how to subdivide the model ensemble to form two competing hypothesis-groups.  In the practical case, the groups are defined based on their consequences to a stakeholder group.  In the scientific case, they can be based on underlying concepts or structural hypotheses.  In that case, the objective is to determine if there are measurements that could discriminate between or among hypotheses even in the face of all other sources of variation in proposed observations.  As to the modeling platform, yes!  I think that this is exactly what we need to move forward as a science.  I would especially love to see a platform that allowed a developer to build branching model ensembles.  That is, a modeler could define critical decision points and then propose two or more threads along which models will be constructed.  Ultimately, this could lead to an organic approach to building model ensembles that really capture our state of uncertainty about physical systems.  I can’t see how this can be achieved efficiently without basing it on an interoperable and modular structure.  In practice, it would be particularly useful if this were not restricted to one code or family of codes – the ability to include multiple codes would open the tool to much broader use and more rapid adoption.

3/10/2016 – Sally McFarlane, DOE, Germantown, MD

The atmospheric community has a similar structure for assessing the likely value of satellite measurements before they are collected, called Observing System Simulation Experiments (OSSEs).  How is DIRECT related to these approaches?

I really should point that out in the talk!  For me, this is the path to bringing two parts of my academic life together – geophysics and measurement selection.  My understanding of OSSEs is that they rely on coupled process and instrument response models.  Essentially, you can project multiple outcomes, with different models or different drivers, and then use the output to simulate what you would measure with an instrument with a non-point footprint.  Then you can infer whether there will be a measurable difference among the models/drivers.  DIRECT is very similar – we propose modeling differences in proposed measurements among models.  The difference is that we subdivide the model ensemble based on the consequences of the models (for practical applications) or based on some underlying set of hypotheses (for purely scientific applications).  What I did not point out, and should have, is that if a coupled hydrogeophysical framework is adopted, then you can consider geophysical observations as a part of the proposed observation set in DIRECT.  In fact, colleagues in Denmark are working on this problem for airborne electromagnetic methods.
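
(Here is a rough, hedged sketch of the OSSE parallel with entirely synthetic numbers: each ensemble member produces a field, a simple footprint-averaging ‘instrument’ converts it to a measurement, and we ask whether the two consequence groups would be distinguishable given the instrument noise.)

```python
# A hedged sketch of the OSSE parallel: each ensemble member produces a field,
# an instrument model averages it over a footprint, and we ask whether the two
# consequence groups would be distinguishable given the instrument noise.
import numpy as np

rng = np.random.default_rng(1)
n_cells = 50
footprint = np.ones(10) / 10                  # simple 10-cell averaging footprint

def simulate_member(shift):
    """Stand-in for a process model output (e.g., water content along a transect)."""
    return 0.25 + shift + 0.02 * rng.standard_normal(n_cells)

group_low  = [np.convolve(simulate_member(0.00), footprint, mode="valid") for _ in range(20)]
group_high = [np.convolve(simulate_member(0.05), footprint, mode="valid") for _ in range(20)]

instrument_noise = 0.01
separation = np.abs(np.mean(group_high, axis=0) - np.mean(group_low, axis=0))
measurable = separation > 2 * instrument_noise   # crude detectability criterion
print(f"{measurable.sum()} of {measurable.size} footprint locations could discriminate the groups")
```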

2/31/2016 – Delphine Dyer, undergraduate in the ANSEP program, UA Anchorage

I enjoyed your talk about modeling – it reminded me of a story I just heard about bias in science on NPR.  (She later provided this link.)

I was really impressed with the Alaska Native Science and Engineering Program at UAA.  It was clearly a supportive environment that pushed undergraduate students to look beyond their classes.  The fact that they had to present to over 100 students, in the round, was impressive … I know first hand how intimidating that can be!

2/30/2016 – Kimberley Maher – ADNR, Fairbanks, AK

How do you handle many small decisions that may have implications for later decisions?  That is, in the aggregate, small allowances may add up to a large impact.  Or, one decision can set a precedent for later decisions.

I can think of two approaches.  The first is to model the many actions when considering their cumulative impact.  This would be ideal, but may not be practical for complicated chains of decisions.  Alternatively, it seems that this could be factored into the utility function – a perceived cost of the thin edge of the wedge effect.  I’m afraid that these are pretty poor answers to a great question!

2/29/16 – Kyle Johnson – SLR, Anchorage, AK.

You talked about considering uncertainties in the model, but how do you treat the (unknown) uncertainties in the data?

Excellent point!  You can make a case that the way that we consider data is, itself, a model – either in how we assign measurement uncertainties or how we decide to weight data types (e.g. to avoid overwhelming a few flow observations by many head observations).  What I really like about this is that it suggests that we could generate more models for the ensemble by altering the objective function.  That has some fascinating implications for ‘shaping’ the ensemble based on which data we think are most valuable!
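
(To make that concrete, a minimal sketch with invented residuals: the same misfits, weighted differently between heads and fluxes, define different objective functions – and therefore different ‘calibrated’ members for the ensemble.)

```python
# A minimal sketch of treating the objective function itself as a modeling
# choice: the same residuals, weighted differently, yield different 'calibrated'
# members for the ensemble. Names and numbers are purely illustrative.
import numpy as np

head_residuals = np.array([0.4, -0.2, 0.1, 0.3, -0.5])   # many head observations (m)
flux_residuals = np.array([12.0, -8.0])                   # few streamflow observations (m3/d)

def weighted_sse(w_head, w_flux):
    """Weighted sum of squared residuals for one weighting scheme."""
    return w_head * np.sum(head_residuals**2) + w_flux * np.sum(flux_residuals**2)

for name, (w_h, w_f) in {"heads_dominate": (1.0, 0.001),
                         "balanced":       (1.0, 0.01),
                         "flux_dominate":  (0.1, 0.1)}.items():
    print(f"{name:15s}  objective = {weighted_sse(w_h, w_f):8.2f}")
```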

2/29/2016 – Jim Munter, Munter Consulting, University of Alaska, Anchorage

There are some examples in professional practice that are similar to DIRECT. We have a case in Alaska that is very similar to your example. A proposed mine with 1800 feet of expected drawdown. Multiple stakeholders were involved and we developed many competing models for decision support. We found that the key to this process was sensitivity analyses. I’m curious why you didn’t mention sensitivity analysis as a tool?

It may be a difference in terminology. But, I think that most sensitivity analyses are based on linear sensitivities. Things like PPR generally look to relate parameters (often individually) to predictions of interest. For problems that are linear in nature, this is clearly more efficient than MCMC approaches. But, even in these cases, I think that these (parameter) sensitivity analyses need to be coupled with investigations of boundary conditions, structures, and conceptualizations to explore prediction uncertainty. It sounds like you did this in the case that you are citing!

2/26/2016 – Deborah Tosline, BOR, Flagstaff, AZ

I am intrigued by your idea of proposing multiple models and using them to support decision making.  You mentioned that one way of doing this is to contract multiple consultants to do initial model conceptualizations – to ‘seed’ many starting points.  From a practical standpoint, this could be a contracting nightmare!

This is exactly why I wanted to reach beyond academic circles for my Darcy Lecture … real insights from the real world!  OK … scratch the idea of having multiple consultants vie for the full contract by providing proto-models.  Let me think about other approaches!

2/24/2016 – Paul Brooks, University of Utah, Salt Lake City

I can see how to implement the uncertainty in the physical modeling. But, many stakeholders have no idea what their utility function would be. Is the process stopped if you can’t define the utility functions?

I think that the RECT part requires that the uncertainties and the utility functions be quantified – especially if you are going to do risk-weighted decision making. But, I think that DI can move forward even with qualitative utility curves. But, perhaps more importantly, I think that the value of the utility curves may be in their construction. Just requiring each party to identify what is important to them and where they might be willing to compromise could be critical. If you can get them to link their priorities to hydrologic outcomes and to prioritize those outcomes, then you can design investigations to support decision making.

2/22/2016 – Paul Brooks, University of Utah, over a beer in Salt Lake City

I find that most things can be reduced to first principles, or evolution.

I can’t believe I hadn’t added a component of that in my talk. I read popular evolutionary biology (Gould, Dawkins, etc.) for fun! Thinking about it, the suggestion to propagate multiple models for decision making has a direct analogy in evolution. It is essentially equivalent to the advantage that a species accrues by retaining genetic diversity. If the environment (physical:biology::data:models) changes, it is highly advantageous to have ready members (individuals/models) pre-adapted toward the change. I will add that in my talk right away!

2/22/2016 – Vic Heilweil – USGS, Salt Lake City

The USGS has a goal of building groundwater models that are generally useful; the ultimate application of the model is not known when it is constructed. Can this be a justification for building more complete models, even if they are more complex?

The motivation is a really good one. The USGS has to build models that are as free of bias as possible, even in the context of their purpose for being built, so that there is an objective shared framework. But, I suspect that however we construct our models, even if the aim is to be as general as possible, we introduce biases through our modeling decisions. As a result, I still think that we would be better served if the USGS aimed to build a community ensemble of models rather than a community model. Then, as each specific application arose, the USGS could offer a collection of models that would, on the whole, be less subject to model-construction bias.

2/22/2016 – Christine Rumsey – USGS, Salt Lake City

How can people run a collection of models? It is often hard enough for the modeler to get their own model to run!

This would definitely require a change of practice. But, if we think about UCODE or PEST or DREAM, they are all constructed in a way that they call and run external models. Ideally, we would have a framework to run multiple models and extract the metadata needed to support decision making – something like DIRECT, but not necessarily exactly as described. This could have the added advantage that, by requiring that models be able to run in this environment, we could even harvest models that others build to add to the model ensemble.
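
(A bare-bones sketch of that wrapper idea – the directory layout, executable name, and output file below are all hypothetical: loop over the ensemble members, run each external model, and harvest one prediction of interest.)

```python
# A bare-bones sketch of the 'wrapper' idea: loop over model directories, run
# each external model as a subprocess, and harvest one prediction of interest.
# The directory layout, executable name, and output file are all hypothetical.
import subprocess
from pathlib import Path

ensemble_dirs = sorted(Path("ensemble").glob("model_*"))   # e.g., model_001, model_002, ...
predictions = {}

for run_dir in ensemble_dirs:
    # Run whatever executable/script that member uses; the name here is a placeholder.
    subprocess.run(["./run_model.sh"], cwd=run_dir, check=True)
    # Harvest a single prediction of interest written by the model (hypothetical file).
    value = float((run_dir / "prediction_of_interest.txt").read_text().strip())
    predictions[run_dir.name] = value

print(predictions)
```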

2/19/2016 – George Roadcap – Illinois State Water Survey, Urbana-Champaign

We find that our models are most useful for communicating our ideas about the system to stakeholders. They provide a basis for discussions and for engaging them in constructing models.

I think that is fantastic, too. Using models to visualize and communicate what is happening in the modeler’s mind is critical to engaging stakeholders. I just think that we could do the same with visualizing and explaining uncertainty. I think that making use of multiple models – each shown graphically and, importantly, with their differences explained clearly – can help stakeholders to understand what is uncertain and why. Again, I think that could be a key step toward helping people to understand that uncertainty doesn’t JUST mean, ‘We don’t know!’

2/19/2016 – George Roadcap – Illinois State Water Survey, Urbana-Champaign

Sometimes you collect data that, at first, you can ignore, but with time they change your conception of the system. For example, over the last few years we have realized that abandoned wells are connecting our multiaquifer systems in many areas. How do you consider these kinds of data in DIRECT?

This is a great example of a general question that I have been asked before. Basically, what do we do about the ‘unknown unknowns’? I don’t think that we can do much of anything about them. But, I think that DIRECT can help in two ways. First, it can help us to address those things that we might otherwise look back on and think, ‘I should have seen that coming.’ That is, we can at least do a better job of dealing with our known unknowns. I also think that committing to a multimodel approach can make us more willing to change our models when new data force us to reconceptualize the problem.

2/19/2016 – George Roadcap – Illinois State Water Survey, Urbana-Champaign

You seem to be making a case for less complex models. But, we find that our stakeholders increasingly expect more from models. They are becoming sophisticated regarding what models can and cannot do, but they want to have predictions into the future that they can use for planning. Doesn’t this require the construction of more complicated models, not less?

I am sure that you are right … and it is a good thing that stakeholders are approaching models with more sophistication. But, I wonder if their sophistication extends to uncertainty. It seems to me that we, as hydrologists, are often unclear how best to define, quantify, and describe uncertainty. The purpose of DIRECT is to provide a more complete description of uncertainty – one that focuses on plausible and high-cost (or otherwise important) outcomes. I think that DIRECT ultimately relies on more sophisticated stakeholders.

2/18/2016 – Craig Bethke – GSB, Urbana-Champaign

If I were to summarize the part of your talk that addresses measurement selection, I would say that it is the difference between the value of collecting data and the value of collecting the right data.

Very nicely said! I wish I had thought of that when someone at Stanford asked me how DIRECT was different than Value of Information analyses!

2/18/2016 – Jenny Druhan – UIUC, Urbana-Champaign

I like the idea of DIRECT helping us to avoid the known unknowns. But, I wonder if it could also be structured in a way that it could help us to identify unknown unknowns?

If you can see a way to do it, that would be fantastic! I think that I am too buried in the simple application of looking for model prediction differences to uncover areas of uncertainty to see a path to this. But, maybe there is something that could be done with looking for sentinel data that all of the models AGREE on. These might be the most likely data to uncover important overlooked aspects of the system. Let me know if something comes to mind for this!
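
(A rough sketch of that ‘sentinel data’ notion with synthetic numbers: rank candidate observations by how strongly ALL ensemble members agree there.  If a real measurement at a high-agreement location disagrees with everyone, something important is probably missing from the ensemble.)

```python
# A rough sketch of the 'sentinel data' idea: rank candidate observation
# locations by how strongly ALL ensemble members agree there. If a real
# measurement at a high-agreement location disagrees with everyone, something
# important is missing from the ensemble. Numbers are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n_models, n_candidate_obs = 30, 12
predicted_obs = 10 + rng.standard_normal((n_models, n_candidate_obs)) * rng.uniform(0.1, 2.0, n_candidate_obs)

spread = predicted_obs.std(axis=0)              # small spread = strong agreement
sentinel_rank = np.argsort(spread)              # best sentinel candidates first
print("Candidate observations ordered by ensemble agreement:", sentinel_rank)
```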

2/18/2016 – Rob Sanford (aka Dr. Microbe) – UIUC, Urbana-Champaign

As a microbiologist, I saw your data selection process as something like a phylogenetic tree. You suggest looking for data that identify branch points between different paths that lead to different outcomes. I also like the idea of the multimodel approach because it pushes back against the tendency for dogmatic ideas to stop us from considering new explanations. Scientific reputations are built on things that you come up with and then defend.

I never, in a thousand years, would have thought of the decision process that way; but, it is a fantastic way to view it. I think it is actually a bit different than what I had in mind. I think it is actually a bit better than what I had in mind. I also really like the idea that multimodel approaches can fight against dogma. I agree entirely – I think that the intentional construction of multiple models should make us more open to alternative explanations – even to the point of listening to other people!

2/18/2016 – Parveen Kumar – UIUC, Urbana-Champaign

Don’t we face a danger in presenting the idea that all of our models are wrong? Ultimately, from a scientific perspective, our goal is to develop the right model of a system. It may have necessary simplifications; but, that is different than being ‘wrong’. I am concerned that this mantra of ‘all models are wrong’ will discourage our students from setting a goal of getting to the right model!

You are not alone in pushing back on this idea; but, you stated the point more eloquently! I think that you are right – perhaps we should make the point that we strive to develop correct and complete models; but we often have to make decisions before they are ready. I also think that there is a danger in using the ‘all models are wrong’ idea in a public setting because it can communicate a complete lack of understanding on our part. This can decrease trust in using science to help guide decision making. I think that I will change my poem to read:

“Our data are sparse.

Our models are incomplete.

But, we must decide.”

2/17/16 – Kevin Brewer, Olivet Nazarene University, Bourbonnais, IL

(Commenting on why he was willing to pay Pascal’s price for the dice wager ….)  Remember, I AM from Nevada!

2/17/2016 – Joe Makarewicz, Olivet Nazarene University

Your approach seems ideally suited to data-limited systems.  I work in the realm of Big Data.  Often, our problem is that we have too much data!  Could your approach also be used for these problems?

That is an excellent question … I’ve never really considered it!  My impression is that the problem with Big Data can be that you are swamped with observations that have little or no information about the specific question at hand.  It seems to me that something like DIRECT, which poses multiple competing models, could also be used to focus on data that are likely to be more discriminatory, thereby reducing the noise from non-discriminatory data.  If you think more about this and have some insights I’d love to hear about them!

2/17/2016 – Jamie Kraus, Olivet Nazarene University

Do you think that, in the future, we will rely more on model-based decision making, or will rule-of-thumb approaches always be popular?

Yes!  By that, I mean that I think that the move to embedded technology – I know that is a focus here at Olivet – will naturally move us to more model-based decision making.  In fact, in many cases the decisions will be made and acted upon before we even realize that there is a question!  But, also yes – I think that we will always rely, to some extent, on gut responses or rules of thumb.  In some ways, that is a good thing – not all questions really warrant a model-based analysis!  (You don’t want to run a MATLAB code to decide what to wear in the morning.  OK, I do, but that is another story.)  I also think that Kahneman’s ideas of Fast Thinking suggest that it is human nature to rely on instinctual solutions rather than well-considered analyses.

2/16/2016 – Randy Hunt – USGS, Wisconsin

We often find that the most discriminatory data are those that are most like the prediction of interest. For example, if you are interested in flux to a river, measure flux, not heads.

That makes a lot of sense to me! I guess that the question is what to do when the property of interest is not readily measurable. Recharge is one example. Or, nonpoint source pollutant loading. Having said that, I am sure that you are right that we will often find that the best observation is obvious in hindsight. In a sense, you can think of DIRECT as an expert system to augment or train your intuition!

2/12/16 – Water Manager, San Andreas, CA

I like the idea of the state DWR providing us with a range of model predictions with likelihoods.  We are always asking for local control.  In fact, I once visited the DWR to explain that their predictions that we use for water allocations have been wrong every year for ten years in a row – and always too low.  They said that they had the authority to give us the predicted values, but not to change how the predictions are made.

I hadn’t thought of this aspect of multi-model approaches; but, it makes a lot of sense to me.  In many cases, propagating multiple models all the way to the local level could  allow for more effective feedback from those who are observing the system most closely.  By avoiding an overemphasis on one model, it may also be institutionally easier to change the likelihoods assigned to many models rather than completely changing the one model that has been accepted for some time.

2/11/16 – John Kramer, Calaveras County, CA

Is it accurate to say that your approach is aimed at planning for the worst-case scenario?

Not exactly.  We do want to consider the worst case, but only if it is supported by at least one model that does an acceptable job of fitting the data that we have.  I would just change your statement slightly to be – DIRECT aims to weight models based on both their goodness of fit to the data and the importance of their predictions to each interested party.  Ultimately, the idea is to develop a framework that intentionally investigates all plausible models while avoiding putting too much effort into identifying one (or even a few) best model(s).

2/11/2016 – Audience Member, Columbia College, Sonora, CA

I worked for a government agency for 20 years.  They used a very simple model – even though scientific advances had shown that the model was wrong.  It didn’t include important processes.  I think that this kind of institutional inertia will be the greatest hurdle to implementing the ideas that you presented.

I think that you are completely correct.  In an interesting sense, I think that this is similar to the situation that consultants and resource companies find themselves in when they have built one, very complicated, very expensive model.  They can’t afford to change it, so they are stuck with the one (wrong) model!  One of my goals for this year is to ask a broad range of people how it might be possible to encourage the use of DIRECT and similar approaches in the practice of hydrology.

2/8/16 – Dirk Kassenaar, Earthfx – Toronto, Ontario

I liked the ideas in your talk until the end when you focused heavily on parameter estimation.  We have found that there can be an overreliance on automated parameter estimation tools; without sufficient insight and constraint, they can end up with physically unrealistic parameter distributions.  As a result, they rarely provide real insight into a problem.  (Graham Fogg was part of the conversation, too.  He emphasized the importance of 3D structure, especially of high permeability units, echoing Dirk’s concerns.)  I prefer to simplify the model, include as many processes as possible, and aim to constrain using directly measured data wherever possible.  The models may not be calibrated as well; but, they have much more meaning.

This is a great point – I couldn’t agree more!  The structure of DIRECT is intentionally very loose to consider all sources of uncertainty.  I generally emphasize using the large number of parameter sets generated through automated calibration because these results are generally already available and could be used ‘for free’.  I still think that automated calibration tools are necessary for current models – even relatively simple models.  We have too many parameters to consider to calibrate manually with any efficiency or effectiveness.  But, your larger point is critical.  If we really want to explore the range of currently plausible predictions our model ensemble has to address the more fundamental sources of uncertainty – processes, drivers, and structures.  I’m going to add discussion at the end of the talk to reflect these ideas.  Thanks for a great discussion!

2/8/2016 – Graham Fogg, University of California, Davis

The example that you show is for solute transport rather than groundwater flow, which is typically much easier to model in a predictive sense than is transport. Can you comment on the differing challenges/uncertainties between transport and flow in the context of your approaches?  [Aside: challenges in transport modeling include the governing equation not representing the physics and/or chemistry very well, and the important role of locally difficult-to-resolve heterogeneity. Conversely with flow, (1) all agree on one, basic governing equation that represents the physics of saturated flow more or less perfectly, and (2) the parameters upscale rather nicely and in ways that can be tested and verified against field data that are commonly appropriate to the scale of the problem.]

This is an excellent question!  For starters, I show an example of transport only because it is simpler to conceptualize quickly and for a wide audience.  But, the general point is well taken.  Still, I would say that flow problems still have plenty of uncertainty to go around.  One common source of uncertainty is recharge – especially distributed recharge.  This is often calculated as a residual of other processes; so, it is quite subject to uncertainty as a function of assumed drivers and processes.  Similarly, flow problems with local stresses (pumping wells, recharge facilities) can have pretty significant sensitivities to parameters and geologic structures.  Finally, while flow itself may be less sensitive than transport, I would guess that much of the uncertainty driving transport questions is due to uncertainties in flow.  In fact, the example that I show, while it is a transport problem, is really dependent on flow paths that are dominated by (uncertainties in) geologic structure.

2/1/16 – Kyle Blasch – USGS, Boise, ID

One thing that interested me about your presentation was the possibility of using DIRECT to help us to avoid getting to a point that we cannot afford to change a model. Even if a good-faith actor, like the USGS, builds a model with substantial community input and buy-in, we still often end up with one model. This model then becomes the basis for important decisions such as permitting, water rights apportionment, construction of recharge projects, etc. Even if we communicate that the model should be updated to reflect future scientific advances or new data, the stakeholders have all become too invested in the model and the subsequent decisions to change it. As expected, model updates are only supported by stakeholders if it changes the outcome to their benefit. The application of DIRECT, and more importantly, how you communicate the concept of multiple underlying models that are all right or all wrong, could steer the discussion away from one single model and enforce the idea that the models should continue to be updated with new information.

I hadn’t thought about this, but it is a good idea.  Essentially, even if the process is open and people work together to form one best model, they still face many of the limitations of working with a single model.  They are more likely to ignore expensive outcomes that are plausible, but not highly likely based on the current data.  Even community model building can lead to this limitation!

2/1/16 – Jim McNamara – Boise State University

I once did a study of landslide risk for a government agency.  I presented the results as red, yellow, and green zones to show areas in which slides were highly likely, possible, and improbable.  They came back to me and said that they didn’t want yellow zones; they wanted a boundary.  Furthermore, they wanted me to define it.  Do you think you will see similar resistance to the ideas that you are proposing?

Most likely, yes!  I think that many of us recognize the value of calculating and reporting uncertainty.  But, it is a huge challenge to work with decision makers to figure out how we should provide the uncertainty and how they can use it to make better decisions.  In the case that you mentioned, I would be tempted to point out that you could choose a line anywhere within the yellow zone.  In fact, depending upon who is asked, and their risk tolerance, there would be many different correct places to put the line!  The challenge is figuring out how we can communicate the idea that there is no single right answer without sending the message that we don’t know what we’re doing.  I hope to hear from many different people with ideas about how (if?) this can be done.

1/29/16 – Regulator, Calgary AB

Ultimately, decisions are trade-offs and uncertainties lead to inefficiencies at best and bad decisions at worst.  Regulators may be a good point of entry for introducing the value of considering uncertainty in hydrologic model-based decision making.  Ultimately, they do the analyses that inform decision-makers and policy makers.  Current practice is still somewhat intuitive, even ad hoc.  But, regulators need to prepare for a new world of more sophisticated, risk- and science-based decision making.  In some ways, it can be helpful for regulators to have consultants push them to consider more sophisticated analyses; in fact, this may be necessary to drive innovation in decision making.

This is definitely encouraging.  So far, it seems like there is general recognition that a more open and thoughtful consideration of uncertainty would help everyone, from potentially responsible parties, to consultants, to regulators.  It just seems to be difficult to figure out where change should start!

1/29/16 – Gord MacMillan, Calgary, AB

I like the idea of changing our model objective to forming multiple, competing models.  It seems that these models need to be both plausible and significantly different from one another.  I wonder if it would be possible to develop a global sensitivity analysis that would find multiple different minima as focal points for different parameter sets rather than focusing only on finding the global minimum?

That seems like a great idea to me!  This comment really helped me to realize that some of our limitations in using models may arise from the underlying approach to model calibration.  That is, we generally search for a global minimum and then assess uncertainty about that model.  The exception is some of the multimodel approaches like those that Jasper Vrugt has developed.  I wonder if there is a way to modify our calibration tools to achieve this?
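
(One hedged sketch of that multi-start idea, using a synthetic misfit surface with two minima: launch local calibrations from many starting points and keep all distinct minima rather than reporting only the single best one.)

```python
# A hedged sketch of the multi-start idea: launch local calibrations from many
# starting points and keep all *distinct* minima, rather than reporting only
# the single best one. The objective here is a synthetic two-minimum surface.
import numpy as np
from scipy.optimize import minimize

def objective(p):
    """Toy misfit surface with two local minima, near p=1 and p=4."""
    x = p[0]
    return (x - 1.0)**2 * (x - 4.0)**2 + 0.5 * x

starts = np.linspace(-2, 7, 15)
minima = []
for x0 in starts:
    res = minimize(objective, x0=[x0], method="Nelder-Mead")
    if res.success and not any(abs(res.x[0] - m) < 0.2 for m in minima):
        minima.append(res.x[0])                 # keep only new, distinct minima

print("Distinct calibrated parameter values:", [round(m, 2) for m in minima])
```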

1/29/16 – Gord MacMillan, Calgary, AB

I think that it’s true that, as consultants, we would generally like to have the budget and time to include uncertainty analysis.  At least, we would like to do this where it is appropriate.  In fact, there is one regulator in Calgary who has advanced the quality of hydrologic modeling single-handedly by requiring appropriate inclusion of uncertainty analyses.  The danger is that uncertainty could be required in projects that don’t warrant the effort.

I have heard consistently that hydrogeologists would like to have the scope to do more complete analyses.  It isn’t surprising; after all, we are all scientists who are driven, at least in part, by curiosity.  I can also see that it might be difficult to determine when uncertainty analysis is (not) warranted.  Perhaps this is an advantage of propagating uncertainty to the point of decision.  If you can demonstrate that the current uncertainty regarding the physical system doesn’t impact the decision, then there is no need for further uncertainty analysis.

1/29/16 – Andrew Hinnell, Calgary, AB

The system is a bit different in Canada compared to the US.  In Alberta, the system is not as confrontational – there isn’t as much reliance on dueling models.  In fact, the developer is required to pay for consultants to review reports for the opposition.  Does this change the way that DIRECT would be applied?

I think that DIRECT (or something similar) has a role even in the complete absence of conflict.  Basically, as I see it, the goal of DIRECT is to ensure that we think broadly in making decisions, rather than making binding decisions too early in the modeling process.  It is possible, however, that this would help to change the view of the Canadian system, too.  Firstly, the idea would be that the developer would be required to construct multiple models in a transparent way.  Then, the role of the opposition could be to suggest additional models to consider as part of the analysis.  In addition, the opponent would be able to identify outcomes of concern so that these outcomes could be considered when selecting additional data for collection.  Thinking about it, I think that these advantages may be at least as valuable as reducing confrontation on model selection.  Thanks!

1/27/2016 – Carl Mendoza, University of Alberta, Edmonton

Your talk is aimed at a broad audience, but it deals with fairly complicated ideas and approaches.  This relates to a question that I have had for some time.  What do our students need to learn to be prepared to contribute as practicing hydrologists?  In particular, has the need for PhD students changed over the past 25 years?  Where we once saw PhD students as potential faculty members only – in fact, a PhD could limit your future in consulting – have we gotten to the point that most of our PhDs are bound for industry or consulting or higher-level regulatory positions?

I agree.  I have been finding that the ideas resonate with regulators, consultants, and industry people.  It seems like everyone recognizes that our field has moved to a new level of complexity that requires more advanced analyses.  This requires the educational elements of a PhD – in particular, the ability to tackle as-yet unsolved problems from conception through completion.  I will be interested to hear ideas from across our field regarding how we should focus our education of PhDs for this changing landscape!

1/26/16 – Jozsef Toth (yes THE Jozsef Toth!), University of Alberta, Edmonton

(In response to the question, ‘Do you think that numerical models are useful for explaining our scientific findings?’)  Models are very useful; but, it is hard to trust that they are constructed well.  To do this requires broad insight into hydrologic systems.  You need to have the undisciplined creativity of a geologist and the rigorous thinking of an engineer.  This is a rare set of skills, but it is the foundation of modern hydrogeology.

I have nothing to add to this – what a great description of a hydrologist!

1/22/2016 – John Hoffmann, Northern Arizona University, Flagstaff

You assign model likelihoods, based on goodness of fit to current data, to future predictions.  Is there some way to consider whether the goodness of fit is dominated by fits to historic data rather than more recent data?  Would the latter suggest better predictions for the same goodness of fit to data?

I have never thought about this, but it seems like a good idea!  I know that likelihoods and the objective functions that they are based upon are an area of active research.  But, I haven’t heard of someone taking this approach.  Does anyone else have an idea of how this could be done?  It sounds like it might be worth investigating!

1/22/2016 – Erin Young, Water Resources Manager for the City of Flagstaff, Northern Arizona University

As a project manager of groundwater modeling projects looking at longer term (100 year) impacts from pumping groundwater, that involve financial, environmental, hydrological concerns from many stakeholders, I see tremendous value in this approach. The benefit of applying DIRECT is that from the beginning the process skirts around the issue where stakeholders may be knowingly or unknowingly protecting their respective interests in conducting their own modeling studies. Good for the modelers, but expensive when it is all said and done, and then it does become a case of ‘dueling models’ that we know are equally correct (because we all know there are probably 100s of modelers working with 50+ firms in AZ that would do an excellent job). What I like about this approach is you’re bringing everyone to the table to whittle down the situations or scenarios to only those that matter to the outcome, and then spending money on monitoring where it matters. To me, this puts the model to real use, rather than just sitting on the shelf (with hundreds or thousands of useful uncertainty runs contained within!) as a stake in the ground that says “well here is what OUR model shows!” I like that the DIRECT approach looks at the uncertainty in one or more models, teases out what impacts are acceptable or not acceptable to stakeholders, in order to derive the mix of model parameters that give you the acceptable or not acceptable outcomes. At this point it’s more obvious where to collect measurements to prove or disprove a model run. I am involved in regional projects that may be subject to the ‘dueling models’ as other players reveal their respective models. I like that the DIRECT approach cuts to the chase.

Fantastic!  It is one of my main goals to hear from people who are actually DOING hydrogeology and making related decisions.  I hope we have a chance to test some of the ideas I presented and fine-tune them on real-world questions.

1/21/2016 – Leslie Katz, Montgomery & Associates, TIAA Meeting

You discuss how DIRECT can be used by different interested parties to make decisions based on common scientific information.  Could the approach also be used to consider multiple objectives, or pain points, simultaneously?

We haven’t considered that yet, but I think you are completely correct.  You could either consider what data are discriminatory for each outcome of interest, or you could do a Pareto analysis to consider trade-offs between multiple outcomes of interest.
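
(A small sketch of that Pareto idea with invented options and outcome values: given two outcomes that each party wants to minimize, keep only the non-dominated data-collection options.)

```python
# A small sketch of the Pareto idea mentioned above: given two outcomes that
# each party wants to minimize (e.g., drawdown impact and supply shortfall),
# keep only the non-dominated data-collection options. Values are invented.
options = {
    "add_stream_gage":   (3.0, 7.0),
    "add_deep_well":     (2.0, 9.0),
    "add_shallow_wells": (4.0, 4.0),
    "do_nothing":        (5.0, 8.0),
}

def dominated(a, b):
    """True if option a is at least as bad as b on both outcomes and worse on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

pareto = [name for name, vals in options.items()
          if not any(dominated(vals, other) for o, other in options.items() if o != name)]
print("Non-dominated options:", pareto)
```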

1/21/2016 – Ailiang Gu, TIAA Meeting, Tucson International Airport

The approaches that you are describing require that we develop multiple models, or at least multiple parameter sets.  This might be practical for models that can be calibrated with PEST, but some of our models are too complex for automated calibration.  Can you apply DIRECT to these problems?

The short answer is probably no!  At least, I can’t see how it could be done.  But, the more complete answer would be that this may be an indication that the model is too complex.  I am really trying to make a strong case for the idea that relying on a single model can be very dangerous for decision support.  Generally, I would say that it may be better to simplify the model, even if it captures fewer details of the system, to allow for some exploration of uncertainty.  Kris Kuhlman at Sandia had an interesting insight on this – they use two types of models: process models and performance assessment models.  The first are very (extremely, in some cases) complicated.  The latter are used to explore uncertainty.  They have a formal process to determine which processes can be ignored (or considered fixed) for a given application.

1/21/2016 – Indira Balkissoon, TechLaw, Inc., TIAA Meeting, Tucson International Airport

How do you communicate the modeling to stakeholders?  Do you try to present details about uncertainties and parameters? If so, how do you do this?

In my opinion, the most important interaction with stakeholders is to collaboratively develop the utility functions.  This requires identifying possible hydrologic outcomes and discussing, in broad terms, how models make these predictions (and why they are uncertain).  I also think that it is important to encourage stakeholders to make sure that the science is addressing their question(s).  But, I don’t think that there is much benefit in presenting modeling details.

1/20/16 – Ricardo Gonzalez-Pinzon, UNM, Albuquerque, NM

Keith Beven and Markus Weiler recently published an article proposing that we should adopt common models across our community.  Given your emphasis on multiple models, I wonder if you would agree with their proposition.

First, I should say that I was a bit flip in answering this question in the presentation.  Upon further reflection, I would say that we should form a community ENSEMBLE of models.  I am hesitant to suggest that we adopt any single model (code, conceptualization, structure, boundary conditions, or parameter values) because our models, by their very nature, are all flawed.  On the other hand, as Kevin Parks later pointed out, the availability of a community model provides a very useful reduced barrier to participation by groups that may not have the funding necessary to construct a model from scratch.  Taking both of these points together, I think that we should strive to form a broad set of models, ideally with the ability to run them or at least to harvest previously run results, that would be available for all participants in water resources decisions.

1/20/16 – Sharon Desilets, HydroInnova – UNM

You mentioned that when choosing multiple measurements you need to be careful to avoid redundant information.  How can you ensure that the models in your ensemble are not redundant?

I probably need Jasper’s help to answer this … because I really don’t know the answer!  But, this is a great point.  We can certainly make sure that each parameter set / boundary condition / conceptual model combination is different.  But, that doesn’t ensure that they are sampling the possible outcomes well and efficiently.  Ultimately, as I understand it, tools like DREAM are designed to sample parameter space such that the frequency of models throughout parameter space represents the likelihood distribution of parameter values.  But, it is a great question – how do we do this when we mix sources of uncertainty?  Stay tuned, or feel free to chime in on this, anyone!
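
(One practical check, sketched here with synthetic numbers: compare ensemble members in the space of their predictions of interest, not their inputs, and flag pairs that are nearly indistinguishable.  This doesn’t guarantee good coverage of outcome space, but it at least catches obvious redundancy.)

```python
# A rough sketch of one practical check for ensemble redundancy: compare models
# in the space of their *predictions of interest*, not their inputs, and flag
# pairs that are nearly indistinguishable. Thresholds and values are invented.
import numpy as np

rng = np.random.default_rng(3)
predictions = rng.normal(size=(8, 5))           # 8 models x 5 predictions of interest
predictions[7] = predictions[0] + 0.01          # deliberately near-duplicate member

threshold = 0.1
for i in range(len(predictions)):
    for j in range(i + 1, len(predictions)):
        if np.linalg.norm(predictions[i] - predictions[j]) < threshold:
            print(f"models {i} and {j} are nearly redundant in prediction space")
```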

1/20/16 – Dagmar Llewellyn, University of New Mexico

You present three concepts: cost, utility, and preference.  Is it right to think of utility as the opposite of cost?  Is utility a better measure than cost or is it complementary?

I was not clear in my usage … I’m glad you caught this.  In fact, I am going to remove cost from the talk in future.  Utility combines cost (or benefit) with each person’s valuation of that loss or gain.  As such it is more subjective and prone to biases and attitudes about risk aversion.  This makes it difficult to quantify, but ultimately more useful.  Preference is my term for a qualitative version of utility.  (There is probably a more correct term out there … let me know if you know it, anyone!)  I imagine this to be a relatively fast and easy way to present a small number of possible hydrologic outcomes and have someone describe them as: terrible, bad,  OK, good, fantastic.  At least it would help us to identify general, relative priorities for each interested party.

1/20/16 – Audience Member, UNM

Parameter value uncertainties can be represented by continuously varying parameter values.  But, conceptual models are discrete.  Does this limit our ability to capture uncertainty?

Yes!  It would be fantastic if we had DREAM for conceptual model generation, too!  The closest that I can think of is Graham Fogg’s work with TPROGS, which lets us generate many structural models that honor the data that we have.
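
(TPROGS itself builds 3-D, data-conditioned realizations; the sketch below is only a 1-D caricature with invented transition probabilities, meant to show the underlying idea of drawing many discrete structural realizations from facies transition statistics.)

```python
# TPROGS itself builds 3-D, data-conditioned realizations; this is only a 1-D
# caricature of the underlying idea - drawing many discrete structural
# realizations from facies transition probabilities. Probabilities are invented.
import numpy as np

facies = ["sand", "silt", "clay"]
transition = np.array([[0.80, 0.15, 0.05],      # row = current facies, col = next facies
                       [0.20, 0.60, 0.20],
                       [0.05, 0.25, 0.70]])

def realization(n_cells, rng):
    """One vertical column of facies drawn from the Markov chain."""
    column = [rng.integers(len(facies))]
    for _ in range(n_cells - 1):
        column.append(rng.choice(len(facies), p=transition[column[-1]]))
    return [facies[i] for i in column]

rng = np.random.default_rng(4)
ensemble_of_structures = [realization(20, rng) for _ in range(5)]
print(ensemble_of_structures[0])
```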

1/20/16 – Geoff Freeze, Sandia National Laboratories, ABQ, NM

Your analysis is based on the expectation that added data will narrow the model ensemble.  But, what if new data don’t agree with any of the models in the ensemble?

This is an excellent point that I will emphasize in the talk going forward.  DIRECT (in fact, all of our approaches for selecting data to collect) has no way to target ‘unknown unknowns’ or Bredehoeft’s ‘surprises’.  The objective of DIRECT is to make sure that we do the best that we can with the ‘known unknowns’ – those that are included in the range of models that we have considered.  I think that the DIRECT approach encourages us to think more broadly about plausible models, which may help to deal with surprises.  But, it is critical to understand that the uncertainty that we describe, even with DIRECT, is incomplete.  Our PDF of predicted outcomes is by no means the same as the true probability that any outcome will occur!

1/19/16 – David Jordan, INTERA, GW professionals meeting, ABQ, NM

We (consultants) would often be interested in spending more time on uncertainty analysis.  Usually, the problem is budget limitation.  The best jobs are for clients who have a longer view and can therefore be more strategic in thinking about how to use uncertainty.  While there are formal methods for uncertainty analysis and optimization, these may not always be the right fit for all clients.  For example, while there are formal methods for selecting optimal monitoring locations for contaminated sites, it is more common to use professional judgement.  Thus, many of those decisions are made based on a mix of experience and regulatory requirements.  Of course, the budget also constrains our choices.

First, it is great to hear from professionals that they would prefer to have the opportunity to include more uncertainty analysis.  It suggests to me that effort should be made on the regulatory end to require it.  Excellent points about the limits of formal optimization methods and subsequent reliance on professional judgement.  One of the objectives of DIRECT is to provide a ‘loose’ framework that can be implemented without too much additional effort.  But, there will always be a need for professional judgement – if nothing else to form valid model ensembles to capture system uncertainty.

1/15/16 – Kathy Jacobs, Univ of AZ, by email after the talk

The approaches described are interesting, but it is hard to see how a real decision maker would use them.

This is an important point.  I am presenting a framework that technical people, physical scientists and engineers, can follow to present findings in a way that they provide the information needed to support decision making.  But, it is important to remember that it will require a dialogue with decision makers to figure out what is and isn’t of use to them.  It would be great to work with people who are active in researching and implementing decision making from the ‘non technical’ side to improve the framework as suggested!

1/12/16 – Dale Rucker, HydroGeoPhysics, in the question period after the talk

Is it realistic to think that any framework can promote collaboration between groups that are on opposite sides of a water resources dispute?

This is in response to a part of the talk that suggests that the framework for hydrologic analysis that I describe could allow for greater transparency and inclusion, rather than resorting to ‘dueling models’ approaches.  Dale makes a very good point, and I may have been a bit naive in my presentation.  There are definitely cases in which two sides will simply not find common ground.  But, I’d be interested in hearing ideas or examples of parties that may (or, better yet, did) benefit from efforts to find common ground to work collaboratively to develop models for addressing water resources problems.

12/16/15 – S. Mohan, Indian Institute of Technology.

Their group does many modeling projects dealing with topics ranging from mining, to seawater intrusion, to environmental impact.  They are required to report model and data uncertainties.  The Ministries for whom they work look for a threshold uncertainty of 75%. If  the uncertainty exceeds this level, they won’t use the model. But, these uncertainties are based on parameter uncertainty, not prediction uncertainty.  To account for uncertainty, they typically use a 2.5 factor of safety for designs based on groundwater models.  For very high profile applications – e.g. siting nuclear facilities – the ministry will seek expert guidance to design monitoring and characterization. Usually, that focuses on filling in spatially undersampled areas.

 

=========================================================================

Blogging about the talk itself … I thought it might be interesting to document what it is like to give a talk more than 100 times!

=========================================================================

12/7/16 -Las Vegas – The last talk was at the NGWA Groundwater Week conference.  I wasn’t sure how this would go – this is a VERY different conference than most academic conferences.  I was a bit concerned that a crowd that was most interested in drill bits and casing would find the talk far too academic.  But, it was a nice, mixed crowd.  I just went for it and let it run long (I forgot to start my timer).  Only a few people left early!  Nice comments afterwards.  Also, one guy who said, “I drill wells all over the place.  People ask how deep I’ll find water.  I tell them that I really don’t know.  My feeling is that after you do all of your modeling, you’ll be in the same situation.”  Sadly true.  But, one could also ask – if you aren’t going to try to build models, then why collect the data?  I also had a great conversation with a concerned citizen from northern Arizona.  His concerns were perfectly in line with the talk.  Quite satisfying.  So, all in all, it was a great way to finish a fantastic year!

12/5/16 – San Diego – I wondered what it would be like to start the final week.  Had I been off too long?  Had I lost the energy to pull it off?  No and no.  It felt really good.  This was an intimate crowd of 20 people.  Not much laughter, but I got plenty of head nods along the way.  The questions were really good.

11/22/16 – Munich – A large and diverse crowd.  I managed the Darcy stories and still came in under an hour.  I guess that I have the timing down on talk number 121.  I also felt that I could add some nuance to some of the points.  It has gotten to be a very interesting balance … parts of the talk are on autopilot while others still require some active thinking about how to present the material.  I don’t think that I would have expected that it would remain this fluid after so many talks.

11/18/16 – Graz – fantastic crowd.  More specifically, a wonderfully diverse crowd – climate scientists, economists, hydrologists, geologists ….  Based on the lack of laughter, there may have been some language delay.  I do talk fast.  Then again, I’m not very funny.  Great questions and comments afterwards, though.  The talk definitely seemed to resonate.

11/16/16 – Ljubljana – the crowd seemed quite engaged.  Interesting discussions afterwards.  I do, sometimes, wish that people would be less polite.  I feel like the talk misses a fair fraction of the audience.  But, it is hard to figure out why.  I think that I have learned two things about being a good audience member.  First, show clear attention.  Second, ask challenging questions.

11/14/16 – Milan – it felt like the talk went well, although the humor didn’t land.  It felt like there was a slight language processing delay.  But, it could be that I just spoke too fast or I just wasn’t funny!  Still, the conversations afterwards indicated that the concept of importance weighting and ‘a backwards view’, as Annali put it in Stuttgart, were well received.  Lots of new ideas!

11/11/16 – Karlsruhe – I was asked to do the full hour version, so I included the Darcy story.  I had almost forgotten it in Stuttgart, but it came back well today.  The talk seemed to go well. Even the jokes seemed to land.  In particular, when I said that by combining polls, Nate Silver hadn’t missed an election since … Tuesday.  I never have figured out how to describe the talk.  I think it lies between theory and practice.  But, it often ‘lands’ as quite philosophical.  Still, I’m not sure that it would be a good thing to introduce it as such.  Regardless, the talk is now largely ‘in the can’ … not too many more chances for changes!

11/10/16 – Stuttgart University – this one felt really good.  I was advised to go for the long version – it ended up at 1 minute longer than 1 hour.  I knew beforehand that it would be filmed.  So, I tried to keep in mind that this may end up as the final published version.  I also managed to work in some ideas about model averaging that came up in conversation with Wolfgang’s group yesterday.

11/8/16 – University of Tubingen – good talk.  Without the Darcy story, I came in just on time.  I think that it was well received and it seemed like the right mix of practiced and spontaneous.

11/4/16 – EAWAG, Zurich – This felt really good.  There had been a talk earlier in the day about the role of scientific advisors.  I really liked it, and invited the speakers.  They came and had a nice question and comment.  Very interesting convergence of ideas.  Afterwards, a fantastic chat about the role of quantitative uncertainty assessment.  I think that there is a lot to be discussed, especially regarding what can be learned versus what can be used and the balance of time and effort spent on detailed uncertainty assessment given known structural uncertainties.  I cut out the Darcy story and finished in 46 minutes – without feeling rushed.  It will be interesting to look back to remember what I have cut along the way.  I need to do this before the final versions in the US.  I am also leaning towards making a final voice-over version rather than a filmed one.

11/3/16 – University of Lausanne – I think that I was a bit too intimidated before this talk.  I had spoken with students who are doing very high level investigations of model construction and uncertainty analysis.  So, I assumed that the whole audience would find the talk a bit too simple.  Comments afterwards reminded me that there is always a range of backgrounds and that it is important to deliver a talk that offers something to 2/3 of the audience, not one that is aimed at the top 10%!  Still, it was a good experience and the audience seemed to follow along and get something from it.

10/28/16 – Kyoto – My largest crowd in Japan.  Many people seemed very engaged.  Some were probably still having a hard time keeping up with my English – it is fast, even though I was making a point of slowing down and enunciating.  The questions were spot on, though.  This leads me to believe that the audience was really following more closely than I may have perceived from the stage.  I actually wish that this one had been taped … I think that I really put some ideas together clearly.  It will actually be strange to go back to the ‘full’ version in Europe!

10/26/16 – Mie – I think that I managed to slow down my delivery so that it was largely understandable while staying under 50 minutes.  I just had to shave details from the formation of the prediction PDF.  In general, I think that this one went well … probably the clearest version delivered in Japan.

10/24/16 – Tokyo – A small, but interested group.  I slowed things down to try to get some of the points across.  But, I think it was still too long and too fast.  I am considering a total revamp for the remaining Japanese stops.

10/21/16 – Nagasaki – Japanese Association of Hydrogeology – interesting experience.  I only had 40 minutes – and they use a very effective bell system to announce 5 minutes left!  I managed to finish within 38 minutes.  It required stripping the NGWA front end slides and the Darcy story and skimping on the description of prediction PDFs and data selection.  I think I was still too fast to be understood by many people.  The humor did not seem to land.  But, someone asked me a question that was posed as an homage to a quote from Shakespeare!!

10/14/16 -Miami, FL – Florida International University – some very good audience connections during the talk … amazing how much difference that makes.  The audience included a number of seminar-for-credit students.  But, I think that I managed to engage most of them.  It felt like I went slow, but it was about 53 minutes.  (It looks like I will end this series with a consistent obsession with the time of the talk!)

10/12/16 – Tampa, FL – South Florida University – talk felt a bit flat.  A 5:00 pm slot is tough – sleepy audience – but, nice responses afterwards.

10/10/16 – Gainesville, FL – University of Florida- this really seemed to land well.  Good laughter and reactions throughout the crowd.  Even lots of eye contact and nods!  Great questions afterwards.  I asked if it would be OK to go long, got the thumbs up, so I included the Darcy story.  In the end it was only 51 minutes!  Seems odd … I thought that I had added quite a bit of extra material.  It is a bit of a shame to have dropped the multi-stakeholder piece.  But, time is of the essence!  I had a few times that I felt a little disconnected … like the talk was running itself … but, I don’t think that it was noticeable.

10/6/16 – Quebec – Laval University – Fantastic.  We had just shy of 30 in the room and 40 online!  The talk went long – over an hour.  But, people stayed and asked questions – even online.  Really nice feedback and interesting questions.  At this point, the struggle is between making the talk very smooth – with lots of transitions – or pretty fast.  I really prefer the latter.  But, it isn’t always possible.  The only technical glitch was a multi-forward, sticky advancer.  A bit of a bummer, but it didn’t seem to detract too much from the flow.

10/4/16 – Ottawa – Carleton University – Great audience – 40 people, broad range.  Not uproarious laughter – but, all seemed to be engaged and following.  We started with AV problems, so I gave the Darcy story before the talk.  Only one question, but plenty of interest afterwards. Followed up with the short course – really fun as always!

10/3/16 – Kingston, ON – I really enjoyed this talk.  There was a nice mix of undergrads and grads and profs and non-university folks.  People seemed to stay with me – not too much laughter, but it was a relatively small group for that.  It was especially nice to have a number of students who had taken the short course in the audience.  I was a bit long again – 53 minutes – but, I felt like I was adding conversational elements and I did include the Darcy story.  In fact, my PPT preview function didn’t work … but, I only missed the transition to a couple of slides.

9/30/16 – Waterloo, ON – get this … I was NERVOUS!  Nothing like being back in a place where you were a student.  Very nice introduction from Dave Rudolph.  I asked for permission to go long … included the Darcy story and touched on multiple stakeholder issues … only 47 minutes.  Nice.  The jokes didn’t seem to hit – except for the ‘out of money’ one, which is pretty reliable.  The only real pity was that I missed out on Tim Horton’s donuts – then again, probably for the best!

9/28/16 – Western University, London, ON – very good audience – even mix of students in a modeling class and grad students, including some in a group that studies resiliency.  It went just over 45 minutes – no time for questions (talk was limited to a ‘class hour’) but great discussion afterwards with Clare’s group.

9/26/16 – Geological Society of America, Denver – I was a bit worried because my talk was scheduled to start at 4 and the free beer started pouring at 4:30 … in a crowd of geologists!  But, this was the best … talk … ever.  Things finally clicked.  Jokes landed.  I put off the Darcy story on account of time.  In the end, I finished in UNDER 45 minutes!  Questions were really good and Peter R and Carl M both said that the revised talk was much better than their previous viewings.  I guess it was ‘lucky 100’!

9/21/16 – Foulum Research Station – a mixed crowd of soil scientists and PhD students from broader backgrounds.  I think that I need to find a better way to prepare people for what the talk is going to be.  It isn’t enough to say that it is a lecture.  A better description could help to avoid ‘disappointment’ that it is more of a general science talk than a specific research talk.

9/20/16 – Aarhus University – I gave a mini talk before the introduction to tell the Darcy story.  With that, I was actually able to finish in 50 minutes!  It went well, but it seemed perhaps a bit too practiced.  There is still a bit of a disconnect at the point of talking about multiple stakeholders, but I am reluctant to completely remove the utility curve aspect.

9/16/16 – Texas Water Development Board – I cut out the Darcy story – it gave me time to fit everything into 52 minutes – and I could also weave in the theme of two types of uncertainty and how they relate to the division of effort between parameter estimation and structural exploration.  In general, the talk felt like it went really well.  I had someone say that it was a master class in giving a technical talk.  But, more and more I only see the discontent in the faces of the audience.  Maybe my expectations have gotten too high.  I was glad that I re-added the three concerns about model averaging.  I got several comments about that being a new idea to people.  When I met with UTA grad students on Friday afternoon, several explained some of the ideas that I had presented in their own words – also a good sign!

9/15/16 – UT Austin – the talk was meant to be 45 minutes long and aimed at a broad audience.  This is doubly challenging.  But, I cut out much of the technical details and came in at 46 minutes.  Later discussions suggest that the ideas had broad resonance.  But, I always end up feeling like I didn’t give enough details to satisfy the hydrophiles in the audience.

9/14/16 – Texas A&M – For me, I nailed it. I went back to the previous version and brought forward a few new elements.  It felt like a good mix of content and entertainment.  It was also about 51 minutes long (my timer stopped!). I think I can work with this to get down to a solid 45 minutes!

9/13/16 – Houston Geological Society (Environmental and Engineering) – The talk was just OK.  I had revamped the talk to make it shorter.  Didn’t really work – still 56 minutes.  Some new elements were good – especially the shorter treatment of the dice.  But, the flow was off and I missed the critique of model averaging approaches.  I’ll revise again for TAMU!

9/13/16 – Houston, Energy Institute High School – what a fantastic experience!  I spoke with some of the staff at the school and learned that this is a magnet school, with no entry requirements, aimed at increasing diversity in STEM programs, and completely based on project work … from grade 9-12. So, I decided to do a version of the full talk at the last minute instead of a more informal discussion that I have done at other schools. It was probably a bit too detailed, but many of them stuck with me and asked great questions about multimodel analyses.  The only thing that I wish is that I hadn’t cut out the image of the old dice – that made it hard to transition.  But, I was really impressed with the students – both in the large group (~200) and small discussions (~15).  Their questions were thoughtful and insightful beyond their years!

9/12/16 – Houston, Exxon Mobil – great meeting with about 20 reservoir engineers.  I really shortened the talk for an expert audience.  I didn’t record it or check the time … but, I would guess that I ran 45 minutes (with questions).  This was followed by a solid hour of questions and discussion.  (I was glad that I kept some reference to storylines and dice, though.  The former seemed to resonate with several people.) Not surprisingly, they are working on many of the multimodel and risk-based methods that I discussed.  They haven’t progressed too much further – in part because their use of risk assessment is considerably different than ours.  In particular, the main driving consideration is outcomes that are not ‘recoverable’.  This puts a different spin on DIRECT – basically, we should weight negative outcomes based on whether they can be overcome if they arise.  But, the general principles should still apply.  They did do some interesting work with combining elicitation (expert opinion) to determine the likelihood of models and quantitative predictions to quantify the expected costs/benefits of the models.  Very informative and interesting discussions!
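For my own notes, a minimal sketch of that kind of combination – elicited model likelihoods applied to quantitative predictions, with extra weight on outcomes that cannot be recovered from – might look like the following (the model list, costs, and penalty factor are all made up for illustration, not theirs):

# Hypothetical sketch: combine elicited model likelihoods with predicted costs,
# penalizing outcomes that are flagged as non-recoverable.
models = [
    # (elicited likelihood, predicted cost in $M, recoverable if it goes wrong?)
    (0.5, 10.0, True),
    (0.3, 25.0, True),
    (0.2, 40.0, False),   # e.g. an irreversible loss of the resource
]

PENALTY = 3.0  # extra weight on non-recoverable outcomes (a risk-attitude choice)

expected_cost = sum(
    p * cost * (1.0 if recoverable else PENALTY)
    for p, cost, recoverable in models
)
print(expected_cost)   # 0.5*10 + 0.3*25 + 0.2*40*3 = 36.5

The penalty on non-recoverable outcomes is one simple way to capture their point that what drives the decision is whether a bad outcome can be overcome if it arises.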

8/31/16 – Perth, IAH – This was a disaster.  It was in a ‘hot office’, which was cool.  But, I made the mistake of borrowing the common slide advancer.  It regularly advanced two slides.  I hadn’t realized how much I relied on the timing of revealing slides to make key points.  This definitely showed me that!

8/29/16 – Darwin – There were two talks before mine; one very applied and with a heavy story component and the other quite technical.  I went 61 minutes, but was quite relaxed.  Good questions afterwards.  More and more elements seem to flow.  But, I was too far from my computer – I couldn’t see the upcoming slide!

8/26/16 – Ayr, Burdekin Public Library – fantastic.  I worked hard to reduce the length, explain things that had been obtuse, remove things that were unnecessary or awkward, and add a better conclusion.  There were 37 people there – with very, very diverse backgrounds.  Questions went on for 20 minutes … very insightful … and continued over lunch.  I never regret making the time to talk to non-experts.  I think that it sharpens my thinking and, besides, it is really appreciated by the audience.  The key change to the talk is that I now summarize the key point … doing things backwards … and why it is useful.

8/25/16 -Townsville, James Cook University – a bit of a disaster.  I worked to shorten the talk, and was on target to come in under an hour.  But, I heard a rattle at the door just before I got into ‘how DIRECT works’.  The next class was coming in!  I rushed through the end.  But, based on feedback in the hall afterwards, the key points came through.  What is missing, though, is a clear vision of how this all works.  Maybe I need to try cutting out the overview of how DIRECT works – the 30,000 foot view – and inserting more of a meat and potatoes overview.  I’ll try that for tomorrow.  I also need to remember to ASK how much time I have before I start!

8/23/16 -Brisbane, Queensland University of Technology – I worked for about 30 minutes before the talk to reduce the time.  It seemed to work … I got it down to 57 minutes.  I think I can cut more.  It was a bit rocky, as I didn’t know what was coming next, but that’s fine.  I didn’t get a lot of love from the crowd during the talk, or during questions.  But, I got some very interesting feedback from a medical researcher.  They are doing some similar work – worth following up!

8/18/16 – Hobart, University of Tasmania – this was a fun crowd, because most of them were not hydrogeologists.  It was interesting to try to aim the talk more generally, for a meeting of geologists.  It seemed to go over well.  But, it was even longer than usual!  I think that I am focusing too much on the story element.  It should come through, but I think it is getting preachy.  I also think that I could reduce the DIRECT part.  Maybe avoid the first explanation of how it is used?  Maybe condense the description of forming the PDF?  That is my next shortening target.

8/17/16 – Adelaide, Flinders University – this felt a bit like preaching to the choir.  The concepts of multimodel analysis and conceptual model uncertainty are pretty well understood here.  I think that what we have to offer is a practical approach to implementation.  But, that is a bit less satisfying to present.  Still, the main issue is that the talk is too long.  I need to cut at least ten minutes.

8/15/16 – Melbourne IAH – I was warned that Australian audiences can be quite subdued, so not to expect a big response.  But, I’d have to say, this was one of the most actively engaged audiences I’ve had!  Some of my ‘humor’ went over pretty well and they reacted to the dice games very well, too.  Questions were great – definite practical bent, but not surprising given the preponderance of consultants in the audience.  The talk really seemed to go smoothly.  I could see my computer easily and hit all of the cues for upcoming slides.  The final slide, the busy one with all of the cultural references, was better.  But, the message could still be a bit clearer at the end.  I should spend a bit of time thinking about this, especially as it is the summary slide!

8/11/16 – Geoscience Australia, Canberra – great venue and crowd.  The talk really seemed to come together – it was long, but I had several people say that I shouldn’t shorten it unless I need to fit a tight time slot.  I had access to PPT 2016 – so I could see my next slides.  That really helps!

8/10/16 – Sydney UNSW – split crowd – a few faculty who seemed to follow along closely and a number of students who may have been a bit lost. Still, the delivery seemed smooth and well connected.  Thought to add a revisit of the mountain climb before showing the genetic diversity slide – think it will help that transition.  I have a feeling that the Darcy bio part is still too long.  But, I think it has to be there as it is the Darcy Lecture after all!

8/9/16 – Sydney professionals – very well connected with the audience – they appreciated that I stopped for a swig of beer during the talk!  Feels like I am harping on the story line aspect a bit much.  But, comments afterward were uniformly excellent.  So, as Sander said, if it ain’t broke ….

8/5/16 – Lincoln – A nice auditorium with a good crowd.  Unfortunately, I couldn’t connect my computer, so I couldn’t see the upcoming slide.  I hadn’t realized how much comfort I get from that!  Really clicked … I am able to bring out more and more of the storytelling aspect.  That really seemed to resonate in questions and in later conversations with Brent.  I do think it is time for another cut … the time has crept up again!

8/1/16 – Concepcion – The talk really came together.  I was long – 1:04 – but that was because, in part, Jose had asked me to go a bit slowly to make sure that people could understand my English.  Some of the themes are gelling nicely.  One thing that had been bothering me, but came together, was how to introduce Eagleman.  I realized that I needed to point out that he speaks to dealing with uncertainty because our senses are imperfect sensors of the world.  This, too, leads to a risk-averse decision style; it is just different from the one that Kahneman discusses.  I added a slide of Messi, head in hands, for Santiago (Chile just beat Argentina for the Copa America).  That was a big hit, again.  I’m going to see if I can find a similar image for other stops.  I am now realizing that a key thing is to ask the audience to imagine someone who thinks, ‘I have a story in mind and I will do all that I can to make that story fit the facts.’  Who do they see?  I can introduce the idea of confirmation bias early – in the context of how we can help stakeholders to consider scientific findings that may counter their beliefs (their story).  Later, I can hit them with the idea that this is what we do when we calibrate models.

7/29/16 – Santiago – University of Talca and Montgomery and Associates – more of a professional group this time.  My first experience with simultaneous translators.  I really don’t know how they do it.  But, I gave them a run for their money with my speed of delivery!  I totally dropped the detective idea.  Now wondering if it is worthwhile after all.  I have to say that the talk really felt like it came together.  (Even the photo of Messi that I added to link the pain of loss to the Chilean victory in the Copa America!)  The talk was long … just over an hour.  But, it felt like the right amount of time.

7/28/16 – Santiago – Universidad Catolica – a very diverse group of academics, professionals, and government employees.  Led to very interesting questions and nice comments.  My first real mention of the detective idea.  Still needs to be developed.  The talk again crept up to 60 minutes.  But, no complaints.  I did feel a bit disconnected at times.  Forgot funny things, like that Darcy turned down the reward, and how to use stories to explain uncertainty.  I need to work those more solidly into the main story so that they come to the fore.

7/25/16 – Lima – Peruvian Geological Society – I had a translator today – that was challenging.  The guy was quite good – although some of the subtleties were lost.  Then again, they are hard enough for me to communicate, 75 talks in, in English!  The hardest thing was deciding the handoff.  Too long and he had to interpret too much content.  Too short and it sounded like repeating an oath.  I am sure that I ended up speaking much less because I had the translation pauses to think ahead to my wording in more detail.  Overall, a good experience, but not the best format to give a talk.  I will be very interested to see how the simultaneous translation goes!

7/21/16 – Beersheba – Ben Gurion University – a full room, but many were students visiting for a summer short course.  I had shortened the talk considerably – coming in under an hour and quite happy with it!  Now the trick will be to avoid adding again!  The patter feels quite natural now.  Interesting how your mind can decide on words and then deliver them while thinking about the next slides.  I think that the danger at this point is becoming too practiced.  So, this kind of off-the-cuff adjustment is probably good.

7/20/16 – Haifa – Technion University – I worked hard and got the talk down to 52 minutes!  A few transitions need to be re-added.  But, I think it is improved!  Very hard to read the audience, but Jacob Bear said it was very thoughtful and could be useful for some Israeli issues.  Wow, I’ll take that!

7/14/16 – Bloemfontein – University of Bloemfontein – I got to lecture to students about DI for 3+ hours before the lecture.  That really helped.  I think that the lecture material landed more solidly with that background.  I riffed a lot to relate to the course, so I ran to 1:10.  But, it felt pretty good.  I think I could do even more to solidify the conclusion slide.  Before the next leg – another round of cuts!

7/12/16 – Cape Town – University of the Western Cape – I really overhauled the talk.  I changed my presentation of the two aspects of uncertainty that affect decision making.  I also separated parameters and models, heading to the idea that parameter estimation is an exercise in confirmation bias.  I emphasized the story telling component – introduced it early and used it for a wrap up.  I also changed colors of some lines, according to Luke’s suggestions!

7/6/16 – Pretoria – University of Pretoria – I led a 3+ hour short course before the talk and most of the audience attended.  It was interesting that (I think) it still worked.  Luke, Ben, and Leslie were there.  They said that it has improved!  But, they also had some suggestions.  Leslie had some good ideas about the ‘story’ aspect.  Luke really picked up some nice details that needed improvement!

7/4/16 – Orleans – University of Orleans – the talk is a pretty well oiled machine now.  Some danger of forgetting things … like that the simulations were divided into polluting and non-polluting!  But, I can generally go back and catch those without too much confusion.  It is, however, time to cut time again … 1 hour and 4 minutes.  Still, plenty of questions and no mass exodus for the door!  As Sander said – if the audience doesn’t mind the length, why should I?

7/1/16 – Paris – Pierre et Marie Curie Institute – the folks here organized a workshop to coincide with my lecture.  As a result, I got to hear about a lot of their cool research, and I also had a nice big crowd for my talk!  I worked in a nice reference to my host, Damien, being from Dijon.  I also made a lot more connections throughout the talk.  The downside is that it was 1 hour and 8 minutes long!!  I think it’s time to cut again.

6/28/16 – Valencia – Jaime was a fantastic host, too.  He was concerned about the number of attendees because of the timing of the talk (between sessions).  But, it was a good crowd!  He had some excellent points – especially about clarifying how the selections are made for the example case study – I am now linking back to the ‘blue dots’ plot.  He noted that the talk was very understandable, but that it accelerated for the last 10-15%.  I take that as a good sign!  I am definitely in a ‘groove’ for the talk.  I am weaving in more connections both ahead and reaching back during the talk.  It is still fun!

6/23/16 – Barcelona – the Mediterranean relaxed attitude had me fooled at first.  It was a relatively small room, but not so full.  Then I looked up again and it was packed … almost no empty chairs!  I am working in the story theme from the beginning now.  I still need to add more than two models in the number sequence example.  I’m also toying with moving the final example right to the beginning.  We’ll see about that idea.  I think that Spanish subtitles may be a good idea for Peru and Chile.

6/21/16 – Montpellier – the University really felt like a school in Florida to me – open buildings, palm trees, very relaxed air about it.  Of course, being between sessions, it was also pretty empty. A nice crowd from several departments and outside agencies.  The questions were quite philosophical.  I think that the talk has come together and now I am noticing small bridges that I need to build … like having more than two ‘models’ in my example model building exercise.  It is amazing how important it is to have a few people who are visibly following the talk … even if they don’t end up asking questions in the end!

6/17/16 – University of Bordeaux (ENSEGID) – a small, but active group.  At times it was hard to compete with the hard rain falling outside.  But, the questions were tough and on point.  In particular, we had an interesting discussion of whether scientists SHOULD present all options if we feel that they may be abused by some decision makers.  A nice range of questions from technical to conceptual.

6/14/16 – University of Rennes – a really good turn out, mostly researchers rather than students.  Fantastic range of questions from highly detailed to conceptual to applied.  I am getting comfortable (maybe too comfortable) with the subject material.  But, I was able to work in two ideas from a conversation yesterday.  One, during questions, was the idea of using neural network proxies as a way to generate multiple models.  The other, which is really intriguing, is the concept of ‘nudging’, which is an approach in social science that seeks to communicate with people by starting from what they know and moving towards a goal rather than confronting them with a ‘scientific pronouncement’.

6/7/16 – University of Bristol – I am starting to be bothered by a few loose ends that I need to tie up – things like moving the ‘star’ to the selected prediction point.  But, I think I am adding layers of reference more and more.  Interesting to present to a group with a heavy land-surface modeling component.  They have a different association with multiple models!

5/31/16 – Geological Survey of Ireland – small group & I had met with several earlier in the day.  Interesting because modeling isn’t a major part of the work here.  But, I think many of the ideas resonated.  A bit hard to tell.

5/30/16 – Heriot-Watt – things are clicking now. Implemented Peter’s suggestion to ground the example before describing DIRECT. Not smooth, yet, but I think it will be quite helpful! Fine points now – Mine/local example missing boundary in one slide. Should move red star to point of interest shown later. Varied and excellent questions and some very nice comments.

5/27/16 – DTU, Copenhagen – the talk felt really good.  The story aspect is solid, probably the clearest message.  The description of DIRECT still gets a bit abstract.  Peter suggested reiterating the concrete example.  I like that idea very much and will try to implement it today.  The questions were excellent – a real mix of technical and applied – I am continually amazed that people find new questions to ask!  I am getting more consistent about recording questions.  That really helps later!

5/26/16 – Uppsala – as good as Stockholm, almost.  I moved Darcy before the story explanation.  Even more positive reaction to the story idea.  Too few samples to say anything for sure, but I think I’ve hit on something there.  But, I need to reorient to the slightly new flow.

5/25/16 – Stockholm – best version of the talk, yet. Put together, anticipated upcoming slides, but not too practiced.  Felt great!

5/24/16 – Wageningen.  The story aspect comes together better all the time.  One strange thing – the room had a big window behind my computer and I did find myself looking outside during the talk a couple of times! Martine had a good suggestion.  Give people a chance to figure out what the next number will be.  This will reengage after a long section of talking!

5/20/16 – Cologne – set a new record for time because I was trying to make sure that non-hydrologists understood the hydro material.  But, still, 64 minutes!  The ‘story’ aspect came together much better, including the ‘story’ of Darcy.  I am wondering if the model-building part has to be cut, or at least shortened.  Maybe I can start at the third step of model building?

5/19/16 – Juelich – the talk seemed to flow pretty well, although I need to work in the ‘story’ aspect a bit more.  I forgot to explain the Gould punctuated equilibrium model progression.  But that prompted one of a few questions. Sander made a good point – if the audience doesn’t mind the length of the talk, why am I so concerned?

5/17/16 – Liege – my first talk to an audience of non-native-English-speakers. Given how fast I talk, I was concerned! But, it seemed to go well. Many people seemed highly engaged and the questions were excellent. I added the figure diagramming stories and it seemed to be quite a hit with several people.

5/16/16 – London – I was a bit concerned about giving the talk to a collection of modellers, thinking it might be too simple.  Further, their focus was on repurposing a single agreed-upon model.

5/13/16 – Aspect Consulting, Seattle – full room in a cool space – felt very Silicon Valley!  Again, good to be in a small format.  I think the ideas resonated.  But, I didn’t leave enough time for questions and discussion.

5/12/16 – PGG, Seattle – it is great to give the talk in a small group setting. Lots of chances to ask questions and bounce ideas. Later discussions said it was easier to get the full impact than in a large format talk.

5/11/16 – OSU – down to 51 minutes!  Great room, seemed to go smoothly – most people seemed reasonably engaged, and some people very engaged.  Questions were fantastic, especially related to how to communicate uncertainty to the public in the context of DIRECT.

5/10/16 – PSU – for some reason, it didn’t seem to click today.  A few out-of-body moments where I was listening to myself talk, no real laughs.  I got some pretty strong push-back about the model photo.  Part of me thinks that it is a good idea to include it BECAUSE it evokes a strong reaction.  But, I need to think about what the message is (models have to be broadly applicable and not idealized to be useful) and how important that is in the overall scheme of the talk.

5/7/16 – Dept of Ecology – forgot to time myself, but it seemed to flow well. Still a couple of parts that could be cut or shortened. Excellent questions about how this could work for stakeholder negotiation and consensus building.

5/5/16 – UW – I was actually only 2 minutes long today … 52 minutes!  Odd room – very steep, no one in the front rows.  But, my voice carried easily!  Steve Burgess had some great critiques in the end.  I’m hoping to get some key references from him!

5/4/16 – SFU – another redesign of the talk with the expressed goal of shortening to 45 minutes or less … still 53 minutes long!  I added the ‘see-saws’.  Really good responses, mix of academic, industry, and ministry attendees.  Especially good questions from students working with BGS and studying geo-hazards risk mitigation.

5/4/16 – UBC – huge audience – filled the room and more people called in from downtown.  (I think that the plentiful Tim Horton’s donuts had something to do with the in-room attendance!)  Really good questions about the nature of the model ensemble.  Should we do more with things like stochastic finite elements?  Should we consider both continuous and discontinuous treatments?  How can we consider both uncertainty and ignorance (unknown unknowns)?  Interestingly, I had cut the unknown unknowns slide.  I was also thinking of cutting the Grandma/Ben utility curves … but, they got good laughs this time!

4/28/16 – Colorado School of Mines – really fun to be  back at Mines.  Most of the funny parts went over well and the questions were very good.  Again, I went FAR too long.  But, people stayed and several stayed after to ask some really good questions.  These led to a couple of excellent follow on meetings with professors and locally-based consultants.  Also, very nice swag … a cool crystal paper weight with the Mines logo!

4/26/16 – NGWA – very nice feedback after the talk … seemed to hit a broad audience with people saying that it ranged from informative to transformative.  Unfortunately, the questions were cut off at ONE.  Part of this is my fault.  I REALLY need to shorten the talk … still!  But, I also really need to make the point that this is not acceptable.  Why have a Darcy Lecture and then cut off the questioning just to stay on a schedule, especially when there are other choices that waste time?  This is especially galling when there are really good questions asked individually.

4/18/16 – University of Wyoming.  Good version.  I added back a brief version of the dice.  It needs to be fleshed out a bit.  Steve Holbrook saw that I misspelled Doris Kearns Goodwin’s last name!  Embarrassing.  Very good questions.  Large group of students and faculty.  Even though it went over an hour, very few people left.  Very good sign that it held their interest!

4/15/16 – Michigan State University.  Best one yet, I think.  For whatever reason, I was getting a lot of positive reinforcement during the talk.  Nods of approval, indications of surprise at the right times.  Very good questions afterwards.  Sharp and diverse audience.

4/13/16 – Grand Valley State University.  Good talk.  No questions because there was a class immediately after the talk.  My bad … I should either shorten the talk (haven’t managed this yet) or ask people to leave time available for questions after a one-hour talk.  Good discussions after the talk, though.

4/11/16 – Ohio Association of Professional Geologists.  And I’m back.  Second talk in a day, really about an hour or so in between.  I was determined to make this one work.  It flowed well, connected, plenty of laughter at the right places.  Really interesting.  I think it has a lot to do with my mood going into the talk!

4/11/16 – Ohio State University.  NOT my best work.  I was a bit rushed getting to the talk and never really recovered.  I was a half step off, kept missing transitions, didn’t know what slide was coming next.  Good questions and some nice comments.  But, it felt like it fell really flat.

4/8/16 – California Water Resources Association.  I really feel like I nailed the talk.  People stayed with me, even though it was immediately after dinner.  Great questions, including a reference to a recent court case that used multi-model methods in the penalty phase!  Nice!!

4/7/16 – UC Irvine.  I completely overhauled the talk since the last version.  The dice are gone.  The order is changed.  I cut the number of slides by 1/3 … it was still 7 minutes long!  But, I think that it is on a good path forward.  Very good questions from the audience of 65 people.  Nice mixer afterwards.  In all, a very successful talk … now it is a matter of making the new version more smooth … and a bit shorter!

4/5/16 – Nevada Ground Water Association, Reno, NV.  This didn’t seem like it went well.  It could have been the post-dinner slot.  Maybe this mixed with the talk going too long, especially the simple introductory part, made it feel like it dragged on.  Good questions and nice comments, but the energy in the room didn’t feel very ‘up’.

4/5/16 – Reno High School. This was FANTASTIC!  I met with about 50 students from three advanced / AP classes.  I talked about what science is, who Darcy was, built to the mass balance equation and Darcy’s Law.  They seemed to stay attentive and asked great questions.  Super students!

4/1/16 – UA – Tucson, AZ.  It was great to speak to a home town crowd!  The talk was still about 10 minutes long, but people seemed to stay engaged and there were lots of questions.  In fact, there were really good questions, pretty thought provoking!    I think that a couple of heavy-hitters in the audience were disappointed that the level wasn’t higher.  But, other heavy-hitters, and the rest of the audience, seemed to think it was right on target.

3/31/16 – Clemson – Clemson, SC.  The talk seemed to go well – 300+ people, only ran 5 minutes long.  But, it is hard to compete with dressed salad and cake sitting on the table!  There was no time for questions … we went straight to lunch when I finished my last slide and then I had to head out to get back to Tucson!  I think that the lesson is to plan ahead for this and say that I’ll step to the side of the stage if anyone has questions/comments under these conditions.

3/30/16 – University of Georgia – Athens, GA.  The talk really seemed to connect, even though it was still about 11 min long.  I think it hit just the right level of detail.  Unfortunately, my Surface went insane … opened 50 windows, shot through all of the slides for the talk, had to use the on-screen arrows to advance.  Nightmare.  The great thing is that there were fantastic questions … over 45 minutes!  Several points that had not been made before.  Overall, I really had a great visit there – lots of on-campus research opportunities and science/government links.

3/28/16 – West Georgia – Carrollton, GA.  I really reorganized the talk, which usually translates to running VERY long.  But, I also cut out all of the details about how DIRECT works for this audience.  I think that went well.  Some good questions from hydrologists, but everyone seemed to understand the talk.  But, it was 50 minutes long!

3/24/16 – OSU Student Water Meeting – Stillwater, OK.  Great audience comprised mostly of students from several different schools (and a good representation of faculty, too).  I went long again!  (This time ‘only’ 11 minutes over.)  It definitely felt comfortable … time to talk off the cuff and to explain points.  Good questions, too.  But, it was too long.  What to cut?

3/23/16 – EPA – Ada, OK – VERY long … 15 minutes over!  People said it didn’t seem long.  But, it was!  It’s too bad, because it felt really relaxed and seemed to flow.  (I think that I even avoided ‘swallowing’ the ends of my sentences.)  I need to figure out what to cut for tomorrow at the OSU Student Water Forum!!

3/16/16 – Forgot to start timer, felt a little long.  The screen was low, so I was a bit penned in at the side.  I ended up walking in small circles … a bit odd.  Better when I moved to the other side of the screen.  I am back to ‘swallowing’ the ends of my sentences.  Need to make a conscious effort to pronounce all the way to the end … even if my mind is already moving on to the next slide!

3/14/16 – PERFECT timing … literally 50:00.  I felt like I had time to ad lib a bit, but I was talking pretty fast!  There were plenty of good questions and comments.  Several people loved my Darcy tie … first time wearing it.

3/10/16 – I had a great chance to meet with several program managers at the DOE and some NRC folks.  Unfortunately, like Stanford, I think that I underpitched the talk.  I may need to make a more advanced version with design added back in?  I’ll spend a bit of time thinking about which other places would want that.

3/4/16 – Very cool opportunity to speak to the Alaska Native Science and Engineering Program.  There were about 80 students in a great space that they use for weekly meetings.  BUT … I only had 10 minutes.  Fantastic response – some said it was the best science talk that they had ever heard.  I really focused on the idea that, as scientists, they need to make sure to keep multiple competing hypotheses open, rather than allowing ideas to collapse onto the current best model.  Great experience!

3/2/16 – Very good response at Fairbanks.  Felt like I was hitting all the high points.  But, during the talk I thought I was losing people.  LOTS of good questions afterwards, though!

2/29/16 – Went long in Anchorage.  I need to remember that if I reach back and explain more to a non-hydro audience, I really need to make sure to cut something else … on the fly!  I also always regret adding too many transitional slides.  I just need to trust that I can make the transition; otherwise, it’s too clunky.  Mike Reeves suggested explaining how wells are selected for the case study.  Great idea, too!

2/24/16 – At UU the talk was about 3 minutes long.  I had added some transition slides and forgot they were there, so I had a few hiccups (probably not too noticeable).  I really need to make use of presenter mode!  At lunch, Kip Solomon had a good point … for non-hydro audiences, it would be good to explain why you pump to mine and how that might affect surrounding communities!  I will definitely add that in!!

2/23/16 – Kip Solomon suggested that non-hydrologists might not make the connection between pumping to dewater the well and environmental impacts.  Great insight.  It actually offers an opportunity to loop back to Darcy’s insight!

2/22/16 – Paul Brooks suggested, indirectly, adding something about evolution as an organizing principle.  Great idea!  I added it as a reason to maintain diversity in the model ensemble!!

2/19/2016 The talk was right on 50 minutes and the audience seemed to stay pretty engaged throughout. Great suggestions from the audience in Urbana Champaign. Jenny Druhan emphasized that the ‘choosing sides’ part was really engaging, but it was dropped. I think it will be a perfect bridge from the dice to the more complicated problems! Craig Bethke thought it would be good to really emphasize what is different in making decisions with a best fit model and the ensemble. I think that can really round out the utility PDF part. A PhD student noted that the utility functions should really be multimodel, too! At first I was having enough trouble just explaining utility. Maybe I have gotten far enough with that that I can attempt uncertain utility! Praveen Kumar pointed out concerns about the ‘all models are wrong’ mantra … I agree, so I even changed my haiku!

2/15/16 – I added a section about ‘should you bring an umbrella’ to point out that we are used to making decisions under uncertainty.  But, I didn’t cut enough to make up for it.  So, I ran long.  That is a killer.  The next iteration will cut time from the umbrella, dice, and model sequence parts.  Jean Bahr noted that the meat of the talk – describing what makes  a measurement discriminatory – was too rushed.

2/9/16 – great feedback from some of Steve Gorelick’s students.  In particular, to clarify what the actual decision is for the mining example and to build the utility PDF and also show the CDF.  I think that the NYT number sequence worked.  I’ll keep it in for now and see if it works for the next few talks.  I also added back the information about my background as a hydrogeophysicist.  I think it helps to put the work in context with my other work.
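The mechanics of that last suggestion are simple enough; here is a minimal sketch (the utilities and model weights below are hypothetical placeholders, not from the mining example) of turning an ensemble’s utilities into an empirical CDF to plot alongside the PDF:

# Hypothetical sketch: utilities predicted under each model in an ensemble,
# with model likelihood weights, turned into an empirical CDF for plotting.
utilities = [-2.0, 0.5, 1.0, 1.5, 3.0]        # utility of the decision under each model
weights   = [0.10, 0.25, 0.30, 0.25, 0.10]    # model likelihoods (sum to 1)

pairs = sorted(zip(utilities, weights))        # order by utility
cdf = []
running = 0.0
for u, w in pairs:
    running += w
    cdf.append((u, running))                   # P(utility <= u)

for u, p in cdf:
    print(f"P(utility <= {u:5.2f}) = {p:.2f}")

The CDF is arguably the easier curve to read off a decision threshold from, which may be why the students asked for it.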

2/7/16 – major, major reorganization of the talk.  Questionable decision given that it is immediately before the GRAC and Stanford stops!  The diffusion column is gone.  The context elephant is gone.  Introducing my early work in hydrogeophysics is gone.  I introduce the mine problem at the very beginning, just to set the stage for the kinds of applications that interest me.  I also go into more detail with the dice example, trying to use it to illustrate all elements of the structure of DIRECT.  I’ll report back on Monday night with a post-audit after the GRAC!

2/2/16 – got great feedback from the BSU crowd.  Basic message is to hit the main points … a new paradigm for using models … harder.  Also, get the crowd invested in the mining case study right at the beginning.  I tried this out for the water quality conference and I think it was much better.  For that talk, I cut the time to 42 minutes.  I think that I can implement these changes for the full length talk, too.  But, I may need to cut the diffusion in a column part.  Actually, that didn’t seem to resonate so far.  Maybe I am too in love with the idea and should try cutting it!

1/28/16 – another big crowd at the University of Calgary (100).  It can be harder to connect with a larger crowd … a bit of an out-of-body, watching-myself-talk feeling during the talk.  But, great conversations afterwards.  Figured out that I need to reorganize the part about dueling models – it doesn’t do well with build up, better to jump right to it.  Also, am going to try a ‘team of rivals’ theme to present DIRECT.  May also use ‘wisdom of the crowd’ to present model averaging.

1/26/16 – huge crowd at University of Alberta (120!).  There were a lot of undergraduate students.  Generally, they found the modeling aspects as understandable as they could be … but, not very!  A good lesson to spend a bit more time on some basics depending upon the crowd.  Consultants and regulators had great questions and comments after the talk.  Researchers in related fields found the talk accessible … not a typical modeling talk … which is very good news!

1/22/16 – good reception and very good questions.  First talk to a predominantly geology audience.  For them, some of the abstract discussions of modeling miss the mark.  It was suggested that I add some real-world images to help relate to structural and boundary condition uncertainties.  I also spoke with a social science researcher.  He pointed out that the end point of DIRECT is the starting point for their work … how do you get people to act on decisions?  Nice discussion!  He also coined a great phrase: collaborative co-generation of data.

1/21/16 – fantastic reception at the TIAA meeting.  This is the first talk to a group that is largely regulatory or management, more so than hydro-types.  Very gratifying to hear that the basic concepts of DIRECT strike a chord.  Still more call for specifics on the field study … I should put more emphasis there.

1/20/16 – my first double header!  Sandia in the morning and University of New Mexico in the afternoon.  Both groups had great questions.  Two things were particularly satisfying … 1) that it resonated with Kris Kuhlman’s high level science group; and 2) that it seemed to reach a really broad audience at UNM.  The timing seems to be down now.  I may have some room to add back some more detail on the field study.

1/16/16 – repeated comment … too much content, hard to keep up!  One part, in particular, seems to trip people up.  I’m thinking of moving that to the end and making it ‘optional on the fly’!

1/15/16 – UNM contacted me to say that they will be simultaneously translating my talk into sign language.  How cool is that!  (I do feel sorry for the translators, though … they’re going to be pretty tired by the end.)  In preparation I’m going to add notes to the slides in PPT.

1/12/16 – evening after the first Darcy talk!  The talk was attended by about 45 people at Montgomery and Associates in Tucson.  It was the topic for the first 2016 meeting of the Arizona Hydrological Society, Tucson Branch.  Friendly crowd … many of whom I have known for years.  Good questions and great comments after the talk.  Still more work to do, but getting closer!

1/12/16 – continued to work through the day.  Had one chance to practice for timing … was at 55 minutes.  It’s a bit longer than I would like, but talks can tend to contract when they are presented for real.

1/11/16 – first real dry run, with my students.  Let’s just say that I had work to do!  They were especially helpful in pointing out ways to improve the flow of the talk and to make sure that the main messages came through more clearly.  I was back to PPT until 1:00 am … then up at 4:30 to keep things moving.

1/9/16 – second poem lined up today.  Trying to find the balance between light hearted and insufficiently serious!

1/7/16 – yesterday and today I finally put all of the pieces together and timed them, roughly.  Long.  Very long.  After some work, I had it down to 65 minutes.  Last run through was 57 minutes.  Goal … 45-50 minutes.

1/5/16 – I worked on the DIRECT part of the talk today … basic description and two examples.  The description took 10 minutes first time through.  Man, I am going to need to compress!

1/3/16 – OK, so, I tried the first part of the talk for time.  I’m at 34 minutes with the introduction and first lecture segment … and that is talking too fast.  So, I will have some streamlining to do!

12/30/15 – ended the year in Death Valley … away from the internet, mostly working on paper.  Decided to push the idea of a lecture format.  Also committed to aiming to challenge the audience with a ‘what we are doing wrong’ theme.  Feeling better about the structure.  Now need to work in the positive examples.

12/15/15 – I worked on the introduction/Darcy part today.  I spoke through it for the first time … it was really clunky.  I need to cut it down so that it flows one to the next and so that the Darcy story isn’t BORING!

12/12/15 – I have been working on the talk, on-and-off, for a couple of months.  My first thought was to have a real-time choose your own adventure talk, based on audience feedback.  I’m backing off of that for now.  But, I do want to have multiple versions of the talk to address different audiences.  At this point, the talk has the following segments: introduction by host, thoughts on Darcy, the elements of decision making under uncertainty, measurement optimization … other.  The other is the part that will vary among audiences.  Some will hear about advanced measurement optimization, others about design under uncertainty, others about multimodel approaches to stakeholder negotiation.  Clearly, I have been procrastinating putting these pieces together!

 

 
