Data - The thing that ties it all together

Speech

Peter Harris gave a speech to the Committee for Economic Development of Australia (CEDA) on 22 March 2017 in Melbourne.

The Productivity Commission is in the final week of its inquiry into Data Availability and Use.

Sweeping statements to suggest that data is crucial to all our futures are so common today as to draw parallels with Paul Keating’s famous commentary about galahs and pet shops.

I’ll try to avoid that fate.

In doing so, I also hope to avoid adding to the weight of commentary that leaves the impression of substance and action.

Because in public policy, while there’s lots of opportunity, there’s not really much action. And what action there is remains seriously uncoordinated.

There’s lots of data out there, and much more is generated every day in a huge tide of noughts and ones.

A great deal of it is probably only valuable for a few moments and to one or two people. But who is to know which is what? And will the apparently unimportant today be vital tomorrow, just after we have given it away?

For an inquiry like this, it is both refreshing and a bit threatening to recognise that there is no template for good data policy.

And even worse, from our look around the world, no country would claim to be on top of the data revolution.

From our efforts we can also say that no single mental model or theoretical approach has much to offer on data or information, or any of its other near equivalents. The common view is that it is pretty important stuff and don’t mess it up.

Good advice, but not really what the Treasurer had in mind with this Terms of Reference, I’m sure.

You might think from this opening comment that we were a bit depressed at the PC when the request came to review such an all-encompassing but substantially unguided topic.

Not a bit of it. We all needed to have this Inquiry, because near enough to every one of us is in the middle of this data thing. Today, just using a supermarket or getting a flight is enough, even if you don’t use the Internet and still have a pedal wireless at home.

It’s a big shift that’s going on, possibly the biggest structural shift in the economy in a generation. And it has a long way to run yet, by all the evidence.

While data is big, it sounds pretty dull to most of us. Yet it can induce high passions, and I expect the final report will demonstrate more of this.

We have had almost as much stakeholder attention to this Inquiry as we had to Workplace Relations. Well over 300 submissions. More than 100 consultations and meetings. Lots of Ministerial and senior Departmental interest.

Even the tone of the most passionate submissions matched the tone achieved by the most interesting in workplace relations. For a few, once again we were the vandals at the gates of Rome.

Excitement there may be, but we aren’t in this game for the excitement.

We welcome the task of framing how Australia might deal better with its data future because it matters deeply to future productivity, and beyond that at a social welfare level right across the economy.

We had sought this task actively for some time, most recently in our advice to the Harper competition Inquiry.

The larger part of the motivation was to improve how public sector data was used, knowing as we did already quite a lot about its failings:

the failure to be used in setting policy
the failure to understand what the data was telling you
and most ugly of all - the failure to be allowed to even glimpse a data set that we know exists.

These views were developed over time in the course of numerous previous Productivity Commission reports.

Yet once we started the work, we quite quickly discovered, under strong advice from firms most active in seeking to use data, another, even larger unaddressed need in Australian public policy: consumer access to data that they create is incredibly poor.

This despite the near-universal acceptance today that data is an asset, not a liability. And that, somewhat ironically, these same consumers supply pretty much all of it, outside machine-to-machine exchanges and the pure sciences.

I don’t know about you, but I hadn’t really considered that imbalance of incentives – consumers give and give, but share so little in the opportunities – until this Inquiry.

First, a bit about the public sector and its data use.

In reports throughout the 2000s, the Productivity Commission lamented the restrictions on access to data that the public sector held, but which was off limits to researchers and analysts - those outside the organisation and as often as not from inside as well.

Even when we could find a data source to unlock a policy conundrum, those sources that could answer vital questions often had not been linked. At times, this was due to indifference. At times, privacy rules were asserted. And at times, laws prevented it.

Where we could, we became quite skilled at doing that linkage ourselves, or had others to help. But then we had to destroy the information generated, in support of confidentiality or privacy requirements. This is a rule, set by Commonwealth Government policy.

It is akin to burning books.

In 2013, my first year at the Commission, frustration at the breadth of lost opportunities led us to write a chapter in our Annual Report dedicated to describing the poor performance of Australia in using its administrative data: the stuff collected by way of compliance or payment of benefits across State and Commonwealth governments.

This is one area where, although no country claims to be on top of the data game, we were and remain clearly behind better practice amongst our peers.

From our own work, we know how use of public sector data can be done much better.

With the active cooperation of an unheralded public service community, we have produced for more than twenty years one of the best examples of the use of data for performance review across governments – the Review of Government Services, covering roughly 12 000 data points drawn from data sets around the nation.

The sets behind these data points vary in quality and reliability but they often share a common heritage: you can’t link them to do program evaluation.

Instead, and to their credit, the Commonwealth, State and Territory team that maintains the commitment even today - many years after COAG stopped producing agreements like this - relies on the media publishing the results.

And hopefully in the case of underperformance, nature then takes its course.

We know too that in health care, despite earnest efforts by an array of individuals, a combination of intellectual property restrictions; duplication and risk aversion by ethics committees; and legislation devised for a different purpose in a long-past era, locks some of our most valuable data up.

In our report, you can read how hospitals are required to sign up to intellectual property restrictions that prevent data transfer between wards. Or how cancer researchers use foreign data sets because our local ones are more restricted. Or how a nationally-funded research project into vaccination is nearly 7 years into a saga to be allowed access to Commonwealth and States’ data sets. It expects to be finally allowed full access in another year or so.

These are pretty disgraceful events. They are the tip of the iceberg.

Thus, while it is obvious to anyone who searches via Google, or has used Uber, or understands the ability of IBM’s Watson to find forgotten clinical evidence of disease response in medical publications going back decades, that big data has been driving rapid private sector adaptation and investment over the last decade or so, we in the public sector remain at best relying on small, localised and most often personally-driven efforts at data sharing and deeper analytics.

As innovators, we in the public sector have been poorly served by our current regulation and practice impeding data analytics.

Bracing as that conclusion may be (as well as the recommendations that will come with the final report, due in a week or two), the more interesting development in the course of this Inquiry has been in the consumer space.

Our own consideration and that of notable thinkers on the social shifts that became the means to some very popular social ends – Facebook, Google, Twitter, Instagram, Snapchat; or Uber, Airbnb, Amazon – suggested that the willingness to continue to supply the data at the base of all these services depended on one factor above others – trust.

But there is a paradox here, evident in consumer responses to surveys: consumers, who so willingly supply the data for all these analytical purposes, are nevertheless concerned at some data practices and uncertain of their effect.

More than 70% of us hand over our personal data on social media. More than 80% of us are in a supermarket or airline loyalty program. But at least 50% of us seek to mislead those same sites by messing with our information. We sort of trust them, and yet we don’t.

And this applies in government too. We can’t access data for health or education purposes, due to regulation intended to protect us, the sources of that data. And yet survey after survey says that we expect that this data is being used to obtain medical or educational breakthroughs.

The paradox creates a risk – for public sector and private sector data holders alike. There must be a tipping point, where the balance of willingness tips away from data supply towards data restriction.

Some business groups in response to the draft report seemed to question this. For them, and they claimed support from the Privacy Act, the focus was solely on keeping your data safe. As long as that was achieved, then there was no reason to consider the implications of today’s active tracking, and profiling and all the other techniques now available to big data collectors and users. It was, a few went so far as to say, their proprietary data now.

Trust is reduced in this analysis to compliance. Plus a few hints of intellectual property.

Whereas in our analysis, highly successful firms utilising consumer data – and I mean the global leaders here – emphasise trust in their dealings with the people who every day (often many times a day) supply those firms with their data.

Trust to them is not a matter simply of better ads on TV or corporate mission statements, but obligations they impose on themselves as to how they will treat those who contribute their data. You can read more of the detail in support of this in the Draft Report, and we will add a little bit more to that in the Final.

In data, be in no doubt, trust matters.

Thus if you aren’t both assuring the people whose activity provides your data that their data is safe with you and showing them how they can share in benefiting from its use, you aren’t in the consumer best-practice ballpark observable in data use around the developed world.

And keeping the balance in favour of data flow by practices that support trust can’t just be the business of one set of better practice firms or higher quality public sector agencies. If community-wide trust is to be maintained, community-wide application of sharing the benefits of data collection is necessary.

There’s a term I’m not fond of, but here it fits: we need social licence in support of the opportunities in big data.

And to get it, and to keep it, government and private data holders alike will need to practise a common commitment to sharing back with consumers the data that was sourced from them, beyond mere compliance with data safety.

The biggest businesses and the government agencies are likely to be the biggest beneficiaries of such an approach. In part because data is increasingly valuable and it is the largest incumbent firms or biggest government data holders that have the most data, and can extract the most value from it. In part too because they have more to lose if trust is lost – just to take a very recent example, after Yahoo’s last hacking episode, Verizon is paying $350m less for Yahoo’s data assets than the originally negotiated price.

It is only by putting greater effort into developing community trust that we can hope to be offered patience by the community when, inevitably, there are stuff-ups.

Opportunities abound in better data use. As data holders, we will all be doing new and innovative things, if we are to measure up to opportunity. The capacities for data analytics will not stop at circa 2017.

But with opportunity comes risk. So we can also expect to experience some over-reach at times.

Enthusiasm by a firm to track consumer needs might suddenly look (to use an old-fashioned term, now popular again in the Internet world) positively creepy. Or a government may err badly while attempting to balance between responsibility to taxpayers and responsibility to individuals.

Nothing in data use is risk-free.

Active effort, beyond glossy ads and mission statements, to develop community licence will help us cope with risk, as we press on with creating genuine opportunities.

A concept like this is perhaps not something that traditional advocates of Productivity Commission Inquiries expect us to come up with.

Yet the art of identifying the incentives that align public interests with private ones is what good public policy design is all about. And today the PC is actually pretty good at dealing with the less quantifiable of concepts like this.

We spend a lot of time nowadays in social policy areas – many of you will recall top quality reports into gambling, aged care, the NDIS, access to justice, child care – where we are strongly conscious of factors that may not always be part of the basic economic syllabus but are observable factors in how a policy space works, or doesn’t.

And since our job is not really limited to how to grow productivity (although that case too can be readily made in the case of data) but rather how to redesign policy to enhance national welfare, it is in our method to attempt to identify the relevant incentives.

Incentives that can be relied upon to bind the private interest to the public interest like, in this case, community trust.

In doing so, we will be filling an evident regulatory gap.

There is a complete absence of rights outside the privacy space for individuals in relation to their data.

Firms may have copyright and certainly have contract in order to control access to their data. Increasingly, they will have trade-secret style technology and algorithms.

Governments have compulsion and financial inducements. Plus the application of criminal penalties to mid-level public servants considering whether to share data.

Consumers have only privacy.

As a consumer, you do not own your data. Some firms tell you that you do, but if you ask to sell it and deprive them of it (as you might an asset that you owned) you will rapidly discover that this is a novel form of ownership.

Yet consumers are the ultimate source of much of the data that are currently providing these immense disruption and service improvement opportunities that are attributed to the data revolution.

And they are 100% of the community licence that underpins it.

Yes, we are aware of the Internet of Things (IoT) and a view that machine-to-machine data might be equally important. I will come back to it later. It is pretty important, no doubt, but if you’d like to cogitate while I build the picture, think how many of those devices amongst the Internet of Things are consumer devices.

Pretty reliable estimates suggest that by 2020 we will have 29 devices generating 'our' data. Individuals and families compose perhaps 20% of the IoT today, but are its most rapidly growing category.

I don’t plan today to try to impress you with the breadth of the PC’s knowledge of all these new services, or AI and Watson-style analyses. We published a paper last year on what government should be thinking about in the course of digital disruption, and there’s lots to be found there if you’re interested.

The point to be made today is, rather, that from our analysis:

consumers and their willingness to offer data is the base for much of this innovative change
the consumer data paradox creates an evident risk to such innovation
the solution to that paradox lies in measures to lock in trust, measures that must be undertaken by data collectors large and small, government and private, across the networks they have created.

By the by, when we refer to consumers here, we are including small businesses.

When it comes to consumers, it should not matter if Mrs Smith is trading on eBay under her ABN or Mrs Smith is doing her family’s online banking: the data-driver throughout is Mrs Smith. This is her data.

Privacy law struggles a little with these distinctions. It is at its clearest when the data is a single human being’s information. But families produce data too, and should benefit from it. So too ABN-holders.

False distinctions will undermine trust in a data-filled world. Consumers are consumers in various forms and all forms are relevant to this vital question of trust.

While we don’t tend to add up the pluses and minuses from submissions, the proposition of a consumer right to be able to trade their data in return for a better service offer drew some interesting support:

MLC Life was supportive – more generally, the Insurance Council supported the link between greater consumer control and continued trust, but preferred an industry-by-industry consideration of a new consumer right
the Australian Automobile Association was supportive – whereas no view was received from car manufacturers or repairers
the Customer-owned Banking Association was supportive as well – but major banks were a mixed group, some accepting the concept but at least one yet to be convinced
the Australian Food and Grocery Council, representing one-third of Australian manufacturing and a variety of service and retail interests, was happy – no submission was received from Woolworths or Coles, so we are a little uncertain where they might stand
Energy Consumers Australia was supportive – but energy companies mostly preferred to remain apart from any general consumer reform
Telstra was supportive in principle as well – but it had doubts about how much content should be included, a common view amongst large data holders
finally the Law Council of Australia – which provided a comprehensive response that may be of on-going value to the government in subsequent consideration of the Final Report.

Many State, Commonwealth, not-for-profit and research bodies also commented positively.

For them, the generation of trust has a unique relevance. Our Draft Report recommended that they develop their own process to offer permanent access to a new class of researcher, the trusted user.

This trust thing gets around.

As it must.

Although support for the concept that trust may be enhanced via improved consumer rights has been widespread, there were significant differences in judgment about what should actually form consumer data.

Which is hardly surprising. Around the world, only a few countries have started down the path to provide this kind of opportunity to consumers. So there is not much to learn from, as yet.

Only the UK is presently using it to encourage competitive behaviour between a limited set of firms, to meet consumer interests. A voluntary approach has had to be supplemented by mandatory standards.

And the description of the data to be provided does not appear to be well-suited to a community-wide right.

The EU has a model of what it terms portability of data, but it is subject to a variety of restrictions that affect form and tradability – both are key factors in our approach, if consumers are to expect different behaviour from firms vying for their business.

Possibly somewhat related to the EU model, it has also been suggested that we limit consumer rights to just personal information, as defined in general terms in the Privacy Act.

Such a choice would have the virtue of avoiding any confusion; but, amusingly if it were not serious, it has the vice of being nothing new. Very few people today find their personal information to be of sufficient value to seek it out. It appears to have very limited tradability.

That is not just our view: from the submissions of a few of our larger businesses, it is pretty evident that they would not see a customer who seeks out their personal information as someone to whom they might proffer a better insurance or banking offer.

As I noted earlier, the risk here is that trust is reduced to compliance.

Our own Draft Report tried its hand at a definition and proved to have its own shortcomings, most evidently in how to describe, in a manner that is effective across a community-wide range of competitive circumstances, a coverage of data sufficient to deliver a real shift in responses from different providers.

As the history of firms’ and governments’ compliance with personal information requests shows, that information is not likely on its own to induce a shift in behaviour.

Coverage will have to be both wide enough to be useful and tailored enough to meet the needs of exchanges amongst, say, the medical professions on the one hand and banks or telcos on the other.

This is the nature of draft reports. Between the support we received and the critical advice that was also proffered, submissions have helped us to think more effectively towards the outcome needed and we believe we have a solution.

A particular angle in this question of breadth of coverage of a consumer data right is the shift evident today towards devices generating what a consumer would normally see as their data, but a supplier may not.

This may become very big, very soon. Your device, but not your data. How can that be?

The IoT could be a big source. Internet-enabled devices as a source of data appear very likely to be bigger in our futures than might have been imagined even five years ago.

Quite reasonably, if you pay for the device and it generates information on your behaviour, you might expect that this is your data. But in fact that question is pretty murky, a bit like perceptions of ownership.

Manufacturers or service suppliers might see it as their sole property because they embedded the collecting technology in the device. And because they have genuine uses for it, whereas you may not appear to be particularly relevant.

We’d like to solve this, in the course of defining consumer data.

In the case of your car, your service workshop or (increasingly) the original manufacturer is today accumulating your data, and that data capture is growing. My guess is that today, you are probably unconcerned about any of that.

In future, your insurer may want that data. They already do seek it in the UK as part of offering you a better deal. So logically you should in future be able to obtain it, where it is in your interests to do so, as the Australian Automobile Association pointed out to us.

Yet the law is by no means clear that you can.

And as for your smart phone, while most of the data generated by your phone is probably your personal information, when it comes to apps on your phone there’s much less certainty.

Apps look likely to be big in our futures, although I’m sure they too will eventually prove replaceable. Mapping apps in particular generate a lot of what might be considered to be your information, but you can’t access that.

So there is a bit more uncertainty.

But the most unusual generator of your data that may not be yours under today’s limited approach to data is the smart meter.

Smart meters aren’t cheap, and as they arrive nationally, you will be paying for them. We have them already in Victoria. But that data isn’t yours.

It’s about your family, or you. And you will be paying for it. But the device itself may well not even be your property. And the data you can get will be decided in an energy suppliers’ forum.

I hope you can see in all this how the IoT is going to complicate data use in future.

The whole point of having smart meters is that you should alter your use, or change your energy plan, to take advantage of what this data is telling you.

And the entire supply chain from retailers, to network operators, to generators should also be able to do so. We would call this sharing, and it sounds pretty trust-inducing.

Today, this is what you can obtain from your energy supplier. It’s probably better than you can get from your bank or telco, but it’s still beyond the ken of the average customer.

If we had a consumer-oriented data transfer right, you wouldn’t need to interpret this, or indeed do the download/upload thing yourself.

You could instead direct your current supplier to send your data direct to a new supplier and wait to be convinced by their offer. And maybe your current supplier, noting your active use of your data, might make you a better offer too.

You could also send it to a non-supplier, like a solar panel or battery supplier, to get a tailored offer.

Personal information, which is the Privacy Act’s approach, may struggle to deal with data generated by machines that you won’t even own.

Looking beyond data itself to the wider economy, the Productivity Commission sees in the concept of a new consumer right to trade in their data, the opportunity to reinvigorate national competition policy.

The Harper Review, in recommending we get this Inquiry, may yet trigger change to match its illustrious predecessor, the Hilmer Review.

While the gains may appear today to be a little hard to discern, this was equally the case with national competition policy as originally developed. Early on, no one could be sure how catalytic that would prove to be.

As we know today, that period from 1995 until 2005, when competition policy was at its height, lifted productivity and added about 2.5% of GDP, by this Commission’s estimates.

A similar effect on competitive dynamics and social opportunity is not out of the question here, with a community-wide change to data use.

We can all easily envisage the impact this right could have in insurance or banking or telecommunications. But the gains extend to health care and education. The impact is not easily predicted but the range is indicative of significant potential.

One opportunity that the Hilmer reforms did not have, but which is available in this reform process, is the complementary shift in the use and release of public sector data.

In our Draft Report, we proposed a Framework Act – the Data Sharing and Release Act – which would:

create new institutions
add some much-needed resourcing to lift the capability to integrate and link data sets
address safe access arrangements for use of identifiable data
promote a culture of active release of non-sensitive data.

The Act itself was intended in the draft report to be a legislated signal written in the skies over Canberra and other capital cities to the guardians of data sets long inaccessible that the outmoded restrictions and personality politics that have prevented data linkage and analysis might now be relieved.

We should not underestimate the need for this permission to change. Custodians have lived for generations with the threat of prosecution hanging over them if they make a wrong move to share data, even in-house and even for an obviously reasonable research purpose.

The Australian Law Reform Commission identified 506 secrecy or restrictive provisions across 176 pieces of Commonwealth legislation alone, including 358 that were criminal offences. Not all 500 created impediments to data use, of course.

But the culture of criminalising the mishandling of data grew from this ground. And remains with us today.

This kind of cultural burden does not shift readily. It needs a very clear sign that new approaches have been authorised.

Legislation is not sufficient for cultural change. But without it, there will be no enduring change.

I’d like to finish with some reference to the current context for reform in Australia.

It is clear that governments collectively today feel they have much less scope than their predecessors to address big policy questions, certainly in advance of crisis.

In a field like data where the gains are at once both so well-accepted (think health research) and yet so easily blocked by indifference (health research again, but as cited earlier), some of you will find it hard to imagine that any collective of governments will be able to take this sort of change on.

And that’s before we add to the mix unusual concepts like reinforcing trust and reinvigorating competition policy.

Yet if our expectations are low, we will get just that. Not much at all.

We can also guess that the burden for a reformist government will not be eased by the lazy implication that can be made, suggesting that today’s data failings – hacking by malicious parties, failure to prevent denial of service attacks, even the loss of data by the Red Cross – will be magnified by the prospect of more data linkage and analysis.

Yet these failings are the failings of underinvestment (intellectual as well as financial) in effective data handling and storage. They will occur regardless of whether there is better data analytics and research access in the future, unless we all collectively demand better management of our data by those who hold it.

We could even use a new institution like the National Data Custodian to represent us in such a campaign.

The reforms we will provide to government in a week or two will offer a comprehensive approach to data access and use that seeks specifically to build up through substance rather than rhetoric community licence for change.

I said at the time of the Census failures, as heads were wobbling on shoulders, that the real focus should be on learning from this experience, troubling though it may be.

And my reasoning was not a defence of who did what, but a truth that should be self-evident to all who give this more than a moment’s thought: be in no doubt, we are going to have to do this again.

Future Censuses will be online by default. Future benefit programs will be scrutinised by cross-matching.

Measuring the effectiveness of Indigenous programs spending billions without closing the gap must involve in-depth trawling of a wide array of Commonwealth, State and some not-for-profit or private health and education data.

Allowing innovators to create new products from previously unreleased data will probably see as many failures as it will successes, but it will be irresistible.

I suspect most critics of data management initiatives that encountered problems would not see themselves as advocating a pull back from better data use. Many are data-literate supporters of change disappointed by outcomes they think could have been handled much better.

To meet their expectations as well as our own, we need a structure in which these types of changes and many more besides are part of a well-understood whole.

I have given this speech a title to describe data as the thing that ties it all together.

That begs the question of what 'it' is, of course. It is our future national welfare.

Today, if deprived of consumer data certain very significant social mechanisms would collapse. Most service industries – public sector and private sector – above the very basic level depend on data to forecast, invest and respond to customer needs. Many manufacturers too, along with their facilities of production. Payment systems are deeply data dependent. I could go on.

And it is the same for health data, or education results, or census responses, or many of the big ABS surveys. The sources of these data are also people, most often creating data by consuming public services.

They are still willing to let us – businesses and governments – do this.

But there are reservations.

To ensure continued trust, it is worth considering how we give something back, as we take ever greater (and wiser) advantage of what we can now find out.

So yes, data holds all these things together. And we are not using it to its potential.