Musing on authority in geospatial data at GeoCommunity 2011, Nottingham
Here is the deck
What’s the question?
Can a crowdsourced geospatial database be considered authoritative? Indeed, can any dataset that describes the real world be considered authoritative, whether crowdsourced or “professionally compiled”? Who determines authority? What constitutes authority in geodata? Does authority matter and, if it does, why? What actions or processes might contribute to promoting crowdsourced geodata to a position of authority?
I want to consider the nature of authority in geospatial data and whether it might be possible for a crowdsourced dataset such as OpenStreetMap (although these observations could apply to any crowdsourced geodata) to become authoritative or a primary reference source.
For readers who are impatient let me save you scrolling to the end of this piece by giving you an executive summary:
In a literal sense a crowdsourced dataset is unlikely to ever be granted legal status as authoritative (e.g. for conveyancing) but that does not mean that it cannot attain a level of acceptance that is close to authoritative and may in practice be more accurate/complete/up to date than data that has a formal stamp of authority.

The Crowd at State of the Map 2010, thanks to Chris Fleming http://www.flickr.com/photos/chrisfleming/4784573455/
Defining Authority
Let me start by considering what authority means in terms of geodata.
The Oxford English Dictionary[1], which in itself would be considered an authority on the English language, defines “authoritative” as:
1 able to be trusted as being accurate or true; reliable:
“clear, authoritative information and advice”
“an authoritative source”
- (of a text) considered to be the best of its kind and unlikely to be improved upon:
“this is likely to become the authoritative study of the subject”
2 commanding and self-confident; likely to be respected and obeyed:
“his voice was calm and authoritative”
- proceeding from an official source and requiring compliance or obedience:
“authoritative directives”
Several different concepts are merged in these definitions: accurate, true and reliable all seem to have an absolute quality, while “best of its kind” and “unlikely” or “likely” are relative terms. There are also differing ways that authority can be manifested: reliable, commanding and self-confident. Does a dataset become authoritative if I assert its authority with self-confidence? Perhaps the different aspects of the definition highlight the challenge of determining what constitutes authority in geodata: is it absolute or relative, and is authority granted, assumed or objectively defined?
For the purposes of this paper I wish to explore authority outside of the legal context. It is most unlikely that a UK court of law would accept crowdsourced data as a definitive record in a legal dispute (see the section on Wikipedia and US courts below). However, it may be possible for crowdsourced data to be recognised as being of sufficiently high quality and reliability that it could become an authority or a reference source for other applications. In other words it could be “trusted as being accurate or true” or “considered to be the best of its kind and unlikely to be improved upon”.
A brief diversion into encyclopaedias
Wikipedia is often cited as the foremost example of a crowdsourced dataset and probably has the highest level of recognition and acceptance of any crowdsourced project. Most people are aware that not all of the content within Wikipedia is absolutely accurate: some of it may be opinion masquerading as fact and some is certainly not written by “recognised authorities”. Notwithstanding these known limitations, Wikipedia is widely used, quoted and trusted.
It should also be recognised that the Encyclopaedia Britannica, which was until recently considered the authoritative encyclopaedic reference source, can also be subject to error and author bias. In 2005 Nature undertook a comparison of the accuracy of scientific articles in Wikipedia and Britannica using independent reviewers and found that both contained errors with Wikipedia having marginally more errors (3.86 errors per article compared with 2.93)[2]. Commenting on some of the errors identified within Britannica[3] the Wikipedia authors say:
“These examples can serve as useful reminders of the fact that no encyclopedia can ever expect to be perfectly error-free (which is sometimes forgotten, especially when Wikipedia is compared to traditional encyclopedias), and as an illustration of the advantages of an editorial process where anybody can correct an error at any time.”
However, in a 2008 judgement the 8th US Circuit Court of Appeals[4] ruled that the Department of Homeland Security could not rely upon Wikipedia as a source in deciding whether to admit asylum seekers. The court went on to quote Wikipedia:
“… The site acknowledges [that articles], “may become caught up in a heavily unbalanced viewpoint and can take some time – months perhaps – to regain a better-balanced consensus.” As a consequence, Wikipedia observes, the website’s “radical openness means that any given article may be, at any given moment, in a bad state: for example, it could be in the middle of a large edit or it could have been recently vandalized””
So in the context of encyclopaedias it would appear that even the gold standard is far from perfect, but the nature of crowdsourcing and the continuous process of improvement and correction render Wikipedia unsuitable to be relied upon as an information source within a US court (it is perhaps worth noting that the US court did not suggest a more authoritative reference source as an acceptable alternative to Wikipedia). That said, many commentators have countered that the “wisdom of the crowd” ensures that errors are identified and rectified much more rapidly within Wikipedia than within a traditional printed encyclopaedia.
What makes geodata authoritative?
In the context of authoritative geodata I suggest that we would expect it to be:
- Geometrically and positionally accurate (within the scale/specification of capture)
- Complete, no features or objects within the scope of the dataset are omitted
- Correctly attributed (features are correctly named and classified according to a pre-determined but inevitably evolving scheme or taxonomy)
It is possible to imagine a series of tests that could be applied to a dataset, along with real world observations, that would determine whether the dataset met these requirements absolutely or how it compared to other equivalent datasets using the criterion “best of its kind”. In either case there is a presumption that there is a specification against which the geodata can be assessed. But accuracy and completeness are not the sole determinants of authority: change detection, capture standards and processes, and quality assurance processes will all affect our willingness to “trust” or “respect” a dataset.
It is important to distinguish between data that has authority and data that is “accurate” or deemed to be fit for purpose. The latter may be good enough or even very good but still may not have the implied safety/reliability seal that comes with being classed as authoritative. The opposite could also be true: data that has some official seal of authority may not be accurate, complete and current.
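As an illustration of what such tests might look like, here is a minimal sketch of two of them: a positional accuracy check and a completeness check against a reference survey. The function names, the use of root-mean-square error as the accuracy measure and the idea of matching features by identifier are all my assumptions for the sake of the example, not part of any published specification.

```python
import math


def positional_rmse(captured, reference):
    """Root-mean-square distance between matched (x, y) points.

    Both lists must hold the same features in the same order, in a
    projected coordinate system so that the result is in metres.
    """
    if len(captured) != len(reference) or not captured:
        raise ValueError("need two non-empty, equally sized point lists")
    squared = [
        (cx - rx) ** 2 + (cy - ry) ** 2
        for (cx, cy), (rx, ry) in zip(captured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


def completeness(captured_ids, reference_ids):
    """Share of reference features that also appear in the captured data."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & set(captured_ids)) / len(reference)


# Hypothetical example values, not real survey data.
rmse = positional_rmse([(0, 0), (3, 4)], [(0, 0), (0, 0)])
share = completeness({"a", "b"}, {"a", "b", "c", "d"})
```

Whether a given error or completeness figure is acceptable would still depend on the specification of capture and the intended use of the data.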
The National Mapping Agency
In Great Britain, the Ordnance Survey has been designated by government as the National Mapping Agency[5]:
“Ordnance Survey is the national mapping agency of Great Britain, collecting, maintaining, managing and distributing the definitive record of the features of the natural, built and planned environment, the definitive record of official boundaries and the record of such other national geographic datasets as required by government and the private sector.”
“Ordnance Survey will work with and consult with others in the geographic information community to help determine and advise upon the standards and quality of its data in relation to present and future national needs. This data will provide the framework to which other geographical data in Britain is referenced.”
Clearly the OS is the authoritative source of geographic information providing a “definitive record” of features and boundaries. OS data is the only basis for determining legal disputes about GB geography (e.g. land ownership, political and administrative boundaries) and it is most unlikely that our courts will accept any alternative reference source whilst OS has this status. Does this mean that OS is the sole authority in all other contexts and that no other data can be considered authoritative? I would suggest not for several reasons:
- Other organisations could collect similar data to OS at the same or a higher standard of accuracy etc. Navteq and TeleAtlas would probably claim with justification that their navigation datasets contain more attribution (e.g. turn and height restrictions) which is maintained to a higher level of currency than OS.
- OS only captures a subset of geographic information, usually described as reference data. Other organisations may capture different information (e.g. Environment Agency, British Geological Survey).
But who could be the arbiter of authority outside of the context of core reference data? In an academic context authority is granted following some process of peer review. Perhaps the crowd could determine the accuracy and completeness of alternative geodata sources through mass observations and determine the extent to which data could be relied upon?
An authority can be wrong
What happens when OS omits data or makes a mistake? Even the current data capture SLA for the OS only seeks to record 99.6% of real world change within 6 months, a target that is met or bettered[6]. This implies that omitting up to 0.4% of change is acceptable. What other tolerances in absolute quality might be acceptable in an authoritative dataset?
Without doubt the authority of OS data is closely linked with the accuracy and detail of their maps and their data capture and QA processes, which are based on over 200 years of experience, state of the art technology and 300 specialist surveyors. Whilst OS data is considered authoritative and a “definitive record”, it is still not absolutely correct or accurate at any point in time. QA processes tend to focus on what is included within a dataset rather than on omissions. Inevitably the ultimate quality check on any dataset’s completeness will be its users’ local knowledge.
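To put that tolerance in concrete terms, the short sketch below converts a capture rate into a count of changes that may go unrecorded within the window. The 99.6% figure is the SLA target mentioned above; the total of 10,000 changes is a purely hypothetical number chosen to make the arithmetic easy to follow.

```python
def unrecorded_changes(total_changes, capture_rate=0.996):
    """Number of real world changes an SLA at `capture_rate` allows to be missed."""
    if not 0 <= capture_rate <= 1:
        raise ValueError("capture_rate must be between 0 and 1")
    return round(total_changes * (1 - capture_rate))


# Hypothetical volume of change, for illustration only.
missed = unrecorded_changes(10_000)
```

With these inputs the target would be met even if 40 of the 10,000 changes were not captured within 6 months.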
How accurate and reliable is OSM within GB?
Is it possible for a crowdsourced dataset such as OSM to be “trusted as being accurate or true” or “considered to be the best of its kind and unlikely to be improved upon”? Let’s consider the 3 criteria for authoritative geodata outlined above.
1. Geometrically and positionally accurate
OSM data is captured by a combination of handheld GPS surveys and “armchair surveys” tracing over aerial imagery donated by Yahoo or Bing (more up to date). In principle it should be possible to capture data to about 5m accuracy or slightly better using these tools. Whether this is sufficient to be relied upon will depend upon the proposed use of the data.
2. Complete, no features or objects within the scope of the dataset are omitted
The community based approach to data capture does not allow for volunteers to be directed to cover specific areas in a planned manner, although over time it does appear that completeness is improving. A lack of completeness will limit the use of the data in applications which require broad cover. However, that might not be a concern to an organisation wishing to build an application for, say, Greater London only.
3. Correctly attributed and classified
Attribution and classification are more dependent on “on the ground” observations than the other criteria above. Consequently the level of attribution and classification has lagged behind the simple capture of geometry. Furthermore, the classification model within OSM, known as tags, can be confusing for new contributors, resulting in some potential errors or omissions in classification.
Muki Haklay has undertaken several quantitative studies of the accuracy and completeness of OSM data[7] which suggest that the data that has been captured is accurate but not yet complete or fully attributed.
“By the end of March 2010, OpenStreetMap coverage of England grown to 69.8% from 51.2% a year ago. When attribute information is taken into account, the coverage grown to 24.3% from 14.7% a year ago.”[8]
Although there is a continually improving trend in completeness and attribution it would appear that the demographics and geographic distribution of volunteers may prevent the map ever having full or even close to full attribution and GB cover.
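The coverage figures quoted above can be read either as a change in percentage points or as growth relative to the previous year, and the two readings give quite different impressions. The sketch below works through both using only the four percentages from the quotation; the function name is mine.

```python
def coverage_growth(previous_pct, current_pct):
    """Return (percentage point change, relative growth in %) rounded to 1 dp."""
    points = current_pct - previous_pct
    relative = points / previous_pct * 100
    return round(points, 1), round(relative, 1)


# Geometry coverage of England, March 2009 to March 2010.
geometry = coverage_growth(51.2, 69.8)

# Coverage once attribute information is taken into account.
attributed = coverage_growth(14.7, 24.3)
```

On these numbers geometry coverage rose by 18.6 points (about 36% relative growth), while attributed coverage rose by 9.6 points (about 65% relative growth) but from a much lower base.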
Could OSM become an authoritative source in GB?
This question needs to be considered within the context of the constraints of an informal organisation of volunteer contributors. To become a reliable and trusted source of information within GB, OSM would need to broaden the range of contributors and identify the means to motivate contributors to focus on completing the map to a consistent level for the whole of GB. It is unclear whether this is something that the current mapping community is able to achieve, let alone wishes to do.
Accuracy and attribution
There is a wide range of quality evaluation tools and services developed by the OSM community for bug reporting, error detection, monitoring, and analysing tags. Specific tools cover checking network continuity, analysing relationships, visualising turn restrictions and identifying duplicate nodes. There are also tools to mark potential errors and analyse data by contributor, and many that are country specific[9]. However, there is no mandatory set of processes that data must pass through prior to release and it is difficult to determine the extent to which these tools are used by volunteers.
The OSM philosophy on quality can perhaps be summarised as “the wisdom of the crowd will ultimately correct any errors or omissions” whether that is through observation or through the use of the tools available.
If a combination of automated QA tools were applied in a consistent process to OSM edits then potential errors could be flagged and in some way prioritised for further examination and either corrected or verified.
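A consistent process of this kind could be as simple as running every edited feature through a fixed list of rules and sorting the resulting flags by priority. The sketch below is only an illustration of that idea: the two rules, the priority numbers and the dictionary layout for a feature are my own assumptions and do not correspond to any existing OSM QA tool.

```python
def rule_untagged(feature):
    """Flag features that carry no tags at all (highest priority)."""
    if not feature["tags"]:
        return (1, "feature has no tags")
    return None


def rule_unnamed_residential(feature):
    """Flag residential roads that have no name tag."""
    tags = feature["tags"]
    if tags.get("highway") == "residential" and not tags.get("name"):
        return (2, "residential road has no name")
    return None


# Hypothetical rule set; a real process would have many more rules.
RULES = [rule_untagged, rule_unnamed_residential]


def flag_features(features, rules=RULES):
    """Apply every rule to every feature; return flags, most urgent first."""
    flags = []
    for feature in features:
        for rule in rules:
            result = rule(feature)
            if result is not None:
                priority, message = result
                flags.append((priority, feature["id"], message))
    return sorted(flags)


# Made-up features for illustration.
edits = [
    {"id": 1, "tags": {"highway": "residential"}},
    {"id": 2, "tags": {}},
    {"id": 3, "tags": {"highway": "residential", "name": "High Street"}},
]
flags = flag_features(edits)
```

Each flag is only a prompt for further examination; a contributor with local knowledge would still decide whether to correct the feature or confirm it as valid.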
Completeness
Muki Haklay has identified that the level of completeness of OSM is greater in urban areas and that it also inversely correlates with the level of deprivation within an area[10].
“… the analysis of OSM shows is that deprived communities and rural areas are not well covered, especially when attributes are considered”
To rectify these biases OSM would need to find ways to either encourage existing volunteer contributors to step outside of their current areas of activity or attract new contributors in these under-mapped areas.
Blame
Responsibility for the quality of OSM is often raised as a concern by potential users (much less so by people actually using the data): “who would I blame if something goes wrong?” The answer inevitably is no one. However, it should be noted that most data providers, including OS, do not warrant that their data is accurate or even fit for purpose and exclude any liability for errors. For example the PSMA says:
9.4 Ordnance Survey excludes to the fullest extent permissible by law all warranties, conditions, representations or terms, whether implied by, or expressed in, common law or statute including, but not limited to, any regarding the accuracy, compatibility, fitness for purpose, performance, satisfactory quality or use of the Licensed Data.[11]
It would appear that there is little opportunity to assign blame or responsibility to even an authoritative data source. Whilst recognising that even authoritative data is provided “as is” without warranty, a feedback mechanism for OSM that allows non contributing users to identify potential errors and omissions and discuss the specification of capture (even if this specification is informal) would be essential in building a higher level of confidence in the data.
Users as producers
There is no formal mission statement or outline of quality and coverage objectives for OSM. However, this description on the OpenStreetMap Foundation’s web site is probably as close as we will get[12]:
OpenStreetMap is an open initiative to create and provide free geographic data such as street maps to anyone who wants them. It is a massive online collaboration, with hundreds of thousands of registered users worldwide.
It is focussed on producing maps that are available without charge or constraint and interestingly refers to its contributors as “users” rather than producers.
The direction of OSM is largely driven by an active community of volunteers who have taken on the mission to map the world for a variety of reasons which range from producer centric “because we can” or “because it is fun” to more commercial or humanitarian motivations. The organisation has been highly producer centric and has, up till now, resisted the influence of large potential users of its data (corporates or governments).
A recent blog post by Martijn van Exel makes the case for OSM to focus on “warm” geography rather than seeking to emulate what he describes as the “cold” geography of national mapping agencies and navigation data suppliers.[13]
“… the extremely high churn rate that OpenStreetMap is coping with — less than one tenth of everyone who ever created an OpenStreetMap account continue to become active contributors. ..
OpenStreetMap needs those flesh and blood contributors, because it is ‘Warm Geography’ at its core: real people mapping what is important to them — as opposed to the ‘Cold Geography’ of the thematic geodata churned out by the national mapping agencies and commercial street data providers; data that is governed by volumes of specifications and elaborate QA rules.”
This is one contributor’s view but in my opinion it will resonate with many current contributors. If the current contributors do not want to create data that conforms to a specification then OSM is unlikely to become a trusted and reliable source of geodata.
Perhaps by attracting potential users of OSM who are concerned with that “cold” geography to become contributors, the challenges of a consistent approach to QA and a more structured approach to completeness can be resolved. OSM-GB is one possible way of attracting such users.
OSM-GB
OSM-GB is a project being initiated at the Centre for Geospatial Sciences at Nottingham University[14].
It is a collaboration between CGS and 1Spatial that will apply 1Spatial’s rules based geodata quality tools to a GB extract of OSM. The resulting “improved” and structured data will be projected into BNG and served as an OGC Web Map Service and Web Feature Service. For the duration of this project (approximately 15 months) these services will be available at no charge.
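For readers unfamiliar with OGC services, the sketch below shows roughly what a request to such a Web Map Service could look like. The parameter names are the standard WMS 1.3.0 GetMap parameters and EPSG:27700 is the code for the British National Grid, but the endpoint URL, the layer name and the bounding box are hypothetical placeholders, not the actual OSM-GB service details.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build a WMS 1.3.0 GetMap URL for a layer in British National Grid.

    `bbox` is (min easting, min northing, max easting, max northing) in metres.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "EPSG:27700",
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer; a 10 km square chosen only for illustration.
url = build_getmap_url(
    "https://example.org/osm-gb/wms",
    "osm_gb",
    (450000, 335000, 460000, 345000),
)
```

Any standards based GIS client would normally construct this request itself; the point is simply that data served this way can be dropped into existing public sector workflows without conversion.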
The project has 2 main strands of research:
- Applying rules based quality improvement processes to OSM to identify possible errors and, after some experimentation and refinement of the rules, potentially to correct some geometric and attribute errors automatically.
The “improved” dataset will be available for download from the OSM-GB web site and could be offered back to the main OSM database (probably as a basis for further inspection prior to incorporation).
- By making the “improved” data available via standards based web services, it is hoped that public sector users in both central and local government will be encouraged to experiment with OSM and identify potential use cases for OSM that are not met by the geodata currently available through the PSMA. A number of organisations have already confirmed interest in accessing OSM-GB.
The objective of making data available to so called professional users whose expectations have been set by using authoritative geodata is to encourage them to become contributors to OSM, motivated by the potential use cases identified, the flexibility of the range of data that can be captured and the data model. These users will often have a great deal of local knowledge (particularly those working within local government) that could help to address the challenges of completeness detailed above. In the longer term it may even be possible to encourage these users to incorporate contributing to OSM as part of their routine workflows.
Wrapping up
OSM is unlikely to ever be considered authoritative within a legal context.
I hope that I have shown how in the more conversational sense of the term authoritative, OSM data could become an alternative trusted and reliable source of geodata offering a wide range of content which differs from and complements other sources. For this level of trust to be achieved a more formal approach to quality assurance and a more structured and consistent approach to data capture (content, geography and attribution) will be needed.
The current OSM community may not choose to move in this direction but projects like OSM-GB may attract a new group of user/contributors who recognise the opportunities that OSM offers them and their organisations and who are able to help improve quality and extend coverage and attribution.
About the Author
Steven Feldman is a director of KnowWhere Consulting and Geo.me Solutions and an External Lecturer at the Centre for Geospatial Sciences at the University of Nottingham.
For a more detailed biography, links to web content and contact details see http://about.me/stevenfeldman
[10] See Haklay & Ellul “Completeness in volunteered geographical information”