dumitru roman : summer school eswc 2014

Upload: dapaas

Post on 03-Jun-2018


  • 8/11/2019 Dumitru Roman : Summer School ESWC 2014

    1/36

    Open Data Publication and Consumption

    An Overview of Relevant Data Access Approaches and DaaS Solutions

    @ESWC Summer School, 2014

    Dumitru Roman, SINTEF, Norway

    [email protected]


    2/36

    Outline

    The context: Open Data

    Data access: Web APIs, OData, SPARQL/LDP

    DaaS solutions landscape and open DaaS architecture


    4/36

    The context: Open Data

    Open Data Movement: make data available (primarily government data)

    Businesses and citizens can develop new ideas, services and applications

    Can support (government) transparency and accountability

    Source: McKinsey, http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information

    Gartner:

    By 2016, the use of "open data" will continue to increase but slowly, and predominantly limited to Type A enterprises.

    By 2017, over 60% of government open data programs that do not effectively use open data internally, will be scaled back or discontinued.

    By 2020, enterprises and governments will fail to protect 75% of sensitive data and will declassify and grant broad/public access to it.

    Source: Gartner, http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf


    5/36

    Lots of open datasets on the Web

    A large number of datasets have been published as open data in recent years

    Many kinds of data: cultural, science, finance, statistics, transport, environment, ...

    Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, ...


    6/36

    ... but few applications

    Applications utilizing open and distributed datasets have been rather few, e.g.:

    Open Data Portal    Datasets     Applications
    data.gov            ~ 110 000    ~ 350
    publicdata.eu       ~ 50 000     ~ 80
    data.gov.uk         ~ 20 000     ~ 350
    data.norge.no       ~ 300        ~ 40

    Challenges include:

    Technical/organizational

    Lack of resources: unreliable data access

    Lack of expertise: not easily available to organisations
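To make the gap concrete, the approximate portal figures quoted on this slide can be turned into a rough ratio of applications per 1 000 datasets. This is only a sketch over those rounded counts; the names `PORTALS` and `apps_per_thousand_datasets` are illustrative and not part of the original deck.

```python
# Approximate (datasets, applications) counts as quoted on the slide.
PORTALS = {
    "data.gov": (110_000, 350),
    "publicdata.eu": (50_000, 80),
    "data.gov.uk": (20_000, 350),
    "data.norge.no": (300, 40),
}


def apps_per_thousand_datasets(datasets, applications):
    """Return the number of applications per 1 000 published datasets."""
    return 1000 * applications / datasets


if __name__ == "__main__":
    for portal, (datasets, applications) in PORTALS.items():
        ratio = apps_per_thousand_datasets(datasets, applications)
        print(f"{portal:15s} {ratio:7.1f} applications per 1 000 datasets")
```

Even for the smallest portal in the list, these counts correspond to well under one application per dataset.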


    7/36

    Open data publication and access

    Data publishers: complicated data publishing and maintenance process

    Data consumers/developers: complicated programmatic data access

    A decision that lifts a data publication burden from the data publisher places that burden on data access for the data consumer

    [Figure: trade-off diagram. Easy data publication goes with complicated data access; easy data access goes with complicated data publication. Goal: simplify data access and simplify data publication.]


    8/36

    Outline

    The context: Open Data

    Data access: Web APIs, OData, SPARQL/LDP

    DaaS solutions landscape and open DaaS architecture


    9/36

    (Programmatic/Web-based) Data access

    Traditional approaches for programmatically consuming data: ODBC, JDBC, RMI, CORBA, ...

    Modern Web applications and data services rely extensively on lightweight Web-service-based approaches exchanging data via standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, ...)

    Relevant approaches for programmatic access to open data:

    Web APIs

    OData

    SPARQL and Linked Data Platform (LDP)


    10/36

    Web APIs

    Programmatic interfaces accessible through HTTP calls (e.g. GET, POST)

    Data (requests/responses) typically in JSON or XML

    Very popular among application developers

    Source: http://www.programmableweb.com/

    [Figure: on the data consumer/developer side, an app calls the Web API through a client library; on the data provider side, the Web API fronts a Web service. Protocol: HTTP. Payload: JSON/XML/...]


    11/36

    Web APIs - example

    Request:

    GET http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58

    Response payload: [not captured in this transcript]

    Documentation: http://api.yr.no/weatherapi/locationforecast/1.9/documentation
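As a rough illustration of how a client might issue the request on this slide, the sketch below builds the same URL with Python's standard library. The function names, the `User-Agent` value and the timeout are assumptions made for the example; the 1.9 endpoint dates from 2014 and may no longer respond, so the fetch function is defined but not called.

```python
from urllib.request import Request, urlopen

# Endpoint exactly as shown on the slide (2014); it may have been retired since.
BASE_URL = "http://api.yr.no/weatherapi/locationforecast/1.9/"


def forecast_url(lat, lon):
    """Build the request URL in the slide's `lat=..;lon=..` form."""
    return f"{BASE_URL}?lat={lat:.2f};lon={lon:.2f}"


def fetch_forecast(lat, lon, timeout=10):
    """Issue the GET request and return the raw response body as bytes.

    Hypothetical helper: the User-Agent string is an illustrative value.
    """
    request = Request(forecast_url(lat, lon), headers={"User-Agent": "open-data-demo"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(forecast_url(60.10, 9.58))
```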


    12/36

    Open Data Protocol (OData)

    ODBC for the Web

    A protocol for creating and consuming data APIs

    Builds on HTTP and REST

    OASIS Standard (2014), promoted by Microsoft, IBM, and SAP

    http://www.odata.org/


    13/36

    OData

    Principles: Metadata, Data, Querying, Editing, Operations, Vocabularies

    The OData data model is based on the Entity Data Model (EDM)

    The OData protocol: CRUD + query language

    XML and JSON serialization

    Source: Microsoft, http://msdn.microsoft.com/en-us/data/hh237663.aspx


    14/36

    OData - requesting data examples

    Source: http://www.odata.org/getting-started/basic-tutorial/

    Request (collections):
    GET serviceRoot/People

    Request (entity by ID):
    GET serviceRoot/People('russellwhyte')

    Response payload: [not captured in this transcript]

    Request (individual property):
    GET serviceRoot/Airports('KSFO')/Name
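The three request shapes on this slide (collection, entity by key, individual property) follow one addressing pattern, which can be sketched as a small URL builder. `SERVICE_ROOT` is a placeholder standing in for the slide's `serviceRoot`, and `entity_url` is a hypothetical helper, not part of any OData library.

```python
from urllib.parse import quote

# Placeholder for the slide's `serviceRoot`; substitute a real OData service root.
SERVICE_ROOT = "https://example.org/odata"


def entity_url(entity_set, key=None, prop=None):
    """Build a resource URL: collection, single entity, or one property of it."""
    url = f"{SERVICE_ROOT}/{entity_set}"
    if key is not None:
        # String keys are single-quoted; embedded quotes are doubled.
        escaped = quote(key.replace("'", "''"), safe="'")
        url += f"('{escaped}')"
    if prop is not None:
        url += f"/{prop}"
    return url


if __name__ == "__main__":
    print(entity_url("People"))
    print(entity_url("People", "russellwhyte"))
    print(entity_url("Airports", "KSFO", "Name"))
```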


    15/36

    OData - querying data examples

    Source: http://www.odata.org/getting-started/basic-tutorial/

    Request (filter):
    GET serviceRoot/People?$filter=FirstName eq 'Scott'

    Response payload: [not captured in this transcript]

    Filter on complex type:
    GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')

    orderby:
    GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc

    top:
    GET serviceRoot/People?$top=2

    count:
    GET serviceRoot/People/$count

    expand:
    GET serviceRoot/People('keithpinckney')?$expand=Friends

    select:
    GET serviceRoot/Airports?$select=Name, IcaoCode

    search:
    GET serviceRoot/People?$search=Boise

    Lambda operators (any / all):
    GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))
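The system query options listed on this slide are ordinary URL query parameters prefixed with `$`. A minimal sketch of assembling them, assuming a placeholder `SERVICE_ROOT` and a hypothetical `query_url` helper (neither comes from the slides or from an OData client library):

```python
from urllib.parse import quote

# Placeholder for the slide's `serviceRoot`.
SERVICE_ROOT = "https://example.org/odata"

# Characters left unescaped so the expressions stay readable in the URL.
_SAFE = "(),/:'$"


def query_url(resource_path, **options):
    """Append system query options; pass names without the leading '$'."""
    query = "&".join(
        f"${name}={quote(str(value), safe=_SAFE)}" for name, value in options.items()
    )
    url = f"{SERVICE_ROOT}/{resource_path}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(query_url("People", filter="FirstName eq 'Scott'"))
    print(query_url("People", top=2))
    print(query_url("People('scottketchum')/Trips", orderby="EndsAt desc"))
```

Spaces are percent-encoded here because a literal space is not valid in a URL, even though the slide shows the expressions unencoded for readability.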


    16/36

    OData - data modification example

    Source: http://www.odata.org/getting-started/basic-tutorial/

    Request (create an entity):

    POST serviceRoot/People
    OData-Version: 4.0
    Content-Type: application/json;odata.metadata=minimal
    Accept: application/json

    {
      "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
      "UserName": "teresa",
      "FirstName": "Teresa",
      "LastName": "Gilbert",
      "Gender": "Female",
      "Emails": ["[email protected]", "[email protected]"],
      "AddressInfo": [
        {
          "Address": "1 Suffolk Ln.",
          "City": {
            "CountryRegion": "United States",
            "Name": "Boise",
            "Region": "ID"
          }
        }
      ]
    }

    Response payload: [not captured in this transcript]

    Remove an entity:
    DELETE serviceRoot/People('vincentcalabrese')

    Update an entity: uses PATCH or PUT

    Relationship operations (link to related entities):
    POST serviceRoot/People('scottketchum')/Friends/$ref
    {"@odata.id": "serviceRoot/People('vincentcalabrese')"}
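The create request on this slide is a plain HTTP POST with a JSON body and three headers. The sketch below prepares (but does not send) such a request with the standard library; `SERVICE_ROOT` is a placeholder, `create_entity_request` is a hypothetical helper, and the entity is trimmed to a few of the fields shown above.

```python
import json
from urllib.request import Request

# Placeholder for the slide's `serviceRoot`.
SERVICE_ROOT = "https://example.org/odata"


def create_entity_request(entity_set, entity):
    """Prepare a POST that would create `entity` in `entity_set` (not sent here)."""
    body = json.dumps(entity).encode("utf-8")
    return Request(
        f"{SERVICE_ROOT}/{entity_set}",
        data=body,
        method="POST",
        headers={
            "OData-Version": "4.0",
            "Content-Type": "application/json;odata.metadata=minimal",
            "Accept": "application/json",
        },
    )


# A reduced version of the slide's Person entity.
person = {"UserName": "teresa", "FirstName": "Teresa", "LastName": "Gilbert"}
request = create_entity_request("People", person)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the prepared object to `urllib.request.urlopen` would actually send it; that step is left out because the service root is only a placeholder.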


    17/36

    SPARQL

    A set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store

    Service Description

    Request:
    GET /sparql/
    Host: www.example.org

    Response: An RDF description, using the Service Description vocabulary

    Protocol for RDF

    Request:
    GET /sparql/?query=[SPARQL Query]
    Host: www.example.org

    Response: A SPARQL Results Document or RDF graph

    Query Language

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name (COUNT(?friend) AS ?count)
    WHERE {
      ?person foaf:name ?name .
      ?person foaf:knows ?friend .
    } GROUP BY ?person ?name

    Result (serialized in XML, JSON, CSV, TSV): [not captured in this transcript]

    Update Language

    PREFIX foaf: <http://xmlns.com/foaf/0.1/> .
    INSERT DATA { foaf:knows [ foaf:name "Dorothy" ]. } ;
    DELETE { ?person foaf:name ?mbox }
    WHERE { foaf:knows ?person .
      ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) .}

    Graph Store HTTP Protocol

    POST /rdf-graphs/service?graph=http%3A%2F%2Fwww.example.org%2Falice
    Host: example.org
    Content-Type: text/turtle

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    foaf:knows [ foaf:name "Dorothy" ] .

    Examples taken from http://www.w3.org/TR/sparql11-overview/
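For the SPARQL Protocol request shape shown on this slide (`GET /sparql/?query=[SPARQL Query]`), a client only has to URL-encode the query text into the `query` parameter. The sketch below does that for the slide's query-language example; the endpoint is the slide's `www.example.org` placeholder, the FOAF namespace IRI is the standard one, and `sparql_get_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint taken from the slide's example host and path.
ENDPOINT = "http://www.example.org/sparql/"

# The slide's query-language example, with the standard FOAF namespace IRI.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
}
GROUP BY ?person ?name"""


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query into the `query` parameter of a GET URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    url = sparql_get_url(ENDPOINT, QUERY)
    print(url)
    # Round-trip check: decoding the URL yields the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```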


    18/36

    Linked Data Platform

    Describes the use of HTTP for accessing, updating, creating and deleting resources from servers that expose data as Linked Data

    Centered around LDP Resources (LDPRs), LDP Containers (LDPCs), membership and containment

    Under development at W3C; working draft

    LDP-BC (Basic Container)
    Request: GET /c1/
    Response payload: [not captured in this transcript]

    Resource
    Request: GET /netWorth/nw1
    Response payload: [not captured in this transcript]

    LDP-DC (Direct Container)
    Request: GET /netWorth/nw1/liabilities/
    Response payload: [not captured in this transcript]

    Examples taken from http://www.w3.org/TR/ldp/
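Because LDP is defined in terms of plain HTTP, reading a resource and adding a member to a container can be sketched with nothing more than prepared requests. In the sketch below, `BASE` is a placeholder server, both helper functions are hypothetical, and nothing is sent over the network; the `Slug` header is the optional naming hint LDP allows on POST.

```python
from urllib.request import Request

# Placeholder LDP server; the paths mirror those used on the slide.
BASE = "http://example.org"


def read_resource(path):
    """Prepare a GET for an LDP resource or container, asking for Turtle."""
    return Request(f"{BASE}{path}", method="GET", headers={"Accept": "text/turtle"})


def create_in_container(container_path, turtle, slug=None):
    """Prepare a POST that would create a new member resource in a container."""
    headers = {"Content-Type": "text/turtle"}
    if slug:
        headers["Slug"] = slug  # optional hint for the new resource's name
    return Request(
        f"{BASE}{container_path}",
        data=turtle.encode("utf-8"),
        method="POST",
        headers=headers,
    )


if __name__ == "__main__":
    get_req = read_resource("/netWorth/nw1")
    post_req = create_in_container("/c1/", "<> a <http://example.org/Thing> .", slug="r1")
    print(get_req.get_method(), get_req.full_url)
    print(post_req.get_method(), post_req.full_url)
```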


    19/36

    Data Access Summary

    Web APIs

    Very flexible, popular with Web developers, no specific commitment to data models

    OData

    ER-based data model, abstract interface to data stores (focus on CRUD), perceived as vendor-pushed (strong tool support)

    SPARQL and LDP

    Graph data model, community-pushed, some interesting features (querying, federation, linking, ...)

    Although the approaches overlap, they all aim to simplify access to distributed data sources for application developers

    Which approach to choose depends on many factors, e.g. type of data, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, ...


    20/36

    Outline

    The context: Open Data

    Data access: Web APIs, OData, SPARQL/LDP

    DaaS solutions landscape and open DaaS architecture


    21/36

    Data publication

    Data access mechanisms simplify data consumption for application developers

    But data needs to be provisioned to applications according to the chosen data access mechanism

    And applications will always be dependent on the hosting for the data they use

    Data publishers and application developers need to rely on generic Cloud platforms and build, deploy and maintain a complex Open Data software and data stack from scratch

    Complicated data provisioning and maintenance process

    Data-as-a-Service (DaaS) solutions are emerging to address this issue

    "Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer."

    Source: Wikipedia; https://en.wikipedia.org/wiki/DaaS


    22/36

    Relevant DaaS solutions

    Windows Azure Marketplace

    Socrata

    DataMarket

    Factual

    Junar

    PublishMyData

    DaPaaS


    23/36

    Windows Azure Marketplace

    A marketplace for applications and data (~170 datasets; ~700 applications)

    Charging data consumers

    Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration

    Supports OData

    https://datamarket.azure.com/

    Source: Microsoft, http://go.microsoft.com/fwlink/?LinkID=201129&clcid=0x409


    24/36

    Socrata

    Specific focus on Open Data

    Open Data Portal: data publishing & clean-up, metadata generation, data-driven portals for data exploration and portal management

    API Foundry for creating and deploying RESTful APIs on top of the data

    Hosted data is accessible through the Socrata Open Data API (SODA), a RESTful interface for searching and reading data in XML, JSON or RDF

    http://www.socrata.com/

    Source: Socrata


    25/36

    DataMarket

    Provides statistical data from almost 100 data providers

    ~ 71 000 datasets

    Supports embeddable visualisations of data, data export, live feeds for data updates, ability for data publishers to monetize data via the marketplace, custom data-driven portals for publishers, data portal, Web API

    http://datamarket.com/


    26/36

    Factual

    Data for ~ 65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products

    Used to provide the option of hosting thousands of 3rd party data sets (Community Data), but this activity has been discontinued

    Data is populated by means of Web crawls, data extraction and 3rd party data services; the data model is tabular, based on a taxonomy of around 400 categories

    Pricing is based on a pay-per-use model

    Data access is provided through a RESTful API

    Provides a set of tools for data management

    http://www.factual.com/


    27/36

    Junar

    Cloud-based Open Data platform to collect, enrich, publish and analyse open data

    Data can be consumed either directly via the Junar API, or via various visual widgets

    http://www.junar.com/


    28/36

    PublishMyData

    Hosted, as-a-service solution for Open and Linked Data publishing

    Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data dumps

    http://www.swirrl.com/publishmydata


    29/36

    Other relevant solutions

    Comprehensive Knowledge Archive Network (CKAN) (http://ckan.org/): web-based open source data management system for the storage and distribution of open data; datahub (http://datahub.io/)

    LOD2 (http://lod2.eu/): research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment and interlinking to maintenance; not meant to be an as-a-service solution

    Project Open Data (http://project-open-data.github.io/): a set of open source tools, methodologies and use cases for publishing and utilising Open Data

    COMSODE (http://www.comsode.eu/): research project aiming to create a publication platform for Open Data called Open Data Node


    30/36

    DaPaaS: towards an Open Data- and Platform-as-a-Service for Open Data

    DaPaaS: research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach

    http://dapaas.eu

    [Figure: roles around the DaPaaS Platform]

    Data Publisher: publishes open data

    Application Developer: develops and deploys applications on top of published data

    End-Users Data Consumer: consumes data resulting from the available applications


    31/36

    DaPaaS Requirements for Data Publisher

    [Figure: requirements between the Data Publisher and the DaPaaS Platform]

    DP-01: Dataset import

    DP-02: Data storage and querying

    DP-03: Dataset search & exploration

    DP-04: Data interlinking

    DP-05: Data cleaning & transformation

    DP-06: Dataset bookmarking & notifications

    DP-07: Dataset metadata management, statistics & access policies

    DP-08: Data scalability

    DP-09: Data availability

    DP-10: User registration & profile management

    DP-11: Secure access to platform

    DP-12: UI for data publisher

    DP-13: Data publishing methodology support


    32/36

    DaPaaS Requirements for Application Developer

    [Figure: requirements between the Application Developer and the DaPaaS Platform]

    AD-01: Access to Data Publisher services (DP-01 to DP-13)

    AD-02: Data export

    AD-03: Develop applications in state-of-the-art programming languages

    AD-04: Configure application deployment

    AD-05: Deploy and monitor application

    AD-06: Application metadata management, statistics & access policies

    AD-07: UI for application developer

    AD-08: Application development methodology support


    33/36

    DaPaaS Requirements for End-Users Data Consumer

    [Figure: requirements between the End-User Data Consumer and the DaPaaS Platform]

    EU-01: User registration & profile management

    EU-02: Search & explore datasets and applications

    EU-03: Datasets and applications bookmarking and notifications

    EU-04: Mobile and desktop GUI access

    EU-05: Data export and download

    EU-07: High availability of data and applications


    34/36

    DaPaaS Platform: Abstract High-Level Architecture

    [Figure: layered architecture diagram; the grouping below is read from the diagram labels]

    Data Layer: Open Data Warehouse, Datasets, DaaS Services

    Platform Layer: Application Hosting Environment, Data-Driven Applications, PaaS Services

    UX Layer: UX Services

    Cross-cutting: Usage Monitoring, Security & Access Control, Tool-supported Methodology for Data Publishing/Consumption


    35/36

    Summary

    Lots of open datasets, but few applications using them

    Simplifying data publication/consumption can enable an increase in the number (and quality) of applications using open data

    Various approaches emerging

    For data access: Web APIs, OData, SPARQL/LDP

    For data publication/provisioning: DaaS solutions


    36/36

    Thank you!


    Contact: [email protected]
