Upload CSV files

Introduction

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.

The CSV file format is supported by almost all spreadsheets and database management systems, including Microsoft Excel, Apple Numbers, LibreOffice Calc, and Apache OpenOffice Calc.

The CSV Uploader Service allows importing data from a CSV file into a MongoDB collection.

This RESTHeart service is bound to the /csv API resource by default.

By uploading a CSV file you create or update one document per each row of the file.

Before running the example requests

The following examples assume the RESTHeart running on localhost with the default configuration: the database restheart is bound to / and the user admin exists with default password secret.

To create the restheart db, run the following:

PUT / HTTP/1.1

Let’s create a poi collection, run the following:

PUT /poi HTTP/1.1

Upload the CSV file

We are going to use the following example file POI.csv:

id,name,city,lat,lon,note
1,Coliseum,Rome,41.8902614,12.4930871,Also known as the Flavian Amphitheatre
2,Duomo,Milan,45.464278,9.190596,Milan Cathedral

To import the POI.csv into the collection poi, run the following:

POST /csv?db=restheart&coll=poi&id=0 HTTP/1.1
Content-Type: text/csv

id,name,city,lat,lon,note
1,Coliseum,Rome,41.8902614,12.4930871,Also known as the Flavian Amphitheatre
2,Duomo,Milan,45.464278,9.190596,Milan Cathedral

The /csv path is a reserved path, used by the RESTHeart CSV Uploader Service

Note that the Content-Type must be text/csv otherwise you’ll get a 400 Bad Request error.

Now the /poi collection contains the documents:

GET /poi HTTP/1.1
Response
[{
    "_id": 2,
    "city": "Milan",
    "lat": 45.464278,
    "lon": 9.190596,
    "name": "Duomo",
    "note": "Milan Cathedral",
    "_etag": {
        "$oid": "5d249beebb77e333b6dc9c84"
    }
},
{
    "_id": 1,
    "city": "Rome",
    "lat": 41.8902614,
    "lon": 12.4930871,
    "name": "Coliseum",
    "note": "Also known as the Flavian Amphitheatre",
    "_etag": {
        "$oid": "5d249beebb77e333b6dc9c83"
    }
}]

Query parameters

The CSV uploader service is controlled by the following query parameters.

query parameter description default value
db (required) the name of the database no default
coll (required) the name of the collection no default
id id column index no id column
sep column separator ,
update if true, update matching documents false
upsert applies when update=true; if true, create new document if no documents match the _id true
props additional properties to add to each row, e.g. ?props=foo&props=bar no props
values values of additional properties to add to each row e.g. ?values=1&values=2 no values

If the id parameter is not specified, a document is created with a new ObjectId per each CSV row.

Update documents from CSV

If the CSV lines are changed or new ones are added, you can update your collection with the update and upsert parameters.

To update your collection use the update parameter.

New lines in the CSV will NOT be added.

POST /csv?db=restheart&coll=poi&id=0&update=true HTTP/1.1
Content-Type: text/csv

id,name,city,lat,lon,note
1,Coliseum,Rome,41.8902614,12.4930871,Also known as the Flavian Amphitheatre -UPDATED-
2,Duomo,Milan,45.464278,9.190596,Milan Cathedral -UPDATED-
3,Cattedrale di Santa Maria del Fiore,43.773251,11.255474,Florence Cathedral

To update existing documents and add new ones, add the upsert=true query parameter:

POST /csv?db=restheart&coll=poi&id=0&update=true&upsert=true HTTP/1.1
Content-Type: text/csv

id,name,city,lat,lon,note
1,Coliseum,Rome,41.8902614,12.4930871,Also known as the Flavian Amphitheatre
2,Duomo,Milan,45.464278,9.190596,Milan Cathedral
3,Cattedrale di Santa Maria del Fiore,43.773251,11.255474,Florence Cathedral

Transform Data

The CSV format allows creating flat documents. Using an Interceptor data can be modified to take advantage of the nested nature of JSON.

We will use csv-interceptor on the restheart-examples repository.

Clone the restheart-examples repository

$ git clone https://github.com/SoftInstigate/restheart-examples.git

Build the examples:

$ cd restheart-examples
$ mvn package

Deploy the csv-interceptor

$ cp csv-interceptor/target/csv-interceptor.jar <restheart>/plugins

Restarting RESTHeart, the plugins will be automatically deployed.

The coordsToGeoJson Interceptor

The code of the coordsToGeoJson follows:

@RegisterPlugin(name = "coordsToGeoJson", 
        description = "transforms cordinates array to GeoJSON point object for csv loader service")
public class CoordsToGeoJson implements Interceptor<BsonFromCsvRequest, BsonResponse> {
    @Override
    public void handle(BsonFromCsvRequest request, BsonResponse response) throws Exception {
        var docs = request.getContent();

        if (docs == null) {
            return;
        }

        docs.stream()
                .map(doc -> doc.asDocument())
                .filter(doc -> doc.containsKey("lon") && doc.containsKey("lat"))
                .forEachOrdered(doc -> {
                    // get Coordinates
                    var coordinates = new BsonArray();
                    coordinates.add(doc.get("lon"));
                    coordinates.add(doc.get("lat"));

                    var point = new BsonDocument();

                    point.put("type", new BsonString("Point"));
                    point.put("coordinates", coordinates);

                    // Add the object to the document
                    doc.append("point", point);
                });
    }

    @Override
    public boolean resolve(BsonFromCsvRequest request, BsonResponse response) {
        return request.isHandledBy("csvLoader")
                && request.isPost()
                && "/csv".equals(request.getPath());
    }
}

Note that the resolve() method returns true for POST requests on the /csv URI (where the csvLoader service is bound).

The handle() method receives the BsonFromCsvRequest object that contains a BsonArray of documents parsed from the uploaded CSV data. It uses a stream to process all documents containing the properties lon and lat to add the corresponding GeoJSON object.

The interceptor implements the Interceptor interface specifying the parametric types BsonFromCsvRequest and BsonResponse. This is mandatory since an interceptor can intercept requests handled by services that use the same exact types (Check the code of CsvLoader service, it implements the parametric Service interface using those types).

public class CoordsToGeoJson implements Interceptor<BsonFromCsvRequest, BsonResponse>

After uploading csv data the result is the following. The GeoJSON field is point.

POST /csv?db=restheart&coll=poi&id=0 HTTP/1.1
Content-Type: text/csv

id,name,city,lat,lon,note
1,Coliseum,Rome,41.8902614,12.4930871,Also known as the Flavian Amphitheatre
2,Duomo,Milan,45.464278,9.190596,Milan Cathedral
GET /poi HTTP/1.1
Response
[
    {
        "_etag": {
            "$oid": "5ed905845db98d3376dc30c8"
        },
        "_id": 2,
        "city": "Milan",
        "lat": 45.464278,
        "lon": 9.190596,
        "name": "Duomo",
        "note": "Milan Cathedral",
        "point": {
            "coordinates": [
                9.190596,
                45.464278
            ],
            "type": "Point"
        }
    },
    {
        "_etag": {
            "$oid": "5ed905845db98d3376dc30c7"
        },
        "_id": 1,
        "city": "Rome",
        "lat": 41.8902614,
        "lon": 12.4930871,
        "name": "Coliseum",
        "note": "Also known as the Flavian Amphitheatre",
        "point": {
            "coordinates": [
                12.4930871,
                41.8902614
            ],
            "type": "Point"
        }
    }
]