Parsing Configuration

A parsing configuration consists of the following parts:

  • selector: defines which fields to pick from a JSON document
  • name: the name under which the extracted data is stored
  • castType: defines the type the extracted value is cast to. Note: for a recursive selector, specify the cast type of a single extracted element, not of the resulting sequence. If each extracted element is a string, use castType ‘STRING’, not ‘SEQ[STRING]’. If each extracted element is itself a list of strings, use ‘SEQ[STRING]’.

The selector syntax is straightforward. Let’s use the following JSON as an example:

{
  "response": {
    "numFound": 10,
    "docs": [
      {
        "product_id": "id1",
        "description": "yummy yummy",
        "title": "yummy",
        "innerJson": {
          "key1": "value1"
        }
      }
    ]
  }
}

We distinguish between plain and recursive selectors. Both kinds can be combined in a single selector expression:

  • plain: \ is the selector. It can be applied multiple times to navigate deeper into a structure. Example: response \ numFound (in this case castType should be set to INT).
  • recursive: \\ is the selector. It is used to extract a sequence of values from a list of JSON objects. Example: response \ docs \\ product_id (in this case castType should be set to STRING, although applying the selector yields a list of strings). If the recursive selector picks up elements that are themselves JSON objects, you can pick a field from each of them by applying another plain selector, as in response \ docs \\ innerJson \ key1
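
To make the selector behaviour concrete, here is a minimal Python sketch that emulates it on the example JSON. It is not the actual implementation: the function names (parse_selector, apply_selector, extract), the dict layout of the configuration, and the two-entry CASTS table are assumptions made for illustration, and the recursive selector is modelled only as far as it is described above (one value per element of the list).

```python
import json

PLAIN = "\\"        # a single backslash
RECURSIVE = "\\\\"  # two backslashes

# Hypothetical subset of cast types, enough for the examples above.
CASTS = {"INT": int, "STRING": str}


def parse_selector(selector):
    """Split 'a \\ b \\\\ c'-style text into (operator, key) steps."""
    tokens = selector.split()
    steps = [(PLAIN, tokens[0])]  # the first key is a plain lookup
    for op, key in zip(tokens[1::2], tokens[2::2]):
        if op not in (PLAIN, RECURSIVE):
            raise ValueError(f"unknown selector operator: {op!r}")
        steps.append((op, key))
    return steps


def apply_selector(data, selector):
    """Return (value, is_sequence) for the selected field(s)."""
    value, is_seq = data, False
    for op, key in parse_selector(selector):
        if op == RECURSIVE:
            # Pick `key` from every JSON object in the current list.
            value = [el[key] for el in value
                     if isinstance(el, dict) and key in el]
            is_seq = True
        elif is_seq:
            # A plain selector after a recursive one applies per element.
            value = [el[key] for el in value]
        else:
            value = value[key]
    return value, is_seq


def extract(data, config):
    """Apply one parsing configuration: selector, name, castType."""
    value, is_seq = apply_selector(data, config["selector"])
    cast = CASTS[config["castType"]]
    # castType names the type of a single element, even for sequences.
    result = [cast(v) for v in value] if is_seq else cast(value)
    return {config["name"]: result}


doc = json.loads("""
{"response": {"numFound": 10, "docs": [
  {"product_id": "id1", "description": "yummy yummy", "title": "yummy",
   "innerJson": {"key1": "value1"}}]}}
""")

total = extract(doc, {"selector": r"response \ numFound",
                      "name": "total", "castType": "INT"})
ids = extract(doc, {"selector": r"response \ docs \\ product_id",
                    "name": "ids", "castType": "STRING"})
inner = extract(doc, {"selector": r"response \ docs \\ innerJson \ key1",
                      "name": "inner", "castType": "STRING"})
```

With the example JSON, total holds the single integer 10, while ids and inner each hold a one-element list of strings, even though their castType is STRING. This mirrors the rule that castType describes one extracted element, not the whole sequence.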