JsonParser - Superlinked

The JsonParser class specializes in parsing JSON objects into schema-compliant data structures. It extends the base DataParser functionality to handle nested JSON data with dot-separated path mapping for accessing nested fields.

Constructor

Create a new JSON parser for a specific schema with optional path mapping.

JsonParser(schema, mapping=None)

Parameters

schema

IdSchemaObjectT

required

The target schema object that describes the desired output format. This schema defines the structure and fields that the parsed JSON should conform to.

mapping

Mapping[SchemaField, str]

default: None

Optional path mapping rules that define how JSON paths correspond to schema fields. Uses dot-separated notation to access nested JSON properties.

Example

from superlinked import JsonParser, schema

@schema
class UserSchema:
    id: str
    name: str
    email: str
    age: int
    city: str

user_schema = UserSchema()

# Create parser with nested path mapping
parser = JsonParser(
    schema=user_schema,
    mapping={
        user_schema.id: "user.id",
        user_schema.name: "user.profile.fullName",
        user_schema.email: "user.contact.email",
        user_schema.age: "user.profile.age",
        user_schema.city: "user.address.city"
    }
)

The constructor will raise an InvalidInputException if the schema parameter is of an invalid type.

Methods

unmarshal()

Parse JSON data into schema-compliant objects using the defined path mapping.

unmarshal(data: Sequence[dict[str, Any]]) -> list[ParsedSchema]

data

Sequence[dict[str, Any]]

required

The JSON data to parse, given as a sequence of JSON objects (dicts).

Returns: list[ParsedSchema] - A list of ParsedSchema objects, one for each JSON object processed.

Example

# Sample nested JSON data
json_data = {
    "user": {
        "id": "U123",
        "profile": {
            "fullName": "John Doe",
            "age": 30
        },
        "contact": {
            "email": "john.doe@example.com"
        },
        "address": {
            "city": "New York"
        }
    }
}

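# unmarshal() expects a sequence, so wrap a single object in a list
parsed_data = parser.unmarshal([json_data])

# Parse multiple JSON objects in one call
# (another_json_object stands for a second dict with the same structure)
json_array = [json_data, another_json_object]
parsed_array = parser.unmarshal(json_array)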

Path Mapping

The JsonParser uses dot-separated notation to access nested JSON properties:

Simple Field Access

mapping = {
    schema.name: "name",           # Root level field
    schema.email: "contact.email"  # Nested field
}

Complex Nested Access

mapping = {
    schema.product_name: "product.details.name",
    schema.price: "pricing.amount.value",
    schema.category: "product.category.primary.name"
}

Array Access

Access array elements by index:

mapping = {
    schema.first_tag: "tags.0",      # First element
    schema.primary_image: "images.0.url"  # Nested in array
}
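To make the path notation concrete, here is a minimal sketch of how a dot-separated path with numeric segments can be walked through nested dicts and lists. The helper name resolve_path and the sample product record are assumptions made for illustration; this is not Superlinked's internal implementation, only a model of the notation described above.

```python
from typing import Any


def resolve_path(record: Any, path: str) -> Any:
    """Follow a dot-separated path through nested dicts and lists."""
    current = record
    for key in path.split("."):
        if isinstance(current, list):
            # A segment applied to a list is treated as a zero-based index.
            current = current[int(key)]
        else:
            # Any other segment is treated as a dictionary key.
            current = current[key]
    return current


# Hypothetical record, shaped to match the mapping paths shown above.
product = {
    "tags": ["new", "sale"],
    "images": [{"url": "https://example.com/a.png"}],
}

first_tag = resolve_path(product, "tags.0")
primary_image = resolve_path(product, "images.0.url")
```

In this sketch a missing key or an out-of-range index raises KeyError or IndexError, which is one reason to check paths against sample data before parsing.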

Use Cases

API Response Processing

Map fields from a nested API response onto a flat schema:

# API response from user service
api_response = {
    "status": "success",
    "data": {
        "users": [
            {
                "id": "1",
                "profile": {"name": "Alice", "age": 25},
                "preferences": {"theme": "dark"}
            }
        ]
    }
}

# Map the first user in the response (index 0 of data.users) to the schema
parser = JsonParser(
    user_schema,
    mapping={
        user_schema.id: "data.users.0.id",
        user_schema.name: "data.users.0.profile.name"
    }
)
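A mapping that hard-codes index 0 reads only the first user in the response. When every user should be parsed, one option is to strip the response envelope first and pass the inner list as the data sequence, with paths written relative to each user object. The sketch below shows only the data preparation in plain Python; the second user and the relative_paths dictionary (with plain strings standing in for schema fields) are assumptions added for illustration.

```python
# Hypothetical response containing two users inside an envelope.
api_response = {
    "status": "success",
    "data": {
        "users": [
            {"id": "1", "profile": {"name": "Alice", "age": 25}},
            {"id": "2", "profile": {"name": "Bob", "age": 31}},
        ]
    },
}

# Pull the list of user objects out of the envelope.
users = api_response["data"]["users"]

# With the envelope removed, paths are relative to each user object.
# Plain strings stand in for schema fields in this sketch.
relative_paths = {"id": "id", "name": "profile.name"}

# `users` is now a sequence of dicts, the shape unmarshal() takes as `data`.
user_ids = [user["id"] for user in users]
```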

Configuration File Processing

Parse complex configuration files:

# Configuration JSON
config = {
    "database": {
        "host": "localhost",
        "connection": {
            "timeout": 30,
            "pool_size": 10
        }
    }
}

@schema
class DatabaseConfig:
    host: str
    timeout: int
    pool_size: int

# Instantiate the schema once and reuse it for the parser and the mapping keys
db_config = DatabaseConfig()

config_parser = JsonParser(
    db_config,
    mapping={
        db_config.host: "database.host",
        db_config.timeout: "database.connection.timeout",
        db_config.pool_size: "database.connection.pool_size"
    }
)
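Because unmarshal() takes a sequence of objects, a single configuration document has to be wrapped in a list before it is parsed. A minimal sketch, using only the standard library, of loading a JSON document and putting it in that shape; the inline config_text stands in for the contents of a real file and is an assumption for illustration.

```python
import json

# Stand-in for the text of a configuration file.
config_text = """
{
    "database": {
        "host": "localhost",
        "connection": {"timeout": 30, "pool_size": 10}
    }
}
"""

# Decode the JSON text into a Python dict.
config = json.loads(config_text)

# Wrap the single document in a list so it matches the
# Sequence[dict[str, Any]] shape that unmarshal() accepts.
records = [config]
```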

Multi-Source Data Integration

Combine data from different JSON sources:

# Different API formats
source_a = {"user_info": {"name": "John"}}
source_b = {"profile": {"user": {"name": "John"}}}

# Same schema, different mappings
parser_a = JsonParser(user_schema, {user_schema.name: "user_info.name"})
parser_b = JsonParser(user_schema, {user_schema.name: "profile.user.name"})

Best Practices

Path Validation: Always validate that the JSON paths in your mapping exist in your source data. Missing paths will cause parsing failures.
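One way to apply the path validation advice is to check every mapped path against a sample record before parsing. The sketch below is plain Python and independent of Superlinked; the helper names lookup and missing_paths are hypothetical, and the sample record deliberately omits the contact block.

```python
from typing import Any

# Sentinel that distinguishes "path not found" from a stored None value.
_MISSING = object()


def lookup(record: Any, path: str) -> Any:
    """Return the value at a dot-separated path, or _MISSING if absent."""
    current = record
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return _MISSING
    return current


def missing_paths(record: Any, paths: list[str]) -> list[str]:
    """List the paths that cannot be resolved in the given record."""
    return [path for path in paths if lookup(record, path) is _MISSING]


sample = {"user": {"id": "U123", "profile": {"fullName": "John Doe"}}}
paths = ["user.id", "user.profile.fullName", "user.contact.email"]

missing = missing_paths(sample, paths)
```

Running a check like this on a few representative records surfaces mapping typos before any data reaches the parser.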

Type Consistency: Ensure the values at your mapped JSON paths match the expected schema field types. The parser will attempt type conversion, but explicit handling is safer.
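Explicit handling can start with a report of which mapped values do not have the expected Python type. The following is a sketch under stated assumptions: the helper type_mismatches and the expected_types table are hypothetical, paths are assumed to exist and to traverse dicts only, and the check is done in plain Python rather than through any Superlinked API.

```python
from typing import Any


def type_mismatches(record: dict[str, Any], expected: dict[str, type]) -> dict[str, str]:
    """Map each path whose value has the wrong type to the actual type name."""
    mismatches = {}
    for path, expected_type in expected.items():
        value: Any = record
        for key in path.split("."):
            value = value[key]  # assumes every path segment exists
        if not isinstance(value, expected_type):
            mismatches[path] = type(value).__name__
    return mismatches


# Expected Python types for two mapped paths.
expected_types = {"user.id": str, "user.profile.age": int}

# The age arrives as a string here, which the check should flag.
record = {"user": {"id": "U123", "profile": {"age": "30"}}}

mismatches = type_mismatches(record, expected_types)
```

Flagged values can then be converted deliberately (for example with int()) or rejected, instead of relying on implicit conversion.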

Deep Nesting: Deeply nested paths can hurt performance and readability. Consider flattening complex JSON structures when possible.

Array Handling: When working with arrays, remember that indices are zero-based. Consider using multiple parsers for different array element patterns.

Integration Example

from superlinked import JsonParser, TextSimilaritySpace, CategoricalSimilaritySpace, schema

# Define schema for product data
@schema
class ProductSchema:
    id: str
    name: str
    description: str
    category: str
    price: float

product_schema = ProductSchema()

# Create parser for e-commerce API format
parser = JsonParser(
    product_schema,
    mapping={
        product_schema.id: "product.sku",
        product_schema.name: "product.title",
        product_schema.description: "product.details.description",
        product_schema.category: "product.category.name",
        product_schema.price: "pricing.current.amount"
    }
)

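# Process API data (fetch_products_from_api is a placeholder for your own
# data-loading code; it should return a sequence of dicts)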
api_data = fetch_products_from_api()
parsed_products = parser.unmarshal(api_data)

# Create vector spaces
text_space = TextSimilaritySpace(text=product_schema.description)
category_space = CategoricalSimilaritySpace(
    category_input=product_schema.category,
    categories=["electronics", "clothing", "books"]
)
