DataParser - Superlinked

The DataParser class provides a standardized interface for converting source data into the format required by a defined schema. It supports flexible field mapping and serves as the foundation for all data parsing operations in Superlinked.

Constructor

Create a new data parser for a specific schema with optional field mapping.

DataParser(schema, mapping=None)

Parameters

schema

IdSchemaObjectT

required

The target schema object that describes the desired output format. This schema defines the structure and fields that the parsed data should conform to.

mapping

Mapping[SchemaField, str]

default: None

Optional field mapping rules that define how source data fields correspond to schema fields. Specified as SchemaField-to-str pairs, such as {movie_schema.title: "movie_title"}.

Example

from superlinked import DataParser, schema, SchemaField

@schema
class MovieSchema:
    title: str
    year: int
    genre: str

movie_schema = MovieSchema()

# Create parser with field mapping
parser = DataParser(
    schema=movie_schema,
    mapping={
        movie_schema.title: "movie_title",
        movie_schema.year: "release_year",
        movie_schema.genre: "category"
    }
)

The constructor raises InvalidInputException if the schema argument is not a valid schema object.

Methods

unmarshal()

Parse source data into the desired schema format using the defined mapping rules.

unmarshal(data: Sequence[SourceTypeT]) -> list[ParsedSchema]

data

Sequence[SourceTypeT]

required

Source data that corresponds to the DataParser’s expected input type.

Returns: list[ParsedSchema] - A list of ParsedSchema objects conforming to the target schema.

Example

# Parse source data
source_data = [{
    "movie_title": "The Matrix",
    "release_year": 1999,
    "category": "Sci-Fi"
}]

parsed_data = parser.unmarshal(source_data)
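To make the effect of the mapping concrete, the following is a minimal plain-Python sketch of the renaming step that a mapping describes. It is not Superlinked's implementation: apply_mapping is a hypothetical helper, plain strings stand in for SchemaField objects, and the fallback to an identically named source key for unmapped fields is an assumption made for this illustration.

```python
from typing import Any, Mapping, Sequence


def apply_mapping(
    rows: Sequence[Mapping[str, Any]],
    mapping: Mapping[str, str],
    schema_fields: Sequence[str],
) -> list[dict[str, Any]]:
    """Rename source keys to schema field names (hypothetical helper).

    `mapping` goes from schema field name to source key. Fields without
    an entry are assumed (for this sketch) to use the same name in the source.
    """
    result = []
    for row in rows:
        result.append(
            {field: row[mapping.get(field, field)] for field in schema_fields}
        )
    return result


source_data = [
    {"movie_title": "The Matrix", "release_year": 1999, "category": "Sci-Fi"}
]
# Strings stand in for movie_schema.title, movie_schema.year, movie_schema.genre.
mapping = {"title": "movie_title", "year": "release_year", "genre": "category"}

renamed = apply_mapping(source_data, mapping, ["title", "year", "genre"])
print(renamed)
```

The real unmarshal() returns ParsedSchema objects rather than dictionaries; the sketch only shows which source key feeds which schema field.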

marshal()

Convert previously parsed data back to its original source format.

marshal(parsed_schemas: ParsedSchema | list[ParsedSchema]) -> list[SourceTypeT]

parsed_schemas

ParsedSchema | list[ParsedSchema]

required

Previously parsed data that follows the schema format of the DataParser.

Returns: list[SourceTypeT] - A list of data in the original source format.

Example

# Convert parsed data back to source format
original_format = parser.marshal(parsed_data)
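The relationship between unmarshal() and marshal() can be pictured as applying the same mapping in opposite directions. The sketch below shows that round trip with plain dictionaries. The names to_schema_names and to_source_names are hypothetical, strings stand in for SchemaField objects, and nothing here reflects how the library stores parsed data internally.

```python
from typing import Any, Mapping

# Schema field name -> source key (plain strings stand in for SchemaField objects).
MAPPING = {"title": "movie_title", "year": "release_year", "genre": "category"}


def to_schema_names(row: Mapping[str, Any], mapping: Mapping[str, str]) -> dict[str, Any]:
    """Forward direction: read each schema field from its source key."""
    return {field: row[source_key] for field, source_key in mapping.items()}


def to_source_names(row: Mapping[str, Any], mapping: Mapping[str, str]) -> dict[str, Any]:
    """Reverse direction: write each schema field back under its source key."""
    return {source_key: row[field] for field, source_key in mapping.items()}


original = {"movie_title": "The Matrix", "release_year": 1999, "category": "Sci-Fi"}
parsed = to_schema_names(original, MAPPING)
restored = to_source_names(parsed, MAPPING)

# The round trip reproduces the source record when every key is mapped.
assert restored == original
```

Source keys that are not covered by the mapping are dropped in the forward step of this sketch, so they would not reappear after the reverse step.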

Inheritance Hierarchy

The DataParser class serves as an abstract base class for specialized parsers:

Specialized Parsers

DataFrameParser - For pandas DataFrame processing

JsonParser - For JSON data handling

Use Cases

Custom Field Mapping

Map source data fields to different schema field names:

# Source has different field names than schema
mapping = {
    product_schema.name: "product_title",
    product_schema.price: "cost",
    product_schema.description: "product_info"
}

parser = DataParser(product_schema, mapping=mapping)

Multi-Format Data Processing

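Use DataParser as a base class for parsers that handle other data formats:

class CustomParser(DataParser):
    def unmarshal(self, data):
        # Custom parsing logic for the source format
        return self._parse_custom_format(data)

    def _parse_custom_format(self, data):
        # Placeholder: convert `data` into a list of ParsedSchema objects
        raise NotImplementedError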
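The subclassing pattern can be tried out without the library. The sketch below uses a stand-in abstract class, BaseParser, which is not Superlinked's DataParser, and a hypothetical PipeDelimitedParser for a made-up line format. It assumes a concrete parser supplies both unmarshal() and marshal().

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence


class BaseParser(ABC):
    """Stand-in for an abstract parser base class (not Superlinked's DataParser)."""

    def __init__(self, field_names: Sequence[str]) -> None:
        self.field_names = list(field_names)

    @abstractmethod
    def unmarshal(self, data: Sequence[str]) -> list[dict[str, Any]]:
        """Parse source records into dicts keyed by field name."""

    @abstractmethod
    def marshal(self, parsed: Sequence[dict[str, Any]]) -> list[str]:
        """Convert parsed dicts back into source records."""


class PipeDelimitedParser(BaseParser):
    """Hypothetical parser for lines such as 'The Matrix|1999|Sci-Fi'."""

    def unmarshal(self, data: Sequence[str]) -> list[dict[str, Any]]:
        # Pair each field name with the value at the same position in the line.
        return [dict(zip(self.field_names, line.split("|"))) for line in data]

    def marshal(self, parsed: Sequence[dict[str, Any]]) -> list[str]:
        # Join the values back in field order to rebuild the source line.
        return ["|".join(str(row[name]) for name in self.field_names) for row in parsed]


parser = PipeDelimitedParser(["title", "year", "genre"])
rows = parser.unmarshal(["The Matrix|1999|Sci-Fi"])
lines = parser.marshal(rows)
print(rows)
print(lines)
```

Because both methods are abstract in the stand-in, instantiating BaseParser directly raises TypeError, which is the usual way Python signals that a subclass is required.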

Best Practices

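Mapping Strategy: Define field mappings for every source field whose name differs from the schema field name, and do so before parsing any data. Mapping keys are schema fields and mapping values are source field names, so make sure each value matches the source field name exactly.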

Schema Validation: Ensure your source data contains all required fields defined in the target schema before calling unmarshal().
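A pre-flight check along these lines can be written in a few lines of plain Python. The helper find_missing_fields is hypothetical and not part of Superlinked; it only assumes that the source records are dictionaries and that you know which source keys the mapping expects.

```python
from typing import Any, Mapping, Sequence


def find_missing_fields(
    rows: Sequence[Mapping[str, Any]],
    required_source_keys: Sequence[str],
) -> dict[int, list[str]]:
    """Return {row index: missing keys} for rows that lack a required key."""
    missing: dict[int, list[str]] = {}
    for index, row in enumerate(rows):
        absent = [key for key in required_source_keys if key not in row]
        if absent:
            missing[index] = absent
    return missing


rows = [
    {"movie_title": "The Matrix", "release_year": 1999, "category": "Sci-Fi"},
    {"movie_title": "Untitled", "category": "Drama"},  # release_year is missing
]
problems = find_missing_fields(rows, ["movie_title", "release_year", "category"])
print(problems)  # {1: ['release_year']}
```

Running the check before unmarshal() lets you report every incomplete record at once instead of stopping at the first failure.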

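Type Safety: The signatures above are generic over the schema type (IdSchemaObjectT) and the source data type (SourceTypeT), so a static type checker can flag a mismatch between the source data you pass and the parser you declared. Keep the type annotations in your own subclasses consistent with these parameters.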
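As a rough illustration of how two type parameters tie a schema to a source format, the sketch below defines a stand-in generic class. TypedParser, MovieFields, and DictParser are hypothetical names introduced for this example, and the class is not Superlinked's DataParser.

```python
from typing import Any, Generic, Sequence, TypeVar

SchemaT = TypeVar("SchemaT")
SourceT = TypeVar("SourceT")


class TypedParser(Generic[SchemaT, SourceT]):
    """Stand-in showing how two type parameters link schema and source types."""

    def __init__(self, schema: SchemaT) -> None:
        self.schema = schema

    def unmarshal(self, data: Sequence[SourceT]) -> list[dict[str, Any]]:
        raise NotImplementedError


class MovieFields:
    """Hypothetical schema placeholder that just lists field names."""

    names = ("title", "year", "genre")


class DictParser(TypedParser[MovieFields, dict]):
    """Binds the schema type to MovieFields and the source type to dict."""

    def unmarshal(self, data: Sequence[dict]) -> list[dict[str, Any]]:
        # Keep only the fields that the schema placeholder declares.
        return [{name: row[name] for name in self.schema.names} for row in data]


parser = DictParser(MovieFields())
parsed = parser.unmarshal(
    [{"title": "The Matrix", "year": 1999, "genre": "Sci-Fi", "extra": True}]
)
print(parsed)
```

With these annotations, a type checker such as mypy can report a call like parser.unmarshal(["not a dict"]) before the code runs; Python itself does not enforce the annotations at runtime.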
