The DataParser class provides a standardized interface for converting source data into the format required by a defined schema. It supports flexible field mapping and serves as the foundation for all data parsing operations in Superlinked.
Constructor
Create a new data parser for a specific schema with optional field mapping.
Parameters
schema: The target schema object that describes the desired output format. This schema defines the structure and fields that the parsed data should conform to.
mapping: Optional field mapping rules that define how source data fields correspond to schema fields. Specified as SchemaField to str pairs, such as {movie_schema.title: "movie_title"}.
Example
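As a minimal sketch of the constructor contract, the snippet below builds a parser with a field mapping and shows a type check on schema. It does not use Superlinked's own classes: SchemaField, MovieSchema, SketchParser, and the local InvalidInputException are simplified, hypothetical stand-ins so that the example runs on its own.

```python
from dataclasses import dataclass


class InvalidInputException(Exception):
    """Local stand-in for the exception named in the text."""


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical schema field; frozen so it is hashable and usable as a mapping key."""
    name: str


class MovieSchema:
    """Hypothetical schema with two fields."""

    def __init__(self) -> None:
        self.id = SchemaField("id")
        self.title = SchemaField("title")


class SketchParser:
    """Illustrative parser that only mirrors the constructor checks."""

    def __init__(self, schema, mapping=None):
        # Assumption: an unsupported schema type is rejected up front.
        if not isinstance(schema, MovieSchema):
            raise InvalidInputException(f"Invalid schema type: {type(schema).__name__}")
        self.schema = schema
        # The mapping is optional; default to an empty one.
        self.mapping = dict(mapping or {})


movie_schema = MovieSchema()
parser = SketchParser(movie_schema, mapping={movie_schema.title: "movie_title"})

try:
    SketchParser("not a schema")
except InvalidInputException as error:
    message = str(error)
```

Keying the mapping by field object rather than by string means a renamed or removed schema field surfaces as an attribute error where the mapping is defined, not later during parsing.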
The constructor raises an InvalidInputException if the schema parameter is of an invalid type.
Methods
unmarshal()
Parse source data into the desired schema format using the defined mapping rules.
Parameters
data: Source data that corresponds to the DataParser’s expected input type.
Returns
list[ParsedSchema] - A list of ParsedSchema objects conforming to the target schema.
Example
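The following sketch illustrates the unmarshal() flow under simplified assumptions: the source is a plain dict, and SchemaField, ParsedSchema, and DictParser are hypothetical stand-ins defined here, not the library's classes. Fields without a mapping entry are assumed to be read under their own name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical schema field, hashable so it can key the mapping."""
    name: str


@dataclass
class ParsedSchema:
    """Hypothetical parsed result: values keyed by schema field name."""
    fields: dict


class DictParser:
    """Illustrative parser for plain dict records (not a Superlinked class)."""

    def __init__(self, schema_fields, mapping=None):
        self.schema_fields = list(schema_fields)
        self.mapping = dict(mapping or {})

    def _source_key(self, field):
        # Fall back to the schema field's own name when no mapping is given.
        return self.mapping.get(field, field.name)

    def unmarshal(self, data):
        # One source record in, a list of parsed objects out.
        values = {f.name: data[self._source_key(f)] for f in self.schema_fields}
        return [ParsedSchema(values)]


title = SchemaField("title")
year = SchemaField("year")
parser = DictParser([title, year], mapping={title: "movie_title"})

parsed = parser.unmarshal({"movie_title": "Alien", "year": 1979})
```

In this sketch a missing source key raises KeyError, which is one reason to check the input before parsing.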
marshal()
Convert previously parsed data back to its original source format.
Parameters
parsed_schemas: Previously parsed data that follows the schema format of the DataParser.
Returns
list[SourceTypeT] - A list of data in the original source format.
Example
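The reverse direction can be sketched the same way. The classes below are hypothetical stand-ins rather than Superlinked's implementation, and the sketch assumes marshal() receives a list of parsed objects and emits one source-format dict per object, using the mapping to restore the source field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical schema field, hashable so it can key the mapping."""
    name: str


@dataclass
class ParsedSchema:
    """Hypothetical parsed result: values keyed by schema field name."""
    fields: dict


class DictParser:
    """Illustrative parser for plain dict records (not a Superlinked class)."""

    def __init__(self, schema_fields, mapping=None):
        self.schema_fields = list(schema_fields)
        self.mapping = dict(mapping or {})

    def _source_key(self, field):
        # Fall back to the schema field's own name when no mapping is given.
        return self.mapping.get(field, field.name)

    def marshal(self, parsed_schemas):
        # Rebuild each record under its original source field names.
        return [
            {self._source_key(f): parsed.fields[f.name] for f in self.schema_fields}
            for parsed in parsed_schemas
        ]


title = SchemaField("title")
year = SchemaField("year")
parser = DictParser([title, year], mapping={title: "movie_title"})

restored = parser.marshal([ParsedSchema({"title": "Alien", "year": 1979})])
```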
Inheritance Hierarchy
The DataParser class serves as an abstract base class for specialized parsers:
Specialized Parsers
- DataFrameParser - For pandas DataFrame processing
- JsonParser - For JSON data handling
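The relationship between an abstract base and its specialized parsers can be sketched as follows. BaseParser, JsonTextParser, and RowsParser are hypothetical names used only for illustration, and the mapping here uses plain strings (schema field name to source field name) instead of SchemaField objects.

```python
import json
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

SourceTypeT = TypeVar("SourceTypeT")


class BaseParser(ABC, Generic[SourceTypeT]):
    """Hypothetical abstract base: each subclass fixes the source type."""

    def __init__(self, mapping=None):
        # Maps schema field name -> source field name in this sketch.
        self.mapping = dict(mapping or {})

    @abstractmethod
    def unmarshal(self, data: SourceTypeT) -> list:
        """Convert source data into a list of schema-shaped records."""

    def _to_schema_names(self, record):
        # Invert the mapping so source keys can be renamed to schema names.
        source_to_schema = {source: field for field, source in self.mapping.items()}
        return {source_to_schema.get(key, key): value for key, value in record.items()}


class JsonTextParser(BaseParser[str]):
    """Parses a single JSON object given as text."""

    def unmarshal(self, data):
        return [self._to_schema_names(json.loads(data))]


class RowsParser(BaseParser[list]):
    """Parses a list of row dicts, one parsed record per row."""

    def unmarshal(self, data):
        return [self._to_schema_names(row) for row in data]


json_parser = JsonTextParser(mapping={"title": "movie_title"})
rows_parser = RowsParser(mapping={"title": "movie_title"})

from_json = json_parser.unmarshal('{"movie_title": "Alien"}')
from_rows = rows_parser.unmarshal([{"movie_title": "Alien"}, {"movie_title": "Heat"}])
```

Because the shared renaming logic lives in the base class, each subclass only has to describe how to read its own source format.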
Use Cases
Custom Field Mapping
Map source data fields to different schema field names.
Multi-Format Data Processing
Use DataParser as a base for creating parsers that handle various data formats.
Best Practices
Mapping Strategy: Define comprehensive field mappings upfront to avoid data transformation issues. Use descriptive mapping keys that clearly indicate the source field names.
Schema Validation: Ensure your source data contains all required fields defined in the target schema before calling unmarshal().
Type Safety: The generic typing ensures type safety between the source data format and schema. Maintain consistency in your type annotations.
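The Schema Validation practice above can be sketched as a pre-check that runs before unmarshal(). The helper missing_source_fields is hypothetical and not part of Superlinked; it assumes schema fields are identified by plain strings and that unmapped fields are read under their own name.

```python
def missing_source_fields(record, required_fields, mapping=None):
    """Return the source keys that the record lacks for the required schema fields."""
    mapping = mapping or {}
    # Translate each required schema field to the key expected in the source.
    expected_keys = [mapping.get(field, field) for field in required_fields]
    return [key for key in expected_keys if key not in record]


record = {"movie_title": "Alien"}
missing = missing_source_fields(
    record,
    required_fields=["title", "year"],
    mapping={"title": "movie_title"},
)

if missing:
    # In real code this might log the record or raise before parsing.
    status = f"skipping record, missing: {missing}"
else:
    status = "ready to unmarshal"
```

Reporting the missing keys by their source names keeps the message actionable for whoever maintains the upstream data.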