DataFrameParser
The DataFrameParser class parses pandas DataFrames into schema-compliant data structures. It extends the base DataParser
functionality to handle tabular data with column-based mapping.
Constructor
Create a new DataFrame parser for a specific schema, with an optional column mapping.
Parameters
The target schema object that describes the desired output format. This schema
defines the structure and fields that the parsed DataFrame should conform to.
Optional column mapping rules that define how DataFrame columns correspond to
schema fields, specified as SchemaField to column name pairs.
Example
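The idea behind the column mapping can be illustrated without the library itself. The sketch below is not the DataFrameParser constructor: `resolve_columns` is a hypothetical helper, plain strings stand in for SchemaField objects, and the fallback to a same-named column for unmapped fields is an assumption inferred from the Best Practices section.

```python
# Hypothetical stand-ins: plain strings play the role of SchemaField objects.
SCHEMA_FIELDS = ["id", "title", "body"]


def resolve_columns(schema_fields, mapping=None):
    """Return {schema field -> DataFrame column name}.

    Fields without an explicit mapping are assumed to use a column
    with the same name as the field.
    """
    mapping = mapping or {}
    unknown = set(mapping) - set(schema_fields)
    if unknown:
        # A mapping entry that points at no schema field is most likely a typo.
        raise ValueError(f"mapping refers to unknown schema fields: {sorted(unknown)}")
    return {field: mapping.get(field, field) for field in schema_fields}


# "title" and "body" live in differently named columns; "id" matches as-is.
columns = resolve_columns(SCHEMA_FIELDS, {"title": "headline", "body": "text"})
```

Resolving the mapping once, up front, keeps the per-row work a simple column lookup.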
The constructor raises an InvalidInputException if the schema parameter is of an invalid type.
Methods
unmarshal_single()
Parse a pandas DataFrame into schema-compliant data using the defined column mapping.
The pandas DataFrame to parse. Each row is converted to a ParsedSchema object.
Returns list[ParsedSchema]: a list of ParsedSchema objects, one for each row in the DataFrame.
Example
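A rough picture of the row-by-row conversion, using pandas only. This is a sketch of the behaviour described above, not the library's implementation: `rows_to_records` is a hypothetical helper, and plain dictionaries stand in for ParsedSchema objects.

```python
import pandas as pd


def rows_to_records(df, columns):
    """Build one record per DataFrame row.

    `columns` maps schema field name -> DataFrame column name.
    Plain dicts stand in for ParsedSchema objects here.
    """
    # Keep only the mapped columns, then rename them to the schema field names.
    selected = df[list(columns.values())]
    renamed = selected.rename(columns={col: field for field, col in columns.items()})
    return renamed.to_dict(orient="records")


df = pd.DataFrame(
    {"id": [1, 2], "headline": ["first", "second"], "text": ["alpha", "beta"]}
)
records = rows_to_records(df, {"id": "id", "title": "headline", "body": "text"})
```

The result has the same length as the DataFrame, and each record is keyed by schema field name rather than by the original column name.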
Inheritance
The DataFrameParser inherits from the base DataParser class and implements its abstract methods specifically for pandas DataFrame handling.
Inheritance Chain: DataFrameParser → DataParser → ABC + Generic
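For readers unfamiliar with the ABC plus Generic pattern, the chain above has roughly the following shape. The class names are hypothetical stand-ins prefixed with `Sketch`, the method body is a placeholder, and plain dictionaries again replace ParsedSchema; only the structure of the hierarchy is the point.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

import pandas as pd

SourceT = TypeVar("SourceT")  # type of the raw input a parser accepts


class SketchDataParser(ABC, Generic[SourceT]):
    """Hypothetical stand-in for the abstract base parser."""

    @abstractmethod
    def unmarshal_single(self, data: SourceT) -> list[dict]:
        """Convert raw input into a list of parsed records."""


class SketchDataFrameParser(SketchDataParser[pd.DataFrame]):
    """Hypothetical stand-in for the DataFrame-specific subclass."""

    def unmarshal_single(self, data: pd.DataFrame) -> list[dict]:
        # Placeholder body: one dict per row.
        return data.to_dict(orient="records")


parser = SketchDataFrameParser()
records = parser.unmarshal_single(pd.DataFrame({"id": [1, 2]}))

# Names of the classes in the method resolution order.
chain = [cls.__name__ for cls in SketchDataFrameParser.__mro__]
```

The abstract base cannot be instantiated on its own; only a subclass that implements every abstract method can.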
Use Cases
CSV Data Processing
Well suited to processing CSV files loaded into pandas DataFrames.
Data Cleaning and Transformation
Handle data cleaning during the parsing process.
Batch Processing
Efficiently process large datasets in batches.
Best Practices
Column Mapping: Always define explicit column mappings when your DataFrame
column names don’t exactly match your schema field names. This ensures data
consistency.
Data Types: Ensure your DataFrame column types are compatible with your
schema field types. Pandas will attempt automatic type conversion, but
explicit conversion is more reliable.
Missing Columns: If a mapped column is missing from the DataFrame,
parsing fails. Validate your DataFrame structure before parsing.
Performance: For large DataFrames, consider processing in chunks to manage
memory usage effectively.
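The practices above (checking for missing columns, converting types explicitly, and processing in chunks) can be sketched with pandas alone. `missing_columns` and `iter_chunks` are hypothetical helpers, and the column names and sample values are made up for illustration; in real use each chunk would be handed to the parser.

```python
import pandas as pd


def missing_columns(df, required):
    """Return the required column names that are absent from the DataFrame."""
    return [name for name in required if name not in df.columns]


def iter_chunks(df, chunk_size):
    """Yield consecutive row slices of at most `chunk_size` rows."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]


df = pd.DataFrame({"id": ["1", "2", "3"], "price": ["9.5", "n/a", "12"]})

# 1. Validate the structure before parsing ("title" is absent here).
missing = missing_columns(df, ["id", "price", "title"])

# 2. Convert types explicitly instead of relying on automatic conversion.
df["id"] = df["id"].astype("int64")
df["price"] = pd.to_numeric(df["price"], errors="coerce")  # unparseable values become NaN

# 3. Process in chunks to bound memory usage.
chunk_sizes = [len(chunk) for chunk in iter_chunks(df, chunk_size=2)]
```

With `errors="coerce"`, bad values surface as NaN and can be handled before parsing, rather than causing a failure halfway through a large batch.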