RestSource
class provides a production-ready data source implementation that receives data through HTTP REST endpoints. It’s designed for scalable applications where data is ingested via API calls rather than programmatic methods.
Constructor
Parameters
The schema object that defines the structure of data this source will handle. All data received through REST endpoints must conform to this schema.
Optional data parser for processing input data received through REST endpoints. If None, an appropriate parser is selected based on the data format.
Optional REST descriptor for configuring the REST API behavior and endpoint definitions.
Properties
The API endpoint path where this source accepts data through HTTP requests.
Inheritance
TheRestSource
extends several classes for online data processing:
Inheritance Chain:
RestSource
- →
OnlineSource
- →
TransformerPublisher
- →
Source
- →
Generic
- Base Source functionality from
Source
- Online processing capabilities from
OnlineSource
- Event publishing from
TransformerPublisher
- Generic type support for schema types
REST API Integration
Endpoint Configuration
TheRestSource
automatically creates HTTP endpoints for data ingestion:
- Data ingestion endpoints for receiving new data
- Batch processing endpoints for handling multiple records
- Status endpoints for monitoring source health
Request Processing
Incoming HTTP requests are processed through the following pipeline:- Request validation against API contracts
- Data extraction from request body
- Schema validation against the associated schema
- Parser processing if custom parser is configured
- Vector generation through associated spaces
- Index updates for immediate search availability
Production Features
Scalability
- Concurrent Processing: Handle multiple simultaneous requests
- Asynchronous Operations: Non-blocking data processing
- Load Balancing: Support for distributed deployments
Reliability
- Error Handling: Comprehensive error response handling
- Data Validation: Schema validation with detailed error messages
- Request Logging: Full audit trail of API requests
Security
- Authentication Support: Integration with authentication systems
- Rate Limiting: Built-in protection against abuse
- Data Sanitization: Input validation and sanitization
Use Cases
Real-Time Applications
Ideal for applications that need to ingest data from:- Web applications submitting user-generated content
- Mobile applications sending usage data and content
- IoT devices streaming sensor data
- Third-party integrations via webhook endpoints
Microservices Architecture
Perfect for microservices that need to:- Receive data from other services via REST APIs
- Process events from event-driven architectures
- Handle webhooks from external systems
- Provide data ingestion APIs for client applications
Content Management Systems
Suitable for systems that handle:- User-generated content from web interfaces
- Content publishing from editorial systems
- Media uploads through API endpoints
- Metadata updates from content management tools
API Response Handling
Success Responses
RestSource provides appropriate HTTP status codes:- 200 OK: Successful data processing
- 201 Created: New data successfully indexed
- 202 Accepted: Data queued for processing
Error Responses
Comprehensive error handling with detailed messages:- 400 Bad Request: Schema validation failures
- 422 Unprocessable Entity: Data format issues
- 500 Internal Server Error: Processing failures
Integration with Applications
RestSource integrates with Superlinked REST applications to provide API endpoints:- Automatic Endpoint Generation: Creates REST endpoints based on source configuration
- OpenAPI Documentation: Generates API documentation automatically
- Request/Response Validation: Ensures data consistency
- Monitoring Integration: Provides metrics and health checks
Performance Characteristics
Advantages
- Production Ready: Designed for high-throughput production environments
- Scalable: Supports horizontal scaling and load balancing
- Reliable: Built-in error handling and recovery mechanisms
- Monitoring: Comprehensive logging and metrics
Considerations
- Network Latency: Dependent on network conditions for data ingestion
- HTTP Overhead: Additional protocol overhead compared to in-memory sources
- External Dependencies: Requires proper HTTP client configuration
Best Practices
API Design
RESTful Endpoints: Follow REST conventions for intuitive API design and easier integration with client applications.
Error Handling
Schema Validation: Ensure robust schema validation at the API level to prevent processing failures downstream.
Performance Optimization
Batch Processing: Design APIs to accept batch data when possible to reduce HTTP overhead and improve throughput.
Security
Authentication: Always implement proper authentication and authorization for production REST endpoints.
Monitoring
Health Checks: Implement health check endpoints to monitor source availability and processing status.
Deployment Considerations
Infrastructure Requirements
- Load Balancers: For distributing incoming requests
- SSL/TLS: For secure data transmission
- Monitoring Systems: For tracking API performance and health
Configuration Management
- Environment Variables: For different deployment environments
- Configuration Files: For complex REST descriptor settings
- Secret Management: For API keys and authentication credentials