GTFS Data Import Pipeline

Understand in detail how Podaris handles GTFS data

The GTFS Data Pipeline in Podaris transforms raw GTFS data into Podaris-specific data structures optimized for transit planning and analysis. The exact steps and transformations depend on which Advanced Import options are selected. The pipeline aims to balance fidelity to the original GTFS data against efficient representation and analysis in Podaris, and the import options let users tune that balance to the needs of a specific project. The main stages are:

1. Data Ingestion

The pipeline reads the GTFS zip file, extracting and parsing relevant .txt files (agency.txt, stops.txt, calendar.txt, trips.txt, routes.txt, etc.).
It applies user-defined filters for agencies, geographic areas, and date ranges.
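As an illustration of this stage, here is a minimal sketch in Python that uses only the standard library. It reads selected .txt files from a GTFS zip and applies agency, bounding-box, and date-range filters. The function names and the `(min_lat, min_lon, max_lat, max_lon)` bounding-box convention are assumptions for illustration, not Podaris's actual implementation.

```python
import csv
import io
import zipfile
from datetime import date


def read_gtfs_tables(source, names=("agency.txt", "stops.txt", "routes.txt",
                                     "trips.txt", "calendar.txt")):
    """Parse the named GTFS files from a zip (path or file object) into lists of dicts."""
    tables = {}
    with zipfile.ZipFile(source) as zf:
        available = set(zf.namelist())
        for name in names:
            if name not in available:
                continue  # skip files that this feed does not include
            # GTFS files are UTF-8 CSV; "utf-8-sig" also strips a leading BOM
            with io.TextIOWrapper(zf.open(name), encoding="utf-8-sig", newline="") as text:
                tables[name] = list(csv.DictReader(text))
    return tables


def parse_gtfs_date(value):
    """Convert a GTFS YYYYMMDD string to a datetime.date."""
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))


def apply_filters(tables, agency_ids=None, bbox=None, date_range=None):
    """Keep routes of the given agencies, stops inside bbox, and calendars
    overlapping date_range. bbox = (min_lat, min_lon, max_lat, max_lon)."""
    out = dict(tables)
    if agency_ids is not None:
        # agency_id may be blank in single-agency feeds; blank values will not match
        out["routes.txt"] = [r for r in tables.get("routes.txt", [])
                             if r.get("agency_id") in agency_ids]
    if bbox is not None:
        min_lat, min_lon, max_lat, max_lon = bbox
        out["stops.txt"] = [
            s for s in tables.get("stops.txt", [])
            if s.get("stop_lat") and s.get("stop_lon")  # some location types omit coordinates
            and min_lat <= float(s["stop_lat"]) <= max_lat
            and min_lon <= float(s["stop_lon"]) <= max_lon
        ]
    if date_range is not None:
        start, end = date_range
        # keep calendars whose validity period overlaps the requested range
        out["calendar.txt"] = [
            c for c in tables.get("calendar.txt", [])
            if parse_gtfs_date(c["start_date"]) <= end
            and parse_gtfs_date(c["end_date"]) >= start
        ]
    return out
```

This sketch filters each file independently. A complete importer would also cascade the filters, for example by dropping trips whose route or calendar was removed.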

2. Data Transformation

Agency and Stop Processing

Agencies and stops are converted into Podaris-specific structures.

Route and Pattern Generation

GTFS routes are converted into Podaris Services.
For each transport mode, an infrastructure layer is created to host the stops served by that mode.
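To make the mode-to-layer grouping concrete, this minimal Python sketch joins stop_times.txt to trips.txt and routes.txt and collects the stops served by each GTFS `route_type`. The mode labels and the function name are hypothetical, and only the basic route_type codes are mapped. Podaris's own layer naming and its handling of extended route types are not described here.

```python
from collections import defaultdict

# Basic GTFS route_type codes; extended codes (e.g. 700) fall back to "Other"
ROUTE_TYPE_NAMES = {0: "Tram", 1: "Subway", 2: "Rail", 3: "Bus", 4: "Ferry",
                    5: "Cable tram", 6: "Aerial lift", 7: "Funicular",
                    11: "Trolleybus", 12: "Monorail"}


def stops_by_mode(routes, trips, stop_times):
    """Return {mode label: set of stop_ids served by at least one trip of that mode}."""
    route_mode = {r["route_id"]: ROUTE_TYPE_NAMES.get(int(r["route_type"]), "Other")
                  for r in routes}
    trip_mode = {t["trip_id"]: route_mode[t["route_id"]]
                 for t in trips if t["route_id"] in route_mode}
    layers = defaultdict(set)
    for st in stop_times:
        mode = trip_mode.get(st["trip_id"])
        if mode is not None:
            layers[mode].add(st["stop_id"])
    return dict(layers)
```

A stop served by more than one mode (for example, a bus and tram interchange) appears in each corresponding layer.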
Depending on the "Use shapes.txt" option:
- If Yes: Route geometries are derived from the shapes.txt file. These geometries are fixed (hard-coded) and can be difficult to edit, but they preserve the fidelity of the source data.
- If No: Route geometries are either auto-generated from street networks or drawn as straight lines between stops. Both are easy to edit.
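The difference between the two geometry sources can be sketched in Python as follows. One function orders shapes.txt points by `shape_pt_sequence`; the other simply joins stop coordinates in calling order. Both function names are hypothetical, and street-network routing is not shown.

```python
def shape_geometry(shape_rows, shape_id):
    """Polyline for one shape from shapes.txt rows, ordered by shape_pt_sequence."""
    points = sorted((r for r in shape_rows if r["shape_id"] == shape_id),
                    key=lambda r: int(r["shape_pt_sequence"]))
    return [(float(r["shape_pt_lat"]), float(r["shape_pt_lon"])) for r in points]


def straight_line_geometry(stop_ids, stops):
    """Polyline joining stop coordinates in calling order (one straight segment per hop)."""
    coords = {s["stop_id"]: (float(s["stop_lat"]), float(s["stop_lon"])) for s in stops}
    return [coords[stop_id] for stop_id in stop_ids]
```

The shape-based polyline can contain many intermediate points that follow the actual alignment, which explains why it preserves fidelity but is harder to edit. The straight-line version has exactly one vertex per stop.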
Service Patterns are created based on unique stop sequences and shape data.
- Note that GTFS can reuse the same shape across multiple services that do not all call at the same sequence of stops. In Podaris, a separate pattern is created for each such variant.
- If the "Use shape_id for pattern name" option is Yes, the shape_id is applied to each pattern as its name, which can be useful for analysis.
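As a rough illustration of pattern generation, this minimal Python sketch groups trips by route, shape, and ordered stop sequence. Two trips that share a shape but call at different stops therefore end up in different patterns. The key choice, the default `R1-P1` naming scheme, and the `use_shape_id_for_name` parameter are assumptions that stand in for the import option, not Podaris internals.

```python
from collections import defaultdict


def build_patterns(trips, stop_times, use_shape_id_for_name=False):
    """Group trips into patterns keyed by route, shape and ordered stop sequence."""
    calls = defaultdict(list)
    for st in stop_times:
        calls[st["trip_id"]].append((int(st["stop_sequence"]), st["stop_id"]))

    patterns = {}
    per_route = defaultdict(int)  # running pattern count per route, for default names
    for trip in trips:
        stops = tuple(stop_id for _, stop_id in sorted(calls[trip["trip_id"]]))
        shape_id = trip.get("shape_id", "")
        key = (trip["route_id"], shape_id, stops)
        if key not in patterns:
            per_route[trip["route_id"]] += 1
            if use_shape_id_for_name and shape_id:
                name = shape_id  # several patterns can share one shape_id
            else:
                name = f"{trip['route_id']}-P{per_route[trip['route_id']]}"
            patterns[key] = {"name": name, "route_id": trip["route_id"],
                             "stops": stops, "trip_ids": []}
        patterns[key]["trip_ids"].append(trip["trip_id"])
    return list(patterns.values())
```

In the test data, a short-running variant that skips stop B shares shape S1 with the full trips but still becomes its own pattern.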

Trip Processing

The handling of trips depends on the "Use trips.txt" and "Create individual trips" options:
- If "Use trips.txt" is Yes:
  - Detailed trip information is preserved.
  - If "Create individual trips" is Yes: Each GTFS trip becomes a separate Podaris trip.
  - If "Create individual trips" is No: Trips with identical patterns and schedules are aggregated.
- If "Use trips.txt" is No:
  - Trips are aggregated into frequency-based services where appropriate.
  - This results in a simplified trip structure that maintains overall service levels but may not exactly match the input data.
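One plausible reading of "identical patterns and schedules" is trips that share a pattern and have the same running-time profile between stops. The minimal Python sketch below groups trips on that basis and keeps only each group's start times. That interpretation, the input shape, and the function names are all assumptions. GTFS times may exceed 24:00:00 for after-midnight service, so times are converted to seconds rather than parsed as clock times.

```python
from collections import defaultdict


def gtfs_seconds(hms):
    """Seconds after midnight for a GTFS time such as "25:10:00"."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def aggregate_trips(trip_timings):
    """trip_timings: {trip_id: (pattern_name, [departure "HH:MM:SS" at each stop])}.
    Returns {(pattern_name, offsets_from_first_stop): sorted start times in seconds}."""
    groups = defaultdict(list)
    for pattern, times in trip_timings.values():
        secs = [gtfs_seconds(t) for t in times]
        offsets = tuple(t - secs[0] for t in secs)  # running-time profile of the trip
        groups[(pattern, offsets)].append(secs[0])
    return {key: sorted(starts) for key, starts in groups.items()}
```

Each resulting group can be stored as a single timing profile plus a list of departures. This representation is compact, and converting it to a frequency-based service only requires summarizing the gaps between those departures.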

Calendar Processing

If the "Identical Calendars" option is set to "Merge/De-duplicate", the pipeline combines identical calendars (service definitions that run on exactly the same days) into one, reducing redundancy.
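A minimal Python sketch of calendar de-duplication follows. Two calendar.txt entries count as identical here when their weekday flags, start and end dates, and calendar_dates.txt exceptions all match. Each duplicate `service_id` is then remapped to the first equivalent one. This equality rule and the function name are assumptions. Services defined only in calendar_dates.txt are not handled.

```python
from collections import defaultdict

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def dedupe_calendars(calendar, calendar_dates=()):
    """Return {service_id: canonical service_id} for calendar.txt rows."""
    exceptions = defaultdict(set)
    for cd in calendar_dates:
        exceptions[cd["service_id"]].add((cd["date"], cd["exception_type"]))

    canonical = {}  # calendar signature -> first service_id seen with it
    remap = {}
    for row in calendar:
        signature = (tuple(row[d] for d in WEEKDAYS), row["start_date"], row["end_date"],
                     frozenset(exceptions[row["service_id"]]))
        canonical.setdefault(signature, row["service_id"])
        remap[row["service_id"]] = canonical[signature]
    return remap
```

Trips can then point to the canonical `service_id`, and the calendars that are no longer referenced can be dropped.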

3. Data Optimization

Based on the selected preset (Strategic Planning, Network Analysis, or Detailed Editing), the pipeline applies various optimizations:
- Simplifying route patterns for easier editing (Strategic Planning)
- Dropping detailed route geometry for faster analysis (Network Analysis)
- Preserving full data fidelity (Detailed Editing)

4. Additional Processing

The pipeline handles frequency-based services, calculating headways and service spans.
Depending on the "Pattern Timings" option, it includes varying levels of stop timing information.
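As an example of the headway and span calculation, this minimal Python sketch takes the departure times of one pattern at its first stop. It reports the first and last departures and uses the median gap between consecutive departures as the headway. The choice of the median and the function names are assumptions, and the statistic Podaris actually uses may differ. Times past 24:00:00 are kept as-is, following the GTFS convention.

```python
import statistics


def to_seconds(hms):
    """Seconds after midnight for a GTFS time; hours may exceed 23."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(secs):
    """Format seconds after midnight as HH:MM:SS without wrapping past 24:00:00."""
    return f"{secs // 3600:02d}:{secs % 3600 // 60:02d}:{secs % 60:02d}"


def headway_and_span(departures):
    """Service span (first/last departure) and median headway in minutes."""
    secs = sorted(to_seconds(d) for d in departures)
    if len(secs) < 2:
        raise ValueError("need at least two departures to compute a headway")
    gaps = [later - earlier for earlier, later in zip(secs, secs[1:])]
    return {"first": to_hms(secs[0]), "last": to_hms(secs[-1]),
            "headway_min": statistics.median(gaps) / 60}
```

A median is less sensitive than a mean to a single irregular gap, such as a short-turn or a late-evening thinning of service. Feeds that already use frequencies.txt provide `headway_secs` directly.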

5. Output Generation

The pipeline generates a Podaris data structure containing:

- Services (with associated Patterns and trips)
- Calendars
- Agencies
- Vehicles
- Stations