records_mover.records package
Submodules
records_mover.records.base_records_format module
- class records_mover.records.base_records_format.BaseRecordsFormat
Bases:
object
This represents the information needed to be able to parse a set of records data files.
See the records format specification for more detail.
To create an instance, see ParquetRecordsFormat or DelimitedRecordsFormat.
Module contents
- class records_mover.records.DelimitedRecordsFormat(variant='bluelabs', hints={}, processing_instructions=<records_mover.records.processing_instructions.ProcessingInstructions object>)
Bases:
BaseRecordsFormat
Describes records data files in delimited (CSV) format.
- Parameters
variant (str) –
hints (Mapping[str, object]) –
processing_instructions (ProcessingInstructions) –
- __init__(variant='bluelabs', hints={}, processing_instructions=<records_mover.records.processing_instructions.ProcessingInstructions object>)
See the records format documentation for full details on parameters.
- Parameters
variant (str) – For a given type (especially delimited), describe the subtype of the format. For ‘delimited’, valid values include ‘dumb’, ‘csv’, ‘bluelabs’, ‘bigquery’ and ‘vertica’.
hints (PartialRecordsHints) – Dictionary of names of delimited hints mapping to their values. See the records format specification for hints and valid values.
processing_instructions (records_mover.records.ProcessingInstructions) – Directives on how to handle different situations when processing files.
- Return type
None
- class records_mover.records.ParquetRecordsFormat
Bases:
BaseRecordsFormat
Describes records files in Parquet format
- __init__()
Create a new instance of ParquetRecordsFormat
- Return type
None
- class records_mover.records.ProcessingInstructions(fail_if_dont_understand=True, fail_if_cant_handle_hint=True, fail_if_row_invalid=True, max_inference_rows=1000000, max_failure_rows=None)
Bases:
object
- Parameters
fail_if_dont_understand (bool) –
fail_if_cant_handle_hint (bool) –
fail_if_row_invalid (bool) –
max_inference_rows (Optional[int]) –
max_failure_rows (Optional[int]) –
- __init__(fail_if_dont_understand=True, fail_if_cant_handle_hint=True, fail_if_row_invalid=True, max_inference_rows=1000000, max_failure_rows=None)
Directives on how to handle different situations when processing records. Note that not all vendor mechanisms support this level of configurability; when choosing between optimizing for fast transfer and ability to comply, Records Mover will favor fast transfer.
- Parameters
fail_if_dont_understand (bool) – If True, and a part of the RecordsFormat is not understood while processing, then immediately fail and raise an exception. Otherwise, ignore the misunderstood instruction (e.g., ignore the hint, assume the default variant, etc.).
fail_if_cant_handle_hint (bool) – If True, and for whatever reason (e.g., limited options in whatever library/tool/database is being used) a certain hint can’t be handled as specified, raise an exception. Otherwise, ignore the hint and fall back to implementation-specific behavior.
fail_if_row_invalid (bool) – If True, and a particular row of data in the records file cannot be understood by the library, raise an exception. Otherwise, ignore the row and continue and try to load other rows.
max_failure_rows (Optional[int]) – Sets the tolerance for the number of rows in the records file that cannot be understood by the library and should be ignored. Once this level is reached, raise an exception.
max_inference_rows (Optional[int]) – If the schema is not provided and we need it (e.g., we’re to load the records into a database and there’s no existing table), we’ll figure it out through ‘type inference’ - looking at a bunch of examples of data and building a specific schema that can load those rows. This can take some time, so this parameter controls the maximum number of rows we’ll look at. Higher values will be more likely to result in a schema that can be loaded into, but will take longer to load. If set to None, the entire file will be processed.
- Return type
None
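The row-level directives above can be pictured with a small stand-in loader. This is only a sketch of the described semantics, not Records Mover's implementation: load_rows, parse_int_row and RowInvalidError are hypothetical names, and it assumes that the exception is raised once the number of skipped rows exceeds max_failure_rows, with None meaning no limit.

```python
from typing import Callable, Iterable, List, Optional


class RowInvalidError(Exception):
    """Hypothetical error type; not part of records_mover."""


def parse_int_row(raw: str) -> List[int]:
    # Hypothetical parser: a row is a comma-separated list of integers.
    return [int(field) for field in raw.split(",")]


def load_rows(
    raw_rows: Iterable[str],
    parse_row: Callable[[str], List[int]] = parse_int_row,
    fail_if_row_invalid: bool = True,
    max_failure_rows: Optional[int] = None,
) -> List[List[int]]:
    loaded: List[List[int]] = []
    failures = 0
    for raw in raw_rows:
        try:
            loaded.append(parse_row(raw))
        except ValueError as exc:
            if fail_if_row_invalid:
                # Strict mode: the first bad row aborts the load.
                raise RowInvalidError(f"cannot parse row {raw!r}") from exc
            failures += 1
            # Assumption: the tolerance is exceeded on failure number
            # max_failure_rows + 1; None means there is no limit.
            if max_failure_rows is not None and failures > max_failure_rows:
                raise RowInvalidError(
                    f"{failures} invalid rows exceeds tolerance of {max_failure_rows}"
                ) from exc
    return loaded
```

With fail_if_row_invalid=False and no max_failure_rows, this sketch silently drops every bad row, which is why a tolerance is worth setting when data quality matters.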
- records_mover.records.DelimitedVariant
Valid string values for the variant of a delimited records format. Variants specify a default set of parsing hints for how the delimited file is formatted. See the records format specification for semantics of each.
alias of Literal[dumb, csv, bigquery, bluelabs, vertica]
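Because the alias is a Literal, the set of valid variant names can be recovered at runtime. The sketch below uses a local stand-in for the alias rather than importing it from records_mover, and check_variant is a hypothetical helper, not a library function.

```python
from typing import Literal, get_args

# Stand-in mirroring the alias documented above; the real alias lives in
# records_mover.records.
DelimitedVariant = Literal["dumb", "csv", "bigquery", "bluelabs", "vertica"]


def check_variant(name: str) -> str:
    """Hypothetical helper: return name if it is a documented variant."""
    valid = get_args(DelimitedVariant)
    if name not in valid:
        raise ValueError(f"unknown variant {name!r}; expected one of {valid}")
    return name
```

A static type checker can reject a misspelled variant before the code runs; the runtime check covers values that arrive from configuration or user input.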
- class records_mover.records.AvroRecordsFormat
Bases:
BaseRecordsFormat
Describes records files in Avro format
- format_type: typing_extensions.Literal[avro, delimited, parquet]
- generate_filename(basename)
- Parameters
basename (str) –
- Return type
str
- class records_mover.records.ExistingTableHandling(value)
Bases:
Enum
Specifies behavior when an existing table with the same name is found when loading data into a database.
- DELETE_AND_OVERWRITE = 1
Delete data transactionally (typically with a SQL DELETE statement) and then add new data to the existing table. The delete and the load will be done in a single transaction if the database allows for that, but note that some do not, especially while using the most efficient load method.
- TRUNCATE_AND_OVERWRITE = 2
Remove data from the existing table without regard for transactions, data integrity constraints, triggers or redo logs, typically using a SQL TRUNCATE statement. The specific method depends on the database type. This is typically the fastest way to clear a table, but please read your database documentation first and understand the consequences.
- DROP_AND_RECREATE = 3
Remove the target table entirely, typically with a SQL DROP TABLE command.
- APPEND = 4
Leave the target table and current data in place, and add data to the table. Note that Records Mover uses the most efficient method of loading data into the table, which may not honor triggers and integrity constraints.
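The four members can be summarized by the SQL statement each one typically implies before the load. This is a sketch under stated assumptions: the enum is a local stand-in with the member names and values listed above, typical_clearing_sql is a hypothetical helper, and the statements a given database driver actually issues may differ.

```python
from enum import Enum
from typing import Optional


class ExistingTableHandling(Enum):
    # Stand-in with the member names and values listed above.
    DELETE_AND_OVERWRITE = 1
    TRUNCATE_AND_OVERWRITE = 2
    DROP_AND_RECREATE = 3
    APPEND = 4


def typical_clearing_sql(mode: ExistingTableHandling, table: str) -> Optional[str]:
    """Hypothetical helper: SQL typically run before loading, or None.

    The table name is interpolated directly, so this is for illustration
    only and must not be used with untrusted input.
    """
    statements = {
        ExistingTableHandling.DELETE_AND_OVERWRITE: f"DELETE FROM {table}",
        ExistingTableHandling.TRUNCATE_AND_OVERWRITE: f"TRUNCATE TABLE {table}",
        ExistingTableHandling.DROP_AND_RECREATE: f"DROP TABLE {table}",
        # APPEND leaves the table and its current data in place.
        ExistingTableHandling.APPEND: None,
    }
    return statements[mode]
```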
- exception records_mover.records.RecordsFolderNonEmptyException
Bases:
RecordsException
Raised if trying to write records to a non-empty target directory.
- exception records_mover.records.RecordsException
Bases:
Exception
Base class for all records system exceptions.
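Since RecordsFolderNonEmptyException derives from RecordsException, a handler for the base class also catches the more specific error. The sketch below uses local stand-in classes instead of importing from records_mover; ensure_empty and try_write are hypothetical helpers, and the library's own emptiness check may work differently.

```python
import os
import tempfile


class RecordsException(Exception):
    """Stand-in for records_mover.records.RecordsException."""


class RecordsFolderNonEmptyException(RecordsException):
    """Stand-in for records_mover.records.RecordsFolderNonEmptyException."""


def ensure_empty(directory: str) -> None:
    # Hypothetical check; the library's own check may differ.
    if os.listdir(directory):
        raise RecordsFolderNonEmptyException(f"{directory} is not empty")


def try_write(directory: str) -> str:
    try:
        ensure_empty(directory)
    except RecordsException as exc:
        # Catching the base class also catches the subclass.
        return f"refused: {type(exc).__name__}"
    return "ok"


# Usage: a freshly created temporary directory is empty, so the check passes.
with tempfile.TemporaryDirectory() as empty_dir:
    print(try_write(empty_dir))
```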
- class records_mover.records.Records(db_driver=PleaseInfer.token, url_resolver=PleaseInfer.token, session=PleaseInfer.token)
Bases:
object
To move records from one place to another, you can use the methods on this object.
This object should be pulled from the ‘records’ property on a records_mover.Session object instead of being constructed directly.
To move data, you can call the records_mover.records.move() method, which is aliased for your convenience on this object.
Example:
    records = session.records
    db_engine = session.get_default_db_engine()
    url = 's3://some-bucket/some-directory/'
    source = records.sources.directory_from_url(url=url)
    target = records.targets.table(schema_name='myschema',
                                   table_name='mytable',
                                   db_engine=db_engine)
    results = records.move(source, target)
- Parameters
db_driver (Union[Callable[[Optional[Union[Engine, Connection]], Optional[Connection], Optional[Engine]], DBDriver], PleaseInfer]) –
url_resolver (Union[UrlResolver, PleaseInfer]) –
session (Union[Session, PleaseInfer]) –
- move: Callable
Alias of records_mover.records.move()
- sources: RecordsSources
Object containing factory methods to create various sources from which to copy records, of type records_mover.records.sources.RecordsSources
- targets: RecordsTargets
Object containing factory methods to create various targets to which records can be copied, of type records_mover.records.targets.RecordsTargets
- records_mover.records.move(records_source, records_target, processing_instructions=<records_mover.records.processing_instructions.ProcessingInstructions object>)
Copy records from one location to another. Applies a sequence of possible techniques to do this in an efficient way and respects the preferences set in records_source, records_target and processing_instructions.
Example use:
    records = session.records
    db_engine = session.get_default_db_engine()
    url = 's3://some-bucket/some-directory/'
    source = records.sources.directory_from_url(url=url)
    target = records.targets.table(schema_name='myschema',
                                   table_name='mytable',
                                   db_engine=db_engine)
    results = records.move(source, target)
- Parameters
records_source (RecordsSource) – object returned by a factory method in records_mover.records.sources.RecordsSources which represents the place we’re copying records from.
records_target (RecordsTarget) – object returned by a factory method in records_mover.records.targets.RecordsTargets which represents the place we’re copying records to.
processing_instructions (records_mover.records.ProcessingInstructions) – Directives on how to handle different situations when processing files.
- Return type
MoveResult
- class records_mover.records.MoveResult(move_count, output_urls)
Bases:
NamedTuple
Represents the result of a move() operation between a records source and a records target.
Note that move_count and output_urls may be empty depending on the nature of the move - e.g., a move to a database doesn’t currently map to a formal URL, and whole-file based moves do not currently count the number of records being moved.
- Parameters
move_count (Optional[int]) –
output_urls (Optional[Mapping[str, str]]) –
- move_count: Optional[int]
Number of rows moved (Optional[int])
- output_urls: Optional[Mapping[str, str]]
A dictionary of short string aliases mapping to URLs of the resulting data (Optional[Mapping[str, str]])
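Because both fields may be None, code that consumes a move result should handle their absence explicitly. The sketch below defines a local NamedTuple stand-in with the two documented fields instead of importing MoveResult from records_mover; summarize is a hypothetical helper.

```python
from typing import Mapping, NamedTuple, Optional


class MoveResult(NamedTuple):
    # Stand-in with the two fields documented above.
    move_count: Optional[int]
    output_urls: Optional[Mapping[str, str]]


def summarize(result: MoveResult) -> str:
    """Hypothetical helper that tolerates either field being None."""
    if result.move_count is None:
        count = "an unknown number of"
    else:
        count = str(result.move_count)
    urls = result.output_urls or {}
    where = ", ".join(f"{alias}={url}" for alias, url in sorted(urls.items()))
    return f"moved {count} rows; {where or 'no URLs reported'}"
```

Testing move_count against None rather than for truthiness keeps a legitimate count of 0 distinct from a missing count.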