records_mover package¶
Module contents¶
-
class
records_mover.
Session
(default_db_creds_name=None, default_aws_creds_name=<PleaseInfer.token: 1>, default_gcp_creds_name=<PleaseInfer.token: 1>, session_type=<PleaseInfer.token: 1>, scratch_s3_url=<PleaseInfer.token: 1>, creds=<PleaseInfer.token: 1>, default_db_facts=<PleaseInfer.token: 1>, default_boto3_session=<PleaseInfer.token: 1>, default_gcp_creds=<PleaseInfer.token: 1>, default_gcs_client=<PleaseInfer.token: 1>)¶ Bases:
object
-
__init__
(default_db_creds_name=None, default_aws_creds_name=<PleaseInfer.token: 1>, default_gcp_creds_name=<PleaseInfer.token: 1>, session_type=<PleaseInfer.token: 1>, scratch_s3_url=<PleaseInfer.token: 1>, creds=<PleaseInfer.token: 1>, default_db_facts=<PleaseInfer.token: 1>, default_boto3_session=<PleaseInfer.token: 1>, default_gcp_creds=<PleaseInfer.token: 1>, default_gcs_client=<PleaseInfer.token: 1>)¶ This is an object which ties together configuration on how to do key things in order to move records.
It tries to autoconfigure as much as possible - in many cases you won’t need to specify any constructor arguments at all.
Generally unless otherwise configured, this class will look up and use the default credentials for things like AWS and GCP if they exist and are needed for an operation. When running in a managed environment like Apache Airflow (session_type = “airflow”), that might mean looking up an Airflow Connection via the Airflow Python API. On the command line (session_type = “cli”), that might mean using e.g., the AWS or GCP Python APIs to pull any default credentials which have been configured. In other environments (e.g., containerized systems) you may way want to use environment variables whenever possible to specify exactly what is desired (sesssion_type = ‘env’).
- Parameters
default_db_creds_name (Optional[str]) – Name of the database credential to used when
records_mover.Session.get_default_db_engine()
is called. If not specified, the default will depend on the session type.default_aws_creds_name (Union[None, str, records_mover.mover_types.PleaseInfer]) – Name of the AWS IAM credential to used when needed, e.g. when reading or writing to an s3:// URL. This will be inferred unless directly specified.
default_gcp_creds_name (Union[None, str, records_mover.mover_types.PleaseInfer]) – Name of the GCP Cloud IAM credential to used when needed, e.g. when reading or writing to an gs:// URL. This will be inferred unless directly specified.
session_type (Union[str, records_mover.mover_types.PleaseInfer]) – What assumptions to use when inferring and/or looking up credentials. Valid values of “airflow” (for code running in Apache Airflow), “cli” (for running on the command-line”, “lpass” (for using the LastPass password manager for credentials), and ‘env’ (for looking up credentials via environment variables). This will be inferred unless directly specified.
scratch_s3_url (Union[None, str, records_mover.mover_types.PleaseInfer]) – An s3:// URL used as a base directory where temporary files/directories can be created. This is necessary for Amazon Redshift, which supports only S3 for bulk import/export.
default_db_facts (Union[records_mover.mover_types.PleaseInfer, Dict[str, Any]]) – Information about the database connection that should be made. This is a dictionary with string keys of type DBFacts
default_boto3_session (Optional[Union[records_mover.mover_types.PleaseInfer, boto3.session.Session]]) – The boto3.Session object used when needed, e.g. when reading or writing to an s3:// URL. This will be inferred unless directly specified.
default_gcp_creds (Optional[Union[records_mover.mover_types.PleaseInfer, google.auth.credentials.Credentials]]) – The google.auth.credentials.Credentials object to used when needed, e.g. when reading or writing to an gs:// URL. This will be inferred unless directly specified.
default_gcs_client (Optional[Union[records_mover.mover_types.PleaseInfer, google.cloud.storage.Client]]) – The google.cloud.storage.Client object to be used when needed, e.g. when reading or writing to an gs:// URL. This will be inferred unless directly specified.
creds (Union[records_mover.creds.base_creds.BaseCreds, records_mover.mover_types.PleaseInfer]) – Experimental interface; do not use.
- Return type
None
-
get_default_db_engine
()¶ Provide the database object corresponding to the default database credentials. The details of how that credential is looked up depends on the session_type determined in the constructor, but can be overridden using the default_db_creds_name parameter.
- Returns
SQLALchemy Engine object
- Return type
Engine
-
get_db_engine
(db_creds_name, creds_provider=None)¶ Provide a database object corresponding to a given credential name. The details of how that credential is looked up depends on the session_type determined in the constructor.
- Parameters
db_creds_name (str) – Credential name to look up using the configured credentials provider.
creds_provider (Optional[records_mover.creds.base_creds.BaseCreds]) –
- Returns
SQLALchemy Engine object
- Return type
Engine
-
set_stream_logging
(name='records_mover', level=20, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, fmt='%(asctime)s - %(message)s', datefmt='%H:%M:%S')¶ records-mover logs details about its operations using Python logging. This method is a simple way to configure that logging to be output to a stream (by default, stdout).
You can use it for other things (e.g., dependencies of records-mover) by adjusting the ‘name’ argument.
- Parameters
name (str) – Name of the package to set logging under. If set to ‘foo’, you can set a log variable FOO_LOG_LEVEL to the log level threshold you’d like to set (INFO/WARNING/etc) - so you can by default set, say, export RECORDS_MOVER_LOG_LEVEL=WARNING to quiet down loging, or export RECORDS_MOVER_LOG_LEVEL=DEBUG to increase it.
level (int) – Logging more detailed than this will not be output to the stream.
stream (IO[str]) – Stream which logging should be sent (e.g., sys.stdout, sys.stdin, or perhaps a file you open)
fmt (str) – Logging format to send to Python’slogging.Formatter() - determines what details will be sent.
datefmt (str) – Date format to send to Python’slogging.Formatter() - determines how the current date/time will be recorded in the log.
- Return type
None
-
property
records
¶ Property containing a
records_mover.Records
object pre-configured with configuration using this Session. Once you have a Session object constructed, this is your jumping off point to moving records.
-
-
class
records_mover.
Records
(db_driver=<PleaseInfer.token: 1>, url_resolver=<PleaseInfer.token: 1>, session=<PleaseInfer.token: 1>)¶ Bases:
object
To move records from one place to another, you can use the methods on this object.
This object should be pulled from the ‘records’ property on a
records_mover.Session
object instead of being constructed directly.To move data, you can call the
records_mover.records.move()
method, which is aliased for your convenience on this object.Example:
records = session.records db_engine = session.get_default_db_engine() url = 's3://some-bucket/some-directory/' source = records.sources.directory_from_url(url=url) target = records.targets.table(schema_name='myschema', table_name='mytable', db_engine=db_engine) results = records.move(source, target)
-
move
: Callable¶ Alias of
records_mover.records.move()
-
sources
: RecordsSources¶ Object containing factory methods to create various sources from which to copy records, of type
records_mover.records.sources.RecordsSources
-
targets
: RecordsTargets¶ Object containing factory methods to create various targets to which records can be copied, of type
records_mover.records.targets.RecordsTargets
-
-
records_mover.
set_stream_logging
(name='records_mover', level=20, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, fmt='%(asctime)s - %(message)s', datefmt='%H:%M:%S')¶ records-mover logs details about its operations using Python logging. This method is a simple way to configure that logging to be output to a stream (by default, stdout).
You can use it for other things (e.g., dependencies of records-mover) by adjusting the ‘name’ argument.
- Parameters
name (str) – Name of the package to set logging under. If set to ‘foo’, you can set a log variable FOO_LOG_LEVEL to the log level threshold you’d like to set (INFO/WARNING/etc) - so you can by default set, say, export RECORDS_MOVER_LOG_LEVEL=WARNING to quiet down loging, or export RECORDS_MOVER_LOG_LEVEL=DEBUG to increase it.
level (int) – Logging more detailed than this will not be output to the stream.
stream (IO[str]) – Stream which logging should be sent (e.g., sys.stdout, sys.stdin, or perhaps a file you open)
fmt (str) – Logging format to send to Python’slogging.Formatter() - determines what details will be sent.
datefmt (str) – Date format to send to Python’slogging.Formatter() - determines how the current date/time will be recorded in the log.
- Return type
None
-
records_mover.
move
(records_source, records_target, processing_instructions=<records_mover.records.processing_instructions.ProcessingInstructions object>)¶ Copy records from one location to another. Applies a sequence of possible techniques to do this in an efficient way and respects the preferences set in records_source, records_target and processing_instructions.
Example use:
records = session.records db_engine = session.get_default_db_engine() url = 's3://some-bucket/some-directory/' source = records.sources.directory_from_url(url=url) target = records.targets.table(schema_name='myschema', table_name='mytable', db_engine=db_engine) results = records.move(source, target)
- Parameters
records_source (records_mover.records.sources.base.RecordsSource) – object returned by a factory method in
records_mover.records.sources.RecordsSources
which represents the place we’re copying records from.records_target (records_mover.records.targets.base.RecordsTarget) – object returned by a factory method in
records_mover.records.targets.RecordsTargets
which represents the place we’re copying records to.processing_instructions (records_mover.records.ProcessingInstructions) – Directives on how to handle different situations when processing files.
- Return type