records_mover package

Module contents

class records_mover.Session(default_db_creds_name=None, default_aws_creds_name=<PleaseInfer.token: 1>, default_gcp_creds_name=<PleaseInfer.token: 1>, session_type=<PleaseInfer.token: 1>, scratch_s3_url=<PleaseInfer.token: 1>, creds=<PleaseInfer.token: 1>, default_db_facts=<PleaseInfer.token: 1>, default_boto3_session=<PleaseInfer.token: 1>, default_gcp_creds=<PleaseInfer.token: 1>, default_gcs_client=<PleaseInfer.token: 1>, scratch_gcs_url=<PleaseInfer.token: 1>)

Bases: object

__init__(default_db_creds_name=None, default_aws_creds_name=<PleaseInfer.token: 1>, default_gcp_creds_name=<PleaseInfer.token: 1>, session_type=<PleaseInfer.token: 1>, scratch_s3_url=<PleaseInfer.token: 1>, creds=<PleaseInfer.token: 1>, default_db_facts=<PleaseInfer.token: 1>, default_boto3_session=<PleaseInfer.token: 1>, default_gcp_creds=<PleaseInfer.token: 1>, default_gcs_client=<PleaseInfer.token: 1>, scratch_gcs_url=<PleaseInfer.token: 1>)

This is an object which ties together configuration on how to do key things in order to move records.

It tries to autoconfigure as much as possible - in many cases you won’t need to specify any constructor arguments at all.

Generally, unless otherwise configured, this class will look up and use the default credentials for things like AWS and GCP if they exist and are needed for an operation. When running in a managed environment like Apache Airflow (session_type = “airflow”), that might mean looking up an Airflow Connection via the Airflow Python API. On the command line (session_type = “cli”), that might mean using, e.g., the AWS or GCP Python APIs to pull any default credentials which have been configured. In other environments (e.g., containerized systems) you may want to use environment variables whenever possible to specify exactly what is desired (session_type = “env”).

Parameters
  • default_db_creds_name (Optional[str]) – Name of the database credential to use when records_mover.Session.get_default_db_engine() is called. If not specified, the default will depend on the session type.

  • default_aws_creds_name (Union[None, str, records_mover.mover_types.PleaseInfer]) – Name of the AWS IAM credential to use when needed, e.g. when reading or writing to an s3:// URL. This will be inferred unless directly specified.

  • default_gcp_creds_name (Union[None, str, records_mover.mover_types.PleaseInfer]) – Name of the GCP Cloud IAM credential to use when needed, e.g. when reading or writing to a gs:// URL. This will be inferred unless directly specified.

  • session_type (Union[str, records_mover.mover_types.PleaseInfer]) – What assumptions to use when inferring and/or looking up credentials. Valid values are “airflow” (for code running in Apache Airflow), “cli” (for running on the command line), “lpass” (for using the LastPass password manager for credentials), and “env” (for looking up credentials via environment variables). This will be inferred unless directly specified.

  • scratch_s3_url (Union[None, str, records_mover.mover_types.PleaseInfer]) – An s3:// URL used as a base directory where temporary files/directories can be created. This is necessary for Amazon Redshift, which supports only S3 for bulk import/export.

  • default_db_facts (Union[records_mover.mover_types.PleaseInfer, Dict[str, Any]]) – Information about the database connection that should be made, expressed as a dictionary with string keys of type DBFacts.

  • default_boto3_session (Optional[Union[records_mover.mover_types.PleaseInfer, boto3.session.Session]]) – The boto3.Session object used when needed, e.g. when reading or writing to an s3:// URL. This will be inferred unless directly specified.

  • default_gcp_creds (Optional[Union[records_mover.mover_types.PleaseInfer, google.auth.credentials.Credentials]]) – The google.auth.credentials.Credentials object to use when needed, e.g. when reading or writing to a gs:// URL. This will be inferred unless directly specified.

  • default_gcs_client (Optional[Union[records_mover.mover_types.PleaseInfer, google.cloud.storage.Client]]) – The google.cloud.storage.Client object to be used when needed, e.g. when reading or writing to a gs:// URL. This will be inferred unless directly specified.

  • creds (Union[records_mover.creds.base_creds.BaseCreds, records_mover.mover_types.PleaseInfer]) – Experimental interface; do not use.

  • scratch_gcs_url (Union[None, str, records_mover.mover_types.PleaseInfer]) – A gs:// URL used as a base directory where temporary files/directories can be created. This can be helpful for large imports into Google BigQuery.

Return type

None
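As an illustration of the default_db_facts shape, here is a hypothetical dictionary – the key names follow common db-facts conventions and should be treated as an example, not an authoritative schema:

```python
# Illustrative shape of a default_db_facts dictionary. The key names
# here are an assumption based on common db-facts conventions; consult
# your credential source for the exact fields it provides.
default_db_facts = {
    'host': 'db.example.com',  # hostname of the database server
    'port': 5439,              # e.g., Redshift's default port
    'database': 'analytics',   # database name to connect to
    'user': 'loader',          # username
    'password': 'secret',      # password (prefer a secrets manager)
    'type': 'redshift',        # database type
}
```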

get_default_db_engine()

Provide the database object corresponding to the default database credentials. The details of how that credential is looked up depend on the session_type determined in the constructor, but can be overridden using the default_db_creds_name parameter.

Returns

SQLAlchemy Engine object

Return type

Engine
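The returned object is a standard SQLAlchemy Engine. This sketch shows the same usage pattern against an in-memory SQLite engine, so it runs without any configured credentials; with records-mover you would substitute the engine from get_default_db_engine():

```python
import sqlalchemy

# Stand-in for the Engine returned by get_default_db_engine();
# an in-memory SQLite engine behaves the same way for this purpose.
engine = sqlalchemy.create_engine('sqlite://')

# Run a trivial query to demonstrate the Engine API.
with engine.connect() as conn:
    value = conn.execute(sqlalchemy.text('SELECT 1 + 1')).scalar()

print(value)  # → 2
```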

get_db_engine(db_creds_name, creds_provider=None)

Provide a database object corresponding to a given credential name. The details of how that credential is looked up depend on the session_type determined in the constructor.

Parameters
  • db_creds_name (str) – Credential name to look up using the configured credentials provider.

  • creds_provider (Optional[records_mover.creds.base_creds.BaseCreds]) – Credentials provider to use when looking up the credential; if not specified, the session’s configured provider is used.

Returns

SQLAlchemy Engine object

Return type

Engine

set_stream_logging(name='records_mover', level=20, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, fmt='%(asctime)s - %(message)s', datefmt='%H:%M:%S')

records-mover logs details about its operations using Python logging. This method is a simple way to configure that logging to be output to a stream (by default, stdout).

You can use it for other things (e.g., dependencies of records-mover) by adjusting the ‘name’ argument.

Parameters
  • name (str) – Name of the package to set logging under. If set to ‘foo’, you can set an environment variable FOO_LOG_LEVEL to the log level threshold you’d like (INFO/WARNING/etc.) – so you can, for example, export RECORDS_MOVER_LOG_LEVEL=WARNING to quiet down logging, or export RECORDS_MOVER_LOG_LEVEL=DEBUG to increase it.

  • level (int) – Logging more detailed than this will not be output to the stream.

  • stream (IO[str]) – Stream to which logging should be sent (e.g., sys.stdout, sys.stderr, or perhaps a file you open).

  • fmt (str) – Logging format to send to Python’s logging.Formatter() – determines what details will be sent.

  • datefmt (str) – Date format to send to Python’s logging.Formatter() – determines how the current date/time will be recorded in the log.

Return type

None
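Under the hood this is roughly a convenience over Python’s standard logging configuration. The following stdlib-only sketch approximates the wiring (an approximation, not records-mover’s exact implementation), using an io.StringIO stream so the output can be inspected:

```python
import io
import logging

# Approximate, stdlib-only equivalent of
# session.set_stream_logging(name='records_mover', level=logging.INFO):
stream = io.StringIO()  # stand-in for sys.stdout so we can inspect output
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(fmt='%(asctime)s - %(message)s',
                                       datefmt='%H:%M:%S'))

logger = logging.getLogger('records_mover')
logger.setLevel(logging.INFO)   # level=20
logger.addHandler(handler)

logger.info('moving records')
logger.debug('too detailed; filtered out by the INFO threshold')

print('moving records' in stream.getvalue())  # → True
```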

property records

Property containing a records_mover.Records object pre-configured using this Session’s configuration. Once you have a Session object constructed, this is your jumping-off point for moving records.

class records_mover.Records(db_driver=<PleaseInfer.token: 1>, url_resolver=<PleaseInfer.token: 1>, session=<PleaseInfer.token: 1>)

Bases: object

To move records from one place to another, you can use the methods on this object.

This object should be pulled from the ‘records’ property on a records_mover.Session object instead of being constructed directly.

To move data, you can call the records_mover.records.move() method, which is aliased for your convenience on this object.

Example:

records = session.records
db_engine = session.get_default_db_engine()
url = 's3://some-bucket/some-directory/'
source = records.sources.directory_from_url(url=url)
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
results = records.move(source, target)
move: Callable

Alias of records_mover.records.move()

sources: records_mover.records.sources.factory.RecordsSources

Object containing factory methods to create various sources from which to copy records, of type records_mover.records.sources.RecordsSources

targets: records_mover.records.targets.factory.RecordsTargets

Object containing factory methods to create various targets to which records can be copied, of type records_mover.records.targets.RecordsTargets

records_mover.set_stream_logging(name='records_mover', level=20, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, fmt='%(asctime)s - %(message)s', datefmt='%H:%M:%S')

records-mover logs details about its operations using Python logging. This method is a simple way to configure that logging to be output to a stream (by default, stdout).

You can use it for other things (e.g., dependencies of records-mover) by adjusting the ‘name’ argument.

Parameters
  • name (str) – Name of the package to set logging under. If set to ‘foo’, you can set an environment variable FOO_LOG_LEVEL to the log level threshold you’d like (INFO/WARNING/etc.) – so you can, for example, export RECORDS_MOVER_LOG_LEVEL=WARNING to quiet down logging, or export RECORDS_MOVER_LOG_LEVEL=DEBUG to increase it.

  • level (int) – Logging more detailed than this will not be output to the stream.

  • stream (IO[str]) – Stream to which logging should be sent (e.g., sys.stdout, sys.stderr, or perhaps a file you open).

  • fmt (str) – Logging format to send to Python’s logging.Formatter() – determines what details will be sent.

  • datefmt (str) – Date format to send to Python’s logging.Formatter() – determines how the current date/time will be recorded in the log.

Return type

None

records_mover.move(records_source, records_target, processing_instructions=<records_mover.records.processing_instructions.ProcessingInstructions object>)

Copy records from one location to another. Applies a sequence of possible techniques to do this in an efficient way and respects the preferences set in records_source, records_target and processing_instructions.

Example use:

records = session.records
db_engine = session.get_default_db_engine()
url = 's3://some-bucket/some-directory/'
source = records.sources.directory_from_url(url=url)
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
results = records.move(source, target)
Parameters
  • records_source – Object to pull records from, typically created via the factory methods on records_mover.Records.sources.

  • records_target – Object to write records to, typically created via the factory methods on records_mover.Records.targets.

  • processing_instructions (ProcessingInstructions) – Preferences on how different aspects of the move should be processed.
Return type

records_mover.records.MoveResult