records_mover.records.targets package

Module contents

class records_mover.records.targets.RecordsTargets(url_resolver, db_driver)

Bases: object

These methods produce objects representing the target of a records move. The objects can be used as the ‘target’ argument to records_mover.records.move().

This object should be pulled from the ‘targets’ property of the ‘records’ property on a records_mover.Session object instead of being constructed directly.

Example use:

records = session.records
db_engine = session.get_default_db_engine()
url = 's3://some-bucket/some-directory/'
source = records.sources.directory_from_url(url=url)
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
results = records.move(source, target)

Parameters

url_resolver (UrlResolver) –
db_driver (Callable[[Optional[Union[Engine, Connection]], Optional[Connection], Optional[Engine]], DBDriver]) –

directory_from_url(output_url, records_format=None)

Represents a Records Directory pointed to by a URL as a target.

Parameters

output_url (str) – Location to write the records directory. Must be a URL format understood by the records_mover.url library, and must be a directory URL that ends with a ‘/’.
records_format (Optional[BaseRecordsFormat]) – Description of the format of the data files to write out. If not specified, an efficient format for bulk moves will be chosen.

Return type

DirectoryFromUrlRecordsTarget
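The trailing-slash requirement on output_url can be checked before the target is built. The following is a minimal sketch, assuming `records` is the `session.records` object from the example above; the helper name `directory_target` is hypothetical and not part of records_mover.

```python
def directory_target(records, output_url):
    """Build a records-directory target after checking the URL shape.

    `records` is assumed to be `session.records`. This helper is
    illustrative only and is not provided by records_mover.
    """
    if not output_url.endswith('/'):
        raise ValueError(f"Directory URL must end with '/': {output_url!r}")
    # records_format is left out, so the library picks the format.
    return records.targets.directory_from_url(output_url=output_url)
```

Failing early here gives a clearer error than discovering the problem partway through a move.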

table(db_engine, schema_name, table_name, existing_table_handling=ExistingTableHandling.DELETE_AND_OVERWRITE, drop_and_recreate_on_load_error=False, add_user_perms_for=None, add_group_perms_for=None, db_conn=None)

Represents a SQLAlchemy-accessible database table as a target.

Parameters

db_engine (Engine) – SQLAlchemy database engine to write data to.
schema_name (str) – Schema name of a table to write data to.
table_name (str) – Table name of a table to write data to.
existing_table_handling (ExistingTableHandling) – When loading into a database table, controls how any existing table found will be handled. This must be a records_mover.records.ExistingTableHandling object.
drop_and_recreate_on_load_error (bool) – If True, a table load error is handled by dropping the target table and attempting to reload the incoming data.
add_user_perms_for (Optional[Dict[str, List[str]]]) – If specified, a table’s permissions will be set for the specified users. Format should be like {‘all’: [‘username1’, ‘username2’], ‘select’: [‘username3’, ‘username4’]}.
add_group_perms_for (Optional[Dict[str, List[str]]]) – If specified, a table’s permissions will be set for the specified groups. Format should be like {‘all’: [‘group1’, ‘group2’], ‘select’: [‘group3’, ‘group4’]}.
db_conn (Optional[Connection]) – SQLAlchemy database connection to write data to. If not specified, one will be created from the db_engine.

Return type

TableRecordsTarget
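The permission dictionaries map a permission type to a list of grantees. The sketch below shows one way to pass them, assuming `records` is `session.records` and `db_engine` is a SQLAlchemy Engine; the helper name and the user and group names are placeholders.

```python
def table_target_with_perms(records, db_engine):
    """Build a table target that requests user and group permissions.

    `records` is assumed to be `session.records`; the schema, table,
    user, and group names below are placeholders.
    """
    # Keys are permission types; values are lists of grantee names.
    user_perms = {'all': ['username1'], 'select': ['username3']}
    group_perms = {'select': ['group3', 'group4']}
    return records.targets.table(db_engine=db_engine,
                                 schema_name='myschema',
                                 table_name='mytable',
                                 add_user_perms_for=user_perms,
                                 add_group_perms_for=group_perms)
```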

google_sheet(spreadsheet_id, sheet_name, google_cloud_creds)

Represents a sheet in a Google Sheets spreadsheet as a target, via the Google Sheets API.

Parameters

spreadsheet_id (str) – This is the xyz in https://docs.google.com/spreadsheets/d/xyz/edit?ts=5be5b383#gid=abc
sheet_name (str) – This is the label of the particular tab within the Google Sheets spreadsheet where the data should go.
google_cloud_creds (google.auth.credentials.Credentials) – Credentials object for Google Cloud Platform access.

Return type

GoogleSheetsRecordsTarget
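When only the spreadsheet's browser URL is at hand, the spreadsheet_id can be pulled out of it. This is a minimal sketch using the standard library; the function name `spreadsheet_id_from_url` is hypothetical, and the pattern assumes the URL layout shown in the parameter description.

```python
import re

# Captures the path segment that follows '/spreadsheets/d/'.
_SPREADSHEET_ID = re.compile(r'/spreadsheets/d/([^/?#]+)')


def spreadsheet_id_from_url(url):
    """Return the ID portion of a Google Sheets URL."""
    match = _SPREADSHEET_ID.search(url)
    if match is None:
        raise ValueError(f'Not a Google Sheets URL: {url!r}')
    return match.group(1)
```

The returned string can then be passed as spreadsheet_id to google_sheet(), together with sheet_name and google_cloud_creds.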

fileobj(output_fileobj, records_format)

Represents a stream of data file bytes as a target.

Parameters

output_fileobj (IO[bytes]) – Stream to which the file should be written.
records_format (BaseRecordsFormat) – Description of the format of the data files to write out.

Return type

FileobjTarget
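Any binary stream can serve as output_fileobj. The sketch below pairs an in-memory buffer with a fileobj target, assuming `records` is `session.records` and `records_format` is a BaseRecordsFormat instance obtained elsewhere; the helper name is hypothetical.

```python
import io


def in_memory_target(records, records_format):
    """Return an in-memory buffer and a target that writes into it.

    `records` is assumed to be `session.records`. The caller keeps the
    buffer so it can inspect the bytes once a move has completed.
    """
    buffer = io.BytesIO()
    target = records.targets.fileobj(output_fileobj=buffer,
                                     records_format=records_format)
    return buffer, target
```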

data_url(output_url, records_format=None)

Represents a URL pointer to a data file as a target.

Parameters

output_url (str) – Location of the data file to write. Must be a URL format understood by the records_mover.url library corresponding to a file, not a directory (i.e., not ending with a ‘/’).
records_format (Optional[BaseRecordsFormat]) – Description of the format of the data files to write out. If not specified, an efficient format for bulk moves will be chosen.

Return type

DataUrlTarget
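Because data_url() expects a file URL and directory_from_url() expects a URL ending in ‘/’, a caller that accepts either kind can dispatch on the trailing slash. This is a sketch under that assumption; `target_for_url` is a hypothetical helper and `records` is assumed to be `session.records`.

```python
def target_for_url(records, output_url):
    """Choose a directory or single-file target based on the URL shape."""
    if output_url.endswith('/'):
        # A trailing '/' marks a records directory.
        return records.targets.directory_from_url(output_url=output_url)
    # Anything else is treated as a single data file.
    return records.targets.data_url(output_url=output_url)
```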

local_file(filename, records_format=None)

Represents a data file on the local filesystem as a target.

Parameters

filename (str) – File path (relative or absolute) of the data file to unload to.
records_format (Optional[BaseRecordsFormat]) – Description of the format of the data files to write out. If not specified, an efficient format for bulk moves will be chosen.

Return type

DataUrlTarget
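Since filename is typed as str, a pathlib.Path needs converting before it is passed in. A minimal sketch, assuming `records` is `session.records`; the helper name is hypothetical.

```python
from pathlib import Path


def local_file_target(records, path):
    """Build a local-file target from a str or pathlib.Path."""
    # Expand a leading '~' and convert to the str the method expects.
    filename = str(Path(path).expanduser())
    return records.targets.local_file(filename=filename)
```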

spectrum(schema_name, table_name, db_engine, spectrum_base_url=None, spectrum_rdir_url=None, existing_table_handling=ExistingTableHandling.TRUNCATE_AND_OVERWRITE)

Represents a location in Amazon Redshift Spectrum as a target.

Parameters

schema_name (str) – Schema name of a table to write data to.
table_name (str) – Table name of a table to write data to.
db_engine (Engine) – SQLAlchemy database engine to write data to.
spectrum_base_url (Optional[str]) – Root S3 URL under which a simple directory structure will be created for files to be stored, if spectrum_rdir_url is not specified. Note that when using the mover CLI, db-facts may be used to provide a default.
spectrum_rdir_url (Optional[str]) – S3 URL where a records directory with files will be stored. If not given, a db-facts default is used if one exists. If this is not specified, spectrum_base_url must be.
existing_table_handling (ExistingTableHandling) – When loading into a database table, controls how any existing table found will be handled. This must be a records_mover.records.ExistingTableHandling object.

Return type

SpectrumRecordsTarget
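The two URL parameters are alternatives: one of them has to supply the S3 location. The sketch below makes that check explicit, assuming no db-facts default is configured and that `records` is `session.records`; the helper name `spectrum_target` is hypothetical.

```python
def spectrum_target(records, db_engine, schema_name, table_name,
                    spectrum_base_url=None, spectrum_rdir_url=None):
    """Build a Spectrum target, requiring at least one S3 URL.

    Assumes no db-facts default is available; drop the check if your
    environment provides one.
    """
    if spectrum_base_url is None and spectrum_rdir_url is None:
        raise ValueError(
            'Specify spectrum_base_url or spectrum_rdir_url')
    return records.targets.spectrum(schema_name=schema_name,
                                    table_name=table_name,
                                    db_engine=db_engine,
                                    spectrum_base_url=spectrum_base_url,
                                    spectrum_rdir_url=spectrum_rdir_url)
```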