records_mover.records.sources package
Module contents
- class records_mover.records.sources.RecordsSources(db_driver, url_resolver)
Bases: object
These methods produce objects representing the source of a records move. The objects can be used as the ‘source’ argument to records_mover.records.move().
This object should be pulled from the ‘sources’ property of the ‘records’ property on a records_mover.Session object instead of being constructed directly.
Example use:
    from records_mover import Session

    session = Session()
    records = session.records
    db_engine = session.get_default_db_engine()
    url = 's3://some-bucket/some-directory/'
    source = records.sources.directory_from_url(url=url)
    target = records.targets.table(schema_name='myschema', table_name='mytable', db_engine=db_engine)
    results = records.move(source, target)
- Parameters
db_driver (Callable[[Optional[Union[Engine, Connection]], Optional[Connection], Optional[Engine]], DBDriver]) –
url_resolver (UrlResolver) –
- dataframe(df, processing_instructions=ProcessingInstructions(), records_schema=None, include_index=False)
Represents a single dataframe source.
- Parameters
df (DataFrame) – Pandas DataFrame to move data from.
processing_instructions (ProcessingInstructions) – Instructions used during creation of the schema SQL, as a records_mover.records.ProcessingInstructions object.
include_index (bool) – If True, the Pandas DataFrame index column will be included in the move as a column; if False, it will be disregarded.
records_schema (Optional[RecordsSchema]) – Experimental interface; do not use.
- Return type
DataframesRecordsSource
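A minimal sketch of moving a single DataFrame into a table, following the pattern of the class-level example. It assumes `records` is `session.records` from a Session and that the caller supplies the engine and table names; the helper name `move_dataframe_to_table` is hypothetical. Passing `include_index=True` keeps a meaningful index (such as timestamps) as a column instead of discarding it.

```python
def move_dataframe_to_table(records, df, db_engine, schema_name, table_name):
    """Hypothetical helper: load one pandas DataFrame into a database table.

    `records` is assumed to be `session.records` from a records_mover Session.
    """
    # Keep the DataFrame index as a column; the default (include_index=False) drops it.
    source = records.sources.dataframe(df=df, include_index=True)
    target = records.targets.table(schema_name=schema_name,
                                   table_name=table_name,
                                   db_engine=db_engine)
    # Hand back whatever records.move() returns.
    return records.move(source, target)
```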
- dataframes(dfs, processing_instructions=ProcessingInstructions(), records_schema=None, include_index=False)
Represents multiple dataframes as a source. Note that this accepts an iterable, meaning that the dataframes in question can be generated dynamically in chunks.
- Parameters
dfs (Iterable[DataFrame]) – Iterable of Pandas DataFrames to move data from – all data from these DataFrames will be added to the same table.
processing_instructions (ProcessingInstructions) – Instructions used during creation of the schema SQL, as a records_mover.records.ProcessingInstructions object.
include_index (bool) – If True, the Pandas DataFrame index column will be included in the move as a column; if False, it will be disregarded.
records_schema (Optional[RecordsSchema]) – Experimental interface; do not use.
- Return type
DataframesRecordsSource
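Because `dfs` only has to be iterable, a generator can produce DataFrames on demand. The sketch below assumes a caller-supplied `fetch_page(n)` function (hypothetical; for example paging through an API or a query) that returns a DataFrame for page `n`, or `None` once there is no more data. Both helper names are illustrative, not part of records_mover.

```python
from typing import Any, Callable, Iterator, Optional


def paged_dataframes(fetch_page: Callable[[int], Optional[Any]]) -> Iterator[Any]:
    """Hypothetical generator: yield one DataFrame per page until fetch_page returns None.

    Nothing is fetched until the consumer starts iterating, and the generator
    itself only holds the current page.
    """
    page_number = 0
    while True:
        df = fetch_page(page_number)
        if df is None:  # no more pages
            return
        yield df
        page_number += 1


def dataframes_source(records, fetch_page):
    # Pass the generator object itself; dataframes() accepts any iterable.
    return records.sources.dataframes(dfs=paged_dataframes(fetch_page))
```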
- fileobjs(target_names_to_input_fileobjs, records_format=None, initial_hints=None, records_schema=None)
Represents one or more streams of data files as a source.
- Parameters
target_names_to_input_fileobjs (Mapping[str, IO[bytes]]) – Mapping from filenames to binary streams of the corresponding data files.
records_format (Optional[BaseRecordsFormat]) – Description of the format of the data files.
initial_hints (Optional[PartialRecordsHints]) – If records_format is not provided, the format of the file will be determined automatically. If that effort fails, you can help it out by providing hints in this dictionary as needed. See the records format specification for hints and valid values.
records_schema (Optional[RecordsSchema]) – Experimental interface; do not use.
- Return type
Union[UninferredFileobjsRecordsSource, FileobjsSource]
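A sketch of building a fileobjs source from in-memory CSV text, assuming `records` is `session.records`. The helper name is hypothetical, and the hint names `field-delimiter` and `header-row` are assumed from the records format specification (check them there); they only matter because no `records_format` is given, so the format must be detected.

```python
import io
from typing import IO, Dict


def in_memory_csv_source(records, csv_texts: Dict[str, str]):
    """Hypothetical helper: wrap CSV strings as binary streams and build a fileobjs source."""
    # fileobjs() expects Mapping[str, IO[bytes]], so encode each string to bytes first.
    fileobjs: Dict[str, IO[bytes]] = {
        name: io.BytesIO(text.encode("utf-8")) for name, text in csv_texts.items()
    }
    # Assumed hint names; they help automatic format detection along.
    hints = {"field-delimiter": ",", "header-row": True}
    return records.sources.fileobjs(target_names_to_input_fileobjs=fileobjs,
                                    initial_hints=hints)
```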
- data_url(input_url, records_format=None, initial_hints=None, records_schema=None)
Represents a URL pointer to a data file as a source.
- Parameters
input_url (str) – Location of the data file. Must be a URL format understood by the records_mover.url library.
records_format (Optional[BaseRecordsFormat]) – Description of the format of the data files.
initial_hints (Optional[PartialRecordsHints]) – If records_format is not provided, the format of the file will be determined automatically. If that effort fails, you can help it out by providing hints in this dictionary as needed. See the records format specification for hints and valid values.
records_schema (Optional[RecordsSchema]) – Experimental interface; do not use.
- Return type
DataUrlRecordsSource
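A sketch of loading a gzipped CSV file at a URL into a table, again assuming `records` is `session.records`. The helper name is hypothetical, and the hint names and the value `"GZIP"` are assumptions based on the records format specification; verify them against the spec for your version.

```python
def gzipped_csv_url_to_table(records, input_url, db_engine, schema_name, table_name):
    """Hypothetical helper: load a gzipped CSV with a header row into a table."""
    # Assumed hint names/values; only consulted because no records_format is passed.
    hints = {"compression": "GZIP", "header-row": True}
    source = records.sources.data_url(input_url=input_url, initial_hints=hints)
    target = records.targets.table(schema_name=schema_name,
                                   table_name=table_name,
                                   db_engine=db_engine)
    return records.move(source, target)
```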
- table(db_engine, schema_name, table_name, db_conn=None)
Represents a SQLAlchemy-accessible database table as a source.
- Parameters
db_engine (Engine) – SQLAlchemy database engine to pull data from.
schema_name (str) – Schema name of a table to get data from.
table_name (str) – Table name of a table to get data from.
db_conn (Optional[Connection]) – SQLAlchemy database connection to use to pull data from.
- Return type
TableRecordsSource
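A sketch of copying a table between two databases, assuming `records` is `session.records` and both engines are SQLAlchemy engines (for example one from `session.get_default_db_engine()` and one from `sqlalchemy.create_engine()`). The helper name `copy_table` is hypothetical; it keeps the same schema and table name on both sides.

```python
def copy_table(records, source_engine, target_engine, schema_name, table_name):
    """Hypothetical helper: copy a table from one database to another under the same name."""
    source = records.sources.table(db_engine=source_engine,
                                   schema_name=schema_name,
                                   table_name=table_name)
    target = records.targets.table(schema_name=schema_name,
                                   table_name=table_name,
                                   db_engine=target_engine)
    return records.move(source, target)
```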
- directory_from_url(url, hints={}, fail_if_dont_understand=True)
Represents a Records Directory pointed to by a URL as a source.
- Parameters
url (str) – Location of the records directory. Must be a URL format understood by the records_mover.url library, and must be a directory URL that ends with a ‘/’.
hints (PartialRecordsHints) – Any additional hints that should override the description of the data files already in the records directory.
fail_if_dont_understand (bool) – If True, and a part of the RecordsFormat is not understood while processing, then immediately fail and raise an exception. Otherwise, ignore the misunderstood instruction (e.g., ignore the hint, assume the default variant, etc.).
- Return type
RecordsDirectoryRecordsSource
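Since the directory URL must end with a ‘/’, a small wrapper can normalize it before calling `directory_from_url()`. This is a sketch assuming `records` is `session.records`; the helper name and its `strict` parameter are hypothetical.

```python
def records_directory_source(records, url: str, strict: bool = True):
    """Hypothetical helper: build a records directory source, tolerating a missing '/'."""
    # directory_from_url() requires a directory URL that ends with '/'.
    if not url.endswith("/"):
        url += "/"
    return records.sources.directory_from_url(url=url, fail_if_dont_understand=strict)
```

Appending the slash silently is a design choice; if a data file URL could reach this helper by mistake, raising a ValueError instead may be safer.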
- local_file(filename, records_format=None, initial_hints=None, records_schema=None)
Represents a data file on the local filesystem as a source.
- Parameters
filename (str) – File path (relative or absolute) of the data file to load.
records_format (Optional[BaseRecordsFormat]) – Description of the format of the data files.
initial_hints (Optional[PartialRecordsHints]) – If records_format is not provided, the format of the file will be determined automatically. If that effort fails, you can help it out by providing hints in this dictionary as needed. See the records format specification for hints and valid values.
records_schema (Optional[RecordsSchema]) – Experimental interface; do not use.
- Return type
DataUrlRecordsSource
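A sketch of a wrapper that checks the file exists before building the source, so a typo in the path fails with a clear message. It assumes `records` is `session.records`; the helper name is hypothetical.

```python
from pathlib import Path


def local_csv_source(records, filename: str):
    """Hypothetical helper: fail early with a clear error if the data file is missing."""
    path = Path(filename).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such data file: {path}")
    # local_file() accepts relative or absolute paths; pass the resolved absolute one.
    return records.sources.local_file(filename=str(path))
```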
- google_sheet(spreadsheet_id, sheet_name_or_range, google_cloud_creds, out_of_band_column_headers=None, header_translator=None)
Represents a sheet or range in a Google Sheets spreadsheet as a source, via the Google Sheets API.
- Parameters
spreadsheet_id (str) – This is the xyz in https://docs.google.com/spreadsheets/d/xyz/edit?ts=5be5b383#gid=abc
sheet_name_or_range (str) – This is the label of the particular tab within the Google Sheets spreadsheet where the data should be read from, or a valid Google Sheets-style range formula.
google_cloud_creds (google.auth.credentials.Credentials) – This is an object representing Google Cloud Platform access credentials.
out_of_band_column_headers (Optional[Iterable[str]]) – If provided, we’ll use these column names instead of the first row of the spreadsheet. If set, the first row will be treated as data.
header_translator (Optional[Callable[[str], str]]) – If provided, header names pulled from the sheet will be translated through this function. Not used if out_of_band_column_headers is set.
- Return type
GoogleSheetsRecordsSource
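A sketch showing how `spreadsheet_id` can be pulled out of a sheet URL of the form shown above, plus an example `header_translator` that turns headers like "Order Date" into "order_date". It assumes `records` is `session.records` and that `creds` is a `google.auth.credentials.Credentials` object obtained elsewhere; all helper names are hypothetical.

```python
import re

# Matches the ID segment in URLs like https://docs.google.com/spreadsheets/d/<id>/edit...
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def spreadsheet_id_from_url(url: str) -> str:
    """Hypothetical helper: return the spreadsheet ID embedded in a Google Sheets URL."""
    match = _SHEET_ID_RE.search(url)
    if match is None:
        raise ValueError(f"Not a Google Sheets URL: {url}")
    return match.group(1)


def snake_case_header(name: str) -> str:
    """Example header_translator: lowercase, and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def sheet_source(records, sheet_url: str, tab_name: str, creds):
    return records.sources.google_sheet(spreadsheet_id=spreadsheet_id_from_url(sheet_url),
                                        sheet_name_or_range=tab_name,
                                        google_cloud_creds=creds,
                                        header_translator=snake_case_header)
```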