Processing a header and trailer textfile in Datastage
Example case
The input file in our scenario is a stream of records representing invoices. It is a text file in a header-trailer format and has the following structure:
- A header starts with a letter H which is followed by an invoice number, a customer number, a date and invoice currency
- Every item starts with a letter I which is followed by product ID, product age, quantity and net value
- And the trailer starts with a T and contains two values which fullfill the role of a checksum: the total number of invoice lines and total net value.
Header and trailer input textfile in Datastage:
Solution
Datastage job design which solves the problem of loading an extract structured into headers and items:
Detailed illustration of each component of the job
TrsfGetKinds transformer - a type of record is extracted from each input line and assigned and written into a kind column
The transformer reads first character of each line to mark records and divide them into kinds:
TrsfAssignHeaders transformer assigns invoice headers to each line of the input flow. Invoice header is a 8 character key which is extracted from the dataline using the following formula: L02.data_line[1,8]
A datastage transformer which assigns headers using a stage variable:
TrsfReformat transformer splits the data flow into invoice headers and invoice lines. A InvoiceLineID stage variable is used to populate a sequence number for each invoice line. In order to correctly load net value, the ereplace function replaces commas with dots in this numeric field.
The following transformer splits the flow into headers and items and reformats records into a corresponding structure:
Target oracle tables with correctly loaded invoice headers and lines:
Processing a header and trailer textfile
Goal
Process a text file which contains records arranged in blocks consisting of a header record, details (items, body) and a trailer. The aim is to normalize records and load them into a relational database structure.
Scenario overview and details
Typically, header and item processing needs to be implemented when processing files that origin from mainframe systems, and also EDI transmission files, SWIFT Schemas, EPIC files.
The input file in our scenario is a stream of records representing invoices. Each invoice consists of :
- A header containing invoice number, dates, customer reference and other details
- One or more items representing ordered products, including an item number, quantity, price and value
- A trailer which contains summary information for all the items
The records are distinguished by the first character in each line: H stands for headers, I for items and T for the trailers.
The lines in a section of the input file are fixed length (so for example all headers have the same number of characters but it varies from items).
Input data:
Text file in a Header-trailer format. The file presented in our example is divided into headers, items and trailers.
- A header starts with a letter H and then contains an invoice number, a customer number, a date and invoice currency
- Every item starts with a letter I which is followed by product ID, product age, quantity and net value
- The trailer starts with a T and contains two values which act as a checksum: the total number of invoice lines and a total net value.
Sample header and trailer input textfile:
Output data:
INVC_HEADER and INVC_LINE relational tables: one with invoice headers and the other with the invoice lines
rejects_080822.txt - a text file with loading errors desired output
0 comments:
Post a Comment