Class FixedWidthExtractor<TRecord, TProgress>
Namespace
- Wolfgang.Etl.FixedWidth
Assembly
- Wolfgang.Etl.FixedWidth.dll
Reads a fixed-width text file and yields records of type TRecord
as an asynchronous stream.
public class FixedWidthExtractor<TRecord, TProgress> : ExtractorBase<TRecord, TProgress>, IExtractWithProgressAndCancellationAsync<TRecord, TProgress>, IExtractWithCancellationAsync<TRecord>, IExtractWithProgressAsync<TRecord, TProgress>, IExtractAsync<TRecord>, IDisposable where TRecord : notnull, new() where TProgress : notnull
Type Parameters
TRecord: The POCO type representing a single record. Properties decorated with FixedWidthFieldAttribute are populated from each line. The type must have a public parameterless constructor.
TProgress: The type of the progress object reported during extraction. Override CreateProgressReport() to return an instance of this type. If you do not need a custom progress type, use FixedWidthReport.
Inheritance
- ExtractorBase<TRecord, TProgress>
- FixedWidthExtractor<TRecord, TProgress>
Implements
- IExtractWithProgressAndCancellationAsync<TRecord, TProgress>
- IExtractWithCancellationAsync<TRecord>
- IExtractWithProgressAsync<TRecord, TProgress>
- IExtractAsync<TRecord>
Inherited Members
- ExtractorBase<TRecord, TProgress>.ExtractAsync()
- ExtractorBase<TRecord, TProgress>.CreateProgressReport()
- ExtractorBase<TRecord, TProgress>.IncrementCurrentItemCount()
- ExtractorBase<TRecord, TProgress>.IncrementCurrentSkippedItemCount()
- ExtractorBase<TRecord, TProgress>.ReportingInterval
- ExtractorBase<TRecord, TProgress>.CurrentItemCount
- ExtractorBase<TRecord, TProgress>.CurrentSkippedItemCount
- ExtractorBase<TRecord, TProgress>.MaximumItemCount
- ExtractorBase<TRecord, TProgress>.SkipItemCount
Examples
// Stream-based (preferred for files — 64 KB buffer reduces syscall overhead):
await using var stream = File.OpenRead("data.txt");
using var extractor = new FixedWidthExtractor<CustomerRecord, FixedWidthReport>(stream);
// TextReader-based (caller owns the reader):
var extractor = new FixedWidthExtractor<CustomerRecord, FixedWidthReport>(reader);
// Custom progress type — subclass and override CreateProgressReport:
public class CustomerExtractor : FixedWidthExtractor<CustomerRecord, MyProgress>
{
    public CustomerExtractor(Stream stream)
        : base(stream) { }
    protected override MyProgress CreateProgressReport() =>
        new MyProgress(CurrentItemCount, CurrentSkippedItemCount);
}
Remarks
Two construction modes are supported, each with different ownership semantics:
- TextReader constructor — the caller owns the TextReader lifetime. The extractor does not dispose it. Calling Dispose() is optional and has no effect.
- Stream constructor — the extractor creates an internal StreamReader with a 64 KB buffer for improved throughput on large files. The caller retains ownership of the Stream (it is not closed), but Dispose() must be called to release the internal reader.
Constructors
FixedWidthExtractor(Stream, ILogger<FixedWidthExtractor<TRecord, TProgress>>?)
Initializes a new FixedWidthExtractor<TRecord, TProgress> that reads from the specified Stream using an internal StreamReader with a 64 KB buffer for improved throughput on large files.
public FixedWidthExtractor(Stream stream, ILogger<FixedWidthExtractor<TRecord, TProgress>>? logger = null)
Parameters
stream (Stream): The Stream to read fixed-width records from. The stream must be readable. The caller retains ownership; the extractor does not dispose the stream.
logger (ILogger<FixedWidthExtractor<TRecord, TProgress>>): An optional ILogger<TCategoryName> for diagnostic output. Pass null (the default) to disable logging.
Exceptions
- ArgumentNullException
stream is null.
FixedWidthExtractor(TextReader, ILogger<FixedWidthExtractor<TRecord, TProgress>>?)
Initializes a new FixedWidthExtractor<TRecord, TProgress> that reads from the specified TextReader.
public FixedWidthExtractor(TextReader reader, ILogger<FixedWidthExtractor<TRecord, TProgress>>? logger = null)
Parameters
reader (TextReader): The TextReader to read fixed-width records from. This can be a StreamReader wrapping a file stream (local or network share), a StringReader for in-memory content, or any other TextReader implementation. Reading is performed synchronously for throughput; callers with slow or non-buffered sources should pre-buffer into a StringReader. The caller is responsible for the reader's lifetime.
logger (ILogger<FixedWidthExtractor<TRecord, TProgress>>): An optional ILogger<TCategoryName> for diagnostic output. Pass null (the default) to disable logging.
Exceptions
- ArgumentNullException
reader is null.
Properties
BlankLineHandling
Specifies what happens when a truly blank line (zero length) is encountered in the file. Evaluated before the skip budget and MaximumItemCount.
public BlankLineHandling BlankLineHandling { get; set; }
Property Value
- BlankLineHandling
Remarks
- ThrowException (default): always throws a LineTooShortException regardless of position.
- Skip: the line is invisible to all counting logic. It does not count toward SkipItemCount or MaximumItemCount.
- ReturnDefault: a default TRecord instance is yielded. It counts toward the skip budget if within SkipItemCount; otherwise it counts toward MaximumItemCount.
Note: a line consisting entirely of spaces is not blank — it is a valid data line that will parse to a record with all whitespace-trimmed (empty/default) fields.
LineFilter is not invoked for blank lines.
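The distinction between a blank line and a whitespace-only line can be illustrated with a small standalone sketch. It does not use the library; the IsBlank helper is a hypothetical name that simply restates the zero-length rule described above.

```csharp
#nullable enable
using System;

// Hypothetical helper (not part of the library): restates the rule above,
// where only a zero-length line counts as blank.
static bool IsBlank(string line) => line.Length == 0;

Console.WriteLine(IsBlank(""));                        // True
Console.WriteLine(IsBlank("     "));                   // False: a valid data line
Console.WriteLine(string.IsNullOrWhiteSpace("     ")); // True: a broader test than "blank"
```

A check such as string.IsNullOrWhiteSpace is therefore broader than what BlankLineHandling reacts to.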
CurrentLineNumber
The 1-based physical line number of the line most recently read from the file. Updated before each line is parsed so that if an exception is thrown, this value points to the offending line. Matches the line number shown in a text editor — no adjustment is needed for header or separator lines.
public long CurrentLineNumber { get; }
Property Value
- long
Remarks
Thread-safe: reads are performed with Interlocked so this property may be sampled from a progress-reporting timer thread without a data race.
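As a rough illustration of the pattern this remark describes, the standalone sketch below updates a 64-bit counter on one thread and samples it from another using Interlocked. The variable names are hypothetical and the extractor's internal code is not shown in this reference, so treat this as an assumption about the general technique rather than the actual implementation.

```csharp
#nullable enable
using System;
using System.Threading;

long currentLine = 0;

// Reading side: bump the counter once per line, before the line is parsed.
var worker = new Thread(() =>
{
    for (int i = 0; i < 100_000; i++)
    {
        Interlocked.Increment(ref currentLine);
    }
});
worker.Start();

// Sampling side (for example a progress timer): an atomic 64-bit read,
// safe even while the worker is still incrementing.
long sample = Interlocked.Read(ref currentLine);
Console.WriteLine($"sampled mid-run: {sample}");

worker.Join();
long finalCount = Interlocked.Read(ref currentLine);
Console.WriteLine(finalCount); // 100000
```

Interlocked.Read matters mainly on 32-bit platforms, where a plain read of a long is not guaranteed to be atomic.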
FieldDelimiter
The delimiter string present between fields in the source file, or null (default) for pure fixed-width input with no delimiter. Must match the FieldDelimiter used when the file was written.
public string? FieldDelimiter { get; set; }
Property Value
- string?
Examples
// File was written with FieldDelimiter = " | " — set the same value on the extractor:
extractor.FieldDelimiter = " | ";
// Pure fixed-width file with no delimiter (default):
extractor.FieldDelimiter = null;
Remarks
When set, the extractor accounts for the delimiter width when calculating field start positions, ensuring each field is read from the correct offset.
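To make the offset arithmetic concrete, here is a minimal standalone sketch of how a delimiter shifts field start positions. It does not use the library: the FieldStarts helper and the sample widths are hypothetical, and the extractor's own calculation may differ in detail.

```csharp
#nullable enable
using System;

// Hypothetical helper (not part of the library): computes the 0-based start
// offset of each field from the field widths and an optional delimiter.
static int[] FieldStarts(int[] widths, string? delimiter)
{
    int delimiterWidth = delimiter?.Length ?? 0;
    var starts = new int[widths.Length];
    int offset = 0;
    for (int i = 0; i < widths.Length; i++)
    {
        starts[i] = offset;
        // The next field begins after this field plus one delimiter.
        offset += widths[i] + delimiterWidth;
    }
    return starts;
}

int[] widths = { 5, 10, 3 };
Console.WriteLine(string.Join(",", FieldStarts(widths, null)));  // 0,5,15
Console.WriteLine(string.Join(",", FieldStarts(widths, " | "))); // 0,8,21
```

With a three-character delimiter, every field after the first starts three characters later for each delimiter that precedes it, which is why a mismatched FieldDelimiter produces misaligned values.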
FieldSeparator
When non-null, the line immediately following the last header line is treated as a separator and skipped. Has no effect if HeaderLineCount is 0. The character's value is not used for parsing; only whether it is set matters. Set to null (default) for no separator. Mirrors the FieldSeparator setting used when the file was written.
public char? FieldSeparator { get; set; }
Property Value
- char?
Examples
extractor.HasHeader = true;
extractor.FieldSeparator = '-'; // skips a "----------" separator line after the header
extractor.FieldSeparator = null; // no separator line (default)
HasHeader
Convenience property — when set to true, sets HeaderLineCount to 1. When set to false, sets HeaderLineCount to 0. Returns true if HeaderLineCount is greater than zero. Mirrors WriteHeader.
public bool HasHeader { get; set; }
Property Value
- bool
Examples
// Skip one header line before reading records:
extractor.HasHeader = true;
// Skip one header line and one separator line:
extractor.HasHeader = true;
extractor.FieldSeparator = '-';
HeaderLineCount
The number of header lines to skip at the beginning of the file before extracting records. Defaults to 0. For the common single-header case, use HasHeader instead.
public int HeaderLineCount { get; set; }
Property Value
- int
Examples
// File has two header lines followed by data:
extractor.HeaderLineCount = 2;
// Equivalent shorthand for the common single-header case:
extractor.HasHeader = true;
Remarks
Lines 1 through HeaderLineCount are skipped entirely without parsing. If FieldSeparator is also set, the line immediately after the last header line is additionally skipped as a separator line.
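The interaction between HeaderLineCount and FieldSeparator can be summarised as a small calculation. The sketch below is standalone and does not call the library; FirstDataLine is a hypothetical helper that restates the rules given for these two properties.

```csharp
#nullable enable
using System;

// Hypothetical helper (not part of the library): returns the 1-based physical
// line number of the first data line for a given header configuration.
static long FirstDataLine(int headerLineCount, char? fieldSeparator)
{
    // The separator line is only skipped when at least one header line exists.
    bool hasSeparatorLine = headerLineCount > 0 && fieldSeparator.HasValue;
    return headerLineCount + (hasSeparatorLine ? 1 : 0) + 1;
}

Console.WriteLine(FirstDataLine(0, null)); // 1
Console.WriteLine(FirstDataLine(2, null)); // 3
Console.WriteLine(FirstDataLine(1, '-'));  // 3
Console.WriteLine(FirstDataLine(0, '-'));  // 1 (separator ignored without a header)
```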
LineFilter
A delegate invoked for every data line (after header and separator lines have been skipped) before any parsing occurs. Return Process to parse the line normally, Skip to skip it, or Stop to end the stream immediately without parsing the line.
public Func<string, LineAction> LineFilter { get; set; }
Property Value
- Func<string, LineAction>
Examples
// Footer string — stop when a known marker line is reached
extractor.LineFilter = line => line == "END" ? LineAction.Stop : LineAction.Process;
// Trailing separator — stop when a line consists entirely of dashes
extractor.LineFilter = line => line.All(c => c == '-') ? LineAction.Stop : LineAction.Process;
// EOF marker — stop when a line starts with a sentinel prefix
extractor.LineFilter = line => line.StartsWith("$$") ? LineAction.Stop : LineAction.Process;
// Comment lines — skip lines that begin with '#'
extractor.LineFilter = line => line.StartsWith("#") ? LineAction.Skip : LineAction.Process;
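// Whitespace-only line as terminator: stop at the first line made up entirely of spaces
// (zero-length lines are handled by BlankLineHandling and never reach the filter)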
extractor.LineFilter = line => string.IsNullOrWhiteSpace(line) ? LineAction.Stop : LineAction.Process;
Remarks
Evaluated after BlankLineHandling — blank lines never reach the filter.
Evaluated before the skip budget and MaximumItemCount.
Both Skip and Stop are invisible to all counting logic: they do not affect SkipItemCount, MaximumItemCount, or CurrentSkippedItemCount.
Defaults to a function that always returns Process.
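The evaluation order described in these remarks can be modelled with a short standalone sketch. It is not the library's code: the Extract helper, the integer constants standing in for LineAction, and the exact behaviour of the skip budget and maximum (skip the first SkipItemCount records, then stop once MaximumItemCount records have been yielded) are assumptions made for illustration, and blank lines are modelled as if BlankLineHandling were Skip.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Stand-ins for the LineAction values; plain constants keep the sketch short.
const int Process = 0, Skip = 1, Stop = 2;

// Hypothetical model of the documented order (not the library's code):
// blank-line check, then the filter, then the skip budget, then the maximum.
List<string> Extract(IEnumerable<string> lines, Func<string, int> filter,
                     int skipItemCount, int maximumItemCount)
{
    var yielded = new List<string>();
    int skipped = 0;
    foreach (var line in lines)
    {
        if (line.Length == 0) continue;   // models BlankLineHandling.Skip
        int action = filter(line);        // blank lines never get here
        if (action == Stop) break;        // ends the stream, line not parsed
        if (action == Skip) continue;     // invisible to all counters
        if (skipped < skipItemCount) { skipped++; continue; }
        if (yielded.Count >= maximumItemCount) break;
        yielded.Add(line);
    }
    return yielded;
}

var input = new[] { "# note", "A", "", "B", "C", "END", "D" };
var result = Extract(
    input,
    line => line == "END" ? Stop : line.StartsWith("#") ? Skip : Process,
    skipItemCount: 1,
    maximumItemCount: 5);
Console.WriteLine(string.Join(",", result)); // B,C
```

In this model the comment line and the blank line consume nothing from the skip budget, so "A" is the one record skipped, and "END" stops the stream before "D" is ever seen.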
MalformedLineHandling
Specifies what happens when a line is encountered that is too short or whose field values cannot be converted to the target property type.
public MalformedLineHandling MalformedLineHandling { get; set; }
Property Value
- MalformedLineHandling
Remarks
Defaults to ThrowException.
When set to Skip, the line is skipped and CurrentSkippedItemCount is incremented.
When set to ReturnDefault, a default instance of TRecord is yielded for the offending line.
ValueParser
A delegate that converts a raw string read from the file into the target property type. The FieldContext provides the property type, format string, and other field metadata needed to perform the conversion. Defaults to DefaultParser. Mirrors ValueConverter.
public FixedWidthValueParser ValueParser { get; set; }
Property Value
- FixedWidthValueParser
Examples
// Treat "Y"/"N" as bool, fall back to DefaultParser for everything else:
extractor.ValueParser = (text, ctx) =>
    ctx.PropertyType == typeof(bool)
        ? (object)(text.Span.SequenceEqual("Y".AsSpan()))
        : FixedWidthConverter.DefaultParser(text, ctx);
// Parse a custom date format for a specific field:
extractor.ValueParser = (text, ctx) =>
    ctx.PropertyName == "BirthDate"
        ? DateTime.ParseExact(text.ToString(), "dd/MM/yyyy", CultureInfo.InvariantCulture)
        : FixedWidthConverter.DefaultParser(text, ctx);
Remarks
The delegate must return a value that is assignable to the property's CLR type. Returning an incompatible type will cause an InvalidCastException when the framework attempts to set the property value.
Exceptions
- FieldConversionException
DefaultParser wraps any parse failure in a FieldConversionException. Custom parsers should do the same so that MalformedLineHandling can handle their failures uniformly.
Methods
CreateProgressReport()
Creates a progress report snapshot for the current extractor state. Override in a derived class to return a custom TProgress instance. The default implementation returns a FixedWidthReport when TProgress is FixedWidthReport or the base Wolfgang.Etl.Abstractions.Report, and throws NotSupportedException otherwise.
protected override TProgress CreateProgressReport()
Returns
- TProgress
A TProgress snapshot containing CurrentItemCount, CurrentSkippedItemCount, and CurrentLineNumber at the moment of the call.
Examples
// To use a custom progress type, subclass and override:
public class MyExtractor : FixedWidthExtractor<MyRecord, MyProgress>
{
    public MyExtractor(TextReader reader) : base(reader) { }
    protected override MyProgress CreateProgressReport() =>
        new MyProgress(CurrentItemCount, CurrentLineNumber);
}
Exceptions
- NotSupportedException
Thrown when TProgress is not FixedWidthReport or Wolfgang.Etl.Abstractions.Report and CreateProgressReport() has not been overridden.
CreateProgressTimer(IProgress<TProgress>)
Creates the Wolfgang.Etl.Abstractions.IProgressTimer used to drive progress callbacks. Override this method in a derived class to inject a custom timer (for example, a custom implementation that allows manual control in unit tests).
protected override IProgressTimer CreateProgressTimer(IProgress<TProgress> progress)
Parameters
progress (IProgress<TProgress>): The progress sink that will receive callbacks.
Returns
- IProgressTimer
A started Wolfgang.Etl.Abstractions.IProgressTimer instance.
Dispose()
Disposes the internal StreamReader when this instance was constructed from a Stream. Has no effect when constructed from a caller-owned TextReader.
public void Dispose()
Dispose(bool)
Releases managed resources when disposing is true. Override in a derived class to add cleanup logic.
protected virtual void Dispose(bool disposing)
Parameters
disposing (bool): true to release managed resources.
ExtractWorkerAsync(CancellationToken)
This method is the core implementation of the extraction logic and should be overridden by derived classes.
protected override IAsyncEnumerable<TRecord> ExtractWorkerAsync(CancellationToken token)
Parameters
token (CancellationToken): A CancellationToken to observe while waiting for the task to complete.
Returns
- IAsyncEnumerable<TRecord>
An IAsyncEnumerable<TRecord>. The result may be an empty sequence if no data is available or if the extraction fails.