For parser, decoder, and file input integrations, you can identify the source file associated with a record import error.
In the job log for a data import, you can detect errors and identify the source file that caused them. Messages indicating a skipped record or "skipped line" are recorded in the log along with the source file name, so even when a single job imports multiple files, you can determine which files are problematic.
When an error occurs, you see one of the following messages:
If the job is configured to continue processing in the event of an error:
```
2019-03-11 04:45:05.426 +0000 [WARN] (0034:task-0001): Skipped line gcs://example-bucket/sample_02.gz:2 (java.lang.NumberFormatException: For input string:
```
If the job is configured to stop processing in the event of an error:
```
2019-03-15 03:27:54.734 +0000 [WARN] (main): Invalid record at gcs://big-query-test/sample_02.gz:2: 2,14824,2015-01-27 19:01:23,20150127,embulk jruby
org.embulk.exec.PartialExecutionException: org.embulk.spi.DataException: Invalid record at gcs://example-bucket/sample_02.gz:2: 2,14824,2015-01-27 19:01:23,20150127,embulk jruby
```
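If you need to triage a long job log, you can pull the source file name and line number out of these messages programmatically. The sketch below is illustrative only: the `find_problem_files` helper and its regular expression are assumptions based on the two message formats shown above ("Skipped line <file>:<n>" and "Invalid record at <file>:<n>"), not part of any product API.

```python
import re

# Matches the two warning forms shown above and captures the
# source file path and the 1-based line number within that file.
LOG_PATTERN = re.compile(r"(?:Skipped line|Invalid record at)\s+(\S+?):(\d+)")

def find_problem_files(log_text: str):
    """Return (source_file, line_number) tuples found in the job log text."""
    return [(m.group(1), int(m.group(2))) for m in LOG_PATTERN.finditer(log_text)]

sample = (
    "2019-03-11 04:45:05.426 +0000 [WARN] (0034:task-0001): "
    "Skipped line gcs://example-bucket/sample_02.gz:2 "
    "(java.lang.NumberFormatException: For input string:"
)
print(find_problem_files(sample))
# → [('gcs://example-bucket/sample_02.gz', 2)]
```

Running this over an entire downloaded log gives you a de-duplicatable list of problem files, which is useful when one job imports many files.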
You can see the source file name in import job logs for the following parser, decoder, or file input integrations:
- Microsoft Azure Blob Storage
- Google Cloud Storage
- Amazon S3
- Adobe Analytics
- Google DoubleClick Platform via SFTP