JPEG files - extracting Comments from the metadata

Postby nicholas.small on Fri Aug 05, 2022 9:50 pm

Could someone explain to me how I could extract from a JPEG file the Comments field of the metadata? (I don't really want to know the details, if a function to do it is already available. If it could write back to the file, that would be a bonus.)

Thanks,

Nicholas
nicholas.small
 
Posts: 23
Joined: Tue Mar 30, 2021 8:45 pm

Re: JPEG files - extracting Comments from the metadata

Postby PGilbert on Sun Aug 07, 2022 8:54 pm

If you don't mind using .NET, you can find what you are looking for here: https://old.aplwiki.com/netFreeImage

Regards,

Pierre Gilbert
PGilbert
 
Posts: 439
Joined: Sun Dec 13, 2009 8:46 pm
Location: Montréal, Québec, Canada

Re: JPEG files - extracting Comments from the metadata

Postby nicholas.small on Mon Aug 08, 2022 9:28 pm

Thanks, Pierre - I'll have a look at it.

Nicholas

Re: JPEG files - extracting Comments from the metadata

Postby nicholas.small on Mon Sep 12, 2022 9:57 pm

Apologies to Pierre for not pursuing his suggestion. Instead, I decided on the direct approach, which has resulted in a deeper understanding of Dyalog and Unicode, although I was sometimes left feeling a bit like the Ancient Mariner - "A sadder and a wiser man, he rose the morrow morn". I have also added to the number of tools in my toolbox.

The files from which I wished to read the comments fields had been created by scanning photographs using an Epson Perfection V39 scanner; the comments were added in Windows, using File Properties>Details>Comments. Note that the files are all in Motorola (big-endian) format.

My chief sources of reference were:
dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_JPEG_files
http://www.itu.int/itudoc/itu-t/com16/t ... /tiff6.pdf
http://www.media.mit.edu/pia/Research/d ... /exif.html

A JPEG file consists of a series of segments and data blocks, each introduced by a two-byte marker: the first byte is 0xFF and the second identifies the kind of segment.
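
As an illustration of the marker structure just described, here is a minimal sketch in Python (chosen only for illustration; it is not the APL used in this thread). The function name list_segments and the synthetic sample bytes are invented for the example. It assumes there are no 0xFF fill bytes between segments, and it stops at the start-of-scan marker (0xDA), after which the compressed image data begins.

```python
import struct

def list_segments(data: bytes):
    """Return (marker, offset, length) for each header segment up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker 0xFFD8")
    pos = 2
    segments = []
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The length is always big-endian and counts its own two bytes,
        # but not the two marker bytes in front of it.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, pos, length))
        if marker == 0xDA:  # start of scan: entropy-coded data follows
            break
        pos += 2 + length
    return segments

# Synthetic header: SOI, an 18-byte APP0 (JFIF), a tiny APP1, then SOS.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00"
    + b"\xff\xda" + struct.pack(">H", 2)
)

for marker, offset, length in list_segments(sample):
    print(f"marker 0xFF{marker:02X} at offset {offset}, length {length}")
```

In this synthetic sample the APP1 marker sits at byte 20, so whatever follows its two-byte length and the six-byte "Exif\0\0" prefix starts at byte 30. That may be where an offset of 30 comes from in files that begin with a standard JFIF segment, but that is an inference, not something confirmed for any particular file.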

From inspection of the file, I could see that the Comments data were in an application-specific segment (identifier 0xE1, i.e. APP1), but I was unable to find a description of it, so certain values in my function were obtained by inspection.

[0] Comments←JPEG_Comments File;z;C0;Len;Loc;Exif0
[1] z←9600 NBREAD File
[2] Exif0←30 ⍝ Offset to table
[3] C0←1↑((HEXtoCHAR 2 2⍴'99CC')⎕S 0)200↑z ⍝ Offset to table row
[4] Len←CHARtoUSI 4↑(C0+4)↓z
[5] Loc←CHARtoUSI 4↑(C0+8)↓z
[6] Comments←(HEXtoCHAR'00')~⍨Len↑(Exif0+Loc)↓z

At line [1], the first 9600 bytes of the file are read as raw 8-bit characters (data type 80); this is more than enough to include the Comments.
The Exif data is indexed in tables called Image File Directories (IFDs), each row of which has four columns: tag, data type, length, and value or offset. Offsets are relative to the start of the TIFF header, which in these files evidently begins at byte 30 (line [2]). By inspection, the table row sought starts with the tag 0x9C9C (in line [3], the columns of the 2×2 matrix '99CC' presumably each spell out one byte, '9C'); this is followed by a two-byte data-type code which is decimal 7, defined as "undefined"! Then come a four-byte data length and a four-byte offset. Finally, the Comments can be extracted and the nulls between each pair of characters removed.
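
For comparison, the same lookup can be done by walking the directory instead of searching for the byte pair. The following is a minimal sketch in Python, not the APL above; the names find_entry and read_comment and the synthetic TIFF block are invented for the example. It assumes the entry lives in the first IFD, that the value is longer than four bytes (shorter values are stored inline instead of at an offset), and that the text is UTF-16LE, which is how these Windows "XP" tags are normally described. Decoding as UTF-16LE instead of stripping nulls also keeps characters above U+00FF intact.

```python
import struct

XP_COMMENT = 0x9C9C  # tag value reported in the post

def find_entry(tiff: bytes, tag: int):
    """Scan the first IFD for `tag`; return (type, count, offset) or None.

    `tiff` must start at the TIFF header (the "MM" or "II" byte-order mark),
    because all offsets in the directory are relative to that point.
    """
    order = {b"MM": ">", b"II": "<"}[tiff[:2]]  # big- or little-endian
    magic, ifd0 = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("not a TIFF header")
    (n_entries,) = struct.unpack(order + "H", tiff[ifd0:ifd0 + 2])
    for i in range(n_entries):
        start = ifd0 + 2 + 12 * i  # each row is 12 bytes: 2 + 2 + 4 + 4
        t, typ, count, value = struct.unpack(order + "HHII", tiff[start:start + 12])
        if t == tag:
            return typ, count, value
    return None

def read_comment(tiff: bytes):
    """Return the comment text, or None if the tag is absent."""
    entry = find_entry(tiff, XP_COMMENT)
    if entry is None:
        return None
    _, count, offset = entry
    # Assumes count > 4, so `offset` really is an offset, not an inline value.
    return tiff[offset:offset + count].decode("utf-16-le").rstrip("\x00")

# Synthetic big-endian TIFF block with one directory row (type 7, as in the post).
text = "Hi!".encode("utf-16-le") + b"\x00\x00"
sample_tiff = (
    b"MM" + struct.pack(">HI", 42, 8)                        # header, IFD at 8
    + struct.pack(">H", 1)                                   # one row
    + struct.pack(">HHII", XP_COMMENT, 7, len(text), 26)     # the row itself
    + struct.pack(">I", 0)                                   # no further IFD
    + text                                                   # data at offset 26
)

print(read_comment(sample_tiff))
```

With a real file, `tiff` would be the slice of the file starting at the TIFF header, for example `data[30:]` for files laid out like the ones described here.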

Re: JPEG files - extracting Comments from the metadata

Postby nicholas.small on Tue Sep 13, 2022 9:32 pm

Correction to literary allusion:
I should have written that I felt like the wedding guest in the Rime of the Ancient Mariner.
Nicholas

Re: JPEG files - extracting Comments from the metadata

Postby nicholas.small on Fri Sep 23, 2022 10:21 pm

Correction regarding use of ⎕S.
I had not read enough of the "small print" on the search utility, and line [3] failed to produce the desired answer in about 1% of cases. This is because the default mode for ⎕S is to treat the input data as a delimited stream of "lines" and to restart the offset count every time a delimiter is found; unsurprisingly, every now and then one of the recognized delimiters appears nearer the start of a JPEG file than the search string.
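
The pitfall can be modelled outside APL. This is a deliberately simplified Python sketch, not a description of how ⎕S is implemented: the function name first_match_offsets is invented, and only LF (0x0A) is treated as a line delimiter, whereas the real search utility recognizes several.

```python
def first_match_offsets(data: bytes, needle: bytes):
    """Return (offset from buffer start, offset from start of the current 'line')."""
    absolute = data.find(needle)
    if absolute < 0:
        return None
    # Simplified model of line mode: the count restarts after the last LF
    # that precedes the match. rfind gives -1 when there is none, so the
    # line then starts at 0.
    line_start = data.rfind(b"\n", 0, absolute) + 1
    return absolute, absolute - line_start

# Same layout twice; in the second, a length byte happens to equal 0x0A.
clean = b"\xff\xd8\xff\xe1\x00\x20" + b"\x9c\x9c"
unlucky = b"\xff\xd8\xff\xe1\x00\x0a" + b"\x9c\x9c"

print(first_match_offsets(clean, b"\x9c\x9c"))    # both offsets agree
print(first_match_offsets(unlucky, b"\x9c\x9c"))  # line-relative offset is too small
```

In binary data a byte such as 0x0A is just another value, so a line-relative offset is only correct when no such byte happens to precede the match.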

Corrected version of line [3]:
C0←1↑((HEXtoCHAR 2 2⍴'99CC')⎕S 0⍠'Mode' 'D')200↑z

Alternatives to line [3]:
C0←1↑{⎕IO-⍨(⍵∧1⌽⍵)/⍳⍴⍵}(HEXtoCHAR'9C')=200↑z
C0←1↑(HEXtoCHAR 2 2⍴'99CC'){⎕IO-⍨(⍺⍷⍵)/⍳⍴⍵}200↑z

(Thanks to Vince for help with this.)

