JPEG files - extracting Comments from the metadata
Could someone explain to me how I could extract from a JPEG file the Comments field of the metadata? (I don't really want to know the details, if a function to do it is already available. If it could write back to the file, that would be a bonus.)
Thanks,
Nicholas
- nicholas.small
- Posts: 23
- Joined: Tue Mar 30, 2021 8:45 pm
Re: JPEG files - extracting Comments from the metadata
If you don't mind using .NET, you can find what you are looking for here: https://old.aplwiki.com/netFreeImage
Regards,
Pierre Gilbert
- PGilbert
- Posts: 439
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: JPEG files - extracting Comments from the metadata
Thanks, Pierre - I'll have a look at it.
Nicholas
- nicholas.small
- Posts: 23
- Joined: Tue Mar 30, 2021 8:45 pm
Re: JPEG files - extracting Comments from the metadata
Apologies to Pierre for not pursuing his suggestion. Instead, I decided on the direct approach, which has given me a deeper understanding of Dyalog and Unicode, although I was sometimes left feeling a bit like the Ancient Mariner - "A sadder and a wiser man, he rose the morrow morn". I have also added to the number of tools in my toolbox.
The files from which I wished to read the comments fields had been created by scanning photographs using an Epson Perfection V39 scanner; the comments were added in Windows, using File Properties>Details>Comments. Note that the files are all in Motorola (big-endian) format.
My chief sources of reference were:
dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_JPEG_files
http://www.itu.int/itudoc/itu-t/com16/t ... /tiff6.pdf
http://www.media.mit.edu/pia/Research/d ... /exif.html
A JPEG file contains several segments, with segments and data blocks being delimited by two-byte codes, the first byte of which is 0xFF, the second being an identifier.
From inspection of the file, I could see that the Comments data were in a section that was application specific (identifier 0xE1) - but I was unable to find a description, so certain values in my function were obtained by inspection.
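For anyone who wants to cross-check the segment layout outside APL, here is a minimal Python sketch that walks the two-byte markers described above. The function name `list_segments` and the sample bytes are made up for illustration; the sample is not taken from one of the scanned files.

```python
import struct

def list_segments(data: bytes):
    """Return (marker, offset, length) for each segment before the scan data."""
    if data[:2] != b"\xff\xd8":           # SOI (start of image)
        raise ValueError("not a JPEG file")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break                          # lost sync; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xDA:                 # SOS: compressed image data follows
            break
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, pos, length))
        pos += 2 + length
    return segments

# Hypothetical header: SOI, a JFIF APP0 segment, then a bare Exif APP1 identifier.
sample = (b"\xff\xd8"
          + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
          + b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00")

for marker, offset, length in list_segments(sample):
    print(f"0x{marker:02X} at offset {offset}, length {length}")
```

In this made-up header the "Exif" identifier ends at byte 30. That would be consistent with the fixed offset of 30 used in the function if the scanned files begin with a JFIF segment, but that is an assumption rather than something confirmed here.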
[0] Comments←JPEG_Comments File;z;C0;Len;Loc;Exif0
[1] z←9600 NBREAD File ⍝ Read the first 9600 bytes of the file
[2] Exif0←30 ⍝ Offset to table
[3] C0←1↑((HEXtoCHAR 2 2⍴'99CC')⎕S 0)200↑z ⍝ Offset to table row
[4] Len←CHARtoUSI 4↑(C0+4)↓z ⍝ Four-byte data length
[5] Loc←CHARtoUSI 4↑(C0+8)↓z ⍝ Four-byte offset to the data
[6] Comments←(HEXtoCHAR'00')~⍨Len↑(Exif0+Loc)↓z ⍝ Extract the data and drop the nulls
At line [1], the first 9600 bytes of the file are read as binary (data type 80); this is more than enough to include the Comments.
The Exif data is indexed in tables called Image File Directories (IFDs), each row of which has four fields: tag, data type, data length and value or offset. Offsets are counted from the start of the TIFF header, which in these files begins 30 bytes into the file (line [2]). By inspection, the table row sought starts with 0x9C9C; this is followed by a two-byte data-type code which is decimal 7, defined as "undefined"! Then come a four-byte data length and a four-byte offset. Finally, the Comments can be extracted and the nulls between each pair of characters removed.
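The same steps can be sketched in Python for comparison. This is not a translation of the APL: instead of searching the first 200 bytes for 0x9C9C, it walks the rows of the first IFD. The name `read_xp_comment`, the default of 30 for `tiff_start` and the sample bytes are assumptions for illustration. The nulls between characters suggest the text is stored as 16-bit little-endian characters, so the sketch decodes it as UTF-16 rather than stripping the nulls.

```python
import struct

XP_COMMENT = 0x9C9C  # tag value found by inspection in the post

def read_xp_comment(data: bytes, tiff_start: int = 30) -> str:
    """Extract the Windows Comments field from big-endian ("MM") Exif data."""
    if data[tiff_start:tiff_start + 2] != b"MM":
        raise ValueError("expected Motorola byte order")
    # Bytes 4-7 of the TIFF header hold the offset of the first IFD.
    (ifd,) = struct.unpack(">I", data[tiff_start + 4:tiff_start + 8])
    ifd += tiff_start
    (count,) = struct.unpack(">H", data[ifd:ifd + 2])
    for i in range(count):
        entry = ifd + 2 + 12 * i           # each row is 12 bytes long
        tag, dtype, length, offset = struct.unpack(">HHII", data[entry:entry + 12])
        if tag == XP_COMMENT:
            # Assumes more than 4 bytes of data, so the last field is an offset.
            raw = data[tiff_start + offset:tiff_start + offset + length]
            return raw.decode("utf-16-le").rstrip("\x00")
    raise KeyError("Comments tag not found")

# Hypothetical data: 30 filler bytes, then a TIFF header and a one-row IFD.
comment = "Beach 1975".encode("utf-16-le") + b"\x00\x00"
tiff = (b"MM\x00\x2a" + struct.pack(">I", 8)                     # first IFD at 8
        + struct.pack(">H", 1)                                   # one row
        + struct.pack(">HHII", XP_COMMENT, 7, len(comment), 26)  # data at 26
        + struct.pack(">I", 0)                                   # no further IFD
        + comment)
sample = bytes(30) + tiff

print(read_xp_comment(sample))
```

Decoding as UTF-16 keeps characters outside the Latin-1 range intact, which removing every null byte would not.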
- nicholas.small
- Posts: 23
- Joined: Tue Mar 30, 2021 8:45 pm
Re: JPEG files - extracting Comments from the metadata
Correction to literary allusion:
I should have written that I felt like the wedding guest in the Rime of the Ancient Mariner.
Nicholas
- nicholas.small
- Posts: 23
- Joined: Tue Mar 30, 2021 8:45 pm
Re: JPEG files - extracting Comments from the metadata
Correction regarding use of ⎕S.
I had not read enough of the "small print" on the search utility, and line [3] failed to produce the desired answer in about 1% of cases. This is because the default mode for ⎕S is to treat the input data as a delimited stream of "lines" and reset the counter every time a delimiter is found; unsurprisingly, every now and then, one of the recognized delimiters appears nearer the start of a JPEG file than the search string.
Corrected version of line [3]:
C0←1↑((HEXtoCHAR 2 2⍴'99CC')⎕S 0⍠'Mode' 'D')200↑z
Alternatives to line [3]:
C0←1↑{⎕IO-⍨(⍵∧1⌽⍵)/⍳⍴⍵}(HEXtoCHAR'9C')=200↑z
C0←1↑(HEXtoCHAR 2 2⍴'99CC'){⎕IO-⍨(⍺⍷⍵)/⍳⍴⍵}200↑z
(Thanks to Vince for help with this.)
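The effect can be illustrated with a rough Python analogy. This is not how ⎕S is implemented: only a line feed is treated as a delimiter here, whereas ⎕S recognizes several, and the function names and sample bytes are made up.

```python
def first_offset_line_mode(data: bytes, pattern: bytes) -> int:
    """Offset of the first match, counted from the start of its own line."""
    for line in data.split(b"\n"):
        pos = line.find(pattern)
        if pos >= 0:
            return pos
    return -1

def first_offset_document_mode(data: bytes, pattern: bytes) -> int:
    """Offset of the first match, counted from the start of the buffer."""
    return data.find(pattern)

# A line feed happens to occur at byte 5, before the 0x9C9C tag at byte 16.
header = bytes(5) + b"\n" + bytes(10) + b"\x9c\x9c" + bytes(4)

print(first_offset_line_mode(header, b"\x9c\x9c"))      # 10
print(first_offset_document_mode(header, b"\x9c\x9c"))  # 16
```

The line-relative offset (10) points at the wrong place in the buffer, which matches the occasional failures described above.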
- nicholas.small
- Posts: 23
- Joined: Tue Mar 30, 2021 8:45 pm