A Virtual Home (https://blog.avirtualhome.com/)

Memory problems with MozJPEG and Pillow
Peter van der Does, 2018-08-19T15:08:00-04:00
<p>After implementing MozJPEG to create smaller files, we noticed it did not always work: we got the message “I/O suspension not supported in scan optimization”.</p>
<p>We implemented <a href="https://github.com/mozilla/mozjpeg">MozJPEG</a> with <a href="https://python-pillow.org/">Pillow 4.x</a> to create smaller thumbnails of files uploaded by users, and noticed that the process sometimes failed.
We looked into our logs and found the following error message: <code>I/O suspension not supported in scan optimization</code>. Time to enter the <span class="caps">GSO</span> workflow. <span class="caps">GSO</span> stands for Google Stack Overflow; in other words, search the Internet. Searching for the error message only turns up links to the MozJPEG source code, which is not very helpful at first.</p>
<p>Time to brush up on my C knowledge. <span class="caps">OK</span>, I have never programmed in C, but that doesn’t stop me from going through the source.</p>
<p>The error message is defined in <code>jerror.h</code>:</p>
<div class="highlight"><pre><span></span><span class="cp">#endif</span>
<span class="n">JMESSAGE</span><span class="p">(</span><span class="n">JERR_BAD_PARAM</span><span class="p">,</span> <span class="s">"Bogus parameter"</span><span class="p">)</span>
<span class="n">JMESSAGE</span><span class="p">(</span><span class="n">JERR_BAD_PARAM_VALUE</span><span class="p">,</span> <span class="s">"Bogus parameter value"</span><span class="p">)</span>
<span class="n">JMESSAGE</span><span class="p">(</span><span class="n">JERR_UNSUPPORTED_SUSPEND</span><span class="p">,</span> <span class="s">"I/O suspension not supported in scan optimization"</span><span class="p">)</span>
<span class="cp">#ifdef JMAKE_ENUM_LIST</span>
</pre></div>
<p>So now we have to find where the <code>JERR_UNSUPPORTED_SUSPEND</code> constant is used. Luckily, it appears in only one file, <code>jcmaster.c</code>:</p>
<div class="highlight"><pre><span></span><span class="k">while</span> <span class="p">(</span><span class="n">size</span> <span class="o">>=</span> <span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">free_in_buffer</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">MEMCOPY</span><span class="p">(</span><span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">next_output_byte</span><span class="p">,</span> <span class="n">src</span><span class="p">,</span> <span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">free_in_buffer</span><span class="p">);</span>
    <span class="n">src</span> <span class="o">+=</span> <span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">free_in_buffer</span><span class="p">;</span>
    <span class="n">size</span> <span class="o">-=</span> <span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">free_in_buffer</span><span class="p">;</span>
    <span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">next_output_byte</span> <span class="o">+=</span> <span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">free_in_buffer</span><span class="p">;</span>
    <span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">free_in_buffer</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="o">*</span><span class="n">cinfo</span><span class="o">-></span><span class="n">dest</span><span class="o">-></span><span class="n">empty_output_buffer</span><span class="p">)(</span><span class="n">cinfo</span><span class="p">))</span>
        <span class="n">ERREXIT</span><span class="p">(</span><span class="n">cinfo</span><span class="p">,</span> <span class="n">JERR_UNSUPPORTED_SUSPEND</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
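<p>Cool, it seems to be related to the output buffer running out of space: the loop fills the buffer, asks for it to be emptied, and raises the error when that fails. That is just my guess, based on the <code>empty_output_buffer</code> line.
Now we have to find out where Pillow sets the buffer size for saving a <span class="caps">JPEG</span> image.</p>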
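<p>To make that guess concrete, here is a rough Python model of the C loop. It is only a sketch: <code>SuspendingDestination</code> and <code>copy_to_destination</code> are hypothetical names, the assumption that the destination always answers "cannot flush" is mine, and the lines after the loop are my guess at what the surrounding C code does with the remaining bytes.</p>

```python
class SuspendingDestination:
    """Toy stand-in for a libjpeg destination manager with a fixed buffer."""

    def __init__(self, capacity):
        self.free_in_buffer = capacity
        self.data = bytearray()

    def empty_output_buffer(self):
        # Assumption: this destination cannot flush in the middle of a
        # scan and reports that by returning False.
        return False


def copy_to_destination(dest, src):
    """Python rendering of the C copy loop shown above."""
    offset = 0
    size = len(src)
    while size >= dest.free_in_buffer:
        chunk = dest.free_in_buffer
        dest.data += src[offset:offset + chunk]
        offset += chunk
        size -= chunk
        dest.free_in_buffer = 0
        if not dest.empty_output_buffer():
            raise RuntimeError(
                "I/O suspension not supported in scan optimization")
    # Assumed tail of the function: whatever is left fits in the buffer.
    dest.data += src[offset:offset + size]
    dest.free_in_buffer -= size


scan = bytes(100)  # pretend this is one optimized scan

roomy = SuspendingDestination(capacity=65536)
copy_to_destination(roomy, scan)  # fits, so no error is raised

tight = SuspendingDestination(capacity=64)
try:
    copy_to_destination(tight, scan)
    outcome = "saved"
except RuntimeError as exc:
    outcome = str(exc)
```

<p>In this model the error only goes away when the buffer is larger than the data being written, which is why the buffer size is the next thing to look at.</p>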
<p>The file <code>PIL/JpegImagePlugin.py</code> is used for all functions related to a <span class="caps">JPEG</span> image, and this includes saving.</p>
<p>The whole save method is a bit large to post here, but the part below determines the buffer size used to save the image. The buffer is sized to hold the entire image file.</p>
<div class="highlight"><pre><span></span><span class="n">bufsize</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">if</span> <span class="n">optimize</span> <span class="ow">or</span> <span class="n">progressive</span><span class="p">:</span>
    <span class="c1"># CMYK can be bigger</span>
    <span class="k">if</span> <span class="n">im</span><span class="o">.</span><span class="n">mode</span> <span class="o">==</span> <span class="s1">'CMYK'</span><span class="p">:</span>
        <span class="n">bufsize</span> <span class="o">=</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">im</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">im</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="c1"># keep sets quality to 0, but the actual value may be high.</span>
    <span class="k">elif</span> <span class="n">quality</span> <span class="o">>=</span> <span class="mi">95</span> <span class="ow">or</span> <span class="n">quality</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">bufsize</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">im</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">im</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">bufsize</span> <span class="o">=</span> <span class="n">im</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">im</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="c1"># The exif info needs to be written as one block, + APP1, + one spare byte.</span>
<span class="c1"># Ensure that our buffer is big enough. Same with the icc_profile block.</span>
<span class="n">bufsize</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">ImageFile</span><span class="o">.</span><span class="n">MAXBLOCK</span><span class="p">,</span> <span class="n">bufsize</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"exif"</span><span class="p">,</span> <span class="sa">b</span><span class="s2">""</span><span class="p">))</span> <span class="o">+</span> <span class="mi">5</span><span class="p">,</span>
<span class="nb">len</span><span class="p">(</span><span class="n">extra</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">ImageFile</span><span class="o">.</span><span class="n">_save</span><span class="p">(</span><span class="n">im</span><span class="p">,</span> <span class="n">fp</span><span class="p">,</span> <span class="p">[(</span><span class="s2">"jpeg"</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span><span class="o">+</span><span class="n">im</span><span class="o">.</span><span class="n">size</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">rawmode</span><span class="p">)],</span> <span class="n">bufsize</span><span class="p">)</span>
</pre></div>
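<p>To get a feel for the numbers, the rules in that excerpt can be restated as a small standalone function. This is a sketch, not Pillow code: <code>jpeg_bufsize</code> is a hypothetical helper, the EXIF and <code>extra</code> terms are left out, and the value 65536 for <code>MAXBLOCK</code> is my assumption about Pillow's default.</p>

```python
MAXBLOCK = 65536  # assumed default of PIL.ImageFile.MAXBLOCK


def jpeg_bufsize(width, height, quality, mode="RGB",
                 optimize=False, progressive=False, maxblock=MAXBLOCK):
    """Restate the buffer-size rules from the excerpt (EXIF/extra omitted)."""
    bufsize = 0
    if optimize or progressive:
        if mode == "CMYK":
            bufsize = 4 * width * height
        elif quality >= 95 or quality == 0:
            bufsize = 2 * width * height
        else:
            bufsize = width * height
    # The buffer is never smaller than MAXBLOCK.
    return max(maxblock, bufsize)


baseline = jpeg_bufsize(800, 600, quality=85)
progressive = jpeg_bufsize(800, 600, quality=85, progressive=True)
high_quality = jpeg_bufsize(800, 600, quality=95, progressive=True)
small_thumb = jpeg_bufsize(200, 150, quality=85, progressive=True)
```

<p>For a small thumbnail the pixel-based estimate stays below <code>MAXBLOCK</code>, so <code>MAXBLOCK</code> is the value that ends up deciding the buffer size.</p>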
<p>I don’t want to change the Pillow source itself because of potential issues whenever we upgrade Pillow in the future. So the best thing I can do is modify <code>ImageFile.MAXBLOCK</code>, which is not that big of a deal, I think.</p>
<p>I came up with the following solution:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">PIL</span> <span class="kn">import</span> <span class="n">Image</span><span class="p">,</span> <span class="n">ImageFile</span>

<span class="c1"># Reserve 3 bytes for every pixel in the image.</span>
<span class="n">new_maxblock</span> <span class="o">=</span> <span class="mi">3</span> <span class="o">*</span> <span class="n">image</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">image</span><span class="o">.</span><span class="n">size</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">old_maxblock</span> <span class="o">=</span> <span class="n">ImageFile</span><span class="o">.</span><span class="n">MAXBLOCK</span>
<span class="k">if</span> <span class="n">new_maxblock</span> <span class="o">></span> <span class="n">ImageFile</span><span class="o">.</span><span class="n">MAXBLOCK</span><span class="p">:</span>
    <span class="n">ImageFile</span><span class="o">.</span><span class="n">MAXBLOCK</span> <span class="o">=</span> <span class="n">new_maxblock</span>
<span class="n">requested_size</span> <span class="o">=</span> <span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">width</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">height</span><span class="p">))</span>
<span class="n">image</span><span class="o">.</span><span class="n">thumbnail</span><span class="p">(</span><span class="n">requested_size</span><span class="p">,</span> <span class="n">Image</span><span class="o">.</span><span class="n">ANTIALIAS</span><span class="p">)</span>
<span class="n">image</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">thumb_file</span><span class="p">,</span> <span class="s2">"JPEG"</span><span class="p">,</span> <span class="n">progressive</span><span class="o">=</span><span class="bp">True</span><span class="p">,)</span>
<span class="n">ImageFile</span><span class="o">.</span><span class="n">MAXBLOCK</span> <span class="o">=</span> <span class="n">old_maxblock</span>
</pre></div>
<p>We determine a new max block size: as most <span class="caps">JPEG</span> files use 24-bit color (<span class="caps">RGB</span>), we need 3 bytes per pixel. This might be overkill in certain situations, but at times I prefer overkill over not having the thumbnail.</p>
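<p>One possible refinement, sketched below, is to wrap the save-and-restore dance in a context manager so the old value comes back even when saving raises an exception. The name <code>raised_maxblock</code> is hypothetical, and a <code>SimpleNamespace</code> stands in for <code>PIL.ImageFile</code> so the sketch runs without Pillow installed; in real code you would pass the <code>ImageFile</code> module instead.</p>

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def raised_maxblock(image_file, width, height, bytes_per_pixel=3):
    """Temporarily raise image_file.MAXBLOCK to fit width * height pixels."""
    old_maxblock = image_file.MAXBLOCK
    # Only ever grow the buffer; keep the old value if it is already larger.
    image_file.MAXBLOCK = max(old_maxblock, bytes_per_pixel * width * height)
    try:
        yield image_file.MAXBLOCK
    finally:
        # Restore the original value, even if the body raised.
        image_file.MAXBLOCK = old_maxblock


# Stand-in for PIL.ImageFile (assumption: only MAXBLOCK matters here).
fake_image_file = SimpleNamespace(MAXBLOCK=65536)

with raised_maxblock(fake_image_file, 1920, 1080) as size:
    inside = size  # image.thumbnail(...) and image.save(...) would go here

restored = fake_image_file.MAXBLOCK
```

<p>Because <code>MAXBLOCK</code> is a module-level setting, this is still not safe if several threads save images at the same time; it only guarantees the value is put back.</p>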
<p>Since implementing the above solution, the <code>I/O suspension not supported in scan optimization</code> error message has not shown up in the logs.</p>

Replace JPEG libraries with MozJPEG
Peter van der Does, 2018-03-28T09:03:00-04:00
<p>For a project in Python we had to squeeze more bytes out of <span class="caps">JPG</span> files using Pillow. Currently MozJPEG fits that bill, but there isn’t a repository available to install it from on Ubuntu.</p>
<p>For an image-heavy site we were building, we needed to squeeze more bytes out of the <span class="caps">JPEG</span> files. We use <a href="http://pillow.readthedocs.io/en/latest/">Pillow</a> within our Python project to create thumbnails, and Pillow in turn uses the <span class="caps">JPEG</span> libraries installed on the system, so we had to look for a drop-in replacement for the system <span class="caps">JPEG</span> libraries.</p>
<p>For Ubuntu you can use <a href="https://libjpeg-turbo.org/">libjpeg-turbo</a>, but using <a href="https://github.com/mozilla/mozjpeg">MozJPEG</a> by Mozilla makes the thumbnails even smaller. The only problem we ran into was that there is no repository you can add in Ubuntu, so we had to compile MozJPEG manually.</p>
<p>If you just want to skip the steps, go to <a href="#tldr">tl;dr</a>. All the steps need to be run as root.</p>
<h4 id="install-requirements">Install requirements</h4>
<p>To compile MozJPEG you need to install some requirements.</p>
<div class="highlight"><pre><span></span>apt -y install build-essential cmake libtool autoconf automake m4 nasm pkg-config
</pre></div>
<p>Then configure the dynamic linker run-time bindings:
<div class="highlight"><pre><span></span>ldconfig /usr/lib
</pre></div></p>
<h4 id="get-mozjpeg-source">Get MozJPEG source</h4>
<p>We’ll be working with version 3.2 of the MozJPEG library.
<div class="highlight"><pre><span></span>wget https://github.com/mozilla/mozjpeg/archive/v3.2.tar.gz
tar xf v3.2.tar.gz
</pre></div></p>
<h4 id="configure-and-install">Configure and Install</h4>
<p>Before we can configure and install, we have to generate the configure script. Go to the directory you extracted the archive into.</p>
<div class="highlight"><pre><span></span><span class="nb">cd</span> mozjpeg-3.2
autoreconf -fiv
</pre></div>
<p>To keep source and build separate, we’ll do the build in its own directory.</p>
<div class="highlight"><pre><span></span>mkdir build
<span class="nb">cd</span> build
sh ../configure --with-jpeg8
make install <span class="nv">libdir</span><span class="o">=</span>/usr/lib/x86_64-linux-gnu <span class="nv">prefix</span><span class="o">=</span>/usr
</pre></div>
<p>We have to copy one header file over, as it’s not installed by the build.
<div class="highlight"><pre><span></span>cp ../jpegint.h /usr/include/jpegint.h
</pre></div></p>
<p>That’s it. Now almost any program on your server that uses the <span class="caps">JPEG</span> libraries to create images will be using MozJPEG, making the files much smaller than with the standard or even the libjpeg-turbo libraries.</p>
<h4 id="tldr"><span class="caps">TL</span>;<span class="caps">DR</span></h4>
<p>The script below will do all the above steps. Remember to run this as root.</p>
<div class="highlight"><pre><span></span><span class="c1">#!/bin/sh</span>
apt -y install build-essential cmake libtool autoconf automake m4 nasm pkg-config
ldconfig /usr/lib
rm -rf mozjpeg-3.2
wget https://github.com/mozilla/mozjpeg/archive/v3.2.tar.gz
tar xf v3.2.tar.gz
<span class="nb">cd</span> mozjpeg-3.2
autoreconf -fiv
mkdir build
<span class="nb">cd</span> build
sh ../configure --with-jpeg8
make install <span class="nv">libdir</span><span class="o">=</span>/usr/lib/x86_64-linux-gnu <span class="nv">prefix</span><span class="o">=</span>/usr
cp ../jpegint.h /usr/include/jpegint.h
</pre></div>
<h4 id="bonus">Bonus</h4>
<p>If you use Pillow in your Python project and it was already installed, you need to reinstall it. We ran into issues where, even after reinstalling, it still would not use the MozJPEG libraries. To make that work, we had to recompile Pillow. The command below recompiles Pillow:</p>
<div class="highlight"><pre><span></span>pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile -v Pillow
</pre></div>

Create a custom 410 error page in NGINX
Peter van der Does, 2018-03-23T08:03:00-04:00
<p>I needed to create a 410 page for a whole bunch of pages. As it turns out, it was not as easy as it sounds.</p>
<p>When converting this blog from WordPress to Pelican, I decided to just ditch a whole bunch of articles I had written in WordPress. According to several articles on the net, it is best practice to have deleted pages return a 410 status. For a better user experience, I wanted users to land on a page that looks like part of the blog.</p>
<p>As I’m using <span class="caps">NGINX</span>, I can use the <code>map</code> directive, which creates a new variable whose value depends on the values of one or more source variables specified in the first parameter. I created a file called <code>old_request.nginx</code> in the directory <code>/etc/nginx/snippets/</code>:</p>
<div class="highlight"><pre><span></span>map $request_uri $gone_var {
    /the-avh-amazon-plugin-has-reached-its-end-of-life/ 1;
    /wordpress-plugin-update-avh-first-defense-against-spam-v3-0/ 1;
    /end-of-the-avh-amazon-plugin/ 1;
    /wordpress-plugin-update-avh-extended-categories/ 1;
}
</pre></div>
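<p>With more than a handful of removed articles, typing the map by hand gets tedious. The sketch below generates the same kind of snippet from a list of paths; <code>build_gone_map</code> is a hypothetical helper of my own, not part of <span class="caps">NGINX</span> or Pelican, and it assumes every path should simply map to <code>1</code>.</p>

```python
def build_gone_map(paths, variable="$gone_var"):
    """Return an NGINX map block that sets `variable` to 1 for each path."""
    lines = [f"map $request_uri {variable} {{"]
    for path in paths:
        lines.append(f"    {path} 1;")
    lines.append("}")
    return "\n".join(lines) + "\n"


removed_paths = [
    "/end-of-the-avh-amazon-plugin/",
    "/wordpress-plugin-update-avh-extended-categories/",
]
snippet = build_gone_map(removed_paths)
print(snippet)
```

<p>The printed text can be saved as <code>/etc/nginx/snippets/old_request.nginx</code>. Paths that are not listed leave the variable empty, which <span class="caps">NGINX</span> treats as false in an <code>if</code> condition.</p>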
<p>This file just needs to be included in your configuration file, and if you want to have a custom 410 error page you just have to tell <span class="caps">NGINX</span> which file to use when it encounters a 410 error.</p>
<div class="highlight"><pre><span></span>include snippets/old_request.nginx;
server {
    ....
    error_page 404 /404.html;
    error_page 410 /410.html;
    if ($gone_var) {
        return 410;
    }
    location / {
        ....
    }
}
</pre></div>
<p>Easy enough, <strong><span class="caps">NOT</span></strong>. The above configuration does not work. If you try to browse one of the URLs listed in the <code>old_request.nginx</code> file, you get the default <span class="caps">NGINX</span> 410 error page and not the file you told it to show.</p>
<p>To fix this we have to use a named location. A named location has the <code>@</code> prefix. Such a location is not used for regular request processing; it is only used for request redirection.</p>
<div class="highlight"><pre><span></span>include snippets/old_request.nginx;
server {
    ....
    error_page 404 /404.html;
    error_page 410 @gone;
    if ($gone_var) {
        return 410;
    }
    location @gone {
        rewrite ^(.*)$ /410.html break;
    }
    location / {
        ....
    }
}
</pre></div>
<p>And now your custom 410 error page works.</p>

Move from WordPress to Pelican
Peter van der Does, published 2018-03-11T22:15:24-04:00, updated 2018-03-17T23:03:00-04:00
<p>I have been playing with the thought of moving my blog from WordPress to a static website generator for years and I finally pulled the trigger.</p>
<p>I have been playing with the thought of moving my blog from WordPress to a static website generator for years.</p>
<p>I never really had the energy to start this project, for several reasons. I knew I would never find a theme that was completely to my liking, so I would have to change it, possibly altering some code. There were some generators written in <span class="caps">PHP</span>, my language of choice for a long time, but they did not seem very mature. The best-known static website generator was Jekyll, written in Ruby, and I really couldn’t find the energy to start learning Ruby on the side. The other thing keeping me from moving was the question of how to move all my articles from WordPress to the static generator.</p>
<p>I started working for <a href="https://oneilinteractive.com/">ONeil Interactive</a> in 2017. We build websites for the home building industry, and the language of choice is Python. When we started looking into building an internal site for documenting our projects, it was quickly decided that the static generator should be written in Python as well, as that would make it easier to extend the generator if needed.</p>
<p>As a result of this project I decided to bite the bullet for my personal site as well. After a quick look at the site <a href="https://www.staticgen.com/">StaticGen</a> and some quick research I decided to go with <a href="https://github.com/getpelican/pelican">Pelican</a>. Of course I needed a theme, and there is never a theme that completely satisfies my needs but my starter theme is <a href="https://github.com/kdeldycke/plumage">Plumage by Kevin Deldycke</a>. I’ve modified it to work with Bootstrap 4, and used the <a href="https://bootswatch.com/slate/">Slate theme by Bootswatch</a>.</p>
<p>I only moved two articles from my old blog as the rest of the articles were not visited that often. I still have them in the database so if it’s ever needed I can pull them up.</p>
<p>Oh, in case you were wondering: for the project at work we decided to go with <a href="https://www.mkdocs.org/">MkDocs</a>.</p>