# Evolutionary fine tuning of quantized convolution-based deep learning models
Source: [https://arxiv.org/html/2605.05228](https://arxiv.org/html/2605.05228)
###### Abstract
Deep learning models are the most efficient models in many machine learning tasks. The main disadvantage when using them in IoT, mobile devices, independent autonomous or real-time systems is their complexity and memory size. Therefore, much research has concentrated on compression techniques for deep learning architectures. One of the most popular techniques is quantisation. In most works, quantisation is performed by rounding to the nearest neighbour. This work focuses on improving quantisation efficiency in pretrained, quantised models, an approach that has the potential to improve the final accuracy of quantised models. The main postulate of the work is that the final quantisation states of a network obtained by nearest-neighbour rounding do not guarantee optimal accuracy. In the presented work, a neuroevolution strategy is used as the optimisation approach. In each iteration, the neuroevolution changes the values of a small percentage of weights, shifting them to different quantisation states. The work shows that the proposed neuroevolution, with an appropriate set of operators and parameters, can quickly improve the accuracy of quantised models. Results are presented for popular architectures such as VGG and ResNet for image classification and detection. Additionally, simulations were carried out for an autoencoder architecture.
*Keywords:* deep learning · quantization · neuroevolution · fine tuning
## 1 Introduction
Low-bit quantisation is a process which enables the adoption of deep neural architectures in embedded systems and helps to decrease their computational complexity. There are many approaches to decreasing bit precision while minimising the drop in accuracy \[[1](https://arxiv.org/html/2605.05228#bib.bib1), [2](https://arxiv.org/html/2605.05228#bib.bib2), [3](https://arxiv.org/html/2605.05228#bib.bib3), [4](https://arxiv.org/html/2605.05228#bib.bib4), [5](https://arxiv.org/html/2605.05228#bib.bib5), [6](https://arxiv.org/html/2605.05228#bib.bib6), [7](https://arxiv.org/html/2605.05228#bib.bib7), [8](https://arxiv.org/html/2605.05228#bib.bib8), [9](https://arxiv.org/html/2605.05228#bib.bib9)\]. In \[[10](https://arxiv.org/html/2605.05228#bib.bib10), [1](https://arxiv.org/html/2605.05228#bib.bib1)\] the authors show how quantisation can decrease execution time by reducing the bit-width format of deep learning models. Many of these techniques are based on complex optimisation algorithms which are time consuming and often difficult to parallelise. Creating a fast and efficient quantisation method which can improve model accuracy is therefore a significant challenge. Quantisation methods can be divided according to several factors. The first is the stage at which quantisation is performed: during the training process or just after it. The second is the type of quantisation: linear or nonlinear. In this work, the presented method is run on a linearly quantised model. The input of the presented algorithm is a pretrained and quantised model. The main assumption of the solution is that the quantisation of the input model is done by rounding to the nearest neighbour, which is the approach used in most quantisation algorithms \[[1](https://arxiv.org/html/2605.05228#bib.bib1), [2](https://arxiv.org/html/2605.05228#bib.bib2), [11](https://arxiv.org/html/2605.05228#bib.bib11), [3](https://arxiv.org/html/2605.05228#bib.bib3), [9](https://arxiv.org/html/2605.05228#bib.bib9)\].
In the presented approach, a novel fine-tuning method is applied. It mutates weights by a value which is a multiple of the least significant bit. A small percentage of weights is mutated in each iteration. The neuroevolution process is run on each layer separately. In addition, the algorithm is equipped with a sensitivity analysis of the tested model's layers: it iterates through the layers from the least to the most sensitive. This feature allows the process to be stopped once a user-defined drop in accuracy is reached. Another approach with a similar input assumption is the algorithm presented in \[[12](https://arxiv.org/html/2605.05228#bib.bib12)\]. It also argues that approximation to the nearest neighbour does not guarantee optimal performance of the model. The main goal of this work was to create a methodology that is scalable and able to quickly improve the accuracy of quantised models.
## 2 Related work
Quantisation is one of the most efficient techniques for the compression of deep learning models \[[1](https://arxiv.org/html/2605.05228#bib.bib1), [2](https://arxiv.org/html/2605.05228#bib.bib2)\]. Common strategies include the quantisation of all coefficients in a single layer with a specified number of bits to represent the integer and fractional components \[[4](https://arxiv.org/html/2605.05228#bib.bib4), [3](https://arxiv.org/html/2605.05228#bib.bib3)\], based on the range of values in the coefficient set. Another strategy is to represent coefficients and data by integer numbers with an appropriate scaling factor. Many quantisation approaches in the literature adopt linear \[[1](https://arxiv.org/html/2605.05228#bib.bib1), [2](https://arxiv.org/html/2605.05228#bib.bib2), [3](https://arxiv.org/html/2605.05228#bib.bib3)\] or non-linear schemes, including clustering \[[5](https://arxiv.org/html/2605.05228#bib.bib5)\]. Quantisation can be performed during model training \[[2](https://arxiv.org/html/2605.05228#bib.bib2)\] or run on a pretrained model \[[1](https://arxiv.org/html/2605.05228#bib.bib1), [5](https://arxiv.org/html/2605.05228#bib.bib5)\]. Recently, several methods for low-bit representation have been designed \[[9](https://arxiv.org/html/2605.05228#bib.bib9), [13](https://arxiv.org/html/2605.05228#bib.bib13), [14](https://arxiv.org/html/2605.05228#bib.bib14), [15](https://arxiv.org/html/2605.05228#bib.bib15), [6](https://arxiv.org/html/2605.05228#bib.bib6)\]. Many of these cannot be applied without significant degradation in accuracy. One known advantage of quantisation is that it facilitates the adoption of deep neural networks in specialised hardware accelerators with limited arithmetic bit-width and memory space \[[16](https://arxiv.org/html/2605.05228#bib.bib16), [10](https://arxiv.org/html/2605.05228#bib.bib10)\].
The authors in \[[17](https://arxiv.org/html/2605.05228#bib.bib17)\] show that a two-level neuroevolution strategy can outperform human-designed models in some specific tasks, for example language modelling and image classification. In \[[18](https://arxiv.org/html/2605.05228#bib.bib18)\], a novel neuroevolutionary method for the optimisation of the architecture and hyperparameters of convolutional autoencoders is presented. In \[[19](https://arxiv.org/html/2605.05228#bib.bib19)\], it is shown that a genetic algorithm can evolve autoencoders that reproduce the data better than manually created autoencoders with more hidden units. The first co-evolutionary, neuroevolution-based multivariate anomaly detection system is presented in \[[20](https://arxiv.org/html/2605.05228#bib.bib20)\]. The authors show that the proposed neuroevolutionary solution can outperform other human-designed models on well-known benchmarks. These works show that neuroevolution can be a very efficient method for deep learning architecture exploration.
Fine-tuning is a popular technique for improving the accuracy of pretrained models. The most common approach is gradient-based fine-tuning \[[21](https://arxiv.org/html/2605.05228#bib.bib21)\], \[[22](https://arxiv.org/html/2605.05228#bib.bib22)\]. Non-gradient approaches are quite rare but can yield significant improvements, as shown in \[[12](https://arxiv.org/html/2605.05228#bib.bib12)\]. In \[[12](https://arxiv.org/html/2605.05228#bib.bib12)\] the authors adopt the same input assumption as in this work: they try to find a near-optimal state of the model, allowing quantised weight values that are not rounded to the nearest neighbour.
In the presented work, a novel approach is proposed: fine tuning based on neuroevolution is used to improve the accuracy of pretrained, quantised models.
## 3 Quantisation
After network distillation by the pruning process, quantisation can be performed as the next step in reducing the complexity of the model. Quantisation is the procedure of constraining values from a continuous set or a denser domain to a relatively discrete set. It is possible to define a general mapping from floating-point data $x \in \mathcal{S}$ to a fixed-point value $q \in \mathcal{Q}$ using a function $f_{\mathcal{Q}}: \mathcal{S} \rightarrow \mathcal{Q}$ as follows (assuming signed representation):

$$q = f_{\mathcal{Q}}(x) = \mu + \sigma \cdot \mathrm{round}\left(\sigma^{-1} \cdot (x - \mu)\right). \quad (1)$$

In our case, $\mu = 0$ and $\sigma = 2^{-\mathbf{frac\_bits}}$, where:

$$\mathbf{int\_bits} = \mathrm{ceil}\left(\log_{2}\left(\max_{x \in \mathcal{S}} |x|\right)\right) \quad (2)$$

and

$$\mathbf{frac\_bits} = \mathbf{total\_bits} - \mathbf{int\_bits} - 1. \quad (3)$$
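As a concrete illustration of Eqs. (1)-(3), the following minimal NumPy sketch quantises a weight tensor by rounding to the nearest representable level; the function name `quantize_nearest` and its arguments are illustrative and not taken from the paper.

```python
import numpy as np

def quantize_nearest(weights: np.ndarray, total_bits: int = 8):
    """Linear nearest-neighbour quantisation following Eqs. (1)-(3), with mu = 0."""
    # Integer bits needed to cover the dynamic range of the coefficient set (Eq. 2).
    int_bits = int(np.ceil(np.log2(np.max(np.abs(weights)))))
    # The remaining bits (minus the sign bit) represent the fractional part (Eq. 3).
    frac_bits = total_bits - int_bits - 1
    sigma = 2.0 ** (-frac_bits)                 # least significant bit value
    # Eq. (1): snap every weight to its nearest quantisation state.
    return sigma * np.round(weights / sigma), sigma

# Example: quantise a random convolution filter bank to 8 bits.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q, lsb = quantize_nearest(w, total_bits=8)
```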
The number of MAC operations in a layer is equal to $P_{i} \cdot C_{i} \cdot D_{i} \cdot H_{i} \cdot W_{i}$, where $P_{i}$ is the number of neurons in the output feature map, $C_{i}$ and $D_{i}$ are the numbers of input and output channels, and $W_{i}$ and $H_{i}$ are the width and height of the filter. The memory footprint of a single layer is $C_{i} \cdot D_{i} \cdot H_{i} \cdot W_{i}$. Quantisation reduces the cost of the MAC operations, and the same applies to the memory footprint. The quantisation reduces the complexity further: the bit width of the data decreases the number of cycles needed to run multiplication operations. In the case of 8-bit quantisation (both weights and activations in 8-bit format), the number of MAC cycles of the baseline floating-point configuration is reduced to 1/9. If weights are further reduced to 4 bits, the number of cycles of the full 8-bit configuration is reduced by more than 1/2. For 8-bit weights and 16-bit half-precision activations, the reduction ratio is 2/9.
## 4 Neuroevolution fine tuning
Neuroevolution is an optimisation algorithm based on a genetic approach which helps to find more optimal neural architectures. In this work, the neuroevolution strategy is adapted for fine tuning the quantised weights in order to improve the accuracy of the model. The optimisation process is run on a pretrained quantised model; the pretrained model is quantised using linear quantisation with nearest-neighbour rounding. The whole process is divided into two phases: the first phase is sensitivity analysis, and the second is fine tuning of the quantised layers through the neuroevolution approach.
### 4.1 Sensitivity analysis
The goal of the sensitivity analysis is to set up a ranking list of the most sensitive layers. The whole process is shown in Alg. [1](https://arxiv.org/html/2605.05228#alg1). At the beginning of the analysis, an empty list is created (line 1). The list is used for storing the accuracy measured after quantising a specific layer. A single layer is then quantised into a low-bit format in each iteration (loop in lines 2-8). First, a copy of the original model is created (line 3). Then, layer $l$ is quantised (line 4): the $\theta_{l}$ weight tensor is transformed into a low-bit format $\theta_{l}^{q}$. The quantised layer is inserted into the whole model weight tensor (line 5). Finally, inference is run in order to check the accuracy, and the accuracy is added to the list (line 7). When all layers have been processed in the loop, the ranking is returned (line 9).
**Algorithm 1** Sensitivity analysis

    Require: ψ – desired bit-width
    Require: Θ – weights of a model
    Require: F_Θ – model
    1:  Λ ← ∅                                  {list for the top-1 metric}
    2:  for θ_i in Θ do
    3:      F′_Θ ← copy(F_Θ)
    4:      θ_i^q ← q(θ_i, ψ = 4)              {quantise layer to four bits}
    5:      Θ′ ← {θ_0, θ_1, ..., θ_i^q, ..., θ_N}
    6:      a_i ← eval(F′_Θ′)
    7:      Λ ← Λ ∪ a_i
    8:  end for
    9:  return argsort(Λ)

The algorithm goes through all the layers and quantises each of them to the specified bit width $\psi$; in the presented approach, the bit width is set to 4. In line 1, the algorithm initialises the list in which it stores the accuracy values.
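A compact sketch of Algorithm 1 is given below, assuming the model weights are held as a list of NumPy arrays and that `evaluate` returns the model accuracy for a given weight list; `quantize_nearest` is the hypothetical helper from the sketch in Section 3, and `sensitivity_ranking` is an illustrative name.

```python
import copy
import numpy as np

def sensitivity_ranking(model_weights, evaluate, bits=4):
    """Rank layers by their sensitivity to low-bit quantisation (sketch of Alg. 1)."""
    accuracies = []
    for i, theta in enumerate(model_weights):
        trial = copy.deepcopy(model_weights)                     # line 3: copy the model
        trial[i], _ = quantize_nearest(theta, total_bits=bits)   # line 4: quantise layer i only
        accuracies.append(evaluate(trial))                       # lines 6-7: store accuracy
    # Least sensitive layers (highest accuracy after quantisation) come first,
    # matching the iteration order used by the fine-tuning phase.
    return list(np.argsort(accuracies)[::-1])
```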
### 4.2 Fine tuning the pretrained model
The deep learning network is defined as a sequence of layers:

$$F_{\Theta}(X) = f_{\theta_{L}}(f_{\theta_{L-1}} \ldots (f_{\theta_{0}}(X))) \quad (4)$$

The trainable parameters (weights) are defined as the following list:

$$\Theta = \{\theta_{0}, \theta_{1}, \ldots, \theta_{L}\} \quad (5)$$

The quantised version of the weight tensor is as follows:

$$\Theta^{q} = \{\theta_{0}^{q}, \theta_{1}^{q}, \ldots, \theta_{L}^{q}\} \quad (6)$$

Each layer weight tensor can be quantised (using the $q$ function) with a specified bit width $\phi$:

$$\theta_{i}^{q} = q(\theta_{i}, \phi) \quad (7)$$
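As a small illustration of Eqs. (5)-(7), whole-model quantisation can be sketched as applying the per-layer function from Section 3 to every weight tensor; `quantize_model` is an illustrative name, and `quantize_nearest` is the hypothetical helper defined earlier.

```python
def quantize_model(model_weights, bits=8):
    """Produce Theta^q by quantising every layer's weight tensor (Eq. 7)."""
    quantized, lsb_per_layer = [], []
    for theta in model_weights:                          # Theta = {theta_0, ..., theta_L}
        theta_q, sigma = quantize_nearest(theta, total_bits=bits)
        quantized.append(theta_q)                        # theta_l^q = q(theta_l, phi)
        lsb_per_layer.append(sigma)                      # sigma_l, reused by the mutation step
    return quantized, lsb_per_layer
```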
The pretrained quantised model is represented as $F_{\Theta^{q}}$. It is the input to the neuroevolution strategy described in Alg. [2](https://arxiv.org/html/2605.05228#alg2). The algorithm iterates through all the layers in the model (lines 1-20), beginning with the least sensitive layers. It takes the weight tensor of a layer (line 2) and adds copies of it to the initial population $P$ (line 4). The initial population is simply a list of copies of the quantised layer $l$:
$$P = \{\theta^{q}_{l}, \theta^{q}_{l}, \ldots, \theta^{q}_{l}\} \quad (8)$$
Then, a mask is generated by sampling from a Bernoulli distribution (line 9). The mask is a binary tensor:

$$M_{i} \in \{0,1\}^{SH_{\theta_{i}}} \quad (9)$$

The shape $SH_{\theta_{i}}$ of the mask is the same as the shape of the weight tensor $\theta_{i}$. The mask has a percentage $p$ of values equal to $1$. The weight positions for which the mask is set to $1$ are changed during the mutation step (line 12):

$$\theta_{i} = \theta_{i} + M_{i} \odot \vec{q} \quad (10)$$

The vector $\vec{q}$ is generated by the following equation (line 11):

$$\vec{q} = \vec{r} \cdot \sigma_{l} \quad (11)$$

where $\sigma_{l}$ is the least significant bit value for the specific layer $l$ and $\vec{r}$ is defined as:

$$\vec{r} = \{r_{0}, r_{1}, \ldots, r_{N}\} \quad (12)$$

Each $r_{i}$ value is a number generated randomly from a predefined set (line 10):

$$r_{i} \in \{-2, -1, 1, 2\} \quad (13)$$

The $-1$ and $1$ values are generated with a probability of 40% each, and the $-2$ and $2$ values with 10% each. Finally, the mutated layer is inserted into the model weight tensor (line 15) and evaluated. In line 18, the accuracy values of the population are sorted, and the best candidates are carried over to the next iteration (line 19).
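The mutation operator of Eqs. (9)-(13) can be written compactly as shown below, before the full algorithm listing; this is a sketch assuming NumPy, with `mutate_layer` as an illustrative name and the probabilities taken from the description above.

```python
import numpy as np

def mutate_layer(theta_q: np.ndarray, sigma_l: float, p: float = 0.02) -> np.ndarray:
    """Shift roughly a fraction p of the quantised weights by a few LSB steps."""
    # Bernoulli mask selecting which positions are mutated (Eq. 9).
    mask = np.random.binomial(1, p, size=theta_q.shape)
    # Step sizes in LSB units: -1/+1 with 40% probability each, -2/+2 with 10% each (Eq. 13).
    r = np.random.choice([-2, -1, 1, 2], size=theta_q.shape, p=[0.1, 0.4, 0.4, 0.1])
    # Move the selected weights to neighbouring quantisation states (Eqs. 10-11).
    return theta_q + mask * r * sigma_l
```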
**Algorithm 2** Fine tuning

    Require: r – ranking list
    Require: P_S – population size
    Require: Θ – weights of a model
    1:  for l in r do
    2:      P ← ∅
    3:      for i in P_S do
    4:          P ← P ∪ copy(θ_l^q)
    5:      end for
    6:      A ← ∅
    7:      for i in I do
    8:          for j in P_S do
    9:              M_l ← {B(n, p), B(n, p), ..., B(n, p)}    {Bernoulli distribution}
    10:             r ← sample vector from {-2, -1, 1, 2}
    11:             q = r · σ_l
    12:             P[j] = P[j] + M_l ⊙ q
    13:             P ← P ∪ P[j]                              {add mutated layer to the population}
    14:             θ_l = P[j]
    15:             Θ = {θ_0, ..., θ_l, ..., θ_N}
    16:             A ← A ∪ eval(F_Θ)
    17:         end for
    18:         sorted ← argsort(A)
    19:         P ← P[sorted[:P_S]]                           {take the P_S best candidates}
    20:     end for
    21: end for
    22: return P
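Putting the pieces together, the per-layer evolutionary loop of Algorithm 2 can be sketched as follows; it reuses the hypothetical `mutate_layer`, `quantize_model` and `sensitivity_ranking` helpers from the earlier sketches, keeps parents alongside offspring in the selection pool, and uses illustrative default hyperparameters rather than the authors' exact settings.

```python
import copy
import numpy as np

def fine_tune(model_weights, ranking, evaluate, lsb_per_layer,
              pop_size=16, iterations=16, p=0.02):
    """Neuroevolutionary fine tuning of a quantised model (sketch of Alg. 2)."""
    weights = copy.deepcopy(model_weights)
    for l in ranking:                                      # least sensitive layers first
        population = [weights[l].copy() for _ in range(pop_size)]
        for _ in range(iterations):
            # Mutate every candidate; keep parents with offspring (elitist selection).
            offspring = [mutate_layer(c, lsb_per_layer[l], p) for c in population]
            candidates = population + offspring
            scores = []
            for candidate in candidates:
                trial = copy.deepcopy(weights)
                trial[l] = candidate                       # insert candidate layer into the model
                scores.append(evaluate(trial))             # fitness = accuracy of the whole model
            # Keep the pop_size best candidates for the next iteration.
            best = np.argsort(scores)[::-1][:pop_size]
            population = [candidates[k] for k in best]
        weights[l] = population[0]                         # commit the best layer found
    return weights
```

A typical call would first quantise the model with `quantize_model`, rank the layers with `sensitivity_ranking`, and then pass the ranking and per-layer LSB values to `fine_tune`.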
## 5 Results
The presented results show that the proposed solution can significantly improve the results achieved by quantisation based on rounding to the nearest neighbour. The baseline floating-point results are shown in Table [1](https://arxiv.org/html/2605.05228#S5.T1). There are four models and four datasets. FasterRCNN is an object detection model; it is tested on the VOC Pascal dataset, and its baseline mAP (mean average precision) is 71.2%. The remaining three models are Resnet18 and VGG16, which are image classification models, and a CNN autoencoder, the 12-layer autoencoder from \[[20](https://arxiv.org/html/2605.05228#bib.bib20)\] for time series anomaly detection. Resnet18 and VGG16 are simulated on CIFAR100 and ImageNet, and the CNN autoencoder is tested on the SWAT benchmark. For CIFAR100 and ImageNet the metric is top-1 accuracy; the metric for SWAT is F1.
Table 1: Baseline results

| | CIFAR100 | VOC Pascal | ImageNet | SWAT |
|---|---|---|---|---|
| FasterRCNN | - | 71.18 | - | - |
| Resnet18 | 75.0 | - | 69.75 | - |
| VGG16 | 70.4 | - | 71.59 | - |
| CNN AE | - | - | - | 82.0 |

Table 2: 8-bit linear quantisation results

| | CIFAR100 | VOC Pascal | ImageNet | SWAT |
|---|---|---|---|---|
| FasterRCNN | - | 71.04 | - | - |
| Resnet18 | 74.9 | - | 69.5 | - |
| VGG16 | 70.4 | - | 71.25 | - |
| CNN AE | - | - | - | 81.98 |

Table 3: Linear quantisation results: 4-bit Resnet18 and VGG16 for CIFAR100, 6-bit Resnet18 and VGG16 for ImageNet, 5-bit CNN AE, 6-bit FasterRCNN

| | CIFAR100 | VOC Pascal | ImageNet | SWAT |
|---|---|---|---|---|
| FasterRCNN | - | 64.0 | - | - |
| Resnet18 | 73.5 | - | 62.04 | - |
| VGG16 | 68.9 | - | 54.73 | - |
| CNN AE | - | - | - | 80.70 |

In Table [2](https://arxiv.org/html/2605.05228#S5.T2), 8-bit quantisation results for all models are presented; in Table [3](https://arxiv.org/html/2605.05228#S5.T3), results for 4-bit Resnet18 and VGG16 on CIFAR100, 6-bit Resnet18 and 5-bit VGG16 on ImageNet, 6-bit FasterRCNN and 5-bit CNN AE are described. In Tables [2](https://arxiv.org/html/2605.05228#S5.T2) and [3](https://arxiv.org/html/2605.05228#S5.T3), linear quantisation with rounding to the nearest neighbour was used (Section 3). It can be observed that in the case of 8-bit quantisation, a drop in accuracy of less than 1% is achieved for all models and datasets. In the case of lower-bit quantisation, the drop is significantly higher for 6-bit Resnet18 (7.71%) and 5-bit VGG16 (16.86%) on ImageNet. The same can be observed for 6-bit FasterRCNN, where the drop is 7.18%. The drop is small for the 5-bit CNN AE (1.3%) and for Resnet18 and VGG16 on CIFAR100 (around 1.5%). In Tables [4](https://arxiv.org/html/2605.05228#S5.T4) and [5](https://arxiv.org/html/2605.05228#S5.T5), the results after applying the neuroevolutionary fine tuning are presented. In the simulations for Resnet18 and VGG16, the population size was set to 32, the number of iterations to 64, and 2% of weights were mutated in each candidate. For the CNN AE the population size was 24 and the number of iterations was set to 30. In the case of FasterRCNN, the simulations were run with 16 iterations and 16 solutions in the population. FasterRCNN achieves a drop in accuracy of less than 1% for both 6-bit and 8-bit quantisation. The fine-tuned 8-bit Resnet18 and VGG16 give better results on CIFAR100 than the floating-point versions (+0.2%), and the 4-bit fine-tuned versions are just 0.2% worse than their floating-point counterparts. For the CNN autoencoder, the accuracy is very close to the floating-point version (82.09% for 8 bits, which is better by 0.09%, and 81.93% for the 5-bit version, which is more than 1% better than the baseline 5-bit quantisation). The highest drops in accuracy are observed for 6-bit and 5-bit quantisation on ImageNet, but the proposed fine tuning significantly improves the baseline quantisation results (by more than 6% in the case of Resnet18 and more than 14.5% in the case of VGG16) and achieves accuracy about 1.5% and 2% worse than the floating-point baselines for Resnet18 and VGG16, respectively. For 8-bit quantisation, the results are also improved and very close to the floating-point results (within 0.1%).
The obtained results show that the proposed method achieves accuracy very close to the original non-quantised model. Only for the ImageNet dataset does it give worse results than the method presented in \[[12](https://arxiv.org/html/2605.05228#bib.bib12)\], which in some cases is able to regain the accuracy in 4-bit format. The main advantages of the proposed method are its better scalability and better time efficiency compared with \[[12](https://arxiv.org/html/2605.05228#bib.bib12)\].
Table 4: 8-bit fine-tuned results

| | CIFAR100 | VOC Pascal | ImageNet | SWAT |
|---|---|---|---|---|
| FasterRCNN | - | 71.34 | - | - |
| Resnet18 | 75.2 | - | 69.7 | - |
| VGG16 | 70.6 | - | 71.5 | - |
| CNN AE | - | - | - | 82.09 |

Table 5: Fine-tuned linear quantisation results: 4-bit Resnet18 and VGG16 for CIFAR100, 6-bit Resnet18 and VGG16 for ImageNet, 5-bit CNN AE, 6-bit FasterRCNN

| | CIFAR100 | VOC Pascal | ImageNet | SWAT |
|---|---|---|---|---|
| FasterRCNN | - | 70.5 | - | - |
| Resnet18 | 74.8 | - | 68.2 | - |
| VGG16 | 70.2 | - | 69.6 | - |
| CNN AE | - | - | - | 81.93 |
## 6 Conclusions and future work
The results described in this work show that the proposed solution is quite efficient. The simulations run on different datasets with various models show that evolutionary fine tuning can boost accuracy and minimise the drop caused by nearest-neighbour quantisation. It is therefore a good alternative to gradient-based fine tuning. Another advantage is its scalability: it can be fully parallelised. In the next step, the algorithm will be extended with new features such as finding different bit formats for each channel in a single layer. Future work will concentrate on improving the method by combining gradient with non-gradient fine tuning and adapting a similar methodology to nonlinear quantisation. More architectures, such as MobileNet and Transformer-based models, will be explored.
## References
- [1] M. Al-Hami, M. Pietron, R. Casas, and M. Wielgosz. Methodologies of Compressing a Stable Performance Convolutional Neural Networks in Image Classification. January 2020.
- [2] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. 2015.
- [3] P. Gysel, M. Motamedi, and S. Ghiasi. Hardware-oriented approximation of convolutional neural networks. arXiv:1604.03168, 2016.
- [4] S. Anwar, K. Hwang, and W. Sung. Fixed point optimization of deep convolutional neural networks for object recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1131–1135, 2015.
- [5] M. Pietron, M. Karwatowski, M. Wielgosz, and J. Duda. Fast Compression and Optimization of Deep Learning Models for Natural Language Processing. 2019.
- [6] A. Mishra and E. Nurvitadhi. WRPN: Wide reduced-precision networks. ICLR, 2018.
- [7] Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. Faster CNNs with Direct Sparse Convolutions and Guided Pruning. 2016.
- [8] Y. Zhang, Z. Y. Dong, W. Kong, and K. Meng. A composite anomaly detection system for data-driven power plant condition monitoring. IEEE Transactions on Industrial Informatics, 2019.
- [9] E. Park, J. Ahn, and S. Yoo. Weighted-entropy-based quantization for deep neural networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [10] Marcin Pietron, Dominik Zurek, and Bartlomiej Sniezynski. Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction, volume 67. 2023.
- [11] Philipp Gysel. Ristretto: Hardware-oriented approximation of convolutional neural networks. 2016.
- [12] Markus Nagel, Rana A. Amjad, Mart van Baalen, Christos Louizos, and Tijmen Blankevoort. Up or down? Adaptive rounding for post-training quantization. Proceedings of ICML, 2020.
- [13] D. Zhang, J. Yang, D. Ye, and G. Hua. Learned quantization for highly accurate and compact deep neural networks. arXiv:1807.10029, 2018.
- [14] S. Jung, C. Son, S. Lee, J. Son, Y. Kwak, J. J. Han, and C. Choi. Joint training of low-precision neural network with quantization interval parameters. arXiv:1808.05779, 2018.
- [15] M. D. McDonnell. Training wide residual networks for deployment using a single bit for each weight. ICLR, 2018.
- [16] K. Xu, J. An, D. Zhang, L. Liu, L. Liu, and D. Wang. GenExp: Multi-objective pruning for deep neural network based on genetic algorithm. 2021.
- [17] R. Miikkulainen, J. Liang, E. Meyerson, A. Rawal, D. Fink, O. Francon, B. Raju, H. Shahrzad, A. Navruzyan, N. Duffy, and B. Hodjat. Evolving deep neural networks. CoRR, abs/1703.00548, March 2017.
- [18] Daniel Dimanov, Emili Balaguer-Ballester, Colin Singleton, and Shahin Rostami. MONCAE: Multi-objective neuroevolution of convolutional autoencoders. ICLR, Neural Architecture Search Workshop, 2021.
- [19] Hidehiko Okada. Neuroevolution of autoencoders by genetic algorithm. International Journal of Science and Engineering Investigations, 6:127–131, 2017.
- [20] Kamil Faber, Marcin Pietron, and Dominik Zurek. Ensemble neuroevolution-based approach for multivariate time series anomaly detection. Entropy, 23(11), November 2021.
- [21] Yuxiang Zhou, Lejian Liao, Yang Gao, Rui Wang, and Heyan Huang. TopicBERT: A topic-enhanced neural language model fine-tuned for sentiment classification. IEEE Transactions on Neural Networks and Learning Systems (Early Access), 2021.
- [22] Youngmin Ro, Jongwon Choi, Byeongho Heo, and Jin Young Choi. Rollback ensemble with multiple local minima in fine-tuning deep learning networks. IEEE Transactions on Neural Networks and Learning Systems (Early Access), 2021.