Package home | Report new bug | New search | Development Roadmap Status: Open | Feedback | All | Closed Since Version 2.0.4

Bug #3498 XML container: several add() produce encoding junk
Submitted: 2005-02-16 20:01 UTC
From: ojai at nerim dot net Assigned: quipo
Status: Closed Package: Translation2
PHP Version: 4.3.10 OS: Linux Debian
Roadmaps: (Not assigned)    

 [2005-02-16 20:01 UTC] ojai at nerim dot net
Description: ------------ Hi, When calling several times add() the encoding conversion routine is called over and over, so that already utf8 converted data is converted again and again... It produces so horrible junk inside the XML file that I can't paste in here ;-) Actually there are two bugs : - when save_on_shutdown is set to false, the _data property is victim or redundant conversion performed by _convertLangEncodings() and _convertEncodings(). Indeed, these method are called by _saveData() which itself is called several times. The problem here comes from the fact that encoding conversion is performed "in-place", in the _data property - when save_on_shutdown is set to true, this problem should not happen, because _saveData() is supposed to be called only once. But the current implementation of _scheduledSaving() actually register it as a shutdown function several times... So that encoded data is reencoded as well. Here's a patch that fix these issues. I attempted to reduce memory usage by using a buffer (the _dataBuffer property) instead of returning converted data from convert[Lang]Encodings() : It applies fine on the current CVS. Cheers -- Olivier Guilyardi


 [2005-02-17 13:11 UTC] quipo
mmm... I don't really understand it, when save_on_shutdown is set to false the encoding is done once, because _scheduleSaving() calls _loadFile() again after the file has been updated. So data is encoded from memory to the file, but it is decoded again from the file to memory. At least my tests tell that there's no problem here. Instead, when save_on_shutdown is set to true, the register_shutdown_function is called many times indeed, and I'll fix that. Can you send me a reproducing script for the first case? The testsuite should cover it, but maybe I left sth out. TIA -- Lorenzo Alberton
 [2005-02-17 13:42 UTC] ojai at nerim dot net
You're right, I did observe the reencoding bug with save_on_shutdown set to true, but I assumed it for save_on_shutdown set to false by simply reading the code. I didn't see the _loadFile() call in _scheduleSaving()... But, my patch provide a performance enhancement : why loadFile()'ing everytime _saveData() is called ? That's very heavy : serializing + unserializing each time. With save_on_shutdown set to false, it already takes a pretty long time for my script to perform about 10 add()'s, and I'm using the bufferized version as my patch provide. However, having thought about it, I don't really like this _dataBuffer property I created. I think it would be better to optionally pass a buffer reference to the convert*Encodings() method : function _convertEncodings($direction, &$buffer=null) { <snip> if ($buffer) { $data =& $buffer; } else { $data =& $this->_data; } foreach ($data['pages'] as $page_id => $page_content) { <snip> }
 [2005-02-17 18:07 UTC] ojai at nerim dot net
Here's an updated patch to optimize saving when save_on_shutdown is set to false : - it removes the need to reload data after saving it - it does not use a _dataBuffer property anymore, but a buffer reference which is passed to the convert*Encodings() methods - it does not contain the "_isScheduledSaving" fix anymore, since I see that you implemented that in cvs
 [2005-02-21 16:50 UTC] quipo
This bug has been fixed in CVS. In case this was a documentation problem, the fix will show up at the end of next Sunday (CET) on In case this was a website problem, the change will show up on the website in short time. Thank you for the report, and for helping us make PEAR better. -- I finally found a moment to look at this one. Tested and committed! It works smoothly, many thanks.