2014-12-18

UTF8MB4 character set in Amazon RDS MySQL server and SQLAlchemy client

I've recently had to move a massive dataset that includes UTF-8 strings which contains extended set code points (i.e. planes other than the Basic Multilingual Plane, including Emoji) into a MySQL database hosted using Amazon's RDS.

Even though all of the databases and tables were configured to use purely utf8 character set and the unicode_ci collation, and though SQLAlchemy was also configured to use UTF8, I quickly ran into issues:

Incorrect string value: '\xF0\x9D\x8C\x86' for column 'column_name' at row 1 

The solution was:
  1. Read Mathias Bynens' awesome tutorial: The things we do to store U+01F4A9 PILE OF POO () correctly 
  2. After you created a backup of your current database, change the MySQL server settings via the Parameter Groups section of the RDS console:
    • Click "Create DB Parameter Group" 
    • Choose the correct Group Family (probably mysql5.6)
    • Input a group name and description (probably "mysql5.6-utf8mb4" and "MySQL 5.6 using UTF8MB4 charset by default"
    • Select this new Parameter Group in the console and click "Edit Parameters". Set the following parameter values:
      character_set_client: utf8mb4
      character_set_database: utf8mb4
      character_set_results: utf8mb4
      character_set_connection: utf8mb4
      collation_connection: utf8mb4_unicode_ci
      collation_server: utf8mb4_unicode_ci
      character_set_server: utf8mb4
      and click "Save Changes".
    • Go to the "Instances" dashboard, right click your RDS instance and "Modify" it, change the "Parameter Group" option to your newly created Parameter Group and click "Continue", "Modify DB Instance".
    • You can "Reboot" the instance if you want to be extra sure the new configuration was loaded.
  3. (Optional) Change the MySQL client settings. For the CLI mysql client, edit /etc/mysql/my.cnf and under [client] add:
    [client]
    default-character-set = utf8mb4
    This is to allow proper viewing of data using the mysql tool.
  4. Modify your existing databases, tables and columns to use UTF8MB4, as explained in the tutorial in part 1.
  5. Modify your SQLAlchemy connection string from:
    mysql+mysqldb://user:password@host:port/dbname
    to:
    mysql+mysqldb://user:password@host:port/dbname?charset=utf8mb4
    (whether or not to add use_unicode=0 is left for the programmer's discretion.)
Enjoy.

2013-03-15

Extracting WhatsApp message logs from a WhatsApp database

Just a small update.

After returning from my long trip abroad, having used WhatsApp profusely during the time there, I had lots of conversations I wanted to export to a message log and store for future, older days. ("Wow, I can't believe she wrote me THAT! ;)")

These chat logs included lots of media (images, audio clips and videos), which I also obviously wanted to keep. I used an Android phone, so I looked for an existing solution in the Google Play store. I found an applicaton called WhatsApp to Text which is quite nice, but fails to offer the option of exporting the media. I found no other solution in the Play store.

I then looked on-line for another solution. The team of D. Cortjens, A. Spruyt and W.F.C. Wieringa from the University of Amsterdam have published the results of a research project titled WhatsApp Database Encryption. Based on the results of this project, a Python script called WhatsApp Xtract was coded, to allow the generate of WhatsApp messages logs - this time with all media intact. Great.

Only thing was, some features weren't working correctly. Media wasn't automatically detected in some cases, the generated log files were humungous, the names of contacts were sometimes not displayed, there was no way to see all media files (like in the actual WhatsApp software), it supposedly was able to repair corrupt databases (and salvage what it can from them) but didn't really, and generally it didn't satisfy my requirements.

So, I updated it a bit:
v2.5 (updated by Alon Diamant - Mar 14, 2013)
- Improved encrypted Android database detection and decryption code
- Can now repair malformed Android databases (depends on availability of sqlite3 executable)

v2.4 (updated by Alon Diamant - Mar 06, 2013)
- Generates media index file - but crappily, we should set this up better..

v2.3 (updated by Alon Diamant - Mar 05, 2013)
- now generates separate file for each contact
- fixed file search to search for image files in days prior to date given (to fix a bug where because of timezone differences the image file exists but is not found)
- fixed message counts for contacts
- does not list contacts with 0 messages
- now writes version number in generated files
- (Android Version) displays WhatsApp name (server based) if no display name is found for a contact
- (Android Version) Supports Python 2.6
It still is medicore, and I am not happy with the way it works nor the way it is coded, but it'll serve for now. I do feel like coding an Android application to do this properly on the phone, with well formatted output files that include all media. We'll see.

For now, enjoy. You can also find updated in the project repository.

2009-10-29

CLI definition language

Hey,

I'm sure you've had the occasion on which you developed an interesting application, and now required the ability to control it from the command line. The standard C/C++ functions for implementing various switches and input is nothing fun to work with, and the various template and composition based solutions are too complex most of the time.

Check out this interesting project:
http://www.codesynthesis.com/projects/cli/

You define the options you want in a very, very simple definition language, compile it into a C++ class file and voila: you can use it in your main() function.

Nice.

2009-10-28

Catastrophic failure (Exception from HRESULT : 0x8000FFFF (E_UNEXPECTED))

Are you trying to use an ActiveX (OCX) control in a .NET application?

Did you add the reference, create a class and are quite happy at how easy it was?

So, now you try and run the application and CRASH - Catastrophic failure!!!

The problem is that an ActiveX control requires special handling when embedded into a .NET application. The easiest way to do this is:

1. When in the Form Designer, open the Toolbox window and right click one of the tabs.
2. Click "Choose Items..." menu option
3. Select the "COM Components" tab
4. Find your control in the list and add it.
5. Drag the control to your form.

This will create various new Interop classes, including special event handlers for your events, et cetera. It should work now.

Hurrah!!

2009-10-22

map/set iterators incompatible

So, you've been working on some piece of code that employs STL containers, specifically a map. Perhaps you've also added some multi-threaded flavoring to pack a punch. Great.

Suddenly, you see this:

Debug Assertion Failed!

Program: C:\....\Test.exe
File: C:\Program Files\Microsoft Visual Studio 8\VC\include\xtree
Line: 293

Expression: map/set iterators incompatible
Strange. What just happened? Did you just mix the iterators of different classes together? No, you're certain you used map iterators, not set iterators. Everything SEEMS well...

Ah, that's right. You have an iterator that points to the map's items, and are indeed iterating over them. The map is protected by a mutex, but - so as to not block for too long - you release and reacquire the mutex every iteration. Another thread then executes and clear()'s the map, rendering your iterator useless. You now compare it to the end() iterator, and voila - crash!

The solution? Make sure you don't invalidate your iterators while using them. That means - don't modify the STL container from another thread (even if you are using synchronization mechanisms) if you're not absolutely certain that you can and that you will not invalidate existing iterators. Operations like insert, erase and clear can invalidate iterators - read the details at each container's documentation page.

2009-08-16

Winsock error 10049 with no apparent reason

Hey all,

So, it's been a while since I last updated. I assume I am experiencing the way too common "early enthusiasm leads to early laziness" syndrome.

Without further a do - a lovely issue I had to face.

On certain machines - all Vista SP1 machines, we thought - certain outgoing TCP connections to well known HTTP servers (www.google.com, www.facebook.com, www.microsoft.com etc) simply failed.

The failure returned a last error code 10049 (0x2741) which means "The requested address is not valid in its context". This usually happens when someone tries to use a non-existent local IP address as the socket address, or when someone tries to connect to an invalid remote computer.

I wrote a test program and reproduced the behavior. I discovered that, oddly, using ADDR_ANY works perfectly - but binding to a specific interface doesn't work.

The test program seemed to work everywhere, including Vista SP2 machines. I found no mention of this behavior anywhere on-line. I disabled the Windows Firewall and the on-board ESET NOD32 3.0 Business anti-virus, and double checked that everything was turned off. It still failed. I was stumped.

I posted on alt.programming.winsock asking about this:



Hello all,

I have been witnessing some very strange behavior of Winsock on
Windows Vista SP1, and would like to share my findings to see if
anyone could help me figure out the answer to what's wrong.

I've been running a simple program which tests the behavior of Winsock
when it comes to binding a UDP socket, binding a TCP socket and
binding and then connecting a TCP socket with a wildcard interface and
a specific interface.

On my Windows XP SP3, running as an administrator, I get the following
results:

----------------------------------------------------------------------------------------

Testing bind() UDP with 0.0.0.0:0...
+++ Success!!!
Testing bind() UDP with 0.0.0.0:1024...
+++ Success!!!
Testing bind() UDP with 0.0.0.0:32033...
+++ Success!!!
Testing bind() UDP with 0.0.0.0:55301...
+++ Success!!!
Testing bind() UDP with 192.168.2.110:0...
+++ Success!!!
Testing bind() UDP with 192.168.2.110:1024...
+++ Success!!!
Testing bind() UDP with 192.168.2.110:32033...
+++ Success!!!
Testing bind() UDP with 192.168.2.110:55301...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:0...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:1024...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:32033...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:55301...
+++ Success!!!
Testing bind() TCP with 192.168.2.110:0...
+++ Success!!!
Testing bind() TCP with 192.168.2.110:1024...
+++ Success!!!
Testing bind() TCP with 192.168.2.110:32033...
+++ Success!!!
Testing bind() TCP with 192.168.2.110:55301...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:0...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:1024...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:32033...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:55301...
+++ Success!!!
Testing bind() AND connect() TCP with 192.168.2.110:0...
+++ Success!!!
Testing bind() AND connect() TCP with 192.168.2.110:1024...
+++ Success!!!
Testing bind() AND connect() TCP with 192.168.2.110:32033...
+++ Success!!!
Testing bind() AND connect() TCP with 192.168.2.110:55301...
+++ Success!!!
Press any key to continue . . .

----------------------------------------------------------------------------------------

The same results appear in Windows Vista SP2, being run as an
Administrator and as a standard user. Windows firewall is disabled, or
an exception is added, in all cases.

Windows Vista SP1, with firewall disabled, eSET NOD32 antivirus
disabled and no other special software I can think of gives me this
result, however:

----------------------------------------------------------------------------------------

Testing bind() UDP with 0.0.0.0:0...
+++ Success!!!
Testing bind() UDP with 0.0.0.0:1024...
+++ Success!!!
Testing bind() UDP with 0.0.0.0:32033...
+++ Success!!!
Testing bind() UDP with 0.0.0.0:55301...
+++ Success!!!
Testing bind() UDP with 192.168.2.24:0...
+++ Success!!!
Testing bind() UDP with 192.168.2.24:1024...
+++ Success!!!
Testing bind() UDP with 192.168.2.24:32033...
+++ Success!!!
Testing bind() UDP with 192.168.2.24:55301...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:0...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:1024...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:32033...
+++ Success!!!
Testing bind() TCP with 0.0.0.0:55301...
+++ Success!!!
Testing bind() TCP with 192.168.2.24:0...
+++ Success!!!
Testing bind() TCP with 192.168.2.24:1024...
+++ Success!!!
Testing bind() TCP with 192.168.2.24:32033...
+++ Success!!!
Testing bind() TCP with 192.168.2.24:55301...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:0...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:1024...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:32033...
+++ Success!!!
Testing bind() AND connect() TCP with 0.0.0.0:55301...
+++ Success!!!
Testing bind() AND connect() TCP with 192.168.2.24:0...
--- connect() failed: 10049 (The requested address is not valid in its
context.).
Testing bind() AND connect() TCP with 192.168.2.24:1024...
--- connect() failed: 10049 (The requested address is not valid in its
context.).
Testing bind() AND connect() TCP with 192.168.2.24:32033...
--- connect() failed: 10049 (The requested address is not valid in its
context.).
Testing bind() AND connect() TCP with 192.168.2.24:55301...
--- connect() failed: 10049 (The requested address is not valid in its
context.).

----------------------------------------------------------------------------------------

I can't seem to understand why this happens only on Windows Vista SP1,
with a specific *CORRECT* interface.

The program code is attached after this post.

Does anyone have any clue as to what is causing this?

Thanks,

Alon




I wrote an ICMP ping client and discovered that ping worked properly on all machines - including Vista SP1. Strange - is this something to do with specific ports?

I received a response in the newsgroups - something about Vista SP1's anti-malware logic. I was so confused at this point that I actually started looking this up, but naturally - I found nothing.

Then I discovered that a clean Vista SP1 does not exhibit this issue at all - it worked properly.

I started searching for TCP/IP stack hooks, or for NDIS filter drivers - I was looking everywhere for answer, really.

I found nothing!

My final move was to recheck the firewall and anti-virus. I was amazed to discover that shutting down the ESET NOD32 anti-virus and disabling all active protection doesn't disable the HTTP filtering module.

This module, which basically blocks all outgoing connections from or to ports 80, 3128 and 8080 from all non-approved "browser" applications, was the cause of all my problems!

I discovered this issue in only one other place on-line, at yoics.

ESET themselves explain this on their own page, but they don't say anything about what behavior a programmer should expect to see if their HTTP filter is active.

Basically - you can can create a socket, you can bind(), but if you try and connect to an address with port 80/3128/8080 using a socket bound to a specific interface - Winsock error 10049 is thrown. This is as if the address is invalid. Confusing!

Right now I'm looking at anti-virus/firewall detection possibilities - including WMI. This might help us notify our users as to what actions they must take to unblock our application with their anti-virus "flavor".

I hope this post will help others, somewhere, somehow.