Introduction To Android Programming
Android relies on the Dalvik Virtual Machine which uses Dalvik bytecode (named after the fishing village of Dalvík in Eyjafjörður, Iceland where some of the creators' ancestors lived according to Wikipedia). Developers write their programs in Java and the Android SDK converts it into Dalvik machine code. The main reasons for using Java as an intermediate language (and why it does not run as Java bytecode on the device) are obvious, developers do not have to learn yet another programming language, Android requires no permission from Sun Microsystems (or Oracle) and a lot of the bloat in the Java VM can be removed. Dalvik is also based on registers rather than a stack like the Java VM.
.dex is the file extension of a Dalvik Executable, this would normally sit inside a .apk file (Android PacKage). APK files typically include other details such as the AndroidManifest.xml which lists how the program appear's on the launcher and what permissions the program requires. If a program tries to call a method which needs a permission not listed in the manifest it will generate a SecurityException which if not correctly handled will lead to a "Force Quit".
Acquiring the APK
I am currently aware of 5 methods of acquiring APK files.
- Running an Android emulator on your computer with a ROM that contains the Android Market Place.
- Plugging in a Wireless Router to a wireless Windows computer and using Internet Connection Sharing while running Wireshark on your computer capturing traffic on the ethernet port.
- ARP Spoof your router using Cain & Abel and then capturing the traffic using Wireshark.
- Downloading it on your phone and then using the ASTRO File Manager to save the files to the SD Card, then copying them to the PC.
- Writing a program to pretend to be a phone and connecting to the market place directly and an article such as "REVERSING ANDROID MARKET PROTOCOL" may provide further help as well as a Wordpress plugin for displaying application details and its source.
Decompilation to Dalvik bytecode
Now to extract the Dalvik bytecode from the DEX file one can use the decompiler that is built into the SDK, but this ultimately does not produce output that is easy to read, so there is also Baksmali which with this tool you could even convert it back to a Dalvik Executable using Smali.
This produces output like this.
I have seen examples of people doing this to extend the functionality of existing applications such as Howto enable Gmail notifications for all new mail with smali/baksmali.
Another alternative is DeDexer, though this does not have an assembler.
There is also a tool to decode the manifest and other resources contained within the APK such as XMLs and PNGs called APKTool.
From Google Groups:
"classes.dex" inside. The classes.dex is optimized by the package
manager on first use, and ends up in /data/dalvik-cache/.
"System" apps have the DEX optimization performed ahead of time. The
resulting ".odex" file is stored next to the APK, the classes.dex is
removed from the APK, and the whole thing works without having to put
more stuff in your /data partition.
The optimized DEX files cannot easily be converted back to unoptimized
DEX, and I'm not sure there's any benefit in doing so. Both kinds of
DEX files can be examined with "dexdump".
More detail can be found in dalvik/docs/dexopt.html in the source
tree, or on the web at:
Both DeDexer and baksmali have limited support for ODEX files.
Decompilation to Java bytecode
Given Java code converts to Dalvik machine code it did not take very long for someone to write a Dalvik machine code to Java bytecode converter. In fact there are 2.
UNDX (lack of download link, developer notified) and Dex2Jar.
The UNDX decompiler appeared to be missing some support for some of the Dalvik op(eration) codes so there is an unofficial fork at GitHub.
Decompilation to Java source code
Once you have Java bytecode you can convert it to Java source by using a Java disassembler such as JD-GUI.
Finally applications can be decompiled to a variety of formats right up to the original Java source code. Generally the less processing done on the file the more success you will have in trying to decompile it (i.e. it is easier to go to Dalvik bytecode than to go back to source code as you are relying on less processes going wrong).
Sources & Further Reading: