JScanner: An Open-Source Java Malware Defense Tool

Analyzing the bytecode instructions of a Java program could aid in the identification and separation of the malicious from the non-malicious. The virus scanners of today are not capable of detecting malicious Java programs due to their inability to interpret Java Bytecode. This project demonstrates how to identify malicious Java programs by matching their bytecode instructions to the instructions that the user defines as threatening.

While researching, it was found that Java Bytecode is the instruction set that tells the Java Virtual Machine how to execute a Java program. Java programs can be written differently to accomplish a common goal, but their bytecode instructions will not be identical. Virus scanners have three basic methods of scanning: signature, behavioral, and generic. These methods do not work against malicious Java programs unless the programs were previously defined as malicious.

To understand Java Bytecode instructions, the Java Virtual Machine Specification had to be read. The bytecode instructions that represent method invocations were of particular interest, because method invocations are what cause a Java program to perform a malicious action. It was decided that the best way to detect malicious Java programs was to identify which method invocations were occurring.

The results indicate that bytecode analysis is an effective way of detecting malicious Java programs. JScanner detected 100% of the analyzed malicious Java programs, while the 57 commercial antivirus products of VirusTotal failed to detect them. If implemented, JScanner could protect the 3 billion devices that run Java from attack.


Athlete, idealist, and programmer, I have had an interest in computers since a young age. My first program was written in batch at the age of 8, after my home computer was infected with a fork-virus. Although scary, the virus caused me to ask many questions about computers and their programs. At age 10, I found myself modifying flash game variables with Visual Basic. By the time I turned 15, I was already programming in multiple languages and experimenting with web, network, and operating system security. Now 17, I believe that using my knowledge to better mankind will mold me into a better individual.

I am currently living in Killeen, Texas and am a senior at Ellison High School. This upcoming Fall, I plan to attend the University of Alabama to major in Computer Science. My dream is to create widely used programs that will enhance the security of computers. Steve Jobs has inspired me because his creation, the iPhone, is used by millions and benefits their lives. That is the kind of impact that I would like my work to make. I want to live knowing that my work has aided others in achieving their goals.

Winning the Google Science Fair would help pay for college and allow me to expand my knowledge by communicating with their computer programmers and scientists.


What is this? What does it do? Should I run this? Is this Java program safe to run?

figure2_java.jpeg

These are just a few questions that users ask when they are confronted by an unknown Java applet prompt, or come across unknown compiled Java programs (i.e. class/jar files).

Can Java Bytecode analysis be used to help the user decide if the presented Java program is safe to run? More specifically, can Java Bytecode analysis be used to detect malicious Java programs?

If Java Bytecode instructions which represent malicious method invocations (e.g., deleting files, running commands on the command line interface) are discovered, then a program could be crafted to detect these method invocations. This approach to detecting malicious Java programs is a form of reverse engineering, which is widely used by software engineers and security professionals to understand how an unknown program operates.

4 open-source malicious Java programs will be created and tested, and their bytecode instructions analyzed. These malicious programs will demonstrate not only how easy it is to create them, but also how dangerous a few lines of code can be. Then, JScanner and the 57 antivirus products of VirusTotal will be used to scan these malicious Java programs. Finally, the results will be put into a spreadsheet to see how JScanner fares against the antivirus products of VirusTotal.


Java is a programming language that was designed to create programs that could be run on any operating system. It is currently running on over 3 billion devices in the world. Despite its flexibility to run on any operating system, it was the cause of 91% of all cyber attacks in 2013. Following efforts to release patches and updates, the exploit percentage dropped 34% in 2014. Even so, virus scanners have great difficulty detecting malicious Java programs, and Java remains a huge security issue today. The Department of Homeland Security urges users to disable Java “unless it is absolutely necessary”.

In order for a Java program to be run on the computer of a user, it must first be compiled by the Java compiler. The compiler generates an instruction set known as Java Bytecode, which is then executed by the Java Virtual Machine (JVM). Java Bytecode is interesting because there can be thousands of different bytecode instruction combinations that accomplish a common goal, meaning a programmer may write the same program a thousand different ways. In addition, the bytecode is executed exactly the same way no matter what operating system a user is using.

The JVM Specification provides examples, explanations, and tutorials on how Java Bytecode works. While reading the Specification, method invocation bytecode instructions were discovered. The opcodes found in the instructions are as follows: INVOKEVIRTUAL, INVOKEDYNAMIC, and INVOKESTATIC. Opcodes specify what operations need to be performed by the JVM. These instructions tell the JVM how to call specific methods within a Java program. These methods could contain functionality to do things such as deleting files, connecting to servers, or running commands on the native command line interface.
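As a rough illustration of what a scanner has to work with, the sketch below lists the method references stored in a class file's constant pool, which is where the operands of method invocation instructions point. It is a minimal, library-free sketch written for this explanation and is not JScanner's actual code; the class name `MethodRefLister` and its output format are assumptions. It reports only `Methodref` and `InterfaceMethodref` entries and does not resolve INVOKEDYNAMIC call sites.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JScanner: lists methods a class file refers to.
public class MethodRefLister {

    /** Returns "owner.name" + descriptor for every (Interface)Methodref constant. */
    public static List<String> listMethodRefs(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a Java class file");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        int[] tags = new int[count];
        String[] utf8 = new String[count];
        int[] first = new int[count];  // first index operand of an entry
        int[] second = new int[count]; // second index operand of an entry
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            tags[i] = tag;
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;        // Utf8
                case 3: case 4: in.readInt(); break;          // Integer, Float
                case 5: case 6: in.readLong(); i++; break;    // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20:    // Class, String, MethodType, Module, Package
                    first[i] = in.readUnsignedShort(); break;
                case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, (Invoke)Dynamic
                    first[i] = in.readUnsignedShort();
                    second[i] = in.readUnsignedShort();
                    break;
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        List<String> refs = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            if (tags[i] == 10 || tags[i] == 11) {             // Methodref, InterfaceMethodref
                String owner = utf8[first[first[i]]];         // ref -> Class -> Utf8 name
                int nameAndType = second[i];
                refs.add(owner + "." + utf8[first[nameAndType]] + utf8[second[nameAndType]]);
            }
        }
        return refs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MethodRefLister SomeClass.class
        try (InputStream in = new FileInputStream(args[0])) {
            listMethodRefs(in).forEach(System.out::println);
        }
    }
}
```

Each printed line has the form `java/io/File.delete()Z`: the owning class, the method name, and the method descriptor. A full scanner would also walk each method's code to see where the references are actually invoked.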

Virus scanners have 3 different methods of scanning. The first is called a signature scan. It identifies viruses by a static sequence of bytes. These byte sequences can range from specific code patterns to names of particular variables. The second is a behavioral scan. Viruses are identified by abnormal actions in code, such as attempting to modify system files, reformatting the hard drive, or adding administrative users to the computer. The last method of scanning is called a generic scan. Generic scans identify viruses by appearance. These appearances are based on file hashes, names, sizes, or locations. Much like signature scans, generic scans rely on prior knowledge of the virus that they are looking for. It became very clear that virus scanners have great difficulty detecting malicious Java programs because they simply have no way to create signatures of their code.

After searching for information on how programmers could interact with Java Bytecode in real time, the ObjectWeb ASM application programming interface (API) was discovered. The ObjectWeb ASM API is a Java library that enables programmers to manipulate bytecode with ease. This means that a representation of particular bytecode instructions could be made, which made a Java Malware Defense Tool look not only possible, but promising.
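To make the weakness of static matching concrete, the following sketch shows a byte-sequence signature check and a file-hash check side by side. It is an illustrative example only, not how any particular antivirus product works; the class name `SignatureScanDemo` and the sample byte strings are assumptions made for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: two simple forms of static matching.
public class SignatureScanDemo {

    /** Signature scan: true if the fixed byte sequence occurs anywhere in the file. */
    public static boolean containsSignature(byte[] file, byte[] signature) {
        for (int start = 0; start + signature.length <= file.length; start++) {
            int i = 0;
            while (i < signature.length && file[start + i] == signature[i]) {
                i++;
            }
            if (i == signature.length) {
                return true;
            }
        }
        return false;
    }

    /** Hash-based check: hex-encoded SHA-256 digest of the whole file. */
    public static String sha256Hex(byte[] file) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(file)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a known sample and a trivially modified copy of it.
        byte[] known = "delete important.txt".getBytes(StandardCharsets.UTF_8);
        byte[] variant = "delete important.TXT".getBytes(StandardCharsets.UTF_8);
        byte[] signature = "important.txt".getBytes(StandardCharsets.UTF_8);

        System.out.println("known matches signature:   " + containsSignature(known, signature));
        System.out.println("variant matches signature: " + containsSignature(variant, signature));
        System.out.println("hashes are equal:          " + sha256Hex(known).equals(sha256Hex(variant)));
    }
}
```

Both checks recognize only the exact bytes they were built from, so a small change to the sample is enough to slip past them.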


Materials

  • Laptop or Computer

  • Eclipse Luna

  • Java Runtime Environment and Java Development Kit

  • ObjectWeb ASM API

  • Jsoup API

  • Apache Commons API

  • PircBot API

  • JNativeHook API

  • Java Virtual Machine Specification

  • VirusTotal

Experimentation, Part I.

  1. Understood that many Java programs could be written differently to accomplish a common goal, but their bytecode instructions would not be identical. This meant that similarities between the bytecode instructions of the different Java programs would have to be identified.

  2. Created and tested 4 malicious Open-Source Java programs. These programs are:

    1. JStrokeClient, a keylogger client that records keystrokes at specified time intervals and delivers them to a designated JStrokeServer.
       

    2. JStrokeServer, a keylogger server that receives recorded keystrokes from a JStrokeClient. It then creates a text file with the hostname of the JStrokeClient computer and stores the keystrokes in that text file.
      JStroke (3).png

    3. JManager, a remote administration tool that connects to a specified Internet Relay Chat to receive chat messages as commands. It then performs actions based on the commands given. These commands allow the attacker to do things ranging from getting simple information about the host computer (e.g., operating system name, home directory path) to downloading files from the internet and running commands on the native command line interface.
      JManager.png

    4. JWorm, a worm that replicates by finding executable jar files on the host computer, and injecting itself into the found jar files. Once infected, if executed by the user, the jar files will continue the replication function.
      JWorm(1).png

  3. Analyzed the 4 malicious Java programs and was able to identify bytecode instruction method invocations that matched the functionality of each program.

Experimentation, Part II.

  1. Developed JScanner to scan Java applets, jar files, and class files for Java Bytecode instructions that the user selects and, if necessary, to safely execute them. JScanner has a neat and easy-to-use graphical user interface (GUI). Under the “Tools” menu, the user can choose to scan an applet or application for specified bytecode instructions, or to scan the computer for files with jar or class extensions. They also have the option to safely execute an applet or application in a Virtual Machine. If chosen, the execute function will hook into the specified applet or application and list all interactions with outside entities. This may be useful if the user wants to know in real time what the program is doing.

Screenshot from 2015-04-13 22:08:47.png

Screenshot from 2015-04-29 21:03:08.png

Screenshot from 2015-04-29 21:02:12.png

Algorithm(2).png

  2. Finally, the 4 malicious Java programs were scanned with JScanner and the 57 commercial antivirus products of VirusTotal. This was done to see how well JScanner fared against the antivirus products. In order to provide a little more detail of what actions the programs were performing, they were also executed through JScanner. The scan results were recorded and compared on a Google Sheets spreadsheet.


Screenshot from 2015-04-23 18:08:57.png

When analyzing the bytecode instructions of the malicious Java programs, it was noticed that before every method invocation, there was a series of variables that were defined. These variables were occasionally sent to the method invocations as parameters. This helped identify what the method invocations were doing. For example, if a method invocation was performing a file operation, the parameters that were sent to it would specify the file path. The picture above shows the bytecode instructions of a Java program written to delete a text file named “important.txt.” Notice how the bytecode instructions contain a variable with the path to the file “/home/desmond/Desktop/important.txt,” and a method invocation to delete it.

The raw results of each program (above) show that they use different bytecode instructions to perform abnormal actions. After further investigation, it was decided that these programs are malicious. The programs use common and uncommon application programming interfaces (APIs) to modify files and send data. Some of the data also shows that the programs attempt to run commands on the command line interface and modify multiple files.
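For readers without access to the picture, a file-deleting program of this kind can be sketched as follows. This is a reconstruction for illustration and not the exact program from the screenshot; the class name `DeleteExample` is an assumption. The comments note the invocation instructions that the standard Java compiler emits for each statement.

```java
import java.io.File;

// Hypothetical reconstruction of a small file-deleting program.
public class DeleteExample {

    /** Deletes the file at the given path and reports whether it succeeded. */
    public static boolean deleteFile(String path) {
        // Compiles to: new java/io/File, dup, aload_0,
        // invokespecial java/io/File.<init>(Ljava/lang/String;)V
        File target = new File(path);
        // Compiles to: invokevirtual java/io/File.delete()Z
        return target.delete();
    }

    public static void main(String[] args) {
        // The path is taken from the command line here so that running the
        // example never deletes anything by accident. Had the path been written
        // as a string literal, it would appear in the bytecode as an ldc
        // instruction loading that literal just before the constructor call.
        if (args.length != 1) {
            System.err.println("usage: java DeleteExample <file>");
            return;
        }
        System.out.println(deleteFile(args[0]) ? "deleted " + args[0] : "could not delete " + args[0]);
    }
}
```

The output of `javap -c DeleteExample` can be compared against the comments: the value pushed before the `invokespecial` is the file path, and the `invokevirtual` that follows is the delete call.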

The 57 commercial antivirus products of VirusTotal detected 0% of the malicious Java programs. JScanner detected 100% of the malicious Java programs when the user selected bytecode instructions whose method invocations would perform the following actions:

  • Interacting with files

  • Executing Java code from external sources

  • Connecting to servers

  • Reading/Writing data from external sources

  • Running commands on the command line interface
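The matching step behind these categories can be sketched as a lookup from each action to the class and method names that perform it. The mapping below is a hypothetical example assembled from well-known JDK classes; it is not JScanner's actual threat list, and the class name `ThreatMatcher` and the string format of the invocations (`owner.name` followed by the descriptor) are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: match method invocations against user-selected threats.
public class ThreatMatcher {

    // Category -> invocation prefixes. Illustrative only, not JScanner's real list.
    public static final Map<String, List<String>> THREATS = new LinkedHashMap<>();
    static {
        THREATS.put("Interacting with files",
                Arrays.asList("java/io/File.", "java/nio/file/Files."));
        THREATS.put("Executing Java code from external sources",
                Arrays.asList("java/net/URLClassLoader."));
        THREATS.put("Connecting to servers",
                Arrays.asList("java/net/Socket.", "java/net/URL.openConnection"));
        THREATS.put("Reading/Writing data from external sources",
                Arrays.asList("java/net/URL.openStream", "java/net/URLConnection.getInputStream"));
        THREATS.put("Running commands on the command line interface",
                Arrays.asList("java/lang/Runtime.exec", "java/lang/ProcessBuilder.start"));
    }

    /** Groups the given invocations by the threat category they fall under. */
    public static Map<String, List<String>> scan(List<String> invocations) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (String invocation : invocations) {
            for (Map.Entry<String, List<String>> threat : THREATS.entrySet()) {
                for (String prefix : threat.getValue()) {
                    if (invocation.startsWith(prefix)) {
                        hits.computeIfAbsent(threat.getKey(), k -> new ArrayList<>()).add(invocation);
                        break; // one hit per category is enough for this invocation
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList(
                "java/lang/String.length()I",
                "java/io/File.delete()Z",
                "java/lang/Runtime.exec(Ljava/lang/String;)Ljava/lang/Process;");
        scan(found).forEach((category, calls) -> System.out.println(category + ": " + calls));
    }
}
```

A program is flagged when `scan` returns at least one category. Because the user controls the threat list, the detection rate depends directly on which invocations they choose to include.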

The chart above represents how well virus scanners detect malicious Java programs. The results are not limited to these virus scanners; many others have the same inability to detect such programs. This indicates that the virus scanners either have no signatures of the programs or do not understand the behaviors that the programs are performing. By the time signatures and behaviors are both identified and dispatched from vendors to users in the next update of the antivirus software, it would be too late. Attackers would have already done the damage necessary to compromise the computers of users and big corporations. The devastating outcomes of these compromises to users and big corporations have been seen in recent news.

The 4 malicious Java programs are open-sourced and demonstrate only the simplest forms of the malicious Java programs that exist in the wild. Even if the vendors of the antivirus products made static signatures of these programs, an attacker could easily change one line of code or obfuscate the bytecode instructions to avoid detection.


JScanner detected 100% of all malicious Java programs while the 57 commercial antivirus products of VirusTotal detected 0%. Method invocation patterns were discovered, and a method to understand these patterns was programmed into JScanner. The commercial antivirus products of VirusTotal do not contain such a method, which is why they were not able to detect any threats. These malicious Java programs are only basic examples of what actually exists in the real world. Even if the vendors were to make static signatures of these malicious Java programs, an attacker could easily evade detection. They could do this by obfuscating the bytecode instructions or changing one line of code in the program.

In this experiment, a Java program was developed to detect malicious Java programs by analyzing their bytecode instructions. Through the creation, testing, and bytecode instruction analysis of 4 malicious Java programs, all conditions of the hypothesis were proven true. Specifically:

  • Java Bytecode analysis can be used to help identify malicious Java programs.

  • A Java program crafted to understand threatening bytecode instructions can be used to detect malicious Java programs.

  • Detection success rate is dependent upon the Java Bytecode instructions that the user defines as threatening.

Screenshot from 2015-04-13 22:12:06.png

The limitations of the detection rates are all based upon the user’s knowledge of Java code. If the user does not understand Java code, it may be difficult for them to select the threats to scan for. The “Threat Selection” graphical user interface allows users to select threats by Java class and method names. Although this selection interface is only understandable to those who program with Java, it is much better than having users select specific Java Bytecode instructions to use in scanning. A screenshot of the “Threat Selection” graphical user interface is pictured above.

The execution function was implemented to give users who have little knowledge of Java code an alternative. With it, they are able to understand what a program is doing in plain English. This function should be run in a Virtual Machine to protect the computer from harm.

To reduce this limitation for users who do not understand anything about Java, a method to scan for threats in plain English is currently in development. This will enable users to scan for threats by simply selecting text like, “scan for file interactions” or “scan for attempts to execute commands on the command line interface of the computer.”

The results suggest that, if implemented, JScanner could protect the 3 billion devices that run Java from attack. Though it is a working product, there is always room for improvement. No program is perfect, and the fact that it is open-sourced will allow others to contribute their knowledge and ideas, giving the project the seed it needs to grow. A key question to ask for future work is, “Can this same method be used for the bytecode instructions of other programming languages?” If so, this could possibly enhance the abilities of the antivirus products of today.


Acknowledgements:

I would like to thank the following:

Dr. Albert Lilly, my sophomore and junior year computer science teacher. He helped brainstorm many computer science fair ideas and found competitions for me to compete in. Dr. Lilly has supported me since the beginning of this project on March 21st, 2013.

Mrs. Sarah Brewer, my sophomore and junior year mathematics teacher, for spending countless hours encouraging me, keeping me on track, providing different ways to better present the project, helping with the spelling and grammar, and asking questions that forced me to explain my project in English. She introduced me to the Google Science Fair and this project would not be where it is today without her.

Ms. Lauren Chamberlain, my English teacher, for helping with the spelling and grammar of this project.

Dr. Jeff Gray, my adviser, for explaining the Java Virtual Machine to me. He also helped critique my spelling and grammar. He gave me the idea to keep a log of everything I have done with this project.

Mr. Watson Brown, my Cisco Networking teacher, for providing equipment and an isolated network to test my malicious Java programs.

The Alabama School of Mathematics and Science and Killeen Independent School District for allowing me to program and improve JScanner on their computers.

The Alabama Society of Professional Engineers for allowing me to present my project at their meeting held at The University of South Alabama. They judged my presentation and explained how it could be improved.

The Google Science Fair for providing this opportunity, and its past competitors for examples of winning projects.

My friends and family for supporting me throughout this entire project. Without them this project would not have been possible.

All of the programming, research, writing, and wording of this project was done by me.
