Ever wanted a Java Scratchpad or a way to run Java Snippets?

February 3rd, 2016

I quite often find myself wanting to write quick snippets of Java, similar to little bash scripts.

This usually means I have to:
- launch Eclipse
- start a new project
- create a new class
- write public static void main
- try to remember why I started Eclipse…

Well no more. I decided we could do things better. I present:

JPad - It lets you run little snippets of Java code and instantly see the result rendered (hopefully) beautifully, either as HTML tables or in a console mode.

Let me know what you think. Do you find it useful? Can we make it better? Definitely!

(Apologies, but there are lots of bugs and we need to improve result presentation.)

java

Julia Programming Language

November 22nd, 2015

Recently I’ve been looking into Julia. Below are my somewhat random, jumbled thoughts so far. I’ve also put together a function listing for Julia that should help beginners. I should mention upfront that I actually really like the idea of Julia and think it has a lot of promise.

1. The long time taken for packages/recompilation is annoying and it failed on me a few times.

2. It just annoys me that indexing starts at 1, unlike most other languages.

3. One of the most successful modules in Python is pandas. Why not take this further and make DataFrames a full part of Julia, i.e. make nulls, DataFrames and DataArrays the built-in for all vectors? If Julia is going to be for manipulating data, there are always going to be nulls. Why have the duplication between Julia arrays and DataFrames? Make them one and the same.

4. It has a number of built-in functions that duplicate functionality already available in just as short a form:


fill(0.,3)
zeros(3)
ones(3)

They add little and pollute the global namespace. Ken Iverson gives a wonderful talk on reducing a language to the minimal core here:
https://www.youtube.com/watch?v=8kUQWuK1L4w

5. There are many small bugs, inconsistencies and irritations such as:

Should super()/subtypes() error on anything? Because they do:


julia> super(bb)
ERROR: MethodError: `super` has no method matching super(::Regex)
julia> subtypes(bb)
ERROR: MethodError: `subtypes` has no method matching subtypes(::Regex)
julia> super("2")
ERROR: MethodError: `super` has no method matching super(::ASCIIString)

cd silently fails and isn’t cross-platform:


julia> cd("/temp/")
julia> homedir()
"C:\\Users\\DAVE"

It should at least throw an error. Even better would be if Julia supported forward slashes regardless of platform, so that Julia scripts stay cross-platform.


run(pipe(`echo world` & `echo hello`, `sort`))

The run command is weird, not like other languages at all. It uses backticks to enclose the command, supposedly to allow easy copy-pasting, but what’s the point of that when you can’t use piping within a single command, so you need to break out each command separately anyway? Rather than this string interpolation I could just have used join() myself and had less surprise. The interpolation may also cause issues with user-entered commands. (A bit more browsing reveals that a significant number of people find this a more powerful technique.)

Writing to a file isn’t visible when reading back through the same handle. This is very non-intuitive:


julia> f = open("a.txt", "w")
IOStream(<file a.txt>)
julia> write(f, "hello!")
6
julia> readall(f)
""
julia> f = open("a.txt")
IOStream(<file a.txt>)
julia> readall(f)
"hello!"

If possible I’d prefer not to have to specify “w” at all. Instead, the function using the handle would deal with the underlying details depending on whether it reads or writes.

Uncategorized

Html Documentation Generator for JOpt

April 18th, 2014

JOpt Simple is a Java library for parsing command line options. If anyone else is looking for an HTML documentation generator for JOpt, here it is: HtmlHelpFormatter.java.

We use it in sqlChart - our sql database command line chart generator.

Option (* = required)            Description
-?, --help                       Display a help message and exit.
-D, --database <db_name>         The database to use.
-H, --height <output_height>     Set the height of the chart output (default: 300)
*-P, --port <port_num>           The TCP/IP port number to use for the SQL Server connection.
-W, --width <output_width>       Set the width of the chart output (default: 400)
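To give a feel for what such a formatter has to do, here is a minimal sketch that turns a list of option descriptions into an HTML table. It is not the HtmlHelpFormatter.java linked above and does not depend on JOpt Simple: OptionRow and HtmlOptionTable are hypothetical names made up for this illustration. A real formatter would implement JOpt Simple's HelpFormatter interface and read the same information from the parser's option descriptors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one option description; not a JOpt Simple type. */
final class OptionRow {
    final String flags;       // e.g. "-D, --database <db_name>"
    final String description; // e.g. "The database to use."
    final boolean required;

    OptionRow(String flags, String description, boolean required) {
        this.flags = flags;
        this.description = description;
        this.required = required;
    }
}

final class HtmlOptionTable {
    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders one table row per option, prefixing required options with '*'. */
    static String render(List<OptionRow> rows) {
        StringBuilder sb = new StringBuilder("<table>\n");
        sb.append("<tr><th>Option (* = required)</th><th>Description</th></tr>\n");
        for (OptionRow r : rows) {
            sb.append("<tr><td>")
              .append(r.required ? "*" : "")
              .append(escape(r.flags))
              .append("</td><td>")
              .append(escape(r.description))
              .append("</td></tr>\n");
        }
        return sb.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        List<OptionRow> rows = new ArrayList<>();
        rows.add(new OptionRow("-?, --help", "Display a help message and exit.", false));
        rows.add(new OptionRow("-P, --port <port_num>",
                "The TCP/IP port number to use for the SQL Server connection.", true));
        System.out.print(render(rows));
    }
}
```

The escaping step matters here because argument placeholders such as <db_name> would otherwise be swallowed by the browser as unknown tags.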

java

Books: Of Mice and Men, Wild, Out of Sight

January 4th, 2014

Of Mice and Men - John Steinbeck

7/10 Good book, though more of a short story really. Not sure why the book is so well known. It would open discussion on a lot of topics (mental health, loneliness, aspirations, desires), so maybe a good reading group book or one for kids in school.

Cheryl Strayed - Wild

8/10

I’ve often wanted to hike the Appalachian Trail, which is what made me read this book. It’s actually more a journey of self-discovery for a troubled lady who lost her mum. But it is really funny, and when I looked it up, it turns out the author was reunited with her half-sister when her sister read the book!

Out of Sight - Elmore Leonard

6/10 Not a thought-provoking book but an above-average crime/romance page turner. It was actually turned into a Hollywood film with George Clooney. Unfortunately, near the end I realised that I had seen the film before, which kind of spoiled the ending.

personal

Kdb Training

February 25th, 2013

We’ve been working really hard on new video guides to kdb, which briefly cover some of the topics available in the kdb training courses. Some of the articles to help with learning kdb include:

Feedback or questions on the articles is more than welcome.

KDB

Signing exe and jars in an ant build

February 19th, 2013

Easiest way to sign your Java jars and exes from Ant:

1. First some prep to speed things up:
- set your domain’s whois information to match your business
- register with a phonebook company: yell.com or 192.com
This means when they ask for it later, you already have it done.

2. Cheap keys are available at tucows author: https://author.tucows.com/certs.php
This article details the steps and downloading the key (use IE at all times):

3. Once you have it, use these instructions to export it as a file:

4. For signing jars:
This guide steps you through: http://blog.pengoworks.com/index.cfm/2008/2/27/Using-a-PKCS12-key-to-for-signing-JAR-files

5. For signing .exes get JSign and add its Ant task.

java

Best Method of Creating Windows Installer for Java Programs.

January 23rd, 2013

Requirements

  • Installer that adds a link to the start menu and file associations to Windows
  • Running program appears as an exe and has its own file name in Windows Task Manager
  • Use the installed JRE where possible, or ask the user to download one if not available (bundling considered too large)

The finished installer can be seen at http://www.timestored.com/qStudio/

Solution

The best solution I found was Launch4j combined with Inno Setup.

launch4j

Launch4j was the only wrapper I found that let me keep one jar and have multiple exes pointing to that same jar, allowing me to ship two programs without doubling the size. One setting to keep in mind when doing this is the chdir option: set it to . to ensure paths work out properly.

InnoSetup

The tricky part with Inno Setup was getting the file associations to work. Here’s what I used:
 
[Icons]  
Name: "{group}\qStudio"; Filename: "{app}\qStudio.exe"
Name: "{group}\sqlDashboards"; Filename: "{app}\sqlDashboards.exe"
 
[Tasks]  
Name: qAssociation; Description: "Associate .q extension"; GroupDescription: File extensions:
Name: kAssociation; Description: "Associate .k extension"; GroupDescription: File extensions:
 
[Registry]
Root: HKCR; Subkey: ".q"; ValueType: string; ValueName: ""; ValueData: "qKDBsource"; Flags: uninsdeletevalue; Tasks: qAssociation 
Root: HKCR; Subkey: ".k"; ValueType: string; ValueName: ""; ValueData: "qKDBsource"; Flags: uninsdeletevalue; Tasks: kAssociation 
Root: HKCR; Subkey: "qKDBsource"; ValueType: string; ValueName: ""; ValueData: "q KDB source"; Flags: uninsdeletekey;
Root: HKCR; Subkey: "qKDBsource\DefaultIcon"; ValueType: string; ValueName: ""; ValueData: "{app}\QFILE.ICO"
Root: HKCR; Subkey: "qKDBsource\shell\open\command"; ValueType: string; ValueName: ""; ValueData: """{app}\qStudio.exe"" ""%1"""

Combining into an ANT script

You want to

  1. Compile your JAR
  2. Create exes for both programs that point to the jar
  3. Place the exes into an installer

Like so (you need to install Inno Setup):
 
<launch4j configFile="src/main/resources/launch4j-qstudio.xml" />
<launch4j configFile="src/main/resources/launch4j-sqldashboards.xml" />
 
<exec dir="." executable="${INNO.SETUP.DIR}">
	<arg line="/cc '${basedir}\src\main\resources\windowsInstall.iss'" />
</exec >

Finished Installer

Can be found at http://www.timestored.com/qStudio/

java

Onto New Things

January 23rd, 2013

Very excited to move on to new work at TimeStored.com.
Working on a GUI for KDB called qStudio.
I’ll try to post more Java stuff here, but I’ll hopefully be doing a lot more q article posting on TimeStored.

Uncategorized

q-grams for fuzzy string matching in KDB

January 31st, 2011

To see if two strings are similar we can break the strings up into qgrams and count how many match between the two. Padding at the front and back is done so that start/end letters have a chance to cause a match without the whole start/end needing to match:


"Elvis" -> "##E" "#El" "Elv" "lvi" "vis" "is%" "s%%"
"Jarvis" -> "##J" "#Ja" "Jar" "arv" "rvi" "vis" "is%" "s%%"
i.e. Three would match here.
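Since most posts here are Java, here is a minimal Java sketch of that padding and counting step. It is only an illustration, not the q implementation discussed below: the class and method names (QGramSketch, qgrams, sharedQGrams) are made up, and it counts distinct shared q-grams without any position window.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class QGramSketch {
    /** Splits s into overlapping q-grams after padding with q-1 '#' in front and q-1 '%' behind. */
    static List<String> qgrams(String s, int q) {
        StringBuilder padded = new StringBuilder();
        for (int i = 1; i < q; i++) padded.append('#');
        padded.append(s);
        for (int i = 1; i < q; i++) padded.append('%');

        List<String> out = new ArrayList<>();
        for (int i = 0; i + q <= padded.length(); i++) {
            out.add(padded.substring(i, i + q));
        }
        return out;
    }

    /** Counts the distinct q-grams that appear in both strings. */
    static int sharedQGrams(String a, String b, int q) {
        Set<String> shared = new HashSet<>(qgrams(a, q));
        shared.retainAll(new HashSet<>(qgrams(b, q)));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(qgrams("Elvis", 3));                  // the seven q-grams listed above
        System.out.println(sharedQGrams("Elvis", "Jarvis", 3));  // "vis", "is%" and "s%%" are shared
    }
}
```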

What's particularly good about this algorithm is that it is easily adapted for use in databases. Here is a version in q/SQL. q is a language created by KX and is tightly tied to a database called KDB+; a trial version is available. The code is based on this paper.

Using the .qgrams.cache[] function in q we can see how it works:

q)a:("Donald Knuth";"Leslie Lamport";"Ken Thompson";"Rob Pike";"Linus Torvalds";"Richard Stallman";"Tim Berners-Lee";"Alan Turing";"John Von Neumann";"Brian Kernighan";"Grace Hopper";"Ada Lovelace";"Edsger Dijkstra";"Jon Von Neumann";"Herb Simon";"Kenneth Iverson");
q).qgrams.cache[3;a]
qstr| pid pn
----| --------
##D | ,0  ,0
#DO | ,0  ,1
DON | ,0  ,2
ONA | ,0  ,3
NAL | ,0  ,4
ALD | 0 4 5 12
LD  | ,0  ,6
D K | ,0  ,7
KN  | ,0  ,8
KNU | ,0  ,9
NUT | ,0  ,10
UTH | ,0  ,11
TH% | ,0  ,12
H%% | ,0  ,13
##L | 1 4 0 0
#LE | ,1  ,1
LES | ,1  ,2
ESL | ,1  ,3
SLI | ,1  ,4
LIE | ,1  ,5
..

A list of strings where the first one was “Donald Knuth” has been broken up.
qstr - is the string broken up
pid - is which string it originally belonged to
pn - is the position of the qgram within the original string

As you can see, if we had two tables with qgrams, we could simply join on the qstr column and count the number of overlaps for given strings (pids). The pn column can be used if we want to use a “window” to ensure that qgrams that match occur near each other.

Functions are provided to do this for you. The results are sorted by matchR, a crude similarity scoring system (number of matches / length of shortest string):

C:tokyom>q
KDB+ 2.7 2010.11.30 Copyright (C) 1993-2010 Kx Systems
w32/ 2()core 3036MB Admin anonymous 192.168.178.64 PLAY 2011.02.28
 
q)\l qgrams.q
q)a:("Donald Knuth";"Leslie Lamport";"Ken Thompson";"Rob Pike";"Linus Torvalds";"Richard Stallman";"Tim Berners-Lee";"Alan Turing";"John Von Neumann";"Brian Kernighan";"Grace Hopper";"Ada Lovelace";"Edsger Dijkstra";"Jon Von Neumann";"Herb Simon";"Kenneth Iverson");
q) / 3 = qgram size     5 = window size
q).qgrams.fuzzyMatch[3; 5; ("Ken Iverson";"Jonny Newmann"); a]
matchR    qid pid matches strA            strB
------------------------------------------------------------
1         0   15  11      "Ken Iverson"   "Kenneth Iverson"
0.6363636 0   2   7       "Ken Iverson"   "Ken Thompson"
0.6153846 1   13  8       "Jonny Newmann" "Jon Von Neumann"
0.5384615 1   8   7       "Jonny Newmann" "John Von Neumann"
0.2       0   14  2       "Ken Iverson"   "Herb Simon"
0.1538462 1   5   2       "Jonny Newmann" "Richard Stallman"
q)
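The join on qstr, the window on pn and the matchR score can be sketched in Java roughly as follows. This is an illustration under assumptions, not a port of qgrams.q: the names (QGramIndexSketch, Posting, buildIndex, countMatches) are invented, q-grams are upper-cased because the cache output above appears to be, and each query q-gram is counted at most once per candidate, which may differ from how the q code treats duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QGramIndexSketch {
    /** One index entry: which string the q-gram came from (pid) and where (pn). */
    static final class Posting {
        final int pid, pn;
        Posting(int pid, int pn) { this.pid = pid; this.pn = pn; }
    }

    /** Upper-cases s, pads it with '#' and '%', and returns its overlapping q-grams. */
    static List<String> qgrams(String s, int q) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < q; i++) sb.append('#');
        sb.append(s.toUpperCase());
        for (int i = 1; i < q; i++) sb.append('%');
        List<String> out = new ArrayList<>();
        for (int i = 0; i + q <= sb.length(); i++) out.add(sb.substring(i, i + q));
        return out;
    }

    /** Builds the qstr -> (pid, pn) lookup once so it can be reused across queries. */
    static Map<String, List<Posting>> buildIndex(List<String> names, int q) {
        Map<String, List<Posting>> index = new HashMap<>();
        for (int pid = 0; pid < names.size(); pid++) {
            List<String> grams = qgrams(names.get(pid), q);
            for (int pn = 0; pn < grams.size(); pn++) {
                index.computeIfAbsent(grams.get(pn), k -> new ArrayList<>()).add(new Posting(pid, pn));
            }
        }
        return index;
    }

    /** Per candidate pid, counts query q-grams that have a match within `window` positions. */
    static int[] countMatches(String query, int q, int window,
                              Map<String, List<Posting>> index, int candidates) {
        int[] matches = new int[candidates];
        int[] lastCounted = new int[candidates]; // query position last counted for each pid
        Arrays.fill(lastCounted, -1);
        List<String> grams = qgrams(query, q);
        for (int pn = 0; pn < grams.size(); pn++) {
            for (Posting p : index.getOrDefault(grams.get(pn), Collections.<Posting>emptyList())) {
                if (lastCounted[p.pid] != pn && Math.abs(p.pn - pn) <= window) {
                    lastCounted[p.pid] = pn;
                    matches[p.pid]++;
                }
            }
        }
        return matches;
    }

    /** The crude score from the text: matches divided by the length of the shorter string. */
    static double matchR(int matches, String a, String b) {
        return (double) matches / Math.min(a.length(), b.length());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Kenneth Iverson", "Ken Thompson", "Herb Simon");
        Map<String, List<Posting>> index = buildIndex(names, 3);
        String query = "Ken Iverson";
        int[] matches = countMatches(query, 3, 5, index, names.size());
        for (int pid = 0; pid < names.size(); pid++) {
            System.out.println(names.get(pid) + " matches=" + matches[pid]
                    + " matchR=" + matchR(matches[pid], query, names.get(pid)));
        }
    }
}
```

Building the index once and reusing it per query is the same idea as the cached variant of the q functions described next.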

Assuming that one table will be extremely large and that you’ll be checking a small subset against it, the functions also allow for caching of one of the datasets. Here I have used the person data set from DBpedia. The functions to parse this file are included in qgrams.q. In fact all example code is included, commented out, at the bottom of the qgrams.q file.

e.g.

q)a: distinct value loadDBpeople[`:persondata_en.nt]
q)count a
48089
q)\t .qgrams.fuzzyMatch[3; 2; ("isac newton";"albert einstine"); a];
2140
q)ca:.qgrams.cache[3;a]
q)\t .qgrams.fuzzyMatchUsingCache[ 2; ("isac newton";"albert einstine"); a; ca]
421
q)7#.qgrams.fuzzyMatchUsingCache[ 200; ("richie feynmann";"albert einstine"); a; ca]
matchR    qid pid   matches strA              strB                           ..
-----------------------------------------------------------------------------..
2.4       1   14225 36      "albert einstine" "Nyron nilbert gilbert filbert ..
0.8666667 1   8     13      "albert einstine" "Albert Einstein"              ..
0.8181818 1   35952 9       "albert einstine" "Albert Lane"                  ..
0.7777778 1   4010  7       "albert einstine" "Albert II"                    ..
0.75      1   30825 3       "albert einstine" "Wine"                         ..
0.7333333 0   860   11      "richie feynmann" "Richard Phillips Feynman"     ..
0.7272727 1   81    8       "albert einstine" "Albert Pike"                  ..

I’d say a number of major speed improvements could be made to the code, and the matchR scoring system could definitely be improved. If you do have any good code changes or suggestions, feel free to contact me and I’ll post any improved versions here.

KDB

fast java prime number generation

June 7th, 2009

So I found some old prime number generation code and decided to spruce it up. I also wanted to benchmark BitSet against a boolean array, as I had previously had issues with a boolean array. I tried six methods:

Prime Generators - Source Download

  1. Simp - For every x from 0 to maxPrime, check whether x is prime, i.e. check that x isn’t divisible by any number from 2 to x-1.
  2. SimpOdd - Same as above, but add 2 to the list of known primes, then check the x’s between 0 and maxPrime stepping in 2’s, and only check for divisors from 2 to x/2.
  3. S - Sieve - Sieve of Eratosthenes.
  4. SS - SkippingSievePG - Improve speed by taking advantage of the fact that even numbers other than 2 are never prime. So when sieving with the prime 3, instead of sieving 6,9,12,15,18,21 we can sieve just 9,15,21, i.e. jump by double our prime each time.
  5. OSS - OddSkippingSievePG - Same as above BUT improve speed/memory by making the boolean array represent only odd numbers instead of all numbers.
  6. BSOSS - BitSetOddSkippingSievePG - Same as above but use a BitSet instead of a boolean array.
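For readers without the source download, here is a minimal sketch of the odd-only skipping sieve idea (method 5). It is not the OddSkippingSievePG class from the download: OddSieveSketch and primesUpTo are names invented for this illustration, and the choice to start crossing off at p*p is a standard sieve refinement rather than something stated above.

```java
import java.util.ArrayList;
import java.util.List;

final class OddSieveSketch {
    /** Returns all primes <= max using a sieve whose slot i stands for the odd number 2*i + 1. */
    static List<Integer> primesUpTo(int max) {
        List<Integer> primes = new ArrayList<>();
        if (max < 2) return primes;
        primes.add(2); // the only even prime is handled up front

        boolean[] composite = new boolean[(max + 1) / 2]; // index i <-> odd number 2*i + 1
        for (int i = 1; i < composite.length; i++) {      // i = 0 is the number 1, which is skipped
            if (composite[i]) continue;
            int p = 2 * i + 1;
            primes.add(p);
            // Start at p*p and step by 2*p so that only odd multiples are visited.
            for (long m = (long) p * p; m <= max; m += 2L * p) {
                composite[(int) (m / 2)] = true;
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(30));
        System.out.println(primesUpTo(10_000_000).size());
    }
}
```

Halving the array is what gives the memory saving over the plain sieve, and stepping by 2*p is the "skipping" part: the even multiples have no slot to cross off in the first place.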


Seconds taken to generate primes between 0 to 10,000,000

[Chart: speed]

The Simp variants were so slow as to be unusable for finding large numbers of primes, i.e. thousands of times slower than the sieves.


Memory required to generate primes between 0 to 10,000,000

[Chart: memory]

Notice the 1,120,000 bytes needed to store the ArrayList of integers.

Size of boolean arrays / BitSets in Java

Boolean arrays in Java use 1 byte per true/false value. BitSets use 1 bit per true/false value (this may depend on the OS/VM). However, BitSets are slower to access. This explains why BitSetOddSkippingSievePG was slower than OddSkippingSievePG but required less memory.
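As a rough illustration of that trade-off, here is a minimal BitSet-backed odd-only sieve together with the back-of-envelope size arithmetic. The names (BitSetSieveSketch, countPrimesUpTo and the two size helpers) are invented for this sketch, and the byte figures are estimates of payload only: they assume 1 byte per boolean element and 64-bit words inside the BitSet, and ignore object headers.

```java
import java.util.BitSet;

final class BitSetSieveSketch {
    /** Counts primes <= max with an odd-only sieve stored in a BitSet (bit i <-> 2*i + 1). */
    static int countPrimesUpTo(int max) {
        if (max < 2) return 0;
        int slots = (max + 1) / 2;
        BitSet composite = new BitSet(slots);
        int count = 1; // counts the prime 2
        for (int i = 1; i < slots; i++) {
            if (composite.get(i)) continue;
            count++;
            long p = 2L * i + 1;
            // Cross off odd multiples only, starting from p*p.
            for (long m = p * p; m <= max; m += 2 * p) {
                composite.set((int) (m / 2));
            }
        }
        return count;
    }

    /** Estimated payload of a boolean[] with one element per slot (assumes 1 byte each). */
    static long approxBooleanArrayBytes(int slots) {
        return slots;
    }

    /** Estimated payload of a BitSet with one bit per slot, rounded up to 64-bit words. */
    static long approxBitSetBytes(int slots) {
        return ((slots + 63L) / 64L) * 8L;
    }

    public static void main(String[] args) {
        int max = 10_000_000;
        int slots = (max + 1) / 2;
        System.out.println("primes:          " + countPrimesUpTo(max));
        System.out.println("boolean[] bytes: " + approxBooleanArrayBytes(slots));
        System.out.println("BitSet bytes:    " + approxBitSetBytes(slots));
    }
}
```

The extra shifting and masking behind each get and set call is a plausible reason for the slower access, while the eightfold smaller payload accounts for the memory saving.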

Use a sieve - to generate primes fast

Further Improvements

The idea of reducing the array size by letting the array represent only odd numbers could be generalized. Further speed increases are also possible. If you want to know more, see wheel factorization or segmented sieves. If you code a quicker prime generator, please let me know.

Uncategorized