Fabian Kessler edited this page Oct 16, 2016 · 3 revisions

Current memory use

Loading all 71 built-in language profiles takes about 74 MB of RAM.

How to reduce

Use fewer languages

The more language profiles you load, the more memory is used. But be careful: if you remove languages that might occur in your input, detection results will suffer.

Change data structure in the library

The List<LanguageProfile> uses 26.74 MB of RAM, but it is only an intermediate state. Once the NgramFrequencyData has been created, the profiles can be garbage collected, provided you don't keep a reference to them. Using String.intern() when loading the language profiles reduces their size to 19.12 MB, but it makes no difference for the NgramFrequencyData, because there every string (n-gram) exists only once, as a key in a HashMap. So interning is not worth pursuing.
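As a small illustration of what String.intern() does, here is a minimal JDK-only sketch. The class name InternDemo and the variable names are made up for this example and are not part of the library.

```java
final class InternDemo {
    /** Returns true if both references point to the very same String object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // Two profiles that contain the same n-gram usually hold two separate String objects.
        String fromProfileA = new String("the");
        String fromProfileB = new String("the");

        // Equal content, but two objects on the heap.
        System.out.println(sameObject(fromProfileA, fromProfileB));

        // After interning, both references point to one shared, canonical object.
        System.out.println(sameObject(fromProfileA.intern(), fromProfileB.intern()));
    }
}
```

Across the profile list this sharing saves memory, because the same n-gram appears in many profiles. As a map key each n-gram is stored once anyway, so there is nothing left to share.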

At runtime, the memory is consumed by the NgramFrequencyData class, specifically by its Map<String, double[]> wordLangProbMap.

In a quick test I replaced this map with a Map<String, float[]>. That means 32 bits per float instead of 64 bits per double. With 71 language profiles, each array holds 71 values. Memory consumption went down from 74.05 MB to 43.25 MB, which is 58% of the original. All unit tests passed, so float appears to be precise enough. Saving a few (mega)bytes was simply never a consideration before.
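The idea can be sketched roughly as follows. This is not the change made in the quick test, only a minimal JDK-only illustration of narrowing a Map<String, double[]> to a Map<String, float[]>. The class FloatProbMap and the method toFloatMap are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class FloatProbMap {
    /** Copies a double-based probability map into a float-based one, narrowing every value. */
    static Map<String, float[]> toFloatMap(Map<String, double[]> source) {
        // Pre-size the map so that it does not need to rehash while being filled.
        Map<String, float[]> result = new HashMap<>((int) (source.size() / 0.75f) + 1);
        for (Map.Entry<String, double[]> entry : source.entrySet()) {
            double[] doubles = entry.getValue();
            float[] floats = new float[doubles.length];
            for (int i = 0; i < doubles.length; i++) {
                floats[i] = (float) doubles[i]; // keeps about 7 significant decimal digits
            }
            result.put(entry.getKey(), floats);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> probs = new HashMap<>();
        probs.put("th", new double[] {0.25, 0.001}); // made-up probabilities for two languages
        float[] narrowed = toFloatMap(probs).get("th");
        System.out.println(narrowed[0] + " " + narrowed[1]);
    }
}
```

In the library itself the arrays would more likely be created as float[] from the start, so that the double[] version never has to exist in memory.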

Another option is to use the Trove Java collections library instead of the JDK HashMap. A char[] could then be used as the map key instead of an expensive String. This means either including another library or copy-pasting some code. See http://trove.starlight-systems.com/overview for an example with char[] keys and a CharArrayStrategy that implements TObjectHashingStrategy.
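Why a hashing strategy is needed at all: a char[] inherits identity-based equals() and hashCode() from Object, so two arrays with the same content would be treated as different keys. The sketch below shows this with JDK classes only. The HashingStrategy interface is a hypothetical stand-in written for this example, not Trove's actual type, so check the Trove documentation for the real signatures.

```java
import java.util.Arrays;

final class CharArrayKeyDemo {
    /** Hypothetical stand-in for the kind of strategy interface a custom hash map accepts. */
    interface HashingStrategy<T> {
        int computeHashCode(T key);
        boolean equals(T a, T b);
    }

    /** Hashes and compares char[] keys by content instead of by identity. */
    static final class CharArrayStrategy implements HashingStrategy<char[]> {
        @Override
        public int computeHashCode(char[] key) {
            return Arrays.hashCode(key);
        }

        @Override
        public boolean equals(char[] a, char[] b) {
            return Arrays.equals(a, b);
        }
    }

    public static void main(String[] args) {
        char[] first = "abc".toCharArray();
        char[] second = "abc".toCharArray();
        CharArrayStrategy strategy = new CharArrayStrategy();

        // Default behaviour: identity comparison, so the two arrays are "different" keys.
        System.out.println(first.equals(second));
        // With the strategy: content comparison, so they count as the same key.
        System.out.println(strategy.equals(first, second));
        System.out.println(strategy.computeHashCode(first) == strategy.computeHashCode(second));
    }
}
```

Wrapping each char[] in a small key object would also work with a plain HashMap, but the extra object per key would eat up much of the saving, which is why a map that accepts a strategy is attractive here.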

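Java 9 will reduce the amount of memory consumed by strings: with compact strings, text that contains only Latin-1 characters is stored with one byte per character instead of two.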

Is this necessary?

For Servers

I believe that for most users this is not a concern. Most apps run on servers with plenty of RAM.

If you still get charged an arm and a leg for RAM, you may want to consider Hetzner, a German host. My employer, which runs http://www.nameapi.org/, is a satisfied customer.

For example, this machine, https://www.hetzner.de/de/hosting/produkte_rootserver/px61ssd, offers 64 GB DDR4 ECC RAM, an Intel® Xeon® E3-1275 v5 quad-core Skylake CPU and 2 x 480 GB SSD for EUR 70 per month. Customers outside the EU get the 19% VAT deducted. No affiliation.

For Android

If you run the language detector on mobile devices, you may want to look at the fork by user eclectice at https://github.com/eclectice/language-detector (Gradle, short text profiles), or at another version of the original software, built as a Maven multi-module project: https://github.com/rmtheis/language-detection

I (Fabian) have no experience with Android.

How to measure

Use the memory-measurer from https://github.com/DimitrisAndreou/memory-measurer

  1. Download the object-explorer.jar.
  2. Add the jar to the language-detector project in your IDE. In IntelliJ: click File, Project Structure, Libraries, the green + sign, then select the jar from your disk.
  3. When running, add this to the VM options: -javaagent:/path/to/object-explorer.jar

This code loads all profiles, creates the NgramFrequencyData, then measures and prints.

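import java.io.IOException;
import java.util.List;

import objectexplorer.MemoryMeasurer;
import objectexplorer.ObjectGraphMeasurer;
import org.junit.Test; // JUnit 4 is assumed here

import static org.junit.Assert.assertEquals;
// Imports for the language-detector classes are omitted; let your IDE add them.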

@Test
public void testMemory() throws IOException {
    List<LanguageProfile> languageProfiles = new LanguageProfileReader().readAllBuiltIn();
    NgramFrequencyData ngramFrequencyData = NgramFrequencyData.create(
            languageProfiles,
            NgramExtractors.standard().getGramLengths() //that is 1, 2 and 3-grams
    );

    assertEquals(71, languageProfiles.size()); // expected value comes first
    int totalGramsAllProfiles = 0;
    for (LanguageProfile languageProfile : languageProfiles) {
        totalGramsAllProfiles += languageProfile.getNumGrams();
    }
    assertEquals(281920, totalGramsAllProfiles); // expected value comes first

    measureAndPrint(languageProfiles);
    measureAndPrint(ngramFrequencyData);
}

private void measureAndPrint(Object o) {
    long memory = MemoryMeasurer.measureBytes(o);
    ObjectGraphMeasurer.Footprint footprint = ObjectGraphMeasurer.measure(o);

    System.out.println("Bytes: "+memory);
    System.out.println("Kilobytes: "+String.format("%.2f", (memory/(double)1024)));
    System.out.println("Megabytes: "+String.format("%.2f", (memory/(double)1024/1024)));
    System.out.println(footprint);
}

The values I got on 2016-10-07 are:

Bytes: 77646712
Kilobytes: 75826.87
Megabytes: 74.05
Footprint{Objects=461627, References=723918, Primitives=[double x 8189850, char x 309517, int x 346202, float]}
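These numbers can be used for a back-of-the-envelope check of the float idea. The footprint lists 8189850 doubles; divided by 71 languages that gives 115350 arrays, one per distinct n-gram. The sketch below estimates the footprint after switching to float[]. It assumes a 16-byte array header and 8-byte object alignment, which is typical for a 64-bit HotSpot JVM with compressed oops but not guaranteed. The class and method names are made up for this example.

```java
import java.util.Locale;

final class FloatSavingsEstimate {
    static final int ARRAY_HEADER_BYTES = 16; // assumption: 64-bit HotSpot with compressed oops
    static final int ALIGNMENT_BYTES = 8;     // assumption: default object alignment

    /** Heap size of one primitive array, rounded up to the object alignment. */
    static long arrayBytes(int length, int bytesPerElement) {
        long raw = ARRAY_HEADER_BYTES + (long) length * bytesPerElement;
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    /** Estimated total bytes if every double[] in the measured graph became a float[]. */
    static long estimateFloatBytes(long measuredBytes, long doubleCount, int languages) {
        long arrays = doubleCount / languages; // one array per distinct n-gram
        long savedPerArray = arrayBytes(languages, Double.BYTES) - arrayBytes(languages, Float.BYTES);
        return measuredBytes - arrays * savedPerArray;
    }

    public static void main(String[] args) {
        long estimate = estimateFloatBytes(77_646_712L, 8_189_850L, 71);
        System.out.println("Estimated bytes with float[]: " + estimate);
        System.out.println(String.format(Locale.ROOT, "Estimated megabytes: %.2f", estimate / 1024.0 / 1024.0));
    }
}
```

Under these assumptions the estimate is 45348712 bytes, or about 43.25 MB, which agrees with the 43.25 MB measured in the float test.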