Commit ca05d7f: Prepared for release

chuwy authored and alexanderdean committed Apr 7, 2016
1 parent 59be358 commit ca05d7f

Showing 4 changed files with 36 additions and 25 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG
@@ -1,3 +1,12 @@
+Version 0.6.0 (2016-04-07)
+--------------------------
+Added force flag (#141)
+Fixed column ordering for Redshift tables across ADDITIONs (#135)
+Added SQL migrations between schema ADDITIONs (#134)
+Replaced argot with scopt (#124)
+Reimplemented --schema-by with proper Jackson converting (#125)
+Removed AWS-related tools from up.playbooks (#146)
+
Version 0.5.0 (2016-02-11)
--------------------------
Bumped schema-ddl to 0.3.0 (#130)
46 changes: 24 additions & 22 deletions README.md
@@ -18,8 +18,8 @@ Schema Guru is used heavily in association with Snowplow's own **[Snowplow] [sno
Download the latest Schema Guru from Bintray:

```bash
-$ wget http://dl.bintray.com/snowplow/snowplow-generic/schema_guru_0.5.0.zip
-$ unzip schema_guru_0.5.0.zip
+$ wget http://dl.bintray.com/snowplow/snowplow-generic/schema_guru_0.6.0.zip
+$ unzip schema_guru_0.6.0.zip
```

This assumes you have a recent JVM installed.
@@ -33,31 +33,31 @@ You can use as input either a single JSON file or a directory with JSON instances (i
The following command will print the JSON Schema to stdout:

```bash
-$ ./schema-guru-0.5.0 schema {{input}}
+$ ./schema-guru-0.6.0 schema {{input}}
```

You can also specify an output file for your schema:

```bash
-$ ./schema-guru-0.5.0 schema --output {{json_schema_file}} {{input}}
+$ ./schema-guru-0.6.0 schema --output {{json_schema_file}} {{input}}
```

You can also switch Schema Guru into **[NDJSON] [ndjson]** mode, where it will look for newline-delimited JSONs:

```bash
-$ ./schema-guru-0.5.0 schema --ndjson {{input}}
+$ ./schema-guru-0.6.0 schema --ndjson {{input}}
```

You can specify an enum cardinality tolerance for your fields: *all* fields found to have fewer distinct values than the specified cardinality will be described in the JSON Schema using the `enum` property.

```bash
-$ ./schema-guru-0.5.0 schema --enum 5 {{input}}
+$ ./schema-guru-0.6.0 schema --enum 5 {{input}}
```
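
With `--enum 5`, for example, a field observed with only a handful of distinct values would be derived with an `enum` along these lines (a sketch with made-up values, not actual tool output):

```json
{
  "type": "string",
  "enum": ["pending", "shipped", "delivered"]
}
```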

If you know that a particular set of values can appear, but don't want to set a large enum cardinality, you can specify a predefined enum set with the ``--enum-sets`` multioption, like this:

```bash
-$ ./schema-guru-0.5.0 schema --enum-sets iso_4217 --enum-sets iso_3166-1_aplha-3 /path/to/instances
+$ ./schema-guru-0.6.0 schema --enum-sets iso_4217 --enum-sets iso_3166-1_aplha-3 /path/to/instances
```

Schema Guru currently includes the following built-in enum sets (written as they should appear in the CLI):
@@ -76,15 +76,15 @@ If you need to include a very specific enum set, you can define it yourself in
And pass the path to this file instead of an enum name:

```bash
-$ ./schema-guru-0.5.0 schema --enum-sets all --enum-sets /path/to/browsers.json /path/to/instances
+$ ./schema-guru-0.6.0 schema --enum-sets all --enum-sets /path/to/browsers.json /path/to/instances
```
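
The enum set file presumably just lists the allowed values; assuming a plain JSON array, a hypothetical `browsers.json` might look like:

```json
["Chrome", "Firefox", "Safari", "Opera"]
```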

Schema Guru will derive `minLength` and `maxLength` properties for strings, based on the shortest and longest strings observed.
This may be a problem if you process only a small number of instances.
To avoid an overly strict Schema, you can use the `--no-length` option.

```bash
-$ ./schema-guru-0.5.0 schema --no-length /path/to/few-instances
+$ ./schema-guru-0.6.0 schema --no-length /path/to/few-instances
```
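
To see why: if the shortest value observed across a few instances happened to be `"GB"` and the longest `"USA"` (hypothetical data), the derived string schema would be pinned to those lengths, roughly:

```json
{
  "type": "string",
  "minLength": 2,
  "maxLength": 3
}
```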

#### DDL derivation
@@ -96,7 +96,7 @@ Currently we support DDL only for **[Amazon Redshift] [redshift]**, but in futur
The following command will simply save Redshift DDL (Redshift being the default ``--db`` value) to the current directory.

```bash
-$ ./schema-guru-0.5.0 ddl {{input}}
+$ ./schema-guru-0.6.0 ddl {{input}}
```

If you specify as input a directory with several Self-describing JSON Schemas belonging to a single REVISION, Schema Guru will also generate migrations.
@@ -119,13 +119,13 @@ so you can safely alter your tables while they belong to a single REVISION.
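
For illustration only, a migration between two ADDITIONs might boil down to simple `ALTER TABLE` statements like the following (table and column names are hypothetical):

```sql
-- Hypothetical migration from 1-0-0 to 1-0-1
ALTER TABLE atomic.com_acme_click_event_1
    ADD COLUMN "page_title" VARCHAR(4096);
```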
You can also specify a directory for output:

```bash
-$ ./schema-guru-0.5.0 ddl --output {{ddl_dir}} {{input}}
+$ ./schema-guru-0.6.0 ddl --output {{ddl_dir}} {{input}}
```

If you're not a Snowplow Platform user, don't use **[Self-describing Schema] [self-describing]**, or just don't want anything specific to it, you can produce a raw schema:

```bash
-$ ./schema-guru-0.5.0 ddl --raw {{input}}
+$ ./schema-guru-0.6.0 ddl --raw {{input}}
```

But bear in mind that Self-describing Schemas bring many benefits.
@@ -134,7 +134,7 @@ For example, raw Schemas will not preserve the order of your columns (it just im
You may also want to get a JSONPaths file for Redshift's **[COPY] [redshift-copy]** command. It will place a ``jsonpaths`` dir alongside ``sql``:

```bash
-$ ./schema-guru-0.5.0 ddl --with-json-paths {{input}}
+$ ./schema-guru-0.6.0 ddl --with-json-paths {{input}}
```
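
A Redshift JSONPaths file maps each table column, in order, to a path in the source JSON. A minimal sketch with made-up field names:

```json
{
  "jsonpaths": [
    "$.data.user_id",
    "$.data.page_title"
  ]
}
```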

The most awkward part of shifting from the dynamically typed world to the statically typed one is product types (or union types), such as this in JSON Schema: ``["integer", "string"]``.
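
For example, a field observed both as an integer and as a string ends up with a product type like the sketch below; in Redshift DDL such a column can only be stored as a type wide enough for both representations, typically a VARCHAR (our summary of the trade-off, not exact tool output):

```json
{
  "type": ["integer", "string"]
}
```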
@@ -147,13 +147,13 @@ Another thing everyone needs to consider is the default VARCHAR size. If there's no c
You can also specify this default value:

```bash
-$ ./schema-guru-0.5.0 ddl --varchar-size 32 {{input}}
+$ ./schema-guru-0.6.0 ddl --varchar-size 32 {{input}}
```

You can also specify a Redshift schema for your table. In non-raw mode, ``atomic`` is used as the default.

```bash
-$ ./schema-guru-0.5.0 ddl --raw --schema business {{input}}
+$ ./schema-guru-0.6.0 ddl --raw --schema business {{input}}
```
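
With the options above, the generated DDL would target the `business` schema rather than `atomic`; a hypothetical fragment:

```sql
-- Hypothetical output fragment
CREATE SCHEMA IF NOT EXISTS business;

CREATE TABLE business.my_table (
    "id"   VARCHAR(32) NOT NULL,
    "name" VARCHAR(4096)
);
```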

Some users do not fully rely on Schema Guru's JSON Schema derivation or DDL generation, and edit their DDLs manually.
@@ -171,9 +171,9 @@ $ ./schema-guru-0.6.0 ddl --force {{input}}
You can access our hosted demo of the Schema Guru web UI at [schemaguru.snplowanalytics.com] [webui-hosted]. To run it locally:

```bash
-$ wget http://dl.bintray.com/snowplow/snowplow-generic/schema_guru_webui_0.4.0.zip
-$ unzip schema_guru_webui_0.4.0.zip
-$ ./schema-guru-webui-0.4.0
+$ wget http://dl.bintray.com/snowplow/snowplow-generic/schema_guru_webui_0.6.0.zip
+$ unzip schema_guru_webui_0.6.0.zip
+$ ./schema-guru-webui-0.6.0
```

The above will run a Spray web server containing Schema Guru on [0.0.0.0:8000] [webui-local]. The interface and port can be specified with `--interface` and `--port` respectively.
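
For example, to bind the UI to a different interface and port (values here are arbitrary):

```bash
$ ./schema-guru-webui-0.6.0 --interface 127.0.0.1 --port 8080
```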
@@ -198,7 +198,9 @@ $ cd sparkjob
$ inv run_emr my-profile my-bucket/input/ my-bucket/output/ my-bucket/errors/ my-bucket/logs my-ec2-keypair
```

-If you need some specific options for Spark job, you can specify these in `tasks.py`. The Spark job accepts the same options as the CLI application, but note that `--output` isn't optional and we have a new optional `--errors-path`.
+If you need some specific options for the Spark job, you can specify these in `tasks.py`.
+The Spark job accepts the same options as the CLI application, but note that `--output` isn't optional and we have a new optional `--errors-path`.
+Also, instead of specifying particular predefined enum sets, you can just enable them with the `--enum-sets` flag, which has the same behaviour as `--enum-sets all`.

## Developer Quickstart

@@ -279,7 +281,7 @@ Now just create a new Docker app in the **[Elastic Beanstalk Console] [beanstalk
To produce it, you need to specify a vendor, a name (unless segmentation is used; see below), and a version (optional; the default value is 1-0-0).

```bash
-$ ./schema-guru-0.5.0 schema --vendor {{your_company}} --name {{schema_name}} --schemaver {{version}} {{input}}
+$ ./schema-guru-0.6.0 schema --vendor {{your_company}} --name {{schema_name}} --schemaver {{version}} {{input}}
```
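
The derived schema is then wrapped in the Self-describing format, whose `self` section carries these coordinates; a sketch with placeholder values:

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self": {
    "vendor": "com.acme",
    "name": "click_event",
    "format": "jsonschema",
    "version": "1-0-0"
  }
}
```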

### Schema Segmentation
@@ -312,7 +314,7 @@ and

You can run it as follows:
```bash
-$ ./schema-guru-0.5.0 schema --output {{output_dir}} --schema-by $.event {{mixed_jsons_directory}}
+$ ./schema-guru-0.6.0 schema --output {{output_dir}} --schema-by $.event {{mixed_jsons_directory}}
```

It will put two (or maybe more) JSON Schemas into the output dir: Purchased_an_Item.json and Posted_a_comment.json.
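
For instance, given mixed instances like these (hypothetical fields), `--schema-by $.event` routes each one to a schema named after its `event` value:

```json
{"event": "Purchased_an_Item", "sku": "0001", "quantity": 2}
{"event": "Posted_a_comment", "body": "Looks great!"}
```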
@@ -415,7 +417,7 @@ limitations under the License.
[license-image]: http://img.shields.io/badge/license-Apache--2-blue.svg?style=flat
[license]: http://www.apache.org/licenses/LICENSE-2.0

-[release-image]: http://img.shields.io/badge/release-0.5.0-blue.svg?style=flat
+[release-image]: http://img.shields.io/badge/release-0.6.0-blue.svg?style=flat
[releases]: https://github.com/snowplow/schema-guru/releases

[json-schema]: http://json-schema.org/
4 changes: 2 additions & 2 deletions project/BuildSettings.scala
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2014 Snowplow Analytics Ltd. All rights reserved.
+ * Copyright (c) 2016 Snowplow Analytics Ltd. All rights reserved.
*
* This program is licensed to you under the Apache License Version 2.0,
* and you may not use this file except in compliance with the
@@ -20,7 +20,7 @@ object BuildSettings {
// Common settings for all our projects
lazy val commonSettings = Seq[Setting[_]](
organization := "com.snowplowanalytics",
-version := "0.6.0-M1",
+version := "0.6.0",
scalaVersion := "2.10.6",
crossScalaVersions := Seq("2.10.6", "2.11.7"),
scalacOptions := Seq("-deprecation", "-encoding", "utf8",
2 changes: 1 addition & 1 deletion sparkjob/tasks.py
@@ -20,7 +20,7 @@
from boto.emr.bootstrap_action import BootstrapAction

DIR_WITH_JAR = "./target/scala-2.10/"
-JAR_FILE = "schema-guru-sparkjob-0.6.0-M1"
+JAR_FILE = "schema-guru-sparkjob-0.6.0-rc1"

S3_REGIONS = { 'us-east-1': Location.DEFAULT,
'us-west-1': Location.USWest,