FeatureEngineeringForTensorflow

Unlike research settings, production environments involve large volumes of data. Spark copes with this scale better than a single machine, and TensorFlow is an efficient tool for building models on big data. This project is an example of how to preprocess data with Spark and convert it to TFRecord.

Building

sbt assembly

Running

spark-submit --class com.RawFeature2TFRecord.SingleCategoryFeature.Main \
 --master yarn \
 --deploy-mode cluster \
 --num-executors 10 \
 --driver-memory 10g \
 --executor-memory 10g \
 --executor-cores 1 \
 --conf spark.kryoserializer.buffer.max=512m \
 --conf spark.yarn.maxAppAttempts=1 \
 --files $YOUR_SPARK_DIR/conf/hive-site.xml \
 ./Raw2TFRecord/target/scala-2.11/Raw2TFRecord-assembly-0.1-SNAPSHOT.jar \
 --htable_raw_feature $YOUR_HIVE_TABLE_FEATURE \
 --htable_fea_cls $YOUR_HIVE_TABLE_FEATURE_CLSS \
 --hdfs_tfrecord $YOUR_HDFS_TFRECORD

TABLE FORMAT

$YOUR_HIVE_TABLE_FEATURE consists of a label and a list of features, with the features separated by commas. $YOUR_HIVE_TABLE_FEATURE_CLSS stores all feature classes, such as age, gender, and so on. The examples below show both tables:

$YOUR_HIVE_TABLE_FEATURE

0  IW_gametest,LIFE_170010103,LOGIN_210002,AGE_29,GEND_2,PROV_311,CITY_31111,PLAT_iphone
1  AGE_29,GEND_2,PROV_311,CITY_31111,PLAT_iphone

Every feature is made up of a feature class and a value joined by an underscore. For example, AGE_29 has class AGE and value 29.
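
The row layout above can be made concrete with a small sketch. The Python below is illustrative only and is not the project's Scala code. The function name `parse_row` is hypothetical, and the sketch assumes that the label and the feature string are separated by whitespace, as in the example rows, and that the class is everything before the first underscore.

```python
def parse_row(row):
    """Split one example row into (label, [(feature_class, value), ...])."""
    # Assumption: label and feature string are separated by whitespace.
    label_text, feature_text = row.split(None, 1)
    features = []
    for feature in feature_text.split(","):
        # Assumption: the class is everything before the first underscore.
        feature_class, _, value = feature.strip().partition("_")
        features.append((feature_class, value))
    return int(label_text), features


row = "1  AGE_29,GEND_2,PROV_311,CITY_31111,PLAT_iphone"
label, features = parse_row(row)
print(label, features)
```

Splitting on the first underscore only keeps values intact if they ever contain an underscore themselves.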

$YOUR_HIVE_TABLE_FEATURE_CLSS

PLAT
PROV
AGE
CITY
IW
LIFE
LOGIN
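
One way to picture how the two tables fit together is to arrange each row's features into one slot per listed class. This is a hedged Python sketch, not the project's Scala implementation. `group_by_class` is a hypothetical helper, and two choices are assumptions made purely for illustration: keeping one value per class, and ignoring features whose class is not in the class table.

```python
# Class names copied from the example class table above.
FEATURE_CLASSES = ["PLAT", "PROV", "AGE", "CITY", "IW", "LIFE", "LOGIN"]


def group_by_class(feature_text, classes=FEATURE_CLASSES):
    """Map each listed class to its value in the row, or None if absent."""
    slots = {feature_class: None for feature_class in classes}
    for feature in feature_text.split(","):
        feature_class, _, value = feature.partition("_")
        # Assumption: features whose class is not listed are ignored.
        if feature_class in slots:
            slots[feature_class] = value
    return slots


slots = group_by_class("AGE_29,GEND_2,PROV_311,CITY_31111,PLAT_iphone")
print(slots)
```

In this sketch GEND_2 is dropped only because GEND does not appear in the example class list, and the classes missing from the row (IW, LIFE, LOGIN) stay as None.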
