This repository was archived by the owner on Nov 8, 2018. It is now read-only.
Keras and dist-keras results differ #81
I am trying to build an LSTM model on time-series data. I use MinMaxScaler to rescale the features and the target variable, then reshape the data into 3D: [samples, timesteps, dimensions].
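For illustration, here is a minimal NumPy sketch of the scaling and reshaping step. It is not the code from this report: the helper names `min_max_scale` and `to_lstm_input` and the sample values are hypothetical, and it assumes the features start as a 2D array of shape (samples, features).

```python
import numpy as np


def min_max_scale(x):
    """Scale each column of x to the range [0, 1] (hypothetical helper)."""
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    # Use a span of 1.0 for constant columns to avoid dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span


def to_lstm_input(x, timesteps=1):
    """Reshape (samples, features) into (samples, timesteps, dimensions)."""
    samples = x.shape[0]
    # -1 lets NumPy infer the last axis; features must be divisible by timesteps.
    return x.reshape(samples, timesteps, -1)


# Made-up data: 3 samples, 2 features.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
scaled = min_max_scale(features)
batch = to_lstm_input(scaled, timesteps=1)
print(batch.shape)  # (3, 1, 2)
```

With `timesteps = 1`, the last axis equals the number of features, which matches an `input_shape` of `(1, nb_features)` on the first LSTM layer.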
I then created a neural network with three LSTM layers. After training, I calculate the R2 score on the test data.
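For reference, the R2 score can be sketched in plain NumPy using the usual definition, 1 - SS_res / SS_tot. The function name `r2_score` and the sample values below are hypothetical and not taken from this report. A helper like this could be applied to the predictions of both pipelines, provided both are evaluated on the same test rows and on the same (scaled) target.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up scaled targets in [0, 1].
y_true = [0.0, 0.25, 0.75, 1.0]

perfect = r2_score(y_true, y_true)       # predictions equal to the targets
baseline = r2_score(y_true, [0.5] * 4)   # always predicting the mean
print(perfect, baseline)
```

A model that always predicts the mean of the targets scores 0, and a model worse than that scores below 0, so a negative R2 in one pipeline is possible and not by itself a sign of a metric bug.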
I did the same thing using dist-keras, but I get different results. My dist-keras code:
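# Imports assumed by this snippet (not shown in the original post)
from pyspark.ml.feature import MinMaxScaler
from pyspark.ml.evaluation import RegressionEvaluator
from keras.models import Sequential
from keras.layers import LSTM, Dense
from distkeras.trainers import SingleTrainer
from distkeras.predictors import ModelPredictor
from distkeras.transformers import ReshapeTransformer

mxscaler_f = MinMaxScaler(inputCol='features', outputCol="features_normalized")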
mxscaler_model_f = mxscaler_f.fit(dataset)
dataset = mxscaler_model_f.transform(dataset)
mxscaler = MinMaxScaler(inputCol='target', outputCol="adjclose_min")
mxscaler_model = mxscaler.fit(dataset)
dataset = mxscaler_model.transform(dataset)
dataset = dataset.select("features_normalized", "label_index", "adjclose_min")
dataset.cache()
raw_dataset = dataset
nb_features = len(raw_dataset.select("features_normalized").take(1)[0]["features_normalized"])
timesteps = 1
dimension = nb_features
reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (timesteps, dimension))
raw_dataset = reshape_transformer.transform(raw_dataset)
train_len = int(0.7 * raw_dataset.count())
training_set = sqlContext.createDataFrame(raw_dataset.head(train_len), raw_dataset.schema)
test_set = raw_dataset.subtract(training_set)
optimizer = 'adagrad'
loss = 'mse'
model = Sequential()
model.add(LSTM(80, input_shape=(1,nb_features), return_sequences=True))
model.add(LSTM(70, return_sequences=True))
model.add(LSTM(50 , return_sequences=False))
model.add(Dense(1, kernel_initializer='uniform', activation='relu'))
trainer = SingleTrainer(keras_model=model, loss=loss, worker_optimizer=optimizer,
features_col="features_normalized",label_col="adjclose_min", num_epoch=20, batch_size=512)
trained_model = trainer.train(training_set)
test_set = test_set.select("matrix", "adjclose_min", "label_index")
predictor = ModelPredictor(keras_model=trained_model, features_col="matrix")
test_set = predictor.predict(test_set)
newone = test_set.rdd.map(extract).toDF(["adjclose_min","label_index","pred"])
evaluator = RegressionEvaluator(metricName='r2', predictionCol="pred", labelCol="adjclose_min")
score = evaluator.evaluate(newone)
What am I doing wrong?