
Using the bulkloader with Java App Engine


The latest release of the datastore bulkloader greatly simplifies importing and exporting data for App Engine developers. We’ll go through a step-by-step example of using this tool with a Java application. Note that only setting up Remote API is Java specific – everything else works the same for Python applications. Unlike certain phone companies, this is one store that doesn’t care what language your application is written in.

Checking for our Prerequisites:

If you already have Python 2.5.x and the Python SDK installed, skip this section.

First off, we’ll need to download the Python SDK. This example assumes we have Python version 2.5.x installed. If you’re not sure what version you have installed, open up a terminal and type “python”. This opens up a Python REPL, with the first line displaying the version of Python you’re using. Here’s example output:

Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

(Yes, Pythonistas, the version on my laptop is ooooooooold).

Download the Python SDK from the App Engine downloads page. As of the writing of this post, the newest version is 1.3.4.

Unzip this file. It’ll be easier if you put the unzipped directory on your path. Linux and OS X users can append this to their ~/.bash_profile (then open a new terminal or run source ~/.bash_profile):

PATH="/path/to/where/you/unzipped/appengine:${PATH}"
export PATH

To test that everything is working, type

appcfg.py

You’ll see a page of usage commands that starts out like this:

Usage: appcfg.py [options] <action>

Action must be one of:
create_bulkloader_config: Create a bulkloader.yaml from a running application.
cron_info: Display information about cron jobs.
download_data: Download entities from datastore.
help: Print help for a specific action.
request_logs: Write request logs in Apache common log format.
rollback: Rollback an in-progress update.
update: Create or update an app version.
update_cron: Update application cron definitions.
update_dos: Update application dos definitions.
update_indexes: Update application indexes.
update_queues: Update application task queue definitions.
upload_data: Upload data records to datastore.
vacuum_indexes: Delete unused indexes from application.
Use 'help <action>' for a detailed description.

…. (and so forth)

Now we can go ahead and start using the bulkloader.

Using the bulkloader for Java applications

Before we can begin using the bulkloader, we’ll have to set it up. Setting up the bulkloader is a two-step process. We’ll need to:

1. Add RemoteApi to our URI mapping
2. Generate a bulkloader configuration

Step 1: Add RemoteApi to our URI mapping

We’ll want to edit our web.xml. Add the following lines:

    <servlet>
        <servlet-name>RemoteApi</servlet-name>
        <servlet-class>com.google.apphosting.utils.remoteapi.RemoteApiServlet</servlet-class>
    </servlet>

    <servlet-mapping>
        <servlet-name>RemoteApi</servlet-name>
        <url-pattern>/remote_api</url-pattern>
    </servlet-mapping>

A common pitfall when setting up Remote API is that developers using frameworks often map URIs with a catch-all expression, and this will stomp over our servlet-mapping – make sure the /remote_api pattern actually takes precedence. Deploy this application into production. We’ll likely want to put an admin constraint on this endpoint as well.
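
One way to do that is with a security-constraint in web.xml. Here’s a minimal sketch – on App Engine, the admin role restricts a URL to application administrators, and the bulkloader keeps working because appcfg.py logs in with our administrator credentials:

    <security-constraint>
        <web-resource-collection>
            <web-resource-name>remote_api</web-resource-name>
            <url-pattern>/remote_api</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <role-name>admin</role-name>
        </auth-constraint>
    </security-constraint>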

Step 2: Generate a bulkloader configuration

This step isn’t actually *required*, but it certainly makes our lives easier, especially if we are looking to export existing data. In a brand new application, if we are looking to bootstrap our application with data, we don’t need this step at all. For completeness, however, it’d be best to go over it.

We’ll need to generate a configuration template. This step depends on datastore statistics having been updated with the Entities we’re looking to export. Log in to appspot.com and click “Datastore Statistics” under Datastore in the right hand menu.

If we see something that looks like the following screenshot, we can use this tool.

If we see something that looks like the screenshot below, then we can’t autogenerate a configuration since this is a brand new application – that’s okay; it means we probably don’t have much data to export. We’ll have to wait for App Engine’s background tasks to bulk update our statistics before we’ll be able to complete this step.

Assuming that we have datastore statistics available, we can use appcfg.py in the following manner to generate a configuration file:

appcfg.py create_bulkloader_config --url=http://APPID.appspot.com/remote_api --application=APPID --filename=config.yml

If the datastore isn’t ready, running this command will cause the following error:

[ERROR   ] Unable to download kind stats for all-kinds download.
[ERROR   ] Kind stats are generated periodically by the appserver
[ERROR   ] Kind stats are not available on dev_appserver.

I’m using this on a Guestbook sample application I wrote for a codelab a while ago. The only entities are Greetings, each consisting of a String name, a String content and a Date timestamp. This is what my config file looks like:
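
Just for reference, the Java side of that Greeting kind might look something like this – a hedged sketch using JDO, with field names matching the properties in the generated config below:

import java.util.Date;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import com.google.appengine.api.datastore.Key;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Greeting {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;        // surfaces as __key__ in the bulkloader config

    @Persistent
    private String name;    // who left the greeting

    @Persistent
    private String content; // the message itself

    @Persistent
    private Date date;      // when it was posted
}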

# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
#  * Fill in connector and connector_options
#  * Review the property_map.
#    - Ensure the 'external_name' matches the name of your CSV column,
#      XML tag, etc.
#    - Check that __key__ property is what you want. Its value will become
#      the key name on import, and on export the value will be the Key
#      object.  If you would like automatic key generation on import and
#      omitting the key on export, you can remove the entire __key__
#      property from the property map.

# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Greeting
  connector:

  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      import_transform: transform.key_id_or_name_as_string

    - property: content
      external_name: content
      # Type: String Stats: 7 properties of this type in this kind.

    - property: date
      external_name: date
      # Type: Date/Time Stats: 7 properties of this type in this kind.
      import_transform: transform.import_date_time('%Y-%m-%dT%H:%M:%S')
      export_transform: transform.export_date_time('%Y-%m-%dT%H:%M:%S')

    - property: name
      external_name: name
      # Type: String Stats: 7 properties of this type in this kind.

The part we care about is the empty connector. Replace it with the following:

- kind: Greeting
  connector: csv

We’ve only filled in the “connector” option. Now we have something we can use to dump data.

Examples of common usages of the bulkloader

Downloading data

We’ve got what we need to dump data. Let’s go ahead and do that now. Issue the following command:

appcfg.py download_data --config_file=config.yml --filename=data.csv --kind=Greeting --url=http://APPID.appspot.com/remote_api --application=APPID

We’ll be asked to provide our email and password credentials. Here’s what my console output looks like:

Downloading data records.
[INFO    ] Logging to bulkloader-log-20100609.162353
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10
[INFO    ] Opening database: bulkloader-progress-20100609.162353.sql3
[INFO    ] Opening database: bulkloader-results-20100609.162353.sql3
[INFO    ] Connecting to java.latest.bootcamp-demo.appspot.com/remote_api
2010-06-09 16:23:57,022 WARNING appengine_rpc.py:399 ssl module not found.
Without the ssl module, the identity of the remote host cannot be verified, and
connections may NOT be secure. To fix this, please install the ssl module from
http://pypi.python.org/pypi/ssl .
To learn more, see http://code.google.com/appengine/kb/general.html#rpcssl .
Please enter login credentials for java.latest.bootcamp-demo.appspot.com
Email: YOUR_EMAIL
Password for YOUR_EMAIL:
[INFO    ] Downloading kinds: ['Greeting']
.[INFO    ] Greeting: No descending index on __key__, performing serial download
.
[INFO    ] Have 17 entities, 0 previously transferred
[INFO    ] 17 entities (11304 bytes) transferred in 10.5 seconds

There’s now a CSV file named data.csv in my directory, as well as a bunch of autogenerated bulkloader-* files for resuming if the loader dies midway during the export. My CSV file starts like this:

content,date,name,key
Hey it works!,2010-05-18T22:35:17,Ikai Lan,1
… (More lines of CSV)

The first line is a header line – it designates the order in which properties have been exported. In our case, we’ve exported content, date and name in addition to Entity keys.

Uploading Data

To upload the CSV file back into the datastore, we run the following command:

appcfg.py upload_data --config_file=config.yml --filename=data.csv --url=http://APPID.appspot.com/remote_api --application=APPID --kind=Greeting

This’ll use config.yml and create our entities in the remote datastore.

Adding a new field to datastore entities

One question that is frequently asked in the groups is, “How do I migrate my schema?” This question is generally poorly phrased; App Engine’s datastore is schemaless. That is – it is possible to have Entities of the same Kind with completely different sets of properties. Most of the time, this is a good thing. MySQL, for instance, requires a table lock to do a schema update. By being schema free, migrations can happen lazily, and application developers can check at runtime for whether a Property exists on a given Entity, then create or set the value as needed.
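
With the low-level datastore API, that lazy check looks roughly like this – a sketch, with error handling pared down and the property name borrowed from the example we’re about to walk through:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;

public class GreetingDao {
    // Lazy migration: patch an entity on read if it predates the new property.
    public Entity getGreeting(Key greetingKey) throws EntityNotFoundException {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        Entity greeting = datastore.get(greetingKey);
        if (!greeting.hasProperty("homepageUrl")) {
            greeting.setProperty("homepageUrl", "http://www.google.com");
            datastore.put(greeting); // write back so the next read is clean
        }
        return greeting;
    }
}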

But there are times when this isn’t sufficient. One use case is if we want to change a default value on Entities and grandfather older Entities to the new default value, but we also want the default value to possibly be null. We can do tricks such as creating a new Property, setting an update timestamp, checking whether the update timestamp is before or after the code change and updating conditionally, and so forth. The problem with this approach is that it introduces a TON of complexity into our application, and if we have more than one of these “migrations”, suddenly we’re writing more code to lazily grandfather data and confusing the non-Cylons that work on our team. It’s easier to migrate all the data. So how do we do this? Before the new application code goes live, we migrate the schema by adding the new field. The best part about this is that we can do it without locking tables, so writes can continue.

Let’s add a new String field to our Greeting class: homepageUrl. Let’s assume that we want to set a default to http://www.google.com. How would we do this? Let’s update our config.yml file to the following:

# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
#  * Fill in connector and connector_options
#  * Review the property_map.
#    - Ensure the 'external_name' matches the name of your CSV column,
#      XML tag, etc.
#    - Check that __key__ property is what you want. Its value will become
#      the key name on import, and on export the value will be the Key
#      object.  If you would like automatic key generation on import and
#      omitting the key on export, you can remove the entire __key__
#      property from the property map.

# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Greeting
  connector: csv

  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      import_transform: transform.key_id_or_name_as_string

    - property: content
      external_name: content
      # Type: String Stats: 7 properties of this type in this kind.

    - property: homepageUrl
      external_name: homepageUrl

    - property: date
      external_name: date
      # Type: Date/Time Stats: 7 properties of this type in this kind.
      import_transform: transform.import_date_time('%Y-%m-%dT%H:%M:%S')
      export_transform: transform.export_date_time('%Y-%m-%dT%H:%M:%S')

    - property: name
      external_name: name
      # Type: String Stats: 7 properties of this type in this kind.

Note that we’ve added a new property with a new external_name. By default, the loader will treat the value as a String.

Now let’s add the field to our CSV file:

content,date,name,key,homepageUrl
Hey it works!,2010-05-18T22:35:17,Ikai Lan,1,http://www.google.com
... (more lines)

We’d likely write a script to augment our CSV file. Note that this only works if we have named keys! If we had integer keys before, we’ll end up creating duplicate entities using key names and not integer IDs.
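
The augmentation script itself is trivial. Here’s a hedged sketch in Java – the file name and default value match the example above, and it assumes none of the exported fields contain embedded commas or newlines (use a real CSV library otherwise):

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class AddHomepageUrl {
    public static void main(String[] args) throws IOException {
        // Read the whole export first, then rewrite data.csv in place.
        List<String> lines = Files.readAllLines(Paths.get("data.csv"), StandardCharsets.UTF_8);
        try (PrintWriter out = new PrintWriter("data.csv", "UTF-8")) {
            out.println(lines.get(0) + ",homepageUrl");           // header row
            for (String line : lines.subList(1, lines.size())) {
                if (!line.isEmpty()) {
                    out.println(line + ",http://www.google.com"); // default value
                }
            }
        }
    }
}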

Now we run the bulkloader to upload our entities:

appcfg.py upload_data --config_file=config.yml --filename=data.csv --url=http://APPID.appspot.com/remote_api --application=APPID --kind=Greeting

Once our loader has finished running, we’ll see the new fields on our existing entities.

WARNING: There is a potential race condition here: if an Entity gets updated by our bulkloader in this fashion right as user-facing code reads and updates the Entity without the new field, that will leave us with Entities that were grandfathered incorrectly. Fortunately, after we migrate, we can sweep over these Entities and manually update them. It’s slightly annoying, but far less painful than making bulkloader updates transactional.
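
Here’s a sketch of that cleanup pass with the low-level datastore API. We have to iterate the whole kind, because the datastore can’t query for the absence of a property (entities missing it simply aren’t in that property’s index); for a large kind we’d push this into task queue batches, but the shape is the same:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;

public class GreetingBackfill {
    // Post-migration sweep: patch any Greeting that slipped through the race window.
    public static void backfill() {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        for (Entity greeting : datastore.prepare(new Query("Greeting")).asIterable()) {
            if (!greeting.hasProperty("homepageUrl")) {
                greeting.setProperty("homepageUrl", "http://www.google.com");
                datastore.put(greeting);
            }
        }
    }
}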

Bootstrapping the datastore with default Entities

So we’ve covered the use case of using a generated config.yml file to update or load entities into the datastore, but what we haven’t yet covered is bootstrapping a completely new Entity Kind with never before seen data into the datastore.

Let’s add a new Entity Kind, Employee, to our datastore. We’ll preload this data (save it as new_entity.csv):

name,title
Ikai Lan,Developer Programs Engineer
Patrick Chanezon,Developer Advocate
Wesley Chun,Developer Programs Engineer
Nick Johnson,Developer Programs Engineer
Jason Cooper,Developer Programs Engineer
Christian Schalk,Developer Advocate
Fred Sauer,Developer Advocate

Note that we didn’t add a key. In this case we don’t care – the datastore will generate keys on import – and leaving __key__ out simplifies our config file. Now let’s take a look at the config we need to use (save it as new_entity.yml):

python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Employee
  connector: csv

  property_map:

    - property: name
      external_name: name

    - property: title
      external_name: title

Now let’s go ahead and upload these entities:

$ appcfg.py upload_data --config_file=new_entity.yml --filename=new_entity.csv  --url=http://APPID.appspot.com/remote_api --kind=Employee
Uploading data records.
[INFO    ] Logging to bulkloader-log-20100610.151326
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10
[INFO    ] Opening database: bulkloader-progress-20100610.151326.sql3
[INFO    ] Connecting to APPID.appspot.com/remote_api
2010-06-10 15:13:27,334 WARNING appengine_rpc.py:399 ssl module not found.
Without the ssl module, the identity of the remote host cannot be verified, and
connections may NOT be secure. To fix this, please install the ssl module from
http://pypi.python.org/pypi/ssl .
To learn more, see http://code.google.com/appengine/kb/general.html#rpcssl .
Please enter login credentials for APPID.appspot.com
Email: your.email@gmail.com
Password for your.email@gmail.com:
[INFO    ] Starting import; maximum 10 entities per post
.
[INFO    ] 7 entities total, 0 previously transferred
[INFO    ] 7 entities (5394 bytes) transferred in 8.6 seconds
[INFO    ] All entities successfully transferred

Boom! We’re done.

There are still a lot of bulkloader topics to discuss – related entities, entity groups, keys, and so forth. Stay tuned.


Written by Ikai Lan

June 10, 2010 at 2:52 pm

Posted in Uncategorized

32 Responses


  1. Hello Ikai,

    thank you for the nice article. I tried to follow this but I can’t log in with the appcfg.py script. It keeps saying Invalid username or password. But I can access /remote_api with the browser on my app. Any idea how to solve this?

    Thank you, Ladislav.

    Ladislav Skokan

    June 16, 2010 at 3:41 am

  2. Thanks Ikai for this helpful tutorial.
    What about the issue with OpenID authentication that Nick Johnson discusses for Python?

    http://blog.notdot.net/2010/06/Using-remote-api-with-OpenID-authentication

    Thanks again
    Lorenzo

    Lorenzo

    June 22, 2010 at 1:49 am

  3. I’m trying to use the bulkuploader for a java program but am running into an interesting issue.

    My PrimaryKey property is a Long, and in Java I can explicitly give them id numbers and they show in the datastore as "id=xxx". When I download the data via appcfg.py I get a reasonable-looking data file. If I reupload the same file it actually inserts things into the datastore with key "name=xxx" and therefore doubles every one of my entries.

    Do you know how I can tell appcfg.py to use the actual long value and not string value?

    Nate

    June 26, 2010 at 6:32 am

  4. BTW, for anyone who has my issue… The best answer is to create a custom uploader using the file upload example provided on appengine’s java FAQ.

    Nate

    June 29, 2010 at 4:35 pm

  5. are you using google apps for login. if yes, then you must access the website through a http://subdomain.domain.com where domain.com is your google apps domain.

    gaej

    July 12, 2010 at 3:49 am

  6. Hey nate (or anyone), I have this same issue (the uploader always seems to give me named keys, not id-referenced keys) and am still struggling with it. Can you provide any more detail on how you solved it?

    J-So

    August 6, 2010 at 1:42 am

  7. Try to follow the step: appcfg.py create_bulkloader_config --url=http://APPID.appspot.com/remote_api --application=APPID --filename=config.yml

    Got an error because of authentication failed, the full trace here :

    [INFO ] Logging to bulkloader-log-20100806.140222
    [INFO ] Throttling transfers:
    [INFO ] Bandwidth: 250000 bytes/second
    [INFO ] HTTP connections: 8/second
    [INFO ] Entities inserted/fetched/modified: 20/second
    [INFO ] Batch Size: 10
    [INFO ] Opening database: bulkloader-progress-20100806.140222.sql3
    [INFO ] Opening database: bulkloader-results-20100806.140222.sql3
    [INFO ] Connecting to aboomba3.appspot.com/remote_api
    [ERROR ] Exception during authentication
    Traceback (most recent call last):
    File "/cygdrive/c/google/google_appengine/google/appengine/tools/bulkloader.py", line 3171, in Run
    self.request_manager.Authenticate()
    File "/cygdrive/c/google/google_appengine/google/appengine/tools/bulkloader.py", line 1180, in Authenticate
    remote_api_stub.MaybeInvokeAuthentication()
    File "/cygdrive/c/google/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 542, in MaybeInvokeAuthentication
    datastore_stub._server.Send(datastore_stub._path, payload=None)
    File "/cygdrive/c/google/google_appengine/google/appengine/tools/appengine_rpc.py", line 346, in Send
    f = self.opener.open(req)
    File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
    File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
    File "/usr/lib/python2.5/urllib2.py", line 425, in error
    return self._call_chain(*args)
    File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
    File "/usr/lib/python2.5/urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 404: Not Found
    [INFO ] Authentication Failed

    joey

    August 6, 2010 at 1:15 pm

  8. I used a combination of uploading entire chunks of my data via FileUpload (see link below), and explicitly creating my Java objects with the keys that I wanted (which were easily implicitly defined by the data format as the first one would be ‘n’ and every object after it was n++). I would then insert the set of objects in bulk.

    The problem I hit the most was finding the right number of objects per store call. There are specific limits that make this process long and annoying. I ran something locally that would continue trying to upload the chunk of data until it got a good response from the server page.

    It took me something on the order of 6-8 hours to upload about 1.5M tiny objects.

    http://code.google.com/appengine/kb/java.html#fileforms

    Nate

    August 6, 2010 at 2:10 pm

  9. Hey Nate, thanks for the quick reply.

    I think you’re right in your approach – I can’t find a better way to do it.

    With my limited (and honestly, rather painful) experience with the bulk upload tool in v1.3.5 of the SDK, so far I’ve discovered:

    * You can only use it for uploading entities with named keys, numeric ids aren’t currently supported. A bunch of folks who are downloading data with numeric ids and then uploading it are discovering that their data is duplicated on upload, this time with named keys, and none of the transform functions seem to fix this. My eventual solution was to use this approach to convert a bunch of my entities to use named keys, and re-factor my code accordingly.

    * The CSV import adaptor is very sensitive to character encoding. As such it’s worth making sure that the data in your CSV conforms exactly to the encoding it is supposed to, and that the adaptor is looking at the right encoding.

    * Although the tool creates a progress db file, and allows you to restore partial imports should they fail, for some reason this only seems to run the next few hundred lines or so of the file you’re trying to import after the failure point and then ends. If you’re trying to import a large number of entities this can be pretty frustrating since it means you need to restart the whole process again. So chunking imports into reasonably sized batches is a good idea.

    Hope this helps someone avoid jumping through the hoops I had to. I would probably invest the time in nate’s approach in future.

    It’s a shame – the absence of a good suite of tools to work natively with data in the data store seems to be unnecessarily holding the whole platform back.

    J-So

    August 6, 2010 at 5:35 pm

  10. Two things.

    1. Many of the tools were written pre java-app-engine and should probably be revisited to account for the different use-cases that are available by using Java. (If Google hadn’t closed their Austin office I would’ve definitely done this in my free time…)

    2. I love hearing that I’m right =P.

    What I have suggested is probably the easiest way to get the work done (no more than 2 hours of coding when you’ve got your master plan) but timing out and long response times ensure that your app will not get more than one JVM for processing work. If you have more data and need faster turn around times you will likely want to upload to the blobstore and create tasks (from the Task API) that do very small amounts of data insertion and can be executed in parallel.

    Nate

    August 6, 2010 at 8:01 pm

  11. Found the reason: it’s because my app has a restriction on who can access it. If it’s limited to an auth domain, it might not work properly, as it won’t even prompt for email and password.

    joey

    August 7, 2010 at 11:14 pm

  12. Hi,
    I have one doubt regarding uploading. You specified that if we want to use Long/Key as the primary key (id) datatype, we should leave that out of the configuration. That works fine, but my constraint is: I have two entities, and I use the primary key field value of one entity in another table. I defined the relationship at the code level, not in the table design. Is it possible to download and upload?

    megala

    September 20, 2010 at 4:58 am

  14. Thank you for this very well done article. I hadn’t realized how easy the bulkloader tool would be, so I had been putting it off. I did run into trouble with one thing I had to go find a solution for:

    GAE restricts String datatype to 500 characters, so you’ll run into trouble if you have large text fields (as I did). The solution is to use the Text datatype, provided in Java by com.google.appengine.api.datastore.Text, for those fields. This will also change the generated config.yml, where those fields will have the line "import_transform: db.Text". If you get the error Invalid code for import_transform. Code: "db.Text". Details: name 'db' is not defined, then you need to add an import to the beginning of the config file. Find the python_preamble section and add the line
    - import: google.appengine.ext.db
    and that should do it. It did for me. Hope it helps, and thanks again for the walk-through Ikai.

    ASTX813

    November 8, 2010 at 2:48 pm

  15. Hi

    Is there a way I can find the size occupied by particular entities resulted from a query, using the Java Datastore Statistics API?

    The current API just provides ways to find the size occupied by entities grouped by kind and/or property type. I would like to know the size occupied by entities which have a particular value of a property (for example, Person entities whose gender property is "Male").

    Thanks

    escapee

    November 22, 2010 at 11:09 am

  16. For Java import, instead of using the default import_transform statement, use "import_transform: transform.create_foreign_key('Kind', key_is_id=True)"

    It has to be done for all your keys, primary and foreign. It works for me.

    Thanks for the blog post, it was very helpful.

    Wilson Lim

    December 26, 2010 at 6:50 pm

  17. How do I fix the code so that I no longer get this error?

    conchi

    December 28, 2010 at 3:41 am

  18. Really great article, I wish Google would add this to their documentation on the bulk tool. I read through the google docs and had a hard time getting off the ground. This article explained things from the ground up in a very easy to understand manner. My thanks!

    error454

    February 24, 2011 at 6:28 pm

  19. By Google do you mean…me? (See my bio)

    Ikai Lan

    February 24, 2011 at 8:35 pm

  20. Yes, if you could add the *Using the bulkloader for Java applications* section to the existing documentation, I think it would be quite helpful.

    error454

    February 24, 2011 at 8:56 pm

  21. I’m trying to use the bulkloader
    After some successful elementary tests, I’ve added the following lines in my config:

    - property: observations

    external_name: observations

    import_transform: list_from_child_node('observations/p', False)

    export_transform: child_node_from_list('p')

    and I get a parsing error saying 'list_from_child_node' is not defined

    but in my installation, list_from_child_node is defined in the transform.py installed in the google_appengine\google\appengine\ext subdirectory of my App Engine installation.
    I’m trying to understand this, without success (I’m an experienced developer, but an inexperienced Python guy)

    Moissinac

    February 26, 2011 at 2:46 am

  22. Reply to my previous post: the working syntax is:

    import_transform: transform.list_from_child_node('observations/p', False)

    export_transform: transform.child_node_from_list('p')

    but the erroneous one that I used previously was the one written by the author of the transform.py code

    Moissinac

    February 26, 2011 at 5:04 am

  23. Hi!

    First of all, great article! I followed it few months ago, and all worked like a charm. But now I have a problem.

    Recently, we moved our application to the High Replication Datastore. For this, we lost the app-id, created another one in the HR Datastore, and linked the old app-id to the new application, in order to have all the things like before.

    Well, yesterday we changed some fields in some Entities. The problem is that, if we generate the bulk loader file, we are getting the old “database schema”, it is, the old fields. We do not see the changes in the generated yaml file.

    Is this a “propagation” problem or something like that? Do we have to wait until this Python stuff (appcfg.py) can “see” the changes in the Datastore?

    Thanks in advance!!

    Imanol

    March 18, 2011 at 4:09 am

  24. Here is the most basic Java App that can be put onto App Engine as the “backup” version and used to access the remote_api. https://github.com/handstandtech/App-Engine-Java-remote_api

    Just upload it and don’t set it as the default version. Then you can access it via backup.YOURAPPSPOTID.appspot.com and when you run the bulk loader it won’t show up in your default version’s logs.

    Sam Edwards

    March 23, 2011 at 10:24 am

  26. […] Using the bulkloader with Java App Engine – Ikai Lan says […]

  27. I guess this thing does not work with High Replication… It didn’t work for my bootstrap entities.

    Eduardo Costa

    August 16, 2011 at 3:40 pm

  28. Hey, Really superb article…

    I followed all the steps…

    But when I am generating the config file, I am getting ProtocolBufferDecodeError, "Corrupted"…

    Please suggest me…how to solve this error….?

    Shya,

    October 8, 2011 at 1:53 am

  29. Hi
    Here are some additions: http://jannaud.fr/appengine about more complex situations, like specifying the entity key or doing data manipulation at upload time… (use Google Translate or take a look at the code)

    Thomas

    April 28, 2013 at 2:37 am

  30. Hi,
    The above tutorial is for the AppEngine Python SDK.
    appcfg.py is not available for AppEngine Java SDK. In that case we have appcfg.cmd.
    appcfg.cmd does not have an upload_data argument as we have used here:
    python appcfg.py upload_data places.csv Place

    So the question is: how do we upload data from an Excel file to the datastore using appcfg.cmd?

    Vaibhav

    November 1, 2013 at 10:35 pm

